Akridata Data Explorer consists of two major components:
A web portal accessible through a web browser.
A command line tool named adectl and a set of feature extraction Docker images. This software is installed on a user-provisioned Linux or Mac machine. It reads data from one of the supported cloud data stores or the local file system, extracts features, and uploads these features to the web portal. Additionally, adectl supports ingesting externally generated features and catalog information provided through comma-separated value (CSV) files.
Your data and catalog information stay where they are
Only extracted features and thumbnails are uploaded to the web portal. The full data and catalog are not copied.
Local mode setup
For evaluation purposes, the software running on the web portal can be installed on the same user-provisioned Linux/Mac machine where adectl is installed. This setup supports limited scale and capabilities, but it ensures that extracted features and thumbnails stay within the user-provisioned machine.
Self-hosting
The software components powering the web portal are deployed in a Kubernetes cluster, so they can be self-hosted on any cloud or on-prem Kubernetes cluster. Please contact us for more details.
User Roles
The following user roles are available:
Organization Admin: This role is a super user with access to all capabilities on the web portal. The following capabilities are available only to this role:
Registering secrets (credentials) to access data and catalog stores.
Registering data stores (Container) and External catalog by providing the necessary URL and secrets.
Container: An entity that represents a data store hosted in the cloud (AWS S3, Azure blob store, Google Cloud Storage) or on a local file system.
External catalog: A user-hosted relational database that is used to augment the catalog information generated by the Akridata feature extraction process. The contents of this catalog are not imported but fetched as needed within an authenticated user session.
User and group management.
User: This role maps to a data engineer/data scientist persona who is responsible for:
Defining datasets: specifying a Dataset and the choice of pre-processing and featurization to run on the objects in the dataset.
A dataset is an entity that specifies a selector on the contents of the container. A dataset can be of Image or Video type. The selector is a glob pattern like *.png.
Creating different types of data analysis jobs to curate data objects and create a Resultset.
A resultset is a list of objects that have been curated using the explore, refine, analyze and compare capabilities.
Finance Admin: This role maps to a finance team member responsible for keeping track of invoices and payments.
Terminology
Container
A container describes a storage location from which data is ingested into the system. A container can be an S3 bucket, an Azure blob store, a Google Cloud Storage bucket, or a directory on the local file system. The container is registered through the web UI, where the user provides details such as the endpoint URL and credentials.
Local Container
If data to be ingested is present on the local file system, then explicit container creation is not required.
Dataset
A dataset is an entity that specifies a selector on the contents of the container. A dataset can be of Image or Video type.
For example, suppose an S3 bucket has two directories, CAMERA-FRONT and CAMERA-BACK, holding images from the front and back cameras respectively, and each camera's images are best served by a different feature extraction model. In that case, you can define two datasets with the glob patterns CAMERA-FRONT/**/*.jpg and CAMERA-BACK/**/*.jpg respectively to logically group the images from the two cameras.
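To see how the two glob selectors above partition a container, here is a minimal Python sketch that applies them to a local directory tree. It assumes that the patterns follow the same semantics as Python's `pathlib` globbing (where `**` matches any number of nested directories, including none); the exact matching rules used by Akridata are not stated here. The function `group_by_dataset` and the dataset names `front-camera` and `back-camera` are hypothetical and used only for illustration.

```python
from pathlib import Path


def group_by_dataset(container_root, dataset_patterns):
    """Map each dataset name to the sorted relative paths its glob selects."""
    root = Path(container_root)
    grouped = {}
    for name, pattern in dataset_patterns.items():
        # Keep regular files only; directories that match are ignored.
        matches = (p for p in root.glob(pattern) if p.is_file())
        grouped[name] = sorted(p.relative_to(root).as_posix() for p in matches)
    return grouped


# Hypothetical dataset names mapped to the glob patterns from the example.
PATTERNS = {
    "front-camera": "CAMERA-FRONT/**/*.jpg",
    "back-camera": "CAMERA-BACK/**/*.jpg",
}
```

Calling `group_by_dataset("/path/to/local/copy", PATTERNS)` on a local copy of the bucket layout returns one list of image paths per dataset, which makes it easy to check that a selector picks up exactly the files you expect before defining the dataset.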
Pipelines
A pipeline is an abstraction that captures the ingest processing routines. A pipeline typically has feature extraction, thumbnail generation and feature summarization stages.
The pipeline is triggered through the following command.
adectl run -n <dataset-name> -i <directory-with-input-objects>
The above processing registers features and produces a catalog. The catalog is accessed using the 'Catalog' button on the dataset card as shown below.
[Figure: dataset-catalog-access]
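When the pipeline is triggered from a script rather than typed by hand, the `adectl run` invocation shown earlier can be assembled programmatically. The following is a minimal sketch that uses only the `-n` and `-i` options from the command above; the helper name `build_adectl_run_command`, the dataset name, and the directory are hypothetical.

```python
import shlex


def build_adectl_run_command(dataset_name, input_dir):
    """Return the argv list for `adectl run -n <dataset-name> -i <input-dir>`."""
    if not dataset_name or not input_dir:
        raise ValueError("dataset_name and input_dir must be non-empty")
    return ["adectl", "run", "-n", dataset_name, "-i", str(input_dir)]


# Hypothetical dataset name and directory, for illustration only.
command = build_adectl_run_command("front-camera", "/data/CAMERA-FRONT")
print(shlex.join(command))  # shell-quoted form, handy for logging

# To execute it (requires adectl to be installed and on PATH):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a dataset name or directory contains spaces.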
Coming Soon - Pipeline customization
The default pipeline may not be the best fit for data across all domains. The pipeline abstraction will be extended to support user-provided featurizers in upcoming releases.
Data Visualization Job
Once data is ingested, a data visualization job can be created by browsing the catalog through the 'Catalog' button in the dataset card as shown in the previous section.
Catalog browsing to create a data visualization job
Visualization job before submission
Visualization Job
Resultset
The data visualization UI provides capabilities to explore, drill down into, and curate the data using cluster views, nearest neighbour searches and similarity searches. The curated subset of the data objects is referred to as a resultset. A resultset can be downloaded to a local directory or exported to an S3 bucket, Azure blob store, or Google Cloud Storage bucket for downstream processing such as labelling or machine learning training.
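As a rough illustration of what a nearest neighbour search over extracted features does, the sketch below ranks objects by cosine similarity to a query object. This is not Akridata's implementation: the similarity measure, the function names, and the 3-dimensional feature vectors are all assumptions made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_neighbours(query_id, features, k=2):
    """Return the k object ids most similar to query_id, most similar first."""
    query = features[query_id]
    scored = [
        (cosine_similarity(query, vec), obj_id)
        for obj_id, vec in features.items()
        if obj_id != query_id
    ]
    # Highest similarity first; ties are broken by object id for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [obj_id for _, obj_id in scored[:k]]


# Made-up feature vectors keyed by hypothetical object names.
FEATURES = {
    "img_001.jpg": [1.0, 0.0, 0.0],
    "img_002.jpg": [0.9, 0.1, 0.0],
    "img_003.jpg": [0.0, 1.0, 0.0],
    "img_004.jpg": [0.7, 0.7, 0.0],
}
```

The objects returned by such a search are candidates for a resultset: the user reviews them and keeps the ones relevant to the downstream task.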
Clustering and Embedding
When a job is submitted, the low-dimensional representations and coresets are used to cluster the data objects, which enables exploration and curation.
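To make the idea of clustering low-dimensional representations concrete, here is a minimal k-means sketch on 2-D points. The clustering algorithm actually used by the product is not named here, so k-means is an assumption chosen purely for illustration, as are the function name `kmeans` and the sample embeddings.

```python
def kmeans(points, k, iterations=10):
    """Cluster points with plain k-means; returns one cluster label per point."""
    # Deterministic initialisation: use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its closest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Made-up 2-D embeddings forming two well-separated groups.
EMBEDDINGS = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
```

Calling `kmeans(EMBEDDINGS, 2)` assigns the first three points to one cluster and the last three to another, which is the kind of grouping a cluster view presents for exploration.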