Pipelines and Dockers
- Updated on 15 Feb 2023
Data registration occurs through the execution of a data ingestion pipeline, which consists of the following modules.
- Image Preprocessor: Any processing applied to the image/frame before it is fed to the featurizer.
- Featurizer: Featurizes each image/frame, typically using a deep neural network (DNN).
- Thumbnail generator: Generates a compact representation of the image/frame for display in the Data Explorer UI.
- Attribute generator: Generates a CSV file with attributes for each image/frame; these attributes are ingested into the Data Explorer catalog. More than one attribute generator can be attached to a pipeline.
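The flow of a frame through these four modules can be sketched as below. The function names and signatures here are illustrative stand-ins, not the actual interface specification that Data Explorer expects from each module.

```python
# Hypothetical stand-ins for the four pipeline modules; the real
# interface specification is defined by Data Explorer and not shown here.
# A frame is modelled as a 2D list of 8-bit pixel intensities.

def preprocess(frame):
    # e.g. scale 8-bit pixel values to [0, 1] before featurization
    return [[p / 255.0 for p in row] for row in frame]

def featurize(frame):
    # a real featurizer would run a DNN; here we just flatten the frame
    return [p for row in frame for p in row]

def make_thumbnail(frame, step=8):
    # crude downsample standing in for a compact UI representation
    return [row[::step] for row in frame[::step]]

def generate_attributes(frame):
    # attributes end up in a CSV that is ingested into the catalog
    flat = [p for row in frame for p in row]
    return {"mean_intensity": sum(flat) / len(flat)}

def ingest(frame):
    # one frame's trip through the ingestion pipeline
    pre = preprocess(frame)
    return {
        "embedding": featurize(pre),
        "thumbnail": make_thumbnail(frame),
        "attributes": generate_attributes(frame),
    }
```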
Customizing pipelines
Custom pipelines let you build a pipeline from one or more of the above modules, supplied by the user as Docker images that comply with the interface specification expected by each module type. Some use cases enabled by this support are as below:
- Bring your own featurizer (BYOF) that is tuned for your dataset.
- Domain-specific preprocessors that improve the behaviour of the featurizer.
- Custom logic to extract attributes from each image/frame. For example, a lens dirt detection program can be packaged as an attribute generator docker; Data Explorer stores the result of this program as an attribute against each frame/image, available for querying and joining with other internal and external catalog tables.
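The lens dirt example above could look roughly like this: a small script, packaged in a docker image, that emits one CSV row of attributes per frame. The heuristic, the column names, and the output schema are assumptions for illustration, not the actual contract Data Explorer defines for attribute generators.

```python
import csv

def dirt_score(pixels):
    # placeholder heuristic (fraction of dark pixels) standing in
    # for a real lens-dirt detection model
    dark = sum(1 for p in pixels if p < 16)
    return dark / len(pixels)

def write_attributes(frames, out_path):
    # one CSV row per frame; column names are illustrative,
    # not the actual schema Data Explorer expects
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["frame_id", "lens_dirt_score"])
        for frame_id, pixels in frames.items():
            writer.writerow([frame_id, f"{dirt_score(pixels):.3f}"])
```

The CSV this produces would be ingested into the catalog, making `lens_dirt_score` queryable like any other attribute.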
The section below provides a brief overview of the entities involved in defining pipelines, with later articles describing the details.
- Docker repositories: A docker repository hosted on Docker Hub or AWS ECR (Elastic Container Registry) that is owned and managed by your organization. Only users with the OrganizationAdmin role are allowed to create docker repositories.
- Docker images: A docker image that provides preprocessor, featurizer, thumbnail generator, or attribute generator functionality. Data Explorer comes with a few pre-registered docker images out of the box that are sufficient for most general use cases.
- Pipelines: A pipeline is a directed acyclic graph (DAG) of docker images drawn from docker repositories. Data Explorer provides a few pre-registered pipelines for both video and image data types out of the box that are sufficient for most general-purpose visual datasets.
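To make the DAG idea concrete, a pipeline can be modelled as named docker-image references plus edges between them, with acyclicity checked before registration. The structure, image names, and validation logic below are assumptions for illustration, not the actual Data Explorer pipeline definition API.

```python
# Illustrative pipeline: one preprocessor feeding a featurizer,
# a thumbnail generator, and an attribute generator. Repository and
# tag names are made up.
PIPELINE = {
    "nodes": {
        "preprocess": "my-org/image-preprocessor:1.0",
        "featurize": "my-org/byof-featurizer:2.1",
        "thumbnail": "my-org/thumbnailer:1.0",
        "attributes": "my-org/lens-dirt-detector:0.3",
    },
    "edges": [
        ("preprocess", "featurize"),
        ("preprocess", "thumbnail"),
        ("preprocess", "attributes"),
    ],
}

def is_acyclic(pipeline):
    # DFS cycle check over the declared edges; a valid pipeline
    # must be a DAG
    graph = {n: [] for n in pipeline["nodes"]}
    for src, dst in pipeline["edges"]:
        graph[src].append(dst)
    visiting, done = set(), set()

    def visit(n):
        if n in done:
            return True
        if n in visiting:
            return False  # back edge means a cycle
        visiting.add(n)
        ok = all(visit(m) for m in graph[n])
        visiting.discard(n)
        done.add(n)
        return ok

    return all(visit(n) for n in graph)
```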