Pipelines and Dockers

Data registration occurs through the execution of a data ingestion pipeline. An ingestion pipeline consists of the following modules (a schematic sketch of these roles follows the list).

  1. Image Preprocessor: Any processing applied to the image/frame before it is fed to the featurizer.
  2. Featurizer: Featurizes each image/frame, typically using a deep neural network (DNN).
  3. Thumbnail generator: Generates a compact representation of the image/frame that is displayed in the Data Explorer UI.
  4. Attribute generator: Generates a CSV file with attributes for each image/frame; these attributes are ingested into the Data Explorer catalog. More than one attribute generator can be attached to a pipeline.
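
For orientation, the sketch below outlines what each module role does in Python-like terms. The class and method names are illustrative assumptions, not the actual interface specification that each docker image must satisfy; that specification is described in later articles.

```python
# Illustrative sketch of the four module roles in an ingestion pipeline.
# Names and signatures are assumptions for explanation only.
import csv
from pathlib import Path

import numpy as np
from PIL import Image


class ImagePreprocessor:
    """Any processing applied to a frame before featurization."""
    def process(self, frame: np.ndarray) -> np.ndarray:
        # Example: resizing or normalization; here a no-op placeholder.
        return frame


class Featurizer:
    """Produces a feature vector per frame, typically via a DNN."""
    def featurize(self, frame: np.ndarray) -> np.ndarray:
        # Placeholder: a real featurizer would run a DNN forward pass.
        return frame.astype(np.float32).mean(axis=(0, 1))


class ThumbnailGenerator:
    """Compact representation of the frame shown in the Data Explorer UI."""
    def thumbnail(self, frame: np.ndarray, size=(128, 128)) -> Image.Image:
        return Image.fromarray(frame).resize(size)


class AttributeGenerator:
    """Writes per-frame attributes to a CSV for catalog ingestion."""
    def write_attributes(self, rows: list[dict], out_path: Path) -> None:
        with out_path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
```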

Customizing pipelines

Custom pipelines allow you to build a pipeline in which one or more of the above modules are supplied by the user as a docker image that complies with the interface specification expected by that module type. Some use cases this support enables are:

  1. Bring your own featurizer (BYOF) that is tuned for your dataset.
  2. Domain-specific preprocessors that improve the behaviour of the featurizer.
  3. Custom logic to extract attributes from each image/frame. For example, a lens dirt detection program can be packaged as an attribute generator docker image; Data Explorer stores the result of this program as an attribute against each frame/image, available for querying and joining with other internal and external catalog tables. A minimal sketch of such an attribute generator appears after this list.
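
The sketch below shows what a custom attribute generator such as the lens dirt detector might look like. The `/input` and `/output` paths, the CSV layout, and the scoring heuristic are assumptions for illustration; the real input/output contract is defined by the module interface specification covered in later articles.

```python
# Hypothetical attribute generator (lens-dirt detection) that could be
# packaged as a docker image. Paths and CSV columns are illustrative only.
import csv
from pathlib import Path

import numpy as np
from PIL import Image

INPUT_DIR = Path("/input")                    # frames mounted into the container (assumed)
OUTPUT_CSV = Path("/output/attributes.csv")   # CSV ingested into the catalog (assumed)


def lens_dirt_score(frame: np.ndarray) -> float:
    """Placeholder heuristic: fraction of very dark pixels in the frame."""
    gray = frame.mean(axis=-1) if frame.ndim == 3 else frame
    return float((gray < 30).mean())


def main() -> None:
    rows = []
    for path in sorted(INPUT_DIR.glob("*.jpg")):
        frame = np.asarray(Image.open(path))
        rows.append({"frame": path.name, "lens_dirt_score": lens_dirt_score(frame)})
    OUTPUT_CSV.parent.mkdir(parents=True, exist_ok=True)
    with OUTPUT_CSV.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["frame", "lens_dirt_score"])
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    main()
```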

The list below gives a brief overview of the entities involved in defining pipelines; later articles describe the details.

  1. Docker repositories: A docker repository hosted on Docker Hub or AWS ECR (Elastic Container Registry) that is owned and managed by your organization. Only users with the OrganizationAdmin role are allowed to create docker repositories.
  2. Docker images: A docker image that provides preprocessor, featurizer, thumbnail generator, or attribute generator functionality. Data Explorer comes with a few pre-registered docker images out of the box that are sufficient for most general use cases.
  3. Pipelines: A pipeline is a directed acyclic graph (DAG) of docker images drawn from docker repositories. Data Explorer provides a few pre-registered pipelines for both video and image data types out of the box that are sufficient for most general-purpose visual datasets. An illustrative pipeline definition follows this list.
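
To make the DAG idea concrete, the sketch below shows one way a pipeline could be declared as a graph of docker images. The structure, field names, and image names are assumptions for explanation only; the actual pipeline definition format is described in later articles.

```python
# Illustrative declaration of a pipeline as a DAG of docker images.
# All names below are hypothetical.
video_pipeline = {
    "name": "custom-video-pipeline",
    "data_type": "video",
    "nodes": {
        "preprocess": {"image": "my-org/frame-preprocessor:1.0"},
        "featurize":  {"image": "my-org/byof-featurizer:2.1"},
        "thumbnails": {"image": "dataexplorer/thumbnail-generator:latest"},
        "lens_dirt":  {"image": "my-org/lens-dirt-attribute-gen:0.3"},
    },
    # Edges define execution order: preprocessing feeds the featurizer,
    # while thumbnail and attribute generation branch off the preprocessed frames.
    "edges": [
        ("preprocess", "featurize"),
        ("preprocess", "thumbnails"),
        ("preprocess", "lens_dirt"),
    ],
}
```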
