Disk groups and workflow groups
  • 22 Feb 2022


A disk group is a set of disks that together represent a single logical data set. For example, in the automotive domain, the data a vehicle collects over a week may span 10-20 disks and come from 5-10 different sensors. Processing such a data set raises several requirements:

  1. Each sensor's data may have a different layout and therefore need a different workflow to ingest and process it.
  2. There may be inter-sensor dependencies that constrain the order of processing.
  3. The aggregate data set across disks is large, so in practice the processing has to scale across multiple Akrinodes.

Akrimanager addresses this use case with the workflow group construct. A workflow group is a directed acyclic graph (DAG) of workflows that can execute across multiple Akrinode clusters.

The diagram above shows a representative workflow group for a disk group of 3 disks with data collected from 2 different sensors.

  1. Each disk has an independent ingest stage that copies data from the disk to a shared storage location, on-premises or in the cloud.
  2. Each disk has an independent process stage that processes the ingested data, for example by running an object detection inference model on it.
  3. Sensor1 has an additional processing stage that treats all disks containing sensor1 data as one aggregate unit, for example to count the number of detected objects of each type across the entire disk group.
  4. A fused sensor workflow performs processing that needs data from all sensors, for example to identify situations where a camera detected an object but the LIDAR sensor data did not.

The above dependencies are captured as a workflow group specification.
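To make the dependency structure concrete, the example workflow group can be sketched as a plain dependency map in Python. This is an illustration only, not Akrimanager's actual specification format: the workflow names, the `DISK_SENSORS` layout (which disk carries which sensor), and the choice to make the fused stage depend on every per-disk process stage are all assumptions. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical layout (assumption): which sensor's data each disk carries.
DISK_SENSORS = {"disk1": "sensor1", "disk2": "sensor1", "disk3": "sensor2"}


def build_workflow_group(disk_sensors):
    """Return a DAG as {workflow name: set of workflows it depends on}."""
    dag = {}
    for disk in disk_sensors:
        # Stage 1: independent ingest per disk, no dependencies.
        dag[f"ingest:{disk}"] = set()
        # Stage 2: independent processing per disk, after its own ingest.
        dag[f"process:{disk}"] = {f"ingest:{disk}"}

    # Stage 3: aggregate over every disk that carries sensor1 data.
    sensor1_disks = [d for d, s in disk_sensors.items() if s == "sensor1"]
    if sensor1_disks:
        dag["aggregate:sensor1"] = {f"process:{d}" for d in sensor1_disks}

    # Stage 4: fused workflow, assumed here to need every processed disk.
    dag["fused"] = {f"process:{d}" for d in disk_sensors}
    return dag


def execution_waves(dag):
    """Group workflows into waves; workflows in one wave can run in parallel."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(build_workflow_group(DISK_SENSORS))
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

With the assumed layout this yields three waves: all ingest workflows, then all per-disk process workflows, then the sensor1 aggregate and the fused workflow together. Because the map is built from the disk-to-sensor layout, adding a disk or dropping a sensor changes the input rather than the structure of the code.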

  1. The workflow group specification is structured to support any number of disks, the presence or absence of a given sensor's data in a particular disk group, any number of clusters, and so on.
  2. The Akrimanager software provides:
    1. Execution of workflows with maximum parallelism while adhering to the order dependencies specified in the DAG.
    2. Automatic retries for error recovery and completely unattended processing.
    3. Monitoring and statistics (such as the amount of data ingested) in the Akrimanager UI.
    4. Email alerts for the rare situations where manual intervention is required.
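The combination of dependency-ordered parallel execution and automatic retries can be illustrated with a small scheduler sketch. This is not how Akrimanager is implemented; it is a minimal single-machine analogue using threads. The function `run_workflow_group`, its parameters, and the `needs_attention` list (standing in for the point where an alert would be raised) are hypothetical names introduced for illustration.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_workflow_group(dag, run, max_attempts=3, max_workers=4):
    """Run every workflow in `dag` ({name: set of dependencies}).

    `run(name)` is a caller-supplied callable that raises on failure.
    Returns (completed, needs_attention), both lists of workflow names.
    """
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    attempts = {}
    completed, needs_attention = [], []
    in_flight = {}  # future -> workflow name

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Start everything whose dependencies are satisfied.
            for name in sorter.get_ready():
                attempts[name] = 1
                in_flight[pool.submit(run, name)] = name
            if not in_flight:
                # Whatever remains is blocked behind a failed workflow.
                break
            finished, _ = wait(list(in_flight), return_when=FIRST_COMPLETED)
            for future in finished:
                name = in_flight.pop(future)
                if future.exception() is None:
                    completed.append(name)
                    sorter.done(name)  # unblocks dependent workflows
                elif attempts[name] < max_attempts:
                    attempts[name] += 1  # automatic retry
                    in_flight[pool.submit(run, name)] = name
                else:
                    # Retries exhausted: manual intervention is needed.
                    needs_attention.append(name)
    return completed, needs_attention
```

A workflow is submitted as soon as its dependencies finish rather than waiting for a whole stage to complete, which is what keeps parallelism high. A workflow that exhausts its retries is never marked done, so its dependents are not started, and it is reported separately so an operator can be notified.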

A workflow group is a generic construct for capturing a DAG of workflows. It can also be triggered through the Akrimanager UI in situations that do not involve disk data or the physical insertion of disks into JBOD devices.
