- 22 Feb 2022
- 1 Minute to read
- Updated on 22 Feb 2022
The diagram below provides a conceptual representation of a workflow. A workflow is a user-provided specification of a data pipeline. It identifies the input data source and the output data sink, and describes the compute transformations applied to input data streams to produce output metadata and data objects.
A workflow is hierarchically defined using the following elements:
- Container: A container represents a data or catalog store that forms the source and/or sink for a workflow.
- Data Processor: A data processor represents an AkriNode or Soft Edge instance on which the workflow will be executed.
- Filter: A filter captures the processing module that implements a well-defined data transformation operation. The Akridata System lets users reuse existing processing modules: a filter's implementation can be supplied as a Java jar file, a Python script, a Python script wrapping a TensorFlow model, or a Docker image.
- Filter Graph: A filter graph is a directed acyclic graph (DAG) that represents a composition of functionality, with filters as nodes and edges representing the data flow between them.
- Workflow: A workflow is a collection of multiple filter graphs. To allow composability and reuse, filters and filter graphs may be parameterized, and these parameters are bound to concrete values in the workflow specification. A workflow also binds filter graph(s) to concrete input and output containers.
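To make the filter graph idea concrete, the following is a minimal sketch in Python, not the Akridata System's actual implementation or specification format. It assumes a filter graph can be described as a list of (upstream, downstream) filter-name pairs; the function `execution_order` and the filter names are hypothetical. It uses the standard library's `graphlib` (Python 3.9+) to derive an order in which filters could run and to reject a graph that contains a cycle.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(edges):
    """Return filter names in an order that respects the data flow.

    edges: iterable of (upstream, downstream) filter-name pairs.
    Raises ValueError if the edges do not form a DAG.
    """
    sorter = TopologicalSorter()
    for upstream, downstream in edges:
        # The downstream filter depends on the upstream filter's output.
        sorter.add(downstream, upstream)
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"filter graph is not acyclic: {exc.args[1]}") from exc


# Hypothetical filter names, for illustration only.
edges = [
    ("decode", "resize"),
    ("resize", "detect"),
    ("decode", "extract_metadata"),
]
order = execution_order(edges)
print(order)
```

Any order returned places `decode` before `resize` and `resize` before `detect`; the relative position of `extract_metadata` is only constrained to come after `decode`.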
A workflow is a fully bound specification composed from filter graphs, which, in turn, are composed from filters. This layered architecture keeps components modular and reusable.
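The notion of a "fully bound" workflow can be illustrated with a small Python sketch. This is an illustrative model only: the class names, field names, and the `unbound` and `is_fully_bound` helpers are assumptions made for this example and do not reflect the Akridata System's schema. The sketch treats a workflow as fully bound when every parameter declared by its filters and filter graphs has a concrete value.

```python
from dataclasses import dataclass, field


@dataclass
class Filter:
    name: str
    params: tuple = ()  # parameter names this filter expects (hypothetical)


@dataclass
class FilterGraph:
    name: str
    filters: list
    params: tuple = ()  # parameter names declared at graph level


@dataclass
class Workflow:
    name: str
    graphs: list
    input_container: str   # concrete source container
    output_container: str  # concrete sink container
    bindings: dict = field(default_factory=dict)

    def unbound(self):
        """Return the sorted names of parameters that have no concrete value."""
        required = set()
        for graph in self.graphs:
            required.update(graph.params)
            for flt in graph.filters:
                required.update(flt.params)
        return sorted(required - set(self.bindings))

    def is_fully_bound(self):
        return not self.unbound()


# Hypothetical names, for illustration only.
detect = Filter("detect", params=("threshold",))
graph = FilterGraph("vision", filters=[Filter("decode"), detect], params=("fps",))
wf = Workflow("demo", [graph], "raw-store", "catalog-store", bindings={"fps": 10})

missing_before = wf.unbound()      # "threshold" has no value yet
wf.bindings["threshold"] = 0.5
bound_after = wf.is_fully_bound()  # every declared parameter now has a value
```

Because the filter and the filter graph only declare parameter names, the same `detect` filter could be reused in another workflow with a different `threshold` binding.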
Each of the above elements is specified in YAML.