Pipeline operations on a dataset
  • 01 Jun 2023

As described in the Register dataset article, you can attach pipelines to a dataset when the dataset is created. This article describes how to attach pipelines, view and operate on pipeline attachments, and detach pipelines after the dataset has been registered.

Attach pipeline to the dataset

  1. On the dataset listing page, click the 3-dots button on the dataset card to see the list of operations available on the dataset, then click the 'Attach Pipeline' option.


  2. Select the pipeline from the drop-down list of available pipelines. The 'starred' pipelines are the recommended ones.
  3. Select the policy for this attachment. The policy determines the ingestion mode (scheduled vs. triggered) and the compute resources on which ingestion runs.
    1. Schedule policy (BETA): Ingestion is triggered according to the provided schedule on the selected cluster. Currently, only one pre-provisioned cluster, 'AkridataEdgeCluster', is available for selection; this list will be extended with user-registered clusters in the future. The schedule is specified using a cron string.
    2. On-demand policy (BETA): Ingestion is triggered by the user, as needed, on the selected cluster.
    3. Manual adectl run: The user provisions the compute resource for ingestion and triggers ingestion using the adectl command-line utility.
  4. Click on the 'Attach' button.
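
Because the Schedule policy takes a cron string, it can help to sanity-check the string before entering it in the form. The sketch below is only an illustration: it assumes the common five-field cron layout (minute, hour, day of month, month, day of week), which this article does not confirm as the exact dialect accepted by the product, and the function name check_cron is hypothetical.

```python
# Assumption: a five-field cron string (minute hour day-of-month month day-of-week).
# The cron dialect accepted by the product is not specified in this article.
CRON_FIELDS = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day_of_month", 1, 31),
    ("month", 1, 12),
    ("day_of_week", 0, 7),
]


def _valid_part(part: str, lo: int, hi: int) -> bool:
    """Check one comma-separated part: '*', 'N' or 'A-B', optionally followed by '/step'."""
    base, _, step = part.partition("/")
    if "/" in part and (not step.isdecimal() or int(step) == 0):
        return False
    if base == "*":
        return True
    start, sep, end = base.partition("-")
    if not start.isdecimal():
        return False
    if sep and not end.isdecimal():
        return False
    low = int(start)
    high = int(end) if sep else low
    return lo <= low <= high <= hi


def check_cron(expr: str) -> dict:
    """Split a five-field cron string and validate each field's numeric range."""
    fields = expr.split()
    if len(fields) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    parsed = {}
    for value, (name, lo, hi) in zip(fields, CRON_FIELDS):
        if not all(_valid_part(p, lo, hi) for p in value.split(",")):
            raise ValueError(f"invalid {name} field: {value!r}")
        parsed[name] = value
    return parsed
```

Under these assumptions, check_cron("0 2 * * *") accepts a schedule that fires daily at 02:00, while check_cron("61 * * * *") raises ValueError because the minute field is out of range.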

View pipeline attachment details

  1. To view the list of attachments for a dataset, click the dataset name on the dataset card on the dataset listing page.

    This opens the dataset details page, which lists all pipelines attached to the dataset along with the attachment policy and other details.

Operations on pipeline attachments

The dataset details page has the following action buttons for each pipeline attachment. 

  1. Edit attachment: This action allows changing attachment details such as the policy and schedule. An attachment with a 'Manual adectl run' policy cannot be changed to another policy, and vice versa; for such an attachment, the 'Schedule policy' and 'On-demand policy' options are greyed out.
  2. View catalog: This action opens the catalog page with the tables specific to the selected pipeline.
  3. View details: For Schedule and On-demand policy attachments, this action shows details of the last executed ingestion session, whether it was scheduled or triggered by the user. The details section shows the progress percentage (for in-progress sessions) and other details.
  4. Ingest Now: For On-demand policy attachments, this action opens a form in which the user must enter the sub-directory on which ingestion is to be triggered. The sub-directory must be relative to the URI specified in the container specification. Any file that has already been ingested is skipped automatically.
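
To illustrate what "relative to the URI specified in the container specification" means for 'Ingest Now', the sketch below joins a relative sub-directory onto a container URI. It is an assumption-laden illustration, not the product's logic: the function name resolve_subdirectory, the s3 scheme, and the bucket name in the usage note are all hypothetical.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def resolve_subdirectory(container_uri: str, sub_directory: str) -> str:
    """Join a relative sub-directory onto the path of the container URI."""
    # Absolute paths and full URIs are not relative to the container URI.
    if sub_directory.startswith("/") or "://" in sub_directory:
        raise ValueError("sub-directory must be relative to the container URI")
    parts = urlsplit(container_uri)
    base = posixpath.normpath(parts.path or "/")
    joined = posixpath.normpath(posixpath.join(base, sub_directory))
    # Reject values such as '../other' that would leave the container path.
    if joined != base and not joined.startswith(base.rstrip("/") + "/"):
        raise ValueError("sub-directory escapes the container URI")
    return urlunsplit((parts.scheme, parts.netloc, joined, parts.query, parts.fragment))
```

With a hypothetical container URI of s3://example-bucket/raw-data, entering 2023/06/01 in the form would correspond to s3://example-bucket/raw-data/2023/06/01, whereas a value such as /2023/06/01 or ../other is rejected by this sketch.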

Detach pipeline from dataset

  1. On the dataset listing page, click the 3-dots button on the dataset card to see the list of operations available on the dataset, then click the 'Detach Pipeline' option.


  2. Select the pipeline to detach from the drop-down list of attached pipelines.
  3. Click on the 'Detach' button.
Ingested data stays after the 'Detach' operation
Data already ingested by the detached pipeline stays in the system and remains accessible for catalog browsing and job creation. The detached pipeline is not executed on new data. If the same pipeline is re-attached, all data that arrived in the dataset while the pipeline was detached goes through the re-attached pipeline's processing.
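
The detach and re-attach behavior described above can be pictured with a small toy model. This is only a sketch of the described semantics under simplifying assumptions (files identified by name, ingestion run explicitly); the class DatasetModel and its methods are hypothetical and are not part of the product or of adectl.

```python
class DatasetModel:
    """Toy model of the attach/detach semantics described above (not product code)."""

    def __init__(self):
        self.files = []        # every file that has arrived in the dataset, in order
        self.attached = set()  # names of the currently attached pipelines
        self.ingested = {}     # pipeline name -> set of files it has already processed

    def attach(self, pipeline):
        self.attached.add(pipeline)
        # Results from an earlier attachment of the same pipeline are retained.
        self.ingested.setdefault(pipeline, set())

    def detach(self, pipeline):
        # Ingested data is kept; the pipeline simply stops running on new data.
        self.attached.discard(pipeline)

    def add_file(self, name):
        self.files.append(name)

    def ingest(self, pipeline):
        """Process every file the pipeline has not seen yet and return those files."""
        if pipeline not in self.attached:
            return []
        done = self.ingested[pipeline]
        new_files = [f for f in self.files if f not in done]
        done.update(new_files)
        return new_files
```

In this model, a file added while pipeline "p1" is detached is not processed by ingest("p1"); after attach("p1") is called again, the next ingest("p1") returns only that backlog file, and files ingested before the detach are skipped.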
