Introduction
  • 17 Mar 2024

Each dataset has a catalog consisting of one or more tables that store metadata associated with objects in the dataset. The catalog data resides in the following categories of tables:

  • Internal tables - Tables that are auto-created and populated as part of data ingestion through the execution of pipelines.
  • Imported tables - Tables created and populated when the user imports external catalog information by uploading files in a supported format (such as CSV or COCO).
    • SaaS mode: The catalog file must be imported through the web portal UI. The details on the structure of the CSV file are described here.
    • Local mode: The CSV file must be imported using the 'adectl import' command. Catalog import through the web portal UI is not supported. The details on the structure of the CSV file are described here.
  • External tables - Tables that are present in an external catalog database that is registered on the External catalog page. This is available only in SaaS mode. The details on the fields that must be present in external tables are described here.

All catalog-related functionality is accessed by navigating to 'Data -> Repo -> Datasets' and clicking the 'Catalog' button on the card corresponding to the specific dataset.

The screenshot below shows a catalog page. The functionality in each numbered area of the page is described in the list that follows.

  1. Pipeline tables: Each pipeline attached to the dataset creates one table named 'primary', which holds all the files processed by that pipeline. Pipelines that include an attribute generator docker also create one additional table per attribute generator. Rows in all of these tables share the columns file_path, file_id, and frame_idx_in_file, which can be used to join or correlate content across tables.
  2. Dataset tables: This section lists tables created when the catalog is imported through file upload in a supported format (CSV or COCO).
  3. Views: A view is a virtual table created by joining a pipeline table with one or more imported tables or tables in the external catalog registered on the External catalog page.
  4. Dataset level actions: The following actions are available.
    1. Create Table: Create a table as a preparatory step before importing the catalog. Importing CSV or COCO files automatically infers the columns and their types, so this step is required only if the table needs specialized column types or extra columns that do not exist in the file to be imported.
    2. Create View: Initiate the creation of Views.
    3. Queries: This option takes you to the Queries page, which lists all catalog queries that are currently executing or have completed.
    4. Import catalog: This option takes you through the steps described in Quick catalog import, where the required tables are created automatically with auto-inferred column types.
    5. Import Jobs: This option lists all Import catalog jobs that are currently running or have completed.
  5. Query editor: The edit operation adds filtering, sorting, and limit conditions to the catalog query.
    1. For internal tables and views, the limit is applied to the number of frames/images, not the number of rows. For example, if the limit is 500, 500 frames/images are chosen, and all rows corresponding to these 500 frames/images are returned. The result displays both the number of frames and the number of rows.
    2. For imported tables, the limit is applied to the number of rows returned.
  6. Query result actions: After executing a catalog query or fetching the results of a previously executed query, the following actions are available:
    1. Visualize: Create a data exploration (Explore), model analysis (Analyze), or data comparison (Compare) job.
    2. Auto Label: Create a data labeling job for the classes described in a labeling spec.
    3. Global Search: A global search job is a background operation that performs a more exhaustive search over a larger number of points in the dataset. Currently, global search supports up to twice the number of points supported by the 'Visualize' operation.
  7. Column selector: The visible set of columns can be filtered using this option.
  8. Download: This action downloads the results as a CSV file.
  9. Pagination controls: Navigate through the pages of results.
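
The join columns of pipeline tables and the frame-based limit on internal tables and views can be illustrated with a small, self-contained Python sketch. It does not use the product's API or query engine. The sample rows, the 'label' column, and the helper names (frame_key, join_on_frame, limit_by_frames, limit_by_rows) are hypothetical and exist only for illustration. Only the three common column names come from the description above. How the product picks which frames fall within the limit is not stated, so the sketch assumes first-seen order.

```python
from collections import defaultdict

# Common columns shared by pipeline tables (named in the article).
JOIN_KEYS = ("file_path", "file_id", "frame_idx_in_file")


def frame_key(row):
    """Return the tuple that identifies one frame/image."""
    return tuple(row[k] for k in JOIN_KEYS)


def join_on_frame(primary_rows, attribute_rows):
    """Inner-join two tables on the three common columns."""
    by_frame = defaultdict(list)
    for row in attribute_rows:
        by_frame[frame_key(row)].append(row)
    joined = []
    for left in primary_rows:
        for right in by_frame.get(frame_key(left), []):
            joined.append({**left, **right})
    return joined


def limit_by_frames(rows, limit):
    """Keep every row belonging to the first `limit` distinct frames.

    Assumption: frames are chosen in first-seen order.
    Returns the kept rows and the number of frames chosen.
    """
    chosen = set()
    result = []
    for row in rows:
        key = frame_key(row)
        if key not in chosen:
            if len(chosen) >= limit:
                continue  # frame is beyond the limit, skip its rows
            chosen.add(key)
        result.append(row)
    return result, len(chosen)


def limit_by_rows(rows, limit):
    """Row-based limit, as described for imported tables."""
    return rows[:limit]


# Hypothetical 'primary' table: one row per processed frame/image.
primary = [
    {"file_path": "videos/a.mp4", "file_id": 1, "frame_idx_in_file": 0},
    {"file_path": "videos/a.mp4", "file_id": 1, "frame_idx_in_file": 1},
    {"file_path": "images/b.jpg", "file_id": 2, "frame_idx_in_file": 0},
]

# Hypothetical attribute-generator table: several rows per frame are possible.
detections = [
    {"file_path": "videos/a.mp4", "file_id": 1, "frame_idx_in_file": 0, "label": "car"},
    {"file_path": "videos/a.mp4", "file_id": 1, "frame_idx_in_file": 0, "label": "person"},
    {"file_path": "videos/a.mp4", "file_id": 1, "frame_idx_in_file": 1, "label": "car"},
    {"file_path": "images/b.jpg", "file_id": 2, "frame_idx_in_file": 0, "label": "dog"},
]

joined = join_on_frame(primary, detections)
limited_rows, frame_count = limit_by_frames(joined, 2)

print(f"{frame_count} frames, {len(limited_rows)} rows")  # 2 frames, 3 rows
print(len(limit_by_rows(joined, 2)))                      # 2
```

With a limit of 2, the frame-based limit returns three rows because the first frame has two matching rows, while the row-based limit returns exactly two rows. This is why the result of a query on an internal table or view can contain more rows than the limit value.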
