- 10 Dec 2022
- 5 Minutes to read
Each dataset has a catalog consisting of one or more tables that store metadata associated with featurized data. Catalog data can reside in the following categories of tables:
- Internal tables - Tables that are automatically created and populated during data featurization by the 'adectl run' command.
- Imported tables - Tables that are created and populated when the user imports external catalog information from a CSV file.
  - SaaS mode: The CSV file can be imported through the web portal UI or with the 'adectl import' command; importing through the web portal UI is recommended. The details on the structure of the CSV file are described here.
  - Local mode: The CSV file must be imported using the 'adectl import' command; importing the catalog through the web portal UI is not supported. The details on the structure of the CSV file are described here.
- External tables - Tables in an external catalog that is registered on the External catalog page. This category is available only in SaaS mode. The details on the fields that must be present in external tables are described here.
All catalog-related functionality is accessed by navigating to 'Data -> Repo -> Datasets' and clicking the 'Catalog' button on the card corresponding to the specific dataset.
Catalog page overview
The catalog page is shown in the above screenshot. The functionality in each numbered area of the page is described below.
- List of tables in this dataset: These can be internal tables created by running 'adectl run' or imported tables populated from an external CSV file.
- List of views in this dataset: A view is a virtual table created by joining an internal table with one or more imported tables or with tables in the external catalog registered on the External catalog page.
- Actions on catalog table/view query results: Two actions are available.
  - Download the results as a CSV file by clicking the 'Download' button.
  - Create a data curation job by clicking the 'Visualize' button.
- Query editor: Adds filter, sort, and limit conditions to the catalog query.
  - For internal tables and views, the limit applies to frames/images, not to rows. For example, if the limit is 500, then 500 frames/images are chosen and all rows corresponding to those 500 frames/images are returned. The result shows both the number of frames and the number of rows.
  - For imported tables, the limit applies to the number of rows returned.
- Action menu: Provides the following actions:
  - Create Table: Creates a table as a preparatory step for importing a catalog from a CSV file. See Import catalog for more details.
  - Create View: Starts the creation of a view.
  - Queries: Opens the Queries page, which lists all catalog queries that are running or completed.
  - Import Jobs: Lists all catalog import jobs that are running or completed.
- Column selector: Selects which columns are visible.
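A view, as described in the list above, combines an internal table with imported or external tables. The following Python sketch illustrates that idea as an in-memory inner join. It is a hypothetical illustration only: the shared key `file_id`, the column names, and the choice of an inner join are assumptions, not the product's documented behavior.

```python
# Hypothetical sketch: a "view" as an inner join of an internal table with an
# imported table. The join key "file_id" and all column names are assumptions.

def join_view(internal_rows, imported_rows, key="file_id"):
    """Inner-join each internal row with every imported row sharing its key."""
    # Index the imported table by the join key for constant-time lookups.
    index = {}
    for row in imported_rows:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in internal_rows:
        for match in index.get(row[key], []):
            # Merge both rows; imported columns win on name clashes.
            joined.append({**row, **match})
    return joined


internal = [
    {"file_id": 1, "object": "car"},
    {"file_id": 1, "object": "person"},
    {"file_id": 2, "object": "truck"},
]
imported = [
    {"file_id": 1, "weather": "rain"},
    {"file_id": 3, "weather": "fog"},
]

view = join_view(internal, imported)
print(view)
```

Only `file_id` 1 appears in both tables, so the sketch's view contains the two internal rows for that frame, each extended with the imported `weather` column.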
Each table/view has an eye icon that runs the default query, which fetches rows with the limit set to 500 and applies no filter conditions.
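For internal tables and views, the default limit of 500 (and any limit set in the query editor) counts frames rather than rows, while for imported tables it counts rows. The Python sketch below illustrates that difference on in-memory rows. It is not the product's implementation: the `file_id` frame column is an assumption, and this version simply takes the first frames it encounters, whereas how the product chooses frames is not documented.

```python
# Hypothetical sketch, not the product's implementation. Assumes each row
# carries its frame identifier in a "file_id" column.

def limit_by_frames(rows, max_frames, frame_key="file_id"):
    """Pick up to max_frames distinct frames, then return all of their rows."""
    chosen = set()
    for row in rows:
        if len(chosen) == max_frames:
            break
        chosen.add(row[frame_key])
    # Every row of a chosen frame is returned, so rows >= frames.
    result = [row for row in rows if row[frame_key] in chosen]
    return result, len(chosen)


def limit_by_rows(rows, max_rows):
    """Imported tables: the limit simply caps the number of rows."""
    return rows[:max_rows]


rows = [
    {"file_id": 10, "label": "car"},
    {"file_id": 10, "label": "person"},
    {"file_id": 11, "label": "bus"},
    {"file_id": 12, "label": "car"},
    {"file_id": 12, "label": "bike"},
]

result, frame_count = limit_by_frames(rows, max_frames=2)
print(frame_count, len(result))              # 2 3
print(len(limit_by_rows(rows, max_rows=2)))  # 2
```

With the same limit of 2, the frame-based version returns 3 rows (both rows of frame 10 plus the row of frame 11), while the row-based version returns exactly 2 rows.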
To edit the query and filter rows based on specific conditions, click the 'edit query' button highlighted below.
This opens the query editor, shown below.
The query editor has the following options:
- Add Conditions: Allows adding filter conditions that become part of the WHERE clause in the query.
- Order By: Allows specifying one or more sort keys and sort order.
- Limit: The 'Max frames' option provides a drop-down to select the maximum number of frames whose rows should be returned. This limit is capped at the number of frames allowed by the plan limits of the currently active subscription plan.
- Custom conditions: A text box for conditions that cannot be expressed with the commonly used operators such as 'greater than' and 'less than'. The example above shows the custom condition file_id % 8 = 0, which selects every 8th frame. The text box can hold multiple conditions joined by AND/OR operators.
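To make the query editor options more concrete, the Python sketch below shows one way filter conditions, a custom condition, and sort keys could be combined into a SQL-style query string. This is a hypothetical illustration: the table name, column names, and the AND-joining of conditions are assumptions, and the query the product actually generates is not documented here.

```python
# Hypothetical sketch: maps query-editor inputs onto a SQL-style string.
# Table name, column names, and AND-joining of conditions are assumptions.

def build_query(source, conditions=(), custom=None, order_by=()):
    """Return a SQL-like SELECT string.

    conditions: (column, operator, value) tuples, joined with AND.
    custom: free-form condition text such as "file_id % 8 = 0".
    order_by: (column, "ASC" or "DESC") tuples.
    """
    # repr() quoting is for illustration only, not safe SQL escaping.
    clauses = [f"{col} {op} {val!r}" for col, op, val in conditions]
    if custom:
        # Parenthesize custom text so any AND/OR inside it stays grouped.
        clauses.append(f"({custom})")
    sql = f"SELECT * FROM {source}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    if order_by:
        sql += " ORDER BY " + ", ".join(f"{col} {d}" for col, d in order_by)
    # 'Max frames' is deliberately not emitted as a SQL LIMIT: for internal
    # tables and views it counts frames, not rows.
    return sql


sql = build_query(
    "example_table",  # hypothetical table name
    conditions=[("label", "=", "car")],
    custom="file_id % 8 = 0",
    order_by=[("file_id", "ASC")],
)
print(sql)

# The custom condition keeps ids divisible by 8, i.e. every 8th id in a sequence.
print([fid for fid in range(20) if fid % 8 == 0])  # [0, 8, 16]
```

Wrapping the custom text in parentheses is a design choice of this sketch: it keeps an expression like `a OR b` from binding incorrectly with the AND-joined filter conditions.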
The Queries page consists of the following:
- List of all catalog queries executed on the dataset and their status.
- A 'Saved Queries' tab that lists the saved queries and provides actions to rerun these saved queries.
To open this page, click the Action menu and select 'Queries', as shown below.
The Queries page, shown below, lists the following for each query:
- Query String: Shows the query's filter conditions, limit, and order-by conditions. Click the query string to open the catalog page with the same query, where you can edit and rerun it.
- Source Name: The table or view name on which the query was issued.
- Frame Count: Number of image files or video frames that were requested in the query.
- Rows Count: Number of rows returned by the query. If one row for an image file/video frame matches the query conditions, then all rows for that image file/video frame are returned. Hence, the 'Rows Count' may not be the same as the 'Frame Count'.
- Download: Downloads the results as a CSV file. Query results expire based on the total size of all query results; expired results are not available for download.
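Because all rows of a matching frame are returned, a downloaded result typically has more rows than frames. The Python sketch below counts both in a downloaded CSV using the standard `csv` module. The frame-identifier column name `file_id` is an assumption; check the header of your actual download and adjust `frame_key` accordingly.

```python
# Hypothetical sketch: compare Frame Count and Rows Count for a downloaded
# result CSV. The "file_id" column name is an assumption.
import csv
import io


def count_frames_and_rows(csv_file, frame_key="file_id"):
    """Return (distinct frame count, row count) for a CSV with a header row."""
    reader = csv.DictReader(csv_file)
    frames = set()
    rows = 0
    for record in reader:
        rows += 1
        frames.add(record[frame_key])
    return len(frames), rows


# Inline sample standing in for a downloaded file; for a real download use
# open("results.csv", newline="") instead.
sample = io.StringIO(
    "file_id,label\n"
    "7,car\n"
    "7,person\n"
    "9,bus\n"
)
frame_count, rows_count = count_frames_and_rows(sample)
print(frame_count, rows_count)  # 2 3
```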
A query can be saved for later re-execution using the 'Save Query' button shown below.
Enter a name and description in the form provided.
After the query is saved successfully, 'Unnamed' changes to the name you provided.
All saved queries are available under the 'Saved Queries' tab on the 'Queries' page.
Clicking a saved query opens the catalog query page with the saved query's conditions populated.
From here, you can edit the conditions and rerun the query as per your requirements.