- 03 Apr 2025
Overview
- Updated on 03 Apr 2025
A dataset is an entity that holds a subset of the data in a container, along with all associated catalog tables and catalog views (virtual tables). The Dataset page is available under 'Data -> Repo' in the left navigation panel.
The Dataset page has the following information and controls:
Search box: Search for a dataset by name.
Data Type: Image or Video.
Public dataset: When a public dataset is imported as described in Import public dataset, a 'Public' badge is shown on the dataset card.
Filtering: Filter the page by Data Type, Status, and Dataset Type.
Data Type: Filter by image or video data.
Status: A dataset has one of the following statuses:
Activated: A dataset on which new data can be registered and existing data can be accessed and visualized.
Deactivated: A dataset on which no new data can be registered. Existing data can be accessed and visualized.
Dataset Type: Filter datasets based on the following dataset types: Default, Datagen, Manual Upload, Inspection Studio.
For example, when you filter by the Manual Upload dataset type, all datasets of that type appear. You can identify such datasets by the label MU. Similarly, datasets of the Datagen type are labelled DG, and datasets of the Inspection Studio type are labelled IS.
Catalog: Opens the catalog page for the dataset, where you can perform all catalog-related operations, such as querying the catalog, importing an external catalog, and creating views.
Visualize/Create Visualization: This button creates a default visualization job or opens the default visualization job if it already exists. The default visualization job is a quick way to explore the dataset's contents.
A 3-dot 'Actions' button, which provides the following additional actions:
Visualization: Create a default visualization job, or refresh the existing default visualization job after new data has been ingested into the dataset.
Pipeline: Attach or detach pipelines, or execute all pipelines attached to the dataset.
Catalog: Catalog-related operations, such as creating a new table or viewing all catalog import jobs.
Settings: Other operations on the dataset, such as managing permissions or deactivating the dataset.
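The search, filter, status, and label rules described above can be summarized in a small Python sketch. This is only an illustrative model, not the product's API: the class names, the `filter_datasets` helper, and the sample dataset names are all hypothetical. It also assumes that the name search is a case-insensitive substring match and that multiple filters are combined with AND; the page does not state either behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional


class DataType(Enum):
    IMAGE = "Image"
    VIDEO = "Video"


class Status(Enum):
    ACTIVATED = "Activated"      # new data can be registered
    DEACTIVATED = "Deactivated"  # read-only: no new data can be registered


class DatasetType(Enum):
    DEFAULT = "Default"
    DATAGEN = "Datagen"
    MANUAL_UPLOAD = "Manual Upload"
    INSPECTION_STUDIO = "Inspection Studio"


# Card labels named in the text. No label is mentioned for Default.
CARD_LABELS = {
    DatasetType.MANUAL_UPLOAD: "MU",
    DatasetType.DATAGEN: "DG",
    DatasetType.INSPECTION_STUDIO: "IS",
}


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for one dataset card on the page."""
    name: str
    data_type: DataType
    status: Status
    dataset_type: DatasetType

    def can_register_data(self) -> bool:
        # Only activated datasets accept new data; existing data stays
        # accessible in both statuses.
        return self.status is Status.ACTIVATED

    def card_label(self) -> Optional[str]:
        return CARD_LABELS.get(self.dataset_type)


def filter_datasets(
    datasets: Iterable[Dataset],
    name_contains: Optional[str] = None,
    data_type: Optional[DataType] = None,
    status: Optional[Status] = None,
    dataset_type: Optional[DatasetType] = None,
) -> List[Dataset]:
    """Return datasets matching every filter that is set (assumed AND)."""
    # Assumption: name search is a case-insensitive substring match.
    needle = name_contains.lower() if name_contains else None
    return [
        d for d in datasets
        if (needle is None or needle in d.name.lower())
        and (data_type is None or d.data_type is data_type)
        and (status is None or d.status is status)
        and (dataset_type is None or d.dataset_type is dataset_type)
    ]


# Hypothetical sample data for illustration only.
SAMPLE = [
    Dataset("street-scenes", DataType.IMAGE, Status.ACTIVATED,
            DatasetType.MANUAL_UPLOAD),
    Dataset("dashcam-clips", DataType.VIDEO, Status.DEACTIVATED,
            DatasetType.DATAGEN),
    Dataset("weld-inspection", DataType.IMAGE, Status.ACTIVATED,
            DatasetType.INSPECTION_STUDIO),
]
```

For example, `filter_datasets(SAMPLE, dataset_type=DatasetType.MANUAL_UPLOAD)` returns only the dataset whose card would carry the MU label, and `can_register_data()` is false for the deactivated dataset even though its existing data remains viewable.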