External features

For some datasets, you may have a domain-specific featurizer that best captures the features of that dataset. In such cases, specify 'Featurizer Type' as 'External' when registering the dataset. This article describes the format of the CSV (comma-separated values) file used to import these external features. Refer to Register dataset for more details about the External featurizer.

file_path(string),frame_idx_in_file(int),features(float[5])
dir1/1.jpg,0,0,0.1,0.25,0.3,0.4
dir1/2.jpg,0,0,0.1,0.35,0.01,0.23

The first row is the header row with three columns: file_path(string), frame_idx_in_file(int), and features(float[N]), where N is the number of features available per file. In the example above, N is 5.
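As an illustration of the header and row layout, here is a minimal Python sketch that produces a file in this shape using only the standard library. The function name write_features_csv and the in-memory buffer are assumptions made for this example; they are not part of the product.

```python
import csv
import io


def write_features_csv(out, rows, n_features):
    """Write (file_path, frame_idx, features) tuples as header + data rows.

    Hypothetical helper for illustration; `out` is any writable text stream.
    """
    writer = csv.writer(out, lineterminator="\n")
    # Header: three columns, with N embedded in the third column name.
    writer.writerow([
        "file_path(string)",
        "frame_idx_in_file(int)",
        f"features(float[{n_features}])",
    ])
    for file_path, frame_idx, features in rows:
        if len(features) != n_features:
            raise ValueError(f"{file_path}: expected {n_features} features")
        # Each data row carries N + 2 comma-separated values.
        writer.writerow([file_path, frame_idx, *features])


buf = io.StringIO()
write_features_csv(
    buf,
    [
        ("dir1/1.jpg", 0, [0, 0.1, 0.25, 0.3, 0.4]),
        ("dir1/2.jpg", 0, [0, 0.1, 0.35, 0.01, 0.23]),
    ],
    n_features=5,
)
csv_text = buf.getvalue()
```

To write to disk instead, pass a file opened with open(path, "w", newline="") as the out argument.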
The file_path identifies the file for which the features are being imported. It can take either of the following forms.
1. Relative path
  1. A path relative to the URI specified in the container specification, for an S3 bucket, Azure blob store, or Google Cloud Storage bucket. For example, if the container URI is s3://my-bucket and the file is located at s3://my-bucket/vehicle/a.jpg, then the file_path should be specified as /vehicle/a.jpg
  2. For a directory on the local filesystem, a path relative to the data directory configured during 'adectl config'.
2. Absolute path
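The relative forms above can be derived mechanically from a full location. The following Python sketch shows one way to do it; the helper names and the local data directory /data are hypothetical and chosen only for this example.

```python
import posixpath


def relative_to_container(full_uri, container_uri):
    """Strip the container URI prefix, keeping the leading slash.

    Hypothetical helper; mirrors the s3://my-bucket example above.
    """
    prefix = container_uri.rstrip("/")
    if not full_uri.startswith(prefix + "/"):
        raise ValueError(f"{full_uri} is not inside {container_uri}")
    return full_uri[len(prefix):]


def relative_to_data_dir(abs_path, data_dir):
    """Express a local file path relative to the configured data directory."""
    return posixpath.relpath(abs_path, data_dir)


cloud_path = relative_to_container("s3://my-bucket/vehicle/a.jpg", "s3://my-bucket")
# "/data" stands in for whatever directory was configured; it is an assumption.
local_path = relative_to_data_dir("/data/dir1/1.jpg", "/data")
```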
The frame_idx_in_file must be set to 0 for an image dataset. For a video dataset, it must be set to the frame number within the video file.
Each data row must have N+2 comma-separated values.
The features must be normalized between 0 and 1.
There should be only 1 row per file.
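This article does not prescribe how features should be brought into the 0 to 1 range. One common option is per-feature min-max scaling, sketched below in plain Python. Treat it as an assumption about a reasonable approach rather than a required method; the function name is hypothetical.

```python
def min_max_normalize(vectors):
    """Scale each feature column to [0, 1] across all vectors.

    Columns with a single constant value are mapped to 0.0 to avoid
    dividing by zero.
    """
    columns = list(zip(*vectors))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(vec, lows, highs)
        ]
        for vec in vectors
    ]


raw = [[0, 10], [5, 10], [10, 10]]
scaled = min_max_normalize(raw)
```

Because the scaling depends on the minimum and maximum seen across the whole dataset, compute it over all files before writing any row.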
For a video dataset, if 'Sampling rate (fps)' was specified when the dataset was created, then only the features for frames that qualify under the sampling rate are considered. The other rows in the CSV file are ignored.
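Before importing, it can help to check a file against the format rules in this article. The sketch below is a hypothetical pre-import check, not a tool shipped with the product. It assumes the third header column is spelled exactly features(float[N]) and reads N from it, then verifies that each data row has N+2 values, a non-negative integer frame index, and features between 0 and 1.

```python
import csv
import io
import re


def validate_features_csv(text):
    """Return a list of problems; an empty list means no rule was violated."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None or len(header) != 3:
        return ["header must have exactly three columns"]
    match = re.fullmatch(r"features\(float\[(\d+)\]\)", header[2].strip())
    if match is None:
        return ["third header column must look like features(float[N])"]
    n_features = int(match.group(1))

    problems = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != n_features + 2:
            problems.append(f"line {line_no}: expected {n_features + 2} values")
            continue
        if not row[1].strip().isdigit():
            problems.append(f"line {line_no}: frame index is not an integer")
        try:
            features = [float(value) for value in row[2:]]
        except ValueError:
            problems.append(f"line {line_no}: non-numeric feature")
            continue
        if any(not 0.0 <= f <= 1.0 for f in features):
            problems.append(f"line {line_no}: feature outside [0, 1]")
    return problems


sample = (
    "file_path(string),frame_idx_in_file(int),features(float[5])\n"
    "dir1/1.jpg,0,0,0.1,0.25,0.3,0.4\n"
    "dir1/2.jpg,0,0,0.1,0.35,0.01,0.23\n"
)
sample_problems = validate_features_csv(sample)
```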
