Introduction
  • 21 Nov 2024


Datagen (Data Synthesis) is the process of synthesizing new images from concepts marked on representative images. It provides a way to augment your datasets with images that are difficult to collect in the real world.

Applicability

  • It helps create images of rare data types, such as defects on metal surfaces.

  • It helps create datasets for medical imaging, which assists in training machine-learning models for diagnosing and detecting diseases.

  • It enables the simulation of manufacturing and production scenarios for developing and testing AI models.

  • It allows the generation of datasets for training models used in security and surveillance systems.

How does it work?

To illustrate, the following sections describe how Datagen (Data Synthesis) synthesizes concepts on images and generates the required files for a dataset.

Prerequisites

  • Representative images for:

    • Concept - A minimum of 6 images with the concept localized with a bounding box.

    • Generation canvas - A minimum of 6 images on which to mark polygons that represent the area within the image where the concept must be synthesized.

    • Background - A minimum of 6 images (recommended: 50 images) that capture the variety of backgrounds on which the concept must be synthesized.

  • The images can be uploaded through the browser or fetched from an already registered dataset.
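
The minimum image counts above can be checked before starting a session. The following is a minimal Python sketch and not part of the product: the names `DatagenInputs` and `check_prerequisites` are hypothetical, and only the thresholds (a minimum of 6 images per group, 50 recommended backgrounds) come from the prerequisites listed above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MIN_IMAGES = 6                 # minimum per input group, from the prerequisites
RECOMMENDED_BACKGROUNDS = 50   # recommended number of background images


@dataclass
class DatagenInputs:
    """Hypothetical container for the three image groups (file paths)."""
    concept_images: List[str] = field(default_factory=list)
    canvas_images: List[str] = field(default_factory=list)
    background_images: List[str] = field(default_factory=list)


def check_prerequisites(inputs: DatagenInputs) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) describing unmet prerequisites."""
    groups = {
        "concept": inputs.concept_images,
        "generation canvas": inputs.canvas_images,
        "background": inputs.background_images,
    }
    # A group with fewer than the minimum number of images is an error.
    errors = [
        f"{name}: need at least {MIN_IMAGES} images, got {len(images)}"
        for name, images in groups.items()
        if len(images) < MIN_IMAGES
    ]
    # Enough backgrounds to proceed, but fewer than recommended, is a warning.
    warnings = []
    n_bg = len(inputs.background_images)
    if MIN_IMAGES <= n_bg < RECOMMENDED_BACKGROUNDS:
        warnings.append(
            f"background: {n_bg} images; {RECOMMENDED_BACKGROUNDS} are recommended"
        )
    return errors, warnings
```

An empty `errors` list means the inputs meet the stated minimums. A warning does not block the session; it only indicates that more background variety may help.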

Datagen (Data Synthesis) process

The following diagram depicts the overall Datagen flow in Data Explorer.

The process involves the following steps:

  1. Provide inputs with images: Draw a bounding box around the concept, mark the areas on the generation canvas images where the concept should be synthesized, and add the background images on which the concept should be synthesized.

  2. Preview the synthesized image: Build a session to generate and preview the results.

  3. Review the synthesized images: Review the results and provide feedback by marking which synthesized images are good and which are not. Optionally, add a text comment explaining why a particular image is good or not.

  4. Rerun preview: Iterate through one or more preview and review cycles.

  5. Accept and generate: If preview results are acceptable, generate the required number of images.
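
The preview, review, and generate steps above form a loop, which can be sketched as follows. This is an illustrative Python outline only: `run_session` and its `preview`, `review`, and `generate` callables are hypothetical placeholders, not Data Explorer APIs. It also assumes that "acceptable" means every preview image was marked good, whereas in practice acceptance is the reviewer's decision.

```python
from typing import Callable, List, Tuple

# One feedback entry: (image name, marked good?, optional text comment)
Feedback = Tuple[str, bool, str]


def run_session(
    preview: Callable[[List[Feedback]], List[str]],
    review: Callable[[str], Tuple[bool, str]],
    generate: Callable[[int], List[str]],
    num_images: int,
    max_rounds: int = 5,
) -> List[str]:
    """Repeat preview and review until results are accepted, then generate."""
    feedback: List[Feedback] = []
    for _round in range(max_rounds):
        # Step 2 (and step 4 on later rounds): build a preview,
        # passing along the feedback from the previous round.
        previews = preview(feedback)
        # Step 3: mark each image good or not, with an optional comment.
        feedback = [(image, *review(image)) for image in previews]
        # Step 5: assumed acceptance rule, every preview image marked good.
        if feedback and all(good for _image, good, _comment in feedback):
            return generate(num_images)
    # Previews were never accepted within max_rounds; nothing is generated.
    return []
```

The `max_rounds` limit is a design choice of this sketch that keeps the loop from running indefinitely; the source does not state a limit on the number of preview and review cycles.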

You can download the generated files or push the images into a dataset. Once registered under the dataset, all dataset, catalog, and job visualization functionalities are available on the synthesized images.

