---
title: "Introduction"
slug: "datagen-introduction"
updated: 2024-11-21T04:35:06Z
published: 2024-11-21T04:35:06Z
canonical: "docs.akridata.ai/datagen-introduction"
---

# Introduction

Datagen (data synthesis) synthesizes new images from concepts that you mark on representative images. It offers a way to augment your datasets with images that are difficult to collect in the real world.

## Applicability

- It helps create images of rare data types, such as defects on metal surfaces.
- It helps create datasets for medical imaging, which assists in training machine-learning models for diagnosing and detecting diseases.
- It enables the simulation of scenarios in manufacturing and production industries that involve developing and testing AI models.
- It allows generation of datasets for training models used in security and surveillance systems.

## How does it work?

For illustration, this section walks through how Datagen (data synthesis) synthesizes concepts on images and generates the files required for a dataset.

### Prerequisites

- Representative images for each of the following:
  - Concept - A minimum of 6 images with the concept localized by a bounding box.
  - Generation canvas - A minimum of 6 images on which you mark polygons that represent the area within the image where the concept must be synthesized.
  - Background - A minimum of 6 images (recommended: 50 images) that capture the variety of backgrounds on which the concept must be synthesized.
- The images can be uploaded through the browser or fetched from an already registered dataset.
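
Before uploading, it can help to confirm that each image set meets the minimums listed above. The following is a minimal sketch of such a local check. It is not part of Data Explorer; the role names, the `check_prerequisites` and `background_advice` helpers, and the list of image suffixes are assumptions made for illustration.

```python
from pathlib import Path

# Minimum counts taken from the prerequisites above.
# The role names used as keys are illustrative, not product identifiers.
MIN_IMAGES = {"concept": 6, "generation_canvas": 6, "background": 6}
RECOMMENDED_BACKGROUNDS = 50

# Assumption: adjust this set to the image formats you actually use.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def count_images(folder):
    """Count files in a folder whose suffix looks like an image."""
    return sum(1 for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)


def check_prerequisites(counts):
    """Return a list of problems; an empty list means all minimums are met."""
    problems = []
    for role, minimum in MIN_IMAGES.items():
        found = counts.get(role, 0)
        if found < minimum:
            problems.append(f"{role}: found {found}, need at least {minimum}")
    return problems


def background_advice(counts):
    """Return a hint when the background set is below the recommended size."""
    found = counts.get("background", 0)
    if MIN_IMAGES["background"] <= found < RECOMMENDED_BACKGROUNDS:
        return f"background: {found} images meet the minimum; {RECOMMENDED_BACKGROUNDS} are recommended"
    return None
```

For example, `check_prerequisites({"concept": 8, "generation_canvas": 6, "background": 12})` returns an empty list, while `background_advice` on the same counts returns a hint that 50 background images are recommended.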

### Datagen (synthesis) process

The following diagram depicts the overall Datagen flow in Data Explorer.

![Datagen flow in Data Explorer](https://cdn.document360.io/3e9d4528-fbc6-4948-a804-8ee7068e7ac3/Images/Documentation/Datagen-flow%20(1).png)

The process involves the following steps:

1. Provide inputs with images: This involves bounding the concept, marking its location on images, and adding background images on which the concept should be synthesized.
2. Preview the synthesized image: Build a session to generate and preview the results.
3. Review the synthesized images: Review the results and mark which synthesized images are good and which are not. Optionally, add a text comment explaining why a particular image is good or not.
4. Rerun preview: Iterate through one or more preview and review rounds.
5. Accept and generate: If preview results are acceptable, generate the required number of images.
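
The control flow of steps 2 through 5 can be sketched as a simple loop. This is only an illustration of the iteration pattern, not the Data Explorer API: `run_datagen_loop`, its `preview`, `review`, and `generate` callbacks, and the `max_rounds` limit are all hypothetical names introduced here.

```python
def run_datagen_loop(preview, review, generate, num_images, max_rounds=5):
    """Repeat preview and review rounds, then generate once results are accepted.

    All three callbacks are hypothetical stand-ins for actions taken in the UI:
      preview(feedback) -> list of preview images, informed by earlier feedback
      review(images)    -> (accepted, feedback), where feedback marks images
                           as good or not, optionally with a text comment
      generate(n)       -> list of n generated images
    Returns (rounds_used, generated_images). The image list is empty if the
    previews were never accepted within max_rounds.
    """
    feedback = []
    for round_number in range(1, max_rounds + 1):
        images = preview(feedback)           # step 2 (and step 4 on reruns)
        accepted, feedback = review(images)  # step 3
        if accepted:
            return round_number, generate(num_images)  # step 5
    return max_rounds, []
```

The feedback from each review is passed into the next preview, which mirrors the role of the good/not-good marks and comments in the steps above.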

You can download the generated files or push the images into a dataset. Once registered under the dataset, all dataset, catalog, and job visualization functionalities are available on the synthesized images.

A dataset is an entity that specifies a selector on the contents of a container. A dataset can be of Image or Video type. The selector is a glob pattern such as `*.png`.
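
To illustrate how a glob selector narrows down the contents of a container, here is a minimal sketch using Python's standard `fnmatch` module. The `select` helper and the file names are hypothetical, and the exact matching rules that Data Explorer applies (for example, case sensitivity or handling of subfolders) may differ from Python's.

```python
from fnmatch import fnmatchcase


def select(names, pattern):
    """Return the names that match a glob pattern, using case-sensitive matching."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical container contents.
contents = ["defect_001.png", "defect_002.png", "notes.txt", "clip.mp4", "scan.PNG"]

png_files = select(contents, "*.png")
```

With these inputs, `png_files` contains `defect_001.png` and `defect_002.png`. `scan.PNG` is excluded because `fnmatchcase` compares case-sensitively.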
