TDM 9.0 adds integration with AI-based entity generation (currently limited to a non-hierarchical BE). K2view's TDM supports two methods of generating synthetic entities:
The user who creates the task can select either of these methods to generate synthetic entities for the task. The AI-based data generation supports only one LU (one schema).
The diagram below describes the integration between TDM and AI:
The training task creates the training models on the LU schema tables. This is a prerequisite for AI-based data generation, as data generation is based on a selected training model.
The following diagram describes the execution of the AI training task:
The AI-based data generation task generates synthetic entities based on a selected training model. The generated entities are imported into the Test Data Store (Fabric), from which they can be loaded into any target environment.
The following diagram describes the execution of the AI-based data generation task:
The following shared Globals have been added to the AI-based data generation:
Note that by default, the AI interfaces are disabled (inactive).
Click here for more information about Custom Interface.
Click here for more information about installing TDM for AI-driven synthetic data generation.
Add the AI environment to:
A new constTable Actor has been added in TDM 9.4: AIConfigParams. This Actor is located in the Implementation/SharedObjects/Broadway/TDM/TDMImplementorActors/ folder. The table holds the infrastructure configuration parameters for the AI processes, such as the cloud provider (GCP, AWS, or Azure) and the AI training/generation/evaluation images. Open the AIConfigParams Actor and click the Input's description to view a detailed description of the table's fields. Edit the required parameters in this table.
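As a rough, hedged illustration only, the snippet below pictures the kind of key/value content such a configuration table may hold. The parameter names and values shown here are assumptions for illustration and are not the Actor's actual field names; refer to the Input's description in the Actor for the real fields.

```python
# Hypothetical illustration of infrastructure configuration parameters of the kind
# held by the AIConfigParams constTable Actor. All keys and values below are
# assumptions; consult the Actor's Input description for the real field names.
ai_config_params = {
    "cloud_provider": "AWS",                                     # e.g. GCP, AWS or Azure
    "training_image": "<registry>/tdm-ai-training:<tag>",        # AI training image
    "generation_image": "<registry>/tdm-ai-generation:<tag>",    # AI data generation image
    "evaluation_image": "<registry>/tdm-ai-evaluation:<tag>",    # AI evaluation image
}
```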
This is an optional table that enables overriding the default classification of fields as either special parameters or categorical data in the AI training process:
Special parameters are text fields with high cardinality (above the default threshold set in training execution params). For these fields, the data generation produces values that do not directly emerge from the original data. The generated values do not have to be real, but they should appear realistic.
Categorical data is data used for grouping information and has a low cardinality. The synthetic data keeps the source values for these fields. An example of categorical data is gender.
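As a conceptual illustration only (not part of the TDM product), the minimal Python sketch below shows how a field could be classified as a special parameter or as categorical data based on its cardinality. The threshold value and the function itself are assumptions; the real default threshold is set in the training execution params.

```python
# Conceptual sketch only: classify text fields by cardinality.
# The threshold value is an assumption; the actual default threshold is taken
# from the training execution params.
def classify_field(values, cardinality_threshold=50):
    """Return 'special' for high-cardinality text fields, 'categorical' otherwise."""
    distinct_count = len(set(values))
    if distinct_count > cardinality_threshold:
        return "special"      # generated values are realistic-looking but not real
    return "categorical"      # generated values are kept from the source values

print(classify_field(["M", "F", "F", "M"]))                  # categorical (e.g., gender)
print(classify_field([f"note {i}" for i in range(1000)]))    # special (free-text field)
```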
The override_special and override_categorical columns indicate whether to override the default classification of a field as a special parameter or as categorical data. One of these fields must be set to true for each record.
The indicator column indicates how to override the default behavior:
Examples:
Do not define a city field as a special parameter, since the data generation process must produce real city values. Override the default special-parameter classification and set the indicator to false to indicate that the city field must not be treated as a special parameter.
Force the AI to treat the case_note field as a special parameter and generate a realistic-looking dummy value for this field.
The MTable will be populated as follows:
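For orientation only, the two example rows described above could be pictured as in the sketch below. The field_name column used here is an assumption; override_special, override_categorical and indicator are the columns documented in this section.

```python
# Hedged sketch of the two example override rows described above.
# "field_name" is an assumed column name for illustration only.
classification_overrides = [
    # city must NOT be treated as a special parameter (real city values are needed):
    {"field_name": "city",      "override_special": True, "override_categorical": False, "indicator": False},
    # case_note MUST be treated as a special parameter (realistic-looking dummy values):
    {"field_name": "case_note", "override_special": True, "override_categorical": False, "indicator": True},
]
```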
This is an optional table that enables the inclusion or exclusion of tables/fields from the LU schema export to the PG DB used in the AI training process. See the example below:
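The snippet below is only a hedged sketch of how inclusion/exclusion rows for this MTable might look; the LU name and all column names shown (lu_name, table_name, field_name, include) are assumptions for illustration, not the MTable's actual layout.

```python
# Hedged sketch only: a possible shape for inclusion/exclusion rows controlling
# which LU schema tables/fields are exported to the PG DB for AI training.
# The LU name and all column names below are assumptions.
export_overrides = [
    {"lu_name": "Customer", "table_name": "activity",    "field_name": None,        "include": False},  # exclude a whole table
    {"lu_name": "Customer", "table_name": "case_detail", "field_name": "case_note", "include": True},   # include a specific field
]
```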
Creation of the K2system tables:
The TDM deploy flow creates these tables if the CREATE_AI_K2SYSTEM_DB Global is set to true.
The TDM AI task and the AI job populate these created tables:
- Task_executions: This table stores all task executions for all task types.
- Task_execution_stats: A table that is updated during the job execution. It holds informative statistics or metrics that may be useful for later analysis.
- Entity_list: A table with all the entities relevant to an existing training/generation job.
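To make the purpose of these tables concrete, here is a minimal sketch that connects to the AI K2system DB (assumed to be PostgreSQL, accessed via psycopg2) and counts the rows in each table for a quick post-run check. The connection details are placeholders; only the table names come from the list above.

```python
# Minimal sketch, assuming the AI K2system DB is PostgreSQL and psycopg2 is
# installed. Connection details are placeholders; only the table names
# (task_executions, task_execution_stats, entity_list) come from the list above.
import psycopg2

conn = psycopg2.connect(host="<ai-db-host>", dbname="<k2system-db>",
                        user="<user>", password="<password>")
with conn, conn.cursor() as cur:
    for table in ("task_executions", "task_execution_stats", "entity_list"):
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        print(table, cur.fetchone()[0])   # row count per table, as a quick sanity check
conn.close()
```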
If the LU schema is updated, the subsequent training task execution will drop and recreate the schema tables for the updated LU.
The cleanup process for both the AI execution server and the AI DB is manual and runs a dedicated flow. Click here for more information about the AI synthetic data generation cleanup process.