TDM 9.0 adds integration with AI-based entity generation (currently limited to a non-hierarchical BE). K2view's TDM supports two methods of synthetic entity generation:
The user who creates the task can select either of these methods to generate synthetic entities. AI-based data generation supports only one LU (one schema).
The diagram below describes the integration between TDM and AI:
The training task creates the training models on the LU schema tables. It is a prerequisite for AI-based data generation, because data generation is based on a selected training model.
The following diagram describes the execution of the AI training task:
The AI-based data generation task generates synthetic entities based on a selected training model. The generated entities are imported to the Test Data Store (Fabric), from where they can be loaded to any target environment.
The following diagram describes the execution of the AI-based data generation task:
The following shared Globals have been added for AI-based data generation:
Note that by default, the AI interfaces are disabled (inactive).
Click here for more information about Custom Interface.
Click here for more information about installing TDM with AI.
Add the AI environment to:
This optional table enables you to override the default classification of a field as either a special parameter or a categorical field in the AI training process:
Special parameters are text fields with high cardinality (above the default threshold set in the training execution parameters). For these fields, the data generation produces values that are not taken directly from the original data. The generated values do not have to be real; they only need to look realistic.
Categorical data is data with low cardinality that is used for grouping information. The synthetic data keeps the source values for these fields. An example of categorical data is gender.
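The default split between special parameters and categorical fields can be pictured with a minimal sketch. It assumes that cardinality is measured as the number of distinct values and that the threshold is a plain integer; the function name `classify_text_field` and the value `50` are hypothetical and are not taken from the product.

```python
from typing import Iterable

# ASSUMPTION: illustrative value only. The real default threshold is set
# in the training execution parameters.
CARDINALITY_THRESHOLD = 50


def classify_text_field(values: Iterable[str],
                        threshold: int = CARDINALITY_THRESHOLD) -> str:
    """Classify a text field by the number of distinct values it holds.

    Returns "special" when the distinct count is above the threshold,
    otherwise "categorical".
    """
    distinct_count = len(set(values))
    return "special" if distinct_count > threshold else "categorical"


# A gender column has very few distinct values, so it stays categorical.
gender_class = classify_text_field(["F", "M", "F", "M"])

# Free-text notes are almost all distinct, so they become special parameters.
notes_class = classify_text_field([f"note {i}" for i in range(100)])
```

In this sketch a field whose distinct count equals the threshold is still categorical, matching the "above the default threshold" wording.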
The Special and Categorical columns indicate which field type's default behavior you wish to override: the special parameter or the categorical field. One of these columns must be true for each record.
The Indicator column indicates how to override the default behavior:
Examples:
Do not define a city as a special param as the data generation process has to generate real values for a city.
Force the AI to treat the case_note field as a special param and generate a realistic-looking dummy value for this field.
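The two examples above can be sketched as override records applied on top of a default classification. This is an illustration only: the `Override` class, its attribute names, and the boolean meaning of `indicator` (True forces the targeted type, False forbids it) are assumptions, as is reading "one of these columns must be true" as exactly one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Override:
    """Hypothetical in-memory form of one override record."""
    field: str
    special: bool      # the record targets the special-parameter classification
    categorical: bool  # the record targets the categorical classification
    indicator: bool    # ASSUMPTION: True forces the targeted type, False forbids it

    def __post_init__(self) -> None:
        # ASSUMPTION: exactly one of the two type columns is true per record.
        if self.special == self.categorical:
            raise ValueError(
                f"{self.field}: exactly one of special/categorical must be true"
            )


def apply_override(default_class: str, override: Optional[Override]) -> str:
    """Return the field classification after applying an optional override."""
    if override is None:
        return default_class
    targeted = "special" if override.special else "categorical"
    other = "categorical" if targeted == "special" else "special"
    return targeted if override.indicator else other


# city: do not treat as a special param, so real source values are kept.
city_class = apply_override("special", Override("city", True, False, False))

# case_note: force the special-param treatment.
note_class = apply_override("categorical", Override("case_note", True, False, True))
```

Validating each record on construction means a malformed row fails early, before any training run would read it.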
The MTable will be populated as follows:
This optional table enables you to include or exclude tables and fields of the LU schema when it is exported to the PG DB for use in the AI training process. See example:
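The effect of such include/exclude rules can be illustrated with a small sketch. It is not the product's table layout: the rule keys (`table`, `field`, `include`), the sample table names, and the precedence (everything is included by default, and a field-level rule wins over a table-level rule) are all assumptions.

```python
from typing import Dict, List, Optional, Tuple


def filter_schema(schema: Dict[str, List[str]],
                  rules: List[dict]) -> Dict[str, List[str]]:
    """Return the tables and fields that remain after applying the rules.

    A rule with field=None applies to the whole table; a rule with a field
    name applies to that field only and takes precedence.
    """
    table_rules: Dict[str, bool] = {}
    field_rules: Dict[Tuple[str, str], bool] = {}
    for rule in rules:
        field: Optional[str] = rule.get("field")
        if field is None:
            table_rules[rule["table"]] = rule["include"]
        else:
            field_rules[(rule["table"], field)] = rule["include"]

    exported: Dict[str, List[str]] = {}
    for table, fields in schema.items():
        table_included = table_rules.get(table, True)
        kept = [f for f in fields
                if field_rules.get((table, f), table_included)]
        if kept:  # skip tables that end up with no fields
            exported[table] = kept
    return exported


# Hypothetical LU schema and rules, for illustration only.
lu_schema = {"customer": ["id", "name", "ssn"], "audit_log": ["id", "msg"]}
rules = [
    {"table": "audit_log", "field": None, "include": False},
    {"table": "customer", "field": "ssn", "include": False},
]
exported_schema = filter_schema(lu_schema, rules)
```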
Creation of the K2system tables:
The TDM deploy flow creates these tables if the CREATE_AI_K2SYSTEM_DB global is set to true.
The created tables are populated by the TDM AI task and the AI job:
- Task_executions: This table holds all the task executions for all the task types.
- Task_execution_stats: This table is updated during the job execution. It holds statistics and metrics that may be useful for later analysis.
- Entity_list: This table holds all the entities relevant to an existing training or generation job.
If the LU schema is updated, the next training task execution will drop and recreate the schema tables for the updated LU.
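The drop-and-recreate behavior can be sketched as follows. The sketch uses an in-memory SQLite database as a stand-in for the PG DB, detects a schema update by comparing column lists, and types every column as TEXT. All of these are simplifying assumptions, and the function names are hypothetical.

```python
import sqlite3
from typing import List


def existing_columns(conn: sqlite3.Connection, table: str) -> List[str]:
    """Return the column names of a table, or an empty list if it is missing."""
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [row[1] for row in rows]  # index 1 holds the column name


def recreate_if_changed(conn: sqlite3.Connection,
                        table: str, columns: List[str]) -> bool:
    """Drop and recreate the table when its columns differ from the LU schema.

    Returns True when the table was (re)created, False when it was left as is.
    """
    if existing_columns(conn, table) == columns:
        return False
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    column_defs = ", ".join(f'"{name}" TEXT' for name in columns)
    conn.execute(f'CREATE TABLE "{table}" ({column_defs})')
    return True


conn = sqlite3.connect(":memory:")
first_run = recreate_if_changed(conn, "customer", ["id", "name"])
unchanged_run = recreate_if_changed(conn, "customer", ["id", "name"])
updated_run = recreate_if_changed(conn, "customer", ["id", "name", "email"])
```

Dropping the table discards its rows, so in this sketch the data for an updated table has to be exported again before the next training run.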
The cleanup of both the AI execution server and the AI DB is manual and is performed by running a dedicated flow. Click here for more information about the AI cleanup process.