AI-based Generation Implementation

TDM 9.0 adds integration with AI-based entity generation (currently limited to a non-hierarchical BE). K2view's TDM supports two methods of synthetic entity generation:

  • Rule-based generation
  • AI-based generation

The user who creates the task can select either method for generating synthetic entities. AI-based data generation supports only one LU (one schema).

The following diagram describes the integration between TDM and AI:

tdm-ai

Training Task

The training task creates the training models on the LU schema tables. This is a prerequisite for AI-based data generation as data generation is based on a selected training model.

The following diagram describes the execution of the AI training task:

ai training

AI-based Generation Task

The AI-based data generation task generates synthetic entities based on a selected training model. The generated entities are imported to the Test Data Store (Fabric), from where they can be loaded to any target environment.

The following diagram describes the execution of the AI-based generation task:

ai training

Implementation Steps

AI Globals

The following shared Globals have been added for AI-based data generation:

  • AI_DB_INTERFACE - the name of the AI DB interface. The default value is AI_DB.
  • CREATE_AI_K2SYSTEM_DB - this Global indicates whether the TDM deploy flow needs to create the AI k2system tables if they do not exist. The default value is false. Set this Global to true to implement AI-based data generation.
  • AI_ENVIRONMENT - this is the name of the AI dummy environment. The default value is AI.

AI Interfaces

  • AI_DB - this Postgres interface must be active to enable the AI-based generation functionality. The TDM portal does not allow creating AI-based training or generation tasks if this interface is inactive. You can set the same connection details as the TDM DB if you wish to include the AI schemas in the TDM DB.
  • AI_Execution - this interface must be active to enable the AI-based generation functionality. The TDM portal does not allow creating AI-based training or generation tasks if this interface is inactive.
  • AI_DB_External - this custom interface must be active to enable the AI-based generation functionality. It is used to let Fabric interact securely with the Kubernetes server (K8s server). The AI_DB_EXTERNAL custom interface should have the same credentials as the AI_DB interface, and the Data field should be populated with your database name.

Note that by default, the AI interfaces are disabled (inactive).

Click here for more information about Custom Interface.

Click here for more information about installing TDM with AI.
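The interface requirements above can be pictured as a simple pre-check. The following is a minimal sketch only, not the TDM portal's actual logic: the interface names come from the list above, while InterfaceStatus, inactive_ai_interfaces and can_create_ai_task are hypothetical names introduced for illustration. It also assumes that all three interfaces are treated as required.

```python
from dataclasses import dataclass

# Interface names as listed above; everything else here is illustrative.
REQUIRED_AI_INTERFACES = ("AI_DB", "AI_Execution", "AI_DB_External")


@dataclass
class InterfaceStatus:
    name: str
    active: bool = False  # AI interfaces are inactive by default


def inactive_ai_interfaces(statuses):
    """Return the required AI interfaces that are missing or inactive."""
    by_name = {s.name: s for s in statuses}
    return [
        name
        for name in REQUIRED_AI_INTERFACES
        if name not in by_name or not by_name[name].active
    ]


def can_create_ai_task(statuses):
    """Assumption: an AI task is allowed only when every required interface is active."""
    return not inactive_ai_interfaces(statuses)


statuses = [
    InterfaceStatus("AI_DB", active=True),
    InterfaceStatus("AI_Execution", active=True),
    InterfaceStatus("AI_DB_External"),  # left at the inactive default
]
print(inactive_ai_interfaces(statuses))  # ['AI_DB_External']
print(can_create_ai_task(statuses))      # False
```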

AI Environment

Add the AI environment to:

AI MTables

AISpecialAndCategoricalFields

  • This is an optional table that enables overriding the default classification of fields as either special parameters or categorical fields in the AI training process:

    • Special parameters are text fields with high cardinality (above the default threshold set in the training execution parameters). For these fields, the data generation produces values that are not taken directly from the original data. The generated values do not have to be real; they only need to look realistic.

    • Categorical data is used for grouping information and has a low cardinality. The synthetic data keeps the source values for these fields. An example of categorical data is gender.

  • The Special and Categorical columns indicate which classification the record overrides: the special parameter or the categorical field. One of these columns must be true for each record.

  • The Indicator column heading indicates how to override the default behavior:

Examples:

  • Do not define a city as a special param as the data generation process has to generate real values for a city.

  • Force the AI to treat the case_note field as a special param and generate a realistic-looking dummy value for this field.

  • The MTable will be populated as follows:

    special params

Note:
  • Primary and foreign key columns, as well as columns that are not of a string type, cannot be overridden and must not be populated in this table.
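The default classification and its overrides can be sketched as follows. This is an illustrative model only, not the AI training code. The threshold value, the Field class and the row keys are hypothetical, and two points are assumptions: that exactly one of Special/Categorical is true per record, and that the Indicator is a boolean that forces the flagged classification on or off.

```python
from dataclasses import dataclass

# Hypothetical value; the real default is set in the training execution parameters.
CARDINALITY_THRESHOLD = 50


@dataclass
class Field:
    name: str
    is_string: bool
    is_key: bool          # primary or foreign key column
    distinct_values: int  # cardinality observed in the source data


def default_flags(field):
    """Default rule: high-cardinality text is special, low cardinality is categorical."""
    special = (field.is_string and not field.is_key
               and field.distinct_values > CARDINALITY_THRESHOLD)
    categorical = (not field.is_key
                   and field.distinct_values <= CARDINALITY_THRESHOLD)
    return {"special": special, "categorical": categorical}


def apply_override(field, flags, row):
    """Apply one override record to a field's default flags."""
    if field.is_key or not field.is_string:
        raise ValueError(f"{field.name}: key and non-string columns cannot be overridden")
    if row["special"] == row["categorical"]:
        raise ValueError(f"{field.name}: exactly one of special/categorical must be true")
    target = "special" if row["special"] else "categorical"
    other = "categorical" if target == "special" else "special"
    updated = dict(flags)
    updated[target] = row["indicator"]
    if row["indicator"]:
        updated[other] = False  # a field is not treated as both types
    return updated


city = Field("city", is_string=True, is_key=False, distinct_values=900)
case_note = Field("case_note", is_string=True, is_key=False, distinct_values=12)

# The two examples above: city must not be special, case_note must be special.
city_row = {"special": True, "categorical": False, "indicator": False}
note_row = {"special": True, "categorical": False, "indicator": True}

print(apply_override(city, default_flags(city), city_row))
# {'special': False, 'categorical': False}
print(apply_override(case_note, default_flags(case_note), note_row))
# {'special': True, 'categorical': False}
```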

AITableFieldsInclusion

  • This is an optional table that enables the inclusion/exclusion of tables/fields of the LU schema export into the PG DB, to be used in the AI training process. See example:

    special params

K2system Tables

  • Creation of the K2system tables:

    • This is done by the TDM deploy flow if the CREATE_AI_K2SYSTEM_DB Global is set to true.

    • These created tables are populated by the TDM AI task and the AI job:

        - Task_executions: This table holds all the task executions for all the task types.
        - Task_execution_stats: This table is updated during the job execution. It holds statistics and metrics that may be useful for later analysis.
        - Entity_list: This table holds all the entities relevant to an existing training/generation job.
      

k2system_tables
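The "create only if missing, and only when the Global is true" behavior can be sketched as below. This is a rough illustration under stated assumptions, not the TDM deploy flow: it uses an in-memory SQLite database as a stand-in for the Postgres AI DB, the table names are taken from the list above, and the column definitions are hypothetical placeholders rather than the real k2system structure.

```python
import sqlite3

# Table names come from the list above; the columns are hypothetical placeholders.
K2SYSTEM_DDL = {
    "task_executions":
        "CREATE TABLE IF NOT EXISTS task_executions "
        "(id TEXT PRIMARY KEY, task_type TEXT, status TEXT)",
    "task_execution_stats":
        "CREATE TABLE IF NOT EXISTS task_execution_stats "
        "(task_execution_id TEXT, metric TEXT, value TEXT)",
    "entity_list":
        "CREATE TABLE IF NOT EXISTS entity_list "
        "(task_execution_id TEXT, entity_id TEXT)",
}


def deploy_k2system_tables(conn, create_ai_k2system_db=False):
    """Create missing tables only when the flag is true; return the existing table names."""
    if create_ai_k2system_db:
        for ddl in K2SYSTEM_DDL.values():
            conn.execute(ddl)  # IF NOT EXISTS keeps repeated deploys harmless
        conn.commit()
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")  # stand-in for the AI DB
print(deploy_k2system_tables(conn))  # [] because the flag defaults to false
print(deploy_k2system_tables(conn, create_ai_k2system_db=True))
# ['entity_list', 'task_execution_stats', 'task_executions']
```

Running the deploy step a second time with the flag set leaves the existing tables untouched, which mirrors the "in case they do not exist" wording above.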

Overriding Generated Values

  • In some cases, it may be necessary to fix or override some of the AI-generated values. This can be implemented either by defining a post-execution flow that gets the generated entities and updates them, or by adding override logic to the load flows to update the values before they are loaded to the target environment.
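The override step can be thought of as a small transformation applied to each generated entity. The sketch below is plain Python rather than a Fabric flow, and the entity layout, field names and the apply_overrides helper are all hypothetical. It only illustrates the idea of replacing or normalizing selected values before they are loaded.

```python
def apply_overrides(entities, overrides):
    """Return copies of the generated entities with the override rules applied.

    Each rule is either a fixed replacement value or a function of the
    generated value.
    """
    fixed = []
    for entity in entities:
        updated = dict(entity)  # leave the generated entity itself untouched
        for field, rule in overrides.items():
            if field in updated:
                updated[field] = rule(updated[field]) if callable(rule) else rule
        fixed.append(updated)
    return fixed


# Hypothetical generated entities and override rules.
generated = [
    {"customer_id": 1, "city": "new york", "status": "??"},
    {"customer_id": 2, "city": "boston", "status": "??"},
]
overrides = {
    "city": str.title,   # normalize the generated value
    "status": "ACTIVE",  # replace it with a fixed value
}

for entity in apply_overrides(generated, overrides):
    print(entity)
# {'customer_id': 1, 'city': 'New York', 'status': 'ACTIVE'}
# {'customer_id': 2, 'city': 'Boston', 'status': 'ACTIVE'}
```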

LU Implementation

  • Verify that the linked fields in the LU tables have identical data types. This is required to support the MDB export of the LU schema into the TDM DB.
  • Verify that the linked fields are defined as either PKs or unique indexes in the parent LU table in order to support the MDB export of these tables. All the parent LU table's PK/unique index fields must be linked to the child LU table. This is required for creating the FK relation in the PG DB for the exported LU tables.
  • The MDB export does not support multiple populations with different links to parent tables. Each LU table can have only one link to a parent LU table.
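The three checks above can be expressed as a small validation routine. This is a sketch under stated assumptions, not a K2view utility: the Table and Link classes and the schema they describe are hypothetical, and the routine only models the rules as written (identical data types, linked fields forming a full PK or unique index in the parent, and a single parent link per table).

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    columns: dict                                    # column name -> data type
    unique_keys: list = field(default_factory=list)  # PK and unique indexes, as tuples


@dataclass
class Link:
    parent: str
    child: str
    pairs: list  # (parent_column, child_column) pairs


def validate_links(tables, links):
    """Return a list of human-readable violations of the rules above."""
    errors = []
    links_per_child = {}
    for link in links:
        parent, child = tables[link.parent], tables[link.child]
        # Rule 1: linked fields must have identical data types.
        for p_col, c_col in link.pairs:
            if parent.columns[p_col] != child.columns[c_col]:
                errors.append(
                    f"{link.parent}.{p_col} and {link.child}.{c_col} "
                    "have different data types"
                )
        # Rule 2: the linked parent fields must cover a full PK or unique index.
        linked = {p_col for p_col, _ in link.pairs}
        if not any(set(key) == linked for key in parent.unique_keys):
            errors.append(
                f"{link.parent}: linked fields {sorted(linked)} "
                "are not a full PK or unique index"
            )
        links_per_child[link.child] = links_per_child.get(link.child, 0) + 1
    # Rule 3: a table may have only one link to a parent table.
    for child_name, count in links_per_child.items():
        if count > 1:
            errors.append(f"{child_name}: {count} parent links, only one is supported")
    return errors


tables = {
    "customer": Table("customer", {"customer_id": "integer"}, [("customer_id",)]),
    "orders": Table("orders", {"order_id": "integer", "customer_id": "text"}),
}
links = [Link("customer", "orders", [("customer_id", "customer_id")])]

print(validate_links(tables, links))
# ['customer.customer_id and orders.customer_id have different data types']
```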

LU Schema Update

If the LU schema is updated, the next training task execution will drop and recreate the schema tables for the updated LU.

Cleanup Process

Cleanup of both the AI execution server and the AI DB is manual and is performed by running a dedicated flow. Click here for more information about the AI cleanup process.
