Fabric Batch Processes Architecture

Fabric Batch Processes Flow

The following activities are automatically triggered when a new Batch process is executed:

  • A new Batch entry is added in the System DB.

  • A new Job entry is also recorded in the k2system_jobs table with the following parameters (these entries can be inspected with the query sketch after this list):

    • Name = the name of the Batch process.
    • Type = BATCH PROCESS.
  • The WAITING_FOR_JOB status is then assigned to the Batch process.
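These entries can be checked directly in the System DB. As a minimal sketch, assuming cqlsh access like the batchprocess_list example shown later in this article (the keyspace holding the k2system_jobs table may need to be selected or qualified first in your installation):

cassandra@cqlsh> select * from k2system_jobs;

The returned Job row should carry the Name and Type values listed above.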

After this, any available node, or the node whose affinity has been specified in the Batch command, handles the execution of the Job and of the subcommand specified in the Batch command.

Once the corresponding Job begins and is set to the IN_PROCESS stage, the Batch process undergoes the following stages:

  1. NEW
  2. GENERATE_IID
  3. IN_PROGRESS
  4. FAILED/CANCELLED/PAUSED/DONE

The below illustration shows how, once triggered from the command line, an asynchronous batch process is automatically encapsulated into a Job process.

The Job process then launches the batch command, which in turn is executed through its lifecycle phases.
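For reference, an asynchronous batch invocation takes the form recorded in the Command field later in this article (repeated here verbatim); the ASYNC='true' option marks the execution as asynchronous, so the command is encapsulated into a Job as described above:

BATCH AUTODATA_DELTA FROM idsFile USING ('select id from ids  limit 100') FABRIC_COMMAND="sync_instance AUTODATA_DELTA.?" with JOB_AFFINITY='10.21.2.102' ASYNC='true';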

Scheduling Batch Processes

To schedule a Batch process to be executed either at a given time or recurrently, a scheduled Job process must be created. This can be achieved using a user job that contains the batch command and is invoked repeatedly.

In practice, this consists of creating a scheduled Job that calls a Batch process, which in turn creates multiple or scheduled one-time Jobs (each Job being parameterized by the execution settings parsed from the Batch command).

The below illustration describes the following steps:

Step 1

The user defines a scheduled job to run a specific batch command. Fabric assigns a Job process to this batch command.

Step 2

The dedicated job runs the one-time or recurring instances of the batch command. Each Batch process triggers a new (temporary) job dedicated to that specific process, as described in the section above.
The new job runs the batch command.

Step 3

The Jobs table is updated for the next run, and the dedicated job waits for the next instance of the scheduled batch process.
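Because every execution of the batch command adds a new Batch entry to the System DB (see the first section above), the history of scheduled runs can be reviewed in the batchprocess_list table described in the next section, for example:

cassandra@cqlsh:k2batchprocess> select bid, creation_time, status from batchprocess_list;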

Batch Process Table in System DB

All batch-related information is stored in the batchprocess_list table of the k2batchprocess keyspace.

Example

cassandra@cqlsh:k2batchprocess> select * from batchprocess_list;

Field          | Value                                | Description
-------------- | ------------------------------------ | -----------------------------------------------------------
bid            | 35408af6-b26a-4243-bc95-f114335bfa5e | Unique ID generated by the system upon batch execution.
creation_time  | 2020-08-12 12:20:07.894000+0000      | Timestamp of when the Batch process was created.
end_time       | 2020-08-12 12:20:09.176000+0000      | Timestamp of when the Batch process ended.
lut_name       | AUTODATA_DELTA                       | Name of the LUT receiving the instances.
start_time     | 2020-08-12 12:20:07.907000+0000      | Timestamp of when the Batch process started.
status         | Done                                 | Stage of the Batch process.
total_entities | 200                                  | Number of entities processed.

Additional fields appearing in the table:

Command

BATCH AUTODATA_DELTA FROM idsFile USING ('select id from ids  limit 100') FABRIC_COMMAND="sync_instance AUTODATA_DELTA.?" with JOB_AFFINITY='10.21.2.102' ASYNC='true';

In this case, the command describes an asynchronous synchronization of a list of IDs, with affinity set to node 10.21.2.102.

extra_stats

This field shows the slowest-processed entities, along with their IDs, processing times, statuses, and field changes:

{"slowestProcessed":[{"entityId":"4","processTimeMS":572,"status":"COMPLETED","result":"{\"Added\":1,\"Updated\":0,\"Unchanged\":0}"},{"entityId":"5","processTimeMS":573,"status":"COMPLETED","result":"{\"Added\":1,\"Updated\":0,\"Unchanged\":0}"},{"entityId":"47","processTimeMS":645,"status":"COMPLETED","result":"{\"Added\":1,\"Updated\":0,\"Unchanged\":0}"}

Batch Process Execution and Resiliency

When executed asynchronously (ASYNC flag set to true), the Batch process inherits from the Jobs mechanism the ability to transfer the process to a different node when a node is no longer active or no longer responding.

This handover mechanism uses the heartbeat and keepalive parameters defined in the node.id file.

The next handling node picks up the batch process (via its associated job) and resumes its execution from the last recorded stage.

Each Fabric node uses its built-in BatchProcessAPI and Job Manager classes to manage the Batch process through its different lifecycle stages, as shown in the above illustrations.
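The parameter names below are hypothetical and not taken from a real node.id file; they only sketch the idea, so consult the Fabric configuration reference for the actual keys:

# node.id (hypothetical sketch -- key names are illustrative only)
# Heartbeat/keepalive values bound how quickly an unresponsive node's
# Jobs, and therefore its batch processes, are handed over to another node.
heartbeat_interval_ms=5000
keepalive_timeout_ms=15000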

How Does Fabric Handle the Migration Process?

When a migration process is initiated, it is treated as a batch of multiple entity synchronization processes.

The below illustration shows the sequence of actions involved in this process.

Step 1

  • The batch command (or migrate) is executed from a Fabric node. This node (Node 1) assumes the role of Coordinator throughout the process.
  • A job process for this batch command starts.

Step 2

  • The node responsible for the overall execution of the migration process is selected in the Fabric cluster according to the node allocation rules described in the Affinity article.
  • This node (Node 3) is referred to as the Job Owner node.
  • The Job Owner node initiates the migration's statistic collection process.

Step 3

  • The Job Owner node (Node 3) generates a list of iiDs to migrate from the External Source systems. In our example, the iiDs are X1, X2, X3, X4 and X5, referred to as iiDX1, iiDX2, iiDX3, iiDX4, iiDX5.

Step 4

  • Node 3 initiates worker threads for each node involved in the migration process. In our example, all 5 nodes contribute: the Job Owner node (Node 3), the Coordinator node (Node 1), and the non-coordinator nodes (N2, N4, N5).

Step 5

  • Each node syncs the instances that have been allocated to it from the External Source systems.
  • Node 3 collects statistics on each node and on each entity synchronization. The collected information is written to the System DB (e.g., Cassandra).

Step 6

  • Each node writes iiDs into the System DB (e.g., Cassandra).
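Since the migration is handled as a Batch process, its progress and outcome can be tracked in the same batchprocess_list table described above:

cassandra@cqlsh:k2batchprocess> select bid, status, total_entities, start_time, end_time from batchprocess_list;

A status of Done together with the expected total_entities count indicates that all allocated instances were synchronized.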
