Data management systems, such as TDM, often handle sensitive data. To comply with data protection and privacy laws, Fabric provides a masking category of Actors that mask sensitive fields, such as SSNs, credit card numbers, and email addresses, before they are loaded into a target DB.
The masking process consists of generating (manufacturing) a random synthetic value that replaces the real value, and caching both the hashed original value and the masked value in order to keep the referential integrity of the data. Starting from V7.1, Fabric separates the generation (manufacturing) of synthetic data from the hashing and caching capabilities. Broadway provides the following Actors:
Note that if data needs to be masked before it is loaded into Fabric, masking Actors can be used in Broadway population flows.
Another important functionality for systems that need to frequently load data to target DBs is the ability to generate and populate a unique sequence ID: the MaskingSequence and the Sequence Actors generate a unique sequence ID based on the provided input arguments.
Click for more information about the data generation Actors.
Click for more information about TDM.
Common input arguments of masking Actors are:
maskingId - a unique masking identifier, used for generating a target value; populated by a String. To use the same masking Actor in different flows of the same project, use this parameter to refer to the same masking cache. By default, the same masking ID is used across different DCs.
flowName - the name of the flow or Actor to be executed in order to obtain a masked value. This parameter was added to the Masking Actor to enable the execution of the data generation flow or Actor that generates the fake value.
category - this parameter was added in Fabric 6.5.3. It indicates when the masking Actor needs to generate a new value, e.g., when masking sensitive data or replacing the ID (sequence). The following values can be set in the category:
By default, the category is set to enable_masking on all masking Actors except for the MaskingSequence Actor, in which case the default category is set to enable_sequences.
The masking Actor inspects the value of the session-level key that is set in the category:
Note that the TDM implementation sets the enable_masking and enable_sequences session-level keys to either true or false, based on the TDM task's attributes. For example, the MaskingSequence Actor generates a new ID value when the task replaces the sequences of the copied entities; otherwise, the original ID value is returned.
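The gating described in the note above can be sketched in Python as follows. The `session` dictionary, the `masking_sequence` function, and the counter are hypothetical stand-ins, not Fabric APIs. The sketch only covers the case where the session-level key is explicitly set to true or false, as in the TDM implementation.

```python
from itertools import count

# Hypothetical sequence source for the sketch.
_next_id = count(1000)


def masking_sequence(original_id, session, category="enable_sequences"):
    """Return a new ID only when the session-level key named by
    `category` is true; otherwise return the original ID unchanged."""
    if session[category]:
        return next(_next_id)
    return original_id
```

For example, a task that replaces sequences would run with `{"enable_sequences": True}` and receive a new ID, while a task that keeps the original sequences would run with `{"enable_sequences": False}` and get the input value back.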
useEnvironment - indicates whether to separate the masked value per environment. When set to true, it generates a new masked value in each environment. When set to false, the same masked value is used across all environments.
useExecutionId - indicates whether to use the Execution ID during the flow run. The Execution ID is a unique string generated each time the flow is run. When set to true, a new masked value is generated in each execution. When set to false, the same masked value is used across different executions.
useInstanceId - indicates whether to use the instance ID as part of the masking cache. When set to true, the instance ID is added to the masking cache. When set to false, the same masked value is used across entities. Note that from Fabric 7.1 onwards, if this parameter is set to true, Fabric gets the instance ID value from the root_iid key, if it is set. If the root_iid key is not set, Fabric uses the current LUI instance. The root_iid key makes it possible to maintain referential integrity on PII fields across different LUs that logically belong to each other. For example, both the CRM and Billing LUs keep the customer's data, and the customer name needs to be identical in both LUs for a given customer. Setting the root_iid to the customer ID keeps the referential integrity between the CRM and Billing LUs.
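One way to picture how useEnvironment, useExecutionId, and useInstanceId scope a cached value is as optional parts of the cache key. The Python sketch below is an illustration under that assumption; the function names and the key layout are hypothetical and do not describe Fabric's actual cache schema.

```python
def resolve_instance_id(session, current_iid):
    """Prefer the root_iid session key when it is set; otherwise
    fall back to the current instance ID."""
    return session.get("root_iid") or current_iid


def build_cache_key(masking_id, hashed_value, *, environment, execution_id,
                    instance_id, use_environment, use_execution_id,
                    use_instance_id):
    """Compose a cache key; each enabled flag adds one more dimension,
    so the masked value is no longer shared across that dimension."""
    key = [("maskingId", masking_id), ("value", hashed_value)]
    if use_environment:
        key.append(("environment", environment))
    if use_execution_id:
        key.append(("executionId", execution_id))
    if use_instance_id:
        key.append(("instanceId", instance_id))
    return tuple(key)
```

With all three flags set to false, the key depends only on the masking ID and the hashed input, so the same masked value is reused everywhere. In the CRM and Billing example, resolving the instance ID from root_iid gives both LUs the same key for a given customer, and therefore the same masked name.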
hashedInputValue - indicates whether to store the original or the hashed input value. By default, the hashed value is stored. When set to false, hashing is disabled and the original value is stored.
interface - the interface used to cache the masked values. This can be either any SQL DB interface defined in Fabric or the Fabric server memory.
verifyUnique - determines whether different input values can be masked with the same masked value. The uniqueness is checked per original value (masked value) and maskingId. It is also checked per environment when useEnvironment is set to true, and per Execution ID when useExecutionId is set to true. Set this parameter to true if the masked value should be unique, as in the case of masking an SSN. Notes:
TTL - the time, in seconds, to keep the masked values in the cache table. The default value is 86400 seconds (24 hours). If this parameter is set to 0, the masked value is never deleted from the cache table. Note that the TTL is supported only when the k2masking keyspace is created in Cassandra or when the interface parameter is set to IN-MEMORY.
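The TTL semantics above (a default of 86400 seconds, with 0 meaning no expiry) can be sketched with a small Python cache. The `MaskingCache` class and its injectable clock are hypothetical and exist only to make the behavior concrete; they are not part of Fabric.

```python
import time


class MaskingCache:
    """Toy cache that drops masked values once their TTL has elapsed."""

    def __init__(self, ttl_seconds=86400, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock            # injectable so expiry can be tested
        self._entries = {}            # key -> (masked_value, stored_at)

    def put(self, key, masked_value):
        self._entries[key] = (masked_value, self.clock())

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        masked_value, stored_at = entry
        # A TTL of 0 means the entry is never deleted.
        if self.ttl != 0 and self.clock() - stored_at >= self.ttl:
            del self._entries[key]
            return None
        return masked_value
```

Once an entry expires, the next masking request for the same input would generate a fresh value, so a short TTL trades long-term consistency for a smaller cache.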
onEmpty - determines what to do with the input value when it is either an empty string or NULL:
Note: The MaskingSequence Actor has specific arguments. Click here for more information.
The following input arguments are specific to the MaskingSequence Actor:
The following example shows how to mask an Address description and a ZIP Code using two masking Actors in the population flow.
The same masking rule can be implemented in several flows of the same project. For example, if the ZIP Code is populated in several LU tables in Fabric, use the same Actor in each of these flows and specify the same Masking ID.
The purpose of the MaskingSequence Actor is to enable the implementation of a sequence solution when creating Broadway flows that load data into a target DB.
The following example shows how to use a MaskingSequence Actor to generate a new sequence for a Customer ID instead of the original value:
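As a rough Python analogy rather than a Broadway flow, replacing an ID with a cached sequence value can be sketched as below. The `SequenceMasker` class and the starting value are hypothetical and are not the MaskingSequence Actor's API.

```python
from itertools import count


class SequenceMasker:
    """Map each original ID to a new sequence value, remembering the
    mapping so repeated IDs stay consistent."""

    def __init__(self, start=1):
        self._counter = count(start)
        self._cache = {}              # original ID -> new sequence value

    def mask(self, original_id):
        if original_id not in self._cache:
            self._cache[original_id] = next(self._counter)
        return self._cache[original_id]


masker = SequenceMasker(start=5000)
customers = [{"customer_id": 17}, {"customer_id": 42}]
orders = [{"order_id": 1, "customer_id": 17}]

# The customer row and its order row receive the same new Customer ID.
new_customers = [{**c, "customer_id": masker.mask(c["customer_id"])} for c in customers]
new_orders = [{**o, "customer_id": masker.mask(o["customer_id"])} for o in orders]
```

Caching the mapping is what keeps parent and child records linked after the original Customer ID has been replaced.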