Starting from V7.1, Fabric separates the generation of the masked value from the hashing and caching parts. Data generation Actors can be used either to generate synthetic entities (rule-based generation) or to mask sensitive data. Broadway provides various built-in data generation Actors (under the generators category), e.g., RandomString, RandomNumber and Sequence, to generate random synthetic values.
A data generator Actor can be executed directly by a Broadway flow ('as is') to generate new data, invoked by the Masking Actor so that the generated data is cached, or activated by the Catalog masking mechanism.
This Actor generates a random string that matches the input regular expression.
The regex input argument accepts any regular expression.
Examples:
This Actor generates random values according to input distribution settings. The supported distribution types are normal, uniform, weighted and constant (returns one value).
The distribution parameters are set based on the selected distribution type:
Normal (Gaussian) distribution uses mean and stddev (standard deviation) and can be bounded by minimum and maximum values, both inclusive.
Uniform distribution returns a random value between the minimum and maximum values.
Weighted distribution returns a value from the list, based on the value's weight. For example, 30% of the generated customers are based in Miami, 20% in LA and 50% in NY. Weighted distribution uses a 'weights' map, where the keys are the results and the values are positive numbers indicating each entry's weight as a proportion of the whole list.
See example:
Fabric 8.1 has added the option to set the values in the list based on a selected MTable. This option is available for a weighted distribution of string values. Do the following in order to define a weighted distribution based on an MTable:
Constant distribution returns the populated constant value. For example: set the number of generated addresses to 1 address per customer.
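The weighted distribution described above can be illustrated with a short, self-contained Java sketch. It is not the RandomDistribution Actor's implementation: the class name WeightedPickSketch and the method pick are hypothetical, and the sketch only assumes what the text states, namely that each key is returned with a probability equal to its weight divided by the sum of all weights.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration of a weighted pick; not Fabric code.
final class WeightedPickSketch {

    /** Returns one key, chosen with probability weight / (sum of all weights). */
    static String pick(Map<String, Double> weights, Random random) {
        if (weights.isEmpty()) {
            throw new IllegalArgumentException("weights must not be empty");
        }
        double total = 0;
        for (double weight : weights.values()) {
            if (weight <= 0) {
                throw new IllegalArgumentException("weights must be positive");
            }
            total += weight;
        }
        // Walk the entries, subtracting each weight until the draw is used up.
        double draw = random.nextDouble() * total;
        String chosen = null;
        for (Map.Entry<String, Double> entry : weights.entrySet()) {
            chosen = entry.getKey();
            draw -= entry.getValue();
            if (draw < 0) {
                return chosen;
            }
        }
        return chosen; // reached only through floating-point rounding
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("Miami", 30.0);
        weights.put("LA", 20.0);
        weights.put("NY", 50.0);
        Random random = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.println(pick(weights, random));
        }
    }
}
```

Because the weights are treated as proportions, the values 30, 20 and 50 behave the same as 3, 2 and 5 in this sketch.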
This Actor returns a random value from the input collection.
This Actor generates a fake but valid credit card number based on the input value and prefixLength input arguments:
Example:
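As a hedged illustration of what "fake but valid" could mean, the Java sketch below keeps the first prefixLength digits of the input value, randomizes the remaining digits and appends a Luhn check digit. The use of the Luhn checksum, the class FakeCardSketch and its methods are assumptions made for this sketch; the source does not describe how the Actor validates or builds the number.

```java
import java.util.Random;

// Hypothetical illustration; assumes "valid" means "passes the Luhn checksum".
final class FakeCardSketch {

    /** Computes the Luhn check digit for a digit string that has no check digit yet. */
    static int luhnCheckDigit(String payload) {
        int sum = 0;
        boolean doubleIt = true; // the rightmost payload digit is doubled
        for (int i = payload.length() - 1; i >= 0; i--) {
            int digit = payload.charAt(i) - '0';
            if (doubleIt) {
                digit *= 2;
                if (digit > 9) {
                    digit -= 9;
                }
            }
            sum += digit;
            doubleIt = !doubleIt;
        }
        return (10 - sum % 10) % 10;
    }

    /** Returns true when the full number (including its check digit) passes Luhn. */
    static boolean isLuhnValid(String number) {
        String payload = number.substring(0, number.length() - 1);
        int checkDigit = number.charAt(number.length() - 1) - '0';
        return luhnCheckDigit(payload) == checkDigit;
    }

    /** Keeps the first prefixLength digits of value and regenerates the rest. */
    static String generate(String value, int prefixLength, Random random) {
        if (prefixLength < 0 || prefixLength >= value.length()) {
            throw new IllegalArgumentException("prefixLength must leave room for a check digit");
        }
        // Assumes value contains digits only.
        StringBuilder number = new StringBuilder(value.substring(0, prefixLength));
        while (number.length() < value.length() - 1) {
            number.append(random.nextInt(10));
        }
        number.append(luhnCheckDigit(number.toString()));
        return number.toString();
    }

    public static void main(String[] args) {
        String fake = generate("4111111111111111", 6, new Random());
        System.out.println(fake + " valid=" + isLuhnValid(fake));
    }
}
```

In this sketch the generated number has the same length as the input value and shares only its prefix with it.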
This Actor generates a random number in a given range. The precision of the number can be set in the precision input argument. Note that a random decimal number can also be generated using the RandomDistribution Actor.
This Actor generates a random String with a specified length. The String's length is set based on the minLength and maxLength input arguments. Note that a random String can also be generated using the RandomRegexGenerator and RandomDistribution Actors.
This sequence implements a unique sequential number.
Click here for more information about the sequence implementation.
This Actor generates a random UUID.
This Actor was added to support the generation of synthetic data into an LU table. It is a framework for generating random rows given a set of parent rows, a distribution and an inner flow, and it relies on the inner flow to generate the actual row data.
This Actor is invoked by the SourceDbQuery Actor in the LU population flow. The SourceDbQuery Actor checks the ROWS_GENERATOR key:
For every parent row, the RowsGenerator Actor calls the data generation inner flow a random number of times, according to the given distribution.
The following values are passed to the inner flow:
total - the total number of rows for the current parent row.
count - the current iteration within the current parent row, starting at 0.
parent_row - the current parent row.
parent_rows - the remaining parent rows, including the current parent_row. Reading rows from this container means they will not be available to the actor.
There are several options to develop the inner flow:
Example:
A customer has 2 activities. The data generation inner flow needs to generate 3 case records for each activity.
Row by row mode: the data generation inner flow is called 6 times (2*3) to generate the cases for the customer. It generates one case record on each call.
Rows per parent mode: the data generation inner flow is called 2 times (there are 2 parent activities) - each call is set with a different parent activity ID and it generates 3 cases on each call.
Handle all parent rows mode: the data generation inner flow is called once for the customer and generates 6 case records (2*3) for the customer: 3 case records for each parent activity ID.
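The call counts in the example above (6, 2 and 1) can be reproduced with a small Java simulation. This is only a sketch: the class InnerFlowModesSketch and its methods are hypothetical and do not use any Fabric API, the number of cases per activity is fixed at 3 instead of being drawn from a distribution, and the variable names total, count, parent_row and parent_rows are mirrored only to show where each value would come into play.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical simulation of the three inner-flow modes; not Fabric code.
final class InnerFlowModesSketch {
    static final int CASES_PER_ACTIVITY = 3; // fixed here instead of a random draw

    /** Row by row: every inner-flow call produces exactly one row. */
    static int rowByRow(List<String> parents, List<String> out) {
        int calls = 0;
        for (String parentRow : parents) {
            int total = CASES_PER_ACTIVITY;
            for (int count = 0; count < total; count++) {
                calls++;
                out.add(parentRow + ":case" + count);
            }
        }
        return calls;
    }

    /** Rows per parent: one call per parent row produces all of its rows. */
    static int rowsPerParent(List<String> parents, List<String> out) {
        int calls = 0;
        for (String parentRow : parents) {
            calls++;
            for (int i = 0; i < CASES_PER_ACTIVITY; i++) {
                out.add(parentRow + ":case" + i);
            }
        }
        return calls;
    }

    /** Handle all parent rows: a single call drains parent_rows itself. */
    static int allParentRows(List<String> parents, List<String> out) {
        if (parents.isEmpty()) {
            return 0;
        }
        Deque<String> parentRows = new ArrayDeque<>(parents);
        while (!parentRows.isEmpty()) {
            String parentRow = parentRows.poll(); // consumed, so no longer available
            for (int i = 0; i < CASES_PER_ACTIVITY; i++) {
                out.add(parentRow + ":case" + i);
            }
        }
        return 1;
    }

    public static void main(String[] args) {
        List<String> activities = List.of("activity1", "activity2");
        System.out.println("row by row calls: " + rowByRow(activities, new ArrayList<>()));
        System.out.println("rows per parent calls: " + rowsPerParent(activities, new ArrayList<>()));
        System.out.println("all parent rows calls: " + allParentRows(activities, new ArrayList<>()));
    }
}
```

All three modes produce the same 6 case records in this sketch; they differ only in how many times the inner flow is called and in how much work each call does.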
You can define your own Broadway flows or Actors to implement customized data generation logic.
Set the output generated value to be an external variable.
Add an external input named value to the data generator. This is needed since the Masking Actor always sends the input value (i.e. the original value) to the data generator. For example - a Masking Actor gets the original full address as an input value and calls a data generator in order to generate a new masked value based on an input State. The address data generator flow needs to get the value and state as input parameters. The Masking Actor will send both parameters to the data generator.
From Fabric 8.2 onwards, the Catalog masking can send the entire record to the data generator. The record is sent with the original values. This enables data generation where the generated value of one field is determined by other fields within the same record. For example - generating an SSN based on the customer type.
Add an external variable, named record, to the flow in order to get the entire record from the Catalog masking.
Example: The following flow gets the original address record as an input and generates a masked city based on the original state:
The data generator must support generating a random value from a seed and must contain the seed external input parameter.
import com.k2view.broadway.actors.masking.random.AbstractRandomGeneratorActor;
import com.k2view.broadway.actors.masking.random.MaskingRandom;
import com.k2view.broadway.model.Data;
public class customGeneratorTest extends AbstractRandomGeneratorActor { ...
@Override
public Object generate(Data input, MaskingRandom maskingRandom) {
The new Actor must contain the seed as an input. The Masking Actor sends the seed and original values to the data generator Actor.
Use the MaskingRandom methods in the generate method in order to get a consistent value based on the input seed.
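The idea of getting a consistent value from a seed can be shown without any Fabric classes. The Java sketch below uses java.util.Random purely as a stand-in for MaskingRandom, whose methods are not listed here; the class SeededPickSketch, the method pick and the use of a long as the seed type are assumptions made for illustration.

```java
import java.util.List;
import java.util.Random;

// Hypothetical illustration; java.util.Random stands in for MaskingRandom.
final class SeededPickSketch {

    /** Returns the same element for the same seed and the same list. */
    static String pick(List<String> values, long seed) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("values must not be empty");
        }
        Random random = new Random(seed); // all randomness derives from the seed
        return values.get(random.nextInt(values.size()));
    }

    public static void main(String[] args) {
        List<String> firstNames = List.of("Ann", "Ben", "Chloe", "Dan");
        long seed = 12345L;
        // Both calls print the same name because they share the seed.
        System.out.println(pick(firstNames, seed));
        System.out.println(pick(firstNames, seed));
    }
}
```

The important design point is that no unseeded source of randomness is used inside the generator; any such call would break the consistency of the generated value.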
A customized flow can support seed-based data consistency by using the built-in data generator Actors: set the data generator Actor's seed input parameter to be an external variable. The following example flow gets a random first name from a Collection. The RandomFromCollection Actor's seed input parameter must be set as an external variable: