Starting from V7.1, Fabric separates the data generation (manufacturing) of synthetic data from the hashing and caching capabilities. The data generation Actors can be used to either generate synthetic entities or mask sensitive data. Broadway provides various data generation Actors under the generators category to generate a random synthetic value. For example: RandomString, RandomNumber, Sequence…
A data generation Actor can be executed either directly by the Broadway flow ('as is') to generate new data, or by the Masking Actor to cache the generated data.
This Actor generates a random string that matches the input regular expression.
The regex input argument accepts any regular expression.
Examples:
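As a rough illustration outside Fabric, the contract of this Actor is that the returned string fully matches the given pattern. The Python sketch below is not a general regex-driven generator and does not show Fabric's implementation: the pattern and the helper random_matching_string are hypothetical, and the helper is hand-built for that one pattern only.

```python
import random
import re
import string

# Hypothetical pattern: three uppercase letters, a dash, four digits.
PATTERN = r"[A-Z]{3}-\d{4}"


def random_matching_string(rng: random.Random) -> str:
    """Build a random string for PATTERN only (not a generic regex engine)."""
    letters = "".join(rng.choice(string.ascii_uppercase) for _ in range(3))
    digits = "".join(rng.choice(string.digits) for _ in range(4))
    return f"{letters}-{digits}"


value = random_matching_string(random.Random())
# The generated value must match the whole pattern, not just a part of it.
assert re.fullmatch(PATTERN, value) is not None
print(value)
```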
This Actor generates random values according to input distribution settings. The supported distribution types are normal, uniform, weighted and constant (returns one value).
The distribution parameters are set based on the selected distribution type:
Normal (Gaussian) distribution uses mean and stddev (standard deviation), and can be bounded by minimum and maximum values, both inclusive.
Uniform distribution returns a random value between the minimum and maximum values.
Weighted distribution returns a value from the list, based on the value's weight. For example, 30% of the generated customers are based in Miami, 20% in LA and 50% in NY. Weighted distribution uses a 'weights' map, where the keys are the possible results and the values are positive numbers indicating each entry's weight as a proportion of the whole list.
See example:
Fabric 8.1 added the option to set the values in the list based on a selected MTable. This option is available for a weighted distribution of string values. To define a weighted distribution based on an MTable:
Constant distribution returns the populated constant value. For example: set the number of generated addresses to 1 address per customer.
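The four distribution types can be sketched in plain Python as follows. This is only an illustration of the semantics described above, not Fabric's implementation. All function names are hypothetical, and the way the normal distribution is kept inside its bounds (redraw, then clamp as a fallback) is an assumption, since the bounding strategy is not specified here.

```python
import random
from typing import Dict, Optional


def sample_normal(rng: random.Random, mean: float, stddev: float,
                  minimum: Optional[float] = None,
                  maximum: Optional[float] = None,
                  max_tries: int = 1000) -> float:
    """Gaussian draw, redrawn until it falls inside [minimum, maximum]."""
    x = rng.gauss(mean, stddev)
    for _ in range(max_tries):
        if (minimum is None or x >= minimum) and (maximum is None or x <= maximum):
            return x
        x = rng.gauss(mean, stddev)
    # Fallback (assumption): clamp the last draw so the bounds always hold.
    if minimum is not None:
        x = max(x, minimum)
    if maximum is not None:
        x = min(x, maximum)
    return x


def sample_uniform(rng: random.Random, minimum: float, maximum: float) -> float:
    """Random value between minimum and maximum."""
    return rng.uniform(minimum, maximum)


def sample_weighted(rng: random.Random, weights: Dict[str, float]) -> str:
    """Pick a key with probability proportional to its weight."""
    if any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be positive numbers")
    keys = list(weights)
    return rng.choices(keys, weights=[weights[k] for k in keys], k=1)[0]


def sample_constant(value):
    """Constant distribution: always the same value."""
    return value


rng = random.Random()
city = sample_weighted(rng, {"Miami": 30, "LA": 20, "NY": 50})
age = sample_normal(rng, mean=40, stddev=12, minimum=18, maximum=90)  # hypothetical field
addresses_per_customer = sample_constant(1)
print(city, round(age, 1), addresses_per_customer)
```

Because the weights are treated as proportions of their sum, {"Miami": 30, "LA": 20, "NY": 50} and {"Miami": 3, "LA": 2, "NY": 5} describe the same distribution in this sketch.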
This Actor returns a random value from the input collection.
This Actor generates a fake but valid credit card number based on the input value and prefixLength input arguments:
Example:
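A "fake but valid" card number is commonly one that passes the Luhn checksum. The Python sketch below illustrates that idea under stated assumptions: it assumes that the first prefixLength digits of the input value are kept, the remaining digits are randomized, and the last digit is a Luhn check digit. The function names are hypothetical and the Actor's exact behavior may differ.

```python
import random


def luhn_check_digit(partial: str) -> int:
    """Digit that, appended to `partial`, makes the number pass the Luhn test."""
    total = 0
    # Right to left: the digit next to the future check digit is doubled first.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def luhn_valid(number: str) -> bool:
    """Standard Luhn test over the full number, check digit included."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def fake_card_number(value: str, prefix_length: int, rng: random.Random) -> str:
    """Keep `prefix_length` leading digits of `value`, randomize the rest."""
    if not value.isdigit() or not 0 <= prefix_length < len(value):
        raise ValueError("value must be digits and longer than the prefix")
    prefix = value[:prefix_length]
    body_length = len(value) - prefix_length - 1  # last position is the check digit
    body = "".join(str(rng.randrange(10)) for _ in range(body_length))
    partial = prefix + body
    return partial + str(luhn_check_digit(partial))


card = fake_card_number("4111111111111111", 6, random.Random())
assert card.startswith("411111") and luhn_valid(card)
print(card)
```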
This Actor generates a random number in a given range. The precision of the number can be set in the precision input argument. Note that a random decimal number can also be generated using the RandomDistribution Actor.
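A minimal Python sketch of a ranged random number with a precision setting is shown below. It assumes that precision means the number of digits kept after the decimal point. The function and parameter names are hypothetical and are not the Actor's actual argument names, apart from precision.

```python
import random


def random_number(rng: random.Random, minimum: float, maximum: float,
                  precision: int = 0) -> float:
    """Random value in [minimum, maximum], rounded to `precision` decimals."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return round(rng.uniform(minimum, maximum), precision)


rng = random.Random()
print(random_number(rng, 10, 20))      # whole number, returned as a float
print(random_number(rng, 10, 20, 2))   # two decimal places
```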
This Actor generates a random String with a specified length. The String's length is set based on the minLength and maxLength input arguments. Note that a random String can also be generated using the RandomRegexGenerator and RandomDistribution Actors.
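The length rule can be sketched in Python as follows. The alphabet used here (ASCII letters and digits) is an assumption, because the characters the Actor draws from are not stated above, and the function name is hypothetical.

```python
import random
import string

# Assumed alphabet; the Actor's actual character set may differ.
ALPHABET = string.ascii_letters + string.digits


def random_string(rng: random.Random, min_length: int, max_length: int) -> str:
    """Random string whose length is between min_length and max_length, inclusive."""
    if not 0 <= min_length <= max_length:
        raise ValueError("expected 0 <= min_length <= max_length")
    length = rng.randint(min_length, max_length)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


print(random_string(random.Random(), 5, 10))
```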
This Actor generates a unique sequential number.
Click here for more information about the sequence implementation.
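For intuition only, a unique sequential number within a single process can be sketched in Python as below. The class is hypothetical and in-memory. It says nothing about how Fabric stores or coordinates its sequences, which is covered by the sequence implementation documentation.

```python
import itertools
import threading


class InProcessSequence:
    """Thread-safe counter that never hands out the same number twice."""

    def __init__(self, start: int = 1, step: int = 1) -> None:
        self._counter = itertools.count(start, step)
        self._lock = threading.Lock()

    def next_value(self) -> int:
        # The lock keeps concurrent callers from receiving the same value.
        with self._lock:
            return next(self._counter)


customer_id = InProcessSequence()
print(customer_id.next_value(), customer_id.next_value(), customer_id.next_value())
```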
This Actor generates a random UUID.
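The equivalent operation in Python's standard library is shown below for comparison. It assumes a version 4 (random) UUID, which is an assumption about the Actor's output rather than a documented fact. The function name is hypothetical.

```python
import uuid


def random_uuid() -> str:
    """Random (version 4) UUID in its canonical 36-character text form."""
    return str(uuid.uuid4())


print(random_uuid())
```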
This Actor supports the generation of synthetic data into an LU table. It is a framework for generating random rows given a set of parent rows, a distribution and an inner flow, and it relies on the inner flow to generate the actual row data.
This Actor is invoked by the SourceDbQuery Actor in the LU population flow. The SourceDbQuery Actor checks the ROWS_GENERATOR key:
For every parent row, the RowsGenerator Actor calls the data generation inner flow a random number of times, according to the given distribution.
The following values are passed to the inner flow:
total - the total number of rows for the current parent row.
count - the current iteration within the current parent row, starting at 0.
parent_row - the current parent row.
parent_rows - the remaining parent rows, including the current parent_row. Rows that the inner flow reads from this container are consumed and are no longer available to the Actor.
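The calling pattern described above can be sketched in Python as follows. This is a simplified model, not the Actor's code: rows_generator, draw_total and make_case are hypothetical names, the inner flow is modeled as a function that returns one row per call, and parent_rows is passed as a read-only snapshot, so the consumption behavior of the real container is not modeled.

```python
import random
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def rows_generator(parent_rows: Iterable[Row],
                   draw_total: Callable[[], int],
                   inner_flow: Callable[..., Row]) -> List[Row]:
    """Call inner_flow a distribution-driven number of times per parent row."""
    remaining = list(parent_rows)
    generated: List[Row] = []
    while remaining:
        parent_row = remaining[0]
        total = draw_total()  # number of rows for this parent
        for count in range(total):  # count starts at 0
            generated.append(inner_flow(total=total, count=count,
                                        parent_row=parent_row,
                                        parent_rows=tuple(remaining)))
        remaining.pop(0)  # this parent is done
    return generated


def make_case(total: int, count: int, parent_row: Row, parent_rows: tuple) -> Row:
    """Hypothetical inner flow: one case row per call."""
    return {"activity_id": parent_row["activity_id"],
            "case_no": count,
            "cases_for_activity": total}


rng = random.Random()
activities = [{"activity_id": 1}, {"activity_id": 2}]
# Uniform distribution between 1 and 3 cases per activity (hypothetical setting).
for case in rows_generator(activities, lambda: rng.randint(1, 3), make_case):
    print(case)
```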
There are several options to develop the inner flow:
Example:
A customer has 2 activities. The data generation inner flow needs to generate 3 case records for each activity.
Row by row mode: the data generation inner flow is called 6 times (2*3) to generate the cases for the customer. It generates one case record on each call.
Rows per parent mode: the data generation inner flow is called 2 times (there are 2 parent activities) - each call is set with a different parent activity ID and it generates 3 cases on each call.
Handle all parents rows mode: the data generation inner flow is called once for the customer and generates 6 case records (2*3) for the customer: 3 case records for each parent activity ID.
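The arithmetic of the three modes can be checked with a small Python simulation. It only counts inner flow calls and generated rows for the example above. It does not model how a flow selects a mode in Fabric, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

Case = Dict[str, int]


def new_case(activity_id: int, case_no: int) -> Case:
    return {"activity_id": activity_id, "case_no": case_no}


def row_by_row(activity_ids: List[int], per_activity: int) -> Tuple[int, List[Case]]:
    """One call per generated row."""
    calls, rows = 0, []
    for activity_id in activity_ids:
        for case_no in range(per_activity):
            calls += 1
            rows.append(new_case(activity_id, case_no))
    return calls, rows


def rows_per_parent(activity_ids: List[int], per_activity: int) -> Tuple[int, List[Case]]:
    """One call per parent row; each call generates all rows of that parent."""
    calls, rows = 0, []
    for activity_id in activity_ids:
        calls += 1
        rows.extend(new_case(activity_id, n) for n in range(per_activity))
    return calls, rows


def all_parent_rows(activity_ids: List[int], per_activity: int) -> Tuple[int, List[Case]]:
    """A single call generates the rows of every parent."""
    rows = [new_case(a, n) for a in activity_ids for n in range(per_activity)]
    return 1, rows


activities = [101, 102]  # the customer's 2 activities (hypothetical IDs)
for mode in (row_by_row, rows_per_parent, all_parent_rows):
    calls, rows = mode(activities, 3)
    print(f"{mode.__name__}: {calls} call(s), {len(rows)} case records")
```

All three modes produce the same 6 case records. They differ only in how many times the inner flow is invoked.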