Catalog Masking Mechanism

The purpose of the Catalog Masking mechanism is to perform masking based on the Catalog's Classifications. This mechanism significantly simplifies the masking implementation, since it lets you base the masking logic on the Discovery results in the Catalog rather than define it manually in each LU table population.

To apply the Catalog Masking mechanism to either a flow or a population, start by running the Discovery job and building the Catalog artifact. Then, create an LU and add either the CatalogMaskingMapper Actor or the CatalogMaskingRecord Actor to the LU populations.

Catalog Masking Actors

The Catalog Masking mechanism introduces the following three actors:

  • CatalogMaskingMapper
  • CatalogMaskingRecord
  • CatalogMaskingField

These actors mask values based on the Catalog's Classifications and the masking rules definition. The object (or record) to be masked is identified in the Catalog by three input parameters shared by all three actors: dataPlatform, schema and dataset.

The Catalog-based masking logic resides in the CatalogMaskingField Actor, while the CatalogMaskingMapper and CatalogMaskingRecord Actors serve as wrappers on the dataset level and the record level, respectively.

The CatalogMaskingMapper Actor receives a dataset object. It then iterates internally over each record, invoking the CatalogMaskingRecord Actor for it. The CatalogMaskingMapper Actor returns a dataset with the same structure as the one it received.

The CatalogMaskingRecord Actor receives a record, internally splits it into key-value pairs and invokes the CatalogMaskingField Actor for each pair. The CatalogMaskingRecord Actor returns an object with the same structure as the one it received.
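The division of work between the three actors can be sketched in Python as nested wrappers. This is a minimal illustration, not the actors' implementation: the functions mask_dataset, mask_record and toy_mask_field, the field-masker signature, and the sample platform, schema and dataset names are all hypothetical, and a dataset is assumed to be a list of dict records.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
# Hypothetical stand-in for the CatalogMaskingField Actor:
# (dataPlatform, schema, dataset, field name, value) -> masked value.
FieldMasker = Callable[[str, str, str, str, Any], Any]


def mask_record(record: Record, data_platform: str, schema: str, dataset: str,
                mask_field: FieldMasker) -> Record:
    """Record-level wrapper: split the record into key-value pairs, mask each value."""
    # Building a new dict with the same keys preserves the record's structure.
    return {key: mask_field(data_platform, schema, dataset, key, value)
            for key, value in record.items()}


def mask_dataset(records: List[Record], data_platform: str, schema: str,
                 dataset: str, mask_field: FieldMasker) -> List[Record]:
    """Dataset-level wrapper: iterate over the records and delegate each one."""
    return [mask_record(r, data_platform, schema, dataset, mask_field)
            for r in records]


def toy_mask_field(data_platform: str, schema: str, dataset: str,
                   field: str, value: Any) -> Any:
    # Toy field masker for illustration only: hides any field named "ssn".
    return "***" if field == "ssn" else value


rows = [{"id": 1, "ssn": "123-45-6789"}, {"id": 2, "ssn": "987-65-4321"}]
print(mask_dataset(rows, "crm_db", "public", "customers", toy_mask_field))
# prints [{'id': 1, 'ssn': '***'}, {'id': 2, 'ssn': '***'}]
```

Because each wrapper only delegates, all Catalog-dependent decisions stay in the field-level function, mirroring the split between the three actors.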

The CatalogMaskingField Actor masks a single field's value based on the Catalog's Classification and the masking rules definition.

  • The actor starts by checking whether the field should be masked. The check is based on the field's PII and Masking columns in the Catalog artifact (catalog_field_info MTable). See the Catalog artifact documentation for more information.
    • If PII is true and the Masking property is not set to OFF, the field's value is masked. (The Masking property is described later in this article.)
  • Then, the actor retrieves the field's Classification from the catalog_field_info MTable and looks up the Generator for this Classification in the catalog_classification_generators MTable. The Generator can be a built-in actor (for example, RandomSSN or RandomZipCode), a custom actor or a flow.
  • Finally, the actor internally invokes the Masking Actor, setting its parameters as follows:
    • The maskingId is set to the Classification.
    • The flowName is set to the Generator defined in the catalog_classification_generators MTable for this Classification.
    • If the Generator has parameters, their values are also taken from the catalog_classification_generators MTable.
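The decision sequence of the CatalogMaskingField Actor can be sketched as follows, with the two MTables modeled as in-memory dictionaries. The FieldInfo and GeneratorInfo classes, the mask_field function, the masking_actor callback and the "SSN" Classification are hypothetical stand-ins for the MTables and the Masking Actor. Returning the value unchanged when the field or its Generator is missing from the Catalog is an assumption, since this case is not described here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Tuple

# Catalog location of a field: (dataPlatform, schema, dataset, field name).
FieldKey = Tuple[str, str, str, str]


@dataclass
class FieldInfo:
    """Hypothetical row of catalog_field_info for one field."""
    classification: str
    pii: bool
    masking: Optional[str] = None  # Masking property, e.g. "OFF"; None if not set


@dataclass
class GeneratorInfo:
    """Hypothetical row of catalog_classification_generators for one Classification."""
    generator: str  # built-in actor, custom actor, or flow name
    params: Dict[str, Any] = field(default_factory=dict)


# Hypothetical stand-in for the Masking Actor:
# (maskingId, flowName, params, value) -> masked value.
MaskingActor = Callable[[str, str, Dict[str, Any], Any], Any]


def mask_field(key: FieldKey, value: Any,
               field_info: Dict[FieldKey, FieldInfo],
               generators: Dict[str, GeneratorInfo],
               masking_actor: MaskingActor) -> Any:
    info = field_info.get(key)
    # Step 1: mask only if PII is true and the Masking property is not OFF.
    # (Assumption: a field missing from the Catalog is left unmasked.)
    if info is None or not info.pii or info.masking == "OFF":
        return value
    # Step 2: look up the Generator defined for the field's Classification.
    gen = generators.get(info.classification)
    if gen is None:
        return value  # assumption: no Generator defined -> value left unchanged
    # Step 3: invoke the Masking Actor with maskingId = Classification,
    # flowName = Generator, plus the Generator's parameters.
    return masking_actor(info.classification, gen.generator, gen.params, value)


def toy_masking_actor(masking_id: str, flow_name: str,
                      params: Dict[str, Any], value: Any) -> str:
    # Toy actor for illustration: reports which Generator would have been used.
    return f"<{flow_name}:{masking_id}>"


catalog = {("crm_db", "public", "customers", "ssn"): FieldInfo("SSN", pii=True)}
gens = {"SSN": GeneratorInfo("RandomSSN")}
print(mask_field(("crm_db", "public", "customers", "ssn"), "123-45-6789",
                 catalog, gens, toy_masking_actor))  # prints <RandomSSN:SSN>
```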

The Masking Property

While the Classification and PII properties are added to the Catalog nodes by the Classifier plugins, the Masking property is added manually when you need to control how specific fields are masked.

Adding the Masking property to a field overrides the Classification-level definitions: it indicates that the field requires special handling by the Catalog Masking mechanism. The Masking property can take one of the following values:

  • Consistent with table - the Catalog Masking Actors produce a consistent value across the Catalog (that is, the same input always returns the same masked value).
  • Consistent with seed - the Catalog Masking Actors produce a consistent value using a seed.
  • Consistent & Unique - the Catalog Masking Actors produce a consistent yet unique value across the Catalog.
  • Random - the Catalog Masking Actors produce a random value that is neither consistent nor unique.
  • OFF - the Catalog Masking mechanism does not mask the field. This value is useful when custom masking logic is required; in such a case, it is the implementor's responsibility to add the custom masking logic to the relevant LU population.
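To illustrate how the Masking property values differ in observable behavior, the sketch below implements each mode around a toy generator. The MaskingModes class, the fake_id generator and the secret are hypothetical. Reading "with table" as storing each input-to-masked-value mapping, and "with seed" as deriving the generator's seed from a hash of the input, are assumptions made for illustration, not the product's documented internals.

```python
import hashlib
import random
from typing import Callable, Dict, Set

# Hypothetical generator type: draws a masked value from a random source.
Generator = Callable[[random.Random], str]


def fake_id(rng: random.Random) -> str:
    """Hypothetical generator: a fake 9-digit ID drawn from the given random source."""
    return f"{rng.randrange(10**9):09d}"


class MaskingModes:
    """Illustrative behavior of the Masking property values (not the product code)."""

    def __init__(self, generator: Generator, secret: str = "demo-secret") -> None:
        self.generator = generator
        self.secret = secret                  # hypothetical secret mixed into the seed
        self.rng = random.Random()            # unseeded source for non-seeded modes
        self.table: Dict[str, str] = {}       # input -> masked value ("with table")
        self.unique_table: Dict[str, str] = {}
        self.used: Set[str] = set()           # masked values already handed out

    def consistent_with_table(self, value: str) -> str:
        # Generate once, store the mapping, and reuse it for the same input.
        if value not in self.table:
            self.table[value] = self.generator(self.rng)
        return self.table[value]

    def consistent_with_seed(self, value: str) -> str:
        # Derive a seed from the input, so the same input always yields the same
        # output without storing any mapping.
        seed = hashlib.sha256((self.secret + value).encode("utf-8")).hexdigest()
        return self.generator(random.Random(seed))

    def consistent_and_unique(self, value: str) -> str:
        # Like the table mode, but redraw until the masked value is unused, so two
        # different inputs never share a masked value.
        if value not in self.unique_table:
            candidate = self.generator(self.rng)
            while candidate in self.used:
                candidate = self.generator(self.rng)
            self.used.add(candidate)
            self.unique_table[value] = candidate
        return self.unique_table[value]

    def random_value(self, value: str) -> str:
        # A fresh draw on every call: neither consistent nor unique.
        return self.generator(self.rng)


modes = MaskingModes(fake_id)
print(modes.consistent_with_seed("123-45-6789"))
print(modes.random_value("123-45-6789"))
```

The uniqueness loop assumes the generator's value space is much larger than the number of distinct inputs; once every possible value has been used, it would loop indefinitely.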
