This article describes plugins that analyze source systems and calculate various metrics. The analysis is done based on data snapshots.
The plugins are:
All of the above plugins are inactive by default and must be activated through the Discovery Pipeline if needed.
Some data sources may contain a large number of empty tables that are irrelevant for the Catalog and for further LU creation.
Accordingly, the purpose of this plugin is to improve Catalog usability and simplify the LU development process. When activated, the plugin automatically discards all empty tables during the Discovery job and writes a message to the Fabric log (one message per schema):
"<num> empty datasets were removed from schema <schema name>"
The Catalog schema is then created without the discarded tables.
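The filtering step can be illustrated with a minimal Python sketch. It is not the plugin's implementation: the function name, the row-count input and the sample schema are hypothetical, and only the log message format is taken from the text above.

```python
from typing import Dict, Tuple


def drop_empty_datasets(schema_name: str, row_counts: Dict[str, int]) -> Tuple[Dict[str, int], str]:
    """Keep only datasets with at least one row and build the per-schema log line.

    row_counts maps a dataset name to the number of rows found for it
    (a hypothetical input, used here only for illustration).
    """
    kept = {name: count for name, count in row_counts.items() if count > 0}
    removed = len(row_counts) - len(kept)
    # Message format as quoted in the article: one message per schema.
    message = f"{removed} empty datasets were removed from schema {schema_name}"
    return kept, message


kept, message = drop_empty_datasets("CRM", {"CUSTOMER": 120, "AUDIT_TMP": 0, "LEGACY_X": 0})
print(message)
```

In this sketch only CUSTOMER remains, so a Catalog schema built from `kept` would not contain the two empty tables.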
This plugin scans the data sample to calculate various data quality metrics. These metrics can then be used for masking and synthetic data generation.
The purpose of this plugin is to identify fields with a limited number of distinct values (in the data sample) and save these values in a dedicated MTable, enabling their use in masking and synthetic data generation.
Once a field is identified as an Option Set, the property optionSet = true is created for it. A separate MTable is generated for each data platform and schema to store the distinct values (and their distribution) that the plugin identified in each field. The MTable file name has the following format:
catalog_field_option_set___<dataPlatform>_<schema>.csv (note the three underscores before the data platform name).
The image below shows an example of such an MTable:

Starting from Fabric V8.3.1, the OPTION_SET classification is assigned to this field, unless it has already been classified. In the Catalog Settings, the OPTION_SET classification is mapped to the RandomOptionSet Actor for masking and synthetic data generation. The actor randomly selects a value from the catalog_field_option_set MTable, based on the input data platform, schema, dataset, class and field.
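As a rough illustration of the naming convention and of a random pick from an option set, consider the Python sketch below. It makes several assumptions: the function names are hypothetical, the MTable columns are not listed in this article so the rows are modeled as simple (value, count) pairs, and weighting the pick by the stored distribution is a guess rather than documented RandomOptionSet behavior.

```python
import random
from typing import List, Tuple


def option_set_mtable_name(data_platform: str, schema: str) -> str:
    """Build the MTable file name; note the three underscores before the data platform."""
    return f"catalog_field_option_set___{data_platform}_{schema}.csv"


def pick_option(values_with_counts: List[Tuple[str, int]], rng: random.Random) -> str:
    """Pick one value at random, weighted by how often it appeared in the sample.

    Assumption: weighting by distribution is illustrative only; the article
    states just that the Actor selects a value randomly.
    """
    values = [value for value, _ in values_with_counts]
    weights = [count for _, count in values_with_counts]
    return rng.choices(values, weights=weights, k=1)[0]


print(option_set_mtable_name("CRM_DB", "main"))
rows = [("GOLD", 10), ("SILVER", 30), ("BRONZE", 60)]  # hypothetical field values
print(pick_option(rows, random.Random()))
```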
To identify fields with limited distinct values, consider these guidelines:
Ensure the field is non-PII to comply with privacy regulations and avoid the exposure of sensitive data.
The number of distinct values should be below either the plugin’s relative threshold (e.g., 0.05) or the Absolute Threshold input parameter, which defaults to 15.
Additional rules apply based on the plugin input parameters, as explained below.
This parameter defines the name of the property that is created on a field if the plugin returns true. By default, the property is named optionSet.
This parameter defines the absolute threshold number of distinct values. If the relative number of distinct values per field, found in a data sample, exceeds the plugin’s threshold (0.05), it is then validated against the absolute threshold (15). For example:
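The two-step check can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the "relative number" is taken to be the distinct count divided by the sample size, both comparisons are treated as inclusive, and the function and parameter names are hypothetical.

```python
def is_option_set(distinct_count: int, sample_size: int,
                  relative_threshold: float = 0.05,
                  absolute_threshold: int = 15) -> bool:
    """Return True if a field qualifies as an Option Set under the two thresholds."""
    if sample_size <= 0:
        return False
    # Step 1: relative check (assumed to be distinct values / sampled records).
    if distinct_count / sample_size <= relative_threshold:
        return True
    # Step 2: the relative check failed, so fall back to the absolute threshold.
    return distinct_count <= absolute_threshold


print(is_option_set(40, 1000))  # ratio 0.04, passes the relative check
print(is_option_set(12, 100))   # ratio 0.12, but 12 distinct values is within 15
print(is_option_set(60, 1000))  # ratio 0.06 and 60 distinct values, fails both
```

Under these assumptions, the absolute threshold mainly matters for small samples, where even a handful of distinct values gives a high ratio.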
The fieldTypeIncludeList plugin input parameter controls which field data types are considered when checking for distinct values.
By default, this parameter includes the STRING and INTEGER data types for this plugin. The valid values are STRING, INTEGER, REAL, DATETIME, DATE and BOOLEAN.
This parameter lets you define an override list of field names. These fields are included in the plugin's validation algorithm even if they are identified as PII or belong to a small table (see the minSampleSize property).
This parameter lets you define an override list of field names. These fields are excluded from the plugin's validation algorithm.
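How the type filter and the two override lists might combine can be sketched as below. The function and its argument names are hypothetical, as are the names used for the include and exclude lists (their parameter names are not given in this article). The precedence of the exclude list over the include list, and the type filter still applying to included fields, are assumptions. The value 100 is the documented minSampleSize default.

```python
from typing import Collection


def is_field_eligible(field_name: str, field_type: str, is_pii: bool, sample_size: int,
                      field_type_include_list: Collection[str] = ("STRING", "INTEGER"),
                      include_fields: Collection[str] = (),
                      exclude_fields: Collection[str] = (),
                      min_sample_size: int = 100) -> bool:
    """Decide whether a field should go through the Option Set validation."""
    # Assumption: an explicit exclusion always wins.
    if field_name in exclude_fields:
        return False
    # Only the configured data types are checked for distinct values.
    if field_type not in field_type_include_list:
        return False
    # Included fields bypass the PII and small-table rules.
    if field_name in include_fields:
        return True
    return (not is_pii) and sample_size >= min_sample_size


print(is_field_eligible("STATUS", "STRING", is_pii=False, sample_size=500))
print(is_field_eligible("GENDER", "STRING", is_pii=True, sample_size=50,
                        include_fields={"GENDER"}))
```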
The incrementalMode parameter was introduced in Fabric V8.3.1. It defines whether the Option Set Analyzer plugin should be executed for fields that already have the same property created by this plugin in a previous Discovery Job execution. It has the following modes:
"Keep All" (default): if the plugin has already been executed for this field in a previous Discovery Job execution, it is not invoked again (even if the field does not have the 'Option Set' property). The plugin is invoked only for new fields.
"Keep Existing": if the plugin has already been executed for this field in a previous Discovery Job execution and created a property, it is not invoked again. The plugin is invoked only for new fields and for fields without this property.
"Evaluate All": the plugin is invoked for all fields.
This parameter sets a limit on STRING size to prevent handling text files or complex structures within a field. The default value is 512 bytes.
This parameter lets the plugin skip small tables by defining the minimum sample size required to determine whether a field qualifies as an Option Set. The default value is 100.
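The three incrementalMode options described earlier can be summarized as a small decision function. This is only a sketch: the function name and the two boolean inputs are hypothetical stand-ins for whatever state the Discovery Job keeps about previous executions.

```python
def should_invoke(mode: str, evaluated_before: bool, has_property: bool) -> bool:
    """Return True if the plugin should run for a field under the given incrementalMode."""
    if mode == "Evaluate All":
        # Every field is evaluated again.
        return True
    if mode == "Keep Existing":
        # Skip only fields that were evaluated before and received the property.
        return not (evaluated_before and has_property)
    if mode == "Keep All":
        # Skip any field that was evaluated before, with or without the property.
        return not evaluated_before
    raise ValueError(f"Unknown incrementalMode: {mode}")


for mode in ("Keep All", "Keep Existing", "Evaluate All"):
    # A field evaluated in a previous run that did not get the property.
    print(mode, should_invoke(mode, evaluated_before=True, has_property=False))
```

The loop shows where the modes differ: a previously evaluated field without the property is skipped under "Keep All" but re-evaluated under the other two modes.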