This article describes plugins that analyze source systems and calculate various metrics. The analysis is based on data snapshots.
The plugins are:
All of the above plugins are inactive by default and must be activated through the Discovery Pipeline if needed.
Some data sources may contain a large number of empty tables that are irrelevant for the Catalog and for further LU creation.
This plugin improves both Catalog usability and the LU development process. When activated, the plugin automatically discards all empty tables during the Discovery job, writing a message in the Fabric log (one message per schema):
"<num> empty datasets were removed from schema <schema name>"
The Catalog schema is then created without the discarded tables.
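The filtering step can be pictured with the minimal Python sketch below. It is not the plugin's implementation. The `discard_empty_datasets` function and the schema-to-row-count dictionary it accepts are hypothetical stand-ins for the Discovery job's internal model. The sketch also assumes that a message is written only for schemas where at least one dataset was removed.

```python
def discard_empty_datasets(
    schemas: dict[str, dict[str, int]],
) -> tuple[dict[str, dict[str, int]], list[str]]:
    """Drop datasets with no rows and build one log message per affected schema.

    `schemas` maps a schema name to {dataset name: row count}; this shape is a
    hypothetical stand-in for the Discovery job's model.
    """
    kept: dict[str, dict[str, int]] = {}
    messages: list[str] = []
    for schema, datasets in schemas.items():
        non_empty = {name: rows for name, rows in datasets.items() if rows > 0}
        removed = len(datasets) - len(non_empty)
        if removed:
            # Assumption: schemas with nothing removed produce no message.
            messages.append(f"{removed} empty datasets were removed from schema {schema}")
        kept[schema] = non_empty
    return kept, messages


if __name__ == "__main__":
    kept, messages = discard_empty_datasets(
        {"crm": {"customer": 1200, "legacy_tmp": 0, "audit_old": 0}}
    )
    print(kept)      # {'crm': {'customer': 1200}}
    print(messages)  # ['2 empty datasets were removed from schema crm']
```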
This plugin scans the data sample to calculate various data quality metrics. These metrics can then be used for masking and synthetic data generation.
The purpose of this plugin is to identify fields with a limited number of distinct values within a data sample. It saves these values into a dedicated MTable, enabling their use in data masking and synthetic data generation.
Starting with Fabric V8.4, when running discovery on JSON Schema files (using the File Cataloging solution), the plugin identifies fields with an enum property and extracts those values directly from the schema rather than a data sample.
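The Python sketch below illustrates reading values from a schema rather than from a sample. It walks a JSON Schema document and collects every property that declares `enum`. It is hypothetical: the function name, the dotted-path naming of nested fields, and the handling of `items` for arrays are assumptions, not the File Cataloging solution's logic. Composition keywords such as `$ref` or `allOf` are not resolved.

```python
import json


def extract_enum_fields(schema: dict, path: str = "") -> dict[str, list]:
    """Return {dotted field path: enum values} for every property declaring `enum`."""
    found: dict[str, list] = {}
    for name, spec in schema.get("properties", {}).items():
        field_path = f"{path}.{name}" if path else name
        if "enum" in spec:
            found[field_path] = list(spec["enum"])
        # Nested object: walk its own "properties".
        if "properties" in spec:
            found.update(extract_enum_fields(spec, field_path))
        # Array: the enum (or nested object) lives on the "items" schema.
        items = spec.get("items")
        if isinstance(items, dict):
            if "enum" in items:
                found[field_path] = list(items["enum"])
            if "properties" in items:
                found.update(extract_enum_fields(items, field_path))
    return found


if __name__ == "__main__":
    doc = json.loads("""
    {
      "type": "object",
      "properties": {
        "status": {"type": "string", "enum": ["ACTIVE", "CLOSED"]},
        "address": {
          "type": "object",
          "properties": {"country": {"type": "string", "enum": ["US", "IL"]}}
        },
        "name": {"type": "string"}
      }
    }
    """)
    print(extract_enum_fields(doc))
    # {'status': ['ACTIVE', 'CLOSED'], 'address.country': ['US', 'IL']}
```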
To identify fields with limited distinct values, consider these guidelines:
Additional rules apply based on the plugin input parameters, as explained further in this article.
Once a field is identified as an Option Set, the property optionSet = true is added to it. Starting from Fabric V8.3.1, the field is also assigned classification = OPTION_SET, unless it has already been classified. In the Catalog Settings, the OPTION_SET classification is mapped to the RandomOptionSet Actor for masking and synthetic data generation. The actor randomly selects a value from the catalog_field_option_set MTable based on the input data platform, schema, dataset, class, and field.
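A rough Python picture of this lookup follows. It is a sketch under stated assumptions: `OptionSetRow` and `random_option_set` are hypothetical names, the row layout stands in for the real MTable columns, and weighting the choice by each value's sampled count is an assumption (the text above only says that the actor selects a value randomly).

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OptionSetRow:
    """Hypothetical MTable row; the real column names may differ."""
    data_platform: str
    schema: str
    dataset: str
    class_name: str
    field: str
    value: str
    count: int  # occurrences of `value` in the analyzed sample


def random_option_set(
    rows: list[OptionSetRow],
    data_platform: str,
    schema: str,
    dataset: str,
    class_name: str,
    field: str,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Pick one stored value for the given field, weighted by its sampled count."""
    if rng is None:
        rng = random.Random()
    key = (data_platform, schema, dataset, class_name, field)
    candidates = [
        r for r in rows
        if (r.data_platform, r.schema, r.dataset, r.class_name, r.field) == key
    ]
    if not candidates:
        return None  # no option set recorded for this field
    return rng.choices(
        [r.value for r in candidates],
        weights=[r.count for r in candidates],
        k=1,
    )[0]
```

Passing a seeded `random.Random` makes the selection reproducible, which can help when comparing masked outputs across runs.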
A separate MTable is generated for each data platform and schema. It stores the distinct values that the plugin identified in each field, together with their distribution. The MTable name format is:
catalog_field_option_set___<dataPlatform>_<schema>.csv (containing three underscores before the data platform name).
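For illustration, the Python sketch below builds the MTable name from this format and groups sampled values into one table per data platform and schema. The `option_set_mtable_name` and `group_by_mtable` helpers and the (dataset, field, value, count) row shape are hypothetical; the actual MTable columns are not listed here.

```python
from collections import Counter, defaultdict


def option_set_mtable_name(data_platform: str, schema: str) -> str:
    # Three underscores separate the fixed prefix from the data platform name.
    return f"catalog_field_option_set___{data_platform}_{schema}.csv"


def group_by_mtable(option_sets):
    """Group (platform, schema, dataset, field, sampled values) tuples into per-MTable rows.

    Each row is (dataset, field, value, count); the row layout is hypothetical.
    """
    tables = defaultdict(list)
    for platform, schema, dataset, field, values in option_sets:
        name = option_set_mtable_name(platform, schema)
        for value, count in Counter(values).items():
            tables[name].append((dataset, field, value, count))
    return dict(tables)


if __name__ == "__main__":
    print(option_set_mtable_name("pg", "crm"))
    # catalog_field_option_set___pg_crm.csv
```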
The image below shows an example of such an MTable:

The plugin's input parameters are described below:
propertyName is the name of the field property that the plugin creates when a field qualifies as an Option Set. The default name is optionSet.
absoluteThreshold defines the maximum number of distinct values allowed. If the proportion of distinct values in a sample exceeds the plugin's relative threshold (0.05), the plugin checks the number of distinct values against this absolute value (default: 15). For example:
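Assuming the two thresholds combine as follows, a field with 10 distinct values in a 100-record sample (10%, above 0.05) still qualifies because 10 does not exceed 15, whereas a field with 70 distinct values in a 1,000-record sample (7%) does not. A field whose distinct-value ratio is at or below 0.05 is assumed to qualify directly. The Python sketch below encodes that reading. `is_option_set` and `relative_threshold` are hypothetical names, and whether the comparisons are inclusive is an assumption.

```python
def is_option_set(
    distinct_count: int,
    sample_size: int,
    relative_threshold: float = 0.05,  # hypothetical name for the 0.05 ratio
    absolute_threshold: int = 15,
) -> bool:
    """One reading of the Option Set rule: ratio check first, then the absolute cap."""
    if sample_size <= 0:
        return False
    if distinct_count / sample_size <= relative_threshold:
        return True
    # Ratio too high: the field can still qualify if the raw count is small.
    return distinct_count <= absolute_threshold


if __name__ == "__main__":
    print(is_option_set(10, 100))   # True: 10% > 5%, but 10 <= 15
    print(is_option_set(70, 1000))  # False: 7% > 5% and 70 > 15
```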
fieldTypeIncludeList controls which field data types should be analyzed. By default, it is set to STRING and INTEGER.
fieldNameIncludeList is an override list of field names to be included in the plugin's validation algorithm, even if they are identified as PII or belong to a small table (see the minSampleSize property).
fieldNameExcludeList is an override list of field names to be excluded from the plugin's validation algorithm.
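The interplay of these lists can be sketched in Python as follows. The function name `should_analyze` and its arguments are hypothetical. The precedence shown (the exclude list wins over the include list, and the include list does not bypass the type filter) is an assumption; the text above only states that fieldNameIncludeList overrides the PII and small-table skips.

```python
def should_analyze(
    field_name: str,
    field_type: str,
    is_pii: bool,
    sample_size: int,
    type_include: frozenset = frozenset({"STRING", "INTEGER"}),
    name_include: frozenset = frozenset(),
    name_exclude: frozenset = frozenset(),
    min_sample_size: int = 100,
) -> bool:
    """Decide whether a field enters the Option Set check (hypothetical precedence)."""
    if field_name in name_exclude:
        return False  # assumed: the exclude list wins over everything
    if field_type not in type_include:
        return False
    if field_name in name_include:
        return True  # include list overrides the PII and small-table skips
    return not is_pii and sample_size >= min_sample_size
```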
incrementalMode (introduced in Fabric V8.3.1) defines how the plugin handles fields analyzed in previous Discovery Job executions. It has the following modes:
"Keep All" (default): the plugin does not analyze fields that were already analyzed in a previous Discovery Job execution, even if they do not have the 'Option Set' property. Only new fields are analyzed.
"Keep Existing": the plugin does not analyze fields that were already analyzed in a previous Discovery Job execution and received the 'Option Set' property. New fields, and existing fields without this property, are analyzed.
"Evaluate All": the plugin analyzes all fields.
maxStringLength sets a limit on STRING size to prevent handling text files or complex structures within a field. The default value is 512 bytes.
minSampleSize defines the minimum sample size for performing the plugin's algorithm to determine whether a field qualifies as an Option Set. Datasets with a smaller sample size are skipped. The default value is 100.
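The run-level parameters above (incrementalMode, maxStringLength, and minSampleSize) can be sketched together in Python. This is a hypothetical illustration, not the plugin's code. `IncrementalMode`, `fields_to_analyze`, and `sample_distribution` are invented names. The sketch assumes that one over-long value causes the whole field to be skipped, that string length is measured in UTF-8 bytes, and that the number of sampled values stands in for the dataset's sample size.

```python
from collections import Counter
from enum import Enum
from typing import Optional


class IncrementalMode(Enum):
    KEEP_ALL = "Keep All"
    KEEP_EXISTING = "Keep Existing"
    EVALUATE_ALL = "Evaluate All"


def fields_to_analyze(current: set, previously_analyzed: set, has_option_set: set,
                      mode: IncrementalMode = IncrementalMode.KEEP_ALL) -> set:
    """Select the fields this Discovery run should (re)analyze."""
    if mode is IncrementalMode.EVALUATE_ALL:
        return set(current)
    if mode is IncrementalMode.KEEP_ALL:
        # Skip every field seen before, with or without the Option Set property.
        return {f for f in current if f not in previously_analyzed}
    # KEEP_EXISTING: skip only fields seen before that did get the property.
    return {f for f in current
            if not (f in previously_analyzed and f in has_option_set)}


def sample_distribution(values: list, min_sample_size: int = 100,
                        max_string_length: int = 512) -> Optional[Counter]:
    """Count distinct values, or return None when the field should be skipped."""
    if len(values) < min_sample_size:
        return None  # sample smaller than minSampleSize
    for v in values:
        # maxStringLength is in bytes, so measure the UTF-8 encoding.
        if isinstance(v, str) and len(v.encode("utf-8")) > max_string_length:
            return None  # likely free text or an embedded structure
    return Counter(values)
```

The returned `Counter` doubles as the value distribution that the MTable stores, so the same pass both filters and counts.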