This article describes plugins that analyze source systems and calculate various metrics. The analysis is done based on data snapshots.
The plugins are:
All of the above plugins are inactive by default and must be activated through the Discovery Pipeline if needed.
Some data sources may contain a large number of empty tables that are irrelevant for the Catalog and for further LU creation.
Accordingly, the purpose of this plugin is to improve Catalog usability and the LU development process. When activated, the plugin automatically discards all empty tables during the Discovery job and writes a message to the Fabric log (one message per schema):
"<num> empty datasets were removed from schema <schema name>"
The Catalog schema is then created without the discarded tables.
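The behavior described above can be sketched as follows. This is only an illustration, not the plugin's actual code: the function name `drop_empty_datasets` and the `row_counts` input structure are hypothetical, and only the log message text comes from this article.

```python
from collections import defaultdict


def drop_empty_datasets(row_counts):
    """Split datasets into kept ones and per-schema log messages.

    row_counts is a hypothetical mapping of (schema, dataset) -> number of
    rows found in the data sample.
    """
    kept = []
    removed_per_schema = defaultdict(int)
    for (schema, dataset), count in row_counts.items():
        if count == 0:
            # Empty dataset: discard it and count it for the schema's message.
            removed_per_schema[schema] += 1
        else:
            kept.append((schema, dataset))
    # One message per schema, in the format quoted in this article.
    messages = [
        f"{num} empty datasets were removed from schema {schema}"
        for schema, num in sorted(removed_per_schema.items())
    ]
    return kept, messages
```

Calling the function with a few sample counts returns the datasets that remain in the Catalog together with the messages that would be logged.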
This plugin scans the data sample to calculate various data quality metrics. These metrics can then be used for masking and synthetic data generation.
The purpose of this plugin is to identify fields with a limited number of distinct values within a data sample. It saves these values into a dedicated MTable, enabling their use in data masking and synthetic data generation.
Starting with Fabric V8.4, when running discovery on JSON Schema files (using the File Cataloging solution), the plugin identifies fields with an enum property and extracts those values directly from the schema rather than from a data sample.
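As a rough illustration of reading enum values from a JSON Schema, the sketch below walks the `properties` of a schema and collects every `enum` it finds. The function name `collect_enum_fields` and the dotted field paths are assumptions made for this example and do not describe how the plugin names or stores fields.

```python
def collect_enum_fields(schema, path=""):
    """Return {field_path: enum_values} for properties that declare `enum`."""
    found = {}
    for name, spec in schema.get("properties", {}).items():
        field_path = f"{path}.{name}" if path else name
        if "enum" in spec:
            # The allowed values come from the schema, not from sampled data.
            found[field_path] = list(spec["enum"])
        if spec.get("type") == "object":
            # Descend into nested objects.
            found.update(collect_enum_fields(spec, field_path))
    return found


example_schema = {
    "type": "object",
    "properties": {
        "status": {"type": "string", "enum": ["active", "closed"]},
        "name": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {"country": {"type": "string", "enum": ["US", "UK"]}},
        },
    },
}
enum_fields = collect_enum_fields(example_schema)
```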
To identify fields with limited distinct values, consider these guidelines:
Additional rules apply based on the plugin input parameters, as explained further in this article.
Once a field is identified as an Option Set, the property optionSet = true is created for it. Starting from Fabric V8.3.1, the classification = OPTION_SET is assigned to this field, unless it has already been classified. In the Catalog Settings, the OPTION_SET classification is mapped to the RandomOptionSet Actor for masking and synthetic data generation. The actor randomly selects a value from the catalog_field_option_set MTable, based on the input data platform, schema, dataset, class and field.
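The random selection can be pictured with the sketch below. It is not the RandomOptionSet Actor's implementation: the row layout (`data_platform`, `schema`, `dataset`, `class`, `field`, `value`, `weight`) is a hypothetical stand-in for the MTable columns, and weighting the choice by the stored distribution is an assumption.

```python
import random


def pick_option(rows, key, rng=random):
    """Pick one stored value for the field identified by `key`.

    key is (data_platform, schema, dataset, class, field). The row layout is
    hypothetical; the real MTable columns may differ.
    """
    candidates = [
        r for r in rows
        if (r["data_platform"], r["schema"], r["dataset"], r["class"], r["field"]) == key
    ]
    if not candidates:
        return None
    values = [r["value"] for r in candidates]
    weights = [r["weight"] for r in candidates]
    # Assumption: more frequent values are picked more often.
    return rng.choices(values, weights=weights, k=1)[0]


rows = [
    {"data_platform": "CRM_DB", "schema": "public", "dataset": "customer",
     "class": "OPTION_SET", "field": "status", "value": "active", "weight": 80},
    {"data_platform": "CRM_DB", "schema": "public", "dataset": "customer",
     "class": "OPTION_SET", "field": "status", "value": "closed", "weight": 20},
]
key = ("CRM_DB", "public", "customer", "OPTION_SET", "status")
picked = pick_option(rows, key, random.Random(0))
```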
A separate MTable is generated for each data platform and schema to store the distinct values (and their distribution) identified by the plugin in a field. The MTable name format is:
catalog_field_option_set___<dataPlatform>_<schema>.csv (with three underscores before the data platform name).
The image below shows an example of such an MTable:
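The naming format can be expressed as a small helper, shown here only to make the underscore count explicit. The function name and the sample platform and schema values are hypothetical.

```python
def option_set_mtable_name(data_platform, schema):
    """Build the MTable file name in the format given above."""
    # Three underscores follow the fixed prefix; one separates platform and schema.
    return f"catalog_field_option_set___{data_platform}_{schema}.csv"


name = option_set_mtable_name("CRM_DB", "public")
```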

The plugin's input parameters are described below:
propertyName is a column's property that should be created by the plugin on a field if the plugin returns true. By default, the property is named optionSet.
absoluteThreshold defines the maximum number of distinct values allowed. If the proportion of distinct values in a sample exceeds the plugin’s threshold (0.05), the plugin checks the count against this absolute value (15). For example:
fieldTypeIncludeList controls which field data types should be analyzed. It is set to STRING and INTEGER.
fieldNameIncludeList is an override list of field names to be included in the plugin's validation algorithm, even if they are identified as PII or belong to a small table (see the minSampleSize property).
fieldNameExcludeList is an override list of field names to be excluded from the plugin's validation algorithm.
incrementalMode (introduced in Fabric V8.3.1) defines how the plugin handles fields analyzed in previous Discovery Job executions. It has the following modes:
"Keep All" (default): the plugin does not analyze fields that were already analyzed in a previous Discovery Job execution, even if they do not have the 'Option Set' property. Only new fields are analyzed.
"Keep Existing": the plugin does not analyze fields that were already analyzed in a previous Discovery Job execution and for which an 'Option Set' property was created. It analyzes new fields and existing fields that do not have this property.
"Evaluate All": the plugin analyzes all fields.
maxStringLength sets a limit on STRING size to prevent handling text files or complex structures within a field. The default value is 512 bytes.
minSampleSize allows skipping small tables by defining the minimum sample size required to determine whether a field qualifies as an Option Set. The default value is 100.
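The way these parameters might interact can be sketched as below. This is one reading of the descriptions above, not the plugin's algorithm: the function names, the order of the checks, and the rule that a distinct-value ratio at or below 0.05 qualifies a field on its own are all assumptions. The default values (0.05, 15, 512, 100) are taken from this article.

```python
def should_analyze(previously_analyzed, has_option_set, mode="Keep All"):
    """Decide whether a field is analyzed under the given incrementalMode."""
    if mode == "Evaluate All" or not previously_analyzed:
        return True
    if mode == "Keep Existing":
        # Re-check only the fields that did not get the property last time.
        return not has_option_set
    return False  # "Keep All": skip anything analyzed before


def is_option_set(values, absolute_threshold=15, ratio_threshold=0.05,
                  min_sample_size=100, max_string_length=512,
                  force_include=False):
    """Return True if the sampled values look like an Option Set.

    force_include stands in for a field listed in fieldNameIncludeList.
    """
    sample = list(values)
    if not sample:
        return False
    if len(sample) < min_sample_size and not force_include:
        return False  # table too small to judge
    if any(isinstance(v, str) and len(v.encode("utf-8")) > max_string_length
           for v in sample):
        return False  # long text or embedded structure
    distinct = len(set(sample))
    if distinct / len(sample) <= ratio_threshold:
        return True
    # The proportion exceeds the threshold, so compare the absolute count.
    return distinct <= absolute_threshold
```

For instance, under these assumptions a sample of 100 rows with 10 distinct values has a ratio of 0.1, which exceeds 0.05, but it still qualifies because 10 is not greater than 15.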