The Discovery Pipeline screen in the Catalog Settings tab provides a full and comprehensive view of the Discovery job configuration. It displays the product's default baseline configuration (retrieved from the product's plugins.discovery file) and the project-level rules.
The Baseline rule includes a list of product built-in plugins with their input parameters, data snapshot sample size and more.
The Discovery Pipeline screen enables performing the following actions, described further in this article:
The overrides are saved into the project pluginsOverride.discovery file, which is created in the Project's Implementation/SharedObjects/Interfaces/Discovery/
folder.
This article describes the capabilities of the Discovery Pipeline screen and explains how they can impact the Discovery job.
The Baseline rule is a default configuration, applied when running the Discovery job on any data platform. It includes a sample size definition, a global schema exclude list and a list of product plugins with their settings.
The Baseline rule is always enabled. It can be edited by checking the Override checkbox. The following changes can be applied to the Baseline rule:
Note that the Baseline rule overrides are automatically propagated to the project-level rules. For example, when a plugin is updated from 'active' to 'inactive' in the baseline, it becomes 'inactive' in all project-level rules. The rule, however, can override the baseline.
The Baseline rule overrides can be reverted in one of the following ways:
The Discovery Pipeline screen enables the user to refine the default configuration per the project's requirements.
A rule should be attached to a data platform, along with several optional parameters (schema, dataset, crawler filter and override indicator) that may become mandatory, based on conditions; this is described further in this article.
Click on Add Rule + to create a new rule.
The mandatory rule's parameters are Rule Name (which must be unique) and Data Platform.
Populating a schema and a dataset is optional.
When multiple schemas or datasets are populated, they should be comma-separated.
A rule should include either a Crawler Filter or a checked Override checkbox, or both. Possible filter settings are described below.
When the filter is set to Exclude This:
When the filter is set to Exclude Others:
When the Crawler Filter is empty and the Override checkbox is checked:
Multiple rules can be defined for the same data platform. The purpose of creating multiple rules is to allow variations of the Discovery process execution for different elements. For example, one may need to set a higher sample size for some datasets or execute a certain plugin on a selected dataset or schema only.
When multiple rules are defined for the same data platform, they adhere to the following hierarchy:
Example of rules combination and hierarchy
The below image shows 3 rules defined for the AdventureWorks data platform:
When a new plugin is created in a project, it should be added to the Baseline rule in order to become part of the Discovery job execution. Once added to the baseline, the new plugin is automatically propagated to all the existing rules and can have different settings in each rule.
For example, when a newly created plugin is applicable only for running Discovery on the CRM_DB, it should be added to the baseline as 'inactive'. In addition, a rule for the CRM_DB should be created, where this plugin should be set to 'active'.
The steps for adding a new plugin to the pipeline are:
The new plugin is always added to the end of the Plugins list. However, the plugin's execution order can be changed by dragging it to a required position in the list.
Note that the Delete selected option in the context menu is only available for the project plugins while the product plugins cannot be deleted. If a product plugin is not needed, it can be set to 'inactive' in the Baseline rule.
The Discovery Pipeline screen in the Catalog Settings tab provides a full and comprehensive view of the Discovery job configuration. It displays the product's default baseline configuration (retrieved from the product's plugins.discovery file) and the project-level rules.
The Baseline rule includes a list of product built-in plugins with their input parameters, data snapshot sample size and more.
The Discovery Pipeline screen enables performing the following actions, described further in this article:
The overrides are saved into the project pluginsOverride.discovery file, which is created in the Project's Implementation/SharedObjects/Interfaces/Discovery/
folder.
This article describes the capabilities of the Discovery Pipeline screen and explains how they can impact the Discovery job.
The Baseline rule is a default configuration, applied when running the Discovery job on any data platform. It includes a sample size definition, a global schema exclude list and a list of product plugins with their settings.
The Baseline rule is always enabled. It can be edited by checking the Override checkbox. The following changes can be applied to the Baseline rule:
Note that the Baseline rule overrides are automatically propagated to the project-level rules. For example, when a plugin is updated from 'active' to 'inactive' in the baseline, it becomes 'inactive' in all project-level rules. The rule, however, can override the baseline.
The Baseline rule overrides can be reverted in one of the following ways:
The Discovery Pipeline screen enables the user to refine the default configuration per the project's requirements.
A rule should be attached to a data platform, along with several optional parameters (schema, dataset, crawler filter and override indicator) that may become mandatory, based on conditions; this is described further in this article.
Click on Add Rule + to create a new rule.
The mandatory rule's parameters are Rule Name (which must be unique) and Data Platform.
Populating a schema and a dataset is optional.
When multiple schemas or datasets are populated, they should be comma-separated.
A rule should include either a Crawler Filter or a checked Override checkbox, or both. Possible filter settings are described below.
When the filter is set to Exclude This:
When the filter is set to Exclude Others:
When the Crawler Filter is empty and the Override checkbox is checked:
Multiple rules can be defined for the same data platform. The purpose of creating multiple rules is to allow variations of the Discovery process execution for different elements. For example, one may need to set a higher sample size for some datasets or execute a certain plugin on a selected dataset or schema only.
When multiple rules are defined for the same data platform, they adhere to the following hierarchy:
Example of rules combination and hierarchy
The below image shows 3 rules defined for the AdventureWorks data platform:
When a new plugin is created in a project, it should be added to the Baseline rule in order to become part of the Discovery job execution. Once added to the baseline, the new plugin is automatically propagated to all the existing rules and can have different settings in each rule.
For example, when a newly created plugin is applicable only for running Discovery on the CRM_DB, it should be added to the baseline as 'inactive'. In addition, a rule for the CRM_DB should be created, where this plugin should be set to 'active'.
The steps for adding a new plugin to the pipeline are:
The new plugin is always added to the end of the Plugins list. However, the plugin's execution order can be changed by dragging it to a required position in the list.
Note that the Delete selected option in the context menu is only available for the project plugins while the product plugins cannot be deleted. If a product plugin is not needed, it can be set to 'inactive' in the Baseline rule.