Advanced Configuration

Web Studio

The Implementation/SharedObjects/Interfaces/Discovery/ folder in the Project tree holds all configuration files related to the Catalog and Discovery processes:

  • The MTable subfolder holds the MTables used by the Catalog's various processes. In addition, when the Build Artifacts action is performed, the Catalog artifact (catalog_field_info.csv) is created in this folder.
    • Starting from V8.1, the Catalog artifact can be split, in which case multiple files are created instead of a single one.
  • The pluginsOverride.discovery file is a configuration file that defines the overrides to apply to the Crawler and the plugins configuration (starting from V8.2).

Show Catalog Commands is a Web Studio setting that shows or hides the Catalog-related commands (Run Discovery Job and Open in Catalog) in the Web Studio.

Catalog Application Configuration

The properties-info.json file is a configuration file that the Catalog application uses to determine the appearance and behavior of various Catalog UI elements. The following settings can be defined for a property:

  • "editable": true - the property is editable via the Edit Catalog capability.
  • "deletable": true - the property can be deleted via the Edit Catalog capability.
  • "searchable": true - the property is searchable using the Advanced Search screen.
  • "filterable": true - the property is filterable using the Catalog Filter screen.
  • "values":[] defines a list of a property's valid values. For some properties, this list is combined with programmatically retrieved values (e.g., classification).
  • "allow_custom_values": true - a custom value can be entered for a property that has a drop-down list.
  • "hidden": true - the property is hidden from the Catalog's Properties tab.
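
The flags above can be illustrated with a minimal sketch. The entry below is hypothetical: the property name, the sample values, and the overall layout of the JSON are assumptions made for illustration and may differ from the actual properties-info.json file. The is_value_allowed helper is also hypothetical and shows one plausible reading of how "values" and "allow_custom_values" interact; it is not the Catalog's own logic.

```python
import json

# Hypothetical entry; the real properties-info.json layout may differ.
SAMPLE = json.loads("""
{
  "classification": {
    "editable": true,
    "deletable": false,
    "searchable": true,
    "filterable": true,
    "values": ["PII", "Financial"],
    "allow_custom_values": false,
    "hidden": false
  }
}
""")


def is_value_allowed(info: dict, value: str) -> bool:
    """Assumed rule: a value is accepted if it is listed in "values",
    if custom values are allowed, or if no list is defined at all."""
    allowed = info.get("values", [])
    if not allowed:
        return True
    return value in allowed or bool(info.get("allow_custom_values", False))


if __name__ == "__main__":
    info = SAMPLE["classification"]
    print(is_value_allowed(info, "PII"))
    print(is_value_allowed(info, "Other"))
```

Note that the sketch ignores programmatically retrieved values, which the Catalog combines with the configured list for some properties.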

The properties-info.json file is located in the fabric/staticWeb/catalog folder.

To apply project-level overrides to the properties-info.json file:

  • Create a catalog folder under the Web folder of the Web Services LU, and copy the file into it.

  • After updating the file, save it and deploy the LUs.
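
The copy step above could be scripted roughly as follows. This is a sketch under stated assumptions: the fabric_home and web_services_lu arguments are hypothetical paths to the fabric folder and to the Web Services LU folder on disk, and the function name is invented for illustration. Editing the copied file and deploying the LUs remain manual steps.

```python
import shutil
from pathlib import Path


def copy_properties_override(fabric_home: Path, web_services_lu: Path) -> Path:
    """Copy properties-info.json into <Web Services LU>/Web/catalog.

    Both arguments are hypothetical locations supplied by the caller;
    adjust them to the actual installation and project layout.
    """
    source = fabric_home / "staticWeb" / "catalog" / "properties-info.json"
    target_dir = web_services_lu / "Web" / "catalog"
    # Create the catalog folder under the Web folder if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, target_dir / source.name))
```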

General

The NEO4J_SERVER_MEMORY_HEAP_MAX_SIZE_MB parameter in the [data_discovery] section of the config.ini file specifies the maximum heap size for the Neo4j server. By default, it is set to 2048 MB.

  • The Neo4j heap size is set when Neo4j starts in a space.
  • This value can be adjusted based on the data platform size and the number of schemas. For example, when the expected data platform size is large, it is recommended to increase this setting.
  • To update the heap size in an existing space, stop Neo4j and the DATA_DISCOVERY_JOB, update this setting in the config.ini file, and then run the Discovery job.
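
As an illustration of the config.ini update in the last step, the following sketch rewrites the parameter in the [data_discovery] section using Python's standard configparser module. The function name is hypothetical, and the sketch assumes the file can be parsed as a plain INI file. Note that configparser drops comments when writing, so for a real config.ini a manual edit may be preferable. Stopping Neo4j and the DATA_DISCOVERY_JOB and rerunning the Discovery job are not covered here.

```python
import configparser
import io


def set_neo4j_heap_mb(config_text: str, heap_mb: int) -> str:
    """Return config_text with the Neo4j max heap size set to heap_mb."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the upper-case parameter names as written
    parser.read_string(config_text)
    if not parser.has_section("data_discovery"):
        parser.add_section("data_discovery")
    parser.set(
        "data_discovery",
        "NEO4J_SERVER_MEMORY_HEAP_MAX_SIZE_MB",
        str(heap_mb),
    )
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    original = "[data_discovery]\nNEO4J_SERVER_MEMORY_HEAP_MAX_SIZE_MB=2048\n"
    print(set_neo4j_heap_mb(original, 4096))
```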

The DATA_SNAP_WRITE_MEMORY_CAP_MB parameter in the [data_discovery] section of the config.ini file specifies the maximum amount of Fabric memory allocated for the Data Snapshot process. This parameter helps to balance the Fabric memory when running the Discovery on a data platform with multiple schemas or when multiple Discovery jobs are running in parallel on the same Neo4j.

  • When the in-memory data reaches this predefined limit, the Data Snapshot's data is committed to the SQLite file.

  • By default, the parameter is set to 4096 MB. When working with very large data sources, it is recommended to increase this setting, provided that the system has sufficient resources for the increase.
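
The commit-on-limit behavior described above can be pictured with a small conceptual sketch. It is not Fabric's implementation: the class, the table name, and the byte-based accounting are all assumptions chosen for illustration, and the cap is expressed in bytes rather than MB so that the behavior is easy to observe with tiny inputs.

```python
import sqlite3


class CappedSnapshotWriter:
    """Buffer rows in memory and commit them to SQLite once a cap is reached."""

    def __init__(self, conn: sqlite3.Connection, cap_bytes: int) -> None:
        self.conn = conn
        self.cap_bytes = cap_bytes
        self.buffer: list[str] = []
        self.buffered_bytes = 0
        self.flush_count = 0
        conn.execute("CREATE TABLE IF NOT EXISTS snapshot (payload TEXT)")

    def add(self, payload: str) -> None:
        self.buffer.append(payload)
        self.buffered_bytes += len(payload.encode("utf-8"))
        # Once the in-memory data reaches the cap, write it out.
        if self.buffered_bytes >= self.cap_bytes:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        self.conn.executemany(
            "INSERT INTO snapshot (payload) VALUES (?)",
            [(p,) for p in self.buffer],
        )
        self.conn.commit()
        self.buffer.clear()
        self.buffered_bytes = 0
        self.flush_count += 1
```

In this model, a higher cap means fewer and larger commits at the cost of more memory held between commits, which is the trade-off the parameter controls.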

ENABLE_DATA_DISCOVERY is a hidden configuration parameter that defines whether Discovery is enabled in the system. By default, it is set to true, which assumes that Neo4j is part of the Fabric space. If the Fabric space does not include Neo4j, add the ENABLE_DATA_DISCOVERY parameter to the same [data_discovery] section and set it to false.
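
Because the parameter is hidden, its absence from the file means the default applies. The sketch below shows that lookup using Python's standard configparser module. It assumes the parameter lives in the [data_discovery] section of config.ini, and the function name is hypothetical.

```python
import configparser


def is_discovery_enabled(config_text: str) -> bool:
    """Return the effective ENABLE_DATA_DISCOVERY value (default: true)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # A missing section or a missing parameter falls back to the default.
    return parser.getboolean(
        "data_discovery", "ENABLE_DATA_DISCOVERY", fallback=True
    )


if __name__ == "__main__":
    print(is_discovery_enabled("[data_discovery]\nENABLE_DATA_DISCOVERY=false\n"))
```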
