Catalog Artifacts

Overview

The Catalog can build artifacts and save them in the Project tree. An artifact includes details of all Catalog fields and their properties, such as Classification and PII, for the currently displayed Catalog version.

Before a Catalog artifact can be built, the Discovery job must have run for at least one project interface.

Building Artifacts

To build a Catalog artifact, click Actions > Build Artifacts in the Catalog application's menu bar.

A Catalog artifact is a file called catalog_field_info.csv. It is created in CSV format, saved in the Implementation/SharedObjects/Interfaces/Discovery/MTable folder of the Project tree, and uploaded to Fabric memory as an MTable.
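As a small illustration of where the file ends up, the sketch below joins the folder and file name given above onto a project root. The `project_root` argument and the function name are hypothetical; only the folder and file name come from the text.

```python
from pathlib import Path

# Folder and file name as stated in the text above.
ARTIFACT_FOLDER = Path("Implementation/SharedObjects/Interfaces/Discovery/MTable")
ARTIFACT_FILE = "catalog_field_info.csv"


def catalog_artifact_path(project_root):
    """Return the expected location of the Catalog artifact.

    project_root is a hypothetical path to the root of the Project tree.
    """
    return Path(project_root) / ARTIFACT_FOLDER / ARTIFACT_FILE
```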

The image below shows an example of a Catalog artifact:

The artifact is created for the Catalog version that is currently displayed in the application. The heading of the last column indicates the version number (V14 in the above example), and the column itself always remains empty.

Catalog artifacts can be created for any Catalog version. Each new artifact overwrites the existing one in the Project tree.

Building Artifacts Including Relations

Starting from Fabric V8.3, a relations artifact can be created when needed. It is available only through the /api/catalog/{version}/build-catalog-artifacts API, by setting refersTo=true in the API input, as described here. Note that relations artifacts are not created when the Build Artifacts activity is initiated from the Catalog application.

The image below shows an example of the Catalog relations artifact. It includes a list of relations between datasets, based on their keys:
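The sketch below shows how such a call might be prepared. Only the endpoint path and the refersTo flag come from the text above. The base URL, the use of POST, the JSON request body, and the form of the version value are assumptions made for illustration, and authentication is omitted. The request is built but not sent.

```python
import json
from urllib.request import Request


def build_artifacts_request(base_url, version, refers_to=True):
    """Prepare (without sending) a request to the build-catalog-artifacts API.

    Assumptions, not confirmed by the documentation: the HTTP method is POST
    and the API input is a JSON body that carries the refersTo flag.
    """
    url = f"{base_url.rstrip('/')}/api/catalog/{version}/build-catalog-artifacts"
    body = json.dumps({"refersTo": refers_to}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host and version value, for illustration only.
request = build_artifacts_request("http://fabric.example:8080", 14)
print(request.full_url)
```

Passing the prepared request to urllib.request.urlopen would send it. Check the API reference for the actual method, input format, and credentials before doing so.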

The relations artifact includes a list of refersTo relations, containing the following information:

  • Parent data platform, schema, dataset and field(s)
  • Child data platform, schema, dataset and field(s)
  • Origin of the relation (Crawler or manual)

When a relation key combines several fields, the field names are separated by a semicolon.
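A consumer of the relations artifact could split such a key back into individual field names as sketched below. The function name and the sample field names are hypothetical, and trimming whitespace around the names is an assumption.

```python
def split_relation_key(fields):
    """Split a combined relation key into its individual field names.

    Whitespace around the names is trimmed (an assumption) and empty
    parts are dropped.
    """
    return [name.strip() for name in fields.split(";") if name.strip()]


# Hypothetical field names, for illustration only.
print(split_relation_key("CUSTOMER_ID;ORDER_ID"))
```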

Splitting and Combining Artifacts

Catalog artifacts can be split into separate files for each data platform and schema of a given Catalog version. Although the files are saved separately in the Project tree, their content is combined into a single MTable in Fabric's memory.

Splitting the Catalog artifacts is enabled when the SPLIT_CATALOG_ARTIFACTS parameter in the config.ini file is set to ON (the default setting starting from Fabric V8.3).

Splitting makes it possible to combine separate artifacts, created in different projects (or different spaces), into a single artifact. The artifact files can be copied from one project to another, and upon deployment they are combined into one MTable.
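To check how the parameter is set in a given config.ini, something like the following sketch could be used. It assumes the file can be read by Python's configparser and searches every section, because the text does not say which section holds the parameter. The section name in the sample is hypothetical.

```python
import configparser


def split_artifacts_setting(config_text):
    """Return True/False if SPLIT_CATALOG_ARTIFACTS is set explicitly.

    Returns None when the parameter is absent, in which case the default
    for the installed Fabric version applies.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(config_text)
    for section in parser.sections():
        value = parser[section].get("SPLIT_CATALOG_ARTIFACTS")
        if value is not None:
            return value.strip().upper() == "ON"
    return None


# Hypothetical section name, for illustration only.
sample = "[hypothetical_section]\nSPLIT_CATALOG_ARTIFACTS=ON\n"
print(split_artifacts_setting(sample))
```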

Note that if a catalog_field_info.csv or catalog_relations_info.csv file already exists in the Project tree, it should be deleted manually.
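The idea of combining split files can be pictured with the sketch below, which reads every split field-info file in a folder and returns the rows as one list. This is only an illustration of the concept, not how Fabric builds the MTable. It assumes comma-delimited files with a header row, and the function name is hypothetical.

```python
import csv
from pathlib import Path


def combine_artifacts(folder, prefix="catalog_field_info___"):
    """Read all split artifact files in folder into one list of rows.

    Only files whose names start with the triple-underscore prefix are
    read, so an unsplit catalog_field_info.csv is ignored.
    """
    rows = []
    for path in sorted(Path(folder).glob(f"{prefix}*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            rows.extend(csv.DictReader(handle))
    return rows
```

Passing prefix="catalog_relations_info___" would collect the relations files in the same way.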

The names of the separate files follow this format, with 3 underscores before the data platform name:

  • catalog_field_info___<dataPlatform>_<schema>.csv
  • catalog_relations_info___<dataPlatform>_<schema>.csv
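The naming format can be expressed as a small helper, sketched below. The function names and the sample data platform and schema values are hypothetical. The second helper returns only the part after the triple underscore, because a data platform or schema name that itself contains an underscore cannot be split apart unambiguously from the file name alone.

```python
def split_artifact_name(kind, data_platform, schema):
    """Build a split artifact file name; kind is 'field' or 'relations'."""
    if kind not in ("field", "relations"):
        raise ValueError("kind must be 'field' or 'relations'")
    # Three underscores separate the fixed prefix from the data platform name.
    return f"catalog_{kind}_info___{data_platform}_{schema}.csv"


def platform_and_schema_part(file_name):
    """Return the '<dataPlatform>_<schema>' part of a split artifact name."""
    prefix, separator, rest = file_name.partition("___")
    if not separator or not rest.endswith(".csv"):
        raise ValueError(f"not a split artifact name: {file_name}")
    return rest[: -len(".csv")]


# Hypothetical data platform and schema names, for illustration only.
print(split_artifact_name("field", "CRM_DB", "public"))
```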
