Catalog Vocabulary

The Fabric Catalog introduces a vocabulary that describes the Catalog entities and the relations between them. The relations indicate the connections between the data source entities and determine their hierarchy.

The vocabulary below serves as a model for describing a Catalog and helps standardize processes across different interface types.

The data entities are represented by nodes and the referential links between the nodes are represented by relations. Nodes and relations have predefined properties that enrich the Catalog schema.

Additionally, because data sources differ, some node properties are generic, while others are relevant only to specific interface types.

Node Types

Node Type | Description
dataPlatform | Represents a Fabric interface in the Catalog data model
schema | Represents an interface schema
dataset | Represents a dataset (e.g. table) of an interface schema
class

Represents one of the following:

  • A dataset (e.g. table) - when the class name is identical to a dataset name
  • A complex structure - when the class name is different from a dataset name (available starting from V8.0)
field

Represents a dataset field. The field data type can be:

  • Primitive - string, integer, blob, date, number, boolean or any.
  • Collection - an array of primitive values.

Relation Types

Relation Type | Description
contains

The relation between objects on two adjacent hierarchy levels:

  • dataPlatform contains schema
  • schema contains dataset
  • dataset contains class
  • class contains field

Example: CRM_DB contains public

The direction is One To Many.
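
The containment hierarchy above can be sketched as a small tree model. This is only an illustration of the vocabulary, not the Catalog's actual storage or API: the `Node` class, the `add_child` method and the `CONTAINS_HIERARCHY` table are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import List

# Parent node type -> the child node type it may contain,
# following the four "contains" pairs listed above.
CONTAINS_HIERARCHY = {
    "dataPlatform": "schema",
    "schema": "dataset",
    "dataset": "class",
    "class": "field",
}


@dataclass
class Node:
    """Hypothetical in-memory node: a type, a name and its contained children."""
    node_type: str
    name: str
    children: List["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        """Record a 'contains' relation, rejecting pairs outside the hierarchy."""
        expected = CONTAINS_HIERARCHY.get(self.node_type)
        if child.node_type != expected:
            raise ValueError(
                f"{self.node_type} cannot contain {child.node_type}"
            )
        self.children.append(child)  # one parent, many children
        return child


# The example from the text: CRM_DB contains public.
crm_db = Node("dataPlatform", "CRM_DB")
public = crm_db.add_child(Node("schema", "public"))
customer = public.add_child(Node("dataset", "CUSTOMER"))
```

Keeping the allowed pairs in a single lookup table makes the One To Many direction explicit: each parent holds a list of children, and a child of the wrong type is rejected when it is added.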

refersTo

The relation between a parent (dataset1) and a child (dataset2). The direction is Many To One (from child to parent).

  • dataset2 refersTo dataset1 (foreign keys)

Example: INVOICE refersTo CUSTOMER (customer_id)

The relation key columns are included in the relation's properties.
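
As a minimal sketch, a refersTo relation can be pictured as a record that points from the child dataset to the parent dataset and carries the key columns as a property. The `RefersTo` class and the `key_columns` property name are assumptions made for illustration; the text above does not specify how the Catalog names this property.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RefersTo:
    """Hypothetical refersTo relation: many child rows point to one parent row."""
    child_dataset: str             # the "many" side, e.g. INVOICE
    parent_dataset: str            # the "one" side, e.g. CUSTOMER
    key_columns: Tuple[str, ...]   # assumed property name for the foreign key columns

    def describe(self) -> str:
        keys = ", ".join(self.key_columns)
        return f"{self.child_dataset} refersTo {self.parent_dataset} ({keys})"


relation = RefersTo("INVOICE", "CUSTOMER", ("customer_id",))
print(relation.describe())  # INVOICE refersTo CUSTOMER (customer_id)
```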

definedBy

The relation between a field and its respective Class node (available starting from V8.0):

  • field definedBy Class

Example: ACTIVITY_JSON definedBy Activity_jsonClass

Node Properties

Each Catalog node has properties that provide additional information about the node. A dataset field can have a variety of properties: some are created by the Crawler, and others are created by plugins during the Discovery job.

The Defined By property is mandatory for every Catalog field. It specifies the field's Catalog Type and can have one of the following values:

Field Type | Property Definition | Description
Primitive

One of the following:

  • STRING
  • INTEGER
  • REAL
  • DATE
  • TIME
  • DATETIME
  • BYTES
  • BOOLEAN
  • UNKNOWN

These values of the definedBy property standardize the various primitive data types across different platforms and data sources.

For example, a string value can be defined as VARCHAR, CHAR, string, or another type in different data sources. In Catalog, all of them are interpreted as STRING.
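
The normalization described above can be sketched as a simple lookup. Only the VARCHAR, CHAR and string entries come from the text; the other entries, the `SOURCE_TO_CATALOG` table and the `to_catalog_type` function are illustrative assumptions, not the Crawler's actual mapping.

```python
# Source-specific type name (upper-cased) -> Catalog primitive type.
# The first three entries follow the example in the text; the rest are
# assumed here purely for illustration.
SOURCE_TO_CATALOG = {
    "VARCHAR": "STRING",
    "CHAR": "STRING",
    "STRING": "STRING",
    "INT": "INTEGER",      # assumption
    "INTEGER": "INTEGER",  # assumption
    "BOOL": "BOOLEAN",     # assumption
    "BOOLEAN": "BOOLEAN",  # assumption
}


def to_catalog_type(source_type: str) -> str:
    """Map a source type name to a Catalog primitive type, ignoring case
    and any length or precision suffix such as "(255)"."""
    base = source_type.strip().upper().split("(")[0].strip()
    return SOURCE_TO_CATALOG.get(base, "UNKNOWN")


print(to_catalog_type("VARCHAR(255)"))  # STRING
print(to_catalog_type("string"))        # STRING
print(to_catalog_type("geometry"))      # UNKNOWN
```

Falling back to UNKNOWN for unmapped names mirrors the presence of UNKNOWN in the list of primitive values, although the exact fallback behavior of the Catalog is not stated in the text.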

Object <name>

When a field includes a complex structure (e.g. XML), it is defined in Catalog by a class. 

Array

One of the following:

  • Collection(<primitive type>)
  • Collection(<name>)

When a field includes an array, the values of the array can be either primitives or objects.

Arrays of arrays are supported as well.
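
To make the three field types concrete, the sketch below unwraps a definedBy value into its nesting depth and element type. It assumes the value is written as a plain string such as `Collection(STRING)` with no space before the parenthesis, and that nested arrays are written as `Collection(Collection(...))`; both are assumptions, as are the `parse_defined_by` function and the reduced `PRIMITIVES` set.

```python
from typing import Tuple

# A subset of the primitive values listed above, enough for this illustration.
PRIMITIVES = {"STRING", "INTEGER", "REAL", "BOOLEAN"}

PREFIX = "Collection("


def parse_defined_by(value: str) -> Tuple[int, str, str]:
    """Return (array_depth, element_kind, element_name) for a definedBy value.

    array_depth is 0 for a non-array field, 1 for an array, 2 for an
    array of arrays, and so on. element_kind is "primitive" or "object".
    """
    depth = 0
    value = value.strip()
    # Peel off one Collection(...) wrapper per nesting level.
    while value.startswith(PREFIX) and value.endswith(")"):
        value = value[len(PREFIX):-1].strip()
        depth += 1
    kind = "primitive" if value in PRIMITIVES else "object"
    return depth, kind, value


print(parse_defined_by("STRING"))
# (0, 'primitive', 'STRING')
print(parse_defined_by("Collection(INTEGER)"))
# (1, 'primitive', 'INTEGER')
print(parse_defined_by("Collection(Collection(Activity_jsonClass))"))
# (2, 'object', 'Activity_jsonClass')
```

Any name that is not a known primitive is treated as a class name here, which matches the Object row above where the property value is the name of the class that defines the field.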
