Attribute Lineage

What is Attribute Lineage?

Attribute Lineage is the process of tracking the propagation of attributes from one dataset to another. This process is essential for maintaining the semantic meaning of data across different datasets.

When a dataset column is mapped to a Rosetta Stone attribute, that attribute is propagated to any dataset created as a derivative of the original dataset. This propagation ensures that the semantic meaning of the data is preserved as it moves through the data pipeline.

Why is Attribute Lineage Important?

Attribute Lineage is important because it allows you to maintain the semantic meaning of data as it moves through the data pipeline. This is essential for ensuring that the data remains accurate and meaningful as it is used for analysis and decision-making.

Rosetta Stone attributes are the core of the normalization engine of Narrative's Data Collaboration Platform. Keeping them consistent and populated across as much data as possible enables the seamless flow of data between different datasets and organizations.

How Does Attribute Lineage Work?

Attribute Lineage tracks attributes as they propagate from one dataset to another. When a dataset column is mapped to a Rosetta Stone attribute, that attribute is carried into any dataset created as a derivative of the original dataset, so the semantic meaning of the data is preserved as it moves through the data pipeline.
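The propagation described above can be pictured with a small Python model. This is only a conceptual sketch, not Narrative's implementation: the Dataset class, the derive function, and the internal_id column are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dataset:
    """Toy dataset (hypothetical): tracks which attribute each column maps to."""
    name: str
    # column name -> Rosetta Stone attribute name, or None if unmapped
    mappings: Dict[str, Optional[str]] = field(default_factory=dict)


def derive(source: Dataset, name: str, columns: List[str]) -> Dataset:
    """Build a derivative dataset that selects `columns` unchanged.

    Each selected column inherits the mapping its source column has
    at the moment the derivative is created.
    """
    return Dataset(name, {col: source.mappings.get(col) for col in columns})


# event_timestamp is mapped; internal_id (a made-up column) is not.
raw = Dataset(
    "company_table",
    {"event_timestamp": "event_timestamp", "internal_id": None},
)
first = derive(raw, "first_derivative", ["event_timestamp", "internal_id"])
second = derive(first, "second_derivative", ["event_timestamp"])
```

In this model the event_timestamp mapping survives two levels of derivation, while internal_id never carries an attribute because it had none to begin with.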

Example: Attribute Lineage in Action

We'll use a few simple NQL queries to illustrate the different ways that attributes can be propagated from one dataset to another.

From Global Rosetta Stone Queries

SELECT
  narrative.rosetta_stone.event_timestamp
FROM
  narrative.rosetta_stone

In this query, the event_timestamp attribute is propagated from the narrative.rosetta_stone dataset, which is itself mapped from a non-Rosetta Stone dataset, to the result of the query. This propagation ensures that the semantic meaning of the event_timestamp attribute is preserved as it moves through the data pipeline.

From Company Specific Rosetta Stone Queries

A company can also query its own data using Rosetta Stone attributes. The lineage works the same way as with global Rosetta Stone queries.

SELECT
  company_data._rosetta_stone.event_timestamp
FROM
  company_data._rosetta_stone

From Non-Rosetta Stone Queries

SELECT
  company_data.company_table.event_timestamp
FROM
  company_data.company_table

Even though this query does not use Rosetta Stone directly, the lineage will be preserved in the resulting dataset as long as the event_timestamp column in company_table is mapped to a Rosetta Stone attribute.

When Lineage is Not Preserved

There are a few scenarios where lineage is not preserved:

When a dataset is not mapped to a Rosetta Stone attribute. In this case, there is no lineage to preserve.
When a dataset is mapped to a Rosetta Stone attribute but the attribute is not used in the query. In this case, the lineage is not preserved in the result of the query.
When a field is selected that could have preserved the lineage, but the field is transformed in a way that breaks the lineage. For example, if a date field is truncated, the lineage is broken because the resulting output no longer has the same semantic meaning as the original field.
When an attribute mapping does not exist at the time the derivative dataset is created and the mapping is added later. In this case, the lineage will not be preserved in the derivative dataset unless the dataset is recreated.
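The four scenarios above can be sketched as a small Python model. It is a rough illustration under stated assumptions, not the platform's actual logic: SelectedColumn, result_lineage, the notes and country columns, and the country_code attribute name are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class SelectedColumn(NamedTuple):
    """One entry in a query's select list (hypothetical structure)."""
    output_name: str
    source_column: str
    transformed: bool = False  # True for e.g. a truncated date


def result_lineage(
    source_mappings: Dict[str, Optional[str]],
    select_list: List[SelectedColumn],
) -> Dict[str, Optional[str]]:
    """Snapshot which attribute, if any, each output column carries."""
    lineage: Dict[str, Optional[str]] = {}
    for col in select_list:
        if col.transformed:
            # A transformation changes the semantic meaning, so lineage breaks.
            lineage[col.output_name] = None
        else:
            lineage[col.output_name] = source_mappings.get(col.source_column)
    return lineage


source = {"event_timestamp": "event_timestamp", "notes": None, "country": None}

derived = result_lineage(
    source,
    [
        SelectedColumn("notes", "notes"),                      # unmapped column
        SelectedColumn("event_day", "event_timestamp", True),  # transformed field
        SelectedColumn("country", "country"),                  # not mapped yet
    ],
)
# event_timestamp is mapped but never selected as-is, so it is absent from `derived`.

# A mapping added later does not reach `derived`; only a recreated dataset sees it.
source["country"] = "country_code"  # hypothetical attribute name
recreated = result_lineage(source, [SelectedColumn("country", "country")])
```

Because result_lineage takes a snapshot when it is called, the existing derived result keeps no mapping for country, and only the recreated result picks up the mapping that was added afterwards.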

Conclusion

Attribute Lineage is an essential part of maintaining the semantic meaning of data as it moves through the data pipeline. By tracking the propagation of attributes from one dataset to another, you can ensure that the data remains accurate and meaningful as it is used for analysis and decision-making. Rosetta Stone attributes are the core of the normalization engine of Narrative's Data Collaboration Platform, and keeping them consistent and populated across as much data as possible enables the seamless flow of data between different datasets and organizations.