Incremental View Maintenance

Overview

NQL Incremental View Maintenance is designed to streamline your data processing by allowing you to update materialized views with only the new or modified data since the last query run. This feature eliminates the need to rescan entire datasets, reducing query runtime and data processing costs. By using incremental view maintenance, your queries become more efficient, especially when working with large datasets, helping you deliver fresh, relevant data faster.

Key Benefits

  1. Efficiency: Incremental queries ensure that only new or updated data is processed, avoiding the need for a full dataset scan. This saves time and resources, making your queries faster and more cost-effective.
  2. Scalability: As your datasets grow, incremental maintenance allows you to scale your queries without a significant increase in processing costs or time.
  3. Flexibility Across Data Platforms: Whether your data lives in Iceberg, Snowflake, or another data plane, NQL Incremental View Maintenance works the same way, offering flexibility in how your data is processed.
  4. Optimized for Regular Data Refreshes: Perfect for use cases where data needs to be refreshed regularly (daily, weekly, monthly), such as marketing, reporting, or any process requiring frequent data updates.

How to Use NQL Incremental View Maintenance

Creating an Incremental Materialized View

To start using Incremental View Maintenance, you need to create a materialized view that will refresh based only on the new data since the last update. Here's a step-by-step guide:

Step 1: Create a Materialized View

Create a materialized view by specifying your dataset and refresh schedule. The key difference is that instead of a full overwrite, only the incremental changes will be applied. Example:

CREATE MATERIALIZED VIEW "user_activity_summary"
  WRITE_MODE = 'append'
  REFRESH_SCHEDULE = '@daily'
AS
SELECT
  user_id,
  last_login,
  country
FROM DELTA(table "company_data".user_activity)
WHERE last_login > CURRENT_TIMESTAMP - INTERVAL '1' DAY;
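
For contrast, here is a minimal sketch of the same view without the DELTA wrapper (the view name user_activity_summary_full is hypothetical). Referencing the table directly means it is not read incrementally, so every refresh would rescan the full user_activity dataset rather than only the rows added since the last run; see the Delta syntax notes below.

CREATE MATERIALIZED VIEW "user_activity_summary_full"
  WRITE_MODE = 'append'
  REFRESH_SCHEDULE = '@daily'
AS
SELECT
  user_id,
  last_login,
  country
FROM "company_data".user_activity -- plain reference: full scan on each refresh
WHERE last_login > CURRENT_TIMESTAMP - INTERVAL '1' DAY;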

Deep Dive Into Delta Syntax

  • Reading a table through the DELTA function in a query denotes:
    • A full dataset read during the first query execution.
    • Subsequent incremental reads, starting from the event timestamp of the most recent materialized view update.
  • The DELTA function consumes a single table as an argument. Example:
    SELECT _rosetta_stone.nio_last_modified_at
    FROM DELTA(table company_data.<YOUR_DATASET>);
    
  • The schema of the delta matches the schema of the dataset it is used on. This means the DELTA function can be applied to any table, including Rosetta Stone attribute tables. Example:
    SELECT unique_id
    FROM DELTA(table narrative."rosetta_stone");
    
  • DELTA tables can be used like regular tables. You can have multiple DELTA tables in one query or use them in joins. Example:
    SELECT rs.unique_id
    FROM DELTA(table narrative."rosetta_stone") rs
    JOIN DELTA(table company_data.<YOUR_DATASET>) ds
      ON ds._rosetta_stone.unique_id = rs.unique_id;
    

Conceptualizing Refreshes

  • Every time a view is refreshed, we recompile the query and apply filters to set boundaries for the nio_last_modified_at attribute.

A run for the query SELECT unique_id FROM DELTA(table narrative."rosetta_stone") will look like:

SELECT 
  unique_id
FROM 
  narrative."rosetta_stone"
WHERE 
  nio_last_modified_at >= <TIMESTAMP_OF_PREVIOUS_RUN> AND
  nio_last_modified_at < <NOW>;

Step 2: Insert New Data

As your dataset grows and new data is added, the materialized view will only scan and incorporate the new data from the user_activity table.

Step 3: Refresh the Materialized View

Based on the schedule you set (in this case, daily), the materialized view will automatically refresh to include the new data. Each time the view refreshes, it will process only the data that has been added or updated since the last refresh, ensuring that your data is always up to date without the need for a full dataset scan after the first run.
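
As a rough sketch, a daily refresh of the user_activity_summary view above might compile to something like the following, following the boundary-filter pattern from the Conceptualizing Refreshes section; the exact compiled form is managed by the platform, and the placeholders stand for the run boundaries. Note that on company datasets the modification timestamp lives under the _rosetta_stone attribute, as shown in the Delta syntax examples:

SELECT
  user_id,
  last_login,
  country
FROM
  "company_data".user_activity
WHERE
  last_login > CURRENT_TIMESTAMP - INTERVAL '1' DAY AND
  _rosetta_stone.nio_last_modified_at >= <TIMESTAMP_OF_PREVIOUS_RUN> AND
  _rosetta_stone.nio_last_modified_at < <NOW>;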

Example Queries

Example 1: Incremental Append of New Customer Sign-Ups

CREATE MATERIALIZED VIEW "incremental_new_signups"
  WRITE_MODE = 'append'
  REFRESH_SCHEDULE = '@daily'
AS
SELECT 
  customer_id, 
  signup_date, 
  region, 
  email
FROM DELTA(table "company_data".user_activity)
WHERE signup_date > CURRENT_TIMESTAMP - INTERVAL '1' DAY;

This materialized view adds new customer sign-ups to the existing data using the append write mode. Each day, the view will append the new sign-ups from the last 24 hours to the current list of customers, allowing you to build up a historical log of sign-ups over time. This is useful when you want to accumulate new records without replacing previously captured data.
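
Once several daily refreshes have accumulated, you can report over the full log. A minimal sketch, assuming the materialized view can be referenced like any other table in a query:

SELECT
  region,
  COUNT(*) AS total_signups
FROM "incremental_new_signups"
GROUP BY region;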

Example 2: Incremental Update of Inventory Restocks and Transfers

CREATE MATERIALIZED VIEW "incremental_inventory_activity"
  WRITE_MODE = 'append'
  REFRESH_SCHEDULE = '@daily'
AS
SELECT
  product_id,
  warehouse_id,
  stock_level,
  last_restocked,
  CASE
    WHEN transaction_type = 'restock' THEN 'Restocked'
    WHEN transaction_type = 'transfer' THEN 'Transferred'
    ELSE 'Unknown'
  END AS activity_type,
  transaction_timestamp,
  transferred_to_warehouse
FROM DELTA(table "company_data".inventory_transactions) 
WHERE transaction_timestamp > CURRENT_TIMESTAMP - INTERVAL '1' DAY;

This query captures daily inventory activity, including both restocking and transfers, and appends the new records without overwriting previous data. The complexity comes from handling multiple transaction types (restock and transfer), ensuring that inventory movements between warehouses are logged alongside restocking events.
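
If you only want restock and transfer events in the view and prefer to drop anything that would land in the 'Unknown' bucket, a hedged variant (with the hypothetical name incremental_inventory_activity_filtered) adds a filter on transaction_type:

CREATE MATERIALIZED VIEW "incremental_inventory_activity_filtered"
  WRITE_MODE = 'append'
  REFRESH_SCHEDULE = '@daily'
AS
SELECT
  product_id,
  warehouse_id,
  stock_level,
  transaction_type,
  transaction_timestamp,
  transferred_to_warehouse
FROM DELTA(table "company_data".inventory_transactions)
WHERE transaction_timestamp > CURRENT_TIMESTAMP - INTERVAL '1' DAY
  AND (transaction_type = 'restock' OR transaction_type = 'transfer');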

Example 3: Tracking Daily Inventory Restocks and Summarizing Product Movement

CREATE MATERIALIZED VIEW "inventory_daily_summary"
  WRITE_MODE = 'append'
  REFRESH_SCHEDULE = '@daily'
AS
SELECT
  product_id,
  COUNT(*) AS activity_count,
  SUM(CASE
      WHEN transaction_type = 'restock' THEN stock_change
      ELSE 0
  END) AS total_restocked_units,
  SUM(CASE
      WHEN transaction_type = 'transfer' THEN stock_change
      ELSE 0
  END) AS total_transferred_units,
  MAX(transaction_timestamp) AS last_activity
FROM DELTA(table "company_data".inventory_transactions_overwrite)
WHERE transaction_timestamp > CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY product_id;

In this case, the source dataset (inventory_transactions_overwrite) is in overwrite mode, meaning it replaces all data during each refresh. The materialized view, however, is in append mode and tracks daily inventory activities, capturing restocking and transfer events.

Best Practices

  • Align Your Refresh Schedule: To ensure optimal results, align your materialized view refresh schedule with how frequently your data is updated.
  • Use Incremental Processing for Large Datasets: Incremental processing is most beneficial when working with large datasets that are updated regularly. It prevents the need for full scans and ensures quick, efficient data processing.
  • Hints for Incremental or Full Scans: If some tables should always be fully scanned (such as reference data), reference them directly in your query instead of wrapping them in DELTA; only tables read through the DELTA function are processed incrementally. See the sketch below.
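
A minimal sketch of mixing the two read modes in one query, following the join pattern from the Delta syntax section (the product_reference table and product_name column are hypothetical reference data): the inventory transactions are read incrementally through DELTA, while the reference table is read in full on every refresh because it is referenced directly.

SELECT
  ds.product_id,
  ref.product_name,
  ds.transaction_timestamp
FROM DELTA(table "company_data".inventory_transactions) ds -- incremental read
JOIN "company_data".product_reference ref                  -- full scan
  ON ref.product_id = ds.product_id;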

Conclusion

NQL Incremental View Maintenance provides a powerful way to manage and refresh materialized views efficiently, especially in scenarios involving large, frequently updated datasets. By leveraging incremental processing, you can reduce query times and costs, ensuring faster access to up-to-date data without unnecessary overhead.
