Incremental View Maintenance
Overview
NQL Incremental View Maintenance is designed to streamline your data processing by allowing you to update materialized views with only the new or modified data since the last query run. This feature eliminates the need to rescan entire datasets, reducing query runtime and data processing costs. By using incremental view maintenance, your queries become more efficient, especially when working with large datasets, helping you deliver fresh, relevant data faster.
Key Benefits
- Efficiency: Incremental queries ensure that only new or updated data is processed, avoiding the need for a full dataset scan. This saves time and resources, making your queries faster and more cost-effective.
- Scalability: As your datasets grow, incremental maintenance allows you to scale your queries without a significant increase in processing costs or time.
- Flexibility Across Data Platforms: Whether you're using Iceberg, Snowflake, or another data plane, NQL Incremental View Maintenance supports a range of data platforms, offering flexibility in how your data is processed.
- Optimized for Regular Data Refreshes: Perfect for use cases where data needs to be refreshed regularly (daily, weekly, monthly), such as marketing, reporting, or any process requiring frequent data updates.
How to Use NQL Incremental View Maintenance
Creating an Incremental Materialized View
To start using Incremental View Maintenance, you need to create a materialized view that will refresh based only on the new data since the last update. Here's a step-by-step guide:
Step 1: Create a Materialized View
Create a materialized view by specifying your dataset and refresh schedule. The key difference is that instead of a full overwrite, only the incremental changes will be applied. Example:
CREATE MATERIALIZED VIEW "user_activity_summary"
WRITE_MODE = 'append'
REFRESH_SCHEDULE = '@daily'
AS
SELECT
user_id,
last_login,
country
FROM DELTA(table "company_data".user_activity)
WHERE last_login > CURRENT_TIMESTAMP - INTERVAL '1' DAY;
Deep Dive Into Delta Syntax
- Prefixing a namespace with delta in a query denotes:
  - A full dataset read during the first query execution.
  - Subsequent incremental reads from the event timestamp of the most recent materialized view update.
- The DELTA function consumes a single table as an argument. Example:
SELECT _rosetta_stone.nio_last_modified_at FROM DELTA(table company_data.<YOUR_DATASET>);
- The schema of the delta matches the schema of the dataset it is used on. This means the DELTA function can be applied to any table, including Rosetta Stone attribute tables. Example:
SELECT unique_id FROM DELTA(table narrative."rosetta_stone");
- DELTA tables can be used like regular tables. You can have multiple DELTA tables in one query or use them in joins. Example:
SELECT rs.unique_id
FROM DELTA(table narrative."rosetta_stone") rs
JOIN DELTA(table company_data.<YOUR_DATASET>) ds
ON ds._rosetta_stone.unique_id = rs.unique_id;
Conceptualizing Refreshes
- Every time a view is refreshed, we recompile the query and apply filters to set boundaries for the nio_last_modified_at attribute.
A run for the query SELECT unique_id FROM DELTA(table narrative."rosetta_stone") will look like:
SELECT
unique_id
FROM
narrative."rosetta_stone"
WHERE
nio_last_modified_at >= <TIMESTAMP_OF_PREVIOUS_RUN> AND
nio_last_modified_at < <NOW>;
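Note that the very first execution has no previous run timestamp, so, as described in the Deep Dive above, it reads the full dataset; only subsequent runs are bounded. A sketch of how the first two runs might compile, with placeholder timestamps:
-- Run 1: no previous run exists, so the full dataset is read
SELECT
unique_id
FROM
narrative."rosetta_stone";

-- Run 2: bounded by the timestamp recorded for run 1
SELECT
unique_id
FROM
narrative."rosetta_stone"
WHERE
nio_last_modified_at >= <TIMESTAMP_OF_RUN_1> AND
nio_last_modified_at < <NOW>;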
Step 2: Insert New Data
As your dataset grows and new data is added, the materialized view will only scan and incorporate the new data from the user_activity table.
Step 3: Refresh the Materialized View
Based on the schedule you set (in this case, daily), the materialized view will automatically refresh to include the new data. Each time the view refreshes, it will process only the data that has been added or updated since the last refresh, ensuring that your data is always up to date without the need for a full dataset scan after the first run.
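Applying the refresh pattern described above to the Step 1 example, each daily run of user_activity_summary compiles to roughly the following, with the platform filling in the timestamp placeholders (a sketch, not the exact compiled output):
SELECT
user_id,
last_login,
country
FROM "company_data".user_activity
WHERE last_login > CURRENT_TIMESTAMP - INTERVAL '1' DAY
AND nio_last_modified_at >= <TIMESTAMP_OF_PREVIOUS_RUN>
AND nio_last_modified_at < <NOW>;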
Example Queries
Example 1: Incremental Update of Daily New Sign-Ups
CREATE MATERIALIZED VIEW "incremental_new_signups"
WRITE_MODE = 'append'
REFRESH_SCHEDULE = '@daily'
AS
SELECT
customer_id,
signup_date,
region,
email
FROM DELTA(table "company_data".user_activity)
WHERE signup_date > CURRENT_TIMESTAMP - INTERVAL '1' DAY;
This materialized view adds new customer sign-ups to the existing data using the append write mode. Each day, the view will append the new sign-ups from the last 24 hours to the current list of customers, allowing you to build up a historical log of sign-ups over time. This is useful when you want to accumulate new records without replacing previously captured data.
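Once a few daily refreshes have run, the accumulated log can be queried like any other table. A minimal sketch, assuming the materialized view is queryable by name:
SELECT
region,
COUNT(*) AS total_signups
FROM "incremental_new_signups"
GROUP BY region;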
Example 2: Incremental Update of Inventory Restocks and Transfers
CREATE MATERIALIZED VIEW "incremental_inventory_activity"
WRITE_MODE = 'append'
REFRESH_SCHEDULE = '@daily'
AS
SELECT
product_id,
warehouse_id,
stock_level,
last_restocked,
CASE
WHEN transaction_type = 'restock' THEN 'Restocked'
WHEN transaction_type = 'transfer' THEN 'Transferred'
ELSE 'Unknown'
END AS activity_type,
transaction_timestamp,
transferred_to_warehouse
FROM DELTA(table "company_data".inventory_transactions)
WHERE transaction_timestamp > CURRENT_TIMESTAMP - INTERVAL '1' DAY;
This query captures daily inventory activity, including both restocking and transfers, and appends the new records without overwriting previous data. The complexity comes from handling multiple transaction types (restock and transfer), ensuring that inventory movements between warehouses are logged alongside restocking events.
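To see how activity breaks down by type, the appended rows can be summarized per warehouse; a sketch, again assuming the view can be queried like a regular table:
SELECT
warehouse_id,
activity_type,
COUNT(*) AS events
FROM "incremental_inventory_activity"
GROUP BY warehouse_id, activity_type;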
Example 3: Tracking Daily Inventory Restocks and Summarizing Product Movement
CREATE MATERIALIZED VIEW "inventory_daily_summary"
WRITE_MODE = 'append'
REFRESH_SCHEDULE = '@daily'
AS
SELECT
product_id,
COUNT(*) AS activity_count,
SUM(CASE
WHEN transaction_type = 'restock' THEN stock_change
ELSE 0
END) AS total_restocked_units,
SUM(CASE
WHEN transaction_type = 'transfer' THEN stock_change
ELSE 0
END) AS total_transferred_units,
MAX(transaction_timestamp) AS last_activity
FROM DELTA(table "company_data".inventory_transactions_overwrite)
WHERE transaction_timestamp > CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY product_id;
In this case, the source dataset (inventory_transactions_overwrite) is in overwrite mode, meaning it replaces all data during each refresh. The materialized view, however, is in append mode and tracks daily inventory activities, capturing restocking and transfer events.
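Because each refresh appends one summary row per product per day, running totals can be recovered by aggregating over the view itself. A sketch, assuming the view is queryable by name:
SELECT
product_id,
SUM(total_restocked_units) AS restocked_to_date,
SUM(total_transferred_units) AS transferred_to_date,
MAX(last_activity) AS most_recent_activity
FROM "inventory_daily_summary"
GROUP BY product_id;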
Best Practices
- Align Your Refresh Schedule: To ensure optimal results, align your materialized view refresh schedule with how frequently your data is updated.
- Use Incremental Processing for Large Datasets: Incremental processing is most beneficial when working with large datasets that are updated regularly. It prevents the need for full scans and ensures quick, efficient data processing.
- Hints for Incremental or Full Scans: If you have some tables that should always be fully scanned (such as reference data), reference them directly instead of wrapping them in the DELTA function; only tables wrapped in DELTA are read incrementally. See the sketch below.
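For example, a view that reads new inventory transactions incrementally while always scanning a reference table in full might look like the following sketch (the "product_reference" table, its product_name column, and the view name are hypothetical, used only for illustration):
CREATE MATERIALIZED VIEW "inventory_with_product_names"
WRITE_MODE = 'append'
REFRESH_SCHEDULE = '@daily'
AS
SELECT
tx.product_id,
ref.product_name,
tx.transaction_timestamp
FROM DELTA(table "company_data".inventory_transactions) tx
JOIN "company_data".product_reference ref
ON ref.product_id = tx.product_id;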
Conclusion
NQL Incremental View Maintenance provides a powerful way to manage and refresh materialized views efficiently, especially in scenarios involving large, frequently updated datasets. By leveraging incremental processing, you can reduce query times and costs, ensuring faster access to up-to-date data without unnecessary overhead.