
Change Data Capture using Snowflake Dynamic Tables


Change data capture (CDC) refers to the process of identifying and capturing changes made to data in a database and then delivering those changes in near real-time to a downstream process or system. Dynamic tables in Snowflake can implement CDC, leveraging their flexibility and near real-time capabilities to capture and process data changes seamlessly.

Snowflake Dynamic Tables

When we talk about dynamic tables in Snowflake, we mean tables whose contents are defined by a query: Snowflake materializes the query's results and keeps them up to date automatically, whether changes arrive programmatically or in response to runtime conditions. Because Snowflake supports dynamic SQL and handles schema changes and data transformations well, it is strongly tied to the idea of dynamic tables.

Dynamic tables in Snowflake provide a flexible way to manage schema evolution and adapt to changing data structures without extensive schema modifications. This flexibility is important for implementing CDC because it lets tables evolve with the needs of the business while still capturing and processing data efficiently.

Dynamic tables can join and aggregate across multiple source objects and incrementally update their results as the sources change. TARGET_LAG specifies how far the dynamic table's contents are allowed to lag behind the base tables that form its foundation. Snowflake not only optimizes queries automatically, reducing the need for manual tuning, but also keeps compute resources and storage separate, so compute can scale to process data without affecting the scalability of storage.

With the help of all these characteristics, businesses can adopt a “Zero ETL” strategy.
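
As a minimal sketch of what this looks like in practice, a single dynamic table can maintain a joined, aggregated result that stays at most five minutes behind its sources. The source table names below are hypothetical, and demo_wh is assumed to be an existing warehouse:

CREATE OR REPLACE DYNAMIC TABLE customer_order_summary
TARGET_LAG = '5 minutes'
WAREHOUSE = demo_wh
AS
SELECT c.c_custkey,
       c.c_name,
       COUNT(o.o_orderkey) AS order_count,   -- aggregated across each customer's orders
       SUM(o.o_totalprice) AS total_spend
FROM customers c
JOIN orders_src o ON o.o_custkey = c.c_custkey
GROUP BY c.c_custkey, c.c_name;

Snowflake keeps order_count and total_spend current incrementally as rows change in either source table.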

Key Features of Dynamic Tables for CDC

Schema Evolution:

Schema evolution is vital to preserving continuity and data integrity in CDC scenarios. In Snowflake dynamic tables, schema evolution makes it easy to upgrade and change schemas quickly, enabling agile development and data modeling. It also reduces error-prone manual schema updates, which helps make better use of resources.
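
As a hedged illustration, with hypothetical table and column names: when a new column appears in the source, the dynamic table can simply be redefined in place, and Snowflake rebuilds the materialized results and resumes refreshes without any manual migration of a target table:

-- A new column appears in the source table (names are hypothetical)
ALTER TABLE events_raw ADD COLUMN channel VARCHAR;

-- Redefine the dynamic table to pick up the new column
CREATE OR REPLACE DYNAMIC TABLE events
TARGET_LAG = '1 minute'
WAREHOUSE = demo_wh
AS SELECT * FROM events_raw;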

Real-time Data Capture:

Historically, streaming data has required keeping streaming and batch architectures separate, which means running two systems at once, adding operational overhead and more opportunities for failure. Stitching batch and streaming data together across separate pipelines also adds latency and complexity.

Dynamic tables let users build stream processing pipelines in ordinary SQL, without needing to know Spark, Flink, or other streaming systems. They also eliminate the extra logic usually required for incremental updates by applying incremental updates automatically to both batch and streaming data.
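
A minimal sketch, assuming a landing table that is continuously loaded (for example, by Snowpipe); the table and column names are hypothetical. The same plain SQL serves both batch and streaming arrivals, and Snowflake maintains the aggregation incrementally:

CREATE OR REPLACE DYNAMIC TABLE hourly_click_counts
TARGET_LAG = '1 minute'
WAREHOUSE = demo_wh
AS
SELECT user_id,
       DATE_TRUNC('hour', event_ts) AS event_hour,
       COUNT(*) AS events   -- maintained incrementally as new rows land
FROM clickstream_landing
GROUP BY user_id, DATE_TRUNC('hour', event_ts);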

Data Vault:

Data vault modeling is a hybrid approach that combines traditional relational data warehouse models with newer big data architectures to build a data warehouse for enterprise-scale analytics. Dynamic tables function as materialized views, adapting dynamically to the data they support, making them an ideal fit for the information mart layer in a Data Vault Architecture.
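
For example, an information mart can be expressed as a dynamic table over Data Vault hub and satellite tables. This is a sketch with hypothetical names, assuming a standard hub/satellite layout with a load_end_date column on the satellite:

CREATE OR REPLACE DYNAMIC TABLE customer_mart
TARGET_LAG = '10 minutes'
WAREHOUSE = demo_wh
AS
SELECT h.customer_hk,
       s.customer_name,
       s.customer_segment
FROM hub_customer h
JOIN sat_customer s
  ON s.customer_hk = h.customer_hk
WHERE s.load_end_date IS NULL;   -- only the current satellite record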

Implementing CDC using Dynamic Tables

A dynamic table lets us specify a query whose results are materialized. It tracks changes made to the data that the query references and incrementally updates the materialized results, offering a declarative approach to data transformation.

One key feature that differentiates dynamic tables from streams and tasks is that they eliminate the additional step of identifying and merging changes from the base table; that entire process is performed automatically within the dynamic table.

So, with this feature, there is no longer a need to write code to transform and update the data in a separate target table. Dynamic tables support both incremental changes and Slowly Changing Dimensions (SCD) with row versioning.
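
For contrast, here is a hedged sketch of the stream-and-task pattern that a dynamic table replaces; the stream, task, and target table names are hypothetical. A stream tracks the delta on the base table, and a scheduled task merges it into the target:

-- Stream captures row-level changes on the base table
CREATE OR REPLACE STREAM orders_stream ON TABLE orders_raw;

-- Scheduled task merges the captured changes into a separate target table
-- (the task must still be started with ALTER TASK ... RESUME)
CREATE OR REPLACE TASK merge_orders
  WAREHOUSE = demo_wh
  SCHEDULE = '1 minute'
AS
MERGE INTO orders_target t
USING orders_stream s
  ON t.o_orderkey = s.o_orderkey
WHEN MATCHED THEN UPDATE SET t.o_totalprice = s.o_totalprice
WHEN NOT MATCHED THEN INSERT (o_orderkey, o_totalprice)
  VALUES (s.o_orderkey, s.o_totalprice);

With a dynamic table, all of this collapses into a single CREATE DYNAMIC TABLE statement, as the steps below show.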

Implementation of incremental changes:

Step 1—Setup Warehouse, DB, Schema
USE ROLE accountadmin;

CREATE OR REPLACE WAREHOUSE demo_wh WAREHOUSE_SIZE = 'XSMALL';

USE WAREHOUSE demo_wh;

CREATE OR REPLACE DATABASE demo_db;

CREATE OR REPLACE SCHEMA demo_schema;

 

Step 2—Create raw table
CREATE OR REPLACE TABLE demo_db.demo_schema.orders_raw
AS SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS LIMIT 100;

 

Step 3—Create a dynamic table
CREATE OR REPLACE DYNAMIC TABLE demo_db.demo_schema.orders
TARGET_LAG = '1 minute'
WAREHOUSE = demo_wh
AS SELECT * FROM demo_db.demo_schema.orders_raw;

 

Step 4—Modify raw table
-- Inspect the current value in the dynamic table
SELECT * FROM demo_db.demo_schema.orders WHERE O_CUSTKEY = 106660;

-- Modify the raw (base) table; the dynamic table picks this change up automatically
UPDATE demo_db.demo_schema.orders_raw
SET O_TOTALPRICE = 135445.43
WHERE O_CUSTKEY = 106660;

 

Step 5—Check the CDC data reflection in the dynamic table
SELECT * FROM demo_db.demo_schema.orders WHERE O_CUSTKEY = 106660;
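
Because TARGET_LAG is set to '1 minute', the updated price can take up to a minute to appear in the dynamic table. Rather than waiting for the lag window, a refresh can also be triggered manually:

ALTER DYNAMIC TABLE demo_db.demo_schema.orders REFRESH;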

 

 

Implementation of SCD2 with row versioning:

Step 1—Create raw table
CREATE OR REPLACE TABLE demo_db.demo_schema.inventory_raw
AS
SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.INVENTORY LIMIT 100;

 

Step 2—Create a stage table
CREATE OR REPLACE TABLE demo_db.demo_schema.inventory_stg
AS
SELECT * FROM demo_db.demo_schema.inventory_raw;

 

Step 3—Create a dynamic table
CREATE OR REPLACE DYNAMIC TABLE demo_db.demo_schema.inventory_main
TARGET_LAG = '1 minute'
WAREHOUSE = demo_wh
AS
SELECT
    INV_DATE_SK, INV_ITEM_SK, INV_WAREHOUSE_SK, INV_QUANTITY_ON_HAND,
    -- Rank each item's rows from newest to oldest inventory date
    ROW_NUMBER() OVER (PARTITION BY INV_ITEM_SK ORDER BY INV_DATE_SK DESC) rnm,
    -- The newest row version is the active record
    CASE WHEN rnm = 1 THEN 'Y' ELSE 'N' END ACTION_CD,
    -- Older row versions are closed out with an end date
    CASE WHEN rnm = 1 THEN NULL ELSE CURRENT_DATE() END LOAD_END_DATE
FROM demo_db.demo_schema.inventory_stg;

 

Step 4—Modify Stage Table
-- Inspect the existing row(s) for one item
SELECT * FROM demo_db.demo_schema.inventory_stg
WHERE INV_ITEM_SK = 386545;

-- Insert a new version of the row for that item
INSERT INTO demo_db.demo_schema.inventory_stg
VALUES (2451060, 386545, 2, 999);

SELECT * FROM demo_db.demo_schema.inventory_stg
WHERE INV_ITEM_SK = 386545;

Step 5—Refresh Dynamic Table
-- Trigger an immediate refresh instead of waiting for the target lag
ALTER DYNAMIC TABLE demo_db.demo_schema.inventory_main REFRESH;

 

Step 6—Check SCD reflection in the dynamic table
SELECT * FROM demo_db.demo_schema.inventory_main
WHERE INV_ITEM_SK = 386545;
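
Assuming the inserted INV_DATE_SK (2451060) is the most recent date for item 386545, the dynamic table should now show two row versions: the newly inserted row with ACTION_CD = 'Y' as the current record, and the earlier row flagged ACTION_CD = 'N' with its LOAD_END_DATE populated.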

 

Managing Dynamic Table Refreshes

Incremental refresh: 

When possible, the automated refresh process performs an incremental refresh: it analyzes the query for the dynamic table, computes the changes to the query results since the dynamic table was last refreshed, and merges those changes into the dynamic table.

Full refresh:

If the automated process is unable to determine how to perform an incremental refresh, it performs a full refresh: it executes the query for the dynamic table and materializes the results, completely replacing the dynamic table's current materialized results.
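
To check which refresh behavior Snowflake has been applying to a dynamic table, the refresh history can be inspected. A sketch using the INFORMATION_SCHEMA table function DYNAMIC_TABLE_REFRESH_HISTORY; the exact columns returned may vary by Snowflake version:

-- Recent refreshes, newest first; the refresh action column indicates
-- whether each refresh was incremental or full
SELECT *
FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY())
ORDER BY refresh_start_time DESC;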

Conclusion

Dynamic tables solve a wide range of use cases and improve operational efficiency, availability, and CDC management. The goal of this blog was to provide an overview of dynamic tables along with several use case examples.

Vishali Sakthivel
