Implementing Version-Controlled Medallion Architecture in Databricks Using Liquibase

By Narasimman Rajendran | Technical Blog | March 31, 2026

Introduction

Managing and evolving data lake architectures at scale can be challenging—especially when you require traceability and rollback capabilities. In this blog, I’ll walk you through how we used Liquibase to implement version-controlled DDL management for a Medallion Architecture in Databricks.

Medallion Architecture (Bronze → Silver → Gold):

A medallion architecture is a data design pattern used to organize data logically in a lakehouse. It aims to improve data structure and quality incrementally as data moves through each layer: Bronze, Silver, and Gold tables. This approach is sometimes called a “multi-hop” architecture.

What is Liquibase?

Liquibase is an open-source tool for managing database schema changes. It helps teams track, version-control, and deploy changes like table creation, column addition, and constraints. It works in a structured and automated way, similar to how Git tracks application code.

Think of it as Git for database DDLs (Data Definition Language scripts).

Liquibase in a Databricks/Delta Lake World

How It Fits in Databricks

Databricks is a data engineering and analytics platform. Delta Lake is its storage layer, adding ACID transactions, schema enforcement, and time travel to data lakes.

Databricks has strong data capabilities, and Delta Lake keeps a per-table transaction history, but the platform does not version-control the DDL scripts themselves or track which scripts have been applied to which environment.

That’s where Liquibase becomes valuable.

Why Use Liquibase in Databricks

Liquibase + Delta Lake = GitOps for Data Schemas

Why Version Control for DDLs Matters:

Flow Diagram:

Liquibase → Databricks (via JDBC) → Bronze/Silver/Gold Schemas → Common Log Schema

Prerequisites:

Technical Setup:

You will need the following tools installed and configured on your system:

Java (Liquibase runs on Java)
Liquibase CLI (download and install from the Liquibase website, or use Homebrew)
Databricks JDBC driver

Access Setup:

Azure Databricks
Azure DevOps

Step-by-Step Implementation:

This section walks you through setting up and using Liquibase to manage version-controlled DDLs in Databricks with Delta Lake, step by step, covering:

Project Setup
Liquibase Configuration
Writing Changelogs
Execution

a. Project Setup:

Recommended Folder Structure

Common Schema for Logs

Create a centralized schema to store Liquibase logs:

CREATE SCHEMA IF NOT EXISTS <catalog>.<schema>;

Liquibase will automatically create the DATABASECHANGELOG and DATABASECHANGELOGLOCK tables inside this schema.

Note:
DATABASECHANGELOGLOCK:

This table ensures that only one instance of Liquibase applies changes at a time. It prevents race conditions and conflicts when multiple users or automation pipelines run Liquibase concurrently.

DATABASECHANGELOG:

This is the main changelog history table. It records each changeset executed against the database.

Flow:

Liquibase starts → checks DATABASECHANGELOGLOCK → acquires lock.
Reads changelogs and compares with DATABASECHANGELOG.
Applies new changesets → logs each one in DATABASECHANGELOG.
Releases lock in DATABASECHANGELOGLOCK.
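
The flow above can be sketched as a small in-memory model. This only illustrates the bookkeeping and is not Liquibase's implementation: the `ChangeLogStore` class, its fields, and the changeset IDs are hypothetical, and a boolean and a list stand in for the DATABASECHANGELOGLOCK and DATABASECHANGELOG tables.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeLogStore:
    """Hypothetical in-memory stand-in for the two Liquibase tracking tables."""

    locked: bool = False                          # models DATABASECHANGELOGLOCK
    applied: list = field(default_factory=list)   # models DATABASECHANGELOG

    def update(self, changesets):
        """Apply changesets that are not yet recorded; return the IDs applied."""
        if self.locked:
            raise RuntimeError("another run holds the lock")
        self.locked = True                        # acquire the lock
        try:
            # Compare the changelog with what has already been recorded.
            pending = [c for c in changesets if c not in self.applied]
            for changeset_id in pending:
                # A real run would execute the changeset's SQL here.
                self.applied.append(changeset_id)  # log the changeset
            return pending
        finally:
            self.locked = False                   # release the lock


store = ChangeLogStore()
first = store.update(["bronze-1", "silver-1"])
second = store.update(["bronze-1", "silver-1", "gold-1"])
print(first)   # ['bronze-1', 'silver-1']
print(second)  # ['gold-1']
```

The second run applies only the new changeset, which is why re-running an unchanged changelog is harmless as long as the history table is intact.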

If These Tables Are Missing:

Liquibase will create them automatically in the target database schema during the first run. However, if you manually delete or tamper with them:

It may reapply changes.
It may cause duplicates or failures.

b. Liquibase Configuration

Create a liquibase.properties file with the following contents:

driver: com.databricks.client.jdbc.Driver
classpath: driver/DatabricksJDBC42.jar
changeLogFile: changelogs/master_changelog.xml
url: jdbc:databricks://<workspace-hostname>:443;TransportMode=http;SSL=1;\
AuthMech=3;UID=token;PWD=<your-token>;\
httpPath=<warehouse-http-path>;\
ConnCatalog=<catalog>;ConnSchema=<schema>;\
UserAgentEntry=Liquibase;EnableArrow=0;

Note: The URL contains the access token, so do not commit it to version control. Supply the URL through an environment variable instead.

Replace <workspace-hostname>, <your-token>, <warehouse-http-path>, <catalog>, and <schema> with your actual values.
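
One way to keep the token out of the properties file is to assemble the URL from environment variables at deploy time. The sketch below is a minimal illustration of that idea. The variable names (`DATABRICKS_HOST`, `DATABRICKS_TOKEN`, and so on) and the `build_jdbc_url` helper are assumptions made for this example, not names that Liquibase or Databricks define. The connection parameters are the ones used in the properties file above.

```python
import os


def build_jdbc_url(env=os.environ):
    """Assemble the Databricks JDBC URL from hypothetical environment variables."""
    parts = [
        f"jdbc:databricks://{env['DATABRICKS_HOST']}:443",
        "TransportMode=http",
        "SSL=1",
        "AuthMech=3",
        "UID=token",
        f"PWD={env['DATABRICKS_TOKEN']}",
        f"httpPath={env['DATABRICKS_HTTP_PATH']}",
        f"ConnCatalog={env['DATABRICKS_CATALOG']}",
        f"ConnSchema={env['DATABRICKS_SCHEMA']}",
        "UserAgentEntry=Liquibase",
        "EnableArrow=0",
    ]
    return ";".join(parts) + ";"


# Dummy values for illustration only; a pipeline would read real secrets.
example_env = {
    "DATABRICKS_HOST": "example-host",
    "DATABRICKS_TOKEN": "dummy-token",
    "DATABRICKS_HTTP_PATH": "/sql/1.0/warehouses/example",
    "DATABRICKS_CATALOG": "example_catalog",
    "DATABRICKS_SCHEMA": "example_schema",
}
url = build_jdbc_url(example_env)
print(url)
```

A pipeline could then export the result for Liquibase to pick up. Recent Liquibase versions read settings from `LIQUIBASE_COMMAND_*` environment variables such as `LIQUIBASE_COMMAND_URL`, but check that against the version you run.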

Common Pitfalls in Databricks Setup:

c. Writing Changelogs:

You can write changelogs in:

SQL format (recommended for Databricks)
XML, YAML, or JSON (requires extra syntax handling)

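Master Changelog (XML):

<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd">
<include file="bronze/your_sqlfile.sql" />
<include file="silver/your_sqlfile.sql" />
</databaseChangeLog>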

Sample SQL Changelog (01_create_customers.sql):

--liquibase formatted sql
--changeset author:table_name context:schema_name
CREATE TABLE IF NOT EXISTS schema_name.table_name (
  column_name1 STRING, column_name2 STRING
)
USING DELTA
TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name'
);
--rollback DROP TABLE IF EXISTS schema_name.table_name;
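
To make the structure of a formatted SQL changelog concrete, here is a small parser sketch. It is a simplified illustration, not the parser Liquibase uses. The regular expressions and the `parse_changesets` helper are hypothetical and handle only the header, body, and rollback lines shown in the sample.

```python
import re

# Simplified, hypothetical patterns for "--changeset author:id ..." and
# "--rollback ..." lines; Liquibase's own parsing is more permissive.
CHANGESET = re.compile(r"^--\s*changeset\s+([^:\s]+):(\S+)(.*)$")
ROLLBACK = re.compile(r"^--\s*rollback\s+(.*)$")


def parse_changesets(text):
    """Return one dict per changeset: author, id, attributes, sql, rollback."""
    changesets = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        header = CHANGESET.match(stripped)
        rollback = ROLLBACK.match(stripped)
        if header:
            current = {
                "author": header.group(1),
                "id": header.group(2),
                "attributes": header.group(3).strip(),
                "sql": [],
                "rollback": [],
            }
            changesets.append(current)
        elif current is None:
            continue  # skips "--liquibase formatted sql" and any preamble
        elif rollback:
            current["rollback"].append(rollback.group(1))
        elif stripped:
            current["sql"].append(stripped)
    return changesets


sample = """--liquibase formatted sql
--changeset author:table_name context:schema_name
CREATE TABLE IF NOT EXISTS schema_name.table_name (
  column_name1 STRING
) USING DELTA;
--rollback DROP TABLE IF EXISTS schema_name.table_name;
"""

parsed = parse_changesets(sample)
print(parsed[0]["author"], parsed[0]["id"], parsed[0]["attributes"])
```

The header reads as author:id followed by optional attributes, so in the sample the changeset ID is `table_name` and `context:schema_name` is an attribute that lets you filter which changesets run.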

How Rollback Works:

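Liquibase cannot infer the inverse of a raw SQL statement, so it runs the statement you supply in the changeset's rollback comment. The command below reverts the most recently applied changeset: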

liquibase rollbackCount 1
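
As a rough mental model, rolling back by count walks the applied history from newest to oldest and runs each changeset's rollback statement. The sketch below assumes a hypothetical in-memory history of (changeset ID, rollback SQL) pairs and a hypothetical `rollback_count` helper. The table names are made up, and it is not how Liquibase is implemented internally.

```python
def rollback_count(applied, count):
    """Remove the last `count` changesets; return their rollback SQL, newest first."""
    if not 0 <= count <= len(applied):
        raise ValueError("count must be between 0 and the number of applied changesets")
    start = len(applied) - count
    targets = applied[start:]
    # A raw SQL changeset without a rollback statement cannot be reverted.
    missing = [changeset_id for changeset_id, sql in targets if sql is None]
    if missing:
        raise ValueError(f"no rollback statement for: {missing}")
    del applied[start:]                      # drop the entries from the history
    return [sql for _, sql in reversed(targets)]


# Hypothetical history: (changeset ID, rollback SQL) in the order applied.
history = [
    ("bronze-1", "DROP TABLE IF EXISTS bronze.customers;"),
    ("silver-1", "DROP TABLE IF EXISTS silver.customers;"),
]
statements = rollback_count(history, 1)
print(statements)                    # ['DROP TABLE IF EXISTS silver.customers;']
print([cid for cid, _ in history])   # ['bronze-1']
```

The check for a missing rollback statement is the reason to write a rollback comment for every SQL changeset you might need to revert.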

d. Execution:

Run Liquibase Commands.

Apply changes:

liquibase update

Rollback last change:

liquibase rollbackCount 1

Preview changes (dry run):

liquibase updateSQL

Check current status:

liquibase status
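
In a pipeline these commands are usually run in a fixed order: check status, preview the SQL, then apply. The wrapper below is a minimal sketch of that sequence. The `liquibase_argv` and `run` helpers are hypothetical, and the wrapper assumes the `liquibase` executable is on the PATH and that liquibase.properties is in the working directory. By default it only prints the commands instead of executing them.

```python
import subprocess

# Commands taken from this section; the wrapper itself is a hypothetical helper.
COMMANDS = {"update", "updateSQL", "status", "rollbackCount"}


def liquibase_argv(command, *args):
    """Build the argument list for one Liquibase CLI call."""
    if command not in COMMANDS:
        raise ValueError(f"unsupported command: {command}")
    return ["liquibase", command, *[str(a) for a in args]]


def run(command, *args, execute=False):
    """Print the command; run it only when execute=True."""
    argv = liquibase_argv(command, *args)
    print(" ".join(argv))
    if execute:
        # check=True raises CalledProcessError if Liquibase exits non-zero.
        subprocess.run(argv, check=True)
    return argv


# Typical order for a deployment step: inspect, preview, then apply.
for step in ("status", "updateSQL", "update"):
    run(step)
```

Keeping command construction separate from execution makes the sequence easy to review in a pull request before a pipeline runs it with `execute=True`.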

Conclusion

As lakehouse platforms scale, schema changes can no longer be an afterthought. In a Medallion Architecture, unmanaged DDL updates across Bronze, Silver, and Gold layers quickly lead to broken pipelines and hard-to-trace issues.

By using Liquibase with Databricks and Delta Lake, you bring Git-style version control, auditability, and rollback to data schemas. DDLs become code, changes become repeatable, and deployments become safer across environments.

Combined with Delta Lake’s ACID guarantees, this approach enables a production-ready, governed, and scalable lakehouse—where schema evolution is controlled, not chaotic.

