Introducing Change Data Capture for SQLite in Turso

Nikita SivukhinNikita Sivukhin
Jamie BartonJamie Barton
Cover image for Introducing Change Data Capture for SQLite in Turso

We're excited to share an early preview of a powerful new feature in Turso's SQLite rewrite: built-in Change Data Capture (CDC). While still unstable and under active development, it's ready for testing and we'd love your feedback and contributions.

#What is Change Data Capture?

Change Data Capture is a pattern for tracking and recording changes to your database in real-time. Instead of periodically scanning tables to find what changed, CDC automatically logs every insert, update, and delete as it happens. This creates an audit trail that's invaluable for:

  • Building real-time data pipelines
  • Syncing data between systems
  • Creating audit logs
  • Implementing event-driven architectures
  • Debugging data issues

#How CDC Works in Turso

Our implementation makes CDC a first-class citizen in SQLite. You enable it with a simple pragma:

PRAGMA unstable_capture_data_changes_conn('full');

Once enabled, every data change on that connection gets logged to a CDC table:

CREATE TABLE turso_cdc (
  change_id INTEGER PRIMARY KEY AUTOINCREMENT,
  change_time INTEGER,      -- Unix timestamp
  change_type INTEGER,      -- -1=delete, 0=update, 1=insert
  table_name TEXT,
  id,                         -- rowid of affected row
  before BLOB,                -- row data before change
  after BLOB                  -- row data after change
)

#CDC Modes

We've designed multiple modes to balance between detail and performance:

  • id — Logs only the rowid, keeping the CDC table compact
  • before — Captures row state before updates and deletes
  • after — Captures row state after inserts and updates
  • full — Captures both before and after states

Here's what it looks like in action:

-- Enable full CDC mode
PRAGMA unstable_capture_data_changes_conn('full');

--- Custom table name
--- PRAGMA unstable_capture_data_changes_conn('full,custom_cdc_table');

-- Make some changes
PRAGMA unstable_capture_data_changes_conn('full');
CREATE TABLE t(a INTEGER PRIMARY KEY, b);
INSERT INTO t VALUES (1, 2), (3, 4);
UPDATE t SET b = 20 WHERE a = 1;
UPDATE t SET a = 30, b = 40 WHERE a = 3;
DELETE FROM t WHERE a = 1;

-- Query the CDC log with our JSON helper functions
SELECT
id,
change_type,
bin_record_json_object(table_columns_json_array('users'), before) AS before,
bin_record_json_object(table_columns_json_array('users'), after) AS after
FROM turso_cdc;

This gives you a complete history of changes in an easily queryable format:

SELECT change_type, id,
    bin_record_json_object(table_columns_json_array('t'), before) before,
    bin_record_json_object(table_columns_json_array('t'), after) after
    FROM turso_cdc;
┌─────────────┬────┬────────────────────┬────────────────────┐
│ change_type │ id │ before             │ after              │
├─────────────┼────┼────────────────────┼────────────────────┤
│           11 │                    │ {"a":1,"b":2}      │
│           13 │                    │ {"a":3,"b":4}      │
│           01 │ {"a":1,"b":2}      │ {"a":1,"b":20}     │
│          -13 │ {"a":3,"b":4}      │                    │
│           130 │                    │ {"a":30,"b":40}    │
│          -11 │ {"a":1,"b":20}     │                    │
└─────────────┴────┴────────────────────┴────────────────────┘

#Handling Schema Changes

One important limitation to be aware of: if you change your table schema (adding/dropping columns), the table_columns_json_array() function will return the current schema, not the historical one. This can lead to incorrect results when decoding older CDC records.

For example, if you drop a column, using bin_record_json_object(table_columns_json_array('users'), before) on historical records will shift the column mappings and produce incorrect JSON. As a workaround, you can manually track schema versions by storing the output of table_columns_json_array() before making schema changes.

#What This Enables

Built-in CDC opens up exciting possibilities:

  • Real-time Sync: Stream changes to other systems without polling or triggers. Perfect for keeping caches, search indexes, or analytics databases in sync.
  • Event Sourcing: Build event-driven applications where every state change is an event you can react to.
  • Time Travel: Reconstruct the state of your data at any point in time by replaying the CDC log.
  • Compliance & Auditing: Maintain a tamper-evident log of who changed what and when.
  • Debugging: When something goes wrong, you have a complete record of every change.

#Design Decisions

We've made some intentional choices in this implementation:

  1. CDC tables are just regular tables — You can query, maintain, and clean them up with standard SQL
  2. Per-connection control — Different connections can have different CDC settings or tables
  3. Binary format with JSON helpers — We store changes efficiently but provide functions to make them human-readable

#Try It Out

This feature is marked as unstable because we're still iterating on the API and functionality. We expect breaking changes as we incorporate feedback and add features like:

  • Richer schema change tracking
  • More efficient storage formats
  • Better tooling for CDC consumption

But that's exactly why we need you to try it! Test it with your workloads, push its limits, and let us know what works and what doesn't.

curl -sSL tur.so/install | sh

#Other Exciting Updates in v0.1.2

While CDC is the headline feature, v0.1.2 brings many other improvements:

#🔍 Vector Search Support

We've added Euclidean distance support for vector search, opening the door for AI and similarity search use cases right in your database.

#Expanded Language Support

  • Dart bindings: Initial implementation for Flutter developers
  • JavaScript bindings: Added Database.open functionality
  • Java bindings: Now works with JetBrains DataGrip
  • Python bindings: Better connection lifecycle management

#Performance Optimizations

  • Binary search implementation in find_cell() for faster lookups
  • Efficient record comparison and incremental parsing
  • Optimized schema parsing

#Reliability Improvements

  • Schema rollback support for safer migrations
  • WAL checkpointing synchronization
  • Better handling of edge cases when rowid reaches i64::MAX
  • Extensive bug fixes from I/O fault injection testing

#Testing Infrastructure

  • Imported subset of SQLite TCL tests for better compatibility
  • Antithesis integration for advanced testing scenarios
  • Improved fuzz testing in CI

#Get Involved

We're building this in the open and welcome contributions. Whether it's testing edge cases, suggesting API improvements, or submitting PRs, your input will shape how CDC works in Turso.

Check out the implementation PRs to see the technical details, report issues, or contribute code. Together, we can make CDC a killer feature that sets Turso apart.

Ready to capture some changes? 🚀

curl -sSL tur.so/install | sh