The next evolution of SQLite is here! Read Announcement

Introducing Real-Time Data with Materialized Views in Turso

Glauber CostaGlauber Costa
Cover image for Introducing Real-Time Data with Materialized Views in Turso

Today we’re launching something that changes how you think about real-time data: Materialized Views (currently experimental). Some of you who are familiar with Materialized Views in RDBMS are likely asking yourselves: "Wait, what do Materialized Views have to do with Live data?". That is a fair question: Traditional Materialized Views are just a way to take a snapshot of an expensive query on-disk.

But these aren’t your typical Materialized Views. With the beta release of Turso, we are launching Live Materialized Views

#How Materialized Views usually work.

Materialized Views are a way to materialize the results of an expensive query, so that if the query is needed again, the database does not need to recompute the results.

Many RDBMS support them. SQLite, in fact, does not. The first major news then is: Turso, improves upon SQLite and is today adding experimental support to Materialized Views.

But traditional Materialized Views have a problem: precisely because the queries that they represent are usually expensive, keeping them up-to-date is very expensive. Logically, the only two choices are:

  1. pay the price at update time. Writes pay the price of recomputing the view, but are now very slow
  2. keep the materialized views stale, and require them to be periodically recomputed.

Live Materialized Views on Turso are different: they are based on new research on Incremental View Maintenance (IVM), and uses the DBSP framework to make sure that data is kept up-to-date as the base tables are updated.

This allows Turso to be used in streaming and other real-time systems, fronting Kafka queues, CDC logs, and other system update sources, while always providing realtime intelligence on the data.

#Incremental Computation

Incremental computation is, at least in principle, simple: it is the ability to reevaluate a computation (in the case of IVM, the result of a materialized view), in time proportional to the size of the changes to the system, instead of time proportional to the size of the base tables.

That means that the time spent to recompute 1MB of changes being applied to a 1MB table should be roughly the same as the time spent to recompute 1MB of changes being applied to a 1TB table.

DBSP is a powerful theoretical framework for IVM. Its power comes from the fact that it demonstrates that complex languages, like SQL, can be compiled to DBSP. DBSP introduces operators (some stateful, some stateless), and joins those operators in a Directed Acyclic Graph (DAG). The delta, or difference between the existing state of the tables and the new state of the tables (in SQL parlance those are your inserts, updates, and deletes) transverses that circuit, and the end result is a delta that can be applied to the existing Materialized View to bring it to its new state.

Too complex? I agree. So let's look at a simple example:

Imagine the following query:

SELECT role, count(*), avg(salary)
   FROM employees
   WHERE company_id = 123
   GROUP BY role

This query displays the number of people in a particular role and what is the average salary for that position. It uses three operators: the filter operator, to filter only results that belong to company_id `123, the projection operator, to transform specific columns in the output, and the aggregate operator. They are connected together in a topology like the one shown below:

#Transactional Guarantees

Because maintaining views is cheap, Turso is able to provide transactional guarantees to Materialized Views, at the same isolation level as the rest of the database.

For example, a view will be seen as updated within a transaction:

turso> BEGIN;
turso> select * from v;
┌───────┬───────────┬──────────────────┐
│ role  │ count (*) │ avg (salary)     │
├───────┼───────────┼──────────────────┤
│ cto   │         1180000.0 │
├───────┼───────────┼──────────────────┤
│ ceo   │         1200000.0 │
├───────┼───────────┼──────────────────┤
│ sweng │         3163333.333333333 │
└───────┴───────────┴──────────────────┘
turso> INSERT INTO employees
       (name, role, salary, company_id) VALUES
       ('cracked dev', 'sweng', 500000, 123);
turso> SELECT * FROM v;
┌───────┬───────────┬──────────────┐
│ role  │ count (*) │ avg (salary) │
├───────┼───────────┼──────────────┤
│ cto   │         1180000.0 │
├───────┼───────────┼──────────────┤
│ ceo   │         1200000.0 │
├───────┼───────────┼──────────────┤
│ sweng │         4247500.0 │
└───────┴───────────┴──────────────┘

But upon a ROLLBACK, the data is no longer there, as ACID requires:

turso> ROLLBACK;
turso> SELECT * FROM v;
┌───────┬───────────┬──────────────────┐
│ role  │ count (*) │ avg (salary)     │
├───────┼───────────┼──────────────────┤
│ cto   │         1180000.0 │
├───────┼───────────┼──────────────────┤
│ ceo   │         1200000.0 │
├───────┼───────────┼──────────────────┤
│ sweng │         3163333.333333333 │
└───────┴───────────┴──────────────────┘

#The state of Live Materialized Views on Turso

Turso implements the follow operators from DBSP:

  • Join used to implement SQL Join,
  • Filter, used to implement WHERE clauses
  • Aggregate, used to implement SUM, COUNT, AVG, MIN, MAX (with more to come)
  • Merge, used to implement UNION

The delay operator is used to implement recursive CTEs and it is not yet implemented. CTEs in general are still not yet implemented.

The on-disk format of Materialized Views is not stable. Future versions of Turso will not necessarily be able to open views created by older versions of Turso. Materialized Views are not production-ready, and are not eligible for bounty payments in the Turso bug challenge