Last week, we announced that the project we started earlier this year, Turso, has reached its first beta release.
We called it a beta week because we released 5 features that set us apart from SQLite: native browser support, support for sync, encryption, concurrent writes, and live materialized views.
This is a huge milestone. So instead of just reviewing the week, I want to spend some time revisiting our motivation for starting such an ambitious project: why we need to embrace databoxes in addition to our data lakes, which capabilities modern databoxes need, and why databases will be everywhere.
Saying that AI will “change everything” sounds like a cliché. But it doesn’t make it any less true. Skepticism around the valuations of the companies involved is a different discussion (despite my personal bull thesis), and one that I will leave to more competent people. But looking at what the Internet did to our field can serve as a north star: yes, there was widespread delusion and a bubble that popped, but computer science, and in fact the world, was irreversibly transformed.
In the early 2000s, people reached for the technologies available to them to build the future. Sun, HP, and IBM dominated the Unix field. Oracle was the database of choice. “The computer” was larger than a pickup truck, and the kind of people buying software was radically changing. As pressure mounted, we had to come up with new versions of those old concepts, versions more in line with the demands of the Internet era.
I believe something similar is happening now. And as usual, data is at the center of it. At the end of the day, there is no computing without data. We now live in a world of data lakes, data warehouses, and data gravity. That shape is ideal for the Internet world, where there is a single entity, the “application”, that needs to access “the data”.
Agents are fundamentally different. Some companies are moving towards agents as an extension of their employees, like interns you can delegate a task to. And LLMs are, in fact, much like interns: always eager to help, but often lacking the discernment to tell the helpful from the hurtful.
Stories abound about “ChatGPT deleted my production database”. Computing companies have an easier time offering something helpful here: they place agents in sandboxes. They can do this, because our industry has evolved techniques to cheaply spawn isolated computing environments through lightweight virtualization.
Intelligence is also moving into physical devices: phones, and embodied agents like autonomous vehicles and humanoid robots. To be truly useful to us, agents will need to break through the digital barrier into the physical world. The physical devices themselves are sandboxes of sorts: lightweight environments where intelligence runs. The data, due to its gravity, is still central. But I believe those sandboxes will have to be matched by databoxes: the data you need, when you need it, scoped to the task at hand. Isolated, fenced, bounded, so that the agent can work its magic.
As agents become more capable, there will be billions of them. SQLite is the only database that can handle this kind of scale. It counts its databases in the trillions. Each database is just a file, and the database itself is in process. It is the closest thing to a databox that we have.
At the same time, we believe SQLite is not quite there. It is beloved by the developer community at large, due to its ability to fit anywhere. It is simple, yet powerful, and oh boy, it just works.
However, SQLite also comes with a set of limitations. We know those limitations well. For two years we have operated a cloud service, the Turso Cloud, that is backed by SQLite. We added code to a fork of SQLite to deliver more value to our customers. But in the process, we also found that many of those limitations are fundamental. They stem not from SQLite’s shape or how lightweight it is, but from architectural choices that made sense in the age of the Internet, yet do not necessarily make sense in the Age of Agents.
In January of this year it became clear to us that such limitations would eventually prevent SQLite from being deployed at scale for agents. We made a decision: patching the edges of SQLite was not enough. We had to fully and completely rewrite it (while keeping full API and file format compatibility). As we announced at the time, we were going all-in.
We have now reached the point where we are ready to show what we are here for. Turso, our rewrite of SQLite, is now in beta. We expect a production-ready release in a matter of months. But it already has the right set of capabilities to show what modern databoxes need to offer.
During the beta week, we announced the following:
SQLite follows a synchronous architecture. Every time it needs to read data, it blocks everything until the data is available. While that made sense for the old world, users and agents of today operate in environments where things are expected to be asynchronous.
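Turso’s asynchronous internals are not shown here. Purely as an illustration of why synchronous I/O is painful in event-driven environments, here is a sketch using Python’s standard-library sqlite3 module (which is synchronous): a blocking query has to be shipped off to a worker thread so it does not stall the event loop — exactly the kind of workaround an asynchronous core makes unnecessary.

```python
import asyncio
import sqlite3


def blocking_query(path: str, sql: str):
    # sqlite3 is synchronous: this call blocks the calling thread
    # until the query has finished reading its data.
    with sqlite3.connect(path) as conn:
        return conn.execute(sql).fetchall()


async def main():
    # Inside an event loop (a browser tab, a server), a blocking call
    # would freeze everything else; the classic workaround is to
    # offload it to a worker thread and await the result.
    rows = await asyncio.to_thread(blocking_query, ":memory:", "SELECT 1 + 1")
    print(rows)  # [(2,)]


asyncio.run(main())
```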
One example of an environment where this makes a huge difference is the web browser. It is hard to execute synchronous code in web browsers, as any blocking will block the entire tab. Because of its asynchronous nature, Turso can run natively inside web browsers, including persisting data locally. As companies start to add agents inside web browsers to take actions on behalf of users, data will have to follow.
You can read more about this here, and play with a live demo at https://shell.turso.tech. Just pick a different database name, and you will see the data persist!
The other advantage of being asynchronous is that it is possible to enhance the local database with targeted over-the-network operations. One example of such operations is to sync data between the local database (be it in the browser or somewhere else!) and a central server - or even another local Turso database. This means that the right data can follow your agents wherever they go. We talked about this here.
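Turso’s sync protocol is its own; as a toy illustration of the underlying idea, here is a minimal version-counter sync in Python with the stdlib sqlite3 module, where a local replica pulls only the rows that changed since its last sync. The table, columns, and function names are invented for this example.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT, version INTEGER)"
    )
    return conn


def pull_changes(remote, local, last_seen: int) -> int:
    """Copy rows changed on `remote` since `last_seen` into `local`.

    Returns the new high-water mark to remember for the next sync.
    """
    rows = remote.execute(
        "SELECT id, body, version FROM notes WHERE version > ?", (last_seen,)
    ).fetchall()
    local.executemany(
        "INSERT OR REPLACE INTO notes (id, body, version) VALUES (?, ?, ?)", rows
    )
    local.commit()
    return max((v for _, _, v in rows), default=last_seen)


remote, local = make_db(), make_db()
remote.execute("INSERT INTO notes VALUES (1, 'hello', 1)")
remote.execute("INSERT INTO notes VALUES (2, 'world', 2)")

mark = pull_changes(remote, local, last_seen=0)     # pulls both rows
remote.execute("UPDATE notes SET body = 'hi', version = 3 WHERE id = 1")
mark = pull_changes(remote, local, last_seen=mark)  # pulls only the update
print(local.execute("SELECT body FROM notes ORDER BY id").fetchall())
# [('hi',), ('world',)]
```

A real protocol needs conflict resolution, deletions, and transactional boundaries; the point here is only that sync can move the relevant delta rather than the whole database.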
Twenty years ago, encryption was a nice-to-have. Today, it is a requirement. As agents start dealing with trusted data, each agent will be able to hold their own keys, and segregate data from each other. In our beta week, we announced that Turso databases can now be encrypted. All part of the Open Source offering, without the need for any external tool or extension.
SQLite has a single-writer architecture. That means that while the database is being written to, no other changes can happen. This is not a problem for many applications, especially because SQLite is really fast, so it can sustain very high throughput by any standard. But there are many situations where that is not enough.
In our beta week, we announced experimental support (enabled with a flag) for concurrent writes.
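Turso’s concurrent-write mechanism is not shown here, but stock SQLite’s single-writer behavior is easy to observe. A small Python sketch using the stdlib sqlite3 module (with timeout=0, so the lock failure surfaces immediately instead of being retried):

```python
import os
import sqlite3
import tempfile

# Two connections to the same on-disk database file.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path, timeout=0)
other = sqlite3.connect(path, timeout=0)
writer.execute("CREATE TABLE t (x INTEGER)")
writer.commit()

# The first connection starts a write transaction and takes the write lock.
writer.execute("BEGIN IMMEDIATE")
writer.execute("INSERT INTO t VALUES (1)")

# A second writer is locked out until the first one commits.
try:
    other.execute("BEGIN IMMEDIATE")
    locked = False
except sqlite3.OperationalError:  # "database is locked"
    locked = True
print(locked)  # True

writer.commit()
other.execute("BEGIN IMMEDIATE")  # succeeds once the lock is released
other.commit()
```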
As agents insert themselves into critical processes, their ability to operate with live, fresh, real-time data becomes critical. And because we expect agents to be given slices of data to work with, they need the ability to pluck the exact data the task requires into a box.
To support that, we announced real-time, live materialized views. Unlike standard materialized views, they leverage incremental computation: to keep the view up to date, they only need to look at the changes to the data rather than the totality of it. This allows Turso to be placed in front of a live stream of data, like Kafka, and move just the data relevant for a specific task into a given databox.
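Turso’s implementation is its own; as a sketch of the idea behind incremental computation, here is a toy incrementally maintained view in Python: a per-category running SUM that is updated from each change (delta) alone, never rescanning the base data. All names are invented for the example.

```python
from collections import defaultdict


class IncrementalSumView:
    """Toy materialized view: SELECT category, SUM(amount) ... GROUP BY category.

    Instead of recomputing the aggregate over all rows, each change
    updates the view in O(1) using only the delta itself.
    """

    def __init__(self):
        self.totals = defaultdict(int)

    def apply(self, op: str, category: str, amount: int):
        # An insert adds its amount; a delete subtracts it. An update is
        # modeled as a delete of the old row plus an insert of the new one.
        sign = 1 if op == "insert" else -1
        self.totals[category] += sign * amount


view = IncrementalSumView()
changes = [
    ("insert", "books", 10),
    ("insert", "games", 25),
    ("insert", "books", 5),
    ("delete", "games", 25),
]
for change in changes:
    view.apply(*change)  # each change is O(1); the base data is never rescanned
print(dict(view.totals))  # {'books': 15, 'games': 0}
```

Real incremental view maintenance handles joins, filters, and retractions, but the core contract is the same: the cost of keeping the view fresh scales with the size of the change, not the size of the data.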
The Age of Agents will impose very different demands on data than what we had before. In addition to data lakes and data warehouses, we will need a collection of databoxes: small, self-contained per-agent storage to support the explosion in the number of agents that is inevitably coming.
SQLite has the right shape for this, but it lacks key capabilities that modern agents need. Turso is a full rewrite of SQLite, with the goal of adding those capabilities while keeping full API and file-format compatibility.
It is now in beta. You can try it out today: curl -sSL tur.so/install | sh