
Turso Sync: a much, much, much better way to sync

Glauber Costa

Turso, a full rewrite of SQLite in Rust, includes the ability to have local databases that are synced to and from a server. libSQL, a fork of SQLite and the first iteration of our dream, has a similar feature called "Embedded Replicas". Our users keep getting confused between the two, and because our docs and official communications still recommend libSQL (for now), often out of conservatism, the confusion only grows.

I am here today to dispel that confusion. I will do it by making a blanket recommendation: regardless of the situation and use case, if you are using sync, you should be using Turso, not libSQL. We are finally at the point at which I am comfortable making this recommendation, and I am so happy this day has come.

My mom taught me not to curse. So before telling you the reason, I'll have to start by apologizing to her. Sorry, mom. Now that I have, here's the reason:

The reason we are making this recommendation is that syncing on Turso is so fuc***g good.

First I will tell you why, then I will show you just how amazeballs it is.

#Embedded Replicas: the first iteration of the Holy Grail

When we introduced Embedded Replicas, our users immediately bought into the concept: read locally (microseconds!) and keep your local database synchronized with whatever is in the cloud. Infinite scalability, insane performance. What's not to like?

It is a great idea, but the practical implementation on top of SQLite was plagued with issues that were too much to handle. Most of those limitations were downstream of a single fact: there is no good way to get a logical stream of changes out of SQLite. So we built our replication protocol on physical pages.

But replicating physical pages has a lot of problems. The main ones are:

  1. It can be wasteful: databases work in blocks of data called pages. If you make a small change, you still have to send the whole page, which is a lot of data. Sparse pages compress well, but that is still much more than the data that actually changed.
  2. There is no visibility into what changed: once something is committed, you don't hold enough information to know how that data originated, which makes it hard to replay events and resolve conflicts. The main consequence is that Embedded Replicas, by default, still sent writes to the remote database, keeping only the reads local. It was possible to disable this behavior, but that increased the rate at which sync errored out (see the next point). The lack of a truly local write experience frustrated many users.
  3. Keeping the pages always applied in the same order on both the Cloud and the local database was challenging, especially in the face of database checkpoints, an operation that folds the most recent pages into the database's main data structure. As a consequence, the local database often diverged and had to be re-bootstrapped from the cloud, a process that was both wasteful and frustrating.

So there you go: one wasteful, one frustrating, and the big champion, both wasteful and frustrating.
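To put rough numbers on the first problem, here is a deliberately simplified model of bytes on the wire. It assumes every commit ships each page it touched in full (loosely modeled on SQLite's write-ahead log, which appends a complete copy of every page a transaction modifies), while a logical protocol ships only the changed rows plus a small per-change header. The 4,096-byte page is SQLite's default page size; the 50-byte rows, 60 rows per page, and 16-byte header are made-up numbers, and neither function describes libSQL's or Turso's actual wire format.

```typescript
// Simplified model of replication traffic. Not libSQL's or Turso's wire format.
const PAGE_SIZE = 4096; // SQLite's default page size, in bytes

interface RowChange {
  table: string;
  rowid: number;
  payloadBytes: number; // encoded size of the row's values (assumed)
}

// A committed transaction is a list of row changes.
type Commit = RowChange[];

// Page-based replication: assume every commit ships each page it touched, in full.
function physicalBytes(commits: Commit[], rowsPerPage: number): number {
  let total = 0;
  for (const commit of commits) {
    const pagesTouched = new Set(
      commit.map((c) => `${c.table}:${Math.floor(c.rowid / rowsPerPage)}`),
    );
    total += pagesTouched.size * PAGE_SIZE;
  }
  return total;
}

// Logical replication: ship only the changed rows, plus an assumed per-change header.
function logicalBytes(commits: Commit[], headerBytes = 16): number {
  let total = 0;
  for (const commit of commits) {
    for (const change of commit) total += change.payloadBytes + headerBytes;
  }
  return total;
}

// 3,000 single-row commits of ~50-byte rows, ~60 rows per page (hypothetical sizes).
const commits: Commit[] = Array.from({ length: 3000 }, (_, i) => [
  { table: "t", rowid: i + 1, payloadBytes: 50 },
]);
console.log(physicalBytes(commits, 60)); // 12288000 (3,000 x 4,096)
console.log(logicalBytes(commits)); // 198000 (3,000 x 66)
```

In this model, batching all 3,000 rows into a single commit narrows the gap, because rows that share a page are shipped once per commit; many small commits, or changes scattered across many pages, widen it.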

#Writing Turso

I was always incredibly sad to see users struggling with libSQL's Embedded Replicas. The only other time I remember being that sad is when I asked my now-wife out, and she said no. But I took that as a life lesson: I changed my approach and tried again. And then she said yes. Now we are married, with four kids. If I could turn that sadness into a love story, I was sure I could turn a bad start on Embedded Replicas into a love story too. So we changed our approach.

We had many reasons to rewrite SQLite, but doing sync right was at the very top of the list. Turso improves on SQLite in a lot of ways. It is fully async, which allows us, for example, to greatly improve replica bootstrap time with partial sync: bring in just the pages you need to serve your query. But the improvement that matters most here is Change Data Capture, or CDC. Turso keeps track of everything that changed in the database. This allows us to send just the logical changes to the remote server, which is faster, more efficient, and way less error-prone.
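The idea behind CDC fits in a few lines: every write also appends a logical record (which table, which row, what it looked like before and after) to a change log, and sync only has to ship the log entries the other side has not seen yet. The `ToyDb` class, its method names, and the record shape below are all hypothetical; this is an in-memory illustration of the concept, not how Turso stores or exposes its change log.

```typescript
// Toy illustration of change data capture (CDC). Hypothetical; not Turso's implementation.
type Row = Record<string, unknown>;
type ChangeType = "insert" | "update" | "delete";

interface Change {
  changeId: number; // monotonically increasing position in the log
  type: ChangeType;
  table: string;
  rowid: number;
  before: Row | null; // row image before the change (null for inserts)
  after: Row | null; // row image after the change (null for deletes)
}

class ToyDb {
  private tables = new Map<string, Map<number, Row>>();
  private nextChangeId = 1;
  readonly log: Change[] = [];

  private table(name: string): Map<number, Row> {
    let rows = this.tables.get(name);
    if (!rows) {
      rows = new Map<number, Row>();
      this.tables.set(name, rows);
    }
    return rows;
  }

  private record(type: ChangeType, table: string, rowid: number, before: Row | null, after: Row | null): void {
    this.log.push({ changeId: this.nextChangeId++, type, table, rowid, before, after });
  }

  // Insert a new row or overwrite an existing one, capturing the change.
  upsert(table: string, rowid: number, row: Row): void {
    const rows = this.table(table);
    const before = rows.get(rowid) ?? null;
    rows.set(rowid, row);
    this.record(before ? "update" : "insert", table, rowid, before, row);
  }

  // Delete a row (if present), capturing its last image.
  delete(table: string, rowid: number): void {
    const rows = this.table(table);
    const before = rows.get(rowid);
    if (before === undefined) return;
    rows.delete(rowid);
    this.record("delete", table, rowid, before, null);
  }

  // Everything after a given log position: what a push would need to send.
  changesSince(changeId: number): Change[] {
    return this.log.filter((c) => c.changeId > changeId);
  }
}

const db = new ToyDb();
db.upsert("users", 1, { name: "ana" });
db.upsert("users", 1, { name: "ana maria" });
db.delete("users", 1);
console.log(db.changesSince(1).map((c) => c.type).join(",")); // update,delete
```

Because each record names its table and row, the receiving side can tell two replicas touching the same row apart from unrelated writes, which a changed page cannot tell you.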

Another consequence of being able to transfer logical changes is that the protocol can now be split in two: instead of a sync() function, Turso implements push() and pull(). That contributes to the efficiency of the protocol and gives you more control over how and when to exchange data.
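A minimal sketch of why the split helps, assuming a toy server that keeps an ordered log of logical changes: push() only uploads local changes, and pull() only downloads remote changes after a cursor, so an app can write locally as much as it wants and decide separately when to do each. `ToyServer`, `ToyReplica`, and their methods are hypothetical names for illustration; they are not the `@tursodatabase/sync` API.

```typescript
// Toy model of a split push/pull protocol. Hypothetical; not the @tursodatabase/sync API.
interface LogEntry {
  seq: number; // position in the server's log
  origin: string; // which replica produced the change
  change: string; // a logical change, e.g. an SQL statement
}

class ToyServer {
  private log: LogEntry[] = [];

  append(origin: string, changes: string[]): void {
    for (const change of changes) {
      this.log.push({ seq: this.log.length + 1, origin, change });
    }
  }

  since(seq: number): LogEntry[] {
    return this.log.filter((e) => e.seq > seq);
  }
}

class ToyReplica {
  private pending: string[] = []; // local changes not yet pushed
  private cursor = 0; // last server seq this replica has pulled
  readonly applied: string[] = []; // changes visible in the local database
  private id: string;
  private server: ToyServer;

  constructor(id: string, server: ToyServer) {
    this.id = id;
    this.server = server;
  }

  // Local write: visible immediately, queued for a later push.
  write(change: string): void {
    this.applied.push(change);
    this.pending.push(change);
  }

  // push(): upload local changes only. Returns how many were sent.
  push(): number {
    const sent = this.pending.length;
    this.server.append(this.id, this.pending);
    this.pending = [];
    return sent;
  }

  // pull(): download remote changes not seen yet. Returns how many were applied.
  pull(): number {
    let appliedCount = 0;
    for (const entry of this.server.since(this.cursor)) {
      if (entry.origin !== this.id) {
        this.applied.push(entry.change);
        appliedCount++;
      }
      this.cursor = entry.seq;
    }
    return appliedCount;
  }
}

const server = new ToyServer();
const a = new ToyReplica("a", server);
const b = new ToyReplica("b", server);
a.write("INSERT INTO t VALUES (1)");
a.write("INSERT INTO t VALUES (2)");
console.log(a.push()); // 2
console.log(b.pull()); // 2
console.log(a.pull()); // 0, replica a already has its own writes
```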

#Just how good is it?

That Turso is more reliable is something you will just have to experience for yourself. But faster and more efficient is something I can show you right now. I ran five benchmarks comparing the two approaches. The code can be found here so you can try it for yourself.

#Test Setup

All benchmarks ran on a VM close to our database in us-east-1 (but not on AWS, so that some good old public internet adds realism).

  • Node.js: v22.22.2
  • Database: Turso Cloud (us-east-1)
  • Packages: @libsql/client@0.17.2, @tursodatabase/sync@0.5.3

Embedded Replicas have a setting, readYourWrites, that guarantees that anything you insert goes to the Cloud right away. We turn it on or off depending on what each benchmark is meant to show.

#Sequential Inserts (3,000 rows)

The most basic write pattern: insert rows one at a time. readYourWrites is disabled for Embedded Replicas, so writes feel as local as they can. Then we do one big sync at the end. The results:

| Metric | Embedded Replicas | @tursodatabase/sync |
|---|---|---|
| Time | 152 s | 17 s |
| Network traffic | 34.6 MB | 2.1 MB |

8.9x faster. 16.3x less data.

#Read-Your-Writes (200 cycles)

Insert a row, then immediately read it back. Repeat 200 times. readYourWrites is enabled, which is how most people ended up using Embedded Replicas. For Turso, both the write and the read are local; the push happens once at the end.

| Metric | Embedded Replicas | @tursodatabase/sync |
|---|---|---|
| Time | 48.6 s | 156 ms |
| Per-cycle average | 243 ms | <1 ms |
| Network traffic | 2.7 MB | 149 KB |

312x faster. 18.4x less data.

I mean, what do I even say here. It's like Embedded Replicas are Caligula and Turso is Marcus F. Aurelius. No further comments.

Ok, actually, there is a further comment. You could argue that this benchmark is unfair: every Embedded Replicas request goes through the network (because readYourWrites forces a round trip on every write), while Turso writes are local. So we're really just measuring the effect of batching 200 network calls into one. And from a strict amount-of-work point of view, that's true.

But there are two things to consider. First, what we've just shown is exactly how our users want to use local databases: do a lot of activity at the speed of a local write, and then push. That's the whole point. The old model existed because shipping physical pages made it impossible to stitch changes together: any write, even to an unrelated table, would create a conflict because the physical pages would differ.

Second, even if we push after every single write, Turso is still way faster:

| Metric | Embedded Replicas | @tursodatabase/sync |
|---|---|---|
| Time | 43.7 s | 6.0 s |
| Per-cycle average | 218 ms | 30 ms |
| Network traffic | 2.7 MB | 1.1 MB |

7.3x faster. 2.5x less data. Both sides hit the network on every cycle, and Turso still wins because a local write plus a push is fundamentally cheaper than Embedded Replicas' remote write plus automatic pull. 7.3x doesn't sound like a lot, but only because we just looked at the 312x case.

It's like the first time I went to Germany: I was constantly annoyed by how late everyone was, and disappointed that it did not match the mental image of Germans I had. Then it clicked that this was only because I had spent a week in Switzerland first, and the last person who was late for something in Switzerland (probably 300 years ago) was likely arrested or killed, and their family lived in shame for generations.
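The two Turso read-your-writes runs differ only in how often push() happens. Below is a sketch of that loop that counts network round trips instead of timing them. The `SyncedDb` interface and both fakes are hypothetical and kept synchronous for brevity (a real client is async); the remote-write fake's cost of one round trip for the write plus one for the automatic pull is an assumption made for illustration, not a measurement of libSQL. This is not the actual benchmark code.

```typescript
// Sketch of the read-your-writes loop, counting round trips. Hypothetical, not the real benchmark.
interface SyncedDb {
  insert(id: number, value: string): void;
  get(id: number): string | undefined;
  push(): void;
  roundTrips: number;
}

// Insert a row, read it back, and push every `pushEvery` cycles.
function readYourWritesLoop(db: SyncedDb, cycles: number, pushEvery: number): void {
  for (let i = 1; i <= cycles; i++) {
    db.insert(i, `row-${i}`);
    if (db.get(i) !== `row-${i}`) {
      throw new Error(`cycle ${i}: read did not see its own write`);
    }
    if (i % pushEvery === 0) db.push();
  }
  if (cycles % pushEvery !== 0) db.push(); // flush any leftover local changes
}

// Local-first fake: writes and reads stay local; each non-empty push is one round trip.
class FakeLocalFirst implements SyncedDb {
  private rows = new Map<number, string>();
  private unpushed = 0;
  roundTrips = 0;
  insert(id: number, value: string): void {
    this.rows.set(id, value);
    this.unpushed++;
  }
  get(id: number): string | undefined {
    return this.rows.get(id);
  }
  push(): void {
    if (this.unpushed === 0) return;
    this.roundTrips++; // one network exchange for the whole batch
    this.unpushed = 0;
  }
}

// Remote-write fake: each write is assumed to cost one round trip to the server
// plus one for the automatic pull that makes it readable locally.
class FakeRemoteWrite implements SyncedDb {
  private rows = new Map<number, string>();
  roundTrips = 0;
  insert(id: number, value: string): void {
    this.roundTrips += 2;
    this.rows.set(id, value);
  }
  get(id: number): string | undefined {
    return this.rows.get(id);
  }
  push(): void {} // writes already went to the server
}

const remoteWrite = new FakeRemoteWrite();
const pushOnce = new FakeLocalFirst();
const pushEveryCycle = new FakeLocalFirst();
readYourWritesLoop(remoteWrite, 200, 200);
readYourWritesLoop(pushOnce, 200, 200);
readYourWritesLoop(pushEveryCycle, 200, 1);
console.log(remoteWrite.roundTrips, pushOnce.roundTrips, pushEveryCycle.roundTrips); // 400 1 200
```

Round-trip counts only capture part of the difference: what each exchange carries, and how much work the server does for it, matters too.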

#Pull Remote Changes (5,000 rows)

5,000 rows are written to the cloud, then pulled into a fresh local database. This should be the simplest, best-case scenario for Embedded Replicas: start from scratch and fetch the data from the Cloud.

| Metric | Embedded Replicas | @tursodatabase/sync |
|---|---|---|
| Time | 718 ms | 338 ms |
| Network traffic | 1.08 MB | 1.05 MB |

2.1x faster. Same data transfer.

There is very little difference in data transfer: in both cases we are transferring an existing SQLite database. But with those restrictions lifted, the protocol itself is so much more efficient that even this is faster. And this is the entire database: with libSQL Embedded Replicas, you always have to sync the entire database first. Turso lets you lazy load (because everything in Turso is async), meaning you can start responding to queries right away.
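Lazy loading is what lets a fresh replica answer queries before the whole database has arrived. A minimal sketch of the idea, assuming a hypothetical remote page source: pages are fetched on first access and cached locally, so a query that touches three pages fetches three pages, not the whole file. `LazyPageCache` and `FetchPage` are made-up names, the page numbers are illustrative, and the sketch is synchronous for brevity; it is not Turso's partial sync implementation.

```typescript
// Sketch of lazy page loading. Hypothetical names; not Turso's partial sync implementation.
const PAGE_SIZE = 4096; // SQLite's default page size, in bytes

type FetchPage = (pageNo: number) => Uint8Array;

class LazyPageCache {
  private pages = new Map<number, Uint8Array>();
  private fetchPage: FetchPage;
  fetches = 0; // how many pages actually crossed the network

  constructor(fetchPage: FetchPage) {
    this.fetchPage = fetchPage;
  }

  // Return a page, fetching it from the remote only on first access.
  read(pageNo: number): Uint8Array {
    let page = this.pages.get(pageNo);
    if (page === undefined) {
      page = this.fetchPage(pageNo);
      this.pages.set(pageNo, page);
      this.fetches++;
    }
    return page;
  }
}

// Fake remote: every page is filled with its page number (mod 256).
const remote: FetchPage = (pageNo) => new Uint8Array(PAGE_SIZE).fill(pageNo % 256);

const cache = new LazyPageCache(remote);
// A point lookup might walk a B-tree from the root to a leaf, e.g. pages 1, 7, and 412.
for (const pageNo of [1, 7, 412, 1, 7]) cache.read(pageNo);
console.log(cache.fetches); // 3
```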

#Summary

If you are using libSQL sync(), also known as Embedded Replicas, we strongly recommend you switch to Turso. We are now at the point where our compatibility matrix is close enough to SQLite's that we are comfortable making this recommendation. If you do find something that is not implemented yet, then sure, waiting a bit longer is fine.

Try syncing Turso databases today:

```shell
npm install @tursodatabase/sync
```

```javascript
import { connect } from "@tursodatabase/sync";

const db = await connect({
  path: "./my-app.db",
  url: "libsql://your-db.turso.io",
  authToken: "your-token",
});
await db.connect();
```

You don't even need a Turso Cloud account, because the Turso CLI ships with a local sync server:

```shell
npx turso@latest --sync-server 0.0.0.0:8080
```

Our docs have more about it here.

And if you see your agent trying to install libSQL, you can calmly explain that you'd like to use Turso, not libSQL, and mention the packages by name. We have anecdotally found that the occasional expletive helps.