Make the most out of your Turso usage allowance, and boost the performance of your SQLite queries at the same time, using indexes and triggers.
Not long ago, Turso announced pricing and usage plans that include a free Starter plan and a paid Scaler plan. There are allowances for the number of databases and locations in each plan, which are easy to understand. But three observable metrics are more difficult to predict and optimize: the number of rows read, the number of rows written, and the total amount of storage used.
The implementation of these metrics is rooted in SQLite internals, and the billing documentation explains how Turso accounts for them. It's mostly straightforward stuff, but a couple of things pointed out by the documentation might be surprising at first: a query that performs a full table scan incurs a row read for every row in the table, and aggregate functions incur a row read for every row they consider.
The documentation also gives some advice for reducing usage on these points, and that's what I want to unpack in this post. With some care, you can maximize the use of your quotas. And the great news is that these tips also help you boost the overall performance of your queries!
The two pieces of advice I have are: use indexes so your queries avoid full table scans, and use triggers to maintain precomputed aggregate values (count, avg, sum, etc.) instead of recomputing them with every query. If you want to jump straight to a collection of examples, follow along with the SQL code on GitHub.
A full table scan describes what a SQL database does when it can't use an index to find the specific rows needed for a query: it must examine the column data in each and every row of the table to find them. As you can imagine, this is wildly inefficient when a table becomes large! With Turso, a query that requires a full table scan over a table with N rows always incurs N row reads, even if the query only needs a few of those rows. So obviously, we don't want to do that.
The SQLite query planner documentation discusses full table scans, if you want to dive into the gory details.
SQLite provides a statement to help you analyze a query and determine if it's going to do a table scan: EXPLAIN QUERY PLAN (EQP). It's easy to use. Just prefix your SQL query with those keywords, and it outputs information about the indexes (or lack thereof) that the query would use. For example, to find out if a select statement uses an index, run this:
EXPLAIN QUERY PLAN
SELECT columns FROM tablename WHERE filter = 'value';
If the output of the “detail” column contains only the text “SCAN tablename”, then it's going to do a table scan, and incur one row read for each row in the table. But if you see the text “USING INDEX”, it'll efficiently use an index. For more complicated queries, the output gets more difficult to understand. Note that use of a “COVERING INDEX” can improve the performance of a query, but not necessarily the cost of its reads or writes. You can learn more in the SQLite documentation on EXPLAIN QUERY PLAN.
(One exception to the above: if you explain the query SELECT 1, you'll see the text “SCAN CONSTANT ROW”. This is obviously not doing a table scan, since there is no table; rather, it is using a constant number of in-memory rows.)
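You can try it yourself:
EXPLAIN QUERY PLAN
SELECT 1;
The detail column reads “SCAN CONSTANT ROW”, and no table rows are read.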
Note: With EQP, the output from the Turso CLI does not match that of the sqlite3 CLI; sqlite3 massages the underlying tabular data to make it easier to read. You might prefer the sqlite3 CLI for analyzing queries against your schema; the results will be the same. In this post, I'll show the Turso CLI output.
Imagine you have the following table and rows representing users, their group membership, and a score:
CREATE TABLE example (
uid TEXT NOT NULL,
gid TEXT NOT NULL,
score INTEGER NOT NULL
);
INSERT INTO example VALUES ('a', 'g1', 10);
INSERT INTO example VALUES ('b', 'g1', 15);
INSERT INTO example VALUES ('c', 'g2', 5);
INSERT INTO example VALUES ('d', 'g2', 10);
With that, say you want to understand whether the query SELECT * FROM example WHERE uid = 'a' performs a table scan to get the one matching row. You can run this:
EXPLAIN QUERY PLAN
SELECT * FROM example WHERE uid = 'a';
And you'll see the following output:
ID PARENT NOTUSED DETAIL
2 0 0 SCAN example
I see “SCAN”! That's not great. This query will incur 4 row reads just to get the one matching row. We can improve that by adding an index on the uid column used in the WHERE clause. Assuming that uid is actually supposed to be unique:
CREATE UNIQUE INDEX example_uid ON example (uid);
If you run the same EQP again, the output changes:
ID PARENT NOTUSED DETAIL
3 0 0 SEARCH example USING INDEX example_uid (uid=?)
“SEARCH” using the new index is what we want to see! This query now incurs only 1 row read.
But how about this filter on group “g1”?
EXPLAIN QUERY PLAN
SELECT * FROM example WHERE gid = 'g1';
I won't even bother showing the output; we know this is doing a table scan because there is no index that supports filtering on the gid column. A new index is required to reduce the number of rows read:
CREATE INDEX example_gid ON example (gid);
Run the same EQP after creating the index, and see that it's all good now.
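For reference, the detail column should now show a SEARCH similar to the earlier one (the exact text varies by SQLite version):
ID PARENT NOTUSED DETAIL
3 0 0 SEARCH example USING INDEX example_gid (gid=?)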
Do you think we need another index on the score column in the example table? An index would help the runtime performance (and memory consumption) when filtering or ordering on that column:
SELECT * FROM example ORDER BY score;
But the index wouldn't affect the number of rows read. If you never need to filter or sort the values in a column, an index on it is useless. Worse than useless, in fact: the index adds to your overall storage usage, and slightly slows down every insert and delete on the table.
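If you're curious, EQP makes the runtime difference visible. Without an index on score, SQLite scans the table and then sorts with a temporary B-tree; with a hypothetical example_score index, the sort step disappears. Sketched output, which varies by SQLite version:
EXPLAIN QUERY PLAN
SELECT * FROM example ORDER BY score;
-- details without an index:
--   SCAN example
--   USE TEMP B-TREE FOR ORDER BY

CREATE INDEX example_score ON example (score);

EXPLAIN QUERY PLAN
SELECT * FROM example ORDER BY score;
-- detail with the index:
--   SCAN example USING INDEX example_score
Either way, all 4 rows are read; only the sorting work changes.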
With Turso, there is a tradeoff between cost and performance when it comes to indexes. Before creating an index, you should consider if that index is going to be useful for your expected queries, and therefore justify its cost.
Aggregate functions (such as count, avg, min, max, and sum) always incur one row read for every row considered by the query. This makes a query like SELECT count(*) FROM example increasingly expensive as the table grows. If your app needs to perform aggregate queries often, you should consider taking steps to reduce that cost.
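An index doesn't rescue you here. As noted earlier, SQLite may satisfy an unfiltered count(*) by scanning the smallest covering index, which speeds up the scan but still touches every row, so the read count is unchanged (output sketched; the exact detail text depends on your SQLite version and available indexes):
EXPLAIN QUERY PLAN
SELECT count(*) FROM example;
-- detail: something like SCAN example USING COVERING INDEX example_gid
-- still one row read per row in the table: 4 reads for our 4-row table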
There are two viable options here, and they both involve maintaining a pre-computed tally of the required aggregate values as the table data changes, so that the aggregate costs only one row read instead of many.
Let's say you want to maintain a count of rows for some tables in your schema. You can create a new table to store the precomputed count in a single row:
CREATE TABLE table_row_counts (
table_name TEXT UNIQUE NOT NULL,
row_count INTEGER NOT NULL
);
-- initialize each table with a count of 0
INSERT INTO table_row_counts VALUES ('example', 0);
After that, keep it up to date as you perform inserts and deletes on the rows. Typically you would do this in a transaction so there is no chance of an interruption that would cause the count to get out of sync. The pseudocode for the transaction goes like this:
BEGIN TRANSACTION;
-- delete the rows, noting how many were affected ($rows)
DELETE FROM example WHERE uid = 'b';
-- subtract that amount from the precomputed count
UPDATE table_row_counts SET row_count = row_count - $rows WHERE table_name = 'example';
COMMIT;
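If you'd rather not count affected rows in your app code, one option (a sketch, not the only way) is SQLite's built-in changes() function, which returns the number of rows modified by the most recently completed statement on the same connection:
BEGIN TRANSACTION;
DELETE FROM example WHERE uid = 'b';
-- changes() still refers to the DELETE above, since this UPDATE hasn't completed yet
UPDATE table_row_counts SET row_count = row_count - changes() WHERE table_name = 'example';
COMMIT;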
Either way, this places the burden on all of your app code that inserts into or deletes from a table to also keep the count up to date. If you miss an insert or delete somewhere, your count will drift out of sync. Fortunately, you can automate this record-keeping entirely using a SQLite trigger.
SQLite triggers are database operations that are automatically performed when a specified database event occurs. You can set up a trigger that modifies the aggregate value on every insert or delete without having to manage anything at all in your app code.
Taking the example above, with table example to be counted and table_row_counts maintaining the count, you can set up triggers to automatically keep table_row_counts up to date whenever inserts or deletes happen in example:
CREATE TRIGGER insert_row_count_example
AFTER INSERT ON example
FOR EACH ROW
BEGIN
UPDATE table_row_counts SET row_count = row_count + 1 WHERE table_name = 'example';
END;
CREATE TRIGGER delete_row_count_example
AFTER DELETE ON example
FOR EACH ROW
BEGIN
UPDATE table_row_counts SET row_count = row_count - 1 WHERE table_name = 'example';
END;
Assuming these triggers were created at the same time as the example table, you can be sure that table_row_counts stays up to date with the latest counts.
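As a quick sanity check (assuming the triggers and the zero-initialized row in table_row_counts existed before any rows were inserted into example):
INSERT INTO example VALUES ('e', 'g3', 20);
-- a single row read, instead of one per row of example:
SELECT row_count FROM table_row_counts WHERE table_name = 'example';
-- returns 5: the 4 original rows plus the new one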
The above example works well when all you want to know is the row count for the entire table. But what about a filtered count, such as the following query that gets the number of users in a specific group:
SELECT count(*) FROM example WHERE gid = 'g1';
The table_row_counts table above is of no use here. You'll need another table to store the aggregates for each group, and another pair of triggers to keep them up to date. Pre-computing aggregates gives you a big speed and cost boost, but you have to know ahead of time which aggregates your app needs in order to get the benefit, then design your schema for those use cases.
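Here's a minimal sketch of what that could look like. The group_row_counts table and trigger names are hypothetical (the companion repo has its own versions), and this sketch ignores UPDATEs that move a row between groups:
CREATE TABLE group_row_counts (
  gid TEXT UNIQUE NOT NULL,
  row_count INTEGER NOT NULL
);

CREATE TRIGGER insert_group_count_example
AFTER INSERT ON example
FOR EACH ROW
BEGIN
  -- increment the group's count if it's already tracked...
  UPDATE group_row_counts SET row_count = row_count + 1 WHERE gid = NEW.gid;
  -- ...otherwise start tracking it at 1
  INSERT INTO group_row_counts (gid, row_count)
  SELECT NEW.gid, 1
  WHERE NOT EXISTS (SELECT 1 FROM group_row_counts WHERE gid = NEW.gid);
END;

CREATE TRIGGER delete_group_count_example
AFTER DELETE ON example
FOR EACH ROW
BEGIN
  UPDATE group_row_counts SET row_count = row_count - 1 WHERE gid = OLD.gid;
END;

-- the filtered count now costs a single row read:
SELECT row_count FROM group_row_counts WHERE gid = 'g1';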
You can see a specific example of this, and more examples of triggers that maintain precomputed aggregates, in the companion repo for this post.
So you've decided to maintain aggregate values in order to save on your Turso usage: good call! Just be aware that there are some related costs, and you should account for them when it comes time to estimate your total usage in production.
Firstly, every operation on a table whose aggregates you precompute incurs additional writes to maintain the aggregate table. In other words, you are saving on reads at the expense of additional writes. This makes a lot of sense if you read the table much more often than you write to it. But for write-heavy tables that also need fast, frequent aggregates, you might end up incurring more usage than you save. You'll have to decide ahead of time if this strategy actually helps your use cases.
Secondly, it makes your write operations slightly slower. If write performance is important for your app, run some benchmarks to make sure the additional writes from the triggers don't degrade your app's perceived performance too much.
As with many decisions in life, there's no one-size-fits-all approach. You'll have to decide whether improving the usage and performance of your aggregates is worth the cost of additional, slower writes. Turso (and SQLite) excels at read-heavy workloads, and not so much at write-heavy ones, so you'll have to estimate what your application actually does in production to figure out what the actual savings will be.
Hop on over to the code on GitHub for plenty of examples of indexes, and of triggers that maintain cheap-to-query aggregates (especially filtered counts and averages).
If you're interested in talking about this with other Turso enthusiasts, I invite you to also join the Discord and post your questions in the help channel. We love a good technical discussion!