In the near future, databases will be free and plentiful. What does that change for you?
Databases are very expensive. In fact, they are one of the most expensive parts of any infrastructure operation. That’s why conventional thinking centers on a single large monolithic database accessed by many clients – it is treated as the only scalable way to operate.
I firmly believe that in the not-too-distant future, their cost will drop to the point where individual databases are essentially free, and that conventional database infrastructure models will flip on their heads – towards a microdatabase approach. This future was already taking shape, but it has been accelerated by the rise of AI agents that need thousands of ephemeral, disposable databases at scale for what is known as agentic memory.
With a strong belief in this future, our internal mission at Turso has been to drive the cost of a database to zero. This north star is what drives investment initiatives like our Deterministic Simulation Testing framework for massive multitenancy.
In this article, I will discuss the trends supporting this future, their consequences, and the use cases that free databases in potentially limitless numbers make possible.
When things become cheaper by orders of magnitude, patterns that were attractive but cost-prohibitive are catapulted forward. One example that comes to mind is the rise of personal computers in the age of mainframes.
When a computer was something bulky and expensive, the idea of using different computers for different purposes would have been seen as absurd. Thomas Watson of IBM is famously, if perhaps apocryphally, credited with saying that he anticipated a world market for maybe five computers.
What would later become clear is that there was huge demand for the many things a computer could do: personal gaming, educational aids, spreadsheets for small businesses, and so on. It is just that the value of each of those use cases was lower than the price tag of the computer.
Database companies today are oriented towards building one component that can sustain a million-dollar-a-year use case, which means they build technology for a world where provisioning a database is an important event.
A high price tag on the building block leads to centralization. There is pressure on the organization to have only a few databases – maybe a development and a production database. As a result of centralization, the database becomes a shared resource with access restricted to a subset of people, since anything that breaks it would disrupt the entire organization.
There is market pressure to add increasing levels of resilience to this one database where all the data is, making databases even more expensive and completing the cycle.
From the supply side of the equation, why are databases expensive?
Databases traditionally follow a client-server model. They are accessed over the wire, from a server that must always be running to serve requests whenever a client needs them. The performance of those requests is business-critical, since the latency of data access directly shapes the user experience.
In this world, creating a database is an expensive operation. It often involves spinning up VMs, Kubernetes Pods, or other structures to host the database. Those resources need to be always on, which incurs fixed costs. Replicating the database has similar costs, since you need a standby replica similar in size to your main process, just in case you need to switch over to it.
Having a central database also means it is much harder to move data closer to the user to support embedded, local-first, and on-device use cases. Since all data is shared, syncing that data to where it is used requires a sync engine that is smart about which data to sync, and it introduces security issues. It complicates the infrastructure instead of simplifying it.
But what if every developer in the company could interact with their own personal microdatabase, containing a replica of a portion of the dataset? The demand exists: it would allow organizations to move faster as developers become more independent, which leads to a shorter time to market. But the benefit has not been worth the cost of maintaining an entire database per employee.
Another use case is compliance and privacy. Much of the data stored in databases is user data that companies collect. If that data could be segregated into a separate database for each user, privacy would increase and compliance would come more easily. For some users the value already exceeds the cost – think of your most active or important customers – but not for the long tail of occasional users that most companies have to deal with.
Microdatabases with per-user data also allow the data to move as close as possible to the user, all the way to their mobile devices or browsers.
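To make the pattern concrete, here is a minimal sketch using nothing but plain SQLite files, one per user; the directory layout, schema, and function names are invented for the example and are not any particular product’s API:

```python
import sqlite3
from pathlib import Path

DB_DIR = Path("user_dbs")          # illustrative location: one SQLite file per user
DB_DIR.mkdir(exist_ok=True)

def db_for_user(user_id: str) -> sqlite3.Connection:
    """Open (and lazily create) the microdatabase holding only this user's data."""
    conn = sqlite3.connect(DB_DIR / f"{user_id}.db")
    conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, item TEXT)")
    return conn

def forget_user(user_id: str) -> None:
    """Erasure requests become trivial: the user's data is one file, so delete it."""
    (DB_DIR / f"{user_id}.db").unlink(missing_ok=True)

# Each user's queries only ever touch their own file, which also makes it easy
# to place that file in the user's region or sync it to their device.
with db_for_user("alice") as conn:
    conn.execute("INSERT INTO orders (item) VALUES (?)", ("espresso machine",))
```

With this shape, access control, regional placement, and deletion become questions about a single small file rather than about rows scattered through a shared schema.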
With LLMs, engineers are generating entire applications from English-language prompts, and even those who don’t go that far will use LLM-assisted tools. More than a quarter of new code at Google is already generated by AI.
The trade-off with those tools is that software becomes faster to write, but it is more error-prone and needs more iterations to get right. The pressure is to make those iterations as fast as possible.
Ideally, you would let LLM-generated code access your database, which is feasible if you could give it exclusive access to its own database. But provisioning a database for each iteration is challenging: there is value in the aggregate, but not enough value in any single iteration to justify provisioning a database for it.
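As a rough sketch of that idea, assuming plain SQLite as the disposable database, each iteration of generated code can get its own fresh in-memory database, so a bad iteration can never touch shared state; the function name and seed schema below are illustrative:

```python
import sqlite3

def run_iteration(generated_sql: str) -> None:
    """Give one iteration of LLM-generated code exclusive access to a fresh,
    disposable database; a broken iteration can only damage its own copy."""
    conn = sqlite3.connect(":memory:")  # created and thrown away in well under a millisecond
    conn.executescript("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);")  # toy seed schema
    try:
        conn.executescript(generated_sql)   # run whatever the agent produced this round
    finally:
        conn.close()                        # the database disappears with the connection
```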
Generative AI also increases the demand for on-device databases, as agentic workloads that rely on vector similarity search need to operate on slices of data that users want to keep completely private.
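For illustration only, agentic memory over a private, on-device store can be as simple as an embedded SQLite file plus a brute-force cosine-similarity scan; a real system would use a native vector index, and the file name, table, and toy embeddings here are made up:

```python
import sqlite3, struct, math

def pack(vec):                      # store embeddings as raw float32 blobs
    return struct.pack(f"{len(vec)}f", *vec)

def unpack(blob):
    return struct.unpack(f"{len(blob) // 4}f", blob)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

conn = sqlite3.connect("agent_memory.db")       # a private, on-device file
conn.execute("CREATE TABLE IF NOT EXISTS memories (text TEXT, embedding BLOB)")
conn.execute("INSERT INTO memories VALUES (?, ?)",
             ("met Alice at the cafe", pack([0.1, 0.9, 0.2])))

query = [0.1, 0.8, 0.3]                         # embedding of the agent's current question
best = max(conn.execute("SELECT text, embedding FROM memories"),
           key=lambda row: cosine(query, unpack(row[1])))
print(best[0])
```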
If the price of a database falls to zero, does that spell doom for database companies? My prediction is that there is no cause for concern for those who meet the new demand and change their form factor accordingly.
If the cost of a database drops to zero, the best form factor is no longer a single database but a collection of microdatabases. There is still tremendous value to be captured in the aggregate.
To be able to support a world where databases are free, you need to be able to create or delete a database in less than 50 milliseconds. This is close to the threshold where one would feel comfortable creating a new database during a web page load.
The majority of the individual databases will be seldom used, so the system needs to be able to respond to requests without cold starts regardless of how infrequently the database is used, with at most the added latency of a cache miss. And it needs to be able to bill requests individually, so that there are no charges for databases that are completely inactive.
Once the databases are small enough, it also needs to be able to make replicas of those databases individually, to serve users where they are.
A model for this is the one pioneered by SQLite: there is no “database” in the sense of a database server. The database becomes part of the application, so there is no fixed infrastructure to run.
Instead of being thought of as the combination of the storage layout and the engine code, the database itself is now just the storage, which can be represented as a flat file, and the engine becomes part of your application. This is known as an embedded database.
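A minimal sketch with Python’s built-in sqlite3 module shows what this means in practice: there is no server to provision, the engine ships inside the application, and the whole database is one ordinary file:

```python
import sqlite3

# No server process: the engine is linked into the application, and the
# entire database lives in a single ordinary file on disk.
conn = sqlite3.connect("app.db")
conn.execute("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello from an embedded database",))
conn.commit()

print(conn.execute("SELECT body FROM notes ORDER BY id").fetchone()[0])
conn.close()
# Moving or replicating the database is just copying app.db.
```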
Free and plentiful databases that can be moved around easily will turn conventional thinking about database architecture on its head, from one large monolithic database to a collection of microdatabases. This will greatly simplify backend infrastructure: permissioning, data isolation, geographical compliance, and latency all become easier problems. We’re here to build that future, and we invite you to join us at turso.ai and try it for yourself – our free tier offers 500 databases, and it quickly scales up to unlimited databases from there.