Aug 23, 2022

Never ever use a database as a message queue. Do this instead.


Glauber Costa


Modern real-time oriented backends depend on a server-side message queue in order to deliver a timely experience to users.
Fast-moving application developers skip message queues because they're hard to understand and build, but this is a mistake.
Borrowing from concepts like ORMs in the database world, higher abstraction can help fix common architectural issues while reducing complexity.

If you are an application developer and need a full backend, including the data layer, you have two choices:

Rely on a backend team.
Roll your own, with tools that bring the level of abstraction closer to you.

Relying on a backend team may be the solution we all want. But often, especially with nascent projects, we can't afford it.

Developers want to push something production-ready quickly, and must work at a high level of abstraction, which is why we have things like Firebase, or tools like ORMs to simplify data persistence.

In this post we'll show how the lack of good abstractions leads to build-your-own backends that are missing a key component present in most modern backends: a message queue.

#But what is a backend?

If you ask most engineers, a backend consists of a database, a business logic layer to mediate access to that data, and user authentication.

But if you ask modern backend engineers, the answer differs. Yes, you have all that, but there's a major component missing in this picture: a message queue.

From newer options like Redpanda and Upstash, to established systems like Kafka, message queues are an integral part of most professional backends in use today.

#Why message queues?

Users increasingly expect a real-time experience. For use cases like order flows, webhooks, and user tracking, users expect to be able to see the new data in the user interface instantly, instead of having to wait for some background batch processing to periodically refresh.

Many things that underpin the real-time experience are event-driven in nature, and map naturally to a message queue.

Let's look, for example, at the issue of processing webhooks: your backend service receives a stream of HTTP events. They usually (but not always) have to be processed in order. Events that arrive cannot be lost, and they must be processed exactly once. Processing an event usually means:

Transforming it to an internal format (usually JSON-based) that makes sense for your application.
Processing the event (with side effects), which may involve invoking other web services or APIs.
Saving the end result of that event in a database.
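The exactly-once requirement above is commonly approximated by tolerating redelivery and making the handler idempotent, so that a duplicate has no effect. The following is only a minimal sketch of that idea, not the article's implementation: `IncomingEvent`, `applyOnce`, and the in-memory `Set` standing in for a persistent store of processed IDs are all hypothetical.

```typescript
// Hypothetical event shape, used for illustration only.
interface IncomingEvent {
  id: string;
  amount: number;
}

// Stand-ins for persistent state; a real system would keep both in a database.
const processedIds = new Set<string>();
let balance = 0;

// Applies an event unless its ID was already seen.
// Returns true if the event was applied, false if it was a duplicate.
function applyOnce(event: IncomingEvent): boolean {
  if (processedIds.has(event.id)) {
    return false; // redelivered event: skip the side effect
  }
  balance += event.amount;
  processedIds.add(event.id);
  return true;
}

// Usage: the second delivery of evt-1 is ignored.
const outcomes = [
  applyOnce({ id: "evt-1", amount: 10 }),
  applyOnce({ id: "evt-1", amount: 10 }),
  applyOnce({ id: "evt-2", amount: 5 }),
];
```

In a real backend the side effect and the record of the processed ID would need to be written atomically, which this in-memory sketch does not attempt to show.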

Stream-processing systems and message queues are the right solution to building responsive applications that users experience as real-time. However, if you are an application developer cooking an amazing prototype for your new cool idea, you are likely using a database instead.

#Can't I use a database as a message queue?

Message queues can be notoriously hard to manage and scale, and they are yet another programming model that you have to learn.

If you have never dealt with this problem before, a naive database-based implementation would be like the one below:

Handling events is error-prone. The naive solution won't work reliably in the face of failures.
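As a rough sketch of what such a naive handler might look like, the code below runs the transform, process, and save steps inline. The `Map` is a stand-in for a database table, and every name here (`WebhookRequest`, `OrderEvent`, `naiveHandler`, and so on) is hypothetical rather than taken from the original example.

```typescript
// Hypothetical shapes; none of these names come from a real library.
interface WebhookRequest {
  id: string;
  body: string;
}

interface OrderEvent {
  id: string;
  orderId: string;
  amount: number;
}

// Stand-in for a database table keyed by event ID.
const ordersTable = new Map<string, OrderEvent>();

// Step 1: transform the raw payload to an internal format.
function transform(req: WebhookRequest): OrderEvent {
  const raw = JSON.parse(req.body); // throws on malformed input
  return { id: req.id, orderId: String(raw.order_id), amount: Number(raw.amount) };
}

// Step 2: process the event. Calls to other services would go here.
function processEvent(event: OrderEvent): OrderEvent {
  if (!Number.isFinite(event.amount)) {
    throw new Error(`bad amount in event ${event.id}`);
  }
  return event;
}

// Step 3: save the end result.
function save(event: OrderEvent): void {
  ordersTable.set(event.id, event);
}

// Naive handler: all three steps run inline. If any step throws,
// nothing recorded that the event ever arrived.
function naiveHandler(req: WebhookRequest): void {
  save(processEvent(transform(req)));
}

// Usage: the first event is stored, the second leaves no trace.
naiveHandler({ id: "evt-1", body: '{"order_id": 42, "amount": 9.5}' });
let secondEventLost = false;
try {
  naiveHandler({ id: "evt-2", body: "not json" });
} catch {
  secondEventLost = true;
}
```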

That may work for the initial testing, but now we have to deal with the failure modes and delivery semantics:

If we fail to transform the data, fail to process, or fail to save, the event may be lost.
Retry policies may increase reliability. But what if the code just crashes?

You can certainly keep piling up patches to these issues, but ultimately, the time comes to move to a more serious architecture:

With some specialized knowledge, we can have something slightly better. Still, it may have issues scaling the event handlers, may be slow to react, and so on. Is that how the pros do it?

Now your initial webhook handling code does nothing but save the event. Simple, atomic, worry-free.

Later, with some background batch process you can read pending events, transform, process, and when you are confident they were handled, delete the event from the queue.
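The save-first, process-later pattern described above could be sketched roughly as follows. This assumes an in-memory array in place of the pending-events table, and the names `receive`, `pollOnce`, and `PendingEvent` are hypothetical. For simplicity, a failed event here does not block the ones behind it, which would not be acceptable when strict ordering is required.

```typescript
// Hypothetical row shape for the pending-events table.
interface PendingEvent {
  id: number;
  payload: string;
  attempts: number;
}

const pendingEvents: PendingEvent[] = []; // stand-in for the queue table
const results = new Map<number, unknown>(); // stand-in for the results table
let nextId = 1;

// Webhook handler: does nothing but persist the raw payload.
function receive(payload: string): number {
  const id = nextId++;
  pendingEvents.push({ id, payload, attempts: 0 });
  return id;
}

// One polling pass: handle pending events in arrival order and delete
// each one only after it was handled successfully.
function pollOnce(handle: (payload: string) => unknown): number {
  let done = 0;
  for (const ev of [...pendingEvents]) {
    try {
      results.set(ev.id, handle(ev.payload));
      pendingEvents.splice(pendingEvents.indexOf(ev), 1);
      done++;
    } catch {
      ev.attempts++; // the event stays in the table for the next poll
    }
  }
  return done;
}

// Usage: one event succeeds, the malformed one remains pending.
receive('{"order_id": 1}');
receive("not json");
const handledCount = pollOnce((payload) => JSON.parse(payload));
```

Note that a crash between storing the result and deleting the event would cause that event to be handled again on the next poll, so this sketch gives at-least-once rather than exactly-once behavior.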

This works reasonably well, and takes you further. But it also has problems:

You are now polling the queue, and getting the polling period right is problematic.
You can make it better with database triggers, but as your list of pending events grows, picking them up adds too much load to the database.
You can add more indexed fields to speed up polling, but… well, you are now back to requiring specialist-level knowledge to operate this well.

But when we look at tailored backends in the open, especially in architectures for FAANG, MAMAA, or whatever the acronym is until the next time Zuck decides to pivot Facebook, things look different. More often than not, you will have a message queue handling those events:

Ah, now we're talking! Give me Kafka or give me death! — Patrick Henry (maybe)
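To give a feel for why a log-based queue fits this job, here is a toy in-memory model of the idea: records are appended to a log, and each consumer group tracks its own committed offset instead of deleting rows. This is not Kafka's API. `TopicLog` and its methods are hypothetical names invented for this sketch.

```typescript
// Toy append-only log with per-group committed offsets.
class TopicLog<T> {
  private readonly records: T[] = [];
  private readonly committed = new Map<string, number>();

  // Appends a record and returns its offset.
  append(record: T): number {
    this.records.push(record);
    return this.records.length - 1;
  }

  // Returns up to `max` records the group has not committed yet.
  fetch(group: string, max: number): Array<{ offset: number; value: T }> {
    const start = this.committed.get(group) ?? 0;
    return this.records
      .slice(start, start + max)
      .map((value, i) => ({ offset: start + i, value }));
  }

  // Marks everything up to and including `offset` as handled by the group.
  commit(group: string, offset: number): void {
    this.committed.set(group, offset + 1);
  }
}

// Usage: two groups read the same log independently.
const topic = new TopicLog<string>();
["a", "b", "c"].forEach((v) => topic.append(v));

const seenByBilling: string[] = [];
for (const rec of topic.fetch("billing", 2)) {
  seenByBilling.push(rec.value);
  topic.commit("billing", rec.offset); // commit only after handling
}

const remainingForBilling = topic.fetch("billing", 10).map((r) => r.value);
const seenByAudit = topic.fetch("audit", 10).map((r) => r.value);
```

Because nothing is deleted on read, a second consumer group can replay the same events, and a consumer that crashes before committing simply fetches the same records again.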

If message queues make so much sense, why are they left out of the first implementation of application backends?

Because database patterns are well understood. Hosted database products are everywhere. ORMs that abstract the database are prevalent, and do their best to hide implementation details. The situation is very different for message queues. I can't think of a single, recognizable, brand name ORM-like library that will abstract Kafka in a way that makes sense for application developers.

Using a message queue is important… but not the right abstraction level for most application developers.

#ChiselStrike raises the abstraction level

ChiselStrike provides a high level abstraction layer over an entire backend that can take you from prototype to production. You can focus on coding what you need, at an abstraction level that you understand, and it'll handle the mapping to backend components.

In an earlier post, we discussed how this enables you to write pure TypeScript, and our compiler worries about generating queries, indexes, and anything that you may need to handle your database load without worrying about database concepts.

With our new 0.12 release, we finally take an important step towards our vision of bringing the power of modern, robust backends to application developers: you can now connect a Kafka-compatible streaming service like Redpanda or Upstash to ChiselStrike, and write TypeScript to implement any logic you need to run on that queue. This still assumes the presence of a message queue, but already unlocks important use cases where your backend is now part of a broader architecture that includes a message queue. In future releases, we'll also internalize the message queue, the same way we internalize the database today, allowing you to not even worry about deploying it at all.

#Where to go from here?

ChiselStrike bridges the abstraction gap by bringing message queues into the fold, and even allowing you to use the same types interchangeably between your database code and your events code.

If you want to try the example in this post by yourself, here's a complete setup, including instructions about publishing to a local Kafka topic.
