
A Look Back on BugBash 2026

Mikaël Francoeur

Last week our CEO Glauber Costa and I attended BugBash 2026, a software correctness conference hosted by Antithesis. And while he got nerd-sniped and spent the whole conference on his computer, I had a chance to attend some fascinating talks. This is my retrospective on this great conference, and some thoughts on what it all means for the industry.

#Amdahl's Law, Coding Agents, and Testing

Will Wilson opened the conference with an observation: the correctness industry is going mainstream. Google search trends for "formal methods," "deterministic simulation testing," and "correctness" all peaked in 2025.

What I found interesting was the parallel he drew with Amdahl's law, a formula that bounds the overall speedup you get from optimizing one part of a system. It shows that the gain is limited by the fraction of total time spent in the part being optimized.

Will's thesis is that as coding agents speed up coding itself, testing comes to dominate an engineer's time, and testing is therefore where the measurable gains will be for the foreseeable future.
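The arithmetic behind this argument is easy to work through. Here is a minimal sketch of Amdahl's law, with a hypothetical time split (the 40/60 figures are illustrative, not from the talk):

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of total time is sped up by a factor s."""
    return 1 / ((1 - p) + p / s)

# Suppose coding is 40% of an engineer's time and testing the rest.
# Even a near-infinitely fast coding agent caps the overall speedup
# at 1 / (1 - 0.4) ≈ 1.67x — the testing 60% becomes the bottleneck.
print(round(amdahl_speedup(0.4, 1000), 2))  # 1.67
```

In other words, once coding itself is cheap, further gains have to come from whatever still consumes the time, which on this view is testing.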

At Turso, we're proud to be ahead of this curve. We've invested extensively in deterministic simulation testing, differential testing, various deterministic fuzzers written in Rust, and sqltest, a DSL for example tests. We're also exploring formal methods to verify complex algorithms like VACUUM.
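To illustrate the core idea behind deterministic simulation testing, here is a toy sketch (not Turso's actual harness): all nondeterminism in a run flows from a single seeded PRNG, so any failing run can be replayed exactly from its seed.

```python
import random

def run_simulation(seed, steps=100):
    # A single seeded PRNG is the only source of nondeterminism,
    # so the entire run is a pure function of the seed.
    rng = random.Random(seed)
    state = 0
    checkpoints = []
    for _ in range(steps):
        op = rng.choice(["write", "fsync", "crash"])
        if op == "write":
            state += 1                # buffered write
        elif op == "fsync":
            checkpoints.append(state) # durable checkpoint
        else:  # injected crash: recover from the last durable checkpoint
            state = checkpoints[-1] if checkpoints else 0
    return state

# Same seed, same run: a failing seed is a perfect reproduction.
assert run_simulation(7) == run_simulation(7)
```

A real harness simulates disks, clocks, and network delivery the same way, which is what makes rare interleavings reproducible on demand.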

This division of time between coding and testing gets the point across that engineers will likely spend more of their time thinking about testing than they used to. But as some attendees pointed out, it doesn't account for the thinking part of software engineering. I expect a similar argument will be made for the optimization of decision tools in fields like industrial engineering.

#Some Thoughts on Correctness

BugBash is a conference focused on correctness, with many attendees working in areas like property-based testing and formal methods. Something I sensed throughout the conference was a shared feeling that the software engineering community was finally starting to care about correctness. And this is where I have to disagree. Not with the sentiment, but with the impression that the engineering community didn't care, or didn't care enough, about correctness.

Correctness, just like availability, is a goal or a measure that can be optimized for, with the kinds of tradeoffs that are typical of engineering disciplines. Not all software requires absolute correctness. Nuclear control software, cryptocurrency protocols, critical components of databases, and kernels used in space exploration are all prime targets for heavier verification, because a failure in one of these systems can be catastrophic. But a failure in the 2048 game or in an online library catalog has relatively little consequence. Good engineering is about understanding and exploiting these tradeoffs to balance safety, profitability, and opportunity.

Last year at BugBash, Ankush Desai shared his conviction that so-called semiformal methods (anything that gets you to reason about code before implementing it) are already a good start on the way to correctness, something I heard again this year from multiple speakers. Since the term "spec-driven development" seems to have gained some traction, I hope that it will be understood as an invitation to reflection and careful design, rather than a superficial invitation to Shift-Tab to Plan Mode in your favorite coding agent.

#From My Conference Notes

I took a lot of notes. Here are some notable ones:

  • LLM hallucination is mathematically inevitable, and self-validation can't fix it, as demonstrated by Xu et al. (2024). It means that whatever frameworks we use to integrate LLMs will need to be resilient to hallucinations.
  • Gabriela Moreira, CEO of Quint, suggested the following algorithm to evaluate the potency of a testing rig. First, think of the rarest situation you can imagine about your system. Second, see if your tools can reproduce it. Third, assess how much time or depth it took.
  • Formal methods researchers acknowledge that formal methods won't truly go mainstream until the most expensive option is to not use formal methods. In recent years, multiple verification languages have sprouted to bridge the gap between mathematicians and engineers. LemmaScript lets you write verification hints in TypeScript comments, which then compile down to Dafny. FizzBee has a syntax similar to Python. Quint layers a more intuitive syntax on top of TLA+. Aretta seeks to make verification more accessible to LLMs. The cost of verification is still too high for many applications, but the threshold is constantly being lowered, through innovations (new languages) and shifting power balances (LLMs writing more code).
  • Peter Alvaro shared a tracing and fault injection tool he worked on called Box of Pain. This is a good metaphor for what a test harness should be in relation to the implementation. I will now be thinking of my tests as little boxes of pain for my code.

#Final Thoughts

This was probably my 10th software conference. I usually try to go to as many talks as I can, but this was the first conference where I felt that I had been missing out on all the human connections. So on day 2, I spent some more time in the so-called hallway track, trading ideas with fascinating people. These exchanges are what I remember most.

One final note: the talks from BugBash 2025 are all online.