
Testing Without Mocks

Open the test suite of almost any business application and start counting the mocks. A mock for the repository, a mock for the clock, a mock for the email gateway, a mock for the payment provider. By the time the test finally reaches the code it was meant to check, half of the system has been replaced by stand-ins that you wrote yourself. The test runs, it turns green, and it tells you something. The question is what.

Here is the uncomfortable part: a test that passes because of its mocks can be worse than no test at all. Having no test leaves you knowingly uncertain. A green mock-driven test hands you confidence that nobody has any right to give you, because nothing guarantees that the mock still resembles the real thing. Mocks are not a testing strategy. They are a workaround for two problems: domain logic that is tangled up with infrastructure, and infrastructure that is too slow to touch in a test. Event Sourcing removes the first. A database that starts in milliseconds removes the second. The rest of this post shows what testing looks like once you stop pretending.

Why We Reach for Mocks

Mocks did not appear out of nowhere. They solve a real problem. In a typical state-based system, your business logic is entangled with the things it depends on. The code that decides whether an order may ship also loads the order from a database, asks a clock for the current time, and calls a shipping service. You cannot exercise the decision without dragging all of that along. So you replace each dependency with a mock, and suddenly the decision is testable in isolation.

That much is reasonable. The trouble starts with what you are left asserting. When behavior hides inside state mutations and side effects, there is often no return value that captures what happened – so the test falls back on the only thing it can see: which methods were called, in which order, with which arguments. You stop testing what the system did and start testing how it was wired.

This is the tautology trap. You tell the mock how to behave, you run the code, and then you assert that the code did what you just told the mock to expect. The test passes by construction. It will keep passing even if the real dependency changed its contract last week, because the mock never got the memo.
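To see the trap in miniature, here is a TypeScript sketch of a state-based handler and a typical mock-driven test for it. The `OrderRepository` and `ShippingService` interfaces and the `shipOrder` function are hypothetical, invented for illustration, and the mocks are hand-rolled rather than taken from a mock framework.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical collaborators, declared by the test author, not by the real services.
interface OrderRepository {
  load(id: string): { id: string; paid: boolean };
}
interface ShippingService {
  ship(orderId: string): void;
}

// The handler loads state, decides, and calls out, all in one place.
function shipOrder(id: string, repo: OrderRepository, shipping: ShippingService): void {
  const order = repo.load(id);
  if (order.paid) {
    shipping.ship(order.id);
  }
}

// Hand-rolled mocks: the repository returns whatever we tell it to,
// and the shipping service only records that it was called.
const shipCalls: string[] = [];
const repoMock: OrderRepository = { load: (id) => ({ id, paid: true }) };
const shippingMock: ShippingService = {
  ship: (orderId) => {
    shipCalls.push(orderId);
  },
};

shipOrder("order-1", repoMock, shippingMock);

// The assertion checks wiring, not outcome: we told the mock the order is paid,
// so ship() was bound to be called. It stays green even if the real shipping
// service has started rejecting these orders, because it is never involved.
assert.deepEqual(shipCalls, ["order-1"]);
```

The only thing the final assertion can check is that `ship` was called, and the mocked repository already made that inevitable.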

And that is the deeper danger. A test that turns green because of its mocks is not neutral; it actively misleads. A missing test leaves a visible gap, and everyone knows the code is unverified. A green test that only confirms its own mocks looks like proof. The trouble surfaces only when someone realizes that nobody ever guaranteed the mock reflects reality. By then the suite has been vouching for code it never actually exercised, and the bug it was supposed to catch is already in production.

The Domain Becomes a Function

Event Sourcing changes the shape of the problem. A command handler in an event-sourced system does not load mutable state and poke at it. It takes the events that have already happened, looks at the command, and returns the new events that should follow. Its entire job is to answer one question: given this history, what happens next?

We explored the underlying structure in Decide, Evolve, Repeat: a decide function with the signature (command, state) -> events, where the state itself is folded from past events. Whether you express it as a decider, a functional core, or a plain method, the essence is the same. The decision is a pure function of events in and events out. It reads nothing, writes nothing, and depends on nothing but its arguments.
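For the library domain used later in this post, a decider along these lines might look as follows. This is a minimal TypeScript sketch, not the code from Decide, Evolve, Repeat: the event, command, and state shapes are assumptions, and rejections are modeled as thrown errors for brevity.

```typescript
import { strict as assert } from "node:assert";

// Illustrative event and command shapes for the library example.
type LibraryEvent =
  | { type: "BookAcquired"; isbn: string }
  | { type: "BookBorrowed"; borrowedBy: string }
  | { type: "BookReturned" };

type LibraryCommand =
  | { type: "AcquireBook"; isbn: string }
  | { type: "BorrowBook"; borrowedBy: string }
  | { type: "ReturnBook" };

type BookState =
  | { status: "unknown" }
  | { status: "available"; isbn: string }
  | { status: "borrowed"; isbn: string; borrowedBy: string };

const initialState: BookState = { status: "unknown" };

// evolve: fold one past event into the state.
function evolve(state: BookState, event: LibraryEvent): BookState {
  switch (event.type) {
    case "BookAcquired":
      return { status: "available", isbn: event.isbn };
    case "BookBorrowed":
      if (state.status === "unknown") return state;
      return { status: "borrowed", isbn: state.isbn, borrowedBy: event.borrowedBy };
    case "BookReturned":
      if (state.status === "unknown") return state;
      return { status: "available", isbn: state.isbn };
  }
}

// decide: (command, state) -> events. No I/O, no clock, no services.
function decide(command: LibraryCommand, state: BookState): LibraryEvent[] {
  switch (command.type) {
    case "AcquireBook":
      if (state.status !== "unknown") throw new Error("book already acquired");
      return [{ type: "BookAcquired", isbn: command.isbn }];
    case "BorrowBook":
      if (state.status !== "available") throw new Error("book is not available");
      return [{ type: "BookBorrowed", borrowedBy: command.borrowedBy }];
    case "ReturnBook":
      if (state.status !== "borrowed") throw new Error("book is not borrowed");
      return [{ type: "BookReturned" }];
  }
}

// Given this history, what happens next?
const history: LibraryEvent[] = [{ type: "BookAcquired", isbn: "978-1491950357" }];
const state = history.reduce(evolve, initialState);
const next = decide({ type: "BorrowBook", borrowedBy: "/readers/23" }, state);
assert.deepEqual(next, [{ type: "BookBorrowed", borrowedBy: "/readers/23" }]);
```

Keeping `evolve` and `decide` separate means the state is never stored or mutated in place; it is recomputed from events whenever a decision is needed.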

Compare that with the state-based handler from earlier. To decide whether an order may ship, it had to load the order, ask a clock for the time, and reach out to a shipping service – three collaborators, three mocks. In an event-sourced handler those concerns simply are not there. The history arrives as an argument, the time the decision depends on is passed in as a value rather than pulled from a clock it has to call, and talking to other systems is somebody else's job, handled at the edges where no decision is made. There is nothing left in the middle to fake.
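To illustrate passing time in as a value, here is a hypothetical TypeScript sketch of the shipping decision. The order events, the `ShipOrder` command with its `now` field, and the 14-day shipping window are all invented for the example. The point is only that the decision receives the time instead of calling a clock, and that it emits an `OrderShipped` event rather than calling a shipping service itself.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical order events; the shipping-window rule is invented for the example.
type OrderEvent =
  | { type: "OrderPlaced"; placedAt: string }
  | { type: "OrderPaid" }
  | { type: "OrderShipped"; shippedAt: string };

// The command carries "now" as a plain ISO timestamp instead of asking a clock.
type ShipOrder = { type: "ShipOrder"; now: string };

const maxDaysToShip = 14;
const millisecondsPerDay = 24 * 60 * 60 * 1000;

function decideShip(history: OrderEvent[], command: ShipOrder): OrderEvent[] {
  // Fold the history into the few facts this decision needs.
  let placedAt: string | undefined = undefined;
  let paid = false;
  let shipped = false;
  for (const event of history) {
    if (event.type === "OrderPlaced") placedAt = event.placedAt;
    if (event.type === "OrderPaid") paid = true;
    if (event.type === "OrderShipped") shipped = true;
  }

  if (placedAt === undefined || !paid || shipped) {
    throw new Error("order cannot be shipped");
  }
  const ageInDays = (Date.parse(command.now) - Date.parse(placedAt)) / millisecondsPerDay;
  if (ageInDays > maxDaysToShip) {
    throw new Error("shipping window has closed");
  }

  // Emitting the event is the whole job; calling the carrier happens at the edge.
  return [{ type: "OrderShipped", shippedAt: command.now }];
}

// No clock to freeze: the test simply chooses the time it wants.
const history: OrderEvent[] = [
  { type: "OrderPlaced", placedAt: "2025-01-01T10:00:00Z" },
  { type: "OrderPaid" },
];
assert.deepEqual(decideShip(history, { type: "ShipOrder", now: "2025-01-03T10:00:00Z" }), [
  { type: "OrderShipped", shippedAt: "2025-01-03T10:00:00Z" },
]);
```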

A pure function is the easiest thing in the world to test. There is no clock to freeze, no repository to fake, no service to intercept. You hand it values and you inspect the values it returns. There is, quite literally, nothing to mock, because there is no collaborator to stand in for.

Given, When, Then, and Nothing Else

Because the handler is pure, tests fall into a shape that reads like a sentence: given a history of past events, when a command arrives, then expect a specific set of new events. The pattern is known as Given-When-Then, and it is the backbone of the testing guidance in our documentation.

Take a small library domain, the same one we have used throughout this blog. A book can be acquired, borrowed, and returned. To check that borrowing an available book works, you spell out the history, apply the command, and assert on what comes back:

given(
  BookAcquired { isbn: "978-1491950357" }
)
when(
  BorrowBook   { borrowedBy: "/readers/23" }
)
then(
  BookBorrowed { borrowedBy: "/readers/23" }
)

There is no database in sight, no mock framework, no test doubles to configure. The events under given are the entire world the handler knows about, and the events under then are the whole of its behavior.
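The given/when/then notation is language-neutral. One minimal way to make it executable in TypeScript is a small chain of plain functions. The `given`/`when`/`thenExpect` helper and the `handle` function below are hypothetical. The last step is named `thenExpect` rather than `then` so the object is not mistaken for a Promise.

```typescript
import { strict as assert } from "node:assert";

type LibraryEvent =
  | { type: "BookAcquired"; isbn: string }
  | { type: "BookBorrowed"; borrowedBy: string }
  | { type: "BookReturned" };

type LibraryCommand = { type: "BorrowBook"; borrowedBy: string } | { type: "ReturnBook" };

// A pure handler: replays the history, then decides on the command.
function handle(history: LibraryEvent[], command: LibraryCommand): LibraryEvent[] {
  let acquired = false;
  let borrowed = false;
  for (const event of history) {
    if (event.type === "BookAcquired") acquired = true;
    if (event.type === "BookBorrowed") borrowed = true;
    if (event.type === "BookReturned") borrowed = false;
  }
  switch (command.type) {
    case "BorrowBook":
      if (!acquired || borrowed) throw new Error("book is not available");
      return [{ type: "BookBorrowed", borrowedBy: command.borrowedBy }];
    case "ReturnBook":
      if (!borrowed) throw new Error("book is not borrowed");
      return [{ type: "BookReturned" }];
  }
}

// Hypothetical test helper: given events, when a command, expect these events.
function given(...history: LibraryEvent[]) {
  return {
    when: (command: LibraryCommand) => ({
      thenExpect: (...expected: LibraryEvent[]) =>
        assert.deepEqual(handle(history, command), expected),
    }),
  };
}

given({ type: "BookAcquired", isbn: "978-1491950357" })
  .when({ type: "BorrowBook", borrowedBy: "/readers/23" })
  .thenExpect({ type: "BookBorrowed", borrowedBy: "/readers/23" });
```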

The rejection cases are just as direct, and this is where the approach earns its keep. To prove that you cannot borrow a book that is already out, you add a BookBorrowed to the history and expect the command to fail, or to produce a rejection event instead. You are asserting on a real outcome, not on whether some isAvailable flag happened to be read. The test pins down behavior, not implementation. Rename a private field, restructure the internals, even switch to another language, and as long as the events going in and out stay the same, the test stays green for the right reason.
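A sketch of the rejection case in TypeScript, assuming the rejection-event variant. The `BorrowingRejected` event and its `reason` field are illustrative names, not part of any prescribed schema. The test compares only the events that come out, so the internal `available` variable could be renamed or removed without affecting it.

```typescript
import { strict as assert } from "node:assert";

type LibraryEvent =
  | { type: "BookAcquired"; isbn: string }
  | { type: "BookBorrowed"; borrowedBy: string }
  | { type: "BookReturned" }
  | { type: "BorrowingRejected"; borrowedBy: string; reason: string };

function borrowBook(history: LibraryEvent[], borrowedBy: string): LibraryEvent[] {
  // Derive the one fact this decision needs: is the book currently on the shelf?
  let available = false;
  for (const event of history) {
    if (event.type === "BookAcquired" || event.type === "BookReturned") available = true;
    if (event.type === "BookBorrowed") available = false;
  }
  return available
    ? [{ type: "BookBorrowed", borrowedBy }]
    : [{ type: "BorrowingRejected", borrowedBy, reason: "book is not available" }];
}

// Given an acquired and already borrowed book, when another reader borrows it,
// then the outcome is a rejection event.
const history: LibraryEvent[] = [
  { type: "BookAcquired", isbn: "978-1491950357" },
  { type: "BookBorrowed", borrowedBy: "/readers/23" },
];
assert.deepEqual(borrowBook(history, "/readers/42"), [
  { type: "BorrowingRejected", borrowedBy: "/readers/42", reason: "book is not available" },
]);
```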

But What About the Database?

That handles the first reason people reach for mocks. The second is heavier: infrastructure is slow. Spinning up a real database for every test is the kind of thing that turns a test run into a coffee break, so teams mock the repository, the store, the persistence layer, and accept the fidelity gap as the price of a fast suite. It is the same gap as before, only now it sits between your code and the thing that actually stores your data.

EventSourcingDB removes the trade-off, because it is built to start and stop in milliseconds. You can launch it with the --data-directory-temporary flag, which gives each test an isolated, ephemeral store: no files left behind, no broker to coordinate, no external dependency to provision. The database your test talks to is the very same database that runs in production, only empty and disposable.

That changes what an integration test can be. Instead of asserting against a fake repository that you hope behaves like the real store, you write events to the actual database, read them back, and verify that they were persisted, ordered, and validated exactly as they will be in production. You can subscribe through the streaming API and confirm that observation, reconnection, and continuation behave correctly. None of it is simulated, so none of it can quietly drift away from reality.
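A write-then-read round trip against the real store might look like the TypeScript sketch below. It assumes that EventSourcingDB's HTTP API exposes `POST /api/v1/write-events` and `POST /api/v1/read-events`, authenticated with a bearer token, and that reading returns newline-delimited JSON with one `{"type": "event", "payload": {...}}` object per line. Check these paths and shapes against the current API reference before relying on them. `roundTrip` and `parseReadEventsBody` are hypothetical helper names.

```typescript
// Assumed shapes, based on our reading of the EventSourcingDB HTTP API.
type EventCandidate = {
  source: string;
  subject: string;
  type: string;
  data: Record<string, unknown>;
};
type StoredEvent = EventCandidate & { id: string; time: string };

// Parse an NDJSON body, keeping only lines that carry an event payload.
function parseReadEventsBody(body: string): StoredEvent[] {
  return body
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as { type: string; payload?: StoredEvent })
    .filter((item) => item.type === "event" && item.payload !== undefined)
    .map((item) => item.payload as StoredEvent);
}

// Hypothetical helper: write the candidates, then read the subject back.
async function roundTrip(
  url: string,
  apiToken: string,
  candidates: EventCandidate[],
  subject: string,
): Promise<StoredEvent[]> {
  const headers = {
    authorization: `Bearer ${apiToken}`,
    "content-type": "application/json",
  };

  const written = await fetch(`${url}/api/v1/write-events`, {
    method: "POST",
    headers,
    body: JSON.stringify({ events: candidates }),
  });
  if (!written.ok) throw new Error(`write-events failed with ${written.status}`);

  const read = await fetch(`${url}/api/v1/read-events`, {
    method: "POST",
    headers,
    body: JSON.stringify({ subject, options: { recursive: false } }),
  });
  if (!read.ok) throw new Error(`read-events failed with ${read.status}`);

  return parseReadEventsBody(await read.text());
}
```

An integration test would call `roundTrip` against a freshly started instance and assert that the returned events carry the written types and data in the same order.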

If you want to feel the difference, the Getting Started guide has you running EventSourcingDB in about a minute, and the same ephemeral mode that makes the first launch effortless is exactly what you point your test suite at. A database you can start in milliseconds is a database you never have to mock.
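One way to point a test suite at the ephemeral mode is to start a fresh instance per test file. The TypeScript sketch below assumes that Docker is installed, that the image is `thenativeweb/eventsourcingdb`, that the `run` flags shown (including `--data-directory-temporary`) match the Getting Started guide, and that the server answers `GET /api/v1/ping` once it is ready. `dockerArgs` and `startEphemeralDb` are hypothetical helpers, and the polling budget of roughly ten seconds is an arbitrary choice that also leaves room for the container itself to start.

```typescript
import { spawn } from "node:child_process";
import { setTimeout as sleep } from "node:timers/promises";

// Docker arguments for a throwaway instance (flags assumed from the Getting
// Started guide; --data-directory-temporary keeps nothing on disk).
function dockerArgs(port: number, apiToken: string): string[] {
  return [
    "run",
    "--rm",
    "-p",
    `${port}:3000`,
    "thenativeweb/eventsourcingdb",
    "run",
    `--api-token=${apiToken}`,
    "--data-directory-temporary",
    "--http-enabled",
    "--https-enabled=false",
  ];
}

// Start an instance and wait until it answers the (assumed) ping endpoint.
async function startEphemeralDb(
  port = 3000,
  apiToken = "secret",
): Promise<{ url: string; stop: () => void }> {
  const child = spawn("docker", dockerArgs(port, apiToken), { stdio: "ignore" });
  const url = `http://localhost:${port}`;

  for (let attempt = 0; attempt < 200; attempt++) {
    try {
      const response = await fetch(`${url}/api/v1/ping`);
      if (response.ok) {
        return { url, stop: () => { child.kill(); } };
      }
    } catch {
      // Not accepting connections yet; retry after a short pause.
    }
    await sleep(50);
  }

  child.kill();
  throw new Error("EventSourcingDB did not become ready in time");
}
```

A test would call `startEphemeralDb()` in its setup hook and `stop()` in its teardown; because the data directory is temporary, nothing from one run leaks into the next.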

Testing the Read Side by Replaying

So far we have looked at the write side, where decisions turn commands into events. The read side has its own testing story, and it leans on the same idea. A projection takes a stream of events and folds them into a read model. That, too, is a function from events to a result, which means you can test it the same way: feed it a known history and assert on the model it produces.

This is where replay becomes a testing tool. You write a series of domain events, run the projection against them from scratch, and check that the resulting read model matches what you expect. There is nothing to mock, because the events are the input and the model is the output. The fixtures are not rows you seeded into tables; they are the events that would really have occurred.
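A replay test for a projection can be this small. In this TypeScript sketch every event carries the ISBN so the read model can be keyed by book, and the `LoansByReader` read model is an invented example. The test folds the history from an empty model and compares the result.

```typescript
import { strict as assert } from "node:assert";

type LibraryEvent =
  | { type: "BookAcquired"; isbn: string }
  | { type: "BookBorrowed"; isbn: string; borrowedBy: string }
  | { type: "BookReturned"; isbn: string };

// Read model: which books each reader currently holds (illustrative shape).
type LoansByReader = Record<string, string[]>;

function project(model: LoansByReader, event: LibraryEvent): LoansByReader {
  switch (event.type) {
    case "BookAcquired":
      return model;
    case "BookBorrowed": {
      const current = model[event.borrowedBy] ?? [];
      return { ...model, [event.borrowedBy]: [...current, event.isbn] };
    }
    case "BookReturned": {
      const next: LoansByReader = {};
      for (const [reader, isbns] of Object.entries(model)) {
        const remaining = isbns.filter((isbn) => isbn !== event.isbn);
        if (remaining.length > 0) next[reader] = remaining;
      }
      return next;
    }
  }
}

// Replay from scratch: the events are the input, the read model is the output.
const history: LibraryEvent[] = [
  { type: "BookAcquired", isbn: "978-1491950357" },
  { type: "BookAcquired", isbn: "978-0321125217" },
  { type: "BookBorrowed", isbn: "978-1491950357", borrowedBy: "/readers/23" },
  { type: "BookBorrowed", isbn: "978-0321125217", borrowedBy: "/readers/23" },
  { type: "BookReturned", isbn: "978-1491950357" },
];

const model = history.reduce(project, {} as LoansByReader);
assert.deepEqual(model, { "/readers/23": ["978-0321125217"] });
```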

There is a quiet bonus here. A read model is derived data, which means it is disposable: you can throw it away and rebuild it from the events at any time. A replay test is therefore also a rehearsal for that rebuild. If your projection reconstructs correctly from an empty start in a test, it will reconstruct correctly when you deploy a new version of it against the full event history in production. The test and the real operation are the same motion, run at different scales.

Replay-based tests also guard the part of an event-sourced system that tends to rot quietly: compatibility over time. When you introduce a new event version, you want to be sure that old histories still project correctly, and that yesterday's events and today's code still agree. As we discussed in Versioning Events Without Breaking Everything, this is precisely where event-sourced systems need care, and a replay test over a recorded history is the cheapest insurance you can buy.
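A minimal TypeScript sketch of such a replay test across event versions, assuming an upcasting approach: a hypothetical version 1 of `BookAcquired` lacked a title, version 2 adds one, and old events are lifted to the current shape before the projection sees them. The placeholder title and all type names are illustrative. Upcasting is only one of several versioning strategies; it is used here because it keeps the projection unaware of old versions.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical versions: v1 of BookAcquired had no title, v2 adds one.
type BookAcquiredV1 = { type: "BookAcquired"; version: 1; isbn: string };
type BookAcquiredV2 = { type: "BookAcquired"; version: 2; isbn: string; title: string };
type RecordedEvent = BookAcquiredV1 | BookAcquiredV2;

// Upcaster: lift old events to the current shape before any projection sees them.
function upcast(event: RecordedEvent): BookAcquiredV2 {
  if (event.version === 2) return event;
  return { type: "BookAcquired", version: 2, isbn: event.isbn, title: "(unknown title)" };
}

// Read model: title by ISBN, rebuilt from scratch on every call.
function projectCatalog(history: RecordedEvent[]): Record<string, string> {
  const catalog: Record<string, string> = {};
  for (const event of history.map(upcast)) {
    catalog[event.isbn] = event.title;
  }
  return catalog;
}

// A recorded history that mixes yesterday's and today's event versions.
const recorded: RecordedEvent[] = [
  { type: "BookAcquired", version: 1, isbn: "978-1491950357" },
  { type: "BookAcquired", version: 2, isbn: "978-0321125217", title: "Domain-Driven Design" },
];

assert.deepEqual(projectCatalog(recorded), {
  "978-1491950357": "(unknown title)",
  "978-0321125217": "Domain-Driven Design",
});
```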

Events Are the Best Test Fixtures

Step back and a pattern emerges. In a state-based system, preparing a test means constructing the world: insert these rows, build this object graph, set these flags, and then hope the arrangement is one the application could actually have reached. Much of the effort goes into fabricating a present that merely looks plausible.

In an event-sourced system, the fixtures write themselves, because they are just events – and events are not an arbitrary setup but the precise sequence of things that happened. A test built from events is a test built from reality, not from a guess about reality. This is the quality that mocks can never have, and it is exactly why events make such good fixtures: they are the same facts the system records in production.

It pays off most when a real bug appears. Because the same events always produce the same state, a problem in production is fully captured by the events that led to it. As we described in Debugging Event-Sourced Systems, you can copy the offending event stream into a test, watch the bug reproduce on the very first run, and then keep those events as a regression test forever. The incident becomes a fixture.

Try doing that with a mock.
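To make the incident-as-fixture idea concrete, here is a TypeScript sketch of a regression test built from copied events. The incident (a second return of the same book being accepted) is hypothetical, and the stream is inlined as a JSON string where a real suite would more likely load a file exported from the store.

```typescript
import { strict as assert } from "node:assert";

type LibraryEvent =
  | { type: "BookAcquired"; isbn: string }
  | { type: "BookBorrowed"; borrowedBy: string }
  | { type: "BookReturned" };

// Stand-in for the events copied from production when the bug was reported.
const incidentStream = `[
  {"type":"BookAcquired","isbn":"978-1491950357"},
  {"type":"BookBorrowed","borrowedBy":"/readers/23"},
  {"type":"BookReturned"}
]`;

// The fixed handler: a return is only valid while the book is out.
function returnBook(history: LibraryEvent[]): LibraryEvent[] {
  let borrowed = false;
  for (const event of history) {
    if (event.type === "BookBorrowed") borrowed = true;
    if (event.type === "BookReturned") borrowed = false;
  }
  if (!borrowed) throw new Error("book is not borrowed");
  return [{ type: "BookReturned" }];
}

// Regression test: replaying the recorded events recreates the exact situation,
// and the second return must now be rejected.
const history = JSON.parse(incidentStream) as LibraryEvent[];
assert.throws(() => returnBook(history), /not borrowed/);
```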

When the Mocks Disappear

Go back to that test suite from the beginning, the one with a mock for every collaborator. Picture it with the mocks gone. The repository mock is gone, because the real database starts quickly enough to use directly. The clock and service mocks are gone, because the decision is a pure function of events. What remains is a set of tests that read like specifications: given what happened, when this command arrives, then these events follow.

Those tests are faster, because there is nothing heavy to stand up. They are clearer, because they describe behavior instead of wiring. And above all they are trustworthy, because nothing in them is pretending. The green bar finally means what you always wanted it to mean: the system does the right thing, verified against the real thing. That is the difference between a test that reassures you and a test that protects you.

If you would like to see this in practice, our guide on testing event-sourced systems walks through unit tests, integration tests, and replay-based projection tests in detail, and Decide, Evolve, Repeat shows the functional core that makes the write side so straightforward to test in the first place.