Sagas vs Process Managers

The moment a workflow spans more than one aggregate or service, someone will say: "We need a saga." It has become the default term for anything that coordinates multiple steps in an event-driven system. Book a flight, reserve a hotel, charge the credit card, and if any step fails, roll everything back. That's a saga, right?

Not quite. What most people describe when they say "saga" is actually a different pattern with a different name, different responsibilities, and different trade-offs. The confusion is not just semantic. It leads to architectural decisions based on the wrong mental model, and those decisions are surprisingly hard to reverse.

What Everyone Calls a Saga

Ask a developer to describe a saga, and you will likely hear something like this: a central component that listens for events, keeps track of where the workflow is, and sends commands to the next participant when the previous step completes. If something goes wrong, it coordinates the rollback. It is the brain of the operation, the thing that knows the overall plan and makes sure every step happens in the right order.

This description is clear, intuitive, and widely shared. It appears in blog posts, conference talks, and framework documentation. It matches what most orchestration frameworks implement when they offer a "saga" feature. The only problem is that it does not describe a saga.

What it describes is a process manager. The distinction might sound like academic nitpicking, but it is not. These two patterns solve fundamentally different problems, and conflating them leads to systems that are harder to understand, harder to test, and harder to change than they need to be.

What Garcia-Molina Actually Meant

The term "saga" comes from a 1987 paper by Hector Garcia-Molina and Kenneth Salem. They were not thinking about microservices or event-driven architectures. They were solving a database problem: long-lived transactions that hold locks for too long.

Their solution was elegant. Instead of one long transaction, break it into a sequence of smaller transactions, each of which can be committed independently. If the sequence completes, great. If it fails partway through, run a series of compensating transactions that undo the effects of the ones that already succeeded. A saga is not an orchestrator. It is a compensation strategy.

The key insight is that compensating transactions are not the same as rollbacks. A database rollback undoes a transaction as if it never happened. A compensating transaction acknowledges that something did happen and performs a new action to counteract it. If you shipped an order and need to undo it, you don't magically un-ship it. You initiate a return. The original shipment event stays in the history. Compensation adds to the story. Rollback pretends part of the story never happened.

Think of a travel booking. You reserve a flight, then a hotel, then a rental car. If the rental car is unavailable, you don't somehow un-fly the flight. You cancel the flight reservation. You cancel the hotel reservation. Each cancellation is a new transaction, a forward-moving action that undoes the effect of a previous one. The original reservations remain in the history. The cancellations are added on top. The event log tells the complete story: what was attempted, what succeeded, and what was reversed.
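
The travel booking above can be sketched in a few lines of Python. This is a minimal, single-process illustration, not a production design. The Step type, the run_saga helper, and the log format are names invented for this example, and the loop exists only to make the sequence visible, not to suggest that a saga needs a central runner. Compensations run in reverse order of the committed steps.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]        # local transaction, commits on its own
    compensation: Callable[[], None]  # new transaction that counteracts it


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, compensate committed steps in reverse."""
    committed: List[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            log.append(f"{step.name} failed")
            for done in reversed(committed):
                done.compensation()
                log.append(f"{done.name} compensated")
            return False
        log.append(f"{step.name} committed")
        committed.append(step)
    return True


def noop() -> None:
    """Stand-in for a real reservation or cancellation call (assumption)."""


def unavailable() -> None:
    raise RuntimeError("no rental cars available")


history: List[str] = []
ok = run_saga(
    [
        Step("flight", noop, noop),
        Step("hotel", noop, noop),
        Step("car", unavailable, noop),
    ],
    history,
)
```

After the run, history lists the committed steps, the failure, and the compensations, in that order. Nothing is erased; the cancellations are appended to what already happened.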

This is a natural fit for Event Sourcing. As we explored in "Consistency Is a Business Decision", most systems that think they need distributed transactions can be redesigned to work with eventual consistency and compensating actions. The saga pattern is the formalization of that idea: accept that individual steps will commit independently, and design explicit compensations for when the overall workflow needs to be reversed.

Enter the Process Manager

A process manager is something different. It is a stateful component that receives events, maintains its own internal state based on what has happened so far, and dispatches commands to other parts of the system based on that state. It knows the overall workflow. It tracks progress. It decides what to do next.

Consider an order fulfillment process. When an OrderPlaced event arrives, the process manager sends a ReserveInventory command. When it receives an InventoryReserved event, it sends a ChargePayment command. When it receives a PaymentCharged event, it sends a ShipOrder command. The process manager holds the map of the entire journey and navigates through it step by step.

The process manager is an active participant. It makes decisions. It routes commands. It maintains state that tells it where in the workflow it currently is. This is fundamentally different from the saga pattern, which describes how to handle failure through compensation, not how to coordinate success through orchestration.

If you think of the Decider pattern with its decide and evolve functions, a process manager is essentially a Decider for workflows. The inputs and outputs are mirrored: it receives events instead of commands and emits commands instead of events. But the structure is the same: given the current state, decide what to do next. The state captures where the workflow is. The events move it forward. The commands it dispatches are the actions it decides to take.
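
To make the Decider analogy concrete, here is a minimal sketch of the order fulfillment process manager as a pair of decide and evolve functions. The state names ("new", "awaiting_inventory", and so on) and the WORKFLOW table are assumptions made for this example; only the event and command names come from the text above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    type: str
    order_id: str


@dataclass(frozen=True)
class Command:
    type: str
    order_id: str


# (current state, event type) -> (next state, command to dispatch)
# State names are hypothetical; they only mark where the workflow is.
WORKFLOW: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("new", "OrderPlaced"): ("awaiting_inventory", "ReserveInventory"),
    ("awaiting_inventory", "InventoryReserved"): ("awaiting_payment", "ChargePayment"),
    ("awaiting_payment", "PaymentCharged"): ("awaiting_shipment", "ShipOrder"),
}


def decide(state: str, event: Event) -> List[Command]:
    """Given the current state and an incoming event, choose the commands."""
    transition = WORKFLOW.get((state, event.type))
    if transition is None:
        return []  # events that do not fit the current state are ignored
    return [Command(transition[1], event.order_id)]


def evolve(state: str, event: Event) -> str:
    """Move the workflow state forward based on the event."""
    transition = WORKFLOW.get((state, event.type))
    return state if transition is None else transition[0]


state = "new"
dispatched: List[Command] = []
for event_type in ["OrderPlaced", "InventoryReserved", "PaymentCharged"]:
    event = Event(event_type, "order-1")
    dispatched.extend(decide(state, event))
    state = evolve(state, event)
```

Both functions are pure, so the whole workflow lives in one table that can be read top to bottom. A real process manager would also persist its state and hand the commands to some dispatcher, both of which are left out here.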

This is the pattern that most frameworks implement when they offer "saga" support. If you look at the code and see something that listens for events, updates its own state, and sends commands based on that state, you are looking at a process manager, regardless of what the documentation calls it.

Choreography vs. Orchestration

The distinction between sagas and process managers maps closely to a broader architectural choice: choreography versus orchestration.

In a choreographed system, each participant knows what to do when it receives certain events. The inventory service listens for OrderPlaced and reserves stock. The payment service listens for InventoryReserved and charges the card. The shipping service listens for PaymentCharged and dispatches the package. No one is in charge. Each service reacts to what happened and does its part. The workflow emerges from the interactions, not from a central plan.
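
The choreographed version of this flow can be sketched with a tiny in-memory publish/subscribe bus. The EventBus class is an illustrative stand-in for a real broker, and the OrderShipped event is an assumption added so that the shipping service has something to publish. Notice that no function in the example describes the workflow as a whole.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str], None]


class EventBus:
    """Minimal synchronous pub/sub bus (stand-in for a message broker)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.published: List[str] = []  # record of event types, in order

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, order_id: str) -> None:
        self.published.append(event_type)
        for handler in self._handlers[event_type]:
            handler(order_id)


bus = EventBus()


def inventory_service(order_id: str) -> None:
    # Reacts to OrderPlaced: reserves stock, then announces the outcome.
    bus.publish("InventoryReserved", order_id)


def payment_service(order_id: str) -> None:
    # Reacts to InventoryReserved: charges the card, then announces the outcome.
    bus.publish("PaymentCharged", order_id)


def shipping_service(order_id: str) -> None:
    # Reacts to PaymentCharged: dispatches the package.
    bus.publish("OrderShipped", order_id)  # hypothetical event name


bus.subscribe("OrderPlaced", inventory_service)
bus.subscribe("InventoryReserved", payment_service)
bus.subscribe("PaymentCharged", shipping_service)

bus.publish("OrderPlaced", "order-1")
```

Each service only knows the one event it listens for and the one event it emits. The sequence recorded in bus.published is the emergent workflow, and it is written down nowhere else.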

In an orchestrated system, a central coordinator tells each participant what to do and when. The process manager sends explicit commands: reserve this, charge that, ship this. The participants don't need to know about the overall workflow. They just execute the commands they receive. The workflow is defined in one place, and the coordinator owns it.

Sagas, in their original sense, lean toward choreography. Each local transaction knows its own compensating transaction, and the compensation chain can be triggered without a central coordinator. Process managers, by definition, are orchestrators. They hold the state, they make the routing decisions, and they own the workflow definition.

In the interview on OpenCQRS, Frank Scheffler put it clearly: "We decided to avoid orchestration, when running OpenCQRS in cloud environments, at all costs." That is a deliberate architectural choice in favor of choreography, and it has real consequences for how workflows are structured, deployed, and scaled.

Neither approach is inherently superior. Choreography gives you loose coupling and independent deployability, at the cost of visibility into the end-to-end flow. Orchestration gives you a clear workflow definition and centralized error handling, at the cost of a coordination bottleneck. The trade-off is always between autonomy and control. Understanding that sagas lean toward the first and process managers toward the second helps you choose deliberately rather than by accident.

Why the Distinction Matters

You might wonder whether this is just terminology. Both patterns coordinate work across boundaries. Both deal with failure. Does it matter what you call them?

It matters because the patterns lead to fundamentally different architectures. A choreographed saga distributes knowledge across participants. Each service knows its own compensating action, but no single component knows the entire workflow. This reduces coupling but makes the overall flow harder to see. You need to trace events across multiple services to understand what happens when an order is placed.

A process manager centralizes knowledge. The workflow is defined in one place, which makes it easy to understand, modify, and test. But that central component becomes a coordination bottleneck and a coupling point. Every participant depends on receiving commands from the process manager, and the process manager depends on receiving events from every participant. Change the workflow, and you change one component. But that one component touches everything.

The failure modes are different too. In a choreographed saga, if one service fails to compensate, no other participant is responsible for noticing. You need monitoring and alerting to detect stuck compensations. In an orchestrated process manager, failure handling is centralized: the process manager knows what happened and can decide how to react. But if the process manager itself fails, the entire workflow stalls.

Testing tells a similar story. A process manager can be tested in isolation: given these events, it should produce these commands. The test is straightforward because the logic is in one place. Testing a choreographed saga requires integration tests that verify the interactions between multiple services. Each service is simple to test individually, but the emergent behavior of the whole system is harder to verify.
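
A "given these events, expect these commands" test might look like the following sketch. The OrderProcessManager class, the NEXT_COMMAND table, and the commands_for helper are hypothetical names invented for this example. The point is only the shape of the test: no broker, no other services, just events in and commands out.

```python
from typing import Dict, List

# Hypothetical workflow table: event type -> command the manager dispatches.
NEXT_COMMAND: Dict[str, str] = {
    "OrderPlaced": "ReserveInventory",
    "InventoryReserved": "ChargePayment",
    "PaymentCharged": "ShipOrder",
}


class OrderProcessManager:
    def __init__(self) -> None:
        self.handled: List[str] = []  # events seen so far (the manager's state)

    def handle(self, event_type: str) -> List[str]:
        if event_type in self.handled or event_type not in NEXT_COMMAND:
            return []  # ignore duplicates and events outside the workflow
        self.handled.append(event_type)
        return [NEXT_COMMAND[event_type]]


def commands_for(events: List[str]) -> List[str]:
    """Feed the events to a fresh manager and collect what it dispatches."""
    manager = OrderProcessManager()
    commands: List[str] = []
    for event_type in events:
        commands.extend(manager.handle(event_type))
    return commands


def test_happy_path() -> None:
    given = ["OrderPlaced", "InventoryReserved", "PaymentCharged"]
    assert commands_for(given) == ["ReserveInventory", "ChargePayment", "ShipOrder"]


def test_duplicate_event_is_ignored() -> None:
    assert commands_for(["OrderPlaced", "OrderPlaced"]) == ["ReserveInventory"]


test_happy_path()
test_duplicate_event_is_ignored()
```

The equivalent check for a choreographed flow would need every participating service, or a faithful stand-in for each, running against a shared event channel.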

As we discussed in "Hidden in Plain Sight: The Events You Forgot to Model", the events you choose to model shape everything downstream. The same applies here: whether you design for choreography or orchestration shapes which events matter, which services need to know about them, and how failures propagate through the system.

When to Use Which

Neither pattern is universally better. The right choice depends on what your workflow actually needs.

Sagas with choreography work well when the workflow is simple and stable. If you have a linear sequence of steps, each with a clear compensating action, and the workflow rarely changes, choreography keeps things decoupled and straightforward. Each service owns its piece. There is no central coordinator to maintain or deploy. The reservation pattern that separates intent from outcome fits naturally here: SeatRequested, then SeatReserved or SeatDenied, with each outcome triggering the next step or a compensation.
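
As a rough illustration of separating intent from outcome, the following sketch turns a SeatRequested event into either SeatReserved or SeatDenied. The field names, the handle_seat_requested function, and the in-memory set of taken seats are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Set, Union


@dataclass(frozen=True)
class SeatRequested:  # the intent
    seat: str
    passenger: str


@dataclass(frozen=True)
class SeatReserved:  # positive outcome
    seat: str
    passenger: str


@dataclass(frozen=True)
class SeatDenied:  # negative outcome
    seat: str
    passenger: str
    reason: str


def handle_seat_requested(
    request: SeatRequested, taken: Set[str]
) -> Union[SeatReserved, SeatDenied]:
    """Decide the outcome of a request against the seats already taken."""
    if request.seat in taken:
        return SeatDenied(request.seat, request.passenger, "seat already taken")
    taken.add(request.seat)
    return SeatReserved(request.seat, request.passenger)


taken_seats: Set[str] = {"12A"}
granted = handle_seat_requested(SeatRequested("14C", "alice"), taken_seats)
denied = handle_seat_requested(SeatRequested("12A", "bob"), taken_seats)
```

Downstream services subscribe to the outcome events: SeatReserved lets the next step proceed, while SeatDenied is the cue for earlier steps to compensate.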

Process managers work well when the workflow is complex or dynamic. If the next step depends on the combination of previous outcomes, if there are conditional branches, timeouts, or human approval steps, a process manager gives you a single place to define and modify that logic. When a business analyst asks "what happens when the payment fails but the inventory is already reserved?", you can point to one component that contains the answer.
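
Here is a minimal sketch of a process manager whose next step depends on a combination of outcomes. It assumes, purely for illustration, that inventory reservation and payment run in parallel and can report in any order. The event names InventoryUnavailable and PaymentFailed and the commands ReleaseInventory, RefundPayment, and CancelOrder are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FulfillmentState:
    inventory: Optional[bool] = None  # None means no outcome received yet
    payment: Optional[bool] = None
    done: bool = False


def handle(state: FulfillmentState, event_type: str) -> List[str]:
    """Record the outcome, then decide based on the combination of outcomes."""
    if state.done:
        return []  # workflow already finished; ignore late or duplicate events

    if event_type == "InventoryReserved":
        state.inventory = True
    elif event_type == "InventoryUnavailable":
        state.inventory = False
    elif event_type == "PaymentCharged":
        state.payment = True
    elif event_type == "PaymentFailed":
        state.payment = False
    else:
        return []  # not part of this workflow

    if state.inventory is None or state.payment is None:
        return []  # still waiting for the other outcome

    state.done = True
    if state.inventory and state.payment:
        return ["ShipOrder"]
    if state.inventory:
        return ["ReleaseInventory", "CancelOrder"]  # payment failed
    if state.payment:
        return ["RefundPayment", "CancelOrder"]  # inventory unavailable
    return ["CancelOrder"]  # both failed, nothing to undo


state = FulfillmentState()
first = handle(state, "InventoryReserved")
second = handle(state, "PaymentFailed")
```

The answer to the business analyst's question is the second branch: once inventory is reserved and the payment fails, the manager releases the inventory and cancels the order. Timeouts and human approval steps would add more events and branches to the same function rather than to several services.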

Hybrid approaches are common in practice. You might use choreography for the happy path and a process manager for exception handling. Or you might use a process manager for the core workflow and let individual services handle their own compensations choreographically. The patterns are not mutually exclusive, and real systems often combine them.

The key question is: who needs to know the overall plan? If no one needs to, choreography keeps things simple. If someone needs to, make it explicit with a process manager. What you should avoid is a system where the overall plan exists implicitly, scattered across multiple services, understood by no one, and documented nowhere. That is neither a saga nor a process manager. That is a distributed monolith waiting to surprise you.

One practical indicator: if you find yourself drawing the workflow on a whiteboard to explain it to a new team member, and the diagram has conditional branches, you probably need a process manager. If the workflow is a straight line of steps with compensations, a choreographed saga might be all you need. The complexity of the workflow should guide the complexity of the coordination mechanism, not the other way around.

Call It What It Is

Precision in language leads to precision in thinking. When you say "saga" and mean "process manager," you import the wrong mental model. You think about compensation when you should think about orchestration. You think about independent services when you should think about centralized coordination. You reach for choreography when your workflow actually needs a brain.

The original saga pattern is a powerful idea: break long-lived transactions into compensatable steps, and design for graceful reversal instead of distributed locking. The process manager pattern is equally powerful: centralize workflow coordination in a stateful component that reacts to events and dispatches commands. Both are valuable. Neither is the other.

As we saw with the Decider pattern, clarity in structure leads to clarity in code. The same applies to workflow coordination. Name the pattern you are using. Understand its trade-offs. Choose it deliberately, not by default.

If you want to explore how these patterns fit into a broader Event Sourcing and CQRS architecture, take a look at cqrs.com. And if you are designing a workflow and want to talk through whether a saga, a process manager, or a combination of both is the right fit, reach out at hello@thenativeweb.io. We love these conversations.