Soft Delete Is a Workaround

Three weeks ago, Alex Buchanan published a thoughtful blog post about soft delete strategies. In "The challenges of soft delete", he describes the problem carefully and offers four creative solutions. His analysis is thorough, his examples concrete, and his engineering instincts sound.

His analysis is correct, his solutions well considered. But all four strategies have something in common: they optimize around a problem that would not exist under a different architectural approach.

The Problem He Describes

Alex outlines a familiar situation. You implement soft delete because you need to restore data later, comply with regulations, or maintain audit trails. You add a deleted_at column to your tables and update your queries with WHERE deleted_at IS NULL. Sounds simple. Then reality sets in.

Dead data starts accumulating. Millions of "deleted" rows pile up, slowing queries, inflating backups, and consuming storage. Your database grows even though your active data does not. At the same time, query complexity spreads throughout your codebase. Every query needs the deletion filter. Every join needs to consider whether both sides are deleted. Every report needs to check for ghost data.

And ghost data will haunt you. Developers forget to add the filter. Months later, someone notices that reports include "deleted" records, and the debugging session begins. Even when everything works as intended, restoration is not trivial. Bringing back a "deleted" record is not just flipping a flag. External systems may have moved on. Validation rules may have changed. Referential integrity may be broken. Speaking of which: foreign keys point into the void. A "deleted" customer record still has orders pointing to it. Do you cascade soft deletes? Leave orphans? Neither option is clean.
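To make the ghost data problem concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers table, its columns, and the sample rows are assumptions made up for illustration; they are not taken from Alex's post.

```python
import sqlite3

# Hypothetical schema for illustration: a customers table with a deleted_at flag.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, deleted_at TEXT)"
)
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (3, "Linus")],
)

# "Delete" a customer by setting the flag instead of removing the row.
conn.execute("UPDATE customers SET deleted_at = '2024-01-15T10:00:00Z' WHERE id = 2")

# Correct query: every read has to remember the filter.
active = conn.execute(
    "SELECT name FROM customers WHERE deleted_at IS NULL ORDER BY id"
).fetchall()

# A report written later without the filter: the ghost row is back.
report = conn.execute("SELECT name FROM customers ORDER BY id").fetchall()

print(active)  # [('Ada',), ('Linus',)]
print(report)  # [('Ada',), ('Grace',), ('Linus',)]
```

Nothing in the schema stops the second query from being written. The correctness of every read depends on discipline, not on structure.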

The core problem is simple: CRUD only stores the current state. When you delete a row, the history disappears. Soft delete is an attempt to preserve that history without changing the fundamental architecture.

The Four Strategies and Their Common Thread

Alex proposes four approaches to manage soft deleted data, each with its own trade-offs.

The first approach, Application-Level Archiving, moves deleted records to a separate archive such as S3, a different table, or a message queue. The main table stays clean, but you now have two systems to query when you need historical data. You also need to build and maintain the archiving logic yourself, handle failures gracefully, and ensure that the archive remains consistent with your application's semantics.
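A rough sketch of what such archiving logic can look like, assuming the archive is a second table in the same database. The table names, the payload column, and the archive_and_delete helper are hypothetical. Keeping the archive in the same database is what allows a single transaction here; an archive in S3 or a message queue would not get that atomicity for free.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE customers_archive (
        id INTEGER NOT NULL,
        payload TEXT NOT NULL,
        archived_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace');
    """
)


def archive_and_delete(conn, customer_id):
    """Copy the row to the archive as JSON, then delete it, in one transaction."""
    with conn:  # commits on success, rolls back if anything raises
        row = conn.execute(
            "SELECT * FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            return False
        conn.execute(
            "INSERT INTO customers_archive (id, payload) VALUES (?, ?)",
            (row["id"], json.dumps(dict(row))),
        )
        conn.execute("DELETE FROM customers WHERE id = ?", (customer_id,))
    return True


archive_and_delete(conn, 2)

remaining = [r["name"] for r in conn.execute("SELECT name FROM customers")]
archived = [
    json.loads(r["payload"])
    for r in conn.execute("SELECT payload FROM customers_archive")
]
```

After the call, remaining holds only the active customer, while the archived payload lives in a second place that every historical query now has to know about.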

The second approach uses Database Triggers to automatically copy deleted rows to a JSON archive table before the actual deletion happens. This keeps the logic close to the data, but triggers can be opaque, hard to test, and easy to forget about. They also add overhead to every delete operation and can cause surprising behavior when developers are not aware of them.

The third approach, WAL/CDC (Change Data Capture), uses tools like Debezium to stream changes to Kafka or another system, capturing each deletion from the database's write-ahead log as it is committed. This is powerful infrastructure, but it is also complex infrastructure. You now depend on external systems for something as fundamental as "what happened to this record." The operational burden is significant.

The fourth approach, the Non-Deleting Replica, maintains a read replica that never executes DELETE statements, preserving all historical data. This is clever, but it means your historical queries go to a different database than your operational queries. You have two sources of truth, with all the consistency challenges that implies.

Each approach is clever engineering. Each solves a real problem. And each shares a common thread: all four attempt to reconstruct history that CRUD destroyed. They build infrastructure around the problem instead of eliminating it. The question is whether that infrastructure is addressing a symptom or the root cause.

The Forgotten Problem: Updates

Here is something the Hacker News discussion on Alex's post highlighted: soft delete only solves the deletion problem. But CRUD also overwrites the past with every UPDATE. And updates happen far more often than deletes.

Think about what happens when a customer changes their address. The old address is gone, overwritten by the new one. When a price is adjusted, the previous price vanishes. When a status changes, the previous status disappears without a trace. You might need that historical information months later, for an audit, for a dispute, for analytics, and it simply does not exist anymore.

You can solve this by versioning rows, adding history tables, or implementing temporal patterns. Some databases even offer built-in temporal features. But now you are building even more infrastructure to compensate for the fundamental limitation: CRUD stores current state, not history. You are bolting on history preservation to a paradigm that was designed to overwrite.
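As an illustration of that extra infrastructure, here is a minimal history-table sketch with sqlite3. The table names, the replaced_at column, and the change_address helper are assumptions for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, address TEXT NOT NULL);
    CREATE TABLE customer_address_history (
        customer_id INTEGER NOT NULL,
        address TEXT NOT NULL,
        replaced_at TEXT NOT NULL
    );
    INSERT INTO customers (id, address) VALUES (1, '1 Old Road');
    """
)


def change_address(conn, customer_id, new_address, changed_at):
    """Save the old address to the history table, then overwrite it."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "INSERT INTO customer_address_history (customer_id, address, replaced_at) "
            "SELECT id, address, ? FROM customers WHERE id = ?",
            (changed_at, customer_id),
        )
        conn.execute(
            "UPDATE customers SET address = ? WHERE id = ?",
            (new_address, customer_id),
        )


change_address(conn, 1, "2 New Street", "2024-01-10T09:00:00Z")

current = conn.execute("SELECT address FROM customers WHERE id = 1").fetchone()[0]
previous = conn.execute(
    "SELECT address, replaced_at FROM customer_address_history WHERE customer_id = 1"
).fetchall()
```

This works, but only for code paths that go through change_address. Any plain UPDATE elsewhere in the codebase still overwrites the past silently, and every additional column you care about needs the same treatment.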

Soft delete is a partial workaround for a broader problem. It addresses deletion while ignoring the equally destructive nature of updates. The architecture remains fundamentally amnesiac.

The Conceptual Error: DELETE Is Not Domain Language

A previous post of mine is titled "... And Then the Wolf DELETED Grandma". The title is deliberately absurd because DELETE is absurd as domain language. Nobody in your business talks about "deleting" customers or "deleting" orders. That is database vocabulary, not business vocabulary.

In the real world, nothing gets "deleted." Things get archived, deactivated, cancelled, terminated, withdrawn, suspended, expired, revoked. Each of these words carries meaning. Each implies different consequences, different processes, different business rules. A cancelled order is not the same as a refunded order. A suspended account is not the same as a closed account. A terminated contract has different legal implications than an expired one.

When you model all of these distinct business events as "delete," you lose that semantic richness. AccountClosed tells you what happened. ContractTerminated tells you what happened. SubscriptionCancelled tells you what happened. UserDeleted sounds like a crime report, as I explored in "Don't Kill Your Users", and it tells you almost nothing about what actually occurred in your business.

When you use CRUD vocabulary for domain concepts, you flatten the semantic richness of your business into three generic verbs: create, update, delete. The nuance disappears. The meaning disappears. And then you spend years building infrastructure to recover that lost meaning, adding reason codes and status fields and audit logs to reconstruct what you could have captured in the first place.

Event Sourcing: The Problem That Does Not Exist

With Event Sourcing, there is no DELETE. There are only events. This is not a philosophical statement; it is a fundamental architectural difference that makes the entire soft delete problem disappear.

When a customer closes their account, you write AccountClosed. When a subscription ends, you write SubscriptionCancelled. When a product is discontinued, you write ProductDiscontinued. Each event captures what actually happened, in the language of the domain. The event is a fact, immutable and permanent. It does not replace anything; it adds to the history.
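In code, the idea can be sketched in a few lines. The following is a toy in-memory log, not a real event store; the EventLog class, the field names, and the reason field are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: an event cannot be modified once created
class AccountClosed:
    account_id: str
    reason: str  # hypothetical field; real events carry whatever the domain needs
    occurred_at: datetime


@dataclass(frozen=True)
class SubscriptionCancelled:
    subscription_id: str
    occurred_at: datetime


class EventLog:
    """Append-only in-memory log; there is deliberately no update or delete."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read_all(self):
        return tuple(self._events)  # callers get a read-only snapshot


log = EventLog()
log.append(
    AccountClosed("acc-42", "customer request", datetime(2024, 1, 15, tzinfo=timezone.utc))
)
log.append(SubscriptionCancelled("sub-7", datetime(2024, 2, 1, tzinfo=timezone.utc)))

event_names = [type(e).__name__ for e in log.read_all()]
```

The only write operation is append. The event names carry the business meaning that a generic DELETE would have thrown away.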

All the problems from Alex's article dissolve when you think in events. The concern about dead data accumulation does not apply because events are the data, not dead weight. They are your audit trail, your analytics foundation, your regulatory compliance proof. A million events is a million facts about your business, each one potentially valuable.

Query complexity disappears because you separate your write model from your read models. Projections contain only what is relevant. Your "active customers" projection includes only active customers. Your "all customers ever" projection includes everyone. You do not need filters everywhere because you design your projections for their specific use cases. Ghost data is not possible because there is no deletion flag to forget.
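A small sketch of two projections built from the same events. CustomerRegistered and the two projection functions are hypothetical names introduced for this example; a production system would typically persist projections and update them incrementally instead of recomputing them from scratch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRegistered:  # hypothetical event, not named in the text above
    customer_id: str
    name: str


@dataclass(frozen=True)
class AccountClosed:
    customer_id: str


events = [
    CustomerRegistered("c1", "Ada"),
    CustomerRegistered("c2", "Grace"),
    AccountClosed("c2"),
    CustomerRegistered("c3", "Linus"),
]


def project_active_customers(events):
    """Read model holding only customers whose account is currently open."""
    active = {}
    for event in events:
        if isinstance(event, CustomerRegistered):
            active[event.customer_id] = event.name
        elif isinstance(event, AccountClosed):
            active.pop(event.customer_id, None)
    return active


def project_all_customers_ever(events):
    """Read model holding every customer that was ever registered."""
    return {
        e.customer_id: e.name for e in events if isinstance(e, CustomerRegistered)
    }


active_customers = project_active_customers(events)
all_customers = project_all_customers_ever(events)
```

Code that reads active_customers never sees a closed account, so there is no filter to forget. The decision about what counts as active is made once, inside the projection.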

The restoration problem vanishes entirely. There is nothing to restore because nothing was removed. The events are still there, exactly as they were written. If you need to "uncancel" a subscription, you write SubscriptionReactivated. The cancellation event remains in the history, followed by the reactivation event. The complete story is preserved.
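Sketched in code, reactivation is just one more appended fact. SubscriptionStarted and the is_active function are assumptions added for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubscriptionStarted:  # hypothetical event used to begin the history
    subscription_id: str


@dataclass(frozen=True)
class SubscriptionCancelled:
    subscription_id: str


@dataclass(frozen=True)
class SubscriptionReactivated:
    subscription_id: str


def is_active(history):
    """Derive the current status by replaying the history in order."""
    active = False
    for event in history:
        if isinstance(event, (SubscriptionStarted, SubscriptionReactivated)):
            active = True
        elif isinstance(event, SubscriptionCancelled):
            active = False
    return active


history = [SubscriptionStarted("sub-7"), SubscriptionCancelled("sub-7")]
active_after_cancel = is_active(history)

# "Uncancelling" appends a new fact; the cancellation stays in the history.
history.append(SubscriptionReactivated("sub-7"))
active_after_reactivation = is_active(history)
```

After the reactivation, the history has three entries and the cancellation is still the second one. No flag was flipped and no row was rewritten.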

Update history is captured automatically. Every change is its own event: CustomerAddressChanged, PriceAdjusted, StatusUpdated. Nothing is overwritten. Nothing is lost. When someone asks "What was this customer's address last January?" you can answer with certainty.

Even data model migration becomes simpler. Events are immutable, but projections are derivable. When you need to change how you view your data, you rebuild projections from events. The source of truth never changes; only your interpretation of it evolves.

Additional Benefits

Event Sourcing provides capabilities that no amount of soft delete infrastructure can match. These are not just incremental improvements; they are fundamentally different capabilities that emerge from storing history as the primary data model.

The most obvious benefit is that you get an audit trail without additional infrastructure. The events are the audit trail. There is no separate logging system to maintain, no risk of logs being incomplete or out of sync with the data, no question of whether something was logged or not. If it happened, there is an event. If there is no event, it did not happen. Auditors love this clarity.

Time travel becomes trivial. "What was the state of this account on January 15th?" Replay events up to that moment and you have your answer. The answer is exact, mathematically derived from the event history, not reconstructed from backups that may or may not exist and may or may not be consistent.
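A minimal sketch of such a replay, assuming every event carries an occurred_at timestamp. AccountOpened, the state dictionary, and the state_at function are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccountOpened:  # hypothetical event that starts the history
    occurred_at: datetime
    address: str


@dataclass(frozen=True)
class CustomerAddressChanged:
    occurred_at: datetime
    new_address: str


@dataclass(frozen=True)
class AccountClosed:
    occurred_at: datetime


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


events = [
    AccountOpened(utc(2023, 11, 1), "1 Old Road"),
    CustomerAddressChanged(utc(2024, 1, 10), "2 New Street"),
    AccountClosed(utc(2024, 3, 5)),
]


def state_at(events, moment):
    """Fold all events that occurred at or before `moment` into a state dict."""
    state = {"status": "unknown", "address": None}
    for event in sorted(events, key=lambda e: e.occurred_at):
        if event.occurred_at > moment:
            break  # everything after this point had not happened yet
        if isinstance(event, AccountOpened):
            state = {"status": "open", "address": event.address}
        elif isinstance(event, CustomerAddressChanged):
            state["address"] = event.new_address
        elif isinstance(event, AccountClosed):
            state["status"] = "closed"
    return state


on_dec_1 = state_at(events, utc(2023, 12, 1))
on_jan_15 = state_at(events, utc(2024, 1, 15))
latest = state_at(events, utc(2024, 6, 1))
```

The same function answers the question for any point in time. For long histories, a real system would usually combine this with snapshots or stored projections instead of replaying from the first event on every query.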

Your data gains semantic clarity. Events speak the language of the domain because they describe what happened in business terms. When business stakeholders and IT look at the data, they see the same thing: OrderPlaced, PaymentReceived, ShipmentDispatched. There is no translation layer between what the business thinks happened and what the database recorded.

GDPR and data privacy become more manageable, not less. You might think that storing every event forever makes privacy harder, but the opposite is true. You can delete raw personal data while keeping aggregated projections. You can keep raw data but restrict access. You can apply retention policies to specific event types. The separation of events and projections gives you options that CRUD cannot provide because CRUD conflates the historical record with the current state.

Perhaps most importantly for the future, Event Sourcing provides the foundation for AI and analytics. AI models need rich historical data to find patterns. They need sequences, not snapshots. They need to understand what led to what. Event streams provide exactly that: complete sequences of what happened, with causal relationships intact. CRUD databases offer snapshots, which is like trying to understand a movie by looking at random frames. You might get lucky, but you will miss the story. If you are curious about the intersection of Event Sourcing and AI, visit eventsourcing.ai.

Conclusion

Alex Buchanan's article is clever engineering within a constraining architecture. If you are committed to CRUD, his strategies are practical and well thought through. They represent the best thinking available within that paradigm. But sometimes the best solution is not having the problem in the first place.

Event Sourcing does not answer "How do I do soft delete better?" It answers a different question entirely: "Why should I delete at all?" The framing of the original problem assumes that deletion is necessary, that data must be removed, and that the challenge is doing this gracefully. Event Sourcing challenges that assumption at its root.

The real world does not have a DELETE button. Things happen, and then other things happen. Contracts are signed, then amended, then terminated. Customers sign up, make purchases, change their preferences, eventually leave. Products are launched, modified, discontinued, sometimes brought back. Each of these is a fact that occurred at a specific point in time. None of them erases what came before.

Event Sourcing models software the way reality works: as a sequence of facts, each preserved, each meaningful, each telling part of the story. When you embrace this model, you stop fighting against time and start working with it. The soft delete problem, along with many others, simply dissolves.

Getting Started

If this perspective resonates with you, the Getting Started guide will have you writing your first events in minutes. And if you have questions or want to discuss how Event Sourcing could address your specific challenges, reach out at hello@thenativeweb.io.

Because sometimes the best workaround is not needing one.