Proving Without Revealing: Merkle Trees for Event-Sourced Systems
Imagine it's January 2026. You run a platform with millions of users. An auditor walks in with a specific request: "Show me proof that you captured a GDPR consent event for user #12847 on March 15th, 2024." You know the event exists – it's sitting in your event store. But here's the problem: you can't just hand over your complete event log. That log contains millions of events with sensitive customer data, financial transactions, business secrets, and personal information from thousands of other users.
So what do you do? You can't share everything. A screenshot isn't trustworthy – anyone with basic image editing skills could fake that. Exporting a single event proves nothing – you could have just created it five minutes ago. You need a way to prove that a specific event existed at a specific point in time, without revealing anything else. This isn't a hypothetical problem. It's a real challenge that event-sourced systems face when dealing with audits, compliance requirements, B2B contracts, or legal disputes.
The Solution: Cryptographic Proofs
The answer lies in cryptography – specifically, in a data structure called a Merkle tree. Named after computer scientist Ralph Merkle, who filed a patent on the concept in 1979, Merkle trees have become fundamental to systems that need tamper-proof verification. You might recognize them from Bitcoin, where they verify transactions, or Git, where they track changes in your codebase.
Here's what we need: a single "fingerprint" representing our entire dataset – called the Merkle root. A cryptographic proof that a specific event is part of that dataset – called a Merkle proof. And a way to verify that proof without accessing any other data.
This works because cryptographic hash functions have a crucial property: it's practically impossible to forge data that produces a specific hash. If even a single bit changes in your input, the resulting hash is completely different. This makes Merkle trees perfect for proving data integrity and membership without revealing the data itself.
How Merkle Trees Work
Let's break down how this actually works, step by step. Don't worry – no complex mathematics required. Just the core concepts you need to understand.
Every event gets a unique cryptographic hash using SHA-256. Think of a hash as a fingerprint – a fixed-size string that uniquely represents the event's data. Here's the important part: EventSourcingDB already does this automatically for every event you store. When EventSourcingDB hashes an event, it follows the CloudEvents specification. It takes the event's metadata – like type, subject, timestamp – and the event's data payload, hashes them separately, then hashes the combination. The result is a 64-character hexadecimal string that uniquely identifies that event.
Event: UserConsentGiven (ID: 12847)
Hash: a3f7b2c8d4e9f1a6b5c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9
Change even a single character in the event data, and you get a completely different hash.
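To make the fingerprint idea concrete, here is a minimal Python sketch. The canonical JSON serialization, the sample field values, and the way the two partial hashes are combined are assumptions for illustration only; EventSourcingDB's actual hashing specification may differ in its details.

```python
import hashlib
import json


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def event_hash(metadata: dict, data: dict) -> str:
    # Assumption: metadata and data are serialized as compact JSON with
    # sorted keys before hashing. The real specification may differ.
    metadata_hash = sha256_hex(json.dumps(metadata, sort_keys=True, separators=(",", ":")))
    data_hash = sha256_hex(json.dumps(data, sort_keys=True, separators=(",", ":")))
    # Hash the combination of the two partial hashes.
    return sha256_hex(metadata_hash + data_hash)


# Hypothetical event used only for this illustration.
metadata = {"type": "UserConsentGiven", "subject": "/users/12847", "time": "2024-03-15T10:00:00Z"}
original = event_hash(metadata, {"consent": "granted"})
modified = event_hash(metadata, {"consent": "Granted"})  # one character changed

print(len(original))         # 64
print(original == modified)  # False
```

The two hashes share no obvious relationship even though the inputs differ by a single character, which is the property the rest of the construction relies on.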
Now comes the clever part. We take these event hashes and organize them into a binary tree structure. Take the first two event hashes, concatenate them, and hash the result. That gives you a new hash representing both events. Do the same for the next pair. Then take those combined hashes and hash them together again. Keep going until you have a single hash at the top – the Merkle root.
Let's walk through a simple example with four events. Event 1 has hash H1, Event 2 has H2, Event 3 has H3, and Event 4 has H4. We combine H1 and H2, hash the result, and get H12. We combine H3 and H4, hash that, and get H34. Finally, we combine H12 and H34 and hash them together to get H1234. That final hash, H1234, is your Merkle root. It's a single value that cryptographically represents all four events. Change anything in any event, and the root changes completely.
If you have an odd number of events at any level, you duplicate the last one to make pairs. This ensures you always end up with a complete binary tree and a single root hash.
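The construction described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's implementation: in particular, concatenating the raw bytes of the two hashes (rather than their hex strings) is an assumption, and the leaf values are stand-ins rather than real event hashes.

```python
import hashlib


def hash_pair(left: str, right: str) -> str:
    # Assumption: the raw bytes of both hashes are concatenated, left first.
    return hashlib.sha256(bytes.fromhex(left) + bytes.fromhex(right)).hexdigest()


def merkle_root(leaf_hashes: list[str]) -> str:
    if not leaf_hashes:
        raise ValueError("cannot build a Merkle tree without events")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: duplicate the last hash
        # Hash each neighbouring pair to form the next level up.
        level = [hash_pair(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Four stand-in event hashes H1..H4.
h1, h2, h3, h4 = (hashlib.sha256(f"event-{n}".encode()).hexdigest() for n in range(1, 5))

h12 = hash_pair(h1, h2)
h34 = hash_pair(h3, h4)
print(merkle_root([h1, h2, h3, h4]) == hash_pair(h12, h34))  # True

# With three events, H3 is paired with a copy of itself.
print(merkle_root([h1, h2, h3]) == hash_pair(h12, hash_pair(h3, h3)))  # True
```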
Here's where it gets really interesting. To prove that Event 3 was part of the original dataset, you don't need to share all the events. You only need the hash of Event 3 (H3), the "sibling" hashes along the path from Event 3 to the root (H4 and H12), and the Merkle root (H1234). That's it: four hash values. And the proof stays small as the dataset grows, because the number of sibling hashes equals the height of the tree. A dataset of a million events needs only about 20 of them.
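Collecting the sibling hashes for one event might look like the following sketch. The function name, the (hash, position) tuple layout, and the byte-level concatenation are assumptions made for illustration; position here describes where the sibling sits relative to the node on the path.

```python
import hashlib


def hash_pair(left: str, right: str) -> str:
    # Assumption: the raw bytes of both hashes are concatenated, left first.
    return hashlib.sha256(bytes.fromhex(left) + bytes.fromhex(right)).hexdigest()


def merkle_proof(leaf_hashes: list[str], index: int) -> list[tuple[str, str]]:
    """Return (sibling_hash, position) pairs from the leaf up to the root."""
    if not 0 <= index < len(leaf_hashes):
        raise IndexError("event index out of range")
    level = list(leaf_hashes)
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: duplicate the last hash
        sibling = index ^ 1  # partner within the pair: 0<->1, 2<->3, ...
        position = "left" if sibling < index else "right"
        proof.append((level[sibling], position))
        level = [hash_pair(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2  # move to the parent's index on the next level
    return proof


# Four stand-in event hashes H1..H4; Event 3 sits at index 2.
h1, h2, h3, h4 = (hashlib.sha256(f"event-{n}".encode()).hexdigest() for n in range(1, 5))
h12 = hash_pair(h1, h2)

proof = merkle_proof([h1, h2, h3, h4], 2)
print(proof == [(h4, "right"), (h12, "left")])  # True
```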
Anyone can now verify that Event 3 was part of the original dataset. Take H3 and H4 – the sibling at level 0, which sits to the right of H3 – hash them together in that order, and you should get H34. Take H12 – the sibling at level 1, which sits on the left – and that result, hash them together, and you should get H1234. Compare with the known Merkle root. If it matches, the proof is valid. Event 3 was definitely part of the original dataset. If it doesn't match, either the event was modified, or it was never part of that dataset.
Here's the crucial part: The verifier only sees four hashes: H3, H4, H12, and the root. They learn nothing about Event 1, Event 2, or Event 4. They can't reconstruct those events. They can't tell exactly how many events are in the dataset either; the number of sibling hashes only hints at the height of the tree. They just know that Event 3, with its specific hash, was part of the tree that produced that Merkle root.
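The verification steps amount to folding the sibling hashes into the event hash one level at a time. The sketch below assumes that each sibling comes with a "left" or "right" position and that raw hash bytes are concatenated; both are illustrative choices, not a statement about the tool's internals.

```python
import hashlib


def hash_pair(left: str, right: str) -> str:
    # Assumption: the raw bytes of both hashes are concatenated, left first.
    return hashlib.sha256(bytes.fromhex(left) + bytes.fromhex(right)).hexdigest()


def verify_proof(event_hash: str, siblings: list[tuple[str, str]], expected_root: str) -> bool:
    current = event_hash
    for sibling_hash, position in siblings:
        if position == "left":
            current = hash_pair(sibling_hash, current)  # sibling goes first
        else:
            current = hash_pair(current, sibling_hash)  # sibling goes second
    return current == expected_root


# Rebuild the four-event example by hand.
h1, h2, h3, h4 = (hashlib.sha256(f"event-{n}".encode()).hexdigest() for n in range(1, 5))
h12 = hash_pair(h1, h2)
h34 = hash_pair(h3, h4)
root = hash_pair(h12, h34)

siblings = [(h4, "right"), (h12, "left")]
print(verify_proof(h3, siblings, root))  # True
print(verify_proof(h1, siblings, root))  # False: H1 does not sit at that position
```

Note that the verifier never touches Event 1, 2, or 4 themselves, only the hashes on the path.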
Introducing EventSourcingDB Merkle
The theory is clear. But how do you actually use this in practice? That's exactly why we built eventsourcingdb-merkle – a CLI tool that makes Merkle tree operations simple and practical for EventSourcingDB users.
The tool is available now on npm and works directly with EventSourcingDB's backup format. It handles all the cryptographic complexity for you, exposing a clean command-line interface for the operations you actually need. You can install it globally with npm install -g eventsourcingdb-merkle, or if you prefer not to install it globally, use npx eventsourcingdb-merkle [command].
The tool provides five core commands. First, there's validate-chain, which verifies the integrity of your event chain. EventSourcingDB links events together using predecessor hashes – each event contains the hash of the previous event. This command checks that every link in the chain is correct and that the first event properly has a null predecessor hash. If everything is valid, you get a confirmation message. If something's wrong, you get a detailed list of errors and a non-zero exit code – perfect for CI/CD pipelines.
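Conceptually, the chain check boils down to comparing each event's predecessor hash with the hash of the event before it. The following sketch illustrates the idea only; the field names hash, predecessorhash, and id are assumptions, and the hash values are shortened placeholders.

```python
def validate_chain(events: list[dict]) -> list[str]:
    """Return a list of error messages; an empty list means the chain is intact."""
    errors = []
    previous_hash = None  # the first event is expected to have a null predecessor
    for event in events:
        if event.get("predecessorhash") != previous_hash:
            errors.append(f"event {event.get('id')}: predecessor hash does not match")
        previous_hash = event.get("hash")
    return errors


# Hypothetical, heavily shortened events.
intact = [
    {"id": "0", "hash": "aa", "predecessorhash": None},
    {"id": "1", "hash": "bb", "predecessorhash": "aa"},
    {"id": "2", "hash": "cc", "predecessorhash": "bb"},
]
broken = [
    {"id": "0", "hash": "aa", "predecessorhash": None},
    {"id": "1", "hash": "bb", "predecessorhash": "zz"},  # link was tampered with
]

print(validate_chain(intact))  # []
print(validate_chain(broken))  # ['event 1: predecessor hash does not match']
```

A wrapper script could turn a non-empty error list into a non-zero exit code for use in a pipeline.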
Second, there's merkle-root, which calculates and outputs the Merkle root for your entire event stream. The root hash is your dataset's cryptographic fingerprint. The command also tells you how many events were processed. This is typically the first thing you do when you want to establish a provable baseline for your data.
Third, validate-event-hash checks whether an event's stored hash matches the hash calculated from its contents. You can pass a file and event ID, or pass the event directly as JSON. This is useful for spot-checking integrity without processing an entire backup file.
eventsourcingdb-merkle validate-event-hash --file=backup.json --event-id=42
eventsourcingdb-merkle validate-event-hash --event='{"specversion":"1.0",...}'
Fourth, get-proof generates a Merkle proof for a specific event ID. By default, you get human-readable output. Add the --json flag for machine-readable JSON format. The proof includes the event's hash, the sibling hashes needed for verification, their positions in the tree, and the Merkle root.
eventsourcingdb-merkle get-proof backup.json 42
eventsourcingdb-merkle get-proof backup.json 42 --json
Finally, verify-proof validates a Merkle proof without needing access to the original backup file. You can pass a proof file or the proof directly as a JSON string. This is the command that auditors, partners, or third parties would use to verify proofs you provide them.
eventsourcingdb-merkle verify-proof proof.json
eventsourcingdb-merkle verify-proof '{"eventId":"42",...}'
The tool uses SHA-256 hashing following EventSourcingDB's exact specification. It works with EventSourcingDB's NDJSON backup format, where each line is a JSON object with type: "event" and a payload field containing a CloudEvents-compliant event. The Merkle tree construction follows standard practices: events are leaf nodes, internal nodes are created by hashing child concatenations, and the tree is completed by duplicating the final node if needed to maintain a complete binary structure.
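For orientation, reading leaf hashes out of a backup in the described shape could look roughly like this. The sketch assumes that the stored event hash lives in a payload field named hash and that lines of other types may occur and can be skipped; the sample lines are made up and far shorter than real backup entries.

```python
import json


def read_event_hashes(ndjson_text: str) -> list[str]:
    """Collect the stored hash of every event line, in file order."""
    hashes = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        if record.get("type") != "event":
            continue  # assumption: other line types are not part of the tree
        hashes.append(record["payload"]["hash"])  # assumption: field is named "hash"
    return hashes


# Hypothetical sample with shortened placeholder hashes.
sample = "\n".join([
    '{"type": "event", "payload": {"specversion": "1.0", "id": "0", "hash": "aa"}}',
    '{"type": "other", "payload": {}}',
    '{"type": "event", "payload": {"specversion": "1.0", "id": "1", "hash": "bb"}}',
])

print(read_event_hashes(sample))  # ['aa', 'bb']
```

The resulting list, kept in stream order, would serve as the leaf level of the tree.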
A Complete Example: The GDPR Audit
Let's walk through our opening scenario step by step, showing exactly how you'd use these tools in practice.
It's December 31st, 2024. You want to create a verifiable snapshot of your event store for the year. First, export a backup from EventSourcingDB using the standard backup procedure. Then calculate the Merkle root of that backup with the merkle-root command.
The output shows you something like this:
Merkle Root: 7f3a9b2ce8d4f1a5b9c6d7e8f9a0b1c2d3e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8
Total Events: 1,234,567
This Merkle root is now your cryptographic fingerprint for the entire year's worth of events. Here's the crucial step: publish this hash publicly. Post it on your website. Include it in your annual transparency report. Submit it to a blockchain or a certificate transparency log. Tweet it. The point is to create an immutable, timestamped record that you calculated this specific root hash at this specific time.
Why publish it? Because later, when you need to prove something about your data, you can point to this published hash and say: "See? This is what my dataset looked like on December 31st, 2024. Anyone can verify I didn't change it afterward."
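Now it's January 2026. The auditor asks for proof that Event #12847 – a GDPR consent – existed in your December 2024 dataset. You generate a proof from the December 2024 backup with the get-proof command, adding the --json flag to get machine-readable output.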
The resulting proof file looks something like this (abbreviated):
{
  "eventId": "12847",
  "eventHash": "a3f7b2c8d4e9f1a6b5c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9",
  "merkleRoot": "7f3a9b2ce8d4f1a5b9c6d7e8f9a0b1c2d3e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8",
  "siblings": [
    {
      "hash": "b4c8d3e7f2a6b5c9d8e1f0a4b3c7d6e9f8a2b5c4d7e0f3a6b9c2d5e8f1a4b7",
      "position": "right"
    },
    {
      "hash": "e9f1a7b3c5d8e2f6a0b4c8d1e5f9a3b7c0d4e8f2a6b9c3d7e1f5a8b2c6d0e4",
      "position": "left"
    }
  ]
}
You hand this JSON file to the auditor. That's it. No other data needed.
The auditor takes your proof file and verifies it with the verify-proof command, which confirms that the hashes in the proof lead to the Merkle root stated in it.
The auditor checks the published Merkle root you posted in December 2024. It matches the root in the proof. The proof is valid. Done.
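The auditor's two checks, that the root in the proof equals the independently published root and that the proof's hashes actually lead to that root, can be sketched as follows. The JSON keys mirror the proof file shown above; the byte-level concatenation, the reading of position as the sibling's side, and the tiny two-event dataset are assumptions for illustration.

```python
import hashlib
import json


def hash_pair(left: str, right: str) -> str:
    # Assumption: the raw bytes of both hashes are concatenated, left first.
    return hashlib.sha256(bytes.fromhex(left) + bytes.fromhex(right)).hexdigest()


def audit(proof_json: str, published_root: str) -> bool:
    proof = json.loads(proof_json)
    # Check 1: the proof must refer to the root that was published earlier.
    if proof["merkleRoot"] != published_root:
        return False
    # Check 2: folding the siblings into the event hash must yield that root.
    current = proof["eventHash"]
    for sibling in proof["siblings"]:
        if sibling["position"] == "left":
            current = hash_pair(sibling["hash"], current)
        else:
            current = hash_pair(current, sibling["hash"])
    return current == proof["merkleRoot"]


# A made-up dataset with two events, so the proof has a single sibling.
consent_hash = hashlib.sha256(b"consent-event").hexdigest()
other_hash = hashlib.sha256(b"some-other-event").hexdigest()
published_root = hash_pair(consent_hash, other_hash)

proof_json = json.dumps({
    "eventId": "0",
    "eventHash": consent_hash,
    "merkleRoot": published_root,
    "siblings": [{"hash": other_hash, "position": "right"}],
})

print(audit(proof_json, published_root))  # True
print(audit(proof_json, other_hash))      # False: not the root that was published
```

The first check is what ties the proof to a point in time: without an independently published root, a proof only shows internal consistency.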
The auditor now knows with cryptographic certainty that Event #12847 existed in your December 31st, 2024 dataset, that the event hasn't been modified since then, and that your December 2024 dataset contained at least 12,847 events. But the auditor did not learn the contents of Event #12847 – only its hash. They learned nothing about any other event in your system. They don't know the exact total number of events, the structure or schema of your events, or any business logic or internal details.
If you want to also share the actual event contents, you can provide Event #12847 separately. The auditor can then calculate its hash manually – or using the validate-event-hash command – and verify it matches the hash in the proof. But that's your choice. The proof itself reveals nothing about the event's contents.
Use Cases in Practice
So where is this actually useful? Let's look at some concrete scenarios.
For compliance and auditing, Merkle proofs let you prove that consent events, deletion requests, or data processing records were captured at specific times under GDPR and privacy regulations. You can demonstrate compliance without exposing personal data from other users. For SOC2 and ISO audits, you can show that security events, access logs, or system changes were properly recorded, providing proof of your event-driven audit trail without revealing sensitive operational details. In fintech, healthcare, or other regulated industries, you can prove transaction records, state changes, or compliance events exist without sharing competitive or confidential information.
For B2B contracts and SLAs, the phrase "We guarantee to track all transactions" becomes provable. Publish daily Merkle roots showing you're maintaining the complete event history you promised. When a customer claims something happened or didn't happen, provide cryptographic proof based on your event log. Merkle proofs can serve as evidence in contract disputes because they're verifiable and tamper-proof.
After a data breach, Merkle proofs help with timeline reconstruction. You can prove which events existed before the breach occurred – crucial for forensic analysis and for demonstrating to regulators that you had proper logging in place. You can also prove non-tampering: show that historical events weren't modified after the breach was discovered. A Merkle root published before the breach proves the state of your data at that time.
For supply chain transparency, you can prove a product went through specific steps in your supply chain without revealing supplier relationships, pricing, or other competitive details. Share proof of ethical sourcing or compliance certifications for specific products without exposing your entire supply chain network.
For intellectual property, Merkle proofs provide timestamp proofs. Prove you had a specific piece of information or completed a specific development milestone on a particular date – useful for patent prior art claims or proving innovation timelines. You can also demonstrate you've handled a certain volume or type of transactions without revealing client identities or transaction details.
The common thread in all these scenarios: you need to prove you have certain data without revealing everything else. Merkle proofs make that possible.
Why Event Sourcing and Merkle Trees Are a Perfect Match
Event Sourcing and Merkle trees fit together remarkably well. It's not a coincidence – they share fundamental properties that make them natural partners.
Event streams never modify historical data. You only append new events. This is exactly what Merkle trees need. Once you calculate a Merkle root, you know the events it represents will never change. Your proofs remain valid indefinitely. Events are facts – things that happened in the past. They're never deleted or updated. Traditional databases with CRUD operations can't offer the same guarantee. UPDATE and DELETE statements would invalidate your Merkle roots. With event sourcing, the history is stable.
Events have a clear chronological order. Each event has a position in the stream. This makes Merkle tree construction deterministic – everyone building a tree from the same events will get the same root. There's no ambiguity about which events belong where. EventSourcingDB already computes a hash for every event as it's written. These hashes serve multiple purposes: they enable the predecessor chain that links events together, they support event signatures, and now they're the foundation for Merkle trees. You're not adding overhead – you're leveraging infrastructure that already exists.
Contrast this with traditional databases. If you try to build Merkle trees on top of a SQL database with UPDATE and DELETE operations, your roots become invalid as soon as data changes. There's no stable history to prove anything about. You'd need to snapshot the entire database at every point in time you want to prove something about – expensive and complex.
Event Sourcing was already good for auditing. You could always replay your history, see what happened, and understand how you got to the current state. Merkle trees take this further: they make your history cryptographically provable. Not just "here's what our logs say" but "here's mathematical proof that this event existed at this point in time, and here's how you can verify it yourself."
Going Further
Once you start working with Merkle proofs, several interesting possibilities open up.
EventSourcingDB also supports Ed25519 cryptographic signatures for events. When you enable event signing, each event gets a digital signature proving it was created by your system and hasn't been tampered with. You can combine both techniques: Event signatures prove the event came from you and hasn't been modified, while Merkle proofs prove the event was part of your dataset at a specific time. Together, they provide end-to-end cryptographic integrity: authenticity – it's your event – plus membership – it was in your system at time X.
Want to take transparency even further? Publish your Merkle roots to a public ledger. Every day, week, or hour, calculate the Merkle root of your current event stream and post it somewhere immutable and publicly visible. A blockchain transaction. A certificate transparency log. A timestamping service. Your public website with web archives capturing it. This creates an undeniable timeline. You can't later claim you had different data at that point in time. The publicly posted Merkle root locks in what your dataset looked like at that moment. Some systems take this further by having customers or partners co-sign Merkle roots, creating multi-party agreement on the state of shared data at specific points in time.
Merkle proofs also enable sophisticated selective disclosure workflows. Imagine a scenario where you collect events from multiple sources, different parties are allowed to see different subsets, and everyone needs to trust the complete dataset exists. With Merkle trees, you can give each party proofs for only the events they're authorized to see, let them verify those events are part of the larger dataset, and maintain privacy for everyone else's events. This is particularly powerful for multi-tenant systems, data marketplaces, or collaborative platforms where data sharing needs fine-grained control.
Getting Started
Ready to try it yourself?
The tool is open source under the MIT license and free to use. You can find the source code, report issues, or contribute on GitHub at https://github.com/thenativeweb/eventsourcingdb-merkle or install it from npm at https://www.npmjs.com/package/eventsourcingdb-merkle.
If you're new to EventSourcingDB, start with the Getting Started guide to set up your event store first.
Have questions, feedback, or interesting use cases we haven't covered? We'd love to hear from you at hello@thenativeweb.io.
Trust, but Verify
In the physical world, trust is often about reputation, relationships, and track records. In the digital world, we can do better. We can move from "trust me" to "verify it yourself."
Event Sourcing gives you complete history. Every change, every decision, every action captured as immutable facts. Merkle trees give you cryptographic proofs. Not just logs you control, but mathematical evidence anyone can verify independently.
Together, they provide something traditional databases can't: provable history. You don't have to trust that the data is correct – you can verify it. You don't have to trust that events haven't been tampered with – the math proves it. You don't have to reveal everything to prove something – selective disclosure is built in.
The question is no longer "Do you trust me?" The question becomes "Here's the proof – does it check out?"
That's the difference between trust and truth. And in an increasingly complex digital world where data integrity and privacy both matter, that difference is everything.