Data Is the New Gold, Here's How to Mine It

Picture this: you work at a mid-sized e-commerce company. The marketing team needs customer purchase patterns. The logistics team needs order fulfillment timelines. The finance team needs revenue breakdowns by product category. All of this data exists somewhere in your organization. But when marketing asks the data engineering team, they get "file a Jira ticket." When logistics asks the backend team, they get "we can export a CSV next week." Everyone knows data is gold, but getting to it feels like mining with a spoon.

This scenario plays out in companies of every size, every day. The data is there. The value is obvious. But the organizational structure turns every data request into a negotiation. And the solutions we have built over the past two decades have not fixed the problem. They have made it worse.

The reason is surprisingly simple. We kept trying to solve an organizational problem with centralized technology. We built bigger pipes, bigger warehouses, bigger lakes. And we ended up with bigger bottlenecks.

The Promise That Wasn't Kept

Data Warehouses were supposed to be the answer. Take all your data, transform it into a unified schema, load it into one place. One source of truth, accessible to everyone. On paper, it sounded perfect. In practice, it created a new kind of silo: the data team silo.

Every department that needed data had to go through the central data engineering team. That team became the bottleneck for every report, every dashboard, every ad-hoc analysis. They built ETL pipelines that broke when upstream schemas changed. They maintained transformation logic they did not fully understand because it encoded business rules from domains they did not own. And the data they served was always a little stale, always a little off, because the pipeline from source to warehouse introduced delay, and sometimes errors.

Data Lakes tried to improve on this by removing the schema constraint. Just dump everything in, figure out the structure later. But "figure out the structure later" turned into "nobody knows what's in here." The lake became a swamp. Data scientists spent eighty percent of their time cleaning and understanding data, and twenty percent doing actual analysis.

The pattern is the same in both cases. A central team owns all the data infrastructure. Every other team depends on them. The central team cannot possibly understand every domain deeply enough to model its data correctly. And so the data is always slightly wrong, slightly late, slightly incomplete.

The problem isn't technical. It's organizational.

Four Pillars, One Shift

In 2019, Zhamak Dehghani introduced the concept of Data Mesh, and it reframed the entire conversation. Instead of asking "how do we build a better central data platform," she asked "what if we stopped centralizing data in the first place?"

Data Mesh rests on four pillars, and each one addresses a specific failure of the centralized approach.

The first pillar is domain ownership. Instead of a central team owning all data, each domain team owns its own data. The marketing team does not ask the data engineers to model customer behavior. The marketing team models it themselves, because they understand their domain better than anyone else. This is the same principle that drives Domain-Driven Design: the people closest to the problem are the best equipped to model it.

The second pillar is treating data as a product. When a team produces data that other teams consume, that data should be treated with the same care as any product. It needs documentation. It needs quality guarantees. It needs a clear interface. It needs an owner who cares about the consumers' experience. Data Mesh treats data like a product, not a byproduct. In our opening scenario, the logistics team would not have to file a Jira ticket. They would consume a well-documented data product published by the order management team.

The third pillar is a self-serve data platform. Domain teams should not have to build their own infrastructure from scratch. There should be a shared platform that makes it easy to publish, discover, and consume data products. The platform handles the plumbing so that teams can focus on the data itself.

The fourth pillar is federated computational governance. Standards and policies are defined centrally but executed locally. Think of it like building codes: the city sets the rules, but each architect designs their own building within those rules. This ensures interoperability without creating a central bottleneck.

Together, these four pillars represent a paradigm shift. Data ownership moves from a central team to the domain teams. The central team shifts from owning data to owning the platform. And the relationship between data producer and data consumer becomes explicit, with clear contracts and quality expectations.

Where Event Sourcing Enters the Picture

If you have been reading this blog, you might already see where this is going. Event Sourcing and CQRS provide a natural architecture for implementing Data Mesh principles. The fit is not accidental. It is structural.

In an event-sourced system, each domain team owns its domain events and stores them in its own event store. This is crucial: each team has its own EventSourcingDB instance, not a shared central one. As we explored in One Database to Rule Them All, shared databases reintroduce the very coupling that Data Mesh aims to eliminate. When the order management team records an OrderPlaced event or a PaymentReceived event, they are capturing the truth of what happened in their domain. These events are stored in the team's own event store, invisible to the outside world.
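To make this concrete, here is a minimal sketch in TypeScript of what the order management team's internal events might look like. The type names and fields are illustrative assumptions, not EventSourcingDB's actual API: the point is that these definitions live inside the team's boundary.

```typescript
// Hypothetical domain events of the order management team.
// They are internal implementation details: other teams never see them.
type OrderPlaced = {
  type: "OrderPlaced";
  orderId: string;
  customerId: string;
  items: { sku: string; quantity: number; priceCents: number }[];
  placedAt: string; // ISO 8601 timestamp
};

type PaymentReceived = {
  type: "PaymentReceived";
  orderId: string;
  amountCents: number;
  receivedAt: string;
};

// The team's private event vocabulary, modeled as a discriminated union.
type DomainEvent = OrderPlaced | PaymentReceived;

// Narrowing on the `type` field recovers the concrete event shape.
function describe(event: DomainEvent): string {
  return event.type === "OrderPlaced"
    ? `Order ${event.orderId} placed by ${event.customerId}`
    : `Payment of ${event.amountCents} cents for order ${event.orderId}`;
}
```

Renaming a field or splitting an event here is a purely internal refactoring, precisely because no other team depends on these types.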

This is an important distinction: domain events are implementation details. They capture the internal state transitions of your aggregates, the fine-grained facts that help you rebuild state and debug behavior. They are not meant to be shared with other teams. Exposing your domain events would couple consumers to your internal model, making every refactoring a breaking change.

Now consider how CQRS separates the write model from read models. The write side is the aggregate, the internal model optimized for making decisions and enforcing business rules. The read side consists of projections, external models optimized for answering specific questions. One stream of events can feed many different projections, each tailored to a different consumer's needs.

Here is where "data as a product" becomes concrete. The order management team does not publish its domain events for others to consume. Instead, the team builds projections specifically designed for other teams. The marketing team needs customer purchase patterns? The order management team builds a read model that provides exactly that, with a stable interface, clear documentation, and quality guarantees. The logistics team needs fulfillment timelines? Another projection, another data product, owned and maintained by the team that understands the domain.
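A projection like this is, at its core, a fold over the event history. The following sketch (in TypeScript, with a deliberately simplified event shape that is our assumption, not the team's real model) shows how a "Customer Purchase Patterns" read model could be derived:

```typescript
// Internal domain event, simplified for illustration; never exposed directly.
type OrderPlaced = {
  type: "OrderPlaced";
  customerId: string;
  totalCents: number;
};

// The data product: a read model shaped for the marketing team.
type PurchasePattern = { orderCount: number; totalSpentCents: number };

// A pure, deterministic fold over the event history. Because it is
// side-effect free, the projection can be rebuilt at any time by
// replaying all events from the event store.
function projectPurchasePatterns(
  events: OrderPlaced[],
): Map<string, PurchasePattern> {
  const patterns = new Map<string, PurchasePattern>();
  for (const event of events) {
    const current =
      patterns.get(event.customerId) ?? { orderCount: 0, totalSpentCents: 0 };
    patterns.set(event.customerId, {
      orderCount: current.orderCount + 1,
      totalSpentCents: current.totalSpentCents + event.totalCents,
    });
  }
  return patterns;
}
```

If marketing later needs a different cut of the data, the team changes the fold and replays; the consumers' interface is the read model, never the events themselves.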

The projections are the data products. They are derived from the domain events, but shaped and curated for specific consumers. The domain events remain private. The projections become the public interface. And because projections can be rebuilt at any time by replaying the events, they are never permanently stale. If a projection's logic needs to change, you update the logic and replay. The data corrects itself.

This is what "data as a product" truly means. It is not about letting consumers subscribe to your internal events and build their own views. It is about the domain team taking responsibility for the data it provides to others, treating that data with the same care as any other product.

If you want to explore how Event Sourcing applies to your architecture, eventsourcing.ai is a good starting point for understanding the fundamentals.

A Shared Platform and Clear Contracts

The missing piece, the thing that turns a collection of event-sourced services into a Data Mesh, is a shared platform with clear contracts. But "shared platform" does not mean "shared database." It means common tooling, common APIs, common conventions.

Each team runs its own EventSourcingDB instance. The platform provides the infrastructure: the same technology, the same operational patterns, the same way of storing and querying events. But the data itself remains isolated. Each team's event store is its own, just as each team's domain is its own.

The contracts are not the domain events. The contracts are the data products: the projections that teams build for their consumers. When the order management team provides a "Customer Purchase Patterns" read model to the marketing team, that interface is a promise. The schema is documented. Changes follow versioning rules. The owning team takes responsibility for quality and availability. This is federated governance in practice: the platform provides consistency in how data products are built and accessed, but each team controls what it exposes and how.
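What might such a contract look like in practice? One lightweight option is to publish machine-readable metadata alongside each data product. The shape below is a sketch under our own assumptions; the field names are illustrative, not a standard the platform prescribes:

```typescript
// Hypothetical contract metadata published alongside a data product.
type DataProductContract = {
  name: string; // stable identifier consumers refer to
  version: string; // semantic version; breaking changes bump the major
  owner: string; // the team accountable for quality and availability
  documentationUrl: string;
  // Flat field-to-type map of the read model's schema.
  schema: Record<string, "string" | "number" | "boolean">;
};

// The order management team's promise to the marketing team.
const customerPurchasePatterns: DataProductContract = {
  name: "customer-purchase-patterns",
  version: "2.1.0",
  owner: "order-management",
  documentationUrl: "https://example.internal/docs/customer-purchase-patterns",
  schema: {
    customerId: "string",
    orderCount: "number",
    totalSpentCents: "number",
  },
};
```

The governance layer can then check every published product against shared rules (naming, versioning, required documentation) without ever touching the data itself.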

Each team's event store is its single source of truth. The projections it builds for others become its data products.

Think about what this means for our opening scenario. The marketing team does not ask anyone for raw data. They consume a well-designed data product built by the order management team, tailored to their needs. The logistics team consumes a different data product from the same source, shaped for fulfillment analysis. Each consuming team gets exactly what it needs, in exactly the format it needs, without filing a Jira ticket and without waiting for a central data team to build a pipeline. And the producing team maintains full control over its internal implementation.

And because the events are immutable and complete, the producing team can trace exactly how every data product was derived. When a number looks wrong, you do not dig through ETL pipeline logs hoping to find the transformation that broke. You replay the events through your projection and watch the number being calculated, step by step. This is the same debugging capability that makes Event Sourcing so powerful for individual services, now applied at the organizational level.

Mining Your Own Gold

The gold metaphor from the beginning still holds, but the picture has changed. The gold is not locked in a central vault, nor scattered across inaccessible silos. Each team has its own mine, its own event store, and refines the gold into products that others can use. The platform provides the picks and the carts. The domain teams decide what to extract and how to package it.

If your organization is drowning in data requests, if every cross-team analysis starts with a Jira ticket and ends with a stale CSV, consider that the problem might not be your tools. It might be your topology. Data Mesh, built on Event Sourcing, offers a different topology. One where data flows naturally, where ownership is clear, and where every team can mine the gold it needs. To learn how EventSourcingDB can serve as the foundation for this architecture, reach out at hello@thenativeweb.io.