Event-Driven Data Science: EventSourcingDB Meets Python and Pandas

Data analysis is more important than ever. Data science and AI have become essential tools for many companies. The tools keep getting better: more powerful models, faster computers, smarter algorithms.

But here's the problem: the underlying data is often garbage. And as always: garbage in, garbage out. The best models, the fastest computers, the smartest algorithms – none of it matters if your data doesn't tell the real story.

And that's exactly the issue in most companies: the data is just snapshots of the status quo. What's missing is the most essential aspect: the history of how you got to that status quo. You see what is, but you don't see how it became that way, or why. A user table shows you who's registered today, but not the signup patterns, the failed attempts, the behavioral changes over time. An order table shows current orders, but not the cancellations, the modifications, the decision chains that led to each purchase.

Snapshots hide causality. And without causality, your analysis is guesswork.

The Missing Piece: History

That's where Event Sourcing comes in. Instead of storing snapshots of the current state, you store the full history: every change, every decision, every action, captured as immutable events. It's not just compliance or auditing. Event data is a goldmine for analysis.

Events capture behavior. They show you not just outcomes, but processes. Not just results, but journeys. And for data science, that changes everything.

But if you're a data scientist working with Python and Pandas, you might think: "Event stores sound nice, but they're not built for ad-hoc analysis. I can't just load events into a DataFrame and go." That's exactly what we thought. Until we built Pandas integration for EventSourcingDB.

We released two tools that make event analysis as simple as working with CSV files: Pandas support in the Python SDK, and eventsourcingdb-merkle for verifying data integrity.

Then we put them to the test on the most honest dataset we had: our internal todo app.

The Dataset: Real Human Behavior

Our todo app has been running at the native web GmbH since April 30th, 2024. It's not a team management tool, it's personal. Each of us uses it however we want: work tasks, grocery lists, doctor appointments, things we keep meaning to do. No coordination, no process, no pressure. Just real individual task management.

The numbers:

  • 8,264 events written over 563 days
  • 1,618 todos created
  • Data range: April 2024 to November 2025

This isn't a toy example. It's production data showing how people actually use a todo app when nobody's watching.

And because the data is sensitive (personal tasks, after all), we computed a Merkle Root (a cryptographic hash proving we haven't manipulated the analysis). If we changed even a single event, the hash would break. The root for our dataset is:

101bbc2d865dfde26d02a2997a6b4b67bed3aacb523dec028ed768d993a2dbba

You can verify this yourself using eventsourcingdb-merkle. If you want to learn more about Merkle Trees and verification, check out our post Proving Without Revealing.

Loading Events into a DataFrame

Here's how simple it is to load events from EventSourcingDB into Pandas:

import asyncio

from eventsourcingdb import Client, ReadEventsOptions
from eventsourcingdb.pandas import events_to_dataframe


async def main():
    # Connect to EventSourcingDB
    client = Client(
        base_url='http://localhost:3000',
        api_token='secret'
    )

    # Read all events recursively
    events = client.read_events(
        subject='/',
        options=ReadEventsOptions(recursive=True)
    )

    # Convert the async event stream to a DataFrame
    df = await events_to_dataframe(events)

    print(f"Loaded {len(df)} events")


asyncio.run(main())

That's it. No manual parsing, no schema mapping, no ETL pipeline. Just events straight into a DataFrame, ready for analysis.

The DataFrame contains everything you need: event_id, time, subject, type, data, and cryptographic fields like hash and signature. From here, you have the full power of Pandas: filtering, grouping, aggregating, and visualizing. The events are just data now. And data tells stories.
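Once events are in a DataFrame, exploratory questions become one-liners. A minimal sketch on synthetic data shaped like the DataFrame described above (column names taken from the post; the values are made up):

```python
import pandas as pd

# Synthetic stand-in for the DataFrame returned by events_to_dataframe
df = pd.DataFrame({
    "event_id": ["1", "2", "3", "4"],
    "time": pd.to_datetime([
        "2024-05-01 07:10", "2024-05-01 09:00",
        "2024-05-03 07:30", "2024-05-04 18:45",
    ]),
    "subject": ["/todos/a", "/todos/a", "/todos/b", "/todos/a"],
    "type": ["remembered", "postponed", "remembered", "completed"],
})

# How many events of each type?
print(df["type"].value_counts())

# How many events per todo?
print(df.groupby("subject").size())
```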

What the Data Revealed

We Postpone More Than We Plan

Here's the distribution of event types in our dataset:

Event Type      Count   Percent
postponed       3,111   37.6%
remembered      1,618   19.6%
completed       1,472   17.8%
adjusted          834   10.1%
prioritized       510    6.2%
deprioritized     270    3.3%
advanced          240    2.9%
discarded         147    1.8%
restored           62    0.8%

The most striking number: 37.6% of all events are postponed. That's nearly twice as many as remembered (creating a new todo).

What does this tell us? We are chronically optimistic. We think we'll get things done today, but then we push them forward. And we keep doing it. This isn't laziness or poor planning. It's human nature. Anyone who's ever used a todo app knows this feeling.
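A distribution table like the one above falls out of `value_counts` directly. A sketch, using a synthetic `type` column rather than the production data:

```python
import pandas as pd

# Stand-in for df["type"] from the loaded event DataFrame
types = pd.Series([
    "postponed", "postponed", "postponed", "remembered",
    "remembered", "completed", "discarded",
])

counts = types.value_counts()
percent = (counts / counts.sum() * 100).round(1)
summary = pd.DataFrame({"count": counts, "percent": percent})
print(summary)
```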

But here's where it gets interesting: if we look at event sequences (what happens after what), the most common transition is postponed → postponed, occurring 2,019 times. That means even after we've already postponed something once, we still overestimate our ability to finish it the next time. We stay optimistic even after reality proved us wrong. That's a behavioral pattern you'd never see in a snapshot. A CRUD database would show you "postponed until Friday," but it wouldn't show you that it's been postponed seven times already.

Event Sourcing makes this visible. It captures behavior, not just state.
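Transition counts like postponed → postponed can be derived by pairing each event with the next one on the same todo, via a grouped `shift`. A sketch on made-up data; the real analysis would run on the full DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "subject": ["/todos/a"] * 4 + ["/todos/b"] * 2,
    "time": pd.to_datetime([
        "2024-05-01", "2024-05-02", "2024-05-03", "2024-05-04",
        "2024-05-01", "2024-05-05",
    ]),
    "type": ["remembered", "postponed", "postponed", "completed",
             "remembered", "completed"],
})

# Order events per todo, then pair each event with its successor
df = df.sort_values(["subject", "time"])
df["next_type"] = df.groupby("subject")["type"].shift(-1)

transitions = (
    df.dropna(subset=["next_type"])
      .groupby(["type", "next_type"])
      .size()
      .sort_values(ascending=False)
)
print(transitions)
```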

Saturday is the Second-Strongest Day

When is our todo app most active? Monday, as expected, the start of the week, when we plan what we want to accomplish. The peak hour is 7:00 AM, right before the workday begins. That's the natural rhythm of weekly planning: Monday morning, you sort through what's ahead.

But here's the surprise: Saturday is the second-strongest day of the week. Not Tuesday, not Wednesday. Saturday.

Why does this matter? It shows this isn't just a work tool. If it were, Saturday would be quiet. But our todos include grocery lists, personal errands, weekend planning, and things we want to get done outside of work. The app reflects real life, not just professional tasks.

Looking deeper at the time-of-day patterns, activity runs from 4 AM to 10 PM (the waking hours of people managing their lives). There's a morning peak between 6 and 9 AM, when people plan their day. Activity dips from 3 PM to 5 PM (classic afternoon slowdown), then picks up again in the evening after 6 PM. That's when people are planning for tomorrow, updating grocery lists, organizing the next day.

And here's an odd one: Wednesday is the weakest weekday. We can't explain it. It's just there in the data: a mysterious midweek dip. Maybe it's "hump day" fatigue. Maybe it's random noise. But it's real.

This is something you'd never discover by asking users. People can't tell you their exact usage patterns. But the events tell the truth.

Completion Rate: 91.8%

Of all todos that reached a final state (completed or discarded), 91.8% were actually completed. Only 8.2% were discarded.

Why is this important? Because Event Sourcing lets us distinguish between "I did it" and "It's no longer relevant." In a traditional CRUD system, you'd just delete a todo. You'd never know if it was finished or abandoned.

Event Sourcing preserves that distinction:

  • completed means follow-through: "I said I'd do it, and I did."
  • discarded means context changed: "This no longer matters."

The 91.8% completion rate tells us something meaningful: despite all the postponing, we eventually get things done. It might take longer than we think, but we follow through. That's a different story than "we're bad at finishing tasks." The data shows resilience, not failure.

And this distinction only exists because we store events, not state. State would just show "gone." Events show why it's gone.
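The rate itself is a simple ratio over the final-state events. A sketch with toy numbers, not the production dataset:

```python
import pandas as pd

# Stand-in for df["type"]; toy numbers, not the real dataset
types = pd.Series(
    ["completed"] * 9 + ["discarded"] * 1 + ["postponed"] * 5
)

final = types[types.isin(["completed", "discarded"])]
completion_rate = (final == "completed").mean()
print(f"Completion rate: {completion_rate:.1%}")  # 90.0% on this toy data
```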

The 267-Event Outlier

Most todos go through a few events: remembered, maybe postponed once or twice, then completed. The median is 3 events per todo, and the average is 5.1.

But then there's the outlier: one todo with 267 events.

This isn't a complex multi-month project. It's a recurring task that someone found easier to keep alive by repeatedly postponing rather than completing and re-remembering it. Maybe it's something that comes up every few days. Who knows. We're not going to dig into the specifics (it's personal data, after all).

But here's why it's interesting: this usage pattern is only visible through events. In a CRUD system, it's just a todo with a timestamp. With Event Sourcing, you see the behavior: 267 interactions over time, a story of how someone chose to manage a recurring need.

Events don't just show you the data. They show you how people actually use your system.
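The per-todo statistics behind this section (median, mean, outlier) come from a single `value_counts` over the subject column. A sketch on synthetic subjects:

```python
import pandas as pd

# Stand-in for df["subject"]: one entry per event
subjects = pd.Series(
    ["/todos/a"] * 3 + ["/todos/b"] * 2 + ["/todos/c"] * 10
)

events_per_todo = subjects.value_counts()
print("median:", events_per_todo.median())
print("mean:", round(events_per_todo.mean(), 1))
print("outlier:", events_per_todo.idxmax(), events_per_todo.max())
```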

What This Means for Data Science

Event Sourcing isn't just for compliance, auditing, or architectural purity. It's the ideal foundation for data analysis and machine learning.

Here's why:

Immutability means reproducibility. Events never change. You can reproduce any analysis exactly, even months later. No "the data changed since we ran this" problems. No wondering if someone updated a record. The history is locked. That's critical for scientific rigor and regulatory compliance.

Chronology means causality. Events are ordered. You can trace what led to what. You can detect patterns, understand sequences, and model behavior over time. Did users who signed up in January behave differently than those who signed up in June? Did the change we made in March affect retention in April? You can answer these questions because you have the timeline.

Completeness means depth. Nothing is lost. You're not analyzing a subset or a sanitized view. You have the full history. Every failed login, every abandoned cart, every preference change. That's the raw material for understanding user behavior, training predictive models, and discovering patterns you didn't know to look for.

Traditional databases give you "what." Event Sourcing gives you "how" and "why." And for data science, that changes everything.

Think about what you could do with event data:

  • Behavioral cohort analysis: Group users by event sequences, not just attributes. "Users who postponed 3+ times are 40% more likely to complete eventually."
  • Predictive models: Train on event sequences to predict outcomes. "This pattern of events indicates 80% churn risk."
  • Anomaly detection: Spot unusual event sequences that indicate fraud, bugs, or emerging problems.
  • Time-series forecasting: Use historical event patterns to predict future load, demand, or behavior.
  • A/B test analysis: Compare not just outcomes, but the full behavioral journey for each variant.

With Pandas and EventSourcingDB, analyzing event data is now as simple as analyzing CSV files. You don't need to build projections or ETL pipelines just to ask exploratory questions. You read the events, load them into a DataFrame, and go.

Event-driven data science means analyzing the full story, not just the ending.

What You Can Do Now

Want to analyze event data yourself? Here's how to get started:

Install the Python SDK with Pandas support:

pip install "eventsourcingdb[pandas]"

Connect and analyze:

The example above shows the basics. From there, you can:

  • Filter by event type, subject, or time range
  • Group events by hour, day, or custom dimensions
  • Compute statistics, trends, and distributions
  • Visualize patterns with Matplotlib or Seaborn
  • Build predictive models with scikit-learn or TensorFlow

Find the full documentation in the Python SDK repository.

Verify data integrity:

If you want to compute Merkle Roots for your own datasets, check out eventsourcingdb-merkle on npm.

Learn more about Event Sourcing and AI:

If you're curious how event-based systems unlock new possibilities for analytics and AI, visit eventsourcing.ai.

This is Just the Beginning

Event data + Pandas opens a new world of analysis: not just "what is," but "how it became" and "why it matters." The question is: what stories are hiding in your events?

At the native web GmbH, we've been building event-sourced systems since 2012 and help companies unlock the analytical potential of their data. EventSourcingDB is our answer to making event analysis accessible, powerful, and simple.

If you're interested in exploring how Event Sourcing can transform your data strategy (whether for compliance, analytics, or AI), we'd love to hear from you. Reach out at hello@thenativeweb.io.

And if you want to try EventSourcingDB yourself, head over to our Getting Started guide to install, configure, and write your first events.

The data is already there. Now you can finally ask it the right questions.