One Line at a Time

When you build a database for Event Sourcing, one of the early design decisions is deceptively simple: how do you send data from the server to the client? JSON is the obvious answer. Every language has a parser, every developer knows the format, and every HTTP client handles it out of the box. But standard JSON has a fundamental limitation that becomes a showstopper the moment you deal with event streams.

We chose NDJSON as EventSourcingDB's wire format for streaming data. It's not a new technology. It's not exciting. It's barely even a specification. But it turned out to be exactly the right choice, and the story of how we got there is worth telling.

The Problem With JSON Arrays

The standard way to return a collection of items from an HTTP API is to wrap them in a JSON array:

[
  { "type": "OrderPlaced", "data": { "orderId": "42" } },
  { "type": "PaymentReceived", "data": { "amount": 29.99 } },
  { "type": "OrderShipped", "data": { "trackingId": "X7" } }
]

This works well when you know all the items upfront and the collection is small. The server collects everything, serializes the array, and sends it. The client receives the complete response and parses it. Simple.

But this model breaks down in Event Sourcing for three reasons.

First, the server must buffer everything before sending. It has to load all events into memory, serialize the complete array, and only then start transmitting. For a subject with a few dozen events, this is fine. For a backup containing millions of events, it means the server needs gigabytes of RAM just to assemble the response.

Second, the client must wait for the entire response before it can start processing. A JSON array is not valid JSON until the closing bracket arrives. The client sits idle while the server assembles and transmits megabytes or gigabytes of data. For time-sensitive operations like observing new events as they happen, this is not just slow. It is fundamentally incompatible with the use case.

Third, some streams never end. When a client observes events in real time, the connection stays open indefinitely. New events arrive whenever they're written. There is no closing bracket, no end of the array, no complete response. JSON arrays simply cannot represent an open-ended stream.

What NDJSON Is

NDJSON stands for Newline-Delimited JSON. The concept is almost trivially simple: each line is a valid JSON object, and lines are separated by a newline character. That's it.

{"type":"event","payload":{"subject":"/orders/42","type":"OrderPlaced"}}
{"type":"event","payload":{"subject":"/orders/42","type":"PaymentReceived"}}
{"type":"event","payload":{"subject":"/orders/42","type":"OrderShipped"}}

No wrapping array. No commas between items. No opening or closing brackets. Each line is independent and self-contained. You can parse it the moment it arrives, without waiting for anything else.

The server sends each item as soon as it's ready. Read an event from disk, serialize it, write it to the connection, move on. No buffering, no accumulating, no memory pressure. The client processes each item as it arrives. Read a line, parse it, handle it, read the next line. For a stream of a million events, the client starts working on the first event before the server has even reached the hundredth.
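That client-side loop is small enough to sketch in a few lines. This is an illustrative Python sketch, not an official client: it assumes a file-like object that yields the response body line by line, which is exactly what an HTTP library's streaming response gives you.

```python
import json
from typing import IO, Iterator


def iter_ndjson(stream: IO[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per line, as soon as the line arrives."""
    for line in stream:
        line = line.strip()
        if not line:
            # Tolerate blank lines; some producers emit them as keep-alives.
            continue
        yield json.loads(line)
```

With a real HTTP client, `stream` would be the chunked response body; the parsing logic stays the same, and each item is usable immediately without waiting for the end of the stream.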

The content type is application/x-ndjson, and it uses chunked transfer encoding over standard HTTP. No protocol upgrade, no special handshake, no additional infrastructure. Just HTTP, the way it has always worked.

Why This Matters for Event Sourcing

NDJSON is not just a nice-to-have for EventSourcingDB. It is what makes several core features possible in the first place.

Reading events from a subject is the most basic operation. When you replay a subject's history to rebuild its current state, NDJSON lets the server stream events directly from disk. Each event goes out as it's read, without waiting for the rest. As we discussed in The Snapshot Paradox, most subjects have far fewer events than people expect. But even when a subject has thousands of events, the client can start applying them immediately instead of waiting for all of them to arrive.

Observing events is where NDJSON truly shines. When a client wants to be notified about new events as they happen, the connection stays open. The server pushes each new event as a single NDJSON line the moment it is written. Between events, periodic heartbeat messages keep the connection alive:

{"type":"heartbeat","payload":{}}

This prevents proxies and load balancers from closing idle connections. The heartbeat is just another NDJSON line. No special mechanism, no separate keep-alive channel. We use the same approach in Predicting Failures Before They Happen, where a customer observes machine telemetry events in real time to detect anomalies before they cause downtime.

Backup and restore is where the scale becomes evident. Exporting a full database backup means streaming every event ever recorded. This can be millions or even billions of events. With NDJSON, the backup is simply a text file where each line is an event. You can process it with standard Unix tools. You can grep through it, filter it with awk, or pipe it through custom scripts for GDPR compliance. Try doing that with a multi-gigabyte JSON array.
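Because each line stands alone, a compliance job can rewrite a backup while holding only one event in memory at a time. As a hedged sketch in Python (the field names follow the envelope examples in this article, not a documented backup schema):

```python
import json


def scrub_subject(src_path: str, dst_path: str, subject: str) -> None:
    """Copy an NDJSON backup, dropping every event for the given subject."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            event = json.loads(line)
            # Keep every line whose payload does not belong to the subject.
            if event.get("payload", {}).get("subject") != subject:
                dst.write(line)
```

The same job against a single multi-gigabyte JSON array would need a streaming parser just to find the item boundaries.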

EventQL queries return results as they are computed. As we described in Designing EventQL, our query language processes event streams and produces results on the fly. Those results flow back to the client line by line, just like any other NDJSON response.

The common pattern across all of these is that the amount of data is unknown when the response starts. You do not know how many events a subject has until you have read them all. You do not know when the next observed event will arrive. You do not know how large the backup will be. NDJSON handles all of these cases with the same simple mechanism.

The Alternatives We Didn't Choose

We did not arrive at NDJSON by default. We considered the alternatives carefully, and each had specific reasons why it did not fit.

Server-Sent Events (SSE) was actually the closest contender and in many ways the most natural fit. SSE was designed exactly for this pattern: a server pushing events to a client over HTTP. Browsers even have a native EventSource object built specifically for consuming SSE streams. For an event-sourcing database, using a technology literally called EventSource feels almost poetic.

But SSE has a critical limitation: it does not support custom HTTP headers. The browser's EventSource API provides no way to set an Authorization header with a Bearer token. Since every EventSourcingDB API call requires authentication, this was a dealbreaker. If SSE supported custom headers, it would likely be our technology of choice. But it doesn't, and it has been this way since the specification was first published. It is a frustrating gap that has been discussed for years without resolution.

WebSockets solve the header problem but introduce a host of others. WebSockets are a separate protocol with their own handshake, their own connection lifecycle, and their own security model. This means additional complexity on the server, additional complexity in the client, and additional concerns around authentication and authorization. WebSocket connections also cause problems with many load balancers and reverse proxies because of the protocol upgrade during the handshake. And you cannot test a WebSocket endpoint with curl. You need specialized tools and libraries for every step.

The main strength of WebSockets is bidirectional communication. But our streaming use cases are strictly unidirectional: the server sends, the client receives. Using a bidirectional protocol for a unidirectional problem means paying for complexity you do not use.

GraphQL Subscriptions offer another approach through GraphQL's subscription mechanism. But GraphQL comes with massive overhead: a query language, a type system, resolvers, and an entirely different mental model for API design. It is a huge topic in itself, and adopting it solely for streaming would be disproportionate. More importantly, GraphQL's schema-oriented approach expects you to define the structure of your responses upfront. In an event-sourcing database, the events a query returns depend entirely on what was written. Different subjects have different event types with different payloads. A fixed schema does not fit a world where "whatever was written" is the response.

We also deliberately chose not to evaluate gRPC streaming in depth. Our focus was on HTTP as the underlying protocol: human-readable, debuggable with curl, accessible from any browser, and compatible with every proxy, CDN, and load balancer in existence. gRPC uses HTTP/2 under the hood but wraps it in a binary protocol that sacrifices exactly these properties.

One note on naming: you might have seen the term JSONL (JSON Lines) alongside NDJSON. They are essentially the same format with different names. Both specify one JSON object per line, separated by newlines. We could have picked either one. We went with NDJSON because in our experience, JSONL is easily confused with JSON-LD, which is something entirely different. NDJSON is simply the clearer term. But if you see JSONL elsewhere, know that it is technically the same idea.

What It Looks Like in Practice

Every NDJSON response in EventSourcingDB follows a consistent envelope format. Each line has a type field that tells you what kind of item it is, and a payload with the actual content:

{"type":"event","payload":{"subject":"/orders/42","type":"OrderPlaced","data":{"customerId":"c-1"}}}
{"type":"event","payload":{"subject":"/orders/42","type":"PaymentReceived","data":{"amount":29.99}}}
{"type":"error","payload":{"message":"Read timeout after 30s"}}

Notice the error on the third line. This is a deliberate design choice. The HTTP response started with status 200. The first two events streamed successfully. Then something went wrong. In a traditional JSON API, errors are signaled through HTTP status codes. But when you are mid-stream, the status code has already been sent. You cannot change it retroactively.

NDJSON handles this gracefully. Errors are just another type of line in the stream. The client processes events as they arrive, and when it encounters an error line, it handles it accordingly. No special error channel, no out-of-band signaling. The error is part of the conversation, right where it belongs.

This also means that client code follows a natural pattern: read a line, check its type, dispatch accordingly. Events go to the event handler, heartbeats get acknowledged, errors trigger error handling. The same loop works regardless of what the server sends.
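That dispatch loop might look like this in Python. It is a sketch, assuming handler callbacks of your own; the envelope fields match the examples above.

```python
import json
from typing import Callable, Iterable


def consume(
    lines: Iterable[str],
    on_event: Callable[[dict], None],
    on_error: Callable[[dict], None],
) -> None:
    """Read each NDJSON line and dispatch it by its envelope type."""
    for raw in lines:
        item = json.loads(raw)
        kind = item["type"]
        if kind == "event":
            on_event(item["payload"])
        elif kind == "heartbeat":
            # Connection is alive; nothing to do.
            continue
        elif kind == "error":
            on_error(item["payload"])
            # Stop processing; whether to reconnect is up to the caller.
            break
```

The same loop works for a finite read, an open-ended observe, or a backup export, because every response uses the same envelope.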

The Format That Disappears

The best infrastructure is the kind you do not think about. NDJSON has this quality. It is so simple that it barely registers as a technology choice. One JSON object per line. A newline in between. That is the entire specification.

But this simplicity is precisely what makes it powerful. It works with curl. It works in browsers. It works with every HTTP library in every programming language. It plays nicely with proxies, load balancers, and CDNs. It can be processed with grep, awk, and jq. It can represent three events or three billion events using the same mechanism. It handles open-ended streams as naturally as finite ones.

When we were designing EventSourcingDB, we wanted the API to be something developers could explore with nothing more than a terminal and curl. NDJSON makes that possible. You fire a request, and events start appearing in your terminal, one per line, readable and parseable. No setup, no specialized tools, no protocol negotiations. Sometimes the simplest choice is not a compromise. It is the best engineering decision you can make.

If you want to see this in action, the API overview in our documentation walks you through the format and shows how to consume NDJSON streams. And if you have questions about integrating with EventSourcingDB's streaming API, or if you are curious how NDJSON compares to your current approach, reach out to us at hello@thenativeweb.io. We always enjoy talking about the details.