The Order That Vanished Into Thin Air (And What Postgres Taught Me About It)
A real incident where a saved order never triggered its event — and how the Outbox Pattern with Postgres WAL finally made my services stop lying to each other.

TL;DR
A user placed an order. My database saved it. My event broker never heard about it.
That silent failure forced me to learn the Outbox Pattern — and wiring it to Postgres WAL made dual writes a problem I'll never have again.
It Started With a Missing Confirmation Email
I was building an order service. Simple enough — user places order, Postgres saves it, Kafka fires an event, downstream handles the email and inventory.
Deployed it. Tested it. Looked clean.
Then my friend placed a test order. The order showed up in the database. His dashboard updated. The confirmation email never arrived.
I checked the Kafka consumer. No message. I checked the producer logs. No error. The order was there — timestamped, persisted, confirmed. The event had simply... not happened.
I retried under load. Every third or fourth order — same thing. No crash. No log. Just silence.
Two hours of staring at code before I saw it:
db.Exec("INSERT INTO orders ...") // Step 1
kafka.Publish("order.created", payload) // Step 2
Two operations. Two separate systems. No shared transaction.
A network hiccup or a process crash between Step 1 and Step 2, and the database commit survives while the event vanishes. I had a dual-write problem. I was trusting two independently-failing systems to stay in sync by luck.
That's not reliability. That's hope.

The Fix: Write to Postgres Twice, Not Two Systems Once
The Outbox Pattern is simple. Instead of writing to the DB and publishing to Kafka separately, you write to the DB twice — your main table and an outbox_events table — inside a single transaction. A separate relay then reads the outbox and publishes.
CREATE TABLE outbox_events (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    event_type TEXT NOT NULL,
    payload    JSONB NOT NULL,
    created_at TIMESTAMPTZ DEFAULT now(),
    published  BOOLEAN DEFAULT false
);
Both writes, one transaction:
err := pgx.BeginFunc(ctx, db, func(tx pgx.Tx) error {
    // Write the business row first...
    _, err := tx.Exec(ctx, "INSERT INTO orders ...")
    if err != nil {
        return err
    }
    // ...then the event, in the same transaction.
    _, err = tx.Exec(ctx,
        "INSERT INTO outbox_events (event_type, payload) VALUES ($1, $2)",
        "order.created", payload,
    )
    return err
})
If the transaction rolls back — both disappear. If it commits — both are there. The outbox only ever contains events that are 100% persisted.
The question becomes: how does the relay know when to publish?
Polling works. But polling hammers the database every few seconds whether there's anything to publish or not. Under high write volume, the lag adds up. There's a better way.
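For completeness, a polling relay is just a loop over a query like this sketch (the batch size is arbitrary; FOR UPDATE SKIP LOCKED lets multiple relay workers share the table without claiming the same rows):

```sql
-- Claim a batch of unpublished events; concurrent workers skip
-- rows another worker has already locked.
SELECT id, event_type, payload
FROM outbox_events
WHERE NOT published
ORDER BY created_at
LIMIT 100
FOR UPDATE SKIP LOCKED;
```

Workable, but every iteration costs a query even when the outbox is empty.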
Why WAL Is the Right Answer
WAL is Postgres's Write-Ahead Log — the internal journal where every change is recorded before it reaches the table's data files. It's what makes crash recovery work.
Postgres also exposes WAL as a logical replication stream. Subscribe to it, and you get a real-time feed of every INSERT, UPDATE, DELETE on any table you choose — the moment the transaction commits.
No polling loop, no wasted queries. The outbox row commits, and your relay hears about it almost immediately.
Setting It Up
Enable logical replication in postgresql.conf (changing wal_level requires a server restart):
wal_level = logical
Create a replication slot and publication targeting just the outbox table:
SELECT pg_create_logical_replication_slot('outbox_slot', 'pgoutput');
CREATE PUBLICATION outbox_pub FOR TABLE outbox_events;
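Worth knowing from day one: a slot created this way pins WAL on disk until its consumer acknowledges it. You can watch how much is being retained with a query along these lines:

```sql
-- How much WAL each replication slot is holding back from removal.
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
```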
The Relay Service
Writing the code to parse logical replication messages can get verbose. In short, the relay is a background service completely independent of the main app: it connects to the replication slot, decodes the binary WAL stream, publishes each event to your broker (e.g., Kafka), and only then acknowledges the LSN back to Postgres — so a crash mid-publish never loses an event.
The moment a transaction commits, the relay receives the row and pushes it to the broker. WAL is the ground truth. Everything else just listens.
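To make the shape concrete, here is a rough skeleton of that relay loop using pglogrepl. Treat it as a sketch, not the reference implementation: the connection string is a placeholder, the slot and publication names assume the setup above, publishToKafka stands in for your real producer, and error handling is minimal.

```go
package main

import (
	"context"
	"log"

	"github.com/jackc/pglogrepl"
	"github.com/jackc/pgx/v5/pgconn"
	"github.com/jackc/pgx/v5/pgproto3"
)

// publishToKafka is a placeholder for your real broker producer.
func publishToKafka(walData []byte) error { return nil }

func main() {
	ctx := context.Background()

	// Logical replication needs a connection opened with replication=database.
	conn, err := pgconn.Connect(ctx, "postgres://app@localhost/app?replication=database")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close(ctx)

	sysident, err := pglogrepl.IdentifySystem(ctx, conn)
	if err != nil {
		log.Fatal(err)
	}

	err = pglogrepl.StartReplication(ctx, conn, "outbox_slot", sysident.XLogPos,
		pglogrepl.StartReplicationOptions{PluginArgs: []string{
			"proto_version '1'",
			"publication_names 'outbox_pub'",
		}})
	if err != nil {
		log.Fatal(err)
	}

	clientXLogPos := sysident.XLogPos
	for {
		rawMsg, err := conn.ReceiveMessage(ctx)
		if err != nil {
			log.Fatal(err)
		}
		copyData, ok := rawMsg.(*pgproto3.CopyData)
		if !ok {
			continue
		}
		switch copyData.Data[0] {
		case pglogrepl.PrimaryKeepaliveMessageByteID:
			pkm, err := pglogrepl.ParsePrimaryKeepaliveMessage(copyData.Data[1:])
			if err == nil && pkm.ReplyRequested {
				pglogrepl.SendStandbyStatusUpdate(ctx, conn,
					pglogrepl.StandbyStatusUpdate{WALWritePosition: clientXLogPos})
			}
		case pglogrepl.XLogDataByteID:
			xld, err := pglogrepl.ParseXLogData(copyData.Data[1:])
			if err != nil {
				log.Fatal(err)
			}
			// xld.WALData holds pgoutput messages; pglogrepl.Parse decodes them
			// into Begin/Insert/Commit messages you can turn into broker events.
			if err := publishToKafka(xld.WALData); err != nil {
				log.Fatal(err) // do not ack — Postgres will re-send on restart
			}
			// Ack only after the broker accepted the event (at-least-once).
			clientXLogPos = xld.WALStart + pglogrepl.LSN(len(xld.WALData))
			pglogrepl.SendStandbyStatusUpdate(ctx, conn,
				pglogrepl.StandbyStatusUpdate{WALWritePosition: clientXLogPos})
		}
	}
}
```

The important design choice is the ack ordering: publish first, acknowledge second. That is exactly what makes delivery at-least-once rather than at-most-once.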

Try It
I've open-sourced a full reference implementation of this exact pattern in Go. It includes the API, the decoupled WAL Relay service using pglogrepl, and the Postgres configuration.
Before You Ship This
Three things to keep in mind:
At-least-once delivery. If the relay crashes after publishing but before acknowledging the LSN, the event fires again on restart. Make your consumers idempotent — deduplicate on event ID.
Replication slot lag. Unused slots accumulate WAL on disk indefinitely. Monitor lag. Drop slots you're not consuming.
Cleanup. Periodically purge published rows. Don't let outbox_events grow forever.
The Rule I Think About Now
Never trust two systems to agree by luck. Make agreement structurally impossible to break.
The order doesn't fire an event. The committed transaction is the event. Kafka just finds out about it.
If You're Building Event-Driven Services
Don't wait for your first mysteriously missing email.
The silence in my error logs since refactoring around this pattern has been deeply satisfying.
Building reliable systems one embarrassing incident at a time — more posts soon.