posts/0018.md · 2026-04-25
Raft replication — persistent log_store, no membership amnesia
OxiDB-server speaks Raft for replication. Set `OXIDB_NODE_ID=<num>` plus `OXIDB_RAFT_PEERS="1=host1:4445,2=host2:4445,3=host3:4445"` and the node joins a quorum. Leader handles writes; followers tail the log. Reads can be served from any node: use `read_index` for linearizability, or opt in to stale-OK reads from a follower.
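
To make the peer string concrete, here is a minimal sketch of how a value in the `OXIDB_RAFT_PEERS` format could be parsed. The function names `parse_peers` and `quorum_size` are hypothetical and are not taken from the OxiDB source; the majority rule is standard Raft.

```python
def parse_peers(spec: str) -> dict[int, tuple[str, int]]:
    """Parse '1=host1:4445,2=host2:4445' into {1: ('host1', 4445), ...}.

    Hypothetical helper: illustrates the format only, not OxiDB's parser.
    """
    peers = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        node_id, _, addr = item.partition("=")
        # rpartition splits on the last ':' so the port is always the tail.
        host, _, port = addr.rpartition(":")
        if not node_id or not host or not port:
            raise ValueError(f"malformed peer entry: {item!r}")
        peers[int(node_id)] = (host, int(port))
    return peers


def quorum_size(n_voters: int) -> int:
    """Smallest strict majority of n_voters."""
    return n_voters // 2 + 1


peers = parse_peers("1=host1:4445,2=host2:4445,3=host3:4445")
print(peers, "quorum:", quorum_size(len(peers)))
```

With three voters the quorum is two, so the cluster keeps accepting writes with one node down.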
**The log_store rewrite.** Raft state used to be in memory only. Nodes lost cluster membership on restart — rejoined as a fresh peer, ran a fresh election, and occasionally split-brained while the operator stitched things back together. The recent fix moved it to disk:

- `raft_log.jsonl`: append-only, one Entry per line. O(1) per append, no full-file rewrite.
- `raft_meta.json`: small, holds vote + committed + state_machine. Rewritten on each metadata update; at a few hundred bytes, that is cheap.

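
The two-file layout can be sketched roughly as follows. This is an illustration under assumptions, not the OxiDB implementation: the entry fields (`term`, `index`, `command`) are made up for the example, and the temp-file-plus-rename step for the metadata file is one common way to make a small rewrite crash-safe, not something the post confirms.

```python
import json
import os
import tempfile


def append_entry(log_path, entry):
    """Append one entry as a single JSON line; earlier lines are never rewritten."""
    line = json.dumps(entry, separators=(",", ":")) + "\n"
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())  # make the append durable before acknowledging it


def write_meta(meta_path, meta):
    """Rewrite the small metadata file in full.

    Assumption: write to a temp file in the same directory, then rename it
    over the old file so a crash never leaves a half-written meta file.
    """
    directory = os.path.dirname(os.path.abspath(meta_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(meta, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, meta_path)  # atomic replace on the same filesystem
    finally:
        # Only still present if something failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
```

The asymmetry is the point of the design: the log can grow large, so it is only ever appended to, while the metadata is small enough that rewriting it whole is cheaper than tracking deltas.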
**Result.** A node restart picks up its term, its vote, and its log offset from disk and rejoins without an election storm. The 1M-record load test under failover (which we previously failed) now averages 44K rec/s with leader kills every 30s — log catch-up is fast because the follower already knows what it has.
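
A restart path along these lines could look like the sketch below. The function name `load_state` and the handling of a torn final line are assumptions made for illustration; the post does not describe how OxiDB treats a partially written entry.

```python
import json
import os


def load_state(log_path, meta_path):
    """Return (meta, entries) recovered from disk.

    Missing files are treated as a brand-new node with empty state.
    """
    meta = {}
    if os.path.exists(meta_path):
        with open(meta_path, encoding="utf-8") as f:
            meta = json.load(f)

    entries = []
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8") as f:
            for line in f:
                # Assumption: a final line without a newline is a torn write
                # from a crash mid-append, so it is not treated as an entry.
                if not line.endswith("\n"):
                    break
                entries.append(json.loads(line))
    return meta, entries
```

Because the follower knows its last complete entry after loading, the leader only needs to ship entries past that point. A production version would also truncate the torn tail before appending again; that step is left out here.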
**Membership changes.** Joint consensus, not unsafe single-server change. Add/remove nodes via `raft_admin` ops on the leader; the cluster catches up the new member before promoting it to voter.
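
The safety property behind joint consensus fits in a few lines: while the cluster is in the joint configuration, a decision needs a majority of the old voter set and, separately, a majority of the new one. The sketch below shows that rule only; the function names are hypothetical and it does not model the `raft_admin` ops or the catch-up phase.

```python
def has_majority(acks: set[int], voters: set[int]) -> bool:
    """True if strictly more than half of `voters` appear in `acks`."""
    return len(acks & voters) > len(voters) // 2


def joint_committed(acks: set[int], old_voters: set[int], new_voters: set[int]) -> bool:
    """During the joint phase, both configurations must agree independently."""
    return has_majority(acks, old_voters) and has_majority(acks, new_voters)


old = {1, 2, 3}
new = {3, 4, 5}
print(joint_committed({1, 2}, old, new))     # majority of old only
print(joint_committed({1, 3, 4}, old, new))  # majority of both
```

Requiring both majorities means the old and new configurations can never elect different leaders or commit conflicting entries during the transition, which is the failure mode an unguarded one-step change risks.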
Limitation worth flagging: no log compaction yet. `raft_log.jsonl` grows without bound until it is manually checkpointed. Compaction is on the roadmap.