build · oxidb v0.25.21 0 entries on disk
The /dev/oxide

A build log on shipping OxiDB — notes, post-mortems, and the occasional flame war about JSON parsing, pressed straight onto an embedded engine running inside this process.

posts/0002.md · 2026-05-11

encode 3× faster, output 40% smaller — one-line codec rewrite


Writes have always felt slower than reads on this engine, and I never figured out why. So I wrote a micro-bench that breaks encode apart by document shape and ran it against six payload profiles, including flat scalars, nested objects, arrays of strings, arrays of small objects, and a 'large realistic' doc modeled on the events shape we see in real-world traffic.
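
A minimal sketch of what a per-shape harness like that can look like, using only the standard library. This is not the actual bench: `time_per_iter`, `profiles`, `run_all`, the warm-up and iteration counts, and the sample documents are all hypothetical, and `encode_stub` is a placeholder for the encoder under test.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Average wall-clock time of one call to `f`, after a warm-up phase.
/// `warmup` and `iters` are illustrative knobs, not OxiDB settings.
fn time_per_iter<T>(warmup: u32, iters: u32, mut f: impl FnMut() -> T) -> Duration {
    assert!(iters > 0, "need at least one timed iteration");
    for _ in 0..warmup {
        black_box(f());
    }
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the result.
        black_box(f());
    }
    start.elapsed() / iters
}

/// Hypothetical payload profiles, as JSON text, one per document shape.
fn profiles() -> Vec<(&'static str, String)> {
    let events: Vec<String> = (0..50)
        .map(|i| format!(r#"{{"id":{},"kind":"click"}}"#, i))
        .collect();
    vec![
        ("flat scalars", r#"{"a":1,"b":true,"c":"x"}"#.to_string()),
        ("nested objects", r#"{"a":{"b":{"c":1}}}"#.to_string()),
        ("array of strings", r#"["a","b","c"]"#.to_string()),
        ("array of small objects", format!("[{}]", events.join(","))),
    ]
}

/// Placeholder for the encoder under test: it only copies the bytes.
fn encode_stub(doc: &str) -> Vec<u8> {
    doc.as_bytes().to_vec()
}

/// Runs every profile through `encode_stub`; returns (name, time per call).
fn run_all() -> Vec<(&'static str, Duration)> {
    profiles()
        .iter()
        .map(|(name, doc)| (*name, time_per_iter(100, 1_000, || encode_stub(doc))))
        .collect()
}
```

Swapping `encode_stub` for the real encode call, and again for the JSON text encoder, gives the side-by-side numbers per shape. A proper benchmark harness would also report variance, which this sketch ignores.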

The result was ugly: on the large doc, `codec::encode_doc` took 61 µs, while plain `serde_json::to_vec` of the same `Value` took 4.4 µs. JSONB encode is more than **13× slower** than JSON text encode, and its output is 1.6× larger.

Reading the jsonb 0.5 serializer made the cause obvious. `ObjectSerializer::serialize_value` allocates a fresh `Serializer` (with its own `Vec<u8>`) for every map value, materializes each child as an intermediate `OwnedJsonb`, then concatenates them in `ObjectBuilder::build → to_vec → buffer.append`. That is two extra allocations and one memcpy per field. It also wraps every scalar inside a container with a 4-byte `SCALAR_CONTAINER_TAG`, which is the cause of the size bloat (the 40% in the title).
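
To make the allocation pattern concrete, here is a toy sketch that contrasts "build each value in its own buffer, then append" with "write everything into one buffer". The length-prefixed layout and the names `write_chunk`, `encode_per_field_alloc`, and `encode_single_buffer` are made up for illustration. This is not jsonb's real format or code, and it models only one of the two extra allocations described above.

```rust
/// Toy layout (NOT jsonb's format): for each field, a little-endian u32
/// key length, the key bytes, a u32 value length, then the value bytes.
fn write_chunk(buf: &mut Vec<u8>, bytes: &[u8]) {
    buf.extend_from_slice(&(bytes.len() as u32).to_le_bytes());
    buf.extend_from_slice(bytes);
}

/// The slow pattern: every value is first built in its own buffer,
/// then copied into the parent.
fn encode_per_field_alloc(fields: &[(&str, &str)]) -> Vec<u8> {
    let mut out = Vec::new();
    for (key, value) in fields {
        write_chunk(&mut out, key.as_bytes());
        let mut child = Vec::new(); // fresh allocation per value
        write_chunk(&mut child, value.as_bytes());
        out.append(&mut child); // extra copy into the parent
    }
    out
}

/// The alternative: one pre-sized buffer, every value written in place.
fn encode_single_buffer(fields: &[(&str, &str)]) -> Vec<u8> {
    // 8 bytes of length prefixes per field, plus the payload bytes.
    let total: usize = fields.iter().map(|(k, v)| 8 + k.len() + v.len()).sum();
    let mut out = Vec::with_capacity(total);
    for (key, value) in fields {
        write_chunk(&mut out, key.as_bytes());
        write_chunk(&mut out, value.as_bytes());
    }
    out
}
```

Both functions produce identical bytes. The difference is only in how many heap allocations and copies it takes to get there, and that per-field overhead grows with the field count of the document.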

The fix is a one-line route change in `codec::encode_doc`: instead of calling `jsonb::to_owned_jsonb(value)`, serialize to JSON text with `serde_json::to_writer(...)` and then parse that text with `jsonb::parse_owned_jsonb_standard_mode(text)`. The output is still a valid `OwnedJsonb` and decodes through the same paths. Large-doc encode drops from 61 µs to 20 µs (3.0×); the 50-event array drops from 48 µs to 12 µs (3.9×). The on-disk image shrinks 30–58% depending on shape. Even the WAL entries are smaller: in the same test with 20 inserts, the WAL goes from 3821 B to 1459 B, a 62% reduction with no other code change.
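
For anyone who wants to re-derive the ratios from the figures quoted above, a small sketch follows. The helper names `speedup`, `reduction_pct`, and `quoted_measurements` are mine, and the inputs are the rounded values from this post, not raw bench output. The rounded timings give 4.0× for the array case, so the quoted 3.9× presumably comes from unrounded measurements.

```rust
/// How many times faster `after` is than `before` (same unit for both).
fn speedup(before: f64, after: f64) -> f64 {
    before / after
}

/// Percentage by which `after` is smaller than `before`.
fn reduction_pct(before: f64, after: f64) -> f64 {
    (1.0 - after / before) * 100.0
}

/// (label, before, after) triples copied from the rounded figures in the post.
fn quoted_measurements() -> [(&'static str, f64, f64); 3] {
    [
        ("large doc encode, µs", 61.0, 20.0),
        ("50-event array encode, µs", 48.0, 12.0),
        ("WAL for 20 inserts, bytes", 3821.0, 1459.0),
    ]
}
```

Feeding each triple through `speedup` or `reduction_pct` reproduces the 3.0× encode speedup and the roughly 62% WAL reduction.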

I use standard (strict) mode rather than extended mode because `serde_json` always emits strict JSON, so there is no NaN, Infinity, or leading plus sign to accommodate.
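
As an illustration of what "strict" means for numbers, here is a small std-only checker for the number grammar in RFC 8259. It is a sketch for this post, not code from `serde_json` or jsonb, and the names `is_strict_json_number` and `skip_digits` are hypothetical. It rejects exactly the kinds of tokens mentioned above: `NaN`, `Infinity`, and a leading `+`.

```rust
/// Advances past any ASCII digits starting at `i`; returns the new index.
fn skip_digits(b: &[u8], mut i: usize) -> usize {
    while i < b.len() && b[i].is_ascii_digit() {
        i += 1;
    }
    i
}

/// True if `s` is a number token under the strict JSON grammar (RFC 8259):
/// -? (0 | [1-9][0-9]*) ('.' [0-9]+)? ([eE] [+-]? [0-9]+)?
fn is_strict_json_number(s: &str) -> bool {
    let b = s.as_bytes();
    let at = |i: usize| b.get(i).copied();
    let mut i = 0;

    // Optional minus sign; a leading plus is never allowed.
    if at(i) == Some(b'-') {
        i += 1;
    }

    // Integer part: a single 0, or a non-zero digit followed by digits.
    match at(i) {
        Some(b'0') => i += 1,
        Some(b'1'..=b'9') => i = skip_digits(b, i),
        _ => return false, // "", "+1", "NaN", "Infinity", ".5" all end here
    }

    // Optional fraction: '.' followed by at least one digit.
    if at(i) == Some(b'.') {
        let end = skip_digits(b, i + 1);
        if end == i + 1 {
            return false;
        }
        i = end;
    }

    // Optional exponent: e or E, optional sign, at least one digit.
    if matches!(at(i), Some(b'e') | Some(b'E')) {
        i += 1;
        if matches!(at(i), Some(b'+') | Some(b'-')) {
            i += 1;
        }
        let end = skip_digits(b, i);
        if end == i {
            return false;
        }
        i = end;
    }

    // Valid only if the whole token was consumed.
    i == b.len()
}
```

Since the text handed to the parser comes straight from `serde_json`, every number in it already passes a check like this, which is why the stricter parser mode loses nothing.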