build · oxidb v0.25.21 0 entries on disk
The /dev/oxide

A build log on shipping OxiDB — notes, post-mortems, and the occasional flame war about JSON parsing, pressed straight onto an embedded engine running inside this process.

posts/0003.md · 2026-05-10

from 4 minutes to 200 ms — fixing create_collection contention


A scale-bench tenant-creation workload (1000 collections, eight workers in parallel) hung for more than four minutes before producing any throughput at all. CPU was idle. The p99 for a single `create_collection` call was 6.25 seconds.

The stack trace pointed at the engine-wide `RwLock<HashMap>` that guards the collection map. Every call held the write lock for the entire `BTreeCollection::open` path, including disk I/O: touching the directory, reading the existing `.btree` file if any, building the empty B-tree image, and fsync'ing it into place. While that ran, every other worker waited, which is why the CPU sat idle.
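To make the failure mode concrete, here is a minimal sketch of the contended shape. It is not OxiDB's code: `Engine`, `Collection`, `slow_open`, `create_collection_contended`, and the 20 ms `SIMULATED_OPEN` are all hypothetical stand-ins, with a sleep in place of the real disk I/O.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical cost of one open; stands in for the directory touch,
// B-tree image build, and fsync described above.
const SIMULATED_OPEN: Duration = Duration::from_millis(20);

// Hypothetical stand-in for the real collection type.
struct Collection;

fn slow_open(_name: &str) -> Collection {
    thread::sleep(SIMULATED_OPEN);
    Collection
}

struct Engine {
    collections: RwLock<HashMap<String, Arc<Collection>>>,
}

impl Engine {
    // Contended shape: the write lock is held across the whole open,
    // so opens of *different* collections still run one at a time.
    fn create_collection_contended(&self, name: &str) -> Arc<Collection> {
        let mut map = self.collections.write().unwrap();
        if let Some(existing) = map.get(name) {
            return Arc::clone(existing);
        }
        let opened = Arc::new(slow_open(name)); // I/O under the write lock
        map.insert(name.to_string(), Arc::clone(&opened));
        opened
    }
}

// Creates `workers` distinct collections in parallel and returns the wall time.
fn contended_wall_time(workers: usize) -> Duration {
    let engine = Arc::new(Engine {
        collections: RwLock::new(HashMap::new()),
    });
    let start = Instant::now();
    let handles: Vec<_> = (0..workers)
        .map(|i| {
            let engine = Arc::clone(&engine);
            thread::spawn(move || {
                engine.create_collection_contended(&format!("tenant-{i}"));
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    start.elapsed()
}
```

With eight workers the wall time in this sketch can never drop below eight simulated opens, no matter how many cores are available, because the sleeps cannot overlap while the write lock is held.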

The fix mirrors what `get_or_create_collection` already did: a read-check under the read lock, then an `open` performed without holding any lock, then a short write lock with a race-loser recheck. If two workers race to create the same collection, the loser opens the disk image, then drops it on the floor when the recheck shows the winner's entry is already in the map.
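The three steps can be sketched roughly as follows. This is an illustration of the pattern rather than the actual patch: `Engine`, `Collection`, and `open_collection` are hypothetical names, and the stand-in open does no real I/O.

```rust
use std::collections::HashMap;
use std::io;
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for the real collection type.
struct Collection {
    name: String,
}

// Hypothetical stand-in for the open path; the real one would do
// the disk work (directory, B-tree image, fsync).
fn open_collection(name: &str) -> io::Result<Collection> {
    Ok(Collection {
        name: name.to_string(),
    })
}

struct Engine {
    collections: RwLock<HashMap<String, Arc<Collection>>>,
}

impl Engine {
    fn new() -> Self {
        Engine {
            collections: RwLock::new(HashMap::new()),
        }
    }

    fn create_collection(&self, name: &str) -> io::Result<Arc<Collection>> {
        // 1. Read-check: cheap, and shared with every other reader.
        {
            let map = self.collections.read().unwrap();
            if let Some(existing) = map.get(name) {
                return Ok(Arc::clone(existing));
            }
        } // read lock released here

        // 2. Open with no lock held, so slow I/O blocks nobody.
        let opened = Arc::new(open_collection(name)?);

        // 3. Short write lock with a recheck: another worker may have
        //    inserted the same name while we were opening.
        let mut map = self.collections.write().unwrap();
        if let Some(winner) = map.get(name) {
            // Race loser: return the winner's entry; `opened` is dropped.
            return Ok(Arc::clone(winner));
        }
        map.insert(name.to_string(), Arc::clone(&opened));
        Ok(opened)
    }
}
```

The trade-off in this shape is that a race loser pays for an open it then throws away. That is wasted work only in the rare same-name race, while the common case (distinct names) no longer serializes on the write lock.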

Same bench, after:

Phase 1 wall: 4m18s → 205 ms (~1260× speedup)

CreateCollection p99: 6.25s → 1 ms
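
As a quick sanity check on those ratios, taking the reported figures at face value (the 1 ms p99 is presumably rounded, so the second ratio is only indicative):

$$
\frac{4\,\mathrm{min}\;18\,\mathrm{s}}{205\,\mathrm{ms}} = \frac{258\,\mathrm{s}}{0.205\,\mathrm{s}} \approx 1259,
\qquad
\frac{6.25\,\mathrm{s}}{1\,\mathrm{ms}} = 6250
$$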

We pushed harder: 10,000 collections, patched binary, same eight workers. p99 11 ms. RSS 1.24 GiB. ListCollections p99 22 ms. At that point the limit is the user's directory inodes, not OxiDB. This puts collection-prefix multi-tenancy on the table as a real strategy: one collection per tenant, indexed and isolated, with create-cost on the order of a single `fs::rename`.