build · oxidb v0.25.21
The /dev/oxide

A build log on shipping OxiDB — notes, post-mortems, and the occasional flame war about JSON parsing, pressed straight onto an embedded engine running inside this process.

posts/0012.md · 2026-05-01

aggregation pipeline — $match, $group, $sort, $dateHistogram, $percentile

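The aggregation surface is MongoDB-style: the same operator names, the same input/output shape, and the same `db.aggregate(coll, [stages...])` API. Stages execute as an iterator chain, so streaming stages such as `$match`, `$project`, and `$limit` pass documents along one at a time and never materialize their intermediate results. Stages that need to see all of their input, such as an unindexed `$sort`, still have to collect it first.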
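To make the iterator-chain idea concrete, here is a minimal sketch using plain Rust iterator adaptors. It is not OxiDB's internal code: the `Doc` struct and the `match_status`, `project_amount`, and `run_pipeline` names are made up for illustration, and real stages operate on JSON-like documents rather than a fixed struct.

```rust
// Hypothetical document type for illustration; not OxiDB's real representation.
#[derive(Debug, Clone, PartialEq)]
struct Doc {
    status: &'static str,
    amount: i64,
}

/// $match-like stage: lazily keeps documents with the given status.
fn match_status<'a>(
    input: impl Iterator<Item = Doc> + 'a,
    status: &'a str,
) -> impl Iterator<Item = Doc> + 'a {
    input.filter(move |d| d.status == status)
}

/// $project-like stage: lazily maps each document to its amount.
fn project_amount(input: impl Iterator<Item = Doc>) -> impl Iterator<Item = i64> {
    input.map(|d| d.amount)
}

/// Chains $match -> $project -> $limit. Nothing runs until `collect` pulls
/// on the chain, and each document flows through all stages before the
/// next one is read, so no intermediate Vec is built between stages.
fn run_pipeline(docs: Vec<Doc>, status: &str, limit: usize) -> Vec<i64> {
    let matched = match_status(docs.into_iter(), status);
    project_amount(matched).take(limit).collect()
}
```

Because `take(limit)` sits at the end of the chain, the upstream stages stop being polled as soon as enough documents have come through.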

Stages implemented today: `$match`, `$group`, `$sort`, `$skip`, `$limit`, `$project`, `$count`, `$unwind`, `$addFields`, `$lookup`, `$dateHistogram`, `$percentile`. Group accumulators: `$sum`, `$avg`, `$min`, `$max`, `$count`, `$push`, `$addToSet`, `$first`, `$last`, plus `$percentile` with exact linear interpolation over the full input.
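For readers who have not met "exact linear interpolation" before, the sketch below shows one common way to compute it. It assumes the rank convention `p/100 * (n - 1)`; the post does not say which convention OxiDB uses, so treat that and the `percentile` function name as assumptions.

```rust
/// Exact percentile over the full input, interpolating linearly between
/// the two nearest ranks. Assumes rank = p/100 * (n - 1), one common
/// convention; OxiDB's actual rank definition may differ.
/// Returns None for empty input or for p outside 0..=100.
fn percentile(values: &[f64], p: f64) -> Option<f64> {
    if values.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = values.to_vec();
    // total_cmp is a total order on f64, so the sort cannot panic on NaN.
    sorted.sort_by(|a, b| a.total_cmp(b));

    let rank = p / 100.0 * (sorted.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    let frac = rank - lo as f64;

    // When rank is a whole number, lo == hi and frac == 0.
    Some(sorted[lo] + (sorted[hi] - sorted[lo]) * frac)
}
```

With the values 10, 20, 30, 40 and `p = 50`, the rank is 1.5, so the result sits halfway between 20 and 30. "Exact" here means the whole input is sorted, which is the trade-off against sketch-based estimators that use bounded memory.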

**$dateHistogram** buckets a date field by interval and applies accumulators per bucket. Intervals are either fixed-width (`Ns`, `Nm`, `Nh`, `Nd`, `Nw` for seconds, minutes, hours, days, weeks) or calendar-aligned (`1M`, `1y`). Setting `min_doc_count: 0` fills the empty buckets between the observed minimum and maximum, emitted as a synthetic `$group` chain.
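The fixed-width case with gap filling can be sketched in a few lines. This is an illustration only: it assumes timestamps are epoch seconds, counts documents instead of running arbitrary accumulators, and skips calendar intervals (`1M`, `1y`) entirely. The `date_histogram` function and its `fill_empty` flag are hypothetical stand-ins for the stage and its `min_doc_count: 0` option.

```rust
use std::collections::BTreeMap;

/// Buckets epoch-second timestamps into fixed-width intervals and counts
/// documents per bucket. With `fill_empty` set (mirroring `min_doc_count: 0`),
/// buckets with no documents between the first and last observed bucket
/// are emitted with a count of 0.
/// Returns (bucket_start, count) pairs in ascending order.
fn date_histogram(timestamps: &[i64], interval_secs: i64, fill_empty: bool) -> Vec<(i64, u64)> {
    assert!(interval_secs > 0, "interval must be positive");

    let mut counts: BTreeMap<i64, u64> = BTreeMap::new();
    for &ts in timestamps {
        // div_euclid floors toward negative infinity, so pre-1970
        // timestamps still land in the bucket that contains them.
        let start = ts.div_euclid(interval_secs) * interval_secs;
        *counts.entry(start).or_insert(0) += 1;
    }

    if fill_empty {
        // Smallest and largest observed bucket starts, if any.
        let bounds = counts
            .keys()
            .next()
            .copied()
            .zip(counts.keys().next_back().copied());
        if let Some((min, max)) = bounds {
            let mut start = min;
            while start <= max {
                counts.entry(start).or_insert(0);
                start += interval_secs;
            }
        }
    }

    counts.into_iter().collect()
}
```

Calling it with timestamps 0, 30, 65, and 250 and a 60-second interval yields buckets at 0, 60, and 240, plus zero-count buckets at 120 and 180 when filling is on.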

**Index-backed $sort** applies when the sort field is indexed: the pipeline iterates the BTreeMap directly instead of collecting everything and sorting. That makes `$sort` + `$limit` cost `O(limit)` instead of paying the `O(n log n)` price of a full sort.
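Here is a rough sketch of why the indexed path is cheap. The index is modeled as a `BTreeMap` from sort key to the ids of documents sharing that key. That layout and the `sorted_limit` function names are assumptions for illustration, not OxiDB's actual index format.

```rust
use std::collections::BTreeMap;

/// Returns up to `limit` document ids in ascending order of the indexed field.
/// Iteration stops after `limit` ids, so the work done scales with `limit`
/// rather than with the size of the collection.
fn sorted_limit(index: &BTreeMap<i64, Vec<u32>>, limit: usize) -> Vec<u32> {
    index
        .values() // a BTreeMap yields its entries in key order
        .flat_map(|ids| ids.iter().copied())
        .take(limit)
        .collect()
}

/// Descending variant: walk the same index from the back.
fn sorted_limit_desc(index: &BTreeMap<i64, Vec<u32>>, limit: usize) -> Vec<u32> {
    index
        .values()
        .rev()
        .flat_map(|ids| ids.iter().copied())
        .take(limit)
        .collect()
}
```

No comparison sort happens at query time: the ordering was paid for incrementally when the index entries were inserted.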

**Index-backed $group** applies when the group key is indexed: buckets are read off the index in key order, so each group's documents arrive contiguously and the accumulator doesn't need a hash map.
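The hash-map-free grouping can be sketched as a single pass over key-ordered input. This is an illustration under assumptions: it implements only a `$sum`-style accumulator, and `group_sum_sorted` is a hypothetical name, not an OxiDB function.

```rust
/// Sums values per key from input that is already ordered by key, as it
/// would be when read off an index. Only one running accumulator is held
/// at a time; a group is emitted as soon as the key changes.
fn group_sum_sorted<I>(sorted: I) -> Vec<(String, i64)>
where
    I: IntoIterator<Item = (String, i64)>,
{
    let mut out = Vec::new();
    let mut current: Option<(String, i64)> = None;

    for (key, value) in sorted {
        // Same key as the open group: fold into the running sum.
        if let Some((k, sum)) = current.as_mut() {
            if *k == key {
                *sum += value;
                continue;
            }
        }
        // Key changed (or this is the first row): close the open group
        // and start a new one.
        if let Some(done) = current.take() {
            out.push(done);
        }
        current = Some((key, value));
    }

    // Flush the last open group.
    if let Some(done) = current {
        out.push(done);
    }
    out
}
```

The sketch relies on the input really being sorted by key. Fed unsorted input, it would emit the same key more than once, which is why this path is only valid when the group key is indexed.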

Aggregation is where embedded mode shines hardest. The stages are Rust functions hitting in-process state. No network round-trip per stage, no serialization between them. A four-stage pipeline on 100K docs returns in tens of milliseconds on a laptop.