Bulk operations¶
Single round-trip operations for inserting, updating, or deleting many records at once. bulk-update and bulk-delete support dry_run, every mode respects CAS, and bulk-insert builds indexes in parallel.
bulk-insert¶
Shape¶
{
  "mode": "bulk-insert",
  "dir": "<dir>",
  "object": "<obj>",
  "records": [
    {"id":"<key>","data":{...}},
    {"id":"<key>","data":{...}}
  ]
}
Or from a file:
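A sketch, assuming the file form takes the same "file" key as bulk-insert-delimited below:
{
  "mode": "bulk-insert",
  "dir": "<dir>",
  "object": "<obj>",
  "file": "/path/to/records.json"
}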
File content is the same JSON array (not NDJSON — one big array).
Record shape¶
"id"— the record key."data"— an object whose fields match the typed schema.
Upsert semantics¶
bulk-insert is a true upsert: if a key already exists, its record is overwritten and any stale index entries pointing at the old value are dropped before the new entry is written. Pass "if_not_exists": true to make it idempotent (existing keys are skipped, response includes "skipped":N).
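For example, the idempotent form just adds the flag to the shape above (a sketch):
{
  "mode": "bulk-insert",
  "dir": "<dir>",
  "object": "<obj>",
  "if_not_exists": true,
  "records": [
    {"id":"<key>","data":{...}}
  ]
}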
Response¶
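A minimal sketch of a successful response, assuming the count is reported as "inserted":
{"inserted": 1000000}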
With if_not_exists and pre-existing keys:
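Sketch; "skipped" is documented above, "inserted" is assumed:
{"inserted": 950000, "skipped": 50000}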
If any records were rejected (type mismatch, missing required field, etc.):
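Sketch, assuming the rejected-record count surfaces as "errors" (see "What bulk ops don't support" below):
{"inserted": 999998, "errors": 2}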
Performance¶
- Single-request bulk loads — ~117 k records/sec single-thread on the invoice schema (1 M records, 14 indexes, splits=64). Add-indexes-from-scratch ≈ 350 k records/sec equivalent. See the README performance tables for the full set.
- Parallel index build — one worker per indexed field, each streaming the per-shard merges sequentially within the worker (per-shard btree layout, 2026.05.1+). 14 fields × splits/4 shards run as 14 dispatched tasks.
- Indexed bulk-insert chunk-size guidance — for parallel inserts into a pre-existing-indexed object, prefer fewer, larger bulk-insert calls. Each call triggers a sequential bulk_merge per (field, shard); cumulative work scales O(R²) in request count R. At 1 M records, 5 connections × 200 K each is the sweet spot. Bigger chunks always win on the indexed path.
Pattern: load + index in one go¶
Create the object with indexes listed, then bulk-insert. Indexes are built as part of the load:
{"mode":"create-object","dir":"default","object":"users","splits":16,"max_key":128,
"fields":["email:varchar:200","name:varchar:100","age:int"],
"indexes":["email","age"]}
bulk-insert-delimited¶
CSV-style flat files. Faster to generate than JSON for large exports.
{
  "mode": "bulk-insert-delimited",
  "dir": "<dir>",
  "object": "<obj>",
  "file": "/path/to/data.csv",
  "delimiter": "|"
}
Wire format¶
- Every line is a record. No header row.
- First column = key.
- Remaining columns = field values in fields.conf order (skipping tombstoned fields).
- Default delimiter: |. Pass any single character via "delimiter".
Example file (delimiter |), for an object with fields name:varchar:100|email:varchar:200|age:int|active:bool.
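A sketch with made-up keys and values (columns: key, name, email, age, active):
u001|Alice Smith|alice@example.com|34|true
u002|Bob Jones|bob@example.com|29|false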
Use | (default) or , for CSV — any byte that isn't present in your data works.
Value encoding¶
- varchar — raw bytes (no quoting).
- int / long / short — ASCII digits, optional "-" prefix.
- bool — true/false.
- date — yyyyMMdd.
- datetime — yyyyMMddHHmmss.
- numeric — decimal matching the scale.
No escaping. Pick a delimiter your data doesn't contain.
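For illustration, assume a hypothetical object with fields amount:numeric (scale 2), due:date, created:datetime, paid:bool; a line for key INV-0001 would then be:
INV-0001|149.99|20250630|20250612143000|false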
bulk-delete¶
By key list¶
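A sketch of the key-list form; the name of the key array ("keys") is an assumption, not confirmed here:
{
  "mode": "bulk-delete",
  "dir": "<dir>",
  "object": "<obj>",
  "keys": ["<key1>", "<key2>", "<key3>"]
}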
Or from file:
File = JSON array of keys.
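A sketch, assuming the same "file" key as bulk-insert-delimited:
{
  "mode": "bulk-delete",
  "dir": "<dir>",
  "object": "<obj>",
  "file": "/path/to/keys.json"
}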
Response: {"deleted": <N>} (keys that weren't present are silently skipped).
By criteria¶
{
  "mode": "bulk-delete",
  "dir": "<dir>",
  "object": "<obj>",
  "criteria": [{"field":"status","op":"eq","value":"cancelled"}],
  "limit": 1000,
  "dry_run": true
}
Response (dry run):
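A sketch; the field names in the dry-run response are assumed:
{"matched": 412, "dry_run": true}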
Response (actual):
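A sketch; "deleted" mirrors the key-list response above, the exact shape is otherwise assumed:
{"deleted": 412}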
- limit caps the number of deletions (defensive — prevents runaway deletes).
- skipped counts records that matched the criteria but failed an if check (not shown above, but possible when combined).
- Always dry-run first for criteria-based deletes in production.
bulk-update¶
{
  "mode": "bulk-update",
  "dir": "<dir>",
  "object": "<obj>",
  "criteria": [{"field":"status","op":"eq","value":"pending"}],
  "value": {"status":"expired"},
  "limit": 10000,
  "dry_run": true
}
- Updates every record matching criteria by merging value into the existing record.
- Per-record CAS applies: auto_update fields bump, indexes stay in sync.
- dry_run:true returns the match count without writing.
- limit caps the update.
Response:
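With "dry_run": true as above, a sketch of the response (field names assumed):
{"matched": 8421, "dry_run": true}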
Dry-run workflow¶
For bulk-update and bulk-delete on criteria, always run dry_run:true first in production:
// 1. Dry run — check blast radius
{..., "dry_run":true}
// 2. Execute — same criteria, drop dry_run
{..., "limit":10000}
Crash safety¶
- Bulk ops process records one at a time with per-record writes. A mid-batch crash leaves the already-written records committed — not all-or-nothing.
- For atomic bulk operations across records, there's no option today. Stage intent in a control record and reconcile in the app.
Performance tips¶
- Bulk-insert over many singles — amortizes connection setup, schema lookup, and (critically) enables the parallel index build.
- Create indexes before the bulk load — so the load builds them inline. Adding indexes on a loaded object re-scans everything.
- Delimited (|-separated) over JSON for huge CSV ingest — parses ~10–20% faster.
- Tune MAX_REQUEST_SIZE if sending multi-million-record bulk loads in one request. The 32 MB default holds ~300 k records at typical sizes.
- Pipeline many moderate bulk-inserts rather than one huge one — keeps each request under the size limit and lets the server progress-log in info logs.
What bulk ops don't support¶
- Per-record error recovery in JSON — if record #50 fails type validation, records 1–49 are committed and the response's errors count goes up. The failed record's details are logged but not returned to the client today.
- Heterogeneous objects in one call — a bulk operation is scoped to one (dir, object) pair.