Skip to content

Bulk operations

Single round-trip operations for inserting, updating, or deleting many records at once. All support dry_run (for update / delete) and respect CAS. Parallel index build on bulk-insert.

bulk-insert

Shape

{
  "mode": "bulk-insert",
  "dir": "<dir>",
  "object": "<obj>",
  "records": [
    {"id":"<key>","data":{...}},
    {"id":"<key>","data":{...}}
  ]
}

Or from a file:

{"mode":"bulk-insert","dir":"<dir>","object":"<obj>","file":"/path/to/data.json"}

File content is the same JSON array (not NDJSON — one big array).

Record shape

  • "id" — the record key.
  • "data" — an object whose fields match the typed schema.

Upsert semantics

bulk-insert is a true upsert: if a key already exists, its record is overwritten and any stale index entries pointing at the old value are dropped before the new entry is written. Pass "if_not_exists": true to make it idempotent (existing keys are skipped, response includes "skipped":N).

Response

{"count": 10000}

With if_not_exists and pre-existing keys:

{"count": 9997, "skipped": 3}

If any records were rejected (type mismatch, missing required field, etc.):

{"count": 9997, "errors": 3, "error": "some_records_dropped"}

Performance

  • Single-request bulk loads — ~117 k records/sec single-thread on the invoice schema (1 M records, 14 indexes, splits=64). Add-indexes-from-scratch ≈ 350 k records/sec equivalent. See the README performance tables for the full set.
  • Parallel index build — one worker per indexed field, each streaming the per-shard merges sequentially within the worker (per-shard btree layout, 2026.05.1+). 14 fields × splits/4 shards run as 14 dispatched tasks.
  • Indexed bulk-insert chunk-size guidance — for parallel inserts into a pre-existing-indexed object, prefer fewer, larger bulk-insert calls. Each call triggers a sequential bulk_merge per (field, shard); cumulative work scales O(R²) in request count R. At 1 M records, 5 connections × 200 K each is the sweet spot. Bigger chunks always win on the indexed path.

Pattern: load + index in one go

Create the object with indexes listed, then bulk-insert. Indexes are built as part of the load:

{"mode":"create-object","dir":"default","object":"users","splits":16,"max_key":128,
 "fields":["email:varchar:200","name:varchar:100","age:int"],
 "indexes":["email","age"]}

bulk-insert-delimited

CSV-style flat files. Faster to generate than JSON for large exports.

{
  "mode": "bulk-insert-delimited",
  "dir": "<dir>",
  "object": "<obj>",
  "file": "/path/to/data.csv",
  "delimiter": "|"
}

Wire format

  • Every line is a record. No header row.
  • First column = key.
  • Remaining columns = field values in fields.conf order (skipping tombstoned fields).
  • Default delimiter: |. Pass any single character via "delimiter".

Example file (delimiter |):

u1|Alice|alice@x.com|30|true
u2|Bob|b@x.com|22|true
u3|Carol|c@x.com|45|false

For an object with fields name:varchar:100|email:varchar:200|age:int|active:bool.

Use | (default) or , for CSV — any byte that isn't present in your data works.

Value encoding

  • varchar — raw bytes (no quoting).
  • int / long / short — ASCII digits, optional - prefix.
  • booltrue / false.
  • dateyyyyMMdd.
  • datetimeyyyyMMddHHmmss.
  • numeric — decimal matching the scale.

No escaping. Pick a delimiter your data doesn't contain.

bulk-delete

By key list

{
  "mode": "bulk-delete",
  "dir": "<dir>",
  "object": "<obj>",
  "keys": ["k1", "k2", "k3"]
}

Or from file:

{"mode":"bulk-delete","dir":"<dir>","object":"<obj>","file":"/path/to/keys.json"}

File = JSON array of keys.

Response: {"deleted": <N>} (keys that weren't present are silently skipped).

By criteria

{
  "mode": "bulk-delete",
  "dir": "<dir>",
  "object": "<obj>",
  "criteria": [{"field":"status","op":"eq","value":"cancelled"}],
  "limit": 1000,
  "dry_run": true
}

Response (dry run):

{"matched":437,"deleted":0,"skipped":0,"dry_run":true}

Response (actual):

{"matched":437,"deleted":437,"skipped":0}

  • limit caps the number of deletions (defensive — prevents runaway deletes).
  • skipped counts records that matched the criteria but failed an if check (not shown above, but possible when combined).
  • Always dry-run first for criteria-based deletes in production.

bulk-update

{
  "mode": "bulk-update",
  "dir": "<dir>",
  "object": "<obj>",
  "criteria": [{"field":"status","op":"eq","value":"pending"}],
  "value": {"status":"expired"},
  "limit": 10000,
  "dry_run": true
}
  • Updates every record matching criteria by merging value into the existing record.
  • Per-record CAS applies: auto_update fields bump, indexes stay in sync.
  • dry_run:true returns the match count without writing.
  • limit caps the update.

Response:

{"matched":2450, "updated":2450, "skipped":0}

Dry-run workflow

For bulk-update and bulk-delete on criteria, always run dry_run:true first in production:

// 1. Dry run — check blast radius
{..., "dry_run":true}

// 2. Execute — same criteria, drop dry_run
{..., "limit":10000}

Crash safety

  • Bulk ops process records one at a time with per-record writes. A mid-batch crash leaves the already-written records committed — not all-or-nothing.
  • For atomic bulk operations across records, there's no option today. Stage intent in a control record and reconcile in the app.

Performance tips

  • Bulk-insert over many singles — amortizes connection setup, schema lookup, and (critically) enables the parallel index build.
  • Create indexes before the bulk load — so the load builds them inline. Adding indexes on a loaded object re-scans everything.
  • Delimited (|-separated) over JSON for huge CSV ingest — parses ~10–20% faster.
  • Tune MAX_REQUEST_SIZE if sending multi-million-record bulk loads in one request. 32 MB default holds ~300 k records at typical sizes.
  • Pipeline many moderate bulk-inserts rather than one huge one — keeps each request under the size limit and lets the server progress-log in info logs.

What bulk ops don't support

  • Per-record error recovery in JSON — if record #50 fails type validation, records 1–49 are committed and the response's errors count goes up. The failed record's details are logged but not returned to the client today.
  • Heterogeneous objects in one call — a bulk operation is scoped to one (dir, object) pair.