Schema mutations¶

Evolve object schemas without taking the database offline. Mutations are atomic at the object level: each rebuild holds a per-object write lock that blocks normal ops on that object until it finishes.

create-object¶

Create a new typed object. See Quick start for a full example.

{
  "mode": "create-object",
  "dir": "<dir>",
  "object": "<obj>",
  "splits": 16,
  "max_key": 128,
  "fields": [
    "name:varchar:100",
    "age:int",
    "balance:numeric:12,2",
    "active:bool"
  ],
  "indexes": ["name","age"]
}

Param	Required	Default	Meaning
`dir`	yes	—	Tenant dir. Auto-registered if new.
`object`	yes	—	Object name.
`splits`	no	`8` (`DEFAULT_SPLITS`)	Initial shard count. Must be a power of 2 in `[8, 4096]` (`MIN_SPLITS`–`MAX_SPLITS`). The default is tuned for sub-1M-row objects; pass `splits` explicitly for larger workloads.
`max_key`	no	`64`	Max key length in bytes. Hard ceiling 1024 (`MAX_KEY_CEILING`).
`fields`	yes	—	Array of typed field specs. See Concepts → Typed records.
`indexes`	no	`[]`	Fields to index at creation. Single or composite (`"a+b"`).
`auto_key`	no	(none)	Opt into server-generated keys at insert time. `"uuid"` → 16-byte UUIDv4 binary, rendered as 36-char dashed string on read (requires `max_key >= 16`). `"seq(<name>)"` → 8-byte int64 BE from a named sequence, rendered as decimal string on read (requires `max_key >= 8`; sequence is pre-initialised to 0 if absent, first `next` returns 1). Set once at create-object and immutable for the object's life — there is no `set-auto-key` mutation. See Auto-generated keys below.

value_size (the per-record payload size, stored in segment files) is always computed as the sum of typed-field sizes — not user-configurable. Returned in create-object and describe-object responses; recorded internally for slot-size accounting.

Response: {"status":"created","object":"...","splits":N,"max_key":N,"value_size":N,"fields":N}.
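
The `splits`, `max_key`, and `auto_key` rules in the parameter table can be checked client-side before a request is sent. The following is a minimal Python sketch under the limits quoted above. `check_create_params` is a hypothetical helper, not part of shard-db, and the lower bound of 1 on `max_key` is an assumption.

```python
MIN_SPLITS, MAX_SPLITS = 8, 4096   # limits quoted in the parameter table
MAX_KEY_CEILING = 1024

def check_create_params(splits=8, max_key=64, auto_key=None):
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    is_pow2 = splits > 0 and (splits & (splits - 1)) == 0
    if not (MIN_SPLITS <= splits <= MAX_SPLITS and is_pow2):
        problems.append("splits must be a power of 2 in [8, 4096]")
    if not (1 <= max_key <= MAX_KEY_CEILING):  # lower bound of 1 is an assumption
        problems.append("max_key must not exceed 1024")
    if auto_key == "uuid" and max_key < 16:
        problems.append("auto_key uuid requires max_key >= 16")
    if auto_key is not None and auto_key.startswith("seq(") and max_key < 8:
        problems.append("auto_key seq(<name>) requires max_key >= 8")
    return problems
```

The server remains the authority on validation; a check like this only saves a round trip for obvious mistakes.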

Auto-generated keys¶

Declare "auto_key":"uuid" or "auto_key":"seq(<name>)" at create-object to turn on server-generated keys.

{
  "mode":"create-object","dir":"<d>","object":"users",
  "splits":16,"max_key":16,
  "fields":["name:varchar:64"],
  "auto_key":"uuid"
}

Insert with the key omitted → server generates per the object's mode, returns the rendered key:

// request
{"mode":"insert","dir":"<d>","object":"users","value":{"name":"Alice"}}

// response
{"status":"inserted","key":"7a8c2f12-9d31-4abc-9c4a-1a2b3c4d5e6f"}

Insert with a key provided → upsert (exists → update, else → insert). The provided key must parse in the rendered form (36-char dashed UUID or decimal int). The example below targets a seq-mode object:

// request
{"mode":"insert","dir":"<d>","object":"orders","key":"42","value":{"amount":99}}

// response
{"status":"inserted","key":"42"}

CAS modifiers (if_not_exists, if) apply to provided-key inserts as usual. Omit-key + if:{...} is rejected at parse time (the predicate applies to a specific record; auto-gen doesn't compose).

Storage shape — keys are stored in their on-disk binary form: 16 bytes for uuid, 8 bytes BE int64 for seq. Wire I/O always renders as the canonical string form (UUID dashed, seq decimal). get / delete / find / keys / fetch all accept and emit the rendered form.
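
The mapping between stored bytes and rendered keys can be sketched in Python with the standard `uuid` and `struct` modules. This illustrates the two encodings described above and is not the server's code; the function names are hypothetical. Note that `uuid.UUID()` also accepts some non-dashed spellings, so it is more lenient than the 36-char rule the server applies.

```python
import struct
import uuid

def parse_uuid_key(text: str) -> bytes:
    """Rendered dashed UUID -> 16 stored bytes."""
    return uuid.UUID(text).bytes

def render_uuid_key(raw: bytes) -> str:
    """16 stored bytes -> canonical 36-char dashed string."""
    return str(uuid.UUID(bytes=raw))

def parse_seq_key(text: str) -> bytes:
    """Decimal string -> 8 stored bytes (big-endian signed int64)."""
    return struct.pack(">q", int(text))

def render_seq_key(raw: bytes) -> str:
    """8 stored bytes (big-endian signed int64) -> decimal string."""
    return str(struct.unpack(">q", raw)[0])
```

Big-endian storage means the byte order of seq keys matches their numeric order for non-negative values.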

bulk-insert — per-record omit-key gets auto-generated; per-record provided-key upserts. The whole batch is refused up front if any provided key is malformed. Generated keys for the batch are allocated in one shot (single /dev/urandom read for UUID; single seq flock for seq) so per-record overhead stays low. Per-record CAS is enforced: omit-key records take the strict-insert path (collision → that single record is condition_not_met and counted in skipped, the other records still write) while provided-key records remain upsert. The check piggybacks on the existing kf-lookup pass — zero extra lookups, zero added latency for non-auto-key bulk-insert.

// request
{"mode":"bulk-insert","dir":"<d>","object":"orders",
 "records":[{"value":{"amount":1}},
            {"key":"500","value":{"amount":500}},
            {"value":{"amount":2}}]}

// response — keys[] preserves input order
{"status":"bulk-inserted","count":3,"skipped":0,"keys":["4","500","5"]}

The dict form ({"k1":{...},"k2":{...}}) has keys baked into the wire shape — every entry is treated as provided-key.

bulk-insert-delimited (CSV / TSV / pipe / etc.) also supports auto-key: per row, an empty first column means "auto-generate" and a non-empty first column is parsed as a wire-form key (upsert). Response shape matches the JSON form ({"status":"bulk-inserted","count":N,"skipped":M,"keys":[...]} for auto-key objects). When using inline data in a JSON request, the standard JSON escapes (\n, \r, \t, \", \\, \uXXXX) are decoded before parsing — so newline-separated records work as expected.

{"mode":"bulk-insert-delimited","dir":"<d>","object":"orders",
 "delimiter":",",
 "data":",100\n42,42\n,200\n"}

// response (seq watermark was at 5):
{"status":"bulk-inserted","count":3,"skipped":0,"keys":["6","42","7"]}
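
The `keys[]` array in the response above follows from the empty-first-column rule. A small Python sketch of that assignment for a seq-mode object, ignoring collisions and quoted fields: `assign_keys` and its `next_seq` parameter are hypothetical, and the real server reads the next value from the sequence file.

```python
def assign_keys(data, next_seq, delimiter=","):
    """Return the keys a seq-mode object would report for delimited rows.

    next_seq is the next value the sequence would hand out.
    """
    keys = []
    for row in data.splitlines():
        if not row:
            continue  # skip blank lines such as a trailing newline
        first = row.split(delimiter, 1)[0]
        if first == "":
            # empty first column: take the next sequence value
            keys.append(str(next_seq))
            next_seq += 1
        else:
            int(first)  # provided keys must parse as a decimal int
            keys.append(first)
    return keys
```

With the watermark at 5, `assign_keys(",100\n42,42\n,200\n", 6)` yields `["6", "42", "7"]`, matching the response shown.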

update / delete still require a key. auto_key only fires on insert; update with no key errors with the usual "Missing key" message.

Constraints:

uuid mode → max_key must be at least 16.
seq(<name>) mode → max_key must be at least 8.
The sequence name must be valid (valid_field_name() rules — no :, +, /, spaces, parens). The sequence file lives at <obj>/metadata/sequences/<name> and is shared with any field that also uses :default=seq(<name>).
auto_key is persisted as the trailing token on the schema.conf line: dir:object:splits:max_key:2:streams:auto_key=....
No retroactive enable: auto_key can only be set at create-object. There is no schema mutation to add or change auto_key later.
Seq collisions — for seq mode, if you manually insert records with numeric keys at or above the current sequence value, the next auto-generated insert can collide. Single insert returns {"error":"condition_not_met"} for that record; bulk-insert (JSON + delimited) skips just the colliding record (skipped:N in the response) and inserts every other auto-gen normally — the manual record's data is never silently overwritten. UUID collisions are effectively impossible at any realistic scale.

add-field¶

Append new fields to an existing object.

{
  "mode": "add-field",
  "dir": "<dir>",
  "object": "<obj>",
  "fields": [
    "phone:varchar:20",
    "verified:bool:default=false",
    "rowid:long:default=seq(rowid)",
    "trace_id:varchar:36:default=uuid()",
    "nonce:varchar:18:default=random(8)"
  ]
}

What happens¶

Takes the object's write lock (objlock_wrlock).
Builds a new shard layout with the extra fields appended.
Backfill pass — re-encodes every existing record: prior field values preserved, new fields stamped with their computed default (see table below).
Atomically swaps (.new → original rename).
Rebuilds indexes (none referencing the new field yet; existing indexes are preserved).
Releases the write lock.

Computed defaults on backfill¶

When the new field's spec includes a default modifier, the rebuild walk applies it to every existing record:

Modifier	Backfill behaviour
`:default=<literal>`	Stamped verbatim on every existing record. Goes through the same type-aware encoder used at insert time (int BE, varchar length prefix, numeric scaling, etc.).
`:default=seq(<name>)`	The server reserves a contiguous range `[start, start+live_count)` from the named sequence in one flock, then assigns the values sequentially as the walk progresses. After the rebuild, the next insert-time `seq(<name>)` call resumes from `start+live_count`.
`:default=uuid()`	Fresh UUIDv4 generated per record (`/dev/urandom`). For `varchar:36` fields the canonical 36-char dashed string is stored; for the native `uuid` type the raw 16 bytes are stored.
`:default=random(<N>)`	Fresh `N` random bytes per record (`/dev/urandom`), hex-encoded to `2N` characters. The request is refused pre-flight if `2N` exceeds the field's storage cap — no rebuild is started in that case.
`:auto_create` / `:auto_update`	Inert during backfill. These are insert/update-time generators — the original record's creation timestamp is unknown, so stamping `now()` on every row would lie about history. Existing records keep zero bytes for this field; future inserts/updates fire the generator as normal.
(no modifier)	Existing records' bytes for the new field are zero. Decoders render that as the type's "absent" form (empty string for varchar, `0` for int, etc.).
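
Two of the rules in the table are simple arithmetic: the contiguous range reserved for `seq(<name>)`, and the `2N` size check for `random(<N>)`. A hedged Python sketch, with hypothetical function names and `field_cap` standing in for the field's storage cap in characters:

```python
import os

def reserve_seq_range(start, live_count):
    """Values handed to existing records, plus where the next insert resumes."""
    values = range(start, start + live_count)  # half-open: [start, start+live_count)
    return values, start + live_count

def random_default(n_bytes, field_cap):
    """Hex-encode n_bytes random bytes; refuse if 2N exceeds the field cap."""
    if 2 * n_bytes > field_cap:
        raise ValueError("random(N) needs 2N characters, field is too small")
    return os.urandom(n_bytes).hex()
```

For the `nonce:varchar:18:default=random(8)` field in the request above, `2N` is 16, which fits the 18-character field.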

Notes¶

Existing record count and hash routing are preserved.
Full object rebuild — scales with object size. Not instantaneous on millions of records.
Sequence allocation is exact at walk start: the live count is read from the kf-header summary on the legacy slotcask handle, which is the authoritative source of record counts in v2.

edit-field¶

Edit one or more existing fields in place — same-type only. Used to grow/shrink a varchar, widen/narrow an integer family field, change a numeric's scale, widen float → double, or append / rename / widen an enum.

{
  "mode": "edit-field",
  "dir": "<dir>",
  "object": "<obj>",
  "fields": [
    "name:varchar:200",
    "age:long",
    "balance:numeric:18,4"
  ]
}

For enum renames, set "allow_rename": true at the top level — without it, any change at an existing enum position is rejected (renames are easy to typo and would silently relabel every existing record):

{
  "mode": "edit-field",
  "dir": "acme", "object": "items",
  "fields": ["color:enum(crimson,green,blue,yellow)"],
  "allow_rename": true
}
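
The position-by-position comparison behind these enum rules can be sketched as follows. This is an illustration in Python with a hypothetical function name, not the server's implementation. It omits the 1B → 2B widen case, and once `allow_rename` is set it cannot tell a reorder from several renames; how the server handles that combination is not stated here.

```python
def classify_enum_edit(old, new, allow_rename=False):
    """Position-by-position diff of two enum value lists.

    Returns "noop", "append", or "rename"; raises ValueError when refused.
    """
    if len(new) < len(old):
        raise ValueError("refused: enum values cannot be removed")
    # positions that exist in both lists but carry a different label
    renamed = [i for i, (a, b) in enumerate(zip(old, new)) if a != b]
    if renamed and not allow_rename:
        raise ValueError("refused: change at existing position needs allow_rename")
    if renamed:
        return "rename"
    return "append" if len(new) > len(old) else "noop"
```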

CLI shortcut (single-field — JSON form covers batch):

./shard-db edit-field <dir> <obj> 'name:varchar:200'

What changes are allowed¶

Edit	Rule
`varchar:N → varchar:M`	Grow always allowed. Shrink refused pre-flight if any live record's content length exceeds `M`.
Integer family (`short ↔ int ↔ long`)	Widen always allowed (sign-extension preserves negatives). Narrow refused pre-flight if any live record's value falls outside the new type's `[−2^(8N−1), 2^(8N−1) − 1]` range, where `N` is the new type's width in bytes.
`numeric:P,S1 → numeric:P,S2`	Scale-up multiplies the stored `int64` by `10^(S2−S1)`; refused pre-flight if any value would overflow `int64`. Scale-down divides and truncates toward zero (matches Postgres).
`float → double`	Always allowed; IEEE 754 widen, no validation needed.
`enum(a,b,c) → enum(a,b,c,d,…)` (append)	Always allowed. Existing records keep their byte index; new value gets the next index. No rebuild needed; only `fields.conf` is updated.
`enum(a,b,c) → enum(a,b,c,…257+ values)` (auto-widen 1B → 2B)	Allowed. Triggers a full record rebuild (zero-extends each record's byte index) and rewrites the bitmap with the wider encoding.
`enum(a,b,c) → enum(x,b,c)` (rename at position)	Requires `"allow_rename": true` in the request body. Existing records keep their byte index; the displayed value changes. Without the flag, refused.
`enum(a,b,c) → enum(a,b)` (remove)	Always refused. Records reference values by position; removing would corrupt every record using the dropped value.
`enum(a,b,c) → enum(c,a,b)` (reorder)	Refused (caught by the position-by-position diff — same shape as remove).
`enum(…) → enum(…)` narrow 2B → 1B	Refused. Records that hold an index ≥ 256 would lose data.

Other cross-type edits are hard-refused with the hint: use add-field <new> + remove-field <old> + bulk-update and migrate the data explicitly.
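
The integer-narrowing and numeric-scale rules from the table reduce to a few lines of arithmetic. A Python sketch under stated assumptions: the function names are hypothetical, and the byte widths passed to `fits_signed` (for example 2 for `short`) are assumed, since this page does not list them.

```python
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def fits_signed(value, nbytes):
    """True if value fits a signed integer that is nbytes bytes wide."""
    bits = nbytes * 8
    return -2**(bits - 1) <= value <= 2**(bits - 1) - 1

def rescale_numeric(stored, old_scale, new_scale):
    """Re-encode a scaled int64 when a numeric field's scale changes."""
    if new_scale >= old_scale:
        result = stored * 10 ** (new_scale - old_scale)
        if not INT64_MIN <= result <= INT64_MAX:
            raise OverflowError("scale-up would overflow int64")
        return result
    divisor = 10 ** (old_scale - new_scale)
    # Python's // floors, so divide the magnitude and restore the sign
    # to get truncation toward zero.
    quotient = abs(stored) // divisor
    return quotient if stored >= 0 else -quotient
```

For example, a stored `12345` at scale 2 (123.45) becomes `1234500` at scale 4, and `-12345` at scale 2 becomes `-123` at scale 0.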

What happens¶

Takes the object's write lock (objlock_wrlock).
Refuses immediately if storage_version != 2 (the error points to ./migrate).
Parses every edit spec; refuses on unknown field name, tombstoned field, duplicate edit in the same request, invalid type, or cross-type change.
Builds a new TypedSchema by overlaying each edited field onto a clone of the old schema (positions unchanged, only size / offset / numeric_scale move).
No-op fast path: if no field's encoding actually changed (for example, a varchar that keeps its size and only changes its :default=... modifier), skip the data rebuild and rewrite fields.conf only.
Otherwise pre-flight scan: walks every live record across all keyfile shards and verifies each edited field's value fits the new shape. First violation aborts with {"error":"Pre-flight failed on field [<name>]: <reason>"} — no data or schema change.
Rebuild: runs the same v2 rebuild path used by add-field / vacuum --compact, but with transform_field_value() re-encoding the edited fields per record. Atomic — the legacy data tree is preserved until the rebuild succeeds.
Rewrites fields.conf to lock in the new spec.
Smart reindex: walks index.conf and only rebuilds indexes whose referenced fields actually changed encoding. Indexes referencing untouched fields are skipped — the response carries indexes_skipped:N alongside indexes_rebuilt:N so operators can verify.
Releases the write lock.
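
The smart-reindex step above amounts to partitioning the index list by whether any referenced field changed encoding. A minimal Python sketch, assuming composite indexes are named `a+b` as in create-object and that one changed part is enough to force a rebuild; `plan_reindex` is a hypothetical name.

```python
def plan_reindex(index_names, changed_fields):
    """Split index names into (rebuild, skip) given fields whose encoding changed."""
    changed = set(changed_fields)
    rebuild, skip = [], []
    for name in index_names:
        parts = name.split("+")  # a single-field index has exactly one part
        (rebuild if changed.intersection(parts) else skip).append(name)
    return rebuild, skip
```

The lengths of the two lists correspond to `indexes_rebuilt` and `indexes_skipped` in the response.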

Response¶

{"status":"edited","fields":N,"rebuilt":true,"slot_size":N,"indexes_rebuilt":R,"indexes_skipped":S}

No-op fast path returns {"status":"edited","fields":N,"rebuilt":false} — fields.conf updated but no data rebuild ran.

dry_run¶

Pass "dry_run":true to run every validation step (cross-type refusal, FT_ENUM prefix check, per-record varchar overflow + integer narrowing pre-flight) without writing anything.

{"status":"ok","dry_run":true,"fields":N,"would_rebuild":true}

would_rebuild:false means every edit is encoding-equivalent — the change would only touch fields.conf (e.g. carrying through a new default modifier on an unchanged type). Useful before running a same-type narrow on a large object.

Notes¶

Default modifier carry-through: when a new edit spec OMITS :default=... / :auto_create / :auto_update, the modifier from the OLD fields.conf line is preserved — edit-field age:long against an existing age:int:default=42 keeps :default=42.
Changing a default: include the new modifier on the edit spec, e.g. edit-field age:int:default=99. Supported forms: :default=<literal>, :default=seq(<name>), :default=uuid(), :default=random(N), :auto_create, :auto_update. The new modifier affects future inserts only — existing records keep their stored values. For a one-shot backfill of existing records, use add-field <new-name> with the modifier, then bulk-update and remove-field <old-name>.
Indexed fields: a varchar grow, an integer widen, or any other encoding-changing edit on an indexed field rebuilds only the indexes that reference that field (smart reindex). Queries on the indexed field continue to resolve to the same records post-edit.
Full object rebuild — scales with object size, not slot count. Not instantaneous on millions of records. Holds the wrlock for the duration.
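
The carry-through rule in the first note can be sketched as a merge of two spec strings. This Python sketch assumes specs are colon-separated and that literal default values contain no `:`; `carry_through` is a hypothetical helper, not a shard-db function.

```python
MODIFIER_FLAGS = {"auto_create", "auto_update"}

def is_modifier(token):
    """True for default=... tokens and the two auto_* flags."""
    return token.startswith("default=") or token in MODIFIER_FLAGS

def carry_through(old_spec, new_spec):
    """Return new_spec, with the old modifier appended when it names none."""
    new_tokens = new_spec.split(":")
    if any(is_modifier(t) for t in new_tokens[1:]):
        return new_spec  # an explicit modifier on the edit wins
    # token 0 is the field name, so it is never treated as a modifier
    kept = [t for t in old_spec.split(":")[1:] if is_modifier(t)]
    return ":".join(new_tokens + kept)
```

With the example from the note, `carry_through("age:int:default=42", "age:long")` gives `"age:long:default=42"`.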

rename-field¶

Metadata-only; no data rewrite.

{
  "mode": "rename-field",
  "dir": "<dir>",
  "object": "<obj>",
  "old": "email_addr",
  "new": "email"
}

What happens¶

Rewrites fields.conf with the new name.
Rewrites indexes/index.conf and renames indexes/<old>/ → indexes/<new>/ (per-shard directory rename — all NNN.idx files travel with the rename, no rebuild required).
For composite indexes, rewrites parts that reference the old name and renames the composite directory accordingly.
Invalidates caches.
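
For composite indexes, the name rewrite has to match whole parts rather than substrings. A short Python sketch of that string transformation only, assuming the `a+b` naming shown under create-object; the function name is hypothetical and the on-disk directory rename is not modelled.

```python
def rename_in_index(index_name, old, new):
    """Rewrite one index name, handling composite "a+b" forms part by part."""
    return "+".join(new if part == old else part for part in index_name.split("+"))
```

Matching per part means a field such as `email_addr_backup` is left alone when `email_addr` is renamed.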

Constraints¶

Both names must pass valid_field_name() (no :, +, /, spaces).
new must not collide with an existing field.

Response: {"status":"renamed","old":"...","new":"..."}.

remove-field¶

Tombstones one or more fields. Bytes stay reserved in every record's payload until vacuum --compact runs.

{
  "mode": "remove-field",
  "dir": "<dir>",
  "object": "<obj>",
  "fields": ["legacy_status","deprecated_flag"]
}

What happens¶

Appends :removed to each field's line in fields.conf.
Auto-drops any index referencing a removed field (including composites).
Queries referencing those fields return empty values; writes silently ignore them.

Why tombstone vs delete¶

Cheap: no data rewrite until vacuum.
Reversible (soft): re-adding the same field via add-field creates a new column; the old tombstoned bytes stay reserved until compact.

Reclaiming the bytes¶

{"mode":"vacuum","dir":"<dir>","object":"<obj>","compact":true}

Response from remove-field: {"status":"removed","fields":N,"indexes_dropped":N}.

vacuum¶

Maintenance — reclaim deleted-record slots, drop tombstoned fields, or reshard.

Flavors¶

Call	What it does
`{"mode":"vacuum",...}`	Direction-C seg compaction — sparse non-active seg files are pair-merged into denser ones via kf-repoint, then unlinked. Active seg of each stream is never touched. Also: if the host's CPU count has changed since `create-object` and `slotcask_streams_for_nproc()` no longer matches `schema.streams`, the call automatically promotes to a full rebuild that re-routes records into the new stream layout. Idempotent.
`{"mode":"vacuum","compact":true}`	Full rebuild. Drops tombstoned fields, shrinks `slot_size`. Indexes preserved.
`{"mode":"vacuum","splits":N}`	Full rebuild with a new shard count. Re-hashes data; hash routing identity is preserved. Triggers a full reindex — see below. Also folds in the streams-mismatch check on the same rebuild, so you never need a second call.
`{"mode":"vacuum","compact":true,"splits":N}`	Both — compact schema and reshard in one pass.

Why `splits` triggers reindex (2026.05.1+)¶

Each indexed field is sharded into index_splits_for(splits) btree files (<obj>/indexes/<field>/<NNN>.idx). Changing splits changes the per-field shard count, so the on-disk NNN.idx files for the old layout become unreachable orphans. vacuum --splits calls reindex_object() after the data rebuild, which:

Wipes every per-field idx directory (indexes/<field>/).
Rebuilds each indexed field at the new index_splits_for(splits) shard count.
Preserves the index list (index.conf) — same indexes, fresh layout.

Plain vacuum --compact (no splits) leaves indexes alone — the per-field shard count doesn't change.

What triggers the need¶

Many deletes → vacuum-check flags objects where tombstoned ≥10% and total ≥1000.
Removed fields → compact to reclaim their bytes.
Shard skew from growth → splits:N to even out load. See shard-stats.
CPU upgrade or container resize (v2) → the next default vacuum self-heals: it detects slotcask_streams_for_nproc() ≠ schema.streams and dispatches to the rebuild path, after which records route by the new stream count. The auto-vacuum thread picks this up automatically once any object also crosses the deletion-pct threshold; for an explicit fix, run ./shard-db vacuum <dir> <obj>.
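
The deletion threshold in the first item can be expressed as a small predicate. A Python sketch, assuming the percentage is tombstoned slots over total slots; `needs_vacuum` is a hypothetical name and the defaults mirror the 10% and 1000 figures above.

```python
def needs_vacuum(total, tombstoned, min_total=1000, min_pct=10):
    """True when tombstoned slots reach min_pct percent of a large-enough object."""
    if total < min_total:
        return False
    # integer comparison avoids floating-point rounding at the boundary
    return tombstoned * 100 >= total * min_pct
```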

Locks¶

All vacuum flavors take the object's write lock. Normal ops block for the duration of the rebuild.

Response (v2 light path): {"status":"vacuumed","cleaned":<seg-files-dropped>}.

Response (heavy path: compact, splits, or streams-mismatch): {"status":"rebuilt","live":N,"splits":N,"streams":N,"slot_size":N,"compact":true|false,"indexes_rebuilt":N}.

truncate¶

Delete all records; the schema and index definitions survive.

{"mode":"truncate","dir":"<dir>","object":"<obj>"}

Fast: zeroes out every kf shard (slot array + header counters) and drops every seg file; resets total/deleted to 0.
Indexes are emptied.
Field schema (including tombstones) and fields.conf stay intact.

Response: {"status":"truncated","object":"..."}.

recount¶

Rescans every shard, counts live/tombstoned slots, and rewrites metadata/counts. Use when size returns numbers that look wrong (e.g., after a crash or manual disk edit).

{"mode":"recount","dir":"<dir>","object":"<obj>"}

Response: {"count":N,"orphaned":M}.

rebuild-kf¶

New in 2026.07.1. Repairs corrupted/dangling kf (keyfile) entries by rescanning every segment file and re-deriving each live record's kf slot from scratch. Use after suspected kf corruption — e.g. a prior crash mid-compact left kf entries pointing at segment files that no longer exist, causing bulk-insert or find to intermittently report some_records_dropped or miss records that are actually present on disk.

{"mode":"rebuild-kf","dir":"<dir>","object":"<obj>"}

./shard-db rebuild-kf <dir> <obj>

Idempotent — safe to re-run; a clean object reports repaired:0.
./migrate runs this automatically (phase 2/3, before compact) on every upgrade and writes a .kf_rebuild_done sentinel so it isn't repeated on every subsequent embedded startup.
Embedded (npm) clients auto-run this once per db_root on first use, gated by the same sentinel — no manual step needed for npm consumers. Exposed directly as shardDb.rebuildKf(dir, object) for on-demand repair.
Takes no objlock (neither rdlock nor wrlock) — safe to run against a live object, though concurrent writes during the rescan are not guaranteed to be reflected in the repair pass (re-run if you suspect a race).

Response: {"status":"ok","repaired":N} where N is the count of kf entries that were corrected.

backup¶

Copies the object's data/, indexes/, metadata/, and files/ directories into a timestamped snapshot under the same root.

{"mode":"backup","dir":"<dir>","object":"<obj>"}

Response: {"status":"backed_up","path":"<dir>/<obj>.backup-20260418T153012"}.

Snapshot is a point-in-time copy — in-flight writes after the copy starts may or may not be included. For consistent production backups, pause writes first (or accept best-effort consistency).

Lock model summary¶

Mutation	Lock	Blocks
`add-field`, `edit-field`, `remove-field`, `vacuum` (all flavors)	`objlock_wrlock`	All other ops on this object (reads + writes).
`rename-field`	`objlock_wrlock`	Same.
`truncate`	`objlock_wrlock`	Same.
`add-index`, `remove-index`	`objlock_wrlock`	Same — both unlink()+rebuild index files in place; needs exclusivity against concurrent on-the-fly index writes (2026.07.1).
`backup`, `recount`	`objlock_rdlock`	Only schema mutations.
`rebuild-kf`	none	Nothing — safe to run live, but concurrent writes during the rescan aren't guaranteed to be reflected.
Normal CRUD / queries	`objlock_rdlock`	Only schema mutations.

Write locks are held only for the rebuild duration. For multi-second rebuilds, clients see temporarily blocked queries; consider running these in a maintenance window.

See Concepts → Concurrency for the full locking story.
