shard-db 2026.05.6¶
Hotfix for two latent JSON-escape bugs in the typed-record path
plus the addition of a new timestamp field type. The
JSON-escape bugs surface whenever varchar field bytes contain
", \, or control characters (< 0x20); any caller round-tripping such
data through insert → find / get saw silent corruption
pre-fix. HN comment text is a typical example, which is how this
got caught while building the public showcase. The timestamp
type addresses the gap between datetime (calendar-packed) and
long (untyped int64) — Unix epoch ms in 8 bytes with
auto-time generators.
What changed¶
Output side. decode_field_to_buf (config.c) and
buf_field_value (query.c) used to write varchar content into
JSON output using raw "%.*s", so a stored ", \, or
newline silently corrupted the response stream — clients hit
Expected ',' or '}' mid-response. A new json_escape_into()
helper in util.c does RFC 8259-compliant escaping (worst-case 6×
expansion as \u00XX); the two FT_VARCHAR emit sites now route
through it. typed_decode and typed_decode_stream switched
from a fixed 512-byte stack vbuf to a heap allocation sized to
the field's worst case so large varchars (e.g. varchar:8192)
don't truncate when escapes expand them.
Input side. typed_encode and typed_encode_defaults used
to strip surrounding quotes from a JSON string value and then
copy the inner bytes verbatim, preserving wire-form escapes
(\", \n, \\, \uXXXX…) as literal byte sequences on
disk. Now both functions route FT_VARCHAR values through the
pre-existing json_unescape_string() so stored bytes match what
the wire intended. Other typed fields (int / long / bool / date
/ datetime / numeric / float / double / byte) are unchanged —
their literal forms have no escapes to decode.
CSV path is untouched. cmd_bulk_insert_delimited reads
spans from the mmap'd CSV buffer and stores them as raw bytes;
CSV has no escape semantics that align with JSON. Behaviour is
unchanged there.
Outer-buffer sizing in typed_decode. Pre-2026.05.6 the outer
per-record JSON buffer was sized as a flat nfields * 300
heuristic, which silently truncated mid-value on records
carrying a varchar field whose content approached its declared
capacity (e.g. comments.text:varchar:8192 with multi-KB HN
comments). SB_APPEND doesn't fail-loud — it just stops writing.
Now sized per-field by type: 6 * (size - 2) + 2 for varchar
(worst-case 6x escape expansion), 64 bytes for everything else.
New: timestamp field type¶
Filed in ~/shard-db-issue-drafts/02-timestamp.md 2026-04-30,
implemented here. Eight bytes, signed int64 big-endian, semantic
"Unix epoch milliseconds". On-disk layout, comparison, and index
key are identical to FT_LONG. What's new is the auto-time
generator path:
created_at:timestamp # explicit ms-since-epoch on insert
created_at:timestamp:auto_create # server stamps Date.now() on INSERT only
updated_at:timestamp:auto_update # server stamps Date.now() on every INSERT + UPDATE
Wire shape on read is a plain integer:
Why not just use long? long works (and many users already do), but
the type carries no semantic — clients can't tell from
describe-object whether a long field is a timestamp or just a
number, and there's no :auto_create / :auto_update shorthand.
timestamp closes both gaps.
Why not datetime? datetime is calendar-packed
(yyyyMMddHHmmss as a 6-byte composite) — fine for "user picked
2026-05-19 14:30 on a form", but can't represent pre-1970 or
post-9999 dates and forces a format conversion every time the
client's wire format is Unix-epoch (which is most JSON event
shapes, log pipelines, telemetry, and the HN API).
The two coexist: datetime for human-calendar timestamps,
timestamp for machine-event timestamps. Both have legitimate
use cases.
Compatibility¶
- Wire format unchanged.
- On-disk schema unchanged. No reindex, no migrate.
- Existing varchar records that contain JSON metacharacters are still on disk in their pre-fix shape. They continue to read as the same corrupted-but-stable form they did before — the encode-side fix only affects records written after the upgrade. If you need clean data, re-ingest through your own pipeline.
Upgrading¶
./shard-db stop
# replace shard-db / shard-cli / migrate with the 2026.05.6 binaries
./shard-db start
No ./migrate step is required for this release. Existing v2
data files load unchanged.
If you run shard-db under systemd / supervisord / docker, swap
the ./shard-db stop and ./shard-db start lines for your
service manager's equivalents.
Test coverage¶
- New
test-json-escape(19 assertions) —",\, raw\n, raw\tround-trips across single get, multi-get dict, and find array response shapes; plus a 4 KB varchar value to lock in the outer-buffer sizing fix. - New
test-timestamp(14 assertions) — explicit insert, auto_create on insert, auto_update advancing on update, indexed range queries, describe-object reportingtimestamp. - Full suite: 81 cases / 3091 assertions, 0 failures.