Skip to content

File storage

Upload and download arbitrary files (PDFs, images, CSVs, blobs) keyed by filename. Files live under <obj>/files/XX/XX/<filename>, hash-bucketed so directories stay shallow.

Two variants for both upload and download:

  1. Bytes-in-JSON (base64) — remote-safe. The default. Works from any host with TCP access to the server.
  2. Server-local path — zero-copy admin fast path. Only useful when the caller can touch the server's filesystem.

Pick #1 unless you have a specific reason.

Where files live

$DB_ROOT/<dir>/<obj>/files/
  6b/7a/medium.bin
  b2/38/small.bin
  b5/e9/hello.txt

Two-level hash-bucketing from the xxh128 of the filename-stem (basename minus last .ext). Collisions within a bucket are allowed — the full filename disambiguates.

put-file — bytes in JSON (remote-safe)

Shape

{
  "mode": "put-file",
  "dir": "<dir>",
  "object": "<obj>",
  "filename": "invoice.pdf",
  "data": "<base64-encoded-bytes>",
  "if_not_exists": true
}
  • filename — plain basename. No /, \, .., control chars, ≤255 bytes. Path-traversal attempts return {"error":"invalid filename"}.
  • data — standard RFC 4648 base64 (with +/=). Whitespace inside the string is tolerated.
  • if_not_exists (optional) — CAS. Fails with {"error":"file exists",...} if a file with this name already exists in the same (dir, obj).

Response

{"status":"stored","filename":"invoice.pdf","bytes":12345}

On CAS conflict:

{"error":"file exists","filename":"invoice.pdf"}

Atomicity

Writes go to <dest>.tmp.<pid>, fsynced, then renamed onto <dest>. A mid-upload crash leaves no partial file. Default is silent overwrite; add if_not_exists:true to refuse.

Size cap

Inherited from MAX_REQUEST_SIZE (default 32 MB). Base64 inflates by 4/3, so the effective file-size cap is ~24 MB at default config. Raise MAX_REQUEST_SIZE in db.env to lift it.

Every connection allocates a read buffer of MAX_REQUEST_SIZE — see Operations → Tuning before setting very high values.

CLI

./shard-db put-file <dir> <obj> <local-path> [--if-not-exists]

The CLI reads the file, base64-encodes, sends the JSON. Works from any host with TCP access to the server.

put-file — server-local path (admin fast path)

Shape

{"mode":"put-file","dir":"<dir>","object":"<obj>","path":"/srv/incoming/invoice.pdf"}
  • path is read on the server's filesystem — not the client's. The server opens the path and copies it into <obj>/files/XX/XX/<filename> (filename = basename of the path).
  • No base64 overhead. Good for batch ingestion from a shared volume.
  • Only useful for same-host callers — a remote client has no way to place a file on the server's filesystem without a separate transport (scp, rsync, shared FS).

Response: {"status":"stored","path":"<dest-path>"}.

No CAS (if_not_exists) on this variant. Silent overwrite.

get-file — bytes in JSON (remote-safe)

Shape

{"mode":"get-file","dir":"<dir>","object":"<obj>","filename":"invoice.pdf"}

Response

{"status":"ok","filename":"invoice.pdf","bytes":12345,"data":"<base64>"}

Not found:

{"error":"file not found","filename":"invoice.pdf"}

CLI

./shard-db get-file <dir> <obj> <filename> [<out-path>]
  • With <out-path>: decodes base64 and writes raw bytes to the file.
  • Without: writes raw bytes to stdout.

get-file-path — server path (admin fast path)

Shape

{"mode":"get-file-path","dir":"<dir>","object":"<obj>","filename":"invoice.pdf"}

Response

{"path":"<db_root>/<dir>/<obj>/files/XX/XX/invoice.pdf"}

No bytes on the wire. The caller is expected to read the returned path directly from the server's filesystem. Useful for:

  • Admin/debug: "where is this file?"
  • Colocated services with shared-FS access.
  • Large files where you want to stream via cat / sendfile instead of base64 over the socket.

delete-file

Shape

{"mode":"delete-file","dir":"<dir>","object":"<obj>","filename":"invoice.pdf"}

Response

{"status":"deleted","filename":"invoice.pdf"}

Not found:

{"error":"file not found","filename":"invoice.pdf"}

Same filename rules as put-file / get-file{"error":"invalid filename"} on /, \, .., control chars, or > 255 bytes.

CLI

./shard-db delete-file <dir> <obj> <filename>

list-files

Paginated, alphabetical inventory of stored files for one object. Optional prefix filter, returns total + page.

Shape

{
  "mode":"list-files",
  "dir":"<dir>",
  "object":"<obj>",
  "prefix":"2026-",
  "offset":0,
  "limit":100
}
  • prefix — optional. Filters by filename prefix (byte-exact).
  • offset / limit — standard pagination. limit defaults to GLOBAL_LIMIT when absent or 0.

Response

{
  "files": ["2026-01-summary.pdf","2026-02-summary.pdf", ...],
  "total": 245,
  "offset": 0,
  "limit": 100
}

total is the unpaginated match count (after prefix filtering, before pagination). Walking the XX/XX bucket tree is O(file count) — fine for filestores up to ~1M files. Beyond that, maintain your own index in a regular object.

Filename rules

Enforced by valid_filename():

  • Non-empty, ≤ 255 bytes.
  • No / or \ (plain basename only).
  • No literal .. as the whole name.
  • No control characters (bytes < 0x20 or 0x7F).

Invalid filenames get {"error":"invalid filename"}.

Choosing a variant

Scenario Recommendation
Remote client (Python, JS, Java, anywhere over TCP) put-file with data, get-file.
Same-host admin script, batch ingestion put-file with path — no base64 overhead.
Need to stream a large existing server-side file to another process get-file-path, then sendfile()/cat the path.
Very large files (> 24 MB at default config) Raise MAX_REQUEST_SIZE or use server-local path.

CLI examples

Upload a PDF from a remote machine

./shard-db put-file acme invoices /home/me/Invoice-001.pdf
# → {"status":"stored","filename":"Invoice-001.pdf","bytes":183721}

Download and verify

./shard-db get-file acme invoices Invoice-001.pdf /tmp/invoice.pdf
md5sum /home/me/Invoice-001.pdf /tmp/invoice.pdf

CAS upload (refuse overwrite)

./shard-db put-file acme invoices /home/me/Invoice-001.pdf --if-not-exists
# → {"error":"file exists","filename":"Invoice-001.pdf"}

Admin fast path (same host)

./shard-db query '{"mode":"put-file","dir":"acme","object":"invoices","path":"/srv/ingest/new.pdf"}'
./shard-db query '{"mode":"get-file-path","dir":"acme","object":"invoices","filename":"new.pdf"}'
# → {"path":"../db/acme/invoices/files/6b/7a/new.pdf"}

Limitations

  • File content isn't indexed or queryable — it's opaque storage. Use list-files for inventory by filename prefix.
  • No ranged reads — every get-file returns the whole file.