File storage¶
Upload and download arbitrary files (PDFs, images, CSVs, blobs) keyed by filename. Files live under <obj>/files/XX/XX/<filename>, hash-bucketed so directories stay shallow.
Two variants for both upload and download:
- Bytes-in-JSON (base64) — remote-safe. The default. Works from any host with TCP access to the server.
- Server-local path — zero-copy admin fast path. Only useful when the caller can touch the server's filesystem.
Pick the bytes-in-JSON variant unless you have a specific reason to use the server-local path.
Where files live¶
Files are bucketed two levels deep using the xxh128 hash of the filename stem (the basename minus its last .ext). Collisions within a bucket are allowed; the full filename disambiguates.
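The bucketing described above can be sketched in Python. This is an illustration only, under two loudly stated assumptions: `hashlib.blake2b` stands in for xxh128 (which is not in the Python standard library), and the two `XX` levels are taken to be the first two bytes of the digest in hex. The helper name `bucket_path` is hypothetical, and the resulting paths will not match the server's real layout.

```python
import hashlib
import os


def bucket_path(obj_root: str, filename: str) -> str:
    """Return <obj_root>/files/XX/XX/<filename> for a plain basename."""
    # Stem = basename minus its last ".ext".
    stem, _ext = os.path.splitext(filename)
    # ASSUMPTION: blake2b is a stand-in for xxh128 (not available in the stdlib).
    digest = hashlib.blake2b(stem.encode("utf-8"), digest_size=16).hexdigest()
    # ASSUMPTION: each bucket level is one digest byte rendered as two hex chars.
    return "/".join([obj_root, "files", digest[0:2], digest[2:4], filename])
```

Because only the stem is hashed, `report.pdf` and `report.png` land in the same bucket, and the full filename keeps them apart.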
put-file — bytes in JSON (remote-safe)¶
Shape¶
{
"mode": "put-file",
"dir": "<dir>",
"object": "<obj>",
"filename": "invoice.pdf",
"data": "<base64-encoded-bytes>",
"if_not_exists": true
}
- filename: plain basename. No /, \, .., or control chars; ≤255 bytes. Path-traversal attempts return {"error":"invalid filename"}.
- data: standard RFC 4648 base64 (with +/=). Whitespace inside the string is tolerated.
- if_not_exists (optional): CAS. Fails with {"error":"file exists",...} if a file with this name already exists in the same (dir, obj).
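A remote client can assemble this request with nothing beyond the standard library. The sketch below builds the JSON body only; how it is sent to the server is out of scope here, and the function name `build_put_file_request` is hypothetical.

```python
import base64
import json


def build_put_file_request(dir_name: str, obj: str, filename: str,
                           payload: bytes, if_not_exists: bool = False) -> str:
    """Serialize a put-file request carrying the file bytes as base64."""
    request = {
        "mode": "put-file",
        "dir": dir_name,
        "object": obj,
        "filename": filename,
        # b64encode emits the standard RFC 4648 alphabet (+, /, = padding).
        "data": base64.b64encode(payload).decode("ascii"),
    }
    if if_not_exists:
        # Only include the CAS flag when the caller asks for it.
        request["if_not_exists"] = True
    return json.dumps(request)
```

For example, `build_put_file_request("acme", "invoices", "invoice.pdf", pdf_bytes, if_not_exists=True)` yields a body matching the shape above.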
Response¶
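On success, the server reports the stored filename and byte count, as in the CLI examples:
{"status":"stored","filename":"Invoice-001.pdf","bytes":183721}
On CAS conflict:
{"error":"file exists","filename":"Invoice-001.pdf"}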
Atomicity¶
Writes go to <dest>.tmp.<pid>, fsynced, then renamed onto <dest>. A mid-upload crash leaves no partial file. Default is silent overwrite; add if_not_exists:true to refuse.
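The write-temp, fsync, rename sequence can be sketched in Python as follows. This is not the server's code, only a minimal illustration of the pattern; `atomic_store` is a hypothetical name, and the existence check here is a simplification that is not race-free against concurrent writers.

```python
import os


def atomic_store(dest: str, payload: bytes, if_not_exists: bool = False) -> bool:
    """Write payload to dest via <dest>.tmp.<pid>; return False if refused."""
    # Simplified CAS check (assumption: a real server would guard against races).
    if if_not_exists and os.path.exists(dest):
        return False
    tmp = f"{dest}.tmp.{os.getpid()}"
    with open(tmp, "wb") as fh:
        fh.write(payload)
        fh.flush()
        # Force the bytes to disk before the rename makes them visible.
        os.fsync(fh.fileno())
    # os.replace renames the temp file onto dest, overwriting any existing file.
    os.replace(tmp, dest)
    return True
```

Readers of `dest` see either the old file or the complete new one, never a partial write, because the rename is the only step that touches the final name.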
Size cap¶
Inherited from MAX_REQUEST_SIZE (default 32 MB). Base64 inflates by 4/3, so the effective file-size cap is ~24 MB at default config. Raise MAX_REQUEST_SIZE in db.env to lift it.
Every connection allocates a read buffer of MAX_REQUEST_SIZE — see Operations → Tuning before setting very high values.
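The ~24 MB figure follows from base64 arithmetic: every 3 raw bytes become 4 encoded characters. The sketch below assumes "32 MB" means 32 × 1024 × 1024 bytes. The `envelope_bytes` parameter is a hypothetical allowance for the surrounding JSON fields, which shave a little more off the cap.

```python
def base64_len(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def max_raw_bytes(max_request_size: int, envelope_bytes: int = 0) -> int:
    """Largest file whose base64 form fits in the remaining request budget."""
    budget = max_request_size - envelope_bytes
    if budget < 0:
        return 0
    # Each full group of 4 encoded characters carries 3 raw bytes.
    return 3 * (budget // 4)


default_cap = max_raw_bytes(32 * 1024 * 1024)  # 25165824 bytes, i.e. 24 MiB
```

With a few hundred bytes reserved for the JSON envelope, the practical limit sits slightly below that value.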
CLI¶
The CLI reads the file, base64-encodes, sends the JSON. Works from any host with TCP access to the server.
put-file — server-local path (admin fast path)¶
Shape¶
- path is read on the server's filesystem, not the client's. The server opens the path and copies it into <obj>/files/XX/XX/<filename> (filename = basename of the path).
- No base64 overhead. Good for batch ingestion from a shared volume.
- Only useful for same-host callers — a remote client has no way to place a file on the server's filesystem without a separate transport (scp, rsync, shared FS).
Response: {"status":"stored","path":"<dest-path>"}.
No CAS (if_not_exists) on this variant. Silent overwrite.
get-file — bytes in JSON (remote-safe)¶
Shape¶
Response¶
Not found:
CLI¶
- With <out-path>: decodes base64 and writes raw bytes to the file.
- Without: writes raw bytes to stdout.
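A client that talks to the server directly has to do the same decode step itself. The sketch below mirrors the CLI behaviour described above. Note the assumptions: the response shape is not shown in this section, so the field name `data` (borrowed from put-file) and the presence of an `error` key on failure are guesses, and `save_get_file_response` is a hypothetical helper.

```python
import base64
import sys


def save_get_file_response(response: dict, out_path=None,
                           data_field: str = "data") -> int:
    """Decode the base64 payload and write raw bytes to out_path or stdout."""
    # ASSUMPTION: failures carry an "error" key, as other errors in this doc do.
    if "error" in response:
        raise RuntimeError(response["error"])
    # ASSUMPTION: the payload field is named "data", mirroring put-file.
    raw = base64.b64decode(response[data_field])
    if out_path is None:
        # Write bytes, not text, so binary files survive intact.
        sys.stdout.buffer.write(raw)
    else:
        with open(out_path, "wb") as fh:
            fh.write(raw)
    return len(raw)
```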
get-file-path — server path (admin fast path)¶
Shape¶
Response¶
No bytes on the wire. The caller is expected to read the returned path directly from the server's filesystem. Useful for:
- Admin/debug: "where is this file?"
- Colocated services with shared-FS access.
- Large files where you want to stream via cat/sendfile instead of base64 over the socket.
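For a colocated service, consuming the response might look like the sketch below: open the returned path and copy it in chunks rather than loading it whole. The `{"path": ...}` response shape comes from the CLI example later on this page. The `base_dir` parameter is a hypothetical way to resolve a relative path, since the caller has to interpret it from the same location the server does.

```python
import os
import shutil


def stream_server_file(response: dict, out_stream, base_dir: str = ".",
                       chunk_size: int = 1 << 20) -> None:
    """Copy the file named by a get-file-path response into out_stream."""
    # An absolute response path overrides base_dir; a relative one is joined to it.
    path = os.path.join(base_dir, response["path"])
    with open(path, "rb") as src:
        # copyfileobj reads and writes chunk_size bytes at a time.
        shutil.copyfileobj(src, out_stream, chunk_size)
```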
delete-file¶
Shape¶
Response¶
Not found:
Same filename rules as put-file / get-file — {"error":"invalid filename"} on /, \, .., control chars, or > 255 bytes.
CLI¶
list-files¶
Paginated, alphabetical inventory of stored files for one object. Supports an optional prefix filter and returns the total match count along with the requested page.
Shape¶
- prefix: optional. Filters by filename prefix (byte-exact).
- offset / limit: standard pagination. limit defaults to GLOBAL_LIMIT when absent or 0.
Response¶
{
"files": ["2026-01-summary.pdf","2026-02-summary.pdf", ...],
"total": 245,
"offset": 0,
"limit": 100
}
total is the unpaginated match count (after prefix filtering, before pagination). Walking the XX/XX bucket tree is O(file count) — fine for filestores up to ~1M files. Beyond that, maintain your own index in a regular object.
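Walking every page from a client can be sketched as a generator that advances `offset` until it reaches `total`. The transport is deliberately left abstract: `send_query` is a hypothetical callable that takes a request dict and returns the parsed response. The request is assumed to carry `dir` and `object` like the other modes on this page.

```python
from typing import Callable, Iterator


def iter_files(send_query: Callable[[dict], dict], dir_name: str, obj: str,
               prefix: str = "", page_size: int = 100) -> Iterator[str]:
    """Yield every filename matching prefix, one page at a time."""
    offset = 0
    while True:
        request = {"mode": "list-files", "dir": dir_name, "object": obj,
                   "offset": offset, "limit": page_size}
        if prefix:
            request["prefix"] = prefix
        page = send_query(request)
        files = page["files"]
        yield from files
        offset += len(files)
        # Stop on an empty page or once every match has been seen.
        if not files or offset >= page["total"]:
            break
```

Each page triggers another walk of the bucket tree on the server, so a larger `page_size` means fewer walks for a full inventory.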
Filename rules¶
Enforced by valid_filename():
- Non-empty, ≤ 255 bytes.
- No / or \ (plain basename only).
- No literal .. as the whole name.
- No control characters (bytes < 0x20 or 0x7F).
Invalid filenames get {"error":"invalid filename"}.
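Clients may want to reject bad names before sending a request. The following is a Python mirror of the rules listed above, not the server's own valid_filename(). It assumes filenames are measured as UTF-8 bytes.

```python
def valid_filename(name: str) -> bool:
    """Client-side check mirroring the documented filename rules."""
    raw = name.encode("utf-8")  # ASSUMPTION: length is counted in UTF-8 bytes
    # Non-empty, at most 255 bytes.
    if not raw or len(raw) > 255:
        return False
    # Plain basename only: no path separators of either kind.
    if "/" in name or "\\" in name:
        return False
    # ".." is rejected only when it is the whole name.
    if name == "..":
        return False
    # No control characters: bytes below 0x20, or 0x7F (DEL).
    return not any(b < 0x20 or b == 0x7F for b in raw)
```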
Choosing a variant¶
| Scenario | Recommendation |
|---|---|
| Remote client (Python, JS, Java, anywhere over TCP) | put-file with data, get-file. |
| Same-host admin script, batch ingestion | put-file with path — no base64 overhead. |
| Need to stream a large existing server-side file to another process | get-file-path, then sendfile()/cat the path. |
| Very large files (> 24 MB at default config) | Raise MAX_REQUEST_SIZE or use server-local path. |
CLI examples¶
Upload a PDF from a remote machine¶
./shard-db put-file acme invoices /home/me/Invoice-001.pdf
# → {"status":"stored","filename":"Invoice-001.pdf","bytes":183721}
Download and verify¶
./shard-db get-file acme invoices Invoice-001.pdf /tmp/invoice.pdf
md5sum /home/me/Invoice-001.pdf /tmp/invoice.pdf
CAS upload (refuse overwrite)¶
./shard-db put-file acme invoices /home/me/Invoice-001.pdf --if-not-exists
# → {"error":"file exists","filename":"Invoice-001.pdf"}
Admin fast path (same host)¶
./shard-db query '{"mode":"put-file","dir":"acme","object":"invoices","path":"/srv/ingest/new.pdf"}'
./shard-db query '{"mode":"get-file-path","dir":"acme","object":"invoices","filename":"new.pdf"}'
# → {"path":"../db/acme/invoices/files/6b/7a/new.pdf"}
Limitations¶
- File content isn't indexed or queryable; it's opaque storage. Use list-files for inventory by filename prefix.
- No ranged reads: every get-file returns the whole file.