Skip to content

aggregate

Group-by aggregations with count, sum, avg, min, max. Supports having (post-aggregation filter), order_by, and limit. Numbers are accumulated directly in typed form — no string round-trip.

Shape

{
  "mode": "aggregate",
  "dir": "<dir>",
  "object": "<obj>",

  "criteria": [ ... ],
  "group_by": ["status", "currency"],
  "aggregates": [
    {"fn":"count", "alias":"total"},
    {"fn":"sum",   "field":"amount", "alias":"revenue"},
    {"fn":"avg",   "field":"amount", "alias":"avg_ticket"},
    {"fn":"min",   "field":"amount", "alias":"smallest"},
    {"fn":"max",   "field":"amount", "alias":"largest"}
  ],

  "having":   [{"field":"total","op":"gt","value":"100"}],
  "order_by": "revenue",
  "order":    "desc",
  "limit":    20
}

Required: mode, dir, object, aggregates.

Parameters

Field Type Default Meaning
criteria array [] WHERE — filter records before aggregating. Same shape as find.
group_by array of field names none Grouping keys. Omit for whole-table aggregation.
aggregates array required List of aggregate specs (see below). Max 32.
having array none Filter groups after aggregation. Same shape as criteria, but fields are aggregate aliases (or group-by fields).
order_by string none Sort groups by this alias (or group-by field).
order "asc" / "desc" "asc" Sort direction.
limit int GLOBAL_LIMIT Max groups returned.

Aggregate spec

{"fn":"<function>","field":"<field-name>","alias":"<output-name>"}
Function Needs field Notes
count optional Without field, counts all records in the group. With field, counts non-null/non-empty.
sum yes Numeric types only (int, long, short, double, numeric). Result is a double for speed. Cast back in the app if needed.
avg yes sum / count, returned as double.
min yes Numeric or varchar.
max yes Numeric or varchar.

alias is required — it names the output column. Without it the result would be ambiguous across multiple aggregates on the same field.

Response

Whole-table (no group_by)

// Request
{"mode":"aggregate","dir":"acme","object":"orders",
 "aggregates":[
   {"fn":"count","alias":"total"},
   {"fn":"sum","field":"amount","alias":"revenue"}
 ]}

// Response
{"total": 5000, "revenue": 1250000}

With group_by

// Request
{"mode":"aggregate","dir":"acme","object":"orders",
 "group_by":["status"],
 "aggregates":[
   {"fn":"count","alias":"n"},
   {"fn":"sum","field":"amount","alias":"total"}
 ]}

// Response
[
  {"status":"paid","n":3000,"total":800000},
  {"status":"pending","n":2000,"total":450000}
]

having

Post-aggregation filter. Fields referenced must be aggregate aliases (or group-by fields):

"having": [
  {"field":"n","op":"gte","value":"100"},
  {"field":"total","op":"gt","value":"10000"}
]

Multiple having clauses are AND-combined.

Cost

  • Whole-table, no criteria — O(N) scan of every record. Parallelized across shards.
  • With criteria — same as find: indexed candidate scan + per-record typed comparison.
  • With group_by — accumulates into a hash table keyed by group values; final sort + limit. Per-worker hash tables fold into a global table at the end (lock-free hot path).
  • AND of indexed leaves — uses the same PRIMARY_INTERSECT planner branch as find/count: each leaf's btree walks into a KeySet, the sets intersect, and aggregation streams the survivors only. Order-of-magnitude wins when the intersection is far smaller than any single leaf's candidate set. See find → AND index intersection.
  • NEQ algebraic shortcutcount(neq X) is rewritten internally as count(*) - count(eq X), turning a near-everything scan into two cheap counts. Applies inside aggregate too when the only criterion on an indexed field is neq.

Typical latency on 1 M records: 1–3 ms for indexed, 2–10 ms for full scans with aggregation.

Recipes

Top-10 products by revenue, min 100 units sold

{"mode":"aggregate","dir":"acme","object":"orders",
 "criteria":[{"field":"status","op":"eq","value":"fulfilled"}],
 "group_by":["product_sku"],
 "aggregates":[
   {"fn":"count","alias":"units_sold"},
   {"fn":"sum","field":"line_total","alias":"revenue"},
   {"fn":"avg","field":"line_total","alias":"avg_ticket"}
 ],
 "having":[{"field":"units_sold","op":"gte","value":"100"}],
 "order_by":"revenue","order":"desc",
 "limit":10}

Daily signup counts

{"mode":"aggregate","dir":"default","object":"users",
 "group_by":["signup_date"],
 "aggregates":[{"fn":"count","alias":"new_users"}],
 "order_by":"signup_date","order":"desc"}

Min/max balance by customer tier

{"mode":"aggregate","dir":"default","object":"accounts",
 "group_by":["tier"],
 "aggregates":[
   {"fn":"min","field":"balance","alias":"smallest"},
   {"fn":"max","field":"balance","alias":"largest"},
   {"fn":"avg","field":"balance","alias":"avg_balance"}
 ]}

OR in criteria and having

Both criteria and having accept the full AND/OR tree (see find → OR criteria):

{"mode":"aggregate","dir":"acme","object":"orders",
 "criteria":[
   {"or":[
     {"field":"status","op":"eq","value":"paid"},
     {"field":"status","op":"eq","value":"refunded"}
   ]}
 ],
 "group_by":["region"],
 "aggregates":[{"fn":"count","alias":"n"},{"fn":"sum","field":"amount","alias":"total"}],
 "having":[{"or":[{"field":"n","op":"gte","value":"100"},{"field":"total","op":"gte","value":"10000"}]}]}

The planner paths from find apply here too — when the OR is fully indexed the aggregate source records come from a KeySet rather than a shard scan.

CLI shortcut

./shard-db aggregate <dir> <obj> '<aggregates_json>' [group_by_csv] [criteria_json] [having_json]

group_by is a plain comma-separated field list (whitespace tolerated). Later positional slots can be skipped with an empty '' argument — e.g. to pass having without criteria:

./shard-db aggregate acme orders \
  '[{"fn":"count","alias":"n"}]' \
  'status,region' \
  '' \
  '[{"field":"n","op":"gte","value":"100"}]'

Limitations

  • Max 32 aggregates per query (MAX_AGG_SPECS).
  • limit defaults to GLOBAL_LIMIT (default 100,000) when omitted, but is not clamped to it — pass any value to override.
  • No DISTINCT — emulate with group_by + count.
  • No window functions — aggregate against a query, keep state in the app.
  • No nested aggregates.