Commit Graph

Author: Bilal Godil
SHA1: 81c84289b8
Message: perf(platform-analytics): cut ClickHouse query peak memory
Reduce the worst-case concurrent ClickHouse memory use of the internal
platform-analytics route. All 17 ClickHouse queries fire in a single
Promise.all as the shared admin user, so their peaks add up against the
9 GB per-user memory cap.

- Use sipHash64(user_id) as the distinct key in the uniqExact/uniqExactIf
  aggregates (dauSeries, sparkByProject, mauProjects, activeByProject).
  The counts stay exact at this scale (a 64-bit hash has a negligible
  collision probability over 1M users) and the results were verified
  byte-identical; peak memory drops by 40% to 61% per query.
- Sample the new/retained/reactivated activity split at 1-in-4 users
  (consistent cityHash bucket on both subqueries, counts scaled x4). The
  split's window function and all-history scan made it the heaviest query
  (~1.3 GiB at 1M users / 50M events); sampling cuts its peak memory by
  ~78% at the cost of a ~0.4% mean error. The dashboard chart now notes
  that it shows a sampled estimate.
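
The consistent-bucket sampling in the second item can be illustrated with
a minimal Python sketch. It is not the ClickHouse query from this commit:
blake2b stands in for the cityHash bucket, and the names in_sample,
estimate_count, and sampled_split, the (first_day, prev_day) history
layout, and the new/retained/reactivated rules are all assumptions made
for illustration. The point it shows is that both "subqueries" apply the
same hash predicate, so a user is either in both samples or in neither,
and the sampled counts are then scaled by 4.

```python
import hashlib

SAMPLE_RATE = 4  # keep 1 in 4 users, then scale counts back up by 4


def in_sample(user_id: str) -> bool:
    """Deterministic 1-in-SAMPLE_RATE bucket test.

    blake2b is only a stand-in for the ClickHouse hash (assumption).
    """
    digest = hashlib.blake2b(user_id.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % SAMPLE_RATE == 0


def estimate_count(user_ids) -> int:
    """Estimate the number of users from the sampled bucket only."""
    return sum(1 for u in user_ids if in_sample(u)) * SAMPLE_RATE


def sampled_split(history, active_today, today):
    """Estimate the activity split from a consistent user sample.

    history: user -> (first_active_day, last_active_day_before_today)
    active_today: users active on `today`
    The classification rules below are illustrative assumptions.
    """
    # "Subquery" 1: all-history scan, restricted to sampled users.
    sampled_history = {u: h for u, h in history.items() if in_sample(u)}
    # "Subquery" 2: today's activity, restricted by the SAME predicate.
    sampled_active = [u for u in active_today if in_sample(u)]

    counts = {"new": 0, "retained": 0, "reactivated": 0}
    for user in sampled_active:
        first_day, prev_day = sampled_history[user]
        if first_day == today:
            counts["new"] += 1
        elif prev_day == today - 1:
            counts["retained"] += 1
        else:
            counts["reactivated"] += 1
    # Scale the sampled counts back to the full population.
    return {name: n * SAMPLE_RATE for name, n in counts.items()}
```

Because the bucket depends only on the user id, the join between the two
sampled sides never loses a sampled user, which would not hold if each
side drew an independent random sample.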

Add the benchmark/optimization harnesses used to validate these changes:
they seed isolated bench_pa DBs, measure peak memory, and verify result
equality.
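
As a rough illustration of the "verify result equality" step applied to
the hashed distinct key, the Python sketch below compares an exact
distinct count over raw ids with one over 64-bit hashes, and evaluates
the birthday bound on the collision probability. It is not the harness
added by this commit: blake2b stands in for sipHash64, and hash64,
collision_bound, exact_count, and hashed_count are hypothetical names.

```python
import hashlib


def hash64(user_id: str) -> int:
    """64-bit hash of a user id (blake2b as a stand-in for sipHash64)."""
    digest = hashlib.blake2b(user_id.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def collision_bound(n: int, bits: int = 64) -> float:
    """Birthday bound: P(any collision among n keys) <= n(n-1)/2 / 2^bits."""
    return n * (n - 1) / 2 / 2**bits


def exact_count(user_ids) -> int:
    """Distinct count over the raw ids (the reference result)."""
    return len(set(user_ids))


def hashed_count(user_ids) -> int:
    """Distinct count over the 64-bit hashes of the ids."""
    return len({hash64(u) for u in user_ids})


def verify_equal(user_ids) -> bool:
    """True when hashing the key leaves the distinct count unchanged."""
    return exact_count(user_ids) == hashed_count(user_ids)
```

For 1M users and an ideal 64-bit hash, collision_bound(1_000_000)
evaluates to about 2.7e-8, which is the sense in which the hashed count
can be treated as exact at this scale.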
Date: 2026-06-19 10:54:37 -07:00