mirror of https://github.com/stack-auth/stack.git synced 2026-06-30 21:01:54 +08:00

An open-source user management solution with built-in frontend components and an admin dashboard.

BilalG1 969bf03c5a perf(platform-analytics): cut ClickHouse query peak memory (#1632)	2026-06-19 12:44:28 -07:00

## What

Performance pass on the internal platform-analytics route. All 17 ClickHouse queries fire in a single `Promise.all` on the shared `stackframe` admin user, which is subject to a 9 GB per-user memory cap, so the worst case is the sum of per-query peaks, not the max. Benchmarked at 10k projects / 1M users / 50M events (power-law, top project ≈100k users), the sum of peaks was ~6.7 GiB. This PR brings it down to ~3.8 GiB.

## Changes

ClickHouse: `sipHash64(user_id)` as the distinct key (exact, verified byte-identical):

| query | peak mem | Δ |
|---|---|---|
| `dauSeries` | 949 → 373 MiB | −61% |
| `mauProjects` | 715 → 313 MiB | −56% |
| `activeByProject` | 635 → 374 MiB | −41% |
| `sparkByProject` | 1165 → 809 MiB | −31% |

A 64-bit hash has negligible collision probability over 1M users; the benchmark confirmed identical output. (Same trick already used in the internal-metrics MAU query.)

ClickHouse: sample the activity split (`new`/`retained`/`reactivated`):

The split was the single heaviest query (~1.3 GiB). Its cost is a window function over ~25.8M `(user, day)` rows plus an all-history scan, which `sipHash` alone barely helped (−7%). It now uses consistent 1-in-4 user sampling (same `cityHash64(user_id) % 4` bucket applied to both subqueries so each sampled user's full activity sequence is preserved; counts scaled ×4):

- 317 MiB (−78%) peak memory, ~0.4% mean error (max 1.4% on the smallest day) vs the exact result.

This is an approximation; the dashboard "Growth quality" chart now notes it (`subtitle: "… · sampled estimate (~0.4%)"`). `ACTIVITY_SPLIT_SAMPLE` is a single constant in the route; set it to `1` to go back to exact.

## What I tried that did NOT make the cut (documented in the harnesses)

- `country`: peak memory is dominated by the per-user `argMax(country, event_at)` payload, not the key, so hashing does nothing. Left exact/unchanged.
- PG `authMethods` / `email`: with the production composite PK indexes the original plans are already best; correlated-subquery / anti-join rewrites were far worse. No PG query changes in this PR.

## Benchmark harnesses (added)

- `apps/backend/scripts/benchmark-platform-analytics.ts`: full-route baseline (per-query time/memory/rows).
- `apps/backend/scripts/optimize-platform-analytics.ts`: sipHash & PG variant comparison with byte-equality checks.
- `apps/backend/scripts/optimize-split.ts`: exact vs sampled split variants with accuracy measurement.

They seed isolated `bench_pa` databases (server-side, auto-cleaned) and read `system.query_log` / `EXPLAIN (ANALYZE, BUFFERS)`. Run e.g.: `pnpm --filter @hexclave/backend run with-env:dev tsx scripts/optimize-split.ts`

## Testing

- Backend `typecheck` passes. (Dashboard has pre-existing typecheck errors on the base branch in unrelated files, namely auth-methods, team-analytics, user-emails, and RDE config, which are not touched here.)
- All exact rewrites verified byte-identical to the originals by the harnesses; the sampled split measured at ~0.4% mean error. Numbers are local warm-cache (relative shape, not production latency).

---

## Summary by cubic

Cuts worst-case ClickHouse memory for the internal platform analytics route by switching to hashed distinct keys and sampling the heaviest query. On a 10k projects / 1M users / 50M events benchmark, the sum of per-query peaks drops from ~6.7 GiB to ~3.8 GiB with exact results (or ~0.4% error on the sampled chart).

- Performance
  - Use sipHash64(user_id) as the distinct key in uniqExact/uniqExactIf for DAU series, MAU/projects, active-by-project, and sparkline. Exact results (verified). Peak memory down 31–61% per query.
  - Sample the new/retained/reactivated split at 1-in-4 users (consistent `cityHash64` bucket across subqueries, counts ×4). Peak memory ~−78% (~1.3 GiB → ~0.3 GiB) with ~0.4% mean error. Toggle via `ACTIVITY_SPLIT_SAMPLE` (set to 4; set to 1 for exact). Dashboard subtitle now notes “sampled estimate (~0.4%).”
  - Added local harnesses to seed isolated data and measure time/memory/equality: `apps/backend/scripts/internal-analytics/benchmark-platform-analytics.ts`, `optimize-platform-analytics.ts`, `optimize-split.ts`.

Written for commit `60ccf1a06f`.

## Summary by CodeRabbit

## Updates

* Improvements
  * Enhanced platform analytics calculations for more consistent and efficient user counting across key performance indicators (DAU, MAU, per-project metrics).
  * Updated the Growth Quality chart to indicate that user counts represent sampled estimates with approximately 0.4% margin of error for improved performance.

---------

Co-authored-by: mantrakp04 <mantrakp@gmail.com>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: mantra <mantra@stack-auth.com>
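
The consistent 1-in-N user sampling described in the commit message above can be illustrated outside ClickHouse. The following is a minimal sketch, not the route's actual code: `hash32` is a hypothetical stand-in for `cityHash64` (a 32-bit FNV-1a hash with a mixing step), and `isSampled`, `sampleRows`, and `estimateDistinctUsers` are illustrative names. The point it shows is that the keep/drop decision depends only on the user id, so a sampled user keeps every one of their rows in every subquery, and counts are scaled back up by the sampling factor.

```typescript
// Hypothetical stand-in for ClickHouse's cityHash64: 32-bit FNV-1a plus a final mixing step.
function hash32(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  // Mix the bits so the low bits used for bucketing depend on the whole input.
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0; // force an unsigned 32-bit result
}

// Mirrors the constant named in the commit message; 1 means exact (no sampling).
const ACTIVITY_SPLIT_SAMPLE = 4;

type ActivityRow = { userId: string; day: string };

// A user is kept when their hash lands in bucket 0. The decision is a pure
// function of the user id, so it is identical across all subqueries.
function isSampled(userId: string, sample: number = ACTIVITY_SPLIT_SAMPLE): boolean {
  return hash32(userId) % sample === 0;
}

// Keeps either all rows of a user or none, preserving full activity sequences.
function sampleRows(rows: ActivityRow[], sample: number = ACTIVITY_SPLIT_SAMPLE): ActivityRow[] {
  return rows.filter((row) => isSampled(row.userId, sample));
}

// Counts distinct sampled users and scales the count back up by the sampling factor.
function estimateDistinctUsers(rows: ActivityRow[], sample: number = ACTIVITY_SPLIT_SAMPLE): number {
  const kept = new Set<string>();
  for (const row of sampleRows(rows, sample)) {
    kept.add(row.userId);
  }
  return kept.size * sample;
}
```

With `sample` set to `1` every user lands in bucket 0, so the estimate equals the exact distinct count, which matches the "set it to `1` to go back to exact" behaviour the commit message describes.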
.agents/skills	feat(hexclave): PR 2 — visible rebrand (Hexclave brand goes public) (#1481 )	2026-05-26 19:18:20 -07:00
.changeset	Disable changesets changelogs	2026-01-12 15:21:56 -08:00
.claude	Support local dashboard in remote SSH and GH Codespaces (#1538 )	2026-06-04 16:36:17 -07:00
.cursor	Update pre-push.md	2026-06-04 10:44:39 -07:00
.devcontainer	feat(hexclave): PR 2 — visible rebrand (Hexclave brand goes public) (#1481 )	2026-05-26 19:18:20 -07:00
.github	feat(hexclave): PR 5 — internal symbol/path/package renames + brand strings (#1547 )	2026-06-03 18:57:09 -07:00
.vscode	Make it clear there are more SDK packages	2026-06-16 10:37:58 -07:00
apps	perf(platform-analytics): cut ClickHouse query peak memory (#1632 )	2026-06-19 12:44:28 -07:00
configs	[Fix] Infinite Loop on handler/sign-in due to useStackApp not being able to find the StackProvider given context (#1248 )	2026-03-12 22:28:47 -07:00
docker	[codex] Add analytics overview filters (#1496 )	2026-06-10 17:50:35 -07:00
docs	chore: update package versions	2026-06-17 20:31:22 +00:00
docs-mintlify	[codex] Add skill context to Ask Hexclave (#1605 )	2026-06-18 11:40:02 -07:00
examples	chore: update package versions	2026-06-17 20:31:22 +00:00
packages	Fix typecheck in template cross-domain test (#1628 )	2026-06-18 17:55:17 -07:00
patches	Fix MS OAuth (#457 )	2025-02-21 19:39:22 +01:00
scripts	[codex] Add skill context to Ask Hexclave (#1605 )	2026-06-18 11:40:02 -07:00
sdks	chore: update package versions	2026-06-17 20:31:22 +00:00
skills/hexclave	feat(hexclave): PR 5 — internal symbol/path/package renames + brand strings (#1547 )	2026-06-03 18:57:09 -07:00
.dockerignore	feat(hexclave): PR 5 — internal symbol/path/package renames + brand strings (#1547 )	2026-06-03 18:57:09 -07:00
.gitignore	feat(hexclave): PR 5 — internal symbol/path/package renames + brand strings (#1547 )	2026-06-03 18:57:09 -07:00
.gitmodules	Update GitHub URL	2026-05-19 10:27:53 -07:00
AGENTS.md	Make it clear there are more SDK packages	2026-06-16 10:37:58 -07:00
CHANGELOG.md	Add 6/12/26 changelog entry (#1589 )	2026-06-16 16:44:03 -07:00
CLAUDE.md	feat(hexclave): PR 2 — visible rebrand (Hexclave brand goes public) (#1481 )	2026-05-26 19:18:20 -07:00
CONTRIBUTING.md	Rename port prefix envvar	2026-05-27 18:09:52 -07:00
LICENSE	feat(hexclave): PR 2 — visible rebrand (Hexclave brand goes public) (#1481 )	2026-05-26 19:18:20 -07:00
package.json	feat(hexclave): PR 5 — internal symbol/path/package renames + brand strings (#1547 )	2026-06-03 18:57:09 -07:00
pnpm-lock.yaml	add platform analytics route to the dashboard (#1626 )	2026-06-19 11:11:06 -07:00
pnpm-workspace.yaml	Fix Docker builds for pnpm v11 (#1532 )	2026-06-02 14:41:18 -07:00
README.md	Add "For AI agents" section to README (#1563 )	2026-06-05 17:01:13 -07:00
skills-lock.json	Auth app redesign (#1367 )	2026-05-19 23:03:46 -07:00
turbo.json	Replace writeConfigObject with AI-aware updateConfigObject (#1537 )	2026-06-15 12:00:24 -07:00
vitest.shared.ts	Fix tests	2026-02-17 19:57:08 -08:00
vitest.workspace.ts	Hosted components (#1229 )	2026-03-10 11:29:05 -07:00

README.md

Hexclave

The user infrastructure platform.

Hexclave handles everything around your users: authentication, teams, payments, emails, analytics, and much more. Start in minutes on the hosted cloud. Your data is always yours to export and self-host.

Website · Docs · Dashboard · Discord

Where Hexclave fits in the infrastructure stack

Get started

Setting up Hexclave is one prompt. Paste this into your coding agent of choice:

Read skill.hexclave.com and help me setup hexclave in this project

For AI agents

If you're an AI coding agent, use your web fetch tool or curl the skill documentation for up-to-date integration instructions:

curl -sSL "https://skill.hexclave.com?question=<your-question>&context=<your-context>"
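
For agents that call the endpoint from code rather than the shell, the same request might look like the sketch below. It assumes a runtime with global `fetch` and `URL` (for example Node 18+) and assumes the response body is plain text; `buildSkillUrl` and `fetchSkillDocs` are hypothetical helper names, and only the URL and the `question` and `context` parameters come from the curl command above.

```typescript
// Builds the skill-documentation URL, URL-encoding both query parameters.
function buildSkillUrl(question: string, context: string): string {
  const url = new URL("https://skill.hexclave.com");
  url.searchParams.set("question", question);
  url.searchParams.set("context", context);
  return url.toString();
}

// Fetches the documentation as text. fetch follows redirects by default,
// which corresponds to the -L flag in the curl command.
async function fetchSkillDocs(question: string, context: string): Promise<string> {
  const response = await fetch(buildSkillUrl(question, context));
  if (!response.ok) {
    throw new Error(`Skill docs request failed with status ${response.status}`);
  }
  return response.text();
}
```

Using `URLSearchParams` rather than string concatenation keeps questions containing spaces, `&`, or `?` from corrupting the query string.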

What's included

Hexclave ships as a catalog of apps you switch on as your product needs them. Each one is built on the same user model, and new apps land regularly.

Authentication

Authentication that just works with passkeys, OAuth, and CLI auth. Drop in one component and ship the whole flow; auth methods toggle from the dashboard with no code changes needed.

Teams

Build for teams, not just users, with workspaces, email invites, and roles that actually gate the work. The workspace switcher remembers your selection, invites automatically sign up new users, and permissions hold up under audit.

RBAC

Permissions, sorted: roles that nest and one permission check that works the same on server or client. Define them in the dashboard, check them anywhere in your code.

API Keys

API keys without the footguns: leaked keys are revoked automatically, keys work for both users and teams, and the full secret is shown only once. We never keep the plaintext after creation.
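
One common way to show a secret once without retaining the plaintext is to store only a hash of it. The sketch below illustrates that general pattern and is not Hexclave's implementation, which this README does not describe; `StoredApiKey`, `createApiKey`, and `matchesStoredKey` are hypothetical names, and SHA-256 is an assumed choice of hash.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// What the server would persist: an identifier and a hash, never the secret itself.
type StoredApiKey = { id: string; secretHash: string };

function hashSecret(secret: string): string {
  return createHash("sha256").update(secret, "utf8").digest("hex");
}

// Generates a random secret. The plaintext is returned once to the caller;
// only the hash goes into the stored record.
function createApiKey(id: string): { plaintext: string; stored: StoredApiKey } {
  const plaintext = randomBytes(32).toString("base64url");
  return { plaintext, stored: { id, secretHash: hashSecret(plaintext) } };
}

// Verifies a presented key by hashing it and comparing in constant time.
function matchesStoredKey(candidate: string, stored: StoredApiKey): boolean {
  const candidateHash = Buffer.from(hashSecret(candidate), "hex");
  const storedHash = Buffer.from(stored.secretHash, "hex");
  return candidateHash.length === storedHash.length && timingSafeEqual(candidateHash, storedHash);
}
```

Because the secret is high-entropy random data rather than a user-chosen password, a fast hash is a reasonable choice in this sketch; a password would call for a slow, salted hash instead.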

Payments

Payments without the plumbing for subscriptions, one-time charges, and usage metering with credits. Bill a person or a whole team with one model, no separate codepath.

Emails

Email that delivers and tells you so, handling transactional and marketing sends from one API. Edit templates with an AI editor, theme once, and track every open and click.

Analytics

Know your users without building a data stack: live active-user counts and session replays work out of the box. Ask in plain English to build dashboards, or write SQL and save your queries, all after enabling a single flag.

Webhooks

React to every user event in real time with signed, tamper-proof webhooks. Retries and backoff are handled for you; verify in five lines and manage endpoints from the dashboard.
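
Verifying a signed webhook usually means recomputing a keyed hash over the raw request body and comparing it with the signature that arrived alongside it. The sketch below shows that generic HMAC-SHA256 pattern using Node's `crypto` module. It is an assumption-laden illustration: the README does not specify Hexclave's signature scheme, header names, or encoding, and `signPayload` and `verifySignature` are hypothetical helpers, so consult the official docs for the real verification steps.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Computes a hex-encoded HMAC-SHA256 over the raw body (assumed scheme, for illustration only).
function signPayload(secret: string, rawBody: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Recomputes the signature and compares it in constant time.
// timingSafeEqual throws on length mismatch, so lengths are checked first.
function verifySignature(secret: string, rawBody: string, receivedSignature: string): boolean {
  const expected = Buffer.from(signPayload(secret, rawBody), "hex");
  const received = Buffer.from(receivedSignature, "hex");
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The comparison must run over the exact bytes that were received; re-serializing parsed JSON before verifying can change whitespace or key order and make a valid signature fail.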

Data Vault

A safe for the secrets your users hand you, locked with your secret so we never see the plaintext. Store and retrieve tokens in two lines each, server-only by design.

Launch Checklist

Run through the must-do checks before flipping to production: domain setup, callbacks locked, secrets rotated. The progress tracker keeps your team aligned so nothing critical slips through on launch day.

Contributing

Hexclave is open source, and contributions are welcome. Read CONTRIBUTING.md to get started, and say hello in Discord before picking up anything large. Found a security issue? Email security@hexclave.com.