Commit Graph

3104 Commits

Author SHA1 Message Date
BilalG1
f89b97bc54
fix connected accounts tokens (#1358)
Some checks failed
all-good: Did all the other checks pass? / all-good (push) Has been cancelled
Ensure Prisma migrations are in sync with the schema / check_prisma_migrations (22.x) (push) Has been cancelled
DB migration compat / Check if migrations changed (push) Has been cancelled
Docker Server Build and Push / Docker Build and Push Server (push) Has been cancelled
Docker Server Build and Run / docker (push) Has been cancelled
Runs E2E API Tests (Local Emulator) / E2E Tests (Local Emulator, Node ${{ matrix.node-version }}) (22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (mock, 22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (prod, 22.x) (push) Has been cancelled
Runs E2E API Tests with custom port prefix / build (22.x) (push) Has been cancelled
Runs E2E Fallback Tests / E2E Fallback Tests (Node ${{ matrix.node-version }}) (22.x) (push) Has been cancelled
Lint & build / lint_and_build (24) (push) Has been cancelled
TOC Generator / TOC Generator (push) Has been cancelled
DB migration compat / Back-compat — Current branch migrations with ${{ needs.check-migrations-changed.outputs.base_branch }} branch code (push) Has been cancelled
DB migration compat / Forward-compat — Current branch code with ${{ needs.check-migrations-changed.outputs.base_branch }} branch migrations (push) Has been cancelled
DB migration compat / No migration changes (skipped) (push) Has been cancelled
<!--

Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md

-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Bug Fixes**
* OAuth flows now consistently block extra scopes and access tokens for
shared OAuth keys, enforcing restrictions earlier in the request
processing and across all environments.
* **Tests**
* Added end-to-end regression tests to verify requests with extra scopes
against shared OAuth providers return a 400 response indicating extra
scopes/access tokens are not allowed.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-20 19:33:47 -07:00
promptless[bot]
d72aca6216 Merge branch 'dev' into promptless/document-developer-tools 2026-04-21 02:08:19 +00:00
Konstantin Wohlwend
3ea8052d35 chore: update package versions 2026-04-20 19:06:56 -07:00
promptless[bot]
ff1bfebcf6 Merge branch 'dev' into promptless/document-developer-tools 2026-04-21 02:02:39 +00:00
Konstantin Wohlwend
d9492ac5f1 Update submodules 2026-04-20 19:01:16 -07:00
promptless[bot]
7208f4b355 Merge branch 'dev' into promptless/document-developer-tools 2026-04-21 01:55:03 +00:00
Konstantin Wohlwend
6f1df1a0c7 Update submodules 2026-04-20 18:53:36 -07:00
promptless[bot]
76bd837029 Merge branch 'dev' into promptless/document-developer-tools 2026-04-20 21:26:21 +00:00
BilalG1
37ee5ec320
Fast-start local emulator via RAM snapshot + live secret rotation (#1340)
## Summary

`stack emulator start` now resumes a fully-warm VM snapshot instead of
cold-booting, bringing startup from 30–120s down to ~5–8s with
per-install secret rotation, or ~2.5s with rotation opt-out. The
snapshot is captured **locally on first `stack emulator pull`**, not
shipped from CI — QEMU migration state isn't portable across
accelerators (KVM/HVF/TCG) or `-cpu max` feature sets, so a CI-captured
snapshot couldn't resume reliably on arbitrary user hardware.

Also bundles a pile of CLI QoL fixes (progress bars, PR/run artifact
pulls, PR-build download, native-TS ISO writer replacing
`hdiutil`/`mkisofs`/`genisoimage` host dep, unit tests).

| Scenario | Before | After |
|---|---|---|
| Cold boot (no snapshot) | 30–120s | same, works as fallback |
| `stack emulator pull` (one-time, includes local snapshot capture) |
~30s download | ~30s download + ~1–3 min cold-boot capture |
| Snapshot resume, normal start | — | **~5–8s** |
| Snapshot resume, `EMULATOR_NO_ROTATION=1` | — | **~2.5s** |

Backend (`/health?db=1`) and dashboard (`/handler/sign-in`) return 200
on all paths. Two successive snapshot resumes produce different rotated
PCK/SSK/SAK/CRON_SECRET values per install.

## How it works

**Build (CI)** — `docker/local-emulator/qemu/build-image.sh`:

1. Cloud-init provisioning runs to completion (migrations, seed,
slim-image) producing `stack-emulator-<arch>.qcow2`.
2. Image is built with a topology compatible with later snapshot capture
(pinned SMP=4, phantom seed/bundle ISOs, STACKCFG runtime ISO mounted at
build time, qemu-guest-agent running, placeholder hex secrets baked in
under `STACK_EMULATOR_BUILD_SNAPSHOT=1`).
3. CI publishes **only the qcow2** — no `.savevm.zst` ships.

**Pull (user's machine)** —
`packages/stack-cli/src/commands/emulator.ts` + `run-emulator.sh
capture`:

1. `stack emulator pull` downloads the qcow2 with a progress bar (or
from a PR / workflow run via `--pr` / `--run`).
2. CLI invokes `run-emulator.sh capture`: cold-boots the qcow2 with a
matching device layout (phantom ISOs, fsdev, pcie-root-port, virtfs
detached — migration-incompatible), waits for backend+dashboard health,
then drives QMP: `stop` → set `mapped-ram` + `multifd` caps → `migrate
file:state.raw` → poll `query-migrate` → `quit`. Raw mapped-ram file is
zstd-compressed to `stack-emulator-<arch>.savevm.zst` in the images dir.
3. `--skip-snapshot` opts out (first `start` will then cold-boot).

**Runtime** — `run-emulator.sh start`:

1. Launch QEMU with `-incoming defer` when a `.savevm.zst` is present;
decompress on first use, keep the `.raw` cached for subsequent starts.
2. QMP: same `mapped-ram` + `multifd` caps → `migrate-incoming
file:<.raw>` → poll for `paused` → `cont`.
3. Generate fresh per-install secrets on the host; pipe them
base64-encoded through QGA `guest-exec input-data` →
`trigger-fast-rotate` in the guest → `docker exec -e … rotate-secrets`.
4. `rotate-secrets` in the container: validate keys (hex-only), targeted
`sed` on the placeholder PCK across built JS, `UPDATE ApiKeySet`,
`supervisorctl restart stack-app cron-jobs` (with
`stopasgroup`/`killasgroup` so the Node children actually die and
release their ports).
5. Poll backend+dashboard health; if anything fails, clean up and fall
back to cold boot transparently.

**Security model**: placeholder hex values are baked into the snapshot
(`00…ff` PCK, `00…ee` SSK, `00…dd` SAK, `00…cc` CRON_SECRET). They are
non-secret by construction. Real per-install secrets are generated at
each `emulator start` and never leave the host.

## CLI changes (`packages/stack-cli`)

- **`src/lib/iso.ts`** (new): native TypeScript ISO 9660 + Joliet
writer, replacing the host-side `hdiutil`/`mkisofs`/`genisoimage`
dependency for generating the STACKCFG runtime config disk. Unit tests
in `src/lib/iso.test.ts`.
- **`src/commands/emulator.ts`**:
- `pull`: streamed downloads with progress bar + ETA; `--pr <number>`
and `--run <id>` to pull from a PR build's CI artifacts (uses
`extract-zip` for the nested zip); `--skip-snapshot` to opt out of the
one-time local capture.
- `start` (existing, extended): auto-pulls AND auto-captures when no
image exists, so first-ever `start` is self-bootstrapping; emits
`STACK_EMULATOR_CLI_WROTE_ISO=1` so the shell helper skips its own ISO
regen (avoids the genisoimage host dep).
- `capture` (new, invoked by `pull` and the auto-pull path of `start`):
drives the local snapshot capture via `run-emulator.sh`.
- `status`, `stop`, `reset`, `list-releases`: preflight +
path-resolution tightening (`STACK_EMULATOR_HOME` → images/run dirs).
  - Unit tests in `src/commands/emulator.test.ts`.
- **`EMULATOR_NO_ROTATION=1`** env var skips the post-resume rotation
(intended for tests/CI where the placeholder secrets are fine — comes
with a loud warning).

## CI (`.github/workflows/qemu-emulator-build.yaml`)

- Builds **QEMU 10.2.2 from source** (cached), because
`mapped-ram`/`multifd` migration capabilities aren't available in the
distro's QEMU. Enables KVM on ubicloud runners so amd64 boots at
hardware speed.
- amd64 + arm64 both build on the same amd64 matrix
(`ubicloud-standard-8`); arm64 runs under cross-arch TCG (provisioning
only — boot/verify smoke test is amd64-only).
- Verification now runs through the CLI: `emulator start` → `emulator
status` → `emulator stop` against the freshly-built qcow2 (via
`STACK_EMULATOR_HOME` pointing at the workspace, so the CLI doesn't
silently auto-pull a prior release).
- Packages **only** the qcow2. No `.savevm.zst` upload / publish.
- Release notes updated.

## Key files

**Shell / guest:**
- `docker/local-emulator/qemu/build-image.sh` — snapshot-compatible
device topology + STACKCFG runtime ISO at build time
- `docker/local-emulator/qemu/run-emulator.sh` — `start`, `capture`,
`stop`, `reset`, `status`; `-incoming defer`, `.raw` cache, QGA-driven
rotation, cold-boot fallback
- `docker/local-emulator/qemu/common.sh` (new) — shared `qmp_session` +
`capture_vm_state` (factored out so build-image.sh and run-emulator.sh
share the capture path)
- `docker/local-emulator/qemu/cloud-init/emulator/user-data` —
placeholder secrets in snapshot mode, `wait-for-stack-ready`,
`trigger-fast-rotate`, qemu-guest-agent enabled
- `docker/local-emulator/rotate-secrets.sh` (new) — in-container
rotation (sed + UPDATE + supervisorctl)
- `docker/local-emulator/supervisord.conf` — `stopasgroup`/`killasgroup`
on `stack-app` and `cron-jobs`
- `docker/local-emulator/entrypoint.sh` — only mint CRON_SECRET if unset
(placeholder supplied in snapshot mode via --env-file)
- `docker/local-emulator/Dockerfile` — ships `rotate-secrets` to
`/usr/local/bin`
- `docker/server/entrypoint.sh` — source
`/run/stack-auth/rotated-secrets.env`; skip full-tree sentinel scan on
warm restarts via marker

**CLI:**
- `packages/stack-cli/src/lib/iso.ts` (new) + `iso.test.ts` (new)
- `packages/stack-cli/src/commands/emulator.ts` + `emulator.test.ts`
(new)
- `packages/stack-cli/vitest.config.ts` (new)

**CI:**
- `.github/workflows/qemu-emulator-build.yaml`

## Test plan

- [x] `docker/local-emulator/qemu/build-image.sh {amd64,arm64}` produces
`stack-emulator-<arch>.qcow2` with snapshot-compatible topology
- [x] `stack emulator pull` downloads qcow2 with progress, then captures
locally (~1–3 min) and writes `stack-emulator-<arch>.savevm.zst` in the
images dir
- [x] `stack emulator pull --skip-snapshot` stops after download
- [x] `stack emulator pull --pr <n>` / `--run <id>` pull from PR /
workflow run artifacts
- [x] `stack emulator start` on a fresh dir auto-pulls **and**
auto-captures, then starts; subsequent starts fast-resume in ~5–8s;
backend + dashboard return 200
- [x] `EMULATOR_NO_ROTATION=1 stack emulator start` completes in ~2.5s;
backend + dashboard return 200 with warning printed
- [x] Two consecutive `emulator start` invocations produce different PCK
values in the internal `ApiKeySet` row
- [x] `stack emulator status` / `stop` / `reset` resolve paths from
`STACK_EMULATOR_HOME`
- [x] Verified end-to-end on arm64 macOS under HVF (capture ~50s,
fast-resume ~6.5s)
- [x] `pnpm lint` and `pnpm typecheck` pass; stack-cli unit tests (iso +
emulator) pass
- [ ] CI green on this PR (qemu-emulator-build matrix, smoke test)
- [ ] `gh release download emulator-<branch>-latest` contains only
`stack-emulator-<arch>.qcow2` once this PR merges and publish runs

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Snapshot fast-start/resume with optional warm-snapshot assets, runtime
ISO generation, and a cached QEMU build to speed emulator setup.
* CLI: streamed artifact downloads with progress, improved release/asset
handling, stronger preflight checks, and start/status/stop emulator
commands.
* Automated secret rotation and ability to apply rotated secrets at
container startup; supervisor control socket enabled.

* **Bug Fixes**
* More robust start/stop/resume flows with automatic fallback to cold
boot and improved process-group shutdown behavior.

* **Tests**
  * New tests for CLI utilities and ISO image generation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-20 14:24:49 -07:00
promptless[bot]
d91af98d58 Merge branch 'dev' into promptless/document-developer-tools 2026-04-20 19:10:59 +00:00
Madison
9f3ea9a766 Tutorial style docs init 2026-04-20 14:09:27 -05:00
Madison
24f32e6845 init tutorial docs 2026-04-20 14:09:27 -05:00
promptless[bot]
b39355029a Merge branch 'dev' into promptless/document-developer-tools 2026-04-20 18:56:25 +00:00
BilalG1
dad752a1cb
Add pr-visual-writeup skill to .agents/skills (#1354)
Shared agent skill under the cross-client `.agents/skills/` convention
so Claude Code, Codex, and other compliant agents can discover it.

<!--

Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md

-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added an end-to-end workflow skill for automating visual-rich PR
descriptions with screenshot capture, asset processing, and upload
capabilities.

* **Documentation**
* Added reference guides covering screenshot capture patterns, asset
upload workflows, and structured PR body templates.

* **Chores**
* Added helper scripts for development server detection, media
conversion, and GitHub gist uploads.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-20 11:54:55 -07:00
promptless[bot]
0cfaa3606b Merge branch 'dev' into promptless/document-developer-tools 2026-04-20 05:58:59 +00:00
BilalG1
6bc1836e66
fix(dashboard): resolve UI issues across email-* pages (#1345)
## Summary

Six UI issues found across the email-* dashboard pages, ranked by
impact, fixed here:

1. **email-sent layout** — the email log table and domain reputation
card were forced side-by-side at all widths. A fixed-width sidebar plus
a flex-1 table meant that on tablet the table got crushed, and on mobile
the row overflowed horizontally. Fix: stack vertically below `lg`, and
let the reputation card span full width on narrow viewports.
2. **Domain status enum leaks to the UI** — `<span>Status:
{domain.status}</span>` rendered raw values like `pending_dns` /
`pending_verification`. Added a `MANAGED_DOMAIN_STATUS_LABELS` map and
route through it before rendering.
3. **email-themes dialog grid cramped on mobile** — the Change Theme
dialog hardcoded `grid-cols-2`, so at 375px each theme card had ~150px
and the preview images were illegible. Changed to `grid-cols-1
sm:grid-cols-2`.
4. **Template name row overflow** — long template names pushed the Edit
Template button off the right edge of the card because the flex row had
no `min-w-0` / `truncate`. Fixed both, and made the action column
`shrink-0`.
5. **Boosted-capacity label was color-only** — during an active boost
the label used a red strikethrough for the base value and a blue number
for the boosted value with no non-color cue. Added an explicit `→` arrow
between the two numbers, `title` tooltips on each, and a visible
\"(boosted)\" marker after `/h max`.
6. **Draft progress bar overflowed at mobile width** — the 4-step
progress bar used fixed 80px connectors, giving a minimum width of
~400px that clipped off both ends at 375px. Changed connectors to `w-8
sm:w-20` (32px on mobile, 80px otherwise) so all four steps and their
labels fit below 640px.

## Before / after

Each GIF below loops \"before\" (1s) → \"after\" (1s) with a red pill in
the top-right indicating which frame is which. Full-size stills (before
+ after + extra viewports) are listed under **All screenshots** at the
bottom.

### 1. email-sent — two-column layout collapses on narrow viewports

Mobile (375px):

![email-sent
mobile](https://gist.githubusercontent.com/BilalG1/edb04740a19c3f2d048da6e602209d45/raw/gif-01-email-sent-mobile.gif)

Tablet (900px):

![email-sent
tablet](https://gist.githubusercontent.com/BilalG1/edb04740a19c3f2d048da6e602209d45/raw/gif-01-email-sent-tablet.gif)

### 2. email-settings — managed-domain status label

![domain
status](https://gist.githubusercontent.com/BilalG1/edb04740a19c3f2d048da6e602209d45/raw/gif-02-domain-status.gif)

### 3. email-themes — Change Theme dialog on mobile

![themes
mobile](https://gist.githubusercontent.com/BilalG1/edb04740a19c3f2d048da6e602209d45/raw/gif-03-themes-mobile.gif)

### 4. email-templates — long name overflow

![templates
overflow](https://gist.githubusercontent.com/BilalG1/edb04740a19c3f2d048da6e602209d45/raw/gif-04-templates-overflow.gif)

### 5. email-sent — boosted capacity label

![capacity
label](https://gist.githubusercontent.com/BilalG1/edb04740a19c3f2d048da6e602209d45/raw/gif-05-capacity-label.gif)

### 7. email-drafts — draft progress bar on mobile

![draft progress
bar](https://gist.githubusercontent.com/BilalG1/edb04740a19c3f2d048da6e602209d45/raw/gif-07-draft-progress-mobile.gif)

## Test plan

- [x] \`pnpm --filter @stackframe/dashboard lint\` — clean
- [x] \`pnpm --filter @stackframe/dashboard typecheck\` — clean
- [x] Manual verification in a browser at 375px / 900px / 1440px, light
+ dark mode, for each fixed page
- [ ] Reviewer sanity check of the remaining email-* pages
(email-outbox, email-viewer) for similar responsive regressions

## Notes

- The initial review flagged a \"white-on-white capacity boost timer\" —
on closer look the label sits on a deliberately dark `bg-zinc-900/0.82`
overlay inside the boost card, so it reads fine in light and dark mode.
Not fixing; that part of the review was a false positive.
- The initial review also flagged a missing empty state on
email-templates. Because Stack seeds built-in templates, the empty
branch is unreachable in practice — skipping that fix to avoid dead
code.

## All screenshots

Gist with all the individual before/after PNGs and the GIFs themselves:
https://gist.github.com/BilalG1/edb04740a19c3f2d048da6e602209d45

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **New Features**
* Added human-readable status labels for managed domains in domain
settings

* **Improvements**
* Enhanced responsive layouts across dashboard pages for improved mobile
experience
* Improved email capacity display with visual indicators and tooltips
for boost status
* Refined template and theme selection layouts with better text handling
and spacing

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-19 22:58:07 -07:00
BilalG1
85ae4b1c9e
Fix ClickHouse OOM in MAU query + optimize /internal/metrics route (#1344)
## Summary

Fixes the Sentry `StackAssertionError: Failed to load monthly active
users for internal metrics` crash (ClickHouse OOM at the 7.2 GiB
per-query cap) and applies two related optimizations to other queries in
the same route while here. Adds a local benchmark harness that validates
correctness and measures peak memory / duration before & after.

## Root cause (the original Sentry error)

`loadMonthlyActiveUsers` was written as `SELECT user_id … GROUP BY
user_id` and then counting in Node via a `Set`. On a large project that
ships back millions of user_ids. Two failure modes stacked:

1. **Result materialization** — every distinct user_id had to be
buffered in the server before streaming to Node (~20 MiB of result for
450k users; much more at real scale).
2. **`JSONExtract(toJSONString(data), 'is_anonymous', 'UInt8')`** — the
`toJSONString(data)` per-row re-serialization of the entire nested JSON
column, billions of times, just to pull one boolean. Dominates
bytes-read.

Combined, on a single partition read from S3-backed MergeTree, this can
exceed ClickHouse's 7.2 GiB per-query memory cap. That's exactly what
the Sentry trace showed.

## Changes

### 1. Fix MAU query (`loadMonthlyActiveUsers`)

Moved counting to the server with
`uniqExact(sipHash64(normalized_user_id))` and pulled the JS-side
normalization (`lower`, `trim`, `isUuid`) into SQL. Picked `sipHash64`
after benchmarking 7 variants — it's exact (at <<2³² users) and halves
the uniqExact hash-state vs. raw string keys.

### 2. Fix 1 — `JSONExtract(toJSONString(data), …)` → direct
`CAST(data.is_anonymous, …)`

Applied everywhere the pattern appeared in the metrics route:
- `loadDailyActiveUsers`
- the `analyticsUserJoin` subquery
- the `nonAnonymousAnalyticsUserFilter`
- `analyticsOverview:topRegion`
- `analyticsOverview:online`

Semantics preserved (`coalesce(CAST(data.is_anonymous,
'Nullable(UInt8)'), 0)` matches `JSONExtract(…, 'UInt8')` behavior when
the field is missing).

### 3. Fix 3 — server-aggregate the split queries

`loadDailyActiveUsersSplit` and `loadDailyActiveTeamsSplit` used to ship
1.2M+ `(day, user_id)` rows back to Node just so the JS could bucket
them into new / retained / reactivated. Rewrote both as one CTE-style
query that returns 31 rows (one per day in the 30-day window) with the
counts precomputed.

**Minor semantic shift** (documented inline in `route.tsx`): \"new\" is
now based on the user's first-ever `\$token-refresh` event rather than
their Postgres `signedUpAt`. Agrees for users who log in immediately
after sign-up (the common case). Disagrees for the rare edge case of an
account that existed pre-window but never generated a `\$token-refresh`
until now — old code classified as \"reactivated,\" new code classifies
as \"new.\" Judged acceptable; can be revisited.

Postgres round-trips for `ProjectUser.signedUpAt` / `Team.createdAt` are
no longer needed for the split, and the 76 MiB-ish wire ship is gone.

### 4. Benchmark harness
(`apps/backend/scripts/benchmark-internal-metrics.ts`)

Local-only tool. Three modes:
- **MAU equivalence matrix** — 13 edge cases (empty, dedup, anonymous
filter, window boundary, null user_id, non-UUID user_id, case variation,
project isolation, missing/null `is_anonymous`, wrong event_type).
Asserts OLD pipeline and NEW query return the **same set** of users, not
just the same count.
- **MAU perf** — OLD vs NEW plus 6 other candidate variants (inline
regex, UUID keys, sipHash64, HLL sketches), reads `memory_usage` /
`read_rows` / `result_bytes` from `system.query_log` for each, prints a
ranked table.
- **Full-route benchmark** (`BENCH_ROUTE_QUERIES=1`) — runs every
ClickHouse query in `/internal/metrics` in three stages (BEFORE, AFTER,
candidate OPTIMIZED) against the same seed and prints per-query deltas
plus endpoint-level totals.

Seeds under a synthetic `project_id` so real data is never touched;
cleans up on exit via `ALTER TABLE … DELETE`.

## Benchmark results

### MAU query alone

Ran at two scales; set-equality verified (new query identifies the same
individual users, not just the same count).

| seed | MAU | peak memory (old → new) | bytes read | duration |
|---|---|---|---|---|
| 500k events | 89,939 | 158.7 MiB → 46.7 MiB (**3.4×**, −70%) | 175.7
MiB → 63.0 MiB (2.8×) | 483 ms → 76 ms (**6.4×**) |
| 2.5M events | 449,990 | 439.2 MiB → 281.4 MiB (1.56×, −36%) | 865.0
MiB → 310.9 MiB (2.8×) | 783 ms → 126 ms (**6.2×**) |

MAU variant bake-off at 2.5M events (all exact, all set-equal to OLD):

| variant | memory | duration | notes |
|---|---|---|---|
| v0_old (baseline) | 440 MiB | 567 ms | — |
| v1_uniqExact_string | 284 MiB | 110 ms | naive fix |
| v3_uniqExact_toUUID | 244 MiB | 153 ms | UUID keys, slower per-row |
| **v4_uniqExact_sipHash64** | **125 MiB** | **95 ms** | **shipped** |
| v5_uniq (HLL) ~approx | 30 MiB | 86 ms | −0.25% error |
| v6_uniqCombined ~approx | 31 MiB | 67 ms | −0.15% error |

### Full `/internal/metrics` route (2.7M events, 300k users + page-views
+ clicks + teams)

Ranked by BEFORE peak memory:

| query | mem BEFORE | mem AFTER | Δ mem | dur BEFORE | dur AFTER | Δ
dur |
|---|---|---|---|---|---|---|
| analyticsOverview:topReferrers | 588.1 MiB | 411.1 MiB | 1.43× | 1833
ms | 110 ms | **16.66×** |
| analyticsOverview:totalVisitors | 584.3 MiB | 403.5 MiB | 1.45× | 1829
ms | 121 ms | 15.12× |
| analyticsOverview:dailyEvents | 584.1 MiB | 403.7 MiB | 1.45× | 1897
ms | 140 ms | 13.55× |
| loadUsersByCountry | 393.1 MiB | 385.4 MiB | ≈same | 74 ms | 80 ms |
≈same |
| loadDailyActiveUsersSplit | 363.4 MiB | 396.8 MiB | *+9%* | 1966 ms |
356 ms | 5.52× |
| analyticsOverview:topRegion | 269.9 MiB | 106.4 MiB | 2.54× | 1602 ms
| 65 ms | 24.65× |
| loadDailyActiveUsers | 268.3 MiB | 84.0 MiB | 3.19× | 1111 ms | 44 ms
| 25.25× |
| loadDailyActiveTeamsSplit | 59.6 MiB | 78.1 MiB | *+31%* | 70 ms | 123
ms | *+76%* |
| loadMonthlyActiveUsers | 54.9 MiB | 54.9 MiB | ≈same | 68 ms | 56 ms |
≈same |
| analyticsOverview:online | 18.4 MiB | 5.8 MiB | 3.17× | 58 ms | 4 ms |
14.50× |

**Endpoint-level totals**

| metric | BEFORE | AFTER | Δ |
|---|---|---|---|
| Sum peak ClickHouse memory | 3.11 GiB | 2.28 GiB | **−27%** |
| **Max query duration** (endpoint wall-clock floor) | **1966 ms** |
**356 ms** | **−82%** (5.5×) |
| Sum query duration (total CPU) | 10508 ms | 1099 ms | **−90%** (9.6×)
|
| Bytes read | 10.70 GiB | 4.55 GiB | −57% |
| Bytes shipped to Node | 94.8 MiB | 44.2 KiB | **−99.95%** |

Both split queries show a small memory *regression* at this seed size
(the new server-side window-function + self-join has its own state cost
that's near break-even with \"materialize + ship\" at 300k users); at
prod scale the 76 MiB-ship saving dominates. Duration is unambiguously
better.

## Why we don't need to drop the `analyticsUserJoin` in this PR

The benchmark includes an OPTIMIZED stage that drops the LEFT JOIN and
trusts `e.data.is_anonymous` directly, which would shave another **1.2
GiB / 1.9× duration** off the endpoint. **But we can't ship that here**
— an audit of the client tracker
(`packages/js/src/lib/stack-app/apps/implementations/event-tracker.ts`)
confirmed `is_anonymous` is never set on client-emitted `$page-view` /
`$click` events. The JOIN is currently load-bearing. A follow-up PR will
enrich `is_anonymous` at the batch ingest endpoint using
`auth.user.is_anonymous`; after one metrics-window cycle (~30 days) the
JOIN can be dropped.

## Follow-up work (out of scope for this PR)

- **Batch-endpoint enrichment** + drop the analytics-overview LEFT JOIN
(est. further −53% endpoint memory, −46% duration per the benchmark).
- **Teams-split hash-variant count mismatch** — `sipHash64(team_id)`
variant of the teams split shows a count discrepancy vs. the
string-keyed version in the benchmark. Not blocking since teams-split is
only #8 by memory; needs a root-cause pass before shipping that
particular optimization.
- **`loadUsersByCountry` window bound** — currently scans every
`$token-refresh` event ever for the tenancy (no time filter). Bounding
to 30 days would bound memory growth with project age, but changes
semantics (\"country of latest login ever\" → \"in last 30 days\").
Deferred because it's product-facing.

## Snapshot changes in `internal-metrics.test.ts.snap`

The `should return metrics data with users` test signs in 10 users
today, then deletes one of them mid-test. Two small snapshot values
change on today's date; both are just a reclassification of that single
deleted user — the total (10 active users) is unchanged.

- **`daily_active_users_split.new[today]`: 9 → 10**
All 10 users really did sign in for the first time today. The old code
only counted 9 because the deleted user's Postgres row was gone by the
time the metrics query ran, so the old classifier couldn't see they were
created today. The new query looks at ClickHouse events directly, sees
the deleted user's first event was today, and counts them as new like
everyone else.

- **`daily_active_users_split.reactivated[today]`: 1 → 0**
No user was "reactivated" today — nobody was active on an earlier day
and came back. The old "1" was the deleted user falling into this bucket
by default (the old classifier had no other rule that fit them). The new
code correctly reports zero.

Totals match either way (9 + 1 = 10 + 0). We're moving one deleted user
out of the "returning visitor" bucket and into the "brand-new user"
bucket, which is what they actually were.

## Test plan

- [x] `pnpm typecheck` and `pnpm lint` pass on the backend package
- [x] MAU equivalence matrix: 13/13 cases return the same set of users
(not just the same count) between OLD and NEW pipelines
- [x] Set-equality verified at 500k-MAU perf scale
- [x] Full-route benchmark confirms the expected memory / duration
improvements
- [ ] Sanity-check the dashboard rendering after deploy (split charts,
MAU counter, analytics overview)
- [ ] Monitor Sentry for the assertion error — should drop to zero

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Performance Improvements**
* Monthly and daily active metrics are now computed entirely server-side
for faster queries and reduced client-side processing.

* **Bug Fixes**
* More consistent handling of anonymous/missing IDs and stricter ID
filtering to improve accuracy across edge cases.

* **Tests**
* Added a comprehensive benchmark and validation harness to measure
query performance and verify result equivalence across variants.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-19 22:57:46 -07:00
BilalG1
0621ad2032
ai proxy fix (#1343)
<!--

Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md

-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Refactor**
* Request sanitization now includes an extra proxy-specific
preprocessing step for safer AI proxying.
* **New Features**
* Initialization prompts centralized into a shared helper, with a
web-specific prompt variant.
* Authenticated requests can optionally route via a provided external
API key to access alternate models.
* **Chores**
* Added and exposed a preprocessing hook with a default no-op
implementation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-19 22:57:38 -07:00
promptless[bot]
1743caca4e Merge branch 'dev' into promptless/document-developer-tools
Some checks failed
DB migration compat / Check if migrations changed (push) Has been cancelled
DB migration compat / Back-compat — Current branch migrations with ${{ needs.check-migrations-changed.outputs.base_branch }} branch code (push) Has been cancelled
DB migration compat / Forward-compat — Current branch code with ${{ needs.check-migrations-changed.outputs.base_branch }} branch migrations (push) Has been cancelled
DB migration compat / No migration changes (skipped) (push) Has been cancelled
2026-04-19 06:52:11 +00:00
Konstantin Wohlwend
be6970cd81 Fix Docker build scripts
Some checks failed
all-good: Did all the other checks pass? / all-good (push) Has been cancelled
Ensure Prisma migrations are in sync with the schema / check_prisma_migrations (22.x) (push) Has been cancelled
DB migration compat / Check if migrations changed (push) Has been cancelled
Docker Server Build and Push / Docker Build and Push Server (push) Has been cancelled
Docker Server Build and Run / docker (push) Has been cancelled
Runs E2E API Tests (Local Emulator) / E2E Tests (Local Emulator, Node ${{ matrix.node-version }}) (22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (mock, 22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (prod, 22.x) (push) Has been cancelled
Runs E2E API Tests with custom port prefix / build (22.x) (push) Has been cancelled
Runs E2E Fallback Tests / E2E Fallback Tests (Node ${{ matrix.node-version }}) (22.x) (push) Has been cancelled
Lint & build / lint_and_build (24) (push) Has been cancelled
TOC Generator / TOC Generator (push) Has been cancelled
DB migration compat / Back-compat — Current branch migrations with ${{ needs.check-migrations-changed.outputs.base_branch }} branch code (push) Has been cancelled
DB migration compat / Forward-compat — Current branch code with ${{ needs.check-migrations-changed.outputs.base_branch }} branch migrations (push) Has been cancelled
DB migration compat / No migration changes (skipped) (push) Has been cancelled
2026-04-18 23:50:50 -07:00
promptless[bot]
038789db4f Merge branch 'dev' into promptless/document-developer-tools 2026-04-19 06:15:48 +00:00
Konstantin Wohlwend
f0bbdb1c34 Make access token warning just a log 2026-04-18 23:14:27 -07:00
promptless[bot]
d3c7d8ef44 Merge branch 'dev' into promptless/document-developer-tools 2026-04-19 05:29:25 +00:00
Konstantin Wohlwend
82c923e03c waitUntil Sentry flush is complete 2026-04-18 22:28:02 -07:00
promptless[bot]
2b73bda425 Merge branch 'dev' into promptless/document-developer-tools 2026-04-19 05:22:26 +00:00
Konstantin Wohlwend
560ee4c16e Fix memory leak 2026-04-18 22:21:05 -07:00
promptless[bot]
4f142bfb3b Merge branch 'dev' into promptless/document-developer-tools 2026-04-19 04:47:33 +00:00
Konstantin Wohlwend
d568ad5149 Increase Clickhouse request timeout 2026-04-18 21:46:10 -07:00
promptless[bot]
3df259ffa8 Merge branch 'dev' into promptless/document-developer-tools 2026-04-19 02:32:40 +00:00
Konstantin Wohlwend
cf67d37611 Don't override 5xx errors 2026-04-18 19:31:13 -07:00
promptless[bot]
cefa440756 Merge branch 'dev' into promptless/document-developer-tools 2026-04-19 02:10:15 +00:00
Konstantin Wohlwend
ac9707b89e Update metrics endpoint to no longer trigger global error boundary on failure
Some checks failed
all-good: Did all the other checks pass? / all-good (push) Has been cancelled
Ensure Prisma migrations are in sync with the schema / check_prisma_migrations (22.x) (push) Has been cancelled
Docker Server Build and Push / Docker Build and Push Server (push) Has been cancelled
Docker Server Build and Run / docker (push) Has been cancelled
Runs E2E API Tests (Local Emulator) / E2E Tests (Local Emulator, Node ${{ matrix.node-version }}) (22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (mock, 22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (prod, 22.x) (push) Has been cancelled
Runs E2E API Tests with custom port prefix / build (22.x) (push) Has been cancelled
Runs E2E Fallback Tests / E2E Fallback Tests (Node ${{ matrix.node-version }}) (22.x) (push) Has been cancelled
Lint & build / lint_and_build (24) (push) Has been cancelled
Mirror main branch to main-mirror-for-wdb / lint_and_build (push) Has been cancelled
Publish npm packages / publish (push) Has been cancelled
Publish Swift SDK to prerelease repo / publish (push) Has been cancelled
Sync Main to Dev / sync-commits (push) Has been cancelled
TOC Generator / TOC Generator (push) Has been cancelled
2026-04-18 19:08:52 -07:00
promptless[bot]
000ac53798 Merge branch 'dev' into promptless/document-developer-tools 2026-04-19 00:34:01 +00:00
Konstantin Wohlwend
ee68908111 Skip Swift tests temporarily 2026-04-18 17:32:41 -07:00
promptless[bot]
d0ef20698a Merge branch 'dev' into promptless/document-developer-tools 2026-04-19 00:30:41 +00:00
Konstantin Wohlwend
1594ed94d5 Speed up seed script by a lot 2026-04-18 17:29:21 -07:00
promptless[bot]
0a114ed9c4 Merge branch 'dev' into promptless/document-developer-tools 2026-04-18 23:44:50 +00:00
Konstantin Wohlwend
f85b4f3997 Make Bulldozer SQL statements deterministic 2026-04-18 16:43:26 -07:00
promptless[bot]
77b9c5fd9e Merge branch 'dev' into promptless/document-developer-tools 2026-04-18 23:20:16 +00:00
Konstantin Wohlwend
8046a7dd8f Fix dashboard sidebar hover states 2026-04-18 16:18:55 -07:00
promptless[bot]
9d1e15236d Merge branch 'dev' into promptless/document-developer-tools 2026-04-18 21:56:01 +00:00
Konstantin Wohlwend
2e247dd06d Improve dashboard sidebar styling 2026-04-18 14:54:40 -07:00
promptless[bot]
f2f52eaaf9 Merge branch 'dev' into promptless/document-developer-tools 2026-04-18 21:47:39 +00:00
Konstantin Wohlwend
fd68701097 Fix bigint serialization error on tracing 2026-04-18 14:46:03 -07:00
promptless[bot]
da7166848e Merge branch 'dev' into promptless/document-developer-tools 2026-04-18 21:22:05 +00:00
Konstantin Wohlwend
91fbf63f7f chore: update package versions 2026-04-18 14:20:39 -07:00
promptless[bot]
9fc7631804 Merge branch 'dev' into promptless/document-developer-tools 2026-04-18 21:18:46 +00:00
Aman Ganapathy
847d14df70
[Fix]: Assortment of Bugs with Timefold Table and Payments (#1348) 2026-04-18 14:17:24 -07:00
promptless[bot]
729ca530dc Merge branch 'dev' into promptless/document-developer-tools 2026-04-18 00:58:50 +00:00
Konstantin Wohlwend
f4ca6cb4c7 More tracing for replication-related functions 2026-04-17 17:57:34 -07:00