Commit Graph

85 Commits

Author SHA1 Message Date
BilalG1
609579abab
feat(hexclave): PR 3 — native @hexclave/* source rename + delete dual-publish wiring (#1482)
Some checks failed
all-good: Did all the other checks pass? / all-good (push) Has been cancelled
Ensure Prisma migrations are in sync with the schema / check_prisma_migrations (22.x) (push) Has been cancelled
DB migration compat / Check if migrations changed (push) Has been cancelled
Docker Server Build and Push / Docker Build and Push Server (push) Has been cancelled
Docker Server Build and Run / docker (push) Has been cancelled
Runs E2E API Tests (Local Emulator) / E2E Tests (Local Emulator, Node ${{ matrix.node-version }}) (22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (mock, 22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (prod, 22.x) (push) Has been cancelled
Runs E2E API Tests with custom port prefix / build (22.x) (push) Has been cancelled
Runs E2E Fallback Tests / E2E Fallback Tests (Node ${{ matrix.node-version }}) (22.x) (push) Has been cancelled
Lint & build / lint_and_build (24) (push) Has been cancelled
TOC Generator / TOC Generator (push) Has been cancelled
DB migration compat / Back-compat — Current branch migrations with ${{ needs.check-migrations-changed.outputs.base_branch }} branch code (push) Has been cancelled
DB migration compat / Forward-compat — Current branch code with ${{ needs.check-migrations-changed.outputs.base_branch }} branch migrations (push) Has been cancelled
DB migration compat / No migration changes (skipped) (push) Has been cancelled
2026-05-29 15:21:59 -07:00
BilalG1
57ff5d3ce9
feat(hexclave): PR 2 — visible rebrand (Hexclave brand goes public) (#1481)
Some checks failed
all-good: Did all the other checks pass? / all-good (push) Has been cancelled
Ensure Prisma migrations are in sync with the schema / check_prisma_migrations (22.x) (push) Has been cancelled
DB migration compat / Check if migrations changed (push) Has been cancelled
Docker Server Build and Push / Docker Build and Push Server (push) Has been cancelled
Docker Server Build and Run / docker (push) Has been cancelled
Runs E2E API Tests (Local Emulator) / E2E Tests (Local Emulator, Node ${{ matrix.node-version }}) (22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (mock, 22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (prod, 22.x) (push) Has been cancelled
Runs E2E API Tests with custom port prefix / build (22.x) (push) Has been cancelled
Runs E2E Fallback Tests / E2E Fallback Tests (Node ${{ matrix.node-version }}) (22.x) (push) Has been cancelled
Lint & build / lint_and_build (24) (push) Has been cancelled
TOC Generator / TOC Generator (push) Has been cancelled
DB migration compat / Back-compat — Current branch migrations with ${{ needs.check-migrations-changed.outputs.base_branch }} branch code (push) Has been cancelled
DB migration compat / Forward-compat — Current branch code with ${{ needs.check-migrations-changed.outputs.base_branch }} branch migrations (push) Has been cancelled
DB migration compat / No migration changes (skipped) (push) Has been cancelled
## Summary

**Stacked on [#1475](https://github.com/hexclave/stack-auth/pull/1475)**
(`cl/hexclave-pr1`, the invisible compatibility layer). Diff vs that
base = the actual PR 2 code.

This is **PR 2 of the Stack Auth → Hexclave rebrand: the visible flip**.
Old wire identifiers (cookies, request/response headers, Bearer prefix,
JWT issuers, MCP tool name) keep working indefinitely via PR 1's
dual-accept. This PR flips every user-visible surface — package names
taught in docs, SDK class names in code examples, dashboard setup
snippets, page titles, error messages, email content, CLI binary,
default base URLs, GitHub repo slug, contributor guidance — to the
Hexclave brand.

See [`RENAME-TO-HEXCLAVE.md`](./RENAME-TO-HEXCLAVE.md) → *"PR 2: Rebrand
to Hexclave (visible)"* for the full per-work-area spec.

## What's implemented (per the plan's PR 2 scope)

- **SDK base URLs** flipped: `defaultBaseUrl` and
`defaultAnalyticsBaseUrl` in
[common.ts](packages/template/src/lib/stack-app/apps/implementations/common.ts:127)
→ `https://api.hexclave.com` / `https://r.hexclave.com`. PR 1's
[`getHardcodedFallbackUrls`](packages/stack-shared/src/utils/urls.tsx:199)
table now keys on the Hexclave domain.

- **Domain inventory sweep** (16 subdomains from the plan): every
`api/app/docs/discord/demo/mcp/skill/feedback/test/preview/r/api2/api.staging/idp-jwk-audience/built-with.stack-auth.com`
reference in production code, docs-mintlify, examples, READMEs, and
contributor guidance flipped to `*.hexclave.com`. Carve-outs: PR 1's
intentional JWT issuer dual-accept table in
[tokens.tsx](apps/backend/src/lib/tokens.tsx), the legacy `./docs/`
folder, the `unified-docs-widget` allowlist (deliberately accepts both
during DNS transition), and `url-targets.ts` hosted-component default
(baked into existing customer deploys).

- **`@deprecated` JSDoc** on every `Stack*` public export
([packages/template/src/lib/stack-app/index.ts](packages/template/src/lib/stack-app/index.ts)
+ [packages/template/src/index.ts](packages/template/src/index.ts)) —
`StackClientApp`, `StackServerApp`, `StackAdminApp` + every
constructor/options/JSON type, `StackHandler`, `StackProvider`,
`StackTheme`, `useStackApp`, `defineStackConfig`, `StackConfig`.
Hexclave\* aliases are now canonical.

- **Runtime `console.warn`**
([packages/template/src/internal/deprecation-warning.ts](packages/template/src/internal/deprecation-warning.ts))
— once-per-process when the SDK is loaded from a `@stackframe/*`
artifact. Detection uses the existing
`STACK_COMPILE_TIME_CLIENT_PACKAGE_VERSION_SENTINEL` (rewritten at build
time to e.g. `js @stackframe/stack@2.8.92` or `js
@hexclave/next@1.0.0`); `@hexclave/*` mirror artifacts short-circuit the
warning.

- **Tier 3 data migration**: new idempotent SQL migration
[`20260523000000_rename_internal_project_to_hexclave`](apps/backend/prisma/migrations/20260523000000_rename_internal_project_to_hexclave/migration.sql)
— updates the internal Project `displayName` 'Stack Dashboard' →
'Hexclave Dashboard' and `description` only if both still hold the
pre-rebrand defaults. Operator-renamed projects untouched, missing row
no-ops, re-runs are no-ops. [`seed.ts`](apps/backend/prisma/seed.ts:87)
default flipped. `getSharedEmailConfig("Stack Auth")` → `("Hexclave")`.

- **Tier 4 brand strings** (mechanical sweep, ~340 files):
- Page + OpenAPI titles (Hexclave API / Dashboard / REST API / Webhooks
API / Documentation). OpenAPI `info.description` documents
`X-Hexclave-*` headers as canonical with compat note on `X-Stack-*`.
- `HexclaveAssertionError` message text
([errors.tsx:71](packages/stack-shared/src/utils/errors.tsx:71)) — "an
error in Stack." → "an error in Hexclave."
- Known-error message templates
([known-errors.tsx](packages/stack-shared/src/known-errors.tsx)) flipped
to lead with `x-hexclave-*` + the new `docs.hexclave.com` URL; legacy
`x-stack-*` mentioned as compat aliases. **25 e2e test files updated in
lockstep**.
- Email content: failed-emails-digest body, sendTestEmail recipient (now
`sent-with-hexclave.com`), test-email-recipient default.
  - `CHANGELOG.md` title → "Hexclave Changelog".
- `AGENTS.md` env var convention: new vars prefix `HEXCLAVE_` /
`NEXT_PUBLIC_HEXCLAVE_` for Category A/B; legacy `STACK_*` explicitly
noted as accepted via PR 1's dual-read.

- **CLI / init wizard**:
- Every dashboard setup snippet, init-stack template, and docs-mintlify
page teaches `npx @hexclave/cli@latest init` (was
`@stackframe/stack-cli`).
[setup-page.tsx](apps/dashboard/src/app/(main)/(protected)/projects/[projectId]/(overview)/setup-page.tsx)
+
[link-existing-onboarding](apps/dashboard/src/app/(main)/(protected)/(outside-dashboard)/new-project/page-client-parts/link-existing-onboarding.tsx).
- [init-stack](packages/init-stack/src/index.ts:634)
`STACK_*_INSTALL_PACKAGE_NAME_OVERRIDE` defaults flipped to
`@hexclave/*`.
- Generated `stack/client.ts` / `stack/server.ts` import from
`@hexclave/next` and reference `HexclaveClientApp` /
`HexclaveServerApp`.
- Internal `StackAuthKeys` dashboard component renamed to
`HexclaveKeys`.

- **docs-mintlify rewrite** (legacy `./docs/` intentionally untouched
per scoping decision):
- **78 MDX files swept**.
`@stackframe/{react,stack,js,tanstack-start,...}` →
`@hexclave/{react,stack,js,...}` in install snippets and code blocks;
`Stack*` SDK class names → `Hexclave*` in all code examples; 'Stack
Auth' brand phrase → 'Hexclave'.
- `openapi/{server,admin,client,webhooks}.json` titles → 'Hexclave REST
API' / 'Hexclave Webhooks API'.

- **Generators flipped before regeneration**:
-
[`packages/stack-shared/src/helpers/init-prompt.ts`](packages/stack-shared/src/helpers/init-prompt.ts),
[`/ai/prompts.ts`](packages/stack-shared/src/ai/prompts.ts),
[`apps/backend/src/lib/ai/prompts.ts`](apps/backend/src/lib/ai/prompts.ts),
[`apps/backend/src/lib/ai/tools/create-email-{template,draft}.ts`](apps/backend/src/lib/ai/tools/create-email-template.ts),
[`apps/skills/src/app/route.ts`](apps/skills/src/app/route.ts) (taught
MCP tool → `ask_hexclave` with compat note; CLI binary teach →
`hexclave`),
[`docs-mintlify/snippets/home-prompt-island.jsx`](docs-mintlify/snippets/home-prompt-island.jsx),
[`packages/template/README.md`](packages/template/README.md) +
integrations/convex/component/README.md.
  - `generate-sdks` propagated changes to `packages/{react,stack,js}`.

- **OpenAPI dual-documentation**:
[`apps/backend/src/app/api/latest/route.ts`](apps/backend/src/app/api/latest/route.ts)
now lists `X-Hexclave-*` headers as primary documented schemas with
`X-Stack-*` duplicates marked `.optional()` (both accepted at runtime by
PR 1's normalize-at-proxy shim).

- **`@stackframe/emails` virtual module**: dual-aliased to
`@hexclave/emails` at the bundler boundary
([email-rendering.tsx:89](apps/backend/src/lib/email-rendering.tsx:89)).
Stored email templates continue to import from either name; new
AI-generated templates and the system prompt teach `@hexclave/emails`.

- **Tier 2 mirror-publish wiring** (new this PR, lays the groundwork for
`@hexclave/*` first publish):
-
[`scripts/rewrite-packages-to-hexclave.ts`](scripts/rewrite-packages-to-hexclave.ts)
— rewrites 9 publishable `@stackframe/*` → `@hexclave/*` `package.json`
files (reads `HEXCLAVE_VERSION` env or `--version=` flag), pins
cross-deps to the shared `@hexclave` version, registers `hexclave` bin
alongside `stack` for `@hexclave/cli`.
-
[`.github/workflows/npm-publish.yaml`](.github/workflows/npm-publish.yaml)
appended with rewrite-then-republish step. `pnpm publish` skips
already-on-npm versions so reruns are safe.

- **Sender email domain**: `noreply@stackframe.co` →
`noreply@sent-with-hexclave.com` (the dedicated transactional-sender
domain split per the plan, to isolate bulk deliverability from
`hexclave.com` reputation); `security@` / `team@stack-auth.com` inbound
mailboxes → `@hexclave.com`.

- **Self-host docs**: docker network / container names in the bash
examples flipped from `stack-auth` to `hexclave` (`hexclave-postgres`,
`hexclave-clickhouse`, `hexclave.env`). The docker image tag
`stackauth/server:latest` stays per the plan's locked decision.

- **GitHub repo slug**: `hexclave/stack-auth` → `hexclave/hexclave` in
every `package.json` `repository` field, README link, CHANGELOG
raw-asset URL.

## Carve-outs (deliberately untouched)

-
**[`apps/backend/src/lib/tokens.tsx`](apps/backend/src/lib/tokens.tsx)**
JWT issuer dual-accept table — PR 1 intentional infrastructure, kept
indefinitely.
- **Legacy `./docs/` folder** — per scoping decision (only
`docs-mintlify/` rewritten).
- **`unified-docs-widget` hostname allowlist** — accepts both
`.hexclave.com` (canonical) and `.stack-auth.com` (transition window)
for DNS rollout.
- **`url-targets.ts`** hosted-domain default
`.built-with-stack-auth.com` — wire identifier baked into existing
customer deploys; indefinite read-fallback.
- **Binary visual assets** (logos, favicons, OG images, README
screenshots) — out of scope for this PR. Need design work; tracked
separately.

## Verification

- **`pnpm typecheck`** on
`packages/{template,stack-shared,react,stack,js}` + `apps/dashboard`:
**all green**. The remaining backend / e-commerce-demo typecheck errors
are pre-existing (Prisma codegen output +
`./generated/api-versions.json` not present in fresh worktrees without
`pnpm run codegen-prisma` + a live DB) and unrelated to this diff.
- **`pnpm lint`** on the same 6 packages: all green.
- **Final grep** for residual `Stack Auth` / `stack-auth.com` /
`@stackframe/stack-cli@latest` references: zero outside the intentional
carve-outs above.
- **25 e2e test files updated in lockstep** with the known-error message
changes (asserted strings flipped to match the new x-hexclave-* +
compat-note messages).

## Deploy blockers (ops sequencing before this rebrand goes live)

This PR is code-complete, but the rebrand's visible surfaces (SDK
default URLs, dashboard links, npm READMEs, REST error messages, runtime
deprecation warning) all point at `*.hexclave.com` / `@hexclave/*`
resources that don't exist yet. None of these are fixable from a PR —
they're ops/registrar/npm work that has to be sequenced before merging
this to a release tag.

Suggested ordering, hardest blockers first:

### Tier 1 — required before customer-facing deploy (everything below
this line *will visibly break customers on day 1* if skipped)

1. **DNS + TLS for `api.hexclave.com` + `api1./api2.hexclave.com`** →
must point at the same backend that serves `api.stack-auth.com` (or a
backend that mirrors PR 1's dual-accept). The SDK's new `defaultBaseUrl`
is `https://api.hexclave.com`; every customer that relied on the old
default and upgrades to a post-PR2 SDK build sends API requests here.
Until this resolves, every default-config customer's API call NXDOMAINs.
2. **DNS for `app.hexclave.com`** → the dashboard. Referenced in the
SDK's default-error messages ("Please create a project on the Hexclave
dashboard at https://app.hexclave.com"), the init-stack flow's
`wizard-congrats` redirect, and the OAuth dashboard handoff.
3. **DNS for `docs.hexclave.com`** + Mintlify deploy → the SDK runtime
deprecation warning (`https://docs.hexclave.com/migration`), every
README, every "Learn more" link in the dashboard, and every REST API
error body (`/api/overview#authentication`) points here. The MDX is in
this PR; the docs build target needs DNS.
4. **DNS for `mcp.hexclave.com`** → the MCP server endpoint that every
taught agent integration (`claude mcp add ...`, `cursor`, `codex`,
`vscode`) registers. Until this resolves, every `npx
@hexclave/cli@latest init` MCP-registration step fails.
5. **Reserve the `@hexclave` npm scope + set repo variable
`HEXCLAVE_VERSION`** → the mirror-publish step in
`.github/workflows/npm-publish.yaml` is gated on this variable. Without
it, the entire taught onboarding command `npx @hexclave/cli@latest init`
404s from the npm registry, *and* every README that says "install
`@hexclave/next`" leads to install failure. Pick the initial version
intentionally (`1.0.0` or aligned to `@stackframe/stack`); don't accept
a silent default.

### Tier 2 — required before announcing the rebrand publicly (lookalike
or low-traffic surfaces, but visibly broken)

6. **DNS for `r.hexclave.com`** → the analytics beacon
`defaultAnalyticsBaseUrl`. Silent failure if missing (analytics drops),
but should land alongside Tier 1.
7. **Register `sent-with-hexclave.com` + full email auth (SPF / DKIM /
DMARC)** → the new default sender domain for shared-sender transactional
emails. Without it the dashboard "send test email" path emits bounces,
and shared-sender flows (`getSharedEmailConfig("Hexclave")`) deliver to
spam at best.
8. **MX + SPF / DMARC for `hexclave.com`** → `team@hexclave.com` and
`security@hexclave.com` mailboxes. The security disclosure mailbox is
referenced in [`.github/SECURITY.md`](.github/SECURITY.md);
`team@hexclave.com` is the actual recipient of internal feedback emails
sent at runtime by
[`apps/backend/src/lib/internal-feedback-emails.tsx`](apps/backend/src/lib/internal-feedback-emails.tsx).
Today, every runtime feedback email bounces.
9. **DNS for `skill.hexclave.com`** → the canonical AI-agent skill fetch
URL (the agent bootstrap pivot). Without it, the entire "agent downloads
`SKILL.md` from a known URL" flow taught in
[`packages/stack-shared/src/helpers/init-prompt.ts`](packages/stack-shared/src/helpers/init-prompt.ts)
fails.
10. **Create `github.com/hexclave/hexclave` as a public repo** (even as
a redirect to `hexclave/stack-auth`) **OR** rewrite every `package.json`
`"repository"` field + dashboard footer "view on GitHub" link to point
at `hexclave/stack-auth` (which already exists). Currently every npm
package page's "Repository" link is dead, and the dashboard's GitHub
button + dev-tool repo link are dead.

### Tier 3 — broken but low-visibility / low-traffic

11. **DNS for `discord.hexclave.com`** → Discord invite redirect, used
in every README's chip and the dashboard footer.
12. **DNS for `demo.hexclave.com`** → " Demo" badge in every npm
package README. Broken-image badge on the package page.
13. **DNS + TLS for `built-with-hexclave.com`** → optional
hosted-handler domain (the default reverted to
`.built-with-stack-auth.com` in this PR's carve-outs, so this only
matters for projects that manually flip).

## Other follow-ups (not deploy-blocking)

- **E2E snapshot regen across the full suite** for the dual-emitted
`x-hexclave-*` response headers (PR 1 follow-up; `vitest -u` in CI
absorbs).
- **Binary visual assets** — logos, favicons, OG images, README
screenshots; need design pass.
- **Backend OpenAPI fumadocs regen** in CI flow — the JSON files in
`docs-mintlify/openapi/` are committed but regen runs in CI. Verify the
workflow that does this still works against the post-PR2 source.
- **Backend typecheck infra debt** — needs `codegen-prisma` +
`codegen-route-info` to clear; pre-existing, unaffected by this PR.

## Test plan

- [ ] CI runs full e2e suite (with `vitest -u` to absorb residual
snapshot deltas, then committed back).
- [ ] Spot-check: new `@hexclave/cli init` (once published) generates
`hexclave.config.ts` and works against a fresh project.
- [ ] Spot-check: existing customer with `@stackframe/stack` import sees
the once-per-process `console.warn` recommending `@hexclave/next` on SDK
init.
- [ ] Manual: dashboard setup page renders the `npx @hexclave/cli@latest
init` snippet and the `x-hexclave-publishable-client-key` API header in
the curl example.
- [ ] Manual: a fresh `pnpm run prisma migrate` against a clean DB sets
the internal project displayName to 'Hexclave Dashboard'.

---------

Co-authored-by: Konstantin Wohlwend <n2d4xc@gmail.com>
2026-05-26 19:18:20 -07:00
BilalG1
f7e389809e
feat(hexclave): PR 1 — wire compatibility layer (invisible) (#1475)
Some checks failed
all-good: Did all the other checks pass? / all-good (push) Has been cancelled
Ensure Prisma migrations are in sync with the schema / check_prisma_migrations (22.x) (push) Has been cancelled
DB migration compat / Check if migrations changed (push) Has been cancelled
Docker Server Build and Push / Docker Build and Push Server (push) Has been cancelled
Docker Server Build and Run / docker (push) Has been cancelled
Runs E2E API Tests (Local Emulator) / E2E Tests (Local Emulator, Node ${{ matrix.node-version }}) (22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (mock, 22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (prod, 22.x) (push) Has been cancelled
Runs E2E API Tests with custom port prefix / build (22.x) (push) Has been cancelled
Runs E2E Fallback Tests / E2E Fallback Tests (Node ${{ matrix.node-version }}) (22.x) (push) Has been cancelled
Lint & build / lint_and_build (24) (push) Has been cancelled
TOC Generator / TOC Generator (push) Has been cancelled
DB migration compat / Back-compat — Current branch migrations with ${{ needs.check-migrations-changed.outputs.base_branch }} branch code (push) Has been cancelled
DB migration compat / Forward-compat — Current branch code with ${{ needs.check-migrations-changed.outputs.base_branch }} branch migrations (push) Has been cancelled
DB migration compat / No migration changes (skipped) (push) Has been cancelled
## Summary

**Stacked on #1468** (`docs/hexclave-rename-plan` — the plan doc). Diff
vs that base = the actual PR 1 code.

This is **PR 1 of the Hexclave rebrand: the invisible compatibility
layer**. Everything is additive. Old SDKs, old wire identifiers, and old
env var names keep working unchanged. The backend dual-accepts and
dual-emits; new SDK code emits `x-hexclave-*` headers and the
`hexclave_` Bearer prefix; cookies dual-write; env vars dual-read across
every category. **No user-visible rebranding lands here** — that's PR 2.

See [`RENAME-TO-HEXCLAVE.md`](./RENAME-TO-HEXCLAVE.md) → *"PR 1
implementation guide"* for the full per-work-area spec, file pointers,
and chosen approach.

## What's implemented (all 14 PR-1 work-areas)

- **SDK export aliases** — `Hexclave*` aliases for the user-facing
`Stack*` exports added in `packages/template`; codegen propagates them
to `@stackframe/{js,stack,react,tanstack-start}`. React-only aliases
correctly excluded from `@stackframe/js`. (`e60550a2`)
- **JWT issuer dual-accept** — `decodeAccessToken` accepts both
`api.stack-auth.com` and `api.hexclave.com` issuers. Signing unchanged.
(`fc781def`)
- **Request-header dual-accept** — backend + dashboard proxies normalize
`x-hexclave-*` → `x-stack-*` at the existing empty proxy hook (so
`smart-request.tsx` and every route schema keep working unchanged); CORS
allowlists extended via a derive-once helper. (`2a056eac`)
- **MCP `ask_hexclave`** — registered alongside `ask_stack_auth` via a
shared helper; `ask_stack_auth` behavior byte-identical. (`30ffd604`)
- **Dev-tool** — DOM ids + header emit switched.
`window.HexclaveDevTool` exposed alongside `window.StackDevTool`.
(`32131ea7`)
- **The big consolidated commit** (`7fed864a`):
- **Env vars** — central `getEnvVariable` prefix-transform (HEXCLAVE
first, STACK fallback); dashboard + template client env files dual-read;
`turbo.json` globalEnv; `NEXT_PUBLIC_STACK_PORT_PREFIX` renamed outright
across ~82 files including docker.
- **Cookies** — dual-write/dual-read auth (`stack-access`/`-refresh-*`
and custom-domain variants), OAuth-state
(`stack-oauth-{inner,outer}-*`), and low-risk cookies (`stack-is-https`,
`stack-last-seen-changelog-version`). Bypass sites patched (backend
OAuth callback, dashboard remote-dev auth route, impersonation snippets,
snapshot serializer).
- **Bearer prefix** — SDK token parser accepts both `stackauth_` and
`hexclave_`; emits `hexclave_`. Discovery correction: this is purely
SDK-internal — the backend never parses it.
- **Response headers** — backend dual-emits
`x-hexclave-{request-id,actual-status,known-error}`; SDKs dual-read (new
first, stack fallback).
- **SDK request-header emit switch** —
`client/server/admin-interface.ts` + dashboard `api-headers.ts` +
`internal-project-headers.ts` + `feedback-form.tsx` switched to
`x-hexclave-*`. Plus `stack_response_mode` query param.
- **Storage keys** — dev-tool / cli-auth / oauth-button / docs keys
renamed (straight); `stack:session-replay:v1` dual-read so in-progress
recordings survive SDK upgrades; `stack_mfa_attempt_code` dual-read.
- **Query params** — cross-domain params dual-emit/dual-accept via
shared helpers; backend `oauth/authorize` accepts
`hexclave_response_mode` and `stack_response_mode`; `stack-init-id`
renamed.
- **`Symbol.for`** — app-internals symbol gets a parallel
`Symbol.for("Hexclave--app-internals")` getter on each attach site (no
read-site churn — old symbol still attached). 3 file-private symbols
renamed outright.
- **Config discovery** — prefer `hexclave.config.ts`, fall back to
`stack.config.ts` at every discovery site (CLI / dashboard / backend /
local-emulator); `init` writes the new filename; CLI credentials path
migrates.
- **Internal renames** — `StackAssertionError`,
`StackClient/Server/AdminInterface` renamed outright (no alias, per the
"internal-only → rename" rule). ~264 files touched.
- **Review-pass fixes** (`21217fbe`) — three real bugs found by parallel
review agents and fixed:
- `snapshot-serializer.ts` was interpolating the whole
`keyedCookieNamePrefixes` array (`${arr}`) — adding a second prefix
would have corrupted **every** OAuth-cookie snapshot, not just new ones.
- **Docker port-prefix producer/consumer mismatch** —
`entrypoint.sh`/`run-emulator.sh`/cloud-init `user-data` were still
producing `NEXT_PUBLIC_STACK_PORT_PREFIX` while the dashboard sentinel +
consumers had been renamed; silent self-host regression (custom port
prefix would be ignored).
- **Missing `hexclave-oauth-inner-*` dual-write** in the OAuth authorize
route — callback's fallback masked it but the dual-write was specified
by the plan.
- Plus: `mcp.test.ts` tool-list assertions updated to include
`ask_hexclave`; two dashboard header-emit sites switched to
`x-hexclave-*` for consistency.
- **E2E snapshot serializer follow-up** (`4b16cc5d`) —
`x-hexclave-request-id` added to the hidden-headers list (mirroring
`x-stack-request-id` treatment), and 2 sample inline snapshots
regenerated in `projects.test.ts` to include the new dual-emitted
headers.

## Verification

- **`pnpm typecheck`** — clean (the fresh-worktree `@/.source` / Prisma
codegen gap in `stack-docs` is pre-existing and unrelated).
- **`pnpm lint`** — 29/29 packages green.
- **`pnpm exec turbo run build --filter=./packages/*`** — 13/13 packages
build (including `@stackframe/stack-cli` once the dashboard standalone
is present).
- **Live E2E** against a running backend on `cl/hexclave-pr1`:
- `pnpm test run
apps/e2e/tests/backend/endpoints/api/v1/internal/mcp.test.ts` — **6/6
pass** (verifies the new `ask_hexclave` tool — the hand-written inline
snapshot matched actual MCP server output).
- `pnpm test run
apps/e2e/tests/backend/endpoints/api/v1/internal/projects.test.ts` —
**11/11 pass** (verifies wire dual-accept + dual-emit end-to-end; the
snapshot serializer fix was found and applied during this check).

A four-agent parallel **review pass** also audited the full diff for
logic/runtime bugs across the work-areas (wire headers + JWT, cookies +
bearer + symbols, env vars, query params + config + MCP + aliases). All
in-slice review verdicts were ✓ except the three bugs listed above,
which are now fixed.

## Known follow-ups (out of scope for this PR)

- **E2E snapshots across the rest of the suite** — backend now
dual-emits `x-hexclave-{known-error,actual-status}` alongside
`x-stack-*`, which legitimately appears in inline snapshots throughout
`apps/e2e`. Two were regenerated here as a sample; the rest should regen
with `vitest -u` in CI.
- **Docker shell env vars beyond `PORT_PREFIX`** — `entrypoint.sh` still
reads `STACK_*` env vars directly (the JS-side `getEnvVariable`
transform doesn't help the shell). JS consumers dual-read so it works in
practice; full shell-level dual-read is a deeper self-host follow-up.
- **`@stackframe/stack-cli` build ordering** — pre-existing; needs
`build:rde-standalone` first. Not affected by this PR.

## Test plan

- [ ] CI runs full e2e suite (with `vitest -u` to absorb dual-emit
snapshot deltas, then committed back)
- [ ] Spot-check: an old SDK build (emitting only `x-stack-*`) still
authenticates against the new backend
- [ ] Spot-check: a new SDK (emitting `x-hexclave-*` / `Bearer
hexclave_*`) still authenticates against an old backend during deploy
ordering
- [ ] Manual: `npx @stackframe/stack-cli@latest init` (new onboarding
entrypoint) generates `hexclave.config.ts`
- [ ] Manual: existing `stack.config.ts`-only project still resolves (no
migration required)

---------

Co-authored-by: bilal <bilal@stack-auth.com>
2026-05-23 17:24:55 -07:00
BilalG1
91b8e4caa4
Fix /internal/metrics ClickHouse OOM (#1457)
Fixes Sentry
[STACK-BACKEND-16H](https://stackframe-pw.sentry.io/issues/STACK-BACKEND-16H)
— the `/api/v1/internal/metrics` endpoint was triggering the cluster's
10.8 GiB OvercommitTracker kill on tenants with months of
`$token-refresh` history.

## Root cause

Three queries in `loadAnalyticsOverview` plus `loadUsersByCountry` did
`GROUP BY user_id` over the events table with **no lower `event_at`
bound**, so their hash table working set scaled with
cumulative-distinct-users-ever-seen instead of the 30-day metrics
window.

## Changes

- Add 30-day `event_at` lower bound to `loadUsersByCountry` and to the
`analyticsUserJoin` inner subquery (used by `dailyEvents`,
`totalVisitors`, `topReferrers`).
- New `getClickhouseAdminClientForMetrics()` factory in
`lib/clickhouse.tsx` with connection-level safety net: per-query +
per-user memory caps, external GROUP BY spill, and `join_algorithm:
'grace_hash,parallel_hash,hash'` (grace_hash measured to give 48% memory
reduction at zero latency cost — see benchmark notes in the file).
- Inline comment + concrete next steps for the long-term fix (option C:
stamp `is_anonymous` at ingest on page-view/click events, then drop the
join entirely).
- Extend `scripts/benchmark-internal-metrics.ts` with the
historical-seed knob and three new modes (`BENCH_BACKFILL_COMPARE`,
`BENCH_JOIN_ALGO_COMPARE`, plus the existing `BENCH_ROUTE_QUERIES`
updated) used to validate the choices above.

## Benchmark — pre-PR vs post-PR

Synthetic seed: 300k users × 9 events spread over 365 days (~2.7M
events).

| | pre-PR | post-PR | delta |
|---|---:|---:|---:|
| Sum peak memory | 2.18 GiB | 515 MiB | **4.3× less** |
| Max query duration | 1293 ms | 101 ms | **12.8× faster** |
| Sum CPU duration | 5119 ms | 394 ms | 13× less work |
| Sum bytes read | 3.87 GiB | 929 MiB | 4.3× less I/O |

Per-query at 300k users:
- `analyticsOverview:dailyEvents` 561 → 44 MiB (12.8× less)
- `analyticsOverview:totalVisitors` 560 → 50 MiB (11.2× less)
- `analyticsOverview:topReferrers` 546 → 50 MiB (10.9× less)
- `loadUsersByCountry` 388 → 44 MiB (8.9× less)

## Caveats

- `loadDailyActiveSplitFromClickhouse` still scans all-history on its
`min(event_at)` subquery. It can't be naively bounded — `first_date` is
used to classify entities as new vs reactivated, and a 30d bound would
silently mislabel old-but-active entities as "new." The new SETTINGS
cap+spill it; the proper fix is option C (documented inline).
- A user with a page-view but no `$token-refresh` in the last 30 days
now falls through to `coalesce(NULL, 0)` and is classified
non-anonymous. Token-refresh fires every few minutes per active session,
so this is rare but not impossible (embedded SDKs that poll less
frequently, sessions straddling the 30d boundary).
- `max_memory_usage_for_user: 9 GB` trades "cluster-wide
OvercommitTracker kill of a random query" for "clean per-user memory
error attributed to the specific query." After our 30d bounds, no query
is anywhere near 9 GB.

## Test plan

- [x] `pnpm typecheck` passes
- [x] `pnpm lint` passes
- [x] `pnpm test run
apps/e2e/tests/backend/endpoints/api/v1/internal-metrics.test.ts` — 9/10
pass; the 1 failure (`risk_scores` snapshot drift) reproduces on clean
`dev` and is unrelated
- [x] `pnpm test run
apps/e2e/tests/backend/endpoints/api/v1/analytics-{events,events-batch,query}.test.ts
apps/e2e/tests/backend/endpoints/api/v1/token-refresh-events.test.ts
apps/e2e/tests/backend/performance/metrics.test.ts` — all passing tests
pass; 10 pre-existing `PRODUCT_DOES_NOT_EXIST` setup failures reproduce
on clean `dev`
- [x] Benchmark `BENCH_ROUTE_QUERIES=1` at 300k users shows the deltas
above

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
* Improved internal metrics collection to use metrics-specific DB
settings for more reliable, safer analytical reads.
* Added guardrails to metrics queries to enforce time-window bounds and
avoid unbounded scans.
* Expanded benchmark modes (backfill and join-algo comparisons),
extended perf seeding, and improved logging/retry behavior to capture
more complete stats and reduce missing log rows.

<!-- review_stack_entry_start -->

[![Review Change
Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/hexclave/stack-auth/pull/1457?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)

<!-- review_stack_entry_end -->
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-05-21 13:47:32 -07:00
Aman Ganapathy
a9623d976a
[Refactor] [Fix] Remove default prod creation (#1350)
With the new bulldozer rework we dont support default products anymore.
Users are encouraged to currently manually handle granting products to
their end users.

We block api requests and new product creations that attempt to set no
price, and we remove any options to set include-by-default. We also
migrate users' existing product snapshots in `Subscriptions`,
`OneTimePurchases`, and `ProductVersions` to have no price set if it's
an include-by-default product. This will make it so that next time a
user goes onto their products page, they will be informed that the
pricing is invalid and it is no longer delivered by default.

Note, however, that these products will still be providing items and the
like to the users who have them.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Bug Fixes**
* Migrated legacy product snapshots so missing included-items no longer
break readers.
* Removed deprecated "include-by-default" pricing sentinel; pricing now
requires explicit price entries and write validation rejects the old
sentinel.

* **Chores**
* Simplified dashboard pricing flows: create/edit/save now use explicit
prices and surface an alert when a formerly implicit free plan needs an
explicit $0 price.
* Config overrides and stored data are auto-normalized to explicit price
objects.

* **Tests**
* Updated and added tests covering migration, validation, and switching
behavior for explicit prices.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: mantrakp04 <mantrakp@gmail.com>
Co-authored-by: Mantra <87142457+mantrakp04@users.noreply.github.com>
2026-05-15 10:38:33 -07:00
Aman Ganapathy
4648fc1899
[Feat] new scripts on migrate/seed/init run for internal (#1421)
### Context

One script grants free plan to any team which is a customer of the
internal project who doesnt have it already.
We also want to migrate our users (internal) to the latest version of
their products.

Needed because some subs on dev right now dont have a plan. And internal
isnt using latest version of its own growth plan.

### Describing the Paths we want to Account for

1. Users on production who currently don't have a plan should get free
plans, since this script is run with every migrate
2. Users on production should get the latest version of each plan of
ours. So a forced migration to latest version of internal project plans
3. No other project's products/product lines should be affected. They
will continue to have product versioning
4. 2 should apply to test mode subscriptions as well, on top of stripe
subscriptions. All of them should be refreshed
5. Internal project itself should get latest version of its own growth
plan
6. If the bulldozer write fails, we should be able to recover on next
migration (this should already be handled by init bulldozer script,
because it checks if prisma db and bulldozer db are out of sync)
7. if the regenerate or backfill fail, we should be able to recover just
by rerunning the script
8. Product version table should not balloon. No table should really
balloon



### What I've tested on local
1. Put in 1000 db subscription rows, made them all stale and then ran
the regen script. It took about 6 minutes to update all of them, and it
was idempotent so rerunning it again did nothing.
2. With proper stripe keys I switched off of test mode on the internal
app, granted a product to a new team and updated the product's item
list. At this point I checked and the new team had the outdated version
of the product. Then I ran the regen script and the new team was moved
to latest product version.
3. Tried the above with the internal team's growth plan too and it
worked as well.
4. Backfill actually grants free plan


### Deployment strategy in prod
Run the backfill and the regen scripts once each after your migrations
on the prod db.
`pnpm db:backfill-internal-free-plans` will make sure every team has a
free plan at least if they dont have an existing plan (and it is
idempotent).
After that, run `pnpm db:regen-internal-subscriptions-to-latest` which
will migrate every user to the latest version of their plan (i.e latest
snapshot). This should also be idempotent.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Automated backfill to grant internal free plans to qualifying billing
teams.
* Regeneration tool to refresh internal subscription snapshots to the
latest product versions.

* **Chores**
* Added CLI commands and package scripts to run backfill and regen jobs.
  * Database init now runs payment initialization before backfill/regen.

* **Tests**
* Integration and unit tests added/updated to validate backfill,
regeneration, and free-plan idempotency.

[![Review Change
Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/hexclave/stack-auth/pull/1421)
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-05-12 16:05:45 -07:00
Konsti Wohlwend
765b0f4e29
New setup (#1413) 2026-05-06 12:03:06 -07:00
BilalG1
85ae4b1c9e
Fix ClickHouse OOM in MAU query + optimize /internal/metrics route (#1344)
## Summary

Fixes the Sentry `StackAssertionError: Failed to load monthly active
users for internal metrics` crash (ClickHouse OOM at the 7.2 GiB
per-query cap) and applies two related optimizations to other queries in
the same route while here. Adds a local benchmark harness that validates
correctness and measures peak memory / duration before & after.

## Root cause (the original Sentry error)

`loadMonthlyActiveUsers` was written as `SELECT user_id … GROUP BY
user_id` and then counting in Node via a `Set`. On a large project that
ships back millions of user_ids. Two failure modes stacked:

1. **Result materialization** — every distinct user_id had to be
buffered in the server before streaming to Node (~20 MiB of result for
450k users; much more at real scale).
2. **`JSONExtract(toJSONString(data), 'is_anonymous', 'UInt8')`** — the
`toJSONString(data)` per-row re-serialization of the entire nested JSON
column, billions of times, just to pull one boolean. Dominates
bytes-read.

Combined, on a single partition read from S3-backed MergeTree, this can
exceed ClickHouse's 7.2 GiB per-query memory cap. That's exactly what
the Sentry trace showed.

## Changes

### 1. Fix MAU query (`loadMonthlyActiveUsers`)

Moved counting to the server with
`uniqExact(sipHash64(normalized_user_id))` and pulled the JS-side
normalization (`lower`, `trim`, `isUuid`) into SQL. Picked `sipHash64`
after benchmarking 7 variants — it's exact (at <<2³² users) and halves
the uniqExact hash-state vs. raw string keys.

### 2. Fix 1 — `JSONExtract(toJSONString(data), …)` → direct
`CAST(data.is_anonymous, …)`

Applied everywhere the pattern appeared in the metrics route:
- `loadDailyActiveUsers`
- the `analyticsUserJoin` subquery
- the `nonAnonymousAnalyticsUserFilter`
- `analyticsOverview:topRegion`
- `analyticsOverview:online`

Semantics preserved (`coalesce(CAST(data.is_anonymous,
'Nullable(UInt8)'), 0)` matches `JSONExtract(…, 'UInt8')` behavior when
the field is missing).

### 3. Fix 3 — server-aggregate the split queries

`loadDailyActiveUsersSplit` and `loadDailyActiveTeamsSplit` used to ship
1.2M+ `(day, user_id)` rows back to Node just so the JS could bucket
them into new / retained / reactivated. Rewrote both as one CTE-style
query that returns 31 rows (one per day in the 30-day window) with the
counts precomputed.

**Minor semantic shift** (documented inline in `route.tsx`): \"new\" is
now based on the user's first-ever `\$token-refresh` event rather than
their Postgres `signedUpAt`. Agrees for users who log in immediately
after sign-up (the common case). Disagrees for the rare edge case of an
account that existed pre-window but never generated a `\$token-refresh`
until now — old code classified as \"reactivated,\" new code classifies
as \"new.\" Judged acceptable; can be revisited.

Postgres round-trips for `ProjectUser.signedUpAt` / `Team.createdAt` are
no longer needed for the split, and the 76 MiB-ish wire ship is gone.

### 4. Benchmark harness
(`apps/backend/scripts/benchmark-internal-metrics.ts`)

Local-only tool. Three modes:
- **MAU equivalence matrix** — 13 edge cases (empty, dedup, anonymous
filter, window boundary, null user_id, non-UUID user_id, case variation,
project isolation, missing/null `is_anonymous`, wrong event_type).
Asserts OLD pipeline and NEW query return the **same set** of users, not
just the same count.
- **MAU perf** — OLD vs NEW plus 6 other candidate variants (inline
regex, UUID keys, sipHash64, HLL sketches), reads `memory_usage` /
`read_rows` / `result_bytes` from `system.query_log` for each, prints a
ranked table.
- **Full-route benchmark** (`BENCH_ROUTE_QUERIES=1`) — runs every
ClickHouse query in `/internal/metrics` in three stages (BEFORE, AFTER,
candidate OPTIMIZED) against the same seed and prints per-query deltas
plus endpoint-level totals.

Seeds under a synthetic `project_id` so real data is never touched;
cleans up on exit via `ALTER TABLE … DELETE`.

## Benchmark results

### MAU query alone

Ran at two scales; set-equality verified (new query identifies the same
individual users, not just the same count).

| seed | MAU | peak memory (old → new) | bytes read | duration |
|---|---|---|---|---|
| 500k events | 89,939 | 158.7 MiB → 46.7 MiB (**3.4×**, −70%) | 175.7
MiB → 63.0 MiB (2.8×) | 483 ms → 76 ms (**6.4×**) |
| 2.5M events | 449,990 | 439.2 MiB → 281.4 MiB (1.56×, −36%) | 865.0
MiB → 310.9 MiB (2.8×) | 783 ms → 126 ms (**6.2×**) |

MAU variant bake-off at 2.5M events (all exact, all set-equal to OLD):

| variant | memory | duration | notes |
|---|---|---|---|
| v0_old (baseline) | 440 MiB | 567 ms | — |
| v1_uniqExact_string | 284 MiB | 110 ms | naive fix |
| v3_uniqExact_toUUID | 244 MiB | 153 ms | UUID keys, slower per-row |
| **v4_uniqExact_sipHash64** | **125 MiB** | **95 ms** | **shipped** |
| v5_uniq (HLL) ~approx | 30 MiB | 86 ms | −0.25% error |
| v6_uniqCombined ~approx | 31 MiB | 67 ms | −0.15% error |

### Full `/internal/metrics` route (2.7M events, 300k users + page-views
+ clicks + teams)

Ranked by BEFORE peak memory:

| query | mem BEFORE | mem AFTER | Δ mem | dur BEFORE | dur AFTER | Δ
dur |
|---|---|---|---|---|---|---|
| analyticsOverview:topReferrers | 588.1 MiB | 411.1 MiB | 1.43× | 1833
ms | 110 ms | **16.66×** |
| analyticsOverview:totalVisitors | 584.3 MiB | 403.5 MiB | 1.45× | 1829
ms | 121 ms | 15.12× |
| analyticsOverview:dailyEvents | 584.1 MiB | 403.7 MiB | 1.45× | 1897
ms | 140 ms | 13.55× |
| loadUsersByCountry | 393.1 MiB | 385.4 MiB | ≈same | 74 ms | 80 ms |
≈same |
| loadDailyActiveUsersSplit | 363.4 MiB | 396.8 MiB | *+9%* | 1966 ms |
356 ms | 5.52× |
| analyticsOverview:topRegion | 269.9 MiB | 106.4 MiB | 2.54× | 1602 ms
| 65 ms | 24.65× |
| loadDailyActiveUsers | 268.3 MiB | 84.0 MiB | 3.19× | 1111 ms | 44 ms
| 25.25× |
| loadDailyActiveTeamsSplit | 59.6 MiB | 78.1 MiB | *+31%* | 70 ms | 123
ms | *+76%* |
| loadMonthlyActiveUsers | 54.9 MiB | 54.9 MiB | ≈same | 68 ms | 56 ms |
≈same |
| analyticsOverview:online | 18.4 MiB | 5.8 MiB | 3.17× | 58 ms | 4 ms |
14.50× |

**Endpoint-level totals**

| metric | BEFORE | AFTER | Δ |
|---|---|---|---|
| Sum peak ClickHouse memory | 3.11 GiB | 2.28 GiB | **−27%** |
| **Max query duration** (endpoint wall-clock floor) | **1966 ms** |
**356 ms** | **−82%** (5.5×) |
| Sum query duration (total CPU) | 10508 ms | 1099 ms | **−90%** (9.6×)
|
| Bytes read | 10.70 GiB | 4.55 GiB | −57% |
| Bytes shipped to Node | 94.8 MiB | 44.2 KiB | **−99.95%** |

Both split queries show a small memory *regression* at this seed size
(the new server-side window-function + self-join has its own state cost
that's near break-even with \"materialize + ship\" at 300k users); at
prod scale the 76 MiB-ship saving dominates. Duration is unambiguously
better.

## Why we don't need to drop the `analyticsUserJoin` in this PR

The benchmark includes an OPTIMIZED stage that drops the LEFT JOIN and
trusts `e.data.is_anonymous` directly, which would shave another **1.2
GiB / 1.9× duration** off the endpoint. **But we can't ship that here**
— an audit of the client tracker
(`packages/js/src/lib/stack-app/apps/implementations/event-tracker.ts`)
confirmed `is_anonymous` is never set on client-emitted `$page-view` /
`$click` events. The JOIN is currently load-bearing. A follow-up PR will
enrich `is_anonymous` at the batch ingest endpoint using
`auth.user.is_anonymous`; after one metrics-window cycle (~30 days) the
JOIN can be dropped.

## Follow-up work (out of scope for this PR)

- **Batch-endpoint enrichment** + drop the analytics-overview LEFT JOIN
(est. further −53% endpoint memory, −46% duration per the benchmark).
- **Teams-split hash-variant count mismatch** — `sipHash64(team_id)`
variant of the teams split shows a count discrepancy vs. the
string-keyed version in the benchmark. Not blocking since teams-split is
only #8 by memory; needs a root-cause pass before shipping that
particular optimization.
- **`loadUsersByCountry` window bound** — currently scans every
`$token-refresh` event ever for the tenancy (no time filter). Bounding
to 30 days would bound memory growth with project age, but changes
semantics (\"country of latest login ever\" → \"in last 30 days\").
Deferred because it's product-facing.

## Snapshot changes in `internal-metrics.test.ts.snap`

The `should return metrics data with users` test signs in 10 users
today, then deletes one of them mid-test. Two small snapshot values
change on today's date; both are just a reclassification of that single
deleted user — the total (10 active users) is unchanged.

- **`daily_active_users_split.new[today]`: 9 → 10**
All 10 users really did sign in for the first time today. The old code
only counted 9 because the deleted user's Postgres row was gone by the
time the metrics query ran, so the old classifier couldn't see they were
created today. The new query looks at ClickHouse events directly, sees
the deleted user's first event was today, and counts them as new like
everyone else.

- **`daily_active_users_split.reactivated[today]`: 1 → 0**
No user was "reactivated" today — nobody was active on an earlier day
and came back. The old "1" was the deleted user falling into this bucket
by default (the old classifier had no other rule that fit them). The new
code correctly reports zero.

Totals match either way (9 + 1 = 10 + 0). We're moving one deleted user
out of the "returning visitor" bucket and into the "brand-new user"
bucket, which is what they actually were.

## Test plan

- [x] `pnpm typecheck` and `pnpm lint` pass on the backend package
- [x] MAU equivalence matrix: 13/13 cases return the same set of users
(not just the same count) between OLD and NEW pipelines
- [x] Set-equality verified at 500k-MAU perf scale
- [x] Full-route benchmark confirms the expected memory / duration
improvements
- [ ] Sanity-check the dashboard rendering after deploy (split charts,
MAU counter, analytics overview)
- [ ] Monitor Sentry for the assertion error — should drop to zero

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Performance Improvements**
* Monthly and daily active metrics are now computed entirely server-side
for faster queries and reduced client-side processing.

* **Bug Fixes**
* More consistent handling of anonymous/missing IDs and stricter ID
filtering to improve accuracy across edge cases.

* **Tests**
* Added a comprehensive benchmark and validation harness to measure
query performance and verify result equivalence across variants.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-19 22:57:46 -07:00
Konstantin Wohlwend
f85b4f3997 Make Bulldozer SQL statements deterministic 2026-04-18 16:43:26 -07:00
Aman Ganapathy
665870a144
[Fix] Bulldozer Studio and SpaceTime DB port conflict (#1346) 2026-04-17 17:56:11 -07:00
Aman Ganapathy
1de8a17183
Payments bulldozer txn rework (#1315)
### Object of this PR
This PR is NOT a monolithic series of fixes for the payments suite + a
complete rework. Its aims were
a) introducing and robustly testing the bulldozer db system 
b) reworking the payments underlying architecture to use bulldozer for
correctness and scalability
c) Achieving parity with the old payments system excepting a few changes
like ensuring correctness of the ledger algo
There may still be some work to do with handling refunds, decoupling the
concepts of purchases from that of products, and some other things.

### Ledger Algorithm
This has been tuned and fixed. Item removals i.e negative item quantity
changes will apply to the soonest expiring item grant i.e positive item
quantity change. This is what is best for the user. Item grants can also
expire, and when they expire we obviate whatever is left of their
original capacity (meaning after all the removals that were applied to
it). Our ledger algo is applied via Bulldozer, so automatic
re-computation is handled when a new grant/ removal is inserted in the
middle of the existing ones.

### Things we got rid of 
* No more automatic support for default products. You can use $0 plan
provisions to accomplish the same effect but it's manual
* Negative item quantity changes (i.e item removals) no longer can have
expiries



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Enhanced payment processing pipeline with improved data consistency
and state management.
  * Advanced refund handling with comprehensive transaction tracking.
* Better tracking and management of customer item quantities and owned
products.
* Improved subscription lifecycle management including period-end
handling.

* **Bug Fixes**
  * Fixed payment data integrity verification.
  * Improved handling of edge cases in refund scenarios.

* **Chores**
  * Updated cSpell configuration with additional words.
  * Expanded developer documentation for linting workflows.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Konstantin Wohlwend <n2d4xc@gmail.com>
Co-authored-by: Aadesh Kheria <kheriaaadesh@gmail.com>
Co-authored-by: Mantra <87142457+mantrakp04@users.noreply.github.com>
2026-04-17 22:11:21 +00:00
BilalG1
9e342da0f2
Fix cron jobs using dev env instead of test env in CI workflows (#1319)
The custom-base-port and db-migration-backwards-compatibility workflows
were running cron jobs with `with-env:dev` instead of `with-env:test`,
causing ClickHouse sync mismatches in verify-data-integrity.

<!--

Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md

-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
* Streamlined CI test workflows to standardize background cron job
startup for more consistent test runs.
* **Tests**
* Improved end-to-end test reliability by aligning background process
behavior across suites.
* **Bug Fixes**
* Enhanced data verification reliability by ensuring external database
sync before integrity checks and tightening comparison ordering for
certain records, reducing false mismatch detections.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-09 21:27:18 -07:00
BilalG1
4f99c469fe
stack auth preview mode (#1307)
<!--

Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md

-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Preview mode: sandboxed experience with mock projects, placeholder
data, and disabled external integrations (payments, webhooks, email
rendering, session replays).
* One-click preview project creation and automatic preview sign-in for
quick access.

* **New Features — Walkthrough**
* Interactive guided walkthroughs with spotlight, animated cursor,
step-driven navigation, and targeted element hooks.

* **Style**
* UI/UX adjustments for preview: theme behavior, conditional
banners/alerts, informational alerts, and walkthrough attributes added
across pages.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-08 16:57:42 -07:00
Madison
63296fd3e0 chore(backend): align OpenAPI output with Mintlify and mirror specs to docs-mintlify.
- Normalize empty route path to / for valid OpenAPI path keys
- Drop invalid OAS3 top-level type on header parameters
- Write client/server/admin/webhooks JSON to docs-mintlify/openapi on codegen
2026-04-08 17:12:27 -05:00
BilalG1
8857dbaa48
clickhouse new syncs and verify-data (#1304)
<!--

Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md

-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* External DB sync now covers teams, team members, permissions,
invitations, email outbox, session replays, refresh tokens, and
connected accounts.
* New sequence ID fields and automatic change-flagging added to many
record types to enable incremental sync.

* **Improvements**
* Added concurrent indexes, faster/parallelized sync pipelines,
verification tooling, and richer observability.
* Dashboard sequencer stats expanded and end-to-end sync tests
significantly extended.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-08 14:43:22 -07:00
Mantra
d22593d535
private files n sm build shit (#1276)
- Introduced a fallback mechanism for the private sign-up risk engine,
allowing for zero-score assessments when the primary engine is
unavailable.
- Updated Next.js configuration to support dynamic resolution of the
private risk engine, including aliasing for both Turbopack and Webpack.
- Added a new fallback implementation in
`private-sign-up-risk-engine-fallback.ts` to ensure consistent behavior
during builds.
- Adjusted `risk-scores.tsx` to utilize the new compiled engine,
improving error handling and logging for risk assessment failures.

This update improves the robustness of the sign-up risk scoring system
and enhances the development experience by streamlining engine
resolution.

<!--

Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md

-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Improvements**
* Sign-up risk engine is initialized and validated at startup for more
predictable performance.
* If the risk engine is unavailable or invalid, the system immediately
returns safe zero-risk scores to avoid runtime failures.
* **Tests**
* End-to-end tests updated to match the new engine initialization and
detection behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Konstantin Wohlwend <n2d4xc@gmail.com>
2026-03-23 12:31:36 -07:00
Aman Ganapathy
1d00ed2c64
[Fix]: Investigate Memory Leak on Verify Data Integrity (#1269)
### Context
We encountered an out of memory error when running verify-data-integrity
against the prod database. This was the error:
`FATAL ERROR: Ineffective mark-compacts near heap limit Allocation
failed - JavaScript heap out of memory`. This was one of the things
preventing verify-data-integrity from running successfully in prod.

### Summary of Changes
Local stress testing with constrained heap and memory telemetry revealed
that the rise in used heap memory was directly proportional to the
number of api calls. Investigation revealed that the `currentOutputData`
array was growing with each api call and was kept in memory. Since it
was still being appended to, it was actively kept in the heap. We
refactor the script to no longer use it, and for the two flags
`--save-output` and `--verify-output` that used it before, we refactor
them to not need to. `--save-output` now streams responses to disk as
JSONL and `--verify-output` now compares each response immediately and
discards it.
We also note a potential source of a future memory leak in the
`allUsers` array that is populated in memory for each project. We
refactor to paginate instead. Note that this didn't cause a memory leak
on local, this is a preventive measure.

### Out of Scope
fetching all transactions in the payments section of the script is
another potential cause for concern, but since the payments section of
the script will be refactored soon, we defer that discussion.
2026-03-23 08:55:10 -07:00
Konstantin Wohlwend
10a03a31ad Fix Docker build 2026-03-09 10:49:42 -07:00
Konstantin Wohlwend
00fd0eb4c8 Revert Docker build fix 2026-03-09 10:06:14 -07:00
Konstantin Wohlwend
48ac83e858 Fix Docker script 2026-03-08 14:34:55 -07:00
Konstantin Wohlwend
973e190875 Don't bundle @prisma/client
Some checks failed
all-good: Did all the other checks pass? / all-good (push) Has been cancelled
Ensure Prisma migrations are in sync with the schema / check_prisma_migrations (22.x) (push) Has been cancelled
DB migration compat / Check if migrations changed (push) Has been cancelled
Docker Server Build and Push / Docker Build and Push Server (push) Has been cancelled
Docker Server Build and Run / docker (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (mock, 22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (prod, 22.x) (push) Has been cancelled
Runs E2E API Tests with custom port prefix / build (22.x) (push) Has been cancelled
Lint & build / lint_and_build (latest) (push) Has been cancelled
Dev Environment Test With Custom Base Port / restart-dev-and-test-with-custom-base-port (push) Has been cancelled
Dev Environment Test / restart-dev-and-test (push) Has been cancelled
Run setup tests with custom base port / setup-tests-with-custom-base-port (push) Has been cancelled
Run setup tests / setup-tests (push) Has been cancelled
TOC Generator / TOC Generator (push) Has been cancelled
DB migration compat / Back-compat — Current branch migrations with ${{ needs.check-migrations-changed.outputs.base_branch }} branch code (push) Has been cancelled
DB migration compat / Forward-compat — Current branch code with ${{ needs.check-migrations-changed.outputs.base_branch }} branch migrations (push) Has been cancelled
DB migration compat / No migration changes (skipped) (push) Has been cancelled
2026-03-02 18:01:21 -08:00
Konstantin Wohlwend
ba51f19d6f Fix lint 2026-02-27 09:59:26 -08:00
Konstantin Wohlwend
37dea79fda Another build issue 2026-02-27 02:04:02 -08:00
Konstantin Wohlwend
74a4f5a601 More build stuff 2026-02-27 01:55:43 -08:00
Konstantin Wohlwend
48f0e998d5 More fix build? 2026-02-27 01:47:01 -08:00
Konstantin Wohlwend
48a8f0b072 Fix build 2026-02-27 00:48:07 -08:00
Konstantin Wohlwend
e0ea6834d0 Upgrade TypeScript 2026-02-27 00:28:35 -08:00
Konstantin Wohlwend
d63db64e19 Migrate from tsup to tsdown 2026-02-26 17:42:09 -08:00
BilalG1
145bcb7e92
Analytics event tracking (#1208)
<!--

Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md

-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Browser-side event tracker with batching, navigation & click capture
and background/keepalive delivery
* Server endpoint to accept batched analytics events and associate them
with session replay segments
* Client APIs to send analytics batches and integrate with session
replay

* **Bug Fixes / UX**
* Pausing replay now uses the UI-facing playback time for more accurate
pause positions
* Replay endpoint now returns a clear analytics-disabled error
(ANALYTICS_NOT_ENABLED) when analytics is off

* **Tests**
* End-to-end tests covering batch ingestion, validation, and replay
timing behavior
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-02-17 18:33:01 -08:00
BilalG1
fa27c80319
rename tabId to sessionReplaySegmentId (#1206)
<!--

Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md

-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added new session replay analytics columns to ClickHouse for enhanced
tracking and reporting

* **Refactor**
* Renamed session recording segment identifier across APIs and data
models from `tab_id` to `session_replay_segment_id`
* Updated internal data structures and type definitions to align with
new naming convention

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-02-17 11:00:07 -08:00
Konsti Wohlwend
d319285403
Queries view (#1145) 2026-02-16 11:39:21 -08:00
BilalG1
907a98320a
Clickhouse sync fixing (#1198)
<!--

Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md

-->
2026-02-16 11:30:38 -08:00
BilalG1
5b149bebaa
fix clickhouse flaky tests (#1196)
<!--

Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md

-->
2026-02-13 13:05:35 -08:00
BilalG1
d09a180dfe
clickhouse user sync (#1159)
Some checks failed
all-good: Did all the other checks pass? / all-good (push) Has been cancelled
Ensure Prisma migrations are in sync with the schema / check_prisma_migrations (22.x) (push) Has been cancelled
DB migrations are backwards-compatible / Check if migrations changed (push) Has been cancelled
Docker Server Build and Push / Docker Build and Push Server (push) Has been cancelled
Docker Server Build and Run / docker (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (mock, 22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (prod, 22.x) (push) Has been cancelled
Runs E2E API Tests with custom port prefix / build (22.x) (push) Has been cancelled
Lint & build / lint_and_build (latest) (push) Has been cancelled
Dev Environment Test With Custom Base Port / restart-dev-and-test-with-custom-base-port (push) Has been cancelled
Dev Environment Test / restart-dev-and-test (push) Has been cancelled
Run setup tests with custom base port / setup-tests-with-custom-base-port (push) Has been cancelled
Run setup tests / setup-tests (push) Has been cancelled
TOC Generator / TOC Generator (push) Has been cancelled
DB migrations are backwards-compatible / Test migrations with ${{ needs.check-migrations-changed.outputs.base_branch }} branch code (push) Has been cancelled
DB migrations are backwards-compatible / No migration changes (skipped) (push) Has been cancelled
<!--

Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md

-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Real-time AI search with project-scoped analytics and dynamic query
execution; streaming AI responses replace the placeholder flow.
* External DB sync adds ClickHouse support: users sync, sync metadata
tracking, tenancy-aware status, and per-mapping throttling.
* AI assistant UI shows expandable tool-invocation results and streams
via the real AI pipeline.

* **Chores**
* Dashboard dependencies and workspace exclusions updated; development
OpenAI env var added; editor config flag toggled.

* **Tests**
* E2E coverage extended to validate ClickHouse user sync and analytics
queries.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: aadesh18 <110230993+aadesh18@users.noreply.github.com>
Co-authored-by: Konsti Wohlwend <n2d4xc@gmail.com>
2026-02-12 16:52:20 -08:00
aadesh18
2055d98dea
External db sync (#1036)
<img width="1920" height="969" alt="Screenshot 2026-02-04 at 9 47 16 AM"
src="https://github.com/user-attachments/assets/d7d0cd04-0051-4fc4-b857-e6f87ee97a59"
/>

**This PR revolves around the following components**
1. Sequencer - sequences the updates in the internal db
2. Poller - polls for the latest updates to sync with the external db
3. Outgoing Request Handler - essentially a trigger that can make http
requests based on a change in the internal db
4. Sync Engine - syncs with the latest changes from the internal db to
the external db

**What has been done**
- Added a global sequence id for ProjectUser, ContactChannel and
DeletedRow.
- Added the deletedRow table to keep track of the rows that were deleted
across ProjectUser and ContactChannel.
- Added the OutgoingRequest table to keep track of the outgoing requests
- Added function for the sequencer to call to sequence updates
- Added a sequencer that sequences all the changes in the internal db
every 50 ms
- Added a poller that polls for the latest changes in the internal db
every 50 ms, and adds to a queue
- Added a Vercel cron that calls sequencer and poller every minute
- Added a queue that fulfills the outgoing requests by making http calls
(for external db sync, it calls the sync engine endpoint)
- Added a sync engine that uses the defined sql mapping query in the
user's schema to pull in the changes for the user, and sync them with
the external db
- Added tests to test out each functionality


**How to review this PR:**
1. Review the migrations (sequence id, deletedRow, triggers, backlog
sync) (all files created under the migrations folder)
2. Review sequencer
3. Review poller
4. Review the changes in schema
5. Review sync-engine (the function, and it's helper file)
6. Review the schema changes, and query mappings
7. Review the tests (basic, advanced and race, along with the helper
file)
8. Review the changes made in Dockerfile to support local testing using
the postgres docker

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Introduces a cron-driven external DB sync pipeline with global
sequencing, internal poller and webhook sync engine, new DB
tables/functions, config schema/mappings, and comprehensive e2e tests.
> 
> - **Database (Prisma/Migrations)**:
> - Add global sequence (`global_seq_id`) and
`sequenceId`/`shouldUpdateSequenceId` to `ProjectUser`,
`ContactChannel`, `DeletedRow` with partial indexes.
> - Create `DeletedRow` (capture deletes) and `OutgoingRequest` (queue)
tables; add unique/indexes.
> - Add triggers/functions: `log_deleted_row`,
`reset_sequence_id_on_update`, `backfill_null_sequence_ids`,
`enqueue_tenant_sync`.
> - **Backend/API**:
> - New internal routes: `GET
/api/latest/internal/external-db-sync/sequencer`, `GET /poller`, `POST
/sync-engine` (Upstash-verified) for sync orchestration.
> - Add cron wiring: `vercel.json` schedules and local
`scripts/run-cron-jobs.ts`; start in dev via `dev` script.
> - Tweak route handler (remove noisy logging) without behavior change.
> - **Sync Engine**:
> - Implement `src/lib/external-db-sync.ts` to read tenant mappings and
upsert to external Postgres (schema bootstrap, param checks,
sequencing).
> - Add default mappings `DEFAULT_DB_SYNC_MAPPINGS` and config schema
`dbSync.externalDatabases` in shared config.
> - **Testing/Infra**:
> - Add extensive e2e tests (basics, advanced, race conditions) for
sequencing, idempotency, deletes, pagination, multi-mapping, and
permissions.
> - Docker compose: add `external-db-test` Postgres for tests; e2e deps
for `pg` types.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
3f2a8efcfb. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* External PostgreSQL sync: automatic, batched replication with
mappings, resume/idempotency, and on-demand enqueueing.

* **Admin UI**
* Real-time External DB Sync dashboard and status API showing
per-mapping backlog, sequencer/poller/sync-engine telemetry, and fusebox
controls.

* **Tests**
* Large e2e suite: basic, advanced, race, high-volume tests and test
utilities for external DB sync.

* **Chores**
* DB migrations, CI/workflow updates, background cron runner and
local/dev test support.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Konsti Wohlwend <n2d4xc@gmail.com>
Co-authored-by: Bilal Godil <bg2002@gmail.com>
2026-02-05 12:04:31 -08:00
Konstantin Wohlwend
097c0310c4 Check all users when verifying data integrity 2026-02-03 10:00:30 -08:00
Konstantin Wohlwend
4c22b37fdf --no-bail for verify-data-integrity script 2026-01-28 13:53:28 -08:00
Konstantin Wohlwend
8fd5b13a3b TokenRefreshEventType 2026-01-28 11:18:15 -08:00
BilalG1
484c3a6332
clickhouse setup (#1032) 2026-01-28 09:12:33 -08:00
BilalG1
e439bd0b7e
verify payment transactions integrity (#1128)
<!--

Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md

-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added a comprehensive payments data-integrity verifier, Stripe payout
reconciliation, API validation helpers, and a throttled progress utility
for long-running checks.

* **Bug Fixes**
* Improved subscription/product filtering to correctly respect customer
type during verification.

* **Chores**
* Reorganized verification scripts and updated the verification
entrypoint invocation.

* **Tests**
* Enhanced test fixtures to include full product data for subscriptions.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-27 21:17:43 +00:00
Konstantin Wohlwend
e574f526fa Import fixes 2026-01-23 11:52:54 -08:00
Konstantin Wohlwend
0aeb120aa8 Make DB migration script interactive 2026-01-23 11:52:25 -08:00
Konsti Wohlwend
8f74949a7f
Speed up tests (#1063) 2025-12-28 11:25:04 -08:00
Konsti Wohlwend
b4ae80874e
Upgrade Prisma to v7 (#1064) 2025-12-26 08:13:34 -08:00
Konstantin Wohlwend
7bd91dcf93 fixes? 2025-12-12 17:29:57 -08:00
Konsti Wohlwend
e7e792d462
Email outbox backend (#1030) 2025-12-12 10:26:38 -08:00
BilalG1
b5b311554b
Metrics Endpoint Speed (#966)
<img width="567" height="249" alt="Screenshot 2025-10-20 at 11 23 10 AM"
src="https://github.com/user-attachments/assets/340df844-f619-489f-8d41-cc26bc165018"
/>
<img width="595" height="255" alt="Screenshot 2025-10-20 at 11 24 00 AM"
src="https://github.com/user-attachments/assets/9321bda1-e6f0-4f53-8c6b-e29d0fc16038"
/>

<!--

Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md

-->

<!-- RECURSEML_SUMMARY:START -->
## High-level PR Summary
This PR optimizes the performance of user list and metrics endpoints by
refactoring SQL queries to use more efficient patterns. The changes
include rewriting queries to use `LATERAL` joins and CTEs with proper
filtering, extracting common user mapping logic into reusable functions,
and adding performance tests with SQL scripts to generate realistic test
data (10,000 mock users and activity events across 100 countries).

⏱️ Estimated Review Time: 30-90 minutes

<details>
<summary>💡 Review Order Suggestion</summary>

| Order | File Path |
|-------|-----------|
| 1 | `apps/e2e/tests/backend/performance/mock-users.sql` |
| 2 | `apps/e2e/tests/backend/performance/mock-metric-events.sql` |
| 3 | `apps/e2e/tests/backend/performance/users-list.test.ts` |
| 4 | `apps/backend/src/app/api/latest/users/crud.tsx` |
| 5 | `apps/backend/src/app/api/latest/internal/metrics/route.tsx` |
</details>



[![Need help? Join our
Discord](https://img.shields.io/badge/Need%20help%3F%20Join%20our%20Discord-5865F2?style=plastic&logo=discord&logoColor=white)](https://discord.gg/n3SsVDAW6U)


[![Analyze latest
changes](f22b2c44a1/?repo_owner=stack-auth&repo_name=stack-auth&pr_number=966)
<!-- RECURSEML_SUMMARY:END -->
<!-- ELLIPSIS_HIDDEN -->


----

> [!IMPORTANT]
> Optimize metrics and user list endpoints with SQL refactoring,
caching, and performance tests, adding a `CacheEntry` model and mock
data scripts.
> 
>   - **Performance Optimization**:
> - Refactor SQL queries in `route.tsx` to use `LATERAL` joins and CTEs
for efficient data retrieval.
> - Implement caching in `route.tsx` using `getOrSetCacheValue()` to
reduce database load.
>   - **Database Changes**:
> - Add `CacheEntry` model to `schema.prisma` and create corresponding
table and index in `migration.sql`.
> - Remove auto-migration metadata step from
`check-prisma-migrations.yaml`.
>   - **Testing**:
> - Add performance tests in `metrics.test.ts` to benchmark metrics and
user endpoints.
> - Create mock data scripts `mock-users.sql` and
`mock-metric-events.sql` for testing with 10,000 users and events across
100 countries.
>   - **Miscellaneous**:
> - Update `db-migrations.ts` to include new migration file generation
logic.
>     - Add `cache.tsx` for caching logic implementation.
> 
> <sup>This description was created by </sup>[<img alt="Ellipsis"
src="https://img.shields.io/badge/Ellipsis-blue?color=175173">](https://www.ellipsis.dev?ref=stack-auth%2Fstack-auth&utm_source=github&utm_medium=referral)<sup>
for 4d9be71063. You can
[customize](https://app.ellipsis.dev/stack-auth/settings/summaries) this
summary. It will automatically update as commits are pushed.</sup>

----


<!-- ELLIPSIS_HIDDEN -->

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Metrics now use a cache layer with per-entry TTL and tenancy-aware
loaders.

* **Bug Fixes**
* Improved accuracy of daily active and related metrics with
tenancy-aware counting and more robust last-active computation.

* **Performance**
* Faster metrics responses via batched reads and cache-backed endpoints.

* **Tests**
* Added end-to-end performance benchmarks and SQL seed scripts for
metrics/user load testing.

* **Chores**
* DB migration added support for cached entries; CI migration check flow
adjusted; migration tooling improved.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Konsti Wohlwend <n2d4xc@gmail.com>
2025-11-05 16:24:04 -08:00
Konstantin Wohlwend
cd02113441 several changes 2025-10-14 12:23:22 -07:00
Konstantin Wohlwend
1ed9c6150f Use custom migration script for self-hosting container 2025-10-14 11:29:41 -07:00
Konsti Wohlwend
8a77e07f19
Rename offer to product, offer group to product catalog (#914)
Some checks failed
all-good: Did all the other checks pass? / all-good (push) Has been cancelled
Ensure Prisma migrations are in sync with the schema / check_prisma_migrations (22.x) (push) Has been cancelled
Docker Emulator Test / docker (push) Has been cancelled
Docker Server Build and Push / Docker Build and Push Server (push) Has been cancelled
Docker Server Test / docker (push) Has been cancelled
Runs E2E API Tests / build (22.x) (push) Has been cancelled
Runs E2E API Tests with external source of truth / build (22.x) (push) Has been cancelled
Lint & build / lint_and_build (latest) (push) Has been cancelled
Dev Environment Test / restart-dev-and-test (push) Has been cancelled
Run setup tests / setup-tests (push) Has been cancelled
TOC Generator / TOC Generator (push) Has been cancelled
<!--

Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md

-->

<!-- RECURSEML_SUMMARY:START -->
## High-level PR Summary
This PR implements a comprehensive renaming of "offer" to "product" and
"offer group" to "product catalog" throughout the codebase. The changes
include database migrations, schema updates, API compatibility layers,
function renames, and updates to client and server implementations.
Backwards compatibility is maintained through migration layers that
handle requests using the old terminology, translating them to the new
terminology before processing. The PR includes documentation of this
approach in CLAUDE-KNOWLEDGE.md. This rename affects multiple parts of
the system including the database schema, API endpoints, error types,
and SDK interfaces.

⏱️ Estimated Review Time: 1-3 hours

<details>
<summary>💡 Review Order Suggestion</summary>

| Order | File Path |
|-------|-----------|
| 1 |
`apps/backend/prisma/migrations/20250923191615_rename_offers_to_products/migration.sql`
|
| 2 |
`apps/backend/src/app/api/migrations/v2beta1/payments/purchases/offers-compat.ts`
|
| 3 |
`apps/backend/src/app/api/migrations/v2beta1/payments/purchases/create-purchase-url/route.ts`
|
| 4 |
`apps/backend/src/app/api/migrations/v2beta1/payments/purchases/validate-code/route.ts`
|
| 5 | `apps/backend/src/lib/payments.tsx` |
| 6 | `.claude/CLAUDE-KNOWLEDGE.md` |
| 7 | `packages/stack-shared/src/schema-fields.ts` |
| 8 | `packages/stack-shared/src/known-errors.tsx` |
| 9 | `packages/stack-shared/src/config/schema.ts` |
| 10 | `packages/template/src/lib/stack-app/customers/index.ts` |
| 11 |
`packages/template/src/lib/stack-app/apps/implementations/client-app-impl.ts`
|
| 12 |
`packages/template/src/lib/stack-app/apps/implementations/server-app-impl.ts`
|
</details>



[![Need help? Join our
Discord](https://img.shields.io/badge/Need%20help%3F%20Join%20our%20Discord-5865F2?style=plastic&logo=discord&logoColor=white)](https://discord.gg/n3SsVDAW6U)

<!-- RECURSEML_SUMMARY:END -->
<!-- ELLIPSIS_HIDDEN -->


----

> [!IMPORTANT]
> Renames 'offer' to 'product' and 'offer group' to 'product catalog'
across the codebase, updating database schema, API endpoints, and
application logic for consistency and backward compatibility.
> 
>   - **Database**:
> - Rename columns `offer` to `product` and `offerId` to `productId` in
`OneTimePurchase` and `Subscription` tables in `migration.sql`.
>   - **API & Migrations**:
> - Update API endpoints to accept `product_id`/`product_inline` instead
of `offer_id`/`offer_inline`.
> - Add `v2beta5` compatibility layer to map legacy `offer` fields to
`product` equivalents.
>   - **Shared Schemas**:
> - Rename `offerSchema` to `productSchema` and related schemas in
`schema-fields.ts`.
>   - **Server Implementation**:
> - Update `createCheckoutUrl` method in `server-app-impl.ts` to use
`productId`/`InlineProduct`.
>   - **Tests**:
> - Update tests to reflect renaming in `backend-helpers.ts` and other
test files.
>   - **Miscellaneous**:
>     - Remove dummy data related to offers in `dummy-data.tsx`.
> - Update documentation and comments to reflect terminology changes.
> 
> <sup>This description was created by </sup>[<img alt="Ellipsis"
src="https://img.shields.io/badge/Ellipsis-blue?color=175173">](https://www.ellipsis.dev?ref=stack-auth%2Fstack-auth&utm_source=github&utm_medium=referral)<sup>
for e3227bcbd2. You can
[customize](https://app.ellipsis.dev/stack-auth/settings/summaries) this
summary. It will automatically update as commits are pushed.</sup>

----


<!-- ELLIPSIS_HIDDEN -->

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Backwards-compatibility: legacy offer_id/offer_inline requests are
accepted, normalized, and routed to product-based handlers.

* **Refactor**
* Global rename from Offer/Group → Product/Catalog across UI, APIs,
types, client/server interfaces, and error codes.

* **Bug Fixes**
* Responses, webhooks and UI consistently surface product_display_name
and product-related metadata.

* **Documentation**
* Migration notes and docs updated to explain compatibility and
parameter changes.

* **Tests**
  * Unit and E2E suites updated to cover product/catalog flows.

* **Chores**
  * Database schema migration, seed and config updates applied.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Renames offers→products and groups→catalogs end-to-end (DB, APIs,
schemas, UI, SDK, docs), adding v2beta5 compatibility to accept legacy
offer fields while updating all internals.
> 
> - **Backend/DB**:
> - Prisma migration: rename `offer`/`offerId`→`product`/`productId` in
`OneTimePurchase` and `Subscription`.
> - Update Stripe webhook, purchase-session, and internal test-mode
flows to use `product*` metadata/fields.
> - **API & Migrations**:
>   - Latest endpoints now accept `product_id`/`product_inline`.
> - Add `v2beta5` compat layer mapping legacy `offer_id`/`offer_inline`
to product equivalents; responses alias conflicting products.
> - **Shared Schemas/Errors/Config**:
> - `offerSchema`→`productSchema`,
`inlineOfferSchema`→`inlineProductSchema`, prices/types renamed.
>   - KnownErrors renamed (e.g., `PRODUCT_DOES_NOT_EXIST`).
> - Config: `groups`→`catalogs`, defaults/migrations updated; improved
override validation messages; ID regex loosened; formatter tweaks; add
schema fuzzer tests.
> - **Payments Lib**:
> - Rename APIs and logic (`offers`→`products`, `groupId`→`catalogId`),
subscription and item-quantity computation updated.
> - **Dashboard/UI**:
> - Routes, dialogs, editors, tables, and code samples switched to
products/catalogs; removed offers dummy data.
> - **SDK/Template**:
> - Client/server `createCheckoutUrl` now uses
`productId`/`InlineProduct`.
> - **Tests/Docs/Utilities**:
>   - E2E and unit tests updated; add legacy (pre-rename) tests.
> - Docs and knowledge base revised; minor script tweaks (recent-first,
limits).
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
e6e20ecd72. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: BilalG1 <bg2002@gmail.com>
2025-10-04 02:28:28 -07:00