mirror of
https://github.com/stack-auth/stack.git
synced 2026-06-04 21:04:37 +08:00
a9623d976a
1207 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
a9623d976a
|
[Refactor] [Fix] Remove default prod creation (#1350)
With the new bulldozer rework we dont support default products anymore. Users are encouraged to currently manually handle granting products to their end users. We block api requests and new product creations that attempt to set no price, and we remove any options to set include-by-default. We also migrate users' existing product snapshots in `Subscriptions`, `OneTimePurchases`, and `ProductVersions` to have no price set if it's an include-by-default product. This will make it so that next time a user goes onto their products page, they will be informed that the pricing is invalid and it is no longer delivered by default. Note, however, that these products will still be providing items and the like to the users who have them. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Migrated legacy product snapshots so missing included-items no longer break readers. * Removed deprecated "include-by-default" pricing sentinel; pricing now requires explicit price entries and write validation rejects the old sentinel. * **Chores** * Simplified dashboard pricing flows: create/edit/save now use explicit prices and surface an alert when a formerly implicit free plan needs an explicit $0 price. * Config overrides and stored data are auto-normalized to explicit price objects. * **Tests** * Updated and added tests covering migration, validation, and switching behavior for explicit prices. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: mantrakp04 <mantrakp@gmail.com> Co-authored-by: Mantra <87142457+mantrakp04@users.noreply.github.com> |
||
|
|
15faf709f3
|
stack-cli: explicit --cloud-project-id / --config-file across exec, config, project (#1422)
## Summary Reworks the `stack` CLI surface so the cloud-vs-local choice is **explicit at every invocation**, removing the global `--project-id` / `STACK_PROJECT_ID` env var and the local-default `exec` behavior introduced earlier in this branch. ### `stack exec` - Removes `--cloud`, `STACK_EXEC_DEFAULT_TARGET`, and the implicit local default. The CLI now requires **exactly one** of: - `--cloud-project-id <id>` — run against the Stack Auth cloud API - `--config-file <path>` — run against the local emulator project mapped to that absolute config-file path - The `--config-file` branch resolves the project id by calling the existing `GET /api/latest/internal/local-emulator/project` endpoint and matching `absolute_file_path` client-side. No new backend endpoint introduced. ### `stack config pull` / `stack config push` - Both now take `--cloud-project-id <id>` per-command instead of the global flag / `STACK_PROJECT_ID` env. - `config pull --config-file` is **optional**: when omitted, the CLI uses `./stack.config.ts` from the current directory. If neither flag nor cwd file is present, it exits with a clear hint to pass `--config-file` or `cd` into a directory containing `stack.config.ts`. ### `stack project list` - Default (no flags) lists both **cloud and local emulator** projects. Each entry carries a `target: "cloud" | "dev"` field (text format: `<id>\t<displayName>\t[<target>]`). - `--cloud` / `--dev` filter to a single source (mutually exclusive — passing both errors). - On the default code path, an unreachable local emulator emits a single stderr warning (`warning: skipping dev projects — local emulator not reachable …`) and the command still succeeds with cloud results. With `--dev` explicit, the unreachable case hard-errors. ### `stack project create` - Now requires `--cloud` to make the cloud-vs-local choice explicit. There is no local alternative today; the flag exists to surface the decision so a future local-project create doesn't silently change behavior. ### Backend - Bumps the `LIMIT` on `GET /api/latest/internal/local-emulator/project` from 20 → 100 so `project list --dev` doesn't silently truncate. ### Refactors (from earlier in this branch, unchanged here) - Local-emulator paths/ports/PCK polling live in `packages/stack-cli/src/lib/emulator-paths.ts`. - Shared local-emulator admin credentials live in `packages/stack-shared/src/local-emulator.ts`. - `resolveAuth` / `resolveLocalEmulatorAuth` take an explicit `projectId: string` (no more `Flags` parameter). - New `packages/stack-cli/src/lib/local-emulator-client.ts` encapsulates the GET-and-match flow used by both `exec --config-file` and `project list --dev`. ## Breaking changes **Scripts that relied on any of the following must be updated:** | Removed | Replacement | | --- | --- | | Global `--project-id <id>` flag | Per-command `--cloud-project-id <id>` | | `STACK_PROJECT_ID` env var | Per-command `--cloud-project-id <id>` | | `stack exec --cloud` | `stack exec --cloud-project-id <id>` | | `STACK_EXEC_DEFAULT_TARGET=cloud\|local` | `--cloud-project-id <id>` or `--config-file <path>` | | `stack exec` defaulting to local emulator | Explicit `--config-file <path>` required | | `stack project create` without a flag | `stack project create --cloud …` required | ## Test plan - [x] `pnpm lint` (stack-cli, backend, e2e) — clean - [x] `pnpm --filter @stackframe/stack-cli typecheck` — clean - [x] `pnpm --filter @stackframe/stack-cli exec vitest run` — **72/72 passing** (new unit tests: `parseExecTarget`, `resolveConfigFilePathForPull`, `resolveProjectListSources`, `formatProjectList`) - [x] `pnpm test run apps/e2e/tests/general/cli.test.ts` — **73 passing, 4 skipped, 0 failing**. New e2e cases cover: - `exec` with neither flag → errors with "Specify a target" - `exec` with both flags → errors with "not both" - `exec --config-file` with missing file / missing PCK / unreachable API - `exec --config-file` happy path against a real local-emulator backend (gated on `NEXT_PUBLIC_STACK_IS_LOCAL_EMULATOR=true`) - `config pull` cwd fallback to `./stack.config.ts` - `config pull` with no `--config-file` and no cwd `stack.config.ts` → errors with `Pass --config-file …` - `project list --cloud --dev` together → errors - `project list` default with unreachable emulator → cloud results + single stderr warning - `project create` without `--cloud` → errors - All previously-`--cloud` exec cases ported to `--cloud-project-id` - [x] Manual smoke: `stack exec --help`, `stack project list --cloud --dev`, `stack project create` all emit the expected friendly errors / help text. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **New Features** * CLI `exec`, `config`, and `project` commands now require explicit targeting via `--cloud-project-id` (cloud) or `--config-file` (local emulator). * `project list` now supports `--cloud` and `--dev` flags to display projects from both sources with target indicators. * Enhanced environment variable validation for emulator service ports with proper fallback handling. * **Bug Fixes** * `project list` now gracefully handles unreachable emulator with warning fallback instead of failure. * **Tests** * Expanded test coverage for project targeting, config file resolution, and emulator connectivity scenarios. <!-- end of auto-generated comment: release notes by coderabbit.ai --> |
||
|
|
024da3cacb
|
[Fix] freestyle-mock honors $PORT, drop server.listen string-patch (#1432)
## Summary The multi-worker freestyle-mock rewrite ([#1430](https://github.com/hexclave/stack-auth/pull/1430)) hardcoded `server.listen(8080)`, which collides with qstash inside the local-emulator container. Supervisord sets `PORT=8180` for freestyle-mock specifically to avoid this clash, but the new source ignores `process.env.PORT`. The local-emulator Dockerfile previously bridged this with a `server.replace('server.listen(8080)', ...)` string-patch on the embedded source. The new code is `server.listen(8080, () => { ... })` — the literal `'server.listen(8080)'` substring no longer matches, so the replace silently no-ops and freestyle-mock binds 8080. qstash then can't start (`address already in use: 127.0.0.1:8080` → FATAL), the backend (which depends on qstash) never comes up, and the emulator smoke test times out. Observed in [this run](https://github.com/hexclave/stack-auth/actions/runs/25832479377): ``` smoke-test: FTL address already in use: 127.0.0.1:8080 smoke-test: WARN exited: qstash (exit status 1; not expected) smoke-test: INFO gave up: qstash entered FATAL state, too many start retries too quickly [603s] SMOKE TEST FAILED: backend /health?db=1 did not return 200 within 300s ``` ## Changes - `docker/dependencies/freestyle-mock/Dockerfile`: `server.listen(PORT)` where `PORT = process.env.PORT || 8080`, plus the startup log reflects the actual port. - `docker/local-emulator/Dockerfile`: drop the now-redundant string-replace for the listen call. The two remaining replaces (`fs/promises` import + node_modules symlink) are unrelated and kept. ## Test plan - [ ] QEMU emulator build workflow passes on this branch (smoke test reaches healthy backend). - [ ] Verify locally that supervisord's `PORT=8180` is honored by freestyle-mock and qstash binds 8080 cleanly. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Server listening port is now configurable via PORT (default 8080). * Local emulator startup adjusted to better handle dependencies and create a node_modules symlink for smoother local runs. * Seed/process transaction timeout increased to 90s for reliability. * Local database statement timeout changed to 0 (no statement timeout). * **CI** * Added step to enable and validate KVM access during emulator builds. <!-- review_stack_entry_start --> [](https://app.coderabbit.ai/change-stack/hexclave/stack-auth/pull/1432) <!-- review_stack_entry_end --> <!-- end of auto-generated comment: release notes by coderabbit.ai --> |
||
|
|
988505249b
|
[Revert] team invitation accept email-match check (#1431)
## Summary Reverts the team-invitation accept email-match check added in #1365 in response to user friction. The check required the signed-in user to own the invited email as a *verified* contact channel before accepting, which rejected legitimate flows where the recipient hadn't verified the invited email on their account. - Drops the pre-claim `validate` hook in `accept/verification-code-handler.tsx` that compared the accepting user's verified channels to the invited email. - Drops the `normalizeEmail(body.email)` in `send-code/route.tsx` (only existed to make the now-removed compare case-insensitive). - Removes the four e2e tests that asserted the check (mismatch, does-not-burn, case-insensitive, happy-path). - Reverts `items.test.ts` invitee sign-up back to bare `Auth.fastSignUp()`. ## What's preserved - **`TeamInvitationEmailMismatch`** in `packages/stack-shared/src/known-errors.tsx` and its plumbing in `client-interface.ts` / `client-app-impl.ts` / `client-app.ts` — intentionally kept so the check can be reinstated in a focused follow-up without re-plumbing the SDK return types. - **The TOCTOU fix** from the same PR (atomic `updateMany` claim in `route-handlers/verification-code-handler.tsx` and its 5-parallel-redemption test) is unrelated and untouched. ## Test plan - [x] `pnpm lint` — clean (28/28) - [x] `pnpm --filter @stackframe/backend --filter @stackframe/e2e-tests typecheck` — clean - [ ] Pre-existing dashboard typecheck failure on `transaction-table.tsx:347` (`refundEntries`) reproduces on `origin/dev` — not caused by this PR - [ ] e2e team-invitations + items + otp sign-in suites <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Simplified team invitation acceptance process by removing strict email matching requirements, allowing users to accept invitations more flexibly. * **Tests** * Updated team invitation tests to reflect simplified acceptance flow. <!-- review_stack_entry_start --> [](https://app.coderabbit.ai/change-stack/hexclave/stack-auth/pull/1431) <!-- review_stack_entry_end --> <!-- end of auto-generated comment: release notes by coderabbit.ai --> |
||
|
|
2cf0f6f981
|
[Apps] Adding support app alpha and dogfooding (#1368)
<!-- Make sure you've read the CONTRIBUTING.md guidelines: https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md --> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Support app: inbox UI to create, view, reply, and manage conversations (status, priority, assignee, tags, internal notes). * Dashboard pages: Conversations and Support Settings; feedback can create managed conversations. * Public/internal APIs for listing, creating, updating, and fetching conversation details; client-side helpers. * **SLA** * Configurable first/next response targets, urgency classification, and timing logic. * **Data** * New conversation persistence (conversations, entry points, messages) and migration tests; preserves conversations on user/team deletion and anonymizes sender data. * **Tests** * Unit, migration, and end-to-end tests added. * **Documentation** * Updated docs describing conversation model and workflow rules. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
3385d6e2b0
|
[Fix] recover stale external db requests (#1428)
Some checks failed
all-good: Did all the other checks pass? / all-good (push) Has been cancelled
Ensure Prisma migrations are in sync with the schema / check_prisma_migrations (22.x) (push) Has been cancelled
DB migration compat / Check if migrations changed (push) Has been cancelled
Docker Server Build and Push / Docker Build and Push Server (push) Has been cancelled
Docker Server Build and Run / docker (push) Has been cancelled
Runs E2E API Tests (Local Emulator) / E2E Tests (Local Emulator, Node ${{ matrix.node-version }}) (22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (mock, 22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (prod, 22.x) (push) Has been cancelled
Runs E2E API Tests with custom port prefix / build (22.x) (push) Has been cancelled
Runs E2E Fallback Tests / E2E Fallback Tests (Node ${{ matrix.node-version }}) (22.x) (push) Has been cancelled
Lint & build / lint_and_build (24) (push) Has been cancelled
TOC Generator / TOC Generator (push) Has been cancelled
Mirror main branch to main-mirror-for-wdb / lint_and_build (push) Has been cancelled
Publish npm packages / publish (push) Has been cancelled
Publish Swift SDK to prerelease repo / publish (push) Has been cancelled
Sync Main to Dev / sync-commits (push) Has been cancelled
DB migration compat / Back-compat — Current branch migrations with ${{ needs.check-migrations-changed.outputs.base_branch }} branch code (push) Has been cancelled
DB migration compat / Forward-compat — Current branch code with ${{ needs.check-migrations-changed.outputs.base_branch }} branch migrations (push) Has been cancelled
DB migration compat / No migration changes (skipped) (push) Has been cancelled
Failures between claiming and the deletion of outgoing requests from the handler can leave requests stale and never clean them up. Some of these requests may also have duplicates that are fresh in the outgoing queue. These requests need to be deleted or retried. It's important to still log the stale requests to sentry so the root cause can be investigated. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Improved detection and recovery of stale outgoing requests; telemetry now records precise reset/deleted counts and includes sampled affected IDs. * Added an early fast path to skip unnecessary external calls when there are no pending requests. * **Refactor** * Consolidated stale-request handling into a dedicated helper and optimized recovery logic; poller telemetry now includes claim-limit attributes. [](https://app.coderabbit.ai/change-stack/hexclave/stack-auth/pull/1428) <!-- end of auto-generated comment: release notes by coderabbit.ai --> |
||
|
|
4648fc1899
|
[Feat] new scripts on migrate/seed/init run for internal (#1421)
### Context One script grants free plan to any team which is a customer of the internal project who doesnt have it already. We also want to migrate our users (internal) to the latest version of their products. Needed because some subs on dev right now dont have a plan. And internal isnt using latest version of its own growth plan. ### Describing the Paths we want to Account for 1. Users on production who currently don't have a plan should get free plans, since this script is run with every migrate 2. Users on production should get the latest version of each plan of ours. So a forced migration to latest version of internal project plans 3. No other project's products/product lines should be affected. They will continue to have product versioning 4. 2 should apply to test mode subscriptions as well, on top of stripe subscriptions. All of them should be refreshed 5. Internal project itself should get latest version of its own growth plan 6. If the bulldozer write fails, we should be able to recover on next migration (this should already be handled by init bulldozer script, because it checks if prisma db and bulldozer db are out of sync) 7. if the regenerate or backfill fail, we should be able to recover just by rerunning the script 8. Product version table should not balloon. No table should really balloon ### What I've tested on local 1. Put in 1000 db subscription rows, made them all stale and then ran the regen script. It took about 6 minutes to update all of them, and it was idempotent so rerunning it again did nothing. 2. With proper stripe keys I switched off of test mode on the internal app, granted a product to a new team and updated the product's item list. At this point I checked and the new team had the outdated version of the product. Then I ran the regen script and the new team was moved to latest product version. 3. Tried the above with the internal team's growth plan too and it worked as well. 4. Backfill actually grants free plan ### Deployment strategy in prod Run the backfill and the regen scripts once each after your migrations on the prod db. `pnpm db:backfill-internal-free-plans` will make sure every team has a free plan at least if they dont have an existing plan (and it is idempotent). After that, run `pnpm db:regen-internal-subscriptions-to-latest` which will migrate every user to the latest version of their plan (i.e latest snapshot). This should also be idempotent. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Automated backfill to grant internal free plans to qualifying billing teams. * Regeneration tool to refresh internal subscription snapshots to the latest product versions. * **Chores** * Added CLI commands and package scripts to run backfill and regen jobs. * Database init now runs payment initialization before backfill/regen. * **Tests** * Integration and unit tests added/updated to validate backfill, regeneration, and free-plan idempotency. [](https://app.coderabbit.ai/change-stack/hexclave/stack-auth/pull/1421) <!-- end of auto-generated comment: release notes by coderabbit.ai --> |
||
|
|
d2030e826b | Unhandled promise rejections no longer kill the whole server if not in development | ||
|
|
efa2153d47 | Improve project overview weekly users | ||
|
|
e61c70d3c1 | Update implementation | ||
|
|
e50358710a
|
fix(tests): use sql.json in onboarding migration test and refresh metrics snapshot (#1420)
## Summary
Two small test-maintenance fixes that came up while running the suite:
- **Onboarding migration test**
(`apps/backend/prisma/migrations/20260420000000_add_project_onboarding_state/tests/default-and-updates.ts`):
switch the JSON insert from `\${JSON.stringify(onboardingState)}::jsonb`
to `\${sql.json(onboardingState)}`. This matches the pattern used by
every other migration test in the repo (see
`20260214000000_fix_trusted_domains_config/tests/*`) and lets the
`postgres` driver handle serialization and parameter binding
consistently rather than relying on a manual `::jsonb` cast.
- **Internal metrics snapshot**
(`apps/e2e/tests/backend/endpoints/api/v1/__snapshots__/internal-metrics.test.ts.snap`):
update `active_users_by_country.AQ` to list `mailbox-2` before
`mailbox-1`. The `should return metrics data with users` test signs in
`mailbox-1` (mailboxes[0]) into AQ first, then later signs `mailbox-2`
(mailboxes[1]) into AQ, so sorted by `last_active_at_millis desc`
`mailbox-2` should come first. The snapshot now matches that ordering.
No production code is touched — both changes are limited to test
fixtures.
## Test plan
- [ ] `pnpm -C apps/backend test run` (migration tests)
- [ ] `pnpm -C apps/e2e test run internal-metrics` (snapshot test)
- [ ] `pnpm lint`
- [ ] `pnpm typecheck`
Made with [Cursor](https://cursor.com)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Tests**
* No user-facing behavior changed; test flows made more robust and less
flaky (migration validation, metrics ingestion polling, CLI expiry
checks, failed-emails digest expectations).
* **API / Documentation**
* CLI auth default expiration reduced from 2 hours to 2 minutes (updated
OpenAPI defaults and related test expectations).
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
80a26ca15d | chore: update package versions | ||
|
|
227dac6567
|
feat(dashboard): add weekly users metrics for projects (#1412)
- Introduced a new API endpoint to fetch weekly and daily user metrics for managed projects. - Updated the dashboard to utilize this new endpoint, replacing the previous daily active users data. - Created a new component to visualize weekly users metrics in the project cards. - Refactored existing components to accommodate the new data structure and ensure proper rendering of user activity charts. This change enhances the analytics capabilities of the dashboard, providing better insights into user engagement over time. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * New internal endpoint providing per-project weekly user totals and 7-day daily activity series. * **Updates** * Dashboard and project cards switched from DAU to weekly user metrics; main metric shows weekly users and label reads "users/wk". * Charts now display weekly-user-aware sparklines alongside daily activity. * **Tests** * Added unit tests covering weekly aggregation and daily-series merging. <!-- end of auto-generated comment: release notes by coderabbit.ai --> |
||
|
|
261d8923d4
|
stack-cli: support self-hosted URLs and tighten CLI auth polling (#1419)
## Summary - **Self-hosted CLI**: read `STACK_API_URL` / `STACK_DASHBOARD_URL` from env in `stack-cli` so the published CLI can talk to self-hosted Stack Auth installs without a custom build. The existing `STACK_CLI_PUBLISHABLE_CLIENT_KEY` override is kept as-is. - **Docker example**: surface the three CLI-relevant vars in `docker/server/.env.example` so self-host operators see them. - **Tighter polling-code TTL**: default `2h -> 2min`, max `24h -> 15min` for the CLI auth polling code. The code is only valid while a user is actively waiting in `stack login`, so a tight window limits the blast radius of a leaked code. - **Raw-SQL poll handler**: convert `apps/backend/src/app/api/latest/auth/cli/poll/route.tsx` from `prisma.cliAuthAttempt.*` to raw SQL targeted at the tenancy source-of-truth schema, matching the pattern already used by the initiate handler in `apps/backend/src/app/api/latest/auth/cli/route.tsx`. ## Test plan - [ ] `pnpm typecheck` - [ ] `pnpm lint` - [ ] `pnpm test run` (focus on CLI-auth tests if any) - [ ] Manual: `stack login` against a local backend - polling code now expires after ~2 minutes by default - `waiting` / `success` / `used` / `expired` branches still return correct status codes and bodies - [ ] Manual: published `stack-cli` against a self-hosted backend with `STACK_API_URL` / `STACK_DASHBOARD_URL` set, end-to-end login Made with [Cursor](https://cursor.com) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Improvements** * More robust CLI authentication polling with atomic database updates to prevent races; returns explicit statuses (waiting/expired/used/success) and provides the refresh token on success. * **Changes** * Default CLI auth token TTL reduced to 2 minutes and capped at 15 minutes. * Anonymous refresh token is considered present only when not null; null expiry is treated as not-expired. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
acc646cb0b
|
stack-cli: cloud/local init flow, auto-create on empty projects, post-setup next-steps (#1383)
### Summary Reworks `stack init` UX, adds Sentry error reporting to the CLI, polishes the emulator start flow, and overhauls the local-emulator dashboard's "Open config file" dialog. #### `stack init` flow - **New top-level flow.** Drops the old "link existing vs. create new local" fork. `init` now asks *where* to create the project — "Stack Auth Cloud" or "Local". Adds a new `create-cloud` mode that logs the user in, creates a cloud project, mints keys, and writes `.env` — no round-trip through the dashboard. - **Conditional emulator-install warning.** The "Local" choice label only shows "(requires local emulator installation, ~1.3gb storage required)" when the QEMU image isn't already on disk; otherwise it shows "(emulator already installed)". Driven by a new `isEmulatorImageInstalled()` helper in `commands/emulator.ts`. - **Auto-create on zero-projects.** When the link-from-cloud path hits an empty project list, the CLI now prompts *"You don't have any Stack Auth projects yet. Would you like to create one?"* and, on yes, runs the same flow as `stack project create`. Skips the pointless "select a project" prompt when we just created one. - **MCP-server notice.** Before invoking the coding agent, the CLI announces that it's also registering the Stack Auth MCP server (`mcp.stack-auth.com`) so the agent can answer Stack-specific questions going forward. - **Local-emulator env header.** When `writeProjectKeysToEnv` runs in `local` mode it writes a 3-line comment header above the keys explaining they're emulator-only and only valid while the emulator is running. - **"What's next" footer.** After setup finishes, prints a short orientation block: where the sign-up/sign-in routes live (`/handler/sign-up`, `/handler/sign-in`), how to start the local emulator (for `create` mode), a dashboard deep link for cloud projects (respects `STACK_DASHBOARD_URL`), and a docs link. #### Sentry error reporting (`lib/sentry.ts`, `index.ts`, `tsdown.config.ts`) - New `lib/sentry.ts` initializes `@sentry/node` with PII scrubbing (Stack key prefixes, JWTs, home-dir paths, sensitive field names like `token`/`secret`/`password`/`dsn`). - DSN is baked at build time via a tsdown `define` sentinel (`__STACK_CLI_SENTRY_DSN__`) — no DSN in source, no runtime env-var dependency for installed users. CI sets `STACK_CLI_SENTRY_DSN_BUILD` before `pnpm build`. - Disabled when `NODE_ENV=development` or `CI`. No user opt-out. - Wired into `main()`'s catch (only for unexpected errors — `CliError`/`AuthError` still print and exit cleanly) plus `uncaughtException` and `unhandledRejection` handlers via a `handleFatal` helper. #### `stack emulator start` welcome - After a fresh start (not when reusing a running VM, not when `--config-file` keeps stdout JSON-only), prints a short "Emulator is up" block with service URLs (dashboard / backend / inbucket) and common commands (`status`, `stop`, `reset`, `run`). #### Local-emulator dashboard "Open config file" dialog The dialog at `http://localhost:26700` (when no project is loaded) used to be a single text input asking for an absolute path, with no explanation of where that path comes from. **Backend** (`apps/backend/src/app/api/latest/internal/local-emulator/project/route.tsx`): - POST is now tolerant of directory paths or paths that don't end in `.ts`/`.js`/`.mjs` — it appends `stack.config.ts` and creates the file if missing (`writeConfigToFile` mkdir's parents). Lets users paste a project folder instead of hunting for the config file. - New GET endpoint returns up to 20 most-recent `LocalEmulatorProject` rows joined with their display names, sorted by `updatedAt` desc. Same `isLocalEmulatorEnabled()` + client-auth gating as POST. **Dashboard** (`apps/dashboard/src/app/(main)/(protected)/(outside-dashboard)/projects/page-client.tsx`): - Title changed to "Open your Stack Auth project". Description now explicitly ties the file to `stack init`: *"Point the local dashboard at the `stack.config.ts` in your project. If you just ran `stack init`, it was created at the root of that project."* - Added: *"Don't have one yet? Paste your project folder path instead and we'll create stack.config.ts for you."* - Recent-projects list (clickable rows that prefill the input) fetched from the new GET endpoint when the dialog opens. - OS-specific copy-path tip below the input (macOS ⌥-Copy as Pathname, Windows Shift+RC Copy as path, Linux `realpath`). - "Open project" button is disabled when the input is empty. - All error paths (empty input, non-absolute path, server errors, exceptions) surface via destructive toasts instead of throwing. Why no native file picker: browsers do not expose absolute filesystem paths from `<input type="file">`, drag-and-drop, or the File System Access API. The backend requires an absolute path, so a Finder-style picker isn't possible from a web page. The recent list + OS tips are the workaround. ### Goal The previous `init` flow dead-ended new users: if you had no project you got an error telling you to go create one in the dashboard and come back. The happy path also forced a choice between "link existing" and "create local emulator" — not the question most users are trying to answer. The emulator dashboard's open-project dialog had similar friction: an unexplained path field with no recall of previously-opened projects. And the CLI silently swallowed unexpected errors with no telemetry. This branch makes the first-run path work end-to-end from the terminal, gives the emulator dashboard a usable open-project surface, and turns CLI crashes into actionable bug reports. ### How to review - Start with `packages/stack-cli/src/commands/init.ts` — the whole user-facing flow lives in `runInit`. Mode dispatch at the top, `handleCreateCloud` is the new cloud branch, `printNextSteps` is the footer, the MCP notice prints right before `runClaudeAgent`. - `packages/stack-cli/src/lib/sentry.ts` is small and self-contained; the sentinel-replacement contract is in `tsdown.config.ts`'s `define` block. Confirm `dist/index.js` contains zero `__STACK_CLI_SENTRY_DSN__` occurrences after a build with the env var unset, and the actual DSN host after a build with it set. - `packages/stack-cli/src/commands/emulator.ts` — `printEmulatorWelcome()` is the welcome block; `isEmulatorImageInstalled()` is the new exported helper used by `init.ts`. - `apps/backend/src/app/api/latest/internal/local-emulator/project/route.tsx` — the directory-tolerance branch is in the POST handler around the `looksLikeConfigFile` check; the GET handler is appended at the bottom. - `apps/dashboard/src/app/(main)/(protected)/(outside-dashboard)/projects/page-client.tsx` — dialog markup, recent-list fetch effect, `pathCopyTip` memo, and the toast-based error handling in `handleOpenConfigFile`. - Non-interactive (CI) paths stay strict: empty-project list still errors with a pointer to `stack project create --display-name`. No surprise project creation in CI. - No tests. The CLI has no harness for the interactive flow; verification is manual. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Recent local emulator projects listed in the config dialog for quick selection. * New CLI create-cloud mode and --display-name flag; interactive cloud project creation and clearer next steps. * Emulator start shows a welcome banner with service URLs when a new instance starts. * **Improvements** * Config dialog UX, validation, error-toasting, and platform-aware copy refined; “Open project” disabled for empty/invalid paths. * CLI: centralized interactive project creation and improved fatal error handling. * **Chores** * Sentry added and initialized for CLI error reporting. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Bilal Godil <bg2002@gmail.com> |
||
|
|
647883c7ac
|
Move MCP server into a standalone apps/mcp app (#1405)
## Summary Splits the Stack Auth MCP server out of `apps/backend` and into a dedicated Next.js app at `apps/mcp/`, served on port `:42` (suffixed via `NEXT_PUBLIC_STACK_PORT_PREFIX`) and exposed in production at `https://mcp.stack-auth.com/mcp`. The backend no longer carries the MCP transport route; clients now point at the new host. Base: `dev` → Head: `chore/move-mcp-to-a-sep-app` Scope: 34 files, +1425 / −353 ## What changed - **New app** `apps/mcp/` — standalone Next.js + `@vercel/mcp-adapter`, with: - `src/app/api/internal/[transport]/route.ts` — MCP transport handler (moved from backend) - `src/app/mcp/route.ts`, `src/app/route.ts` — public landing + setup page - `src/app/health/route.ts` — health check - `src/mcp-handler.ts`, `src/setup-page.ts`, `src/analytics.ts` - **Backend** drops `apps/backend/src/app/api/internal/[transport]/route.ts` (−105) — MCP code is gone from the backend image. - **Dashboard** install hint updated to point at `https://mcp.stack-auth.com/mcp` (was `/`). - **Dev launchpad** gets an MCP tile so the new service shows up alongside the rest of the local stack. - **CI** workflows (`db-migration-backwards-compatibility`, `e2e-api-tests*`) start the MCP service in the background before running tests. - **Docs** (`docs-mintlify`, `docs/`) and `init-stack` / `init-prompt` updated to reference the new URL. - **E2E** `apps/e2e/tests/backend/endpoints/api/v1/internal/mcp.test.ts` reworked to hit the new host; `helpers.ts` and env files gain an MCP base-URL var. ## Visuals ### New `apps/mcp` setup page (`https://mcp.stack-auth.com/`) The standalone app's root now serves a self-contained MCP setup guide with per-client instructions (Cursor, VS Code, Codex, Claude Code, Claude Desktop, Windsurf, ChatGPT, Gemini CLI):  ### Dev launchpad now lists the MCP service New tile at port suffix `:42`, importance 2, alongside Backend / Dashboard / Demo app:  ## Notes for reviewers - The MCP transport endpoint moved path: it was mounted under `/api/internal/[transport]` in the backend; in the new app it's at the same path but on the dedicated host. The public-facing URL is `https://mcp.stack-auth.com/mcp`. - `apps/mcp` ships its own PostHog analytics client (`src/analytics.ts`) so the backend doesn't have to proxy events for it anymore. - Port allocation: `${PORT_PREFIX}42` (default `8142` in dev). Picked to fit the existing dev-launchpad importance-2 row. - No DB migrations. ## Test plan - [x] `apps/mcp` builds and `pnpm dev` serves on `:8142` - [x] Dev launchpad renders the new MCP tile (screenshot above) - [x] MCP setup page renders client tabs (screenshot above) - [x] E2E `mcp.test.ts` updated to hit the new host - [ ] CI green on `e2e-api-tests*` and `db-migration-backwards-compatibility` workflows (they were touched to start the MCP service) - [ ] `init-stack` / `mcp.ts` install flow lands users on the new URL <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Standalone MCP app added with a public /mcp endpoint and health check. * MCP appears in the dev-launchpad apps list. * **Documentation** * MCP endpoint updated to https://mcp.stack-auth.com/mcp in all setup guides and installer snippets. * Setup page enhanced with detailed client install tabs and instructions. * **Chores** * MCP service integrated into CI/e2e workflows and local env configs. <!-- end of auto-generated comment: release notes by coderabbit.ai --> |
||
|
|
7acbd8d56d | Improved StackAssertionError error logging | ||
|
|
d69773c9df | Retry OAuth refreshes | ||
|
|
7f35ae7d54 |
Fix migration tests
Some checks failed
all-good: Did all the other checks pass? / all-good (push) Has been cancelled
Ensure Prisma migrations are in sync with the schema / check_prisma_migrations (22.x) (push) Has been cancelled
Docker Server Build and Push / Docker Build and Push Server (push) Has been cancelled
Docker Server Build and Run / docker (push) Has been cancelled
Runs E2E API Tests (Local Emulator) / E2E Tests (Local Emulator, Node ${{ matrix.node-version }}) (22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (mock, 22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (prod, 22.x) (push) Has been cancelled
Runs E2E API Tests with custom port prefix / build (22.x) (push) Has been cancelled
Runs E2E Fallback Tests / E2E Fallback Tests (Node ${{ matrix.node-version }}) (22.x) (push) Has been cancelled
Lint & build / lint_and_build (24) (push) Has been cancelled
Mirror main branch to main-mirror-for-wdb / lint_and_build (push) Has been cancelled
Publish npm packages / publish (push) Has been cancelled
Publish Swift SDK to prerelease repo / publish (push) Has been cancelled
Sync Main to Dev / sync-commits (push) Has been cancelled
TOC Generator / TOC Generator (push) Has been cancelled
|
||
|
|
5ccd8dfd38 | Update GitHub URL | ||
|
|
bc6347e3c3
|
[Feat]: set flag to disable billing (#1417)
### Context There are some kinks to work out with deploying plan limits onto prod, so we'd like to disable it temporarily. ### Summary of Changes We update all call sites of the item quantity things with a flag based check. Idea is when flag is set to true, it should function as if there are no limits. |
||
|
|
765b0f4e29
|
New setup (#1413) | ||
|
|
440c18c894 | chore: update package versions | ||
|
|
2e41fde9c2 | _useSession now refreshes tokens more aggressively | ||
|
|
8901a93b55
|
Rename STACK_SEED_INTERNAL_PROJECT_SECRET_SERVER_KEY to STACK_INTERNAL_PROJECT_SECRET_SERVER_KEY (#1415)
## Summary - Renames the env var `STACK_SEED_INTERNAL_PROJECT_SECRET_SERVER_KEY` to `STACK_INTERNAL_PROJECT_SECRET_SERVER_KEY` everywhere it is used (20 occurrences across 8 files), covering backend env files, the Prisma seed script, runtime config, and the docker entrypoint/local-emulator scripts. - Mirrors the prior publishable-client-key rename in #1411. ## Test plan - [x] `pnpm lint` - [x] `pnpm typecheck` - [ ] Verify local emulator still boots with the renamed variable - [ ] Verify any deploy/CI configs that set the old name are updated alongside this change <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Updated internal environment variable naming for API key management and server configuration consistency across backend systems, Docker deployment, and local development setup. <!-- end of auto-generated comment: release notes by coderabbit.ai --> |
||
|
|
24245ae54e
|
Rename STACK_SEED_INTERNAL_PROJECT_PUBLISHABLE_CLIENT_KEY to STACK_INTERNAL_PROJECT_PUBLISHABLE_CLIENT_KEY (#1411)
## Summary - Renames the env var `STACK_SEED_INTERNAL_PROJECT_PUBLISHABLE_CLIENT_KEY` to `STACK_INTERNAL_PROJECT_PUBLISHABLE_CLIENT_KEY` everywhere it is used (24 occurrences across 9 files), covering backend env files, the Prisma seed script, runtime config, and the docker entrypoint/local-emulator scripts. ## Test plan - [x] `pnpm lint` - [x] `pnpm typecheck` - [ ] Verify local emulator still boots with the renamed variable - [ ] Verify any deploy/CI configs that set the old name are updated alongside this change <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Updated internal environment variable naming for consistency across backend configuration files and deployment scripts. <!-- end of auto-generated comment: release notes by coderabbit.ai --> |
||
|
|
b0812c8808
|
feat(analytics): gzip event batch body to bypass adblockers (#1407)
## Summary
The `POST /api/latest/analytics/events/batch` endpoint was being dropped
by content-blocking browser extensions (adblockers) because the JSON
request body literally contains the substring `$click`. Many filter
lists pattern-match on tokens like that and silently kill the request —
analytics events from anyone with an adblocker enabled never reached our
backend.
This PR encodes the request body so keyword-matching filters can't see
those tokens, while keeping the URL path unchanged (only the body was
being matched here) and keeping older SDK clients working.
## Approach
- **Client**: gzip the JSON payload via the browser-native
`CompressionStream("gzip")` API and POST it as
`application/octet-stream`. Falls back to plain JSON if
`CompressionStream` isn't available (very old browsers / non-browser
runtimes).
- **Server**: a yup `.transform()` on the body schema detects an
`ArrayBuffer`/`Uint8Array` input, gunzips it, and `JSON.parse`s before
normal schema validation runs. The existing JSON path is untouched, so
requests from older SDK versions in the wild continue to work without
changes — and all existing schema-error snapshot tests still pass
verbatim.
- **Safety**: hard caps on compressed (1 MB) and decompressed (8 MB)
sizes guard against zip-bomb shaped abuse. `node:zlib`'s
`maxOutputLength` enforces the latter at the C++ layer.
Bonus: gzip also gives a meaningful bandwidth win — click/page-view
events compress very well — and keepalive bodies (which have a 64 KB cap
in browsers) get more headroom.
## Files
- `apps/backend/src/app/api/latest/analytics/events/batch/route.tsx` —
body schema gains `.transform()` that gunzips binary inputs; size limits
added; everything else unchanged.
- `packages/stack-shared/src/interface/client-interface.ts` —
`sendAnalyticsEventBatch` now routes through a new module-level
`encodeAnalyticsBody` helper that gzips and switches Content-Type. Same
outer signature; encoding is internal.
- `apps/e2e/tests/backend/backend-helpers.ts` — `niceBackendFetch` gains
optional `rawBody`/`rawContentType` params so tests can send non-JSON
payloads. Existing JSON callers unaffected.
-
`apps/e2e/tests/backend/endpoints/api/v1/analytics-events-batch.test.ts`
— adds two tests:
- happy path: gzipped binary body returns `inserted: 1`
- sad path: garbage bytes return 400
## Out of scope (intentional)
- **URL path renaming**: not all adblockers match on `/analytics/`, but
some do. We're shipping the body fix first and will revisit if requests
still get blocked after deployment.
- **Encryption**: gzip is enough to defeat keyword filters. Encryption
adds key-management cost with no real adversary.
- **SDK regen**: only `client-interface.ts` (in `stack-shared`) was
touched; `event-tracker.ts` (the caller) is unchanged because it already
passes a JSON string. No `pnpm -w run generate-sdks` needed.
## Test plan
- [x] `pnpm typecheck` — green
- [x] `pnpm lint` — green
- [ ] Manually verify in dev: enable adblocker, click around with
analytics enabled, confirm batch requests now go through
- [ ] Spot-check ClickHouse `analytics_internal.events` shows the
expected rows
- [ ] Run the new e2e tests (`pnpm test run
apps/e2e/tests/backend/endpoints/api/v1/analytics-events-batch.test.ts`)
and confirm both new cases plus all preexisting snapshots pass
- [ ] Confirm the JSON back-compat path still works by hitting the route
with the existing JSON-body curl/test payloads
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Analytics batch uploads now accept gzipped binary payloads; clients
can send compressed bytes and the server will detect and decompress.
* Client sender can gzip event batches (falls back to JSON) and uses
keepalive to choose JSON vs compressed bytes.
* **Bug Fixes**
* Malformed, non-gzip, or overly-large compressed payloads now return a
clear 400 response.
* **Tests**
* Added E2E and unit tests plus test-helper support for raw/gzipped
request bodies and encoding behaviors.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
|
||
|
|
185bddec9e
|
[Dashboard] Redefine the user page with tabs and updated UI (#1351)
<!-- Make sure you've read the CONTRIBUTING.md guidelines: https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md --> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Tabbed user profile with Activity (30-day analytics, KPIs, daily chart, top lists, recent events), Payments (transactions, subscriptions, product/item balances) and an activity heatmap sidebar. * New internal user-activity API and admin-facing activity hook; admin API client can fetch per-user activity. * **UI/UX Improvements** * Unified menus, cards and tables; inline editable user details with accept/revert; metadata editor validates JSON; country-code input has draft editing; tabs support optional icons. * **API** * Transactions endpoint and admin transaction queries now support optional customer-scoped filtering. * **Tests** * End-to-end coverage for the user-activity endpoint. <!-- end of auto-generated comment: release notes by coderabbit.ai --> <img width="1326" height="752" alt="image" src="https://github.com/user-attachments/assets/97c04dca-db59-4357-98b1-8eae5a7a3673" /> <img width="1142" height="251" alt="image" src="https://github.com/user-attachments/assets/e1aa44fc-0d7e-436d-90a5-c7cb15155e24" /> <img width="1170" height="1125" alt="image" src="https://github.com/user-attachments/assets/bf6659fd-a9b5-4ae6-a13d-dab9956ad650" /> |
||
|
|
3a5153f4db
|
feat(payments): collect 0.9% platform fee on every stripe money movement (#1378)
## Summary Charges the platform 0.9% on both legs of each transaction on non-internal projects. - **Charge leg** — rides along via Stripe's native \`application_fee_amount\` / \`application_fee_percent\` params on the PaymentIntent / Subscription. - **Refund leg** — Stripe's default reverses our charge-leg fee on refund, netting us zero. We disable that with \`refund_application_fee: false\` ## Refs - https://docs.stripe.com/api/subscriptions/create#create_subscription-application_fee_percent - https://docs.stripe.com/api/payment_intents/object#payment_intent_object-application_fee_amount --------- Co-authored-by: nams1570 <amanganapathy@gmail.com> |
||
|
|
c01c052ac9
|
[Refactor][Feat] Implement Plan Limits for Hard-and-Soft Item Caps (#1215)
### Suggested Review Areas Please see `plans.ts` and `seed.ts` to verify whether the item caps are where they should be. Outside of that, each commit should be atomic so stepping through the commits should give you an idea of how I implemented each limit. ### Discussion Something to discuss: when a user cancels team/growth we regrant free fine, but any extra-seats they had just keeps billing. So they end up paying ~$29/mo per extra-seat on top of free's 1 seat, which is strictly worse than just staying on team. This surfaced while manually testing this PR, we only enforce the add-on base requirement at purchase time, nothing cascades on cancel. Should we cascade cancel add ons? ### Context Now that we have a stable suite of products for stack-auth, we want to limit the items under each product a customer has access to based on their plan. So for example, a free plan user has a certain amount of emails they can send out each month, and so on. We try to implement limits in this PR. ### Summary of Changes Implemented hard limits for dashboard admins, analytics per-query timeouts, sent email monthly capacity, events, and session replays. Implemented a soft cap for auth users (where if there's a signup beyond the limit, we log it to sentry so we can manually choose to email that user/team). For auth users, we do not block new user sign ups once plan limit has been hit. We also don't degrade or impact the customer experience. It logs to sentry and it is up to us to take manual action to email the user to upgrade the plan. Also, implementation wise, we count all the users across all the projects for this team and compare it to their plan item limit, rather than debiting items like we do for other approaches. As a soft cap, this should be fine plus this is a better source of truth. For email capacity, we operate a monthly limit of emails. Once this is hit, no more emails can be sent until the next month/ a plan upgrade. These emails will be treated as a send error, so they can be manually resent once the capacity is reset. With respect to the `email-queue` state engine, they go from `SENDING`->`SERVER_ERROR`, hooking into the existing state engine flow, with an external error that shows it's because of the rate limit. This is cleaner than inventing a new state that is identical for all intents and purposes to `SERVER_ERROR`. We check in processSingleEmail since that maps to the sending state. For analytics query timeouts, the backend route accepts a timeout parameter with the request. The way we implement the timeout for each query is by taking the `min(request_timeout,plan_timeout)` and using that. This determines how long a query can run for. For analytics events, there are server-side events (like refresh token refreshes or sign up rule triggers) and client side events (like page views or clicks). When these events occur, they are written to the events table in clickhouse. We choose to implement a hard cap for the total events, not just server side or client side. Once the cap is hit, we stop storing the events and display a banner on the analytics page. A different banner renders when we are at >=80% of total plan capacity. For session replays, we stop creating new session replays when the limit is hit. Old replays can still have chunks appended to them. The source of truth here is the session replay table- a new replay corresponds to a new row in the table. We have similar banners as to the events. Dashboard admins should be 4 for both team and unlimited. #### Implementation Caveats For debiting items across these limits, we now use `tryDecreaseQuantity` at the beginning. This means we debit first if possible before conducting the action (like writing events to clickhouse). In practice, this means that if clickhouse fails, then the user is debited for something that doesn't happen. However trying to build a refund workaround would be very clunky, and also, clickhouse is reliable. For debits that are very small in the order of things (say, 200 items on a 100k plan), it doesn't mean much. For emails, we don't debit items if it's a retry. This prevents the user for being charged multiple times for effectively one email. ### UI Changes The only UI changes in this PR are having certain banners render in analytics when a customer is approaching/ is at their monthly limit of session replays or events. ### Out of Scope for this PR We do not have metered pricing yet, so events/session replays/ email use beyond the limits cannot be charged yet. This is why for this implementation, we rely on hard and soft caps. We do not implement payment per-transaction pricing yet. That is deferred to a followup PR. The UI for the onboarding call will be set up as part of the overall onboarding flow which doesn't exist yet, so it has been deferred. Since the UI for the dashboard home page and project/account settings is currently being reworked, finding a better spot for plan upgrades is not handled in this PR. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Session replays added as a monthly included entitlement; onboarding calls added to Team/Growth plans. Dashboard banners warn about analytics-event and session-replay limits. Projects page adds extra-seat flow and improved invitation error handling. * **Behavior Changes** * Monthly renewal semantics for emails-per-month and analytics-events; analytics query timeouts now respect plan limits and are clamped. Email sends, analytics events, and new session creation are blocked when quotas are exhausted. Growth plan seats set to 4. * **Tests** * E2E and unit tests added to verify quota enforcement and free-plan regranting. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Mantra <87142457+mantrakp04@users.noreply.github.com> |
||
|
|
c69a27017b
|
fix team invitation email check + verification code TOCTOU (#1365)
## Summary Two authorization fixes in the backend. Both are pre-existing in `dev` and were found during a security audit of `apps/backend/src`. ### 1. Team invitation accept — email not validated [`team-invitations/accept/verification-code-handler.tsx`](https://github.com/stack-auth/stack-auth/blob/dev/apps/backend/src/app/api/latest/team-invitations/accept/verification-code-handler.tsx) destructured the invited email as `{}` and only used `data.team_id` + the accepting `user`. Any signed-in user in the tenancy who possessed the 45-char code could join the team as themselves — the invitation was not actually bound to the email it was addressed to. **Attack scenarios that work without this fix** - Forwarded invitation email (shared inbox, assistant inbox, auto-forward rules). - Screenshot of the invitation link pasted into Slack / Notion. - Insider with server-access reading the email outbox (`GET /api/latest/emails/outbox` returns rendered `html` + `variables.teamInvitationLink`). - Stale invite still sitting in spam after the invitee forwarded it elsewhere. **Fix.** The accept handler now requires that the accepting user owns the invited email as a *verified* contact channel on their account. Matches the invariant already used by the "list invitations for me" endpoint ([`team-invitations/crud.tsx:41-66`](https://github.com/stack-auth/stack-auth/blob/dev/apps/backend/src/app/api/latest/team-invitations/crud.tsx#L41-L66)). Rejections return a new `TEAM_INVITATION_EMAIL_MISMATCH` (403) error. ### 2. Verification-code handler TOCTOU [`route-handlers/verification-code-handler.tsx`](https://github.com/stack-auth/stack-auth/blob/dev/apps/backend/src/route-handlers/verification-code-handler.tsx) had a classic read-then-write TOCTOU: ```ts const verificationCode = await prisma.verificationCode.findUnique(...); if (verificationCode.usedAt) throw new KnownErrors.VerificationCodeAlreadyUsed(); // ... validation ... await prisma.verificationCode.update({ data: { usedAt: new Date() } }); // unconditional return await options.handler(...); ``` Five concurrent requests with the same code all pass the `if (usedAt)` gate, all mark the code used, all run the post-handler. For OTP sign-in the handler calls `createAuthTokens` which writes a fresh `projectUserRefreshToken` row per call — so **one OTP → N refresh tokens**. `auth/sessions/current` only revokes by `id: refreshTokenId` and there is no bulk-revoke for passwordless users (only password change in [`users/crud.tsx:1210`](https://github.com/stack-auth/stack-auth/blob/dev/apps/backend/src/app/api/latest/users/crud.tsx#L1210) does `deleteMany`). A phished OTP therefore becomes a session-persistence primitive. **Fix.** Replace the unconditional `update` with a conditional `updateMany({ where: { …, usedAt: null } })` executed before `options.handler`; if `count === 0` the race was already lost and we throw `VERIFICATION_CODE_ALREADY_USED` (409). This also benefits MFA sign-in and passkey sign-in, which share the same handler. ## Changes | File | Change | |---|---| | `team-invitations/accept/verification-code-handler.tsx` | Require verified contact channel matching `method.email` | | `route-handlers/verification-code-handler.tsx` | Atomic `updateMany` claim gated on `usedAt: null` | | `stack-shared/src/known-errors.tsx` | New `TeamInvitationEmailMismatch` (403) | | `e2e/.../team-invitations.test.ts` | Two new tests (mismatch + happy path) | | `e2e/.../auth/otp/sign-in.test.ts` | One new test: 5 parallel redemptions of one OTP → 1× 200 + 4× 409 | ## Test plan - [x] `pnpm test run apps/e2e/tests/backend/endpoints/api/v1/team-invitations.test.ts` — 27/27 pass - [x] `pnpm test run apps/e2e/tests/backend/endpoints/api/v1/auth/otp/sign-in.test.ts` — 12/12 (+ 4 pre-existing `it.todo`) - [x] `pnpm test run apps/e2e/tests/backend/endpoints/api/v1/auth/password` — 33/33 (+ 7 pre-existing todos) - [x] `pnpm test run apps/e2e/tests/backend/endpoints/api/v1/contact-channels` — 24/24 - [x] `pnpm test run apps/e2e/tests/backend/endpoints/api/v1/auth/passkey apps/e2e/tests/backend/endpoints/api/v1/auth/mfa` — 16/16 - [x] `pnpm --filter @stackframe/backend typecheck` — clean - [x] `pnpm --filter @stackframe/backend lint` + `pnpm --filter @stackframe/stack-shared lint` — clean ## Notes - The broader "plaintext credentials in DB + Sentry logs every header" finding from the same audit is **not** in this PR — a scrubber for `Sentry.setContext` request headers + unit tests is prepared on a local stash and will go out as a separate PR. - The team-invitation fix does not require any config change; fresh signups via the OTP / password flows that set `primary_email_verified: true` during creation already land the user with a verified channel matching the invited email, so the happy path is unaffected. ### Follow-up review (Codex) Addressed in follow-up commit `954cddb`: - **Finding 1 (High)**: mismatched invite acceptance was consuming the invitation before rejecting. Moved the email-ownership check into the pre-claim `options.validate` hook so a wrong-email attempt leaves `usedAt` untouched and the real recipient can still redeem. New test asserts this end-to-end. - **Finding 3 (Medium)**: invitation stored `body.email` raw but contact channels are stored via `normalizeEmail`, so case-varied invites (e.g. `Alice@Example.com`) wouldn't match a `alice@example.com` channel. `send-code` now normalizes on storage and `accept` normalizes on compare for back-compat with already-issued invites. New test covers the mixed-case path. - **Finding 2 (partial)**: added `expiresAt > now` to the atomic claim predicate for the boundary case where a code expires between the read and the claim. The reviewer's broader point about the `attemptCount` rate-limit check being non-atomic with its own increment **pre-dates this PR** (it reads the in-memory `verificationCode.attemptCount` from line 150, not a fresh read) and exists independently of the `usedAt` TOCTOU I'm fixing here. Tracking that as a separate follow-up so this PR stays scoped to the two originally-flagged issues. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Invite acceptance now requires the invitee’s verified, normalized (case‑insensitive) email; mismatches return HTTP 403 (TEAM_INVITATION_EMAIL_MISMATCH). * Client APIs now surface the new email-mismatch error alongside verification errors. * **Bug Fixes** * OTP verification codes are now guarded against parallel double‑redeem so only one request succeeds. * **Tests** * Added E2E tests for invitation email validation, non‑consuming rejection, case‑insensitive matching, and OTP concurrency. <!-- end of auto-generated comment: release notes by coderabbit.ai --> |
||
|
|
0ab2654051 | chore: update package versions | ||
|
|
d2f2fb0e42
|
[codex] Fix preview dummy payments customer types (#1398)
## Summary Fixes preview dummy payments seed data so seeded products and items match their team-scoped product lines. ## Root Cause The preview seed configured `workspace` and `add_ons` product lines with `customerType: "team"`, but the products inside those lines (`starter`, `growth`, and `regression-addon`) were configured as `customerType: "user"`. Environment override writes validate against the rendered branch config, so unrelated environment updates could fail with a product/product-line customer type warning. ## Changes - Mark preview dummy payments products and included items as team-scoped. - Export the dummy payments setup helper for focused validation. - Add a regression test that validates the generated branch payments override has no config override errors or incomplete config warnings. ## Validation Passed in the original checkout with dependencies installed: - `STACK_SKIP_TEMPLATE_GENERATION=true pnpm exec vitest run --config vitest.config.ts src/lib/seed-dummy-data.test.ts --reporter=verbose --maxWorkers=1 --minWorkers=1` - `pnpm -C apps/backend lint src/lib/seed-dummy-data.ts src/lib/seed-dummy-data.test.ts` - `pnpm -C apps/backend typecheck` The temporary clean worktree used for this PR did not have `node_modules`, so dependency-backed commands were not rerun there. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Improvements** * Strengthened payment product configuration with tighter typing and validation * Normalized product customer types (switched relevant dummy data from user to team) for consistency * **Tests** * Added tests validating dummy payments configuration and branch/override validation * **Documentation** * Added Q&A documenting a configuration validation failure mode and required consistency for dummy payments data <!-- end of auto-generated comment: release notes by coderabbit.ai --> |
||
|
|
e831972c4c
|
Move internal MCP server to backend, use Mintlify MCP for docs tools (#1389)
## Summary - Move the `/api/internal/[transport]` MCP route from the docs app to the backend, so the public `ask_stack_auth` MCP tool is served from the same origin as the AI query API it proxies to. - Replace the bespoke docs-tools HTTP client in `apps/backend/src/lib/ai/tools/docs.ts` with an `@ai-sdk/mcp` client that talks to Mintlify's generated MCP server. The backend AI agent now consumes Mintlify's lower-level search/fetch tools directly instead of going through the docs app. - Swap `STACK_DOCS_INTERNAL_BASE_URL` for `STACK_MINTLIFY_MCP_URL` (defaults to the Mintlify-hosted MCP URL). - Move the `@vercel/mcp-adapter` dependency from `docs` to `apps/backend`. ## Test plan - [ ] `pnpm typecheck` - [ ] `pnpm lint` - [ ] e2e: new `apps/e2e/tests/backend/endpoints/api/v1/internal/mcp.test.ts` covers `tools/list` and validation on `tools/call` - [ ] Manual: hit `POST /api/internal/mcp` on the backend and confirm `ask_stack_auth` is listed and callable - [ ] Manual: confirm backend AI agent docs tools resolve via the Mintlify MCP URL <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Backend docs tooling now uses a Mintlify MCP server for documentation tools and discovery. * **Chores** * Development environment variables updated to point to the Mintlify MCP endpoint. * Backend dependency added to support MCP integration; docs package dependency removed. * **Tests** * Added end-to-end tests for the internal MCP endpoint and tool validation. <!-- end of auto-generated comment: release notes by coderabbit.ai --> |
||
|
|
5e5cfdec4f
|
[Dashboard][Backend][SDK] - Adds sharable session replay ids. (#1294)
# Shareable Session Replay Links Adds the ability to share individual session replays via unique, direct URLs. https://www.loom.com/share/1e3298a19b114fc38af4bc43dcd5ec48 ## What changed - New admin endpoint — GET /api/v1/internal/session-replays/:id - Fetches a single session replay by ID with user metadata (display name, primary email) and chunk/event counts - Returns 404 if the replay doesn't exist - Admin-only access, consistent with the existing list endpoint ## New standalone replay page — /projects/:projectId/analytics/replays/:replayId - Thin server page wrapper that passes the replay ID to the existing PageClient - PageClient detects standalone mode via initialReplayId prop and fetches replay metadata directly instead of loading the full session list - Sidebar is hidden; the replay viewer takes the full width - "Back to all replays" link shown under the page title ## Copy link button - Moved from per-session sidebar items to the replay viewer header (next to the settings gear) - Copies a direct URL to the currently selected replay ## SDK plumbing - AdminGetSessionReplayResponse type in stack-shared - getSessionReplay() on StackAdminInterface, StackAdminApp interface, and _StackAdminAppImplIncomplete ## Tests - Happy path: fetch single replay by ID with inline snapshot - 404 for nonexistent replay ID - 401 for non-admin access (client and server) ## Test plan - [ ] Open /analytics/replays, select a replay, click the link icon in the header — verify URL is copied to clipboard - [ ] Paste that URL in a new tab — verify the standalone replay page loads and plays the correct replay - [ ] Verify "Back to all replays" link navigates back to the list page - [ ] Verify the original /analytics/replays list page still works as before (selecting, filtering, pagination) - [ ] Run pnpm test run session-replays <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Backend: internal endpoint to fetch a single session replay with user info, millisecond timestamps, and chunk/event counts. * Admin SDK/App: added response type and admin method to retrieve a single session replay; admin app maps response into the app model. * Dashboard: standalone session-replay page, UI adjustments for standalone mode, and a “copy replay link” button. * **Tests** * Added end-to-end tests for retrieval, not-found, and access-control scenarios. <!-- end of auto-generated comment: release notes by coderabbit.ai --> |
||
|
|
65d87a4836
|
Dashboard: DataGrid refactor + layout (stacked on overview-revamp) (#1338)
Some checks failed
all-good: Did all the other checks pass? / all-good (push) Has been cancelled
Ensure Prisma migrations are in sync with the schema / check_prisma_migrations (22.x) (push) Has been cancelled
DB migration compat / Check if migrations changed (push) Has been cancelled
Docker Server Build and Push / Docker Build and Push Server (push) Has been cancelled
Docker Server Build and Run / docker (push) Has been cancelled
Runs E2E API Tests (Local Emulator) / E2E Tests (Local Emulator, Node ${{ matrix.node-version }}) (22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (mock, 22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (prod, 22.x) (push) Has been cancelled
Runs E2E API Tests with custom port prefix / build (22.x) (push) Has been cancelled
Runs E2E Fallback Tests / E2E Fallback Tests (Node ${{ matrix.node-version }}) (22.x) (push) Has been cancelled
Lint & build / lint_and_build (24) (push) Has been cancelled
TOC Generator / TOC Generator (push) Has been cancelled
DB migration compat / Back-compat — Current branch migrations with ${{ needs.check-migrations-changed.outputs.base_branch }} branch code (push) Has been cancelled
DB migration compat / Forward-compat — Current branch code with ${{ needs.check-migrations-changed.outputs.base_branch }} branch migrations (push) Has been cancelled
DB migration compat / No migration changes (skipped) (push) Has been cancelled
## Summary Stacked on `overview-revamp` (now rebased against `dev`). Introduces a first-class `DataGrid` component in `@stackframe/dashboard-ui-components`, migrates every dashboard table off the legacy `DesignDataTable` / hand-rolled `<Table>` pattern to it, and ships a matching dashboard design guide. Since the last writeup the `DataGrid` runtime has been substantially rewritten: the virtualizer now supports `rowHeight="auto"` with `estimatedRowHeight`, every column can opt into `cellOverflow: "wrap"`, the toolbar + header stick under a configurable `stickyTop`, and the seeded dummy data has been fleshed out so the migrated surfaces render with realistic density. The AI-analytics prompt was also extended with full schema docs for the auth / team / email / payments tables so natural-language queries produce better SQL. **Base:** `dev` → **Head:** `ui-fixes-minor` **Scope:** 39 files, ~+6.5k / -2.4k ## Screenshots Captured against the seeded Demo Project on the local dashboard (`admin@example.com` via mock GitHub OAuth). Viewport: **1920×1200** (standard) and **2560×1440** (widescreen). Assets hosted in [this gist](https://gist.github.com/mantrakp04/2fe05ddbb2d2d7cd2d237027c909c1b9). ### Overview — revamped metrics + line chart | Light | Dark | | --- | --- | |  |  | Widescreen: | Light | Dark | | --- | --- | |  |  | ### Users — DataGrid with seeded rows | Light | Dark | | --- | --- | |  |  | Widescreen: | Light | Dark | | --- | --- | |  |  | ### Transactions — new DataGridToolbar + sticky chrome | Light | Dark | | --- | --- | |  |  | Widescreen: | Light | Dark | | --- | --- | |  |  | ### Teams | Light | Dark | | --- | --- | |  |  | Widescreen: | Light | Dark | | --- | --- | |  |  | ### Email Outbox | Light | Dark | | --- | --- | |  |  | Widescreen: | Light | Dark | | --- | --- | |  |  | ### Payments — Customers | Light | Dark | | --- | --- | |  |  | Widescreen: | Light | Dark | | --- | --- | |  |  | ### Sticky behaviour — scrolled views Grids scrolled down ~600px. The page header is still pinned, and the `DataGrid` toolbar + column header row stay put under it (backdrop-blur + `stickyTop` offset) while the virtualized body rows scroll past. Compare the scrolled view against the top-of-page view above. | Page | Light | Dark | | --- | --- | --- | | Users |  |  | | Teams |  |  | | Transactions |  |  | | Payments Customers |  |  | | Email Outbox |  |  | | Analytics Tables |  |  | ### Other migrated surfaces | Page | Light | Dark | | --- | --- | --- | | Analytics Tables |  |  | | Emails |  |  | | Email Sent |  |  | | Domains |  |  | | Webhooks |  |  | | External DB Sync |  |  | ## What's new ### `DataGrid` in `@stackframe/dashboard-ui-components` A new, fully-typed, fully-controlled grid component under `packages/dashboard-ui-components/src/components/data-grid/`. Single source of truth for tabular UI across the dashboard. Package files: - `data-grid.tsx` — main grid renderer (virtualized rows, sticky toolbar + header) - `data-grid-toolbar.tsx` — built-in toolbar (search, columns, density, export) - `data-grid-sizing.ts` — column width / flex / min-width resolution - `state.ts` — state helpers (`createDefaultDataGridState`, sort / select / paginate utilities, `exportToCsv`, date formatters) - `strings.ts` — i18n string table + `resolveDataGridStrings` - `types.ts` — public types (`DataGridColumnDef`, `DataGridProps`, `DataGridState`, `DataGridDataSource`, etc.) - `use-data-source.ts` — `useDataSource` hook with `client` / `server` / `infinite` modes - `index.ts` — package entrypoint Features: - Controlled state (`state` + `onChange`) covering sorting, pagination, column visibility, column widths, column pinning, selection, date-display mode, and quick search. - Column definitions with `string` / `number` / `date` / `dateTime` / `boolean` / `singleSelect` / `custom` types, custom `renderCell`, custom sort comparators, per-column `parseValue` / `dateFormat`, pinning, align, flex / min / max width. - **Cell overflow control** — new `cellOverflow: "truncate" | "wrap"` per column. `"wrap"` + `rowHeight="auto"` lets rows grow to fit multi-line content. - **Dynamic row heights** — `rowHeight` now accepts `"auto"` with an `estimatedRowHeight` hint for the virtualizer, eliminating scroll-position jank while rows are still being measured. - **Sticky chrome with `stickyTop`** — the toolbar and header stick under a caller-provided offset (matching the page header height) with a proper blur backdrop. See the _Sticky behaviour — scrolled views_ section above for the visual. - Client-side sort + quick-search + pagination via `useDataSource` — consumer never pre-sorts / paginates. - Server-side and async-generator data sources for streaming / cursor pagination. - Paginated and infinite-scroll UI modes. - CSV export + clipboard copy. - Row single / multi selection with shift-range anchor. - Row + cell click / double-click callbacks. - Pluggable toolbar / footer / empty / loading states and i18n strings. ### Dashboard design guide New `apps/dashboard/DESIGN-GUIDE.md`: prescriptive, AI-readable source of truth for dashboard UI. Documents when to use each `design-components` primitive, the `DataGrid` canonical pattern, color / typography / spacing / motion rules, route-specific guidance, and the migration priority. Now also documents the new `cellOverflow` and dynamic-`rowHeight` patterns, and marks `DesignDataTable` as deprecated in favor of `DataGrid` + `useDataSource` + `createDefaultDataGridState`. ### Overview page revamp `apps/dashboard/src/app/(main)/(protected)/projects/[projectId]/(overview)/line-chart.tsx` — line chart rewritten on top of the shared `AnalyticsChart` / `DonutChartDisplay` primitives, feeding the revamped Overview. ### Data-table migrations Every shared table under `apps/dashboard/src/components/data-table/` has been rewritten on top of `DataGrid`: - `api-key-table.tsx` - `payment-product-table.tsx` - `permission-table.tsx` - `team-member-search-table.tsx` - `team-member-table.tsx` - `team-search-table.tsx` - `team-table.tsx` - `transaction-table.tsx` — now also wires in `DataGridToolbar` with search / column visibility - `user-search-picker.tsx` - `user-table.tsx` — extracted `USER_TABLE_COLUMNS` for readability / reuse ### Page adoption Page-level tables migrated to `DataGrid` (or the new `useDataSource` + `createDefaultDataGridState` pattern): - `(overview)/line-chart.tsx` - `analytics/tables/query-data-grid.tsx` (now with sticky header) - `domains/page-client.tsx` - `email-drafts/[draftId]/page-client.tsx` - `email-outbox/page-client.tsx` (with `DataGridToolbar`) - `email-sent/page-client.tsx`, `grouped-email-table.tsx`, `sent-emails-view.tsx` - `emails/page-client.tsx` - `external-db-sync/page-client.tsx` - `payments/layout.tsx`, `payments/customers/page-client.tsx`, `payments/products/[productId]/page-client.tsx` - `users/[userId]/page-client.tsx` - `webhooks/page-client.tsx`, `webhooks/[endpointId]/page-client.tsx` - `design-language/page-client.tsx`, `design-language/realistic-demo/page-client.tsx` - `playground/page-client.tsx` ### Backend & supporting changes - `apps/backend/src/lib/ai/prompts.ts` — extends the AI-analytics prompt with detailed schema docs for `contact_channels`, `teams`, `team_member_profiles`, `team_permissions`, `team_invitations`, `email_outboxes`, `project_permissions`, `notification_preferences`, `refresh_tokens`, and `connected_accounts`, so natural-language queries have richer context to compile against. - `apps/backend/src/lib/seed-dummy-data.ts` — additional OAuth providers on seed users, improving dummy-data coverage for the migrated tables (visible on the Users grid). - `apps/dashboard/src/app/globals.css` — adds `--data-grid-sticky-top` token used to derive the grid's sticky offset under the page header. - `packages/template/src/dev-tool/dev-tool-core.ts` — persist the "closed" state when the user closes the dev-tool panel so it doesn't reopen on next load. ## Notes for reviewers - Rebased onto latest `dev`; conflict in `api-key-table.tsx` resolved by keeping the `DataGrid` implementation (consistent with the other migrated tables). - `DesignDataTable` is still in the codebase but marked deprecated in the design guide — new code must use `DataGrid`. - `DataGrid` is fully controlled: consumers must pass state + onChange, must feed `rows` from `useDataSource` (never raw arrays), and must define columns outside the component or via `useMemo`. The guide's §4.12 spells this out. - `rowHeight="auto"` is opt-in; the default fixed-height virtualization path is unchanged and remains the fast path for dense, single-line grids (users, transactions, etc.). - Screenshots are JPEG this round — the local capture tooling's PNG path was producing blank frames, so the new set is `.jpg` end-to-end. Same viewports, same seeded project. ## Test plan - [ ] `pnpm lint` passes - [ ] `pnpm typecheck` passes - [ ] Load the dashboard and verify every migrated surface renders, sorts, searches, paginates, and handles row-click navigation: - [ ] Overview (line chart + donut metrics) - [ ] Users list + user detail (teams, sessions, permissions, API keys) - [ ] Teams list + team detail (members, permissions) - [ ] Domains - [ ] Emails, email-sent, email-outbox, email-drafts - [ ] Webhooks list + endpoint detail - [ ] Payments customers, product detail, transactions (new toolbar) - [ ] External DB sync - [ ] Analytics query table (sticky header) - [ ] Verify infinite-scroll surfaces (domains, etc.) load additional rows on scroll - [ ] Verify sticky header stays below the page header in light and dark themes - [ ] Verify CSV export produces correct output on a representative table - [ ] Verify column resize, visibility toggle, and sort work across themes - [ ] Verify `cellOverflow: "wrap"` rows grow to fit when `rowHeight="auto"` and clip when `rowHeight` is numeric - [ ] Spot-check AI analytics queries against the new schema context (contact_channels, teams, email_outboxes, …) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **New Features** * Unified table components across dashboard with improved infinite pagination and quick search. * **Improvements** * Enhanced table performance with sticky headers and better row height handling. * Improved sorting, filtering, and data loading with consistent state management. * Better visual consistency across all data grids and table layouts. * **UI/Styling** * Refined table styling for better text truncation and content wrapping. * Optimized layout spacing and alignment across dashboard tables. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Developing-Gamer <maxcodes11110@gmail.com> Co-authored-by: Armaan Jain <84474476+Developing-Gamer@users.noreply.github.com> Co-authored-by: Konstantin Wohlwend <n2d4xc@gmail.com> |
||
|
|
2f719903b1
|
Redesign Email Server settings + managed domain flow (#1373)
Some checks failed
all-good: Did all the other checks pass? / all-good (push) Has been cancelled
Ensure Prisma migrations are in sync with the schema / check_prisma_migrations (22.x) (push) Has been cancelled
DB migration compat / Check if migrations changed (push) Has been cancelled
Docker Server Build and Push / Docker Build and Push Server (push) Has been cancelled
Docker Server Build and Run / docker (push) Has been cancelled
Runs E2E API Tests (Local Emulator) / E2E Tests (Local Emulator, Node ${{ matrix.node-version }}) (22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (mock, 22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (prod, 22.x) (push) Has been cancelled
Runs E2E API Tests with custom port prefix / build (22.x) (push) Has been cancelled
Runs E2E Fallback Tests / E2E Fallback Tests (Node ${{ matrix.node-version }}) (22.x) (push) Has been cancelled
Lint & build / lint_and_build (24) (push) Has been cancelled
TOC Generator / TOC Generator (push) Has been cancelled
DB migration compat / Back-compat — Current branch migrations with ${{ needs.check-migrations-changed.outputs.base_branch }} branch code (push) Has been cancelled
DB migration compat / Forward-compat — Current branch code with ${{ needs.check-migrations-changed.outputs.base_branch }} branch migrations (push) Has been cancelled
DB migration compat / No migration changes (skipped) (push) Has been cancelled
## Summary Rewrites the **Email Server** section of the project email settings page and the managed-domain setup flow. Replaces the dropdown + conditional-fields layout with a visual four-card picker, a clearer unsaved-state model, a stepper dialog for managed-domain onboarding, and a consistent tracked-domains list. Also fixes two data-correctness bugs in the managed-domain backend. ## Walkthrough (2×, dead-frames trimmed)  ## Before The saved state was a minimal dropdown, but choosing Custom SMTP / Resend revealed a long conditional form with a hidden gear toggle for server config, no clear "what is saved" signal, and a separate dialog pattern for managed domains. | Saved (Managed) | Custom SMTP selected | |---|---| |  |  | ## After — Provider cards Four visual cards (Stack Shared, Managed Domain, Resend, Custom SMTP) with updated copy. The saved provider shows a green **Current** pill; the card the user is previewing shows an amber dashed **Draft** pill. An amber unsaved-changes banner appears between the picker and the form when state diverges from saved, so it is unambiguous that a click is not yet committed. | Saved state | Previewing a different provider | |---|---| |  |  | Copy changes: - **Stack Shared** — "Only default emails — no custom templates, themes, or sender identity." (was: "Shared (noreply@stackframe.co)") - **Managed Domain** — "Bring your own domain. You add DNS records; we handle signing & delivery." (was: "Managed (via managed domain setup)") - **Resend** uses the official Resend brand mark (light/dark variants in `apps/dashboard/public/assets/`) ## After — Managed domain list + stepper dialog Selecting **Managed Domain** immediately shows the tracked-domain list with an **Add domain** button. Each row reflects real status (Active / Verified / Waiting for DNS / Verifying / Failed). Exactly one domain can be **Active** — the one matching the saved email config; every other verified/applied domain shows a **Use this domain** button so switching is always possible. Adding a domain opens a 3-stage dialog with a horizontal stepper (Verify is right-aligned for the final step). Stage 2 replaces the old bare NS-list with a proper **Type / Name / Content** DNS records table with per-row copy buttons. | Tracked domains list | DNS records table | |---|---| |  |  | ## Bug fixes - **Backend: applying a managed domain did not demote previously-applied ones.** Multiple rows could end up with status `APPLIED` even though only one could be in the saved config. New helper `demoteOtherAppliedManagedEmailDomains({ tenancyId, keepId })` runs inside `applyManagedEmailProvider` to demote all other applied rows in the tenancy back to `VERIFIED` before marking the new one. - **Frontend: "Use this domain" only appeared for `status === verified`.** A domain that had been applied then replaced could never be re-applied from the UI. Button now appears for any `verified` or `applied` row that is not currently in use; the **Active** label is derived from config match instead of DB status. - **Dev mock onboarding now mirrors production timing.** `shouldUseMockManagedEmailOnboarding()` used to insert domains as `verified` synchronously. Now the domain is created as `pending_verification`, and a fire-and-forget `runAsynchronously(() => wait(1000))` updates it to `verified` — mirroring the real Resend webhook flow so the UI states (pending → verifying → verified) are exercised in local dev. ## Test plan - [ ] Cards: clicking each card shows `Draft` pill + amber banner; Discard restores; Save commits and flips `Current` to the new card - [ ] Managed: Add domain → stage 1 input → stage 2 DNS table + copy → Check verification flips to stage 3 → Use this domain sets it Active and demotes the previously-active domain in the list - [ ] Managed: clicking **Use this domain** on a non-active verified row makes it Active and the previously-active row back to Verified - [ ] Shared / Resend / SMTP: existing save + test-email flows still work (logic preserved verbatim) - [ ] `pnpm typecheck` (dashboard + backend) and `pnpm lint` pass <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Redesigned email domain setup flow with multi-step verification dialog * Added copy-to-clipboard for DNS records * Enhanced provider selection interface with improved visual presentation * Onboarding now shows initial "pending verification" state and completes verification asynchronously * **Bug Fixes** * Ensures only one managed domain becomes active when applying a domain * Improved error handling for email configuration saves * **Tests** * Updated end-to-end tests to reflect async verification timing <!-- end of auto-generated comment: release notes by coderabbit.ai --> |
||
|
|
4a2595d9f7
|
Classify ClickHouse NO_COMMON_TYPE (386) as unsafe (#1380)
## Summary - Add ClickHouse error code `386` (`NO_COMMON_TYPE`) to `UNSAFE_CLICKHOUSE_ERROR_CODES` in `apps/backend/src/lib/clickhouse-errors.ts`. This stops the Sentry `StackAssertionError` (`Unknown Clickhouse error: code 386 not in safe or unsafe codes`) that was firing whenever an admin wrote a query like `SELECT [1, 'a']` or `SELECT if(1, 'a', 1)`, while keeping the raw error message out of prod responses. - Add two e2e regression tests: one against the cross-project `analytics_internal.users` table, and one against `system.query_log`, to pin that 386 is wrapped with the generic `Error during execution of this query.` message in prod (full detail only surfaces in dev/test). ## Why unsafe, not safe Both callers of `getSafeClickhouseErrorMessage` (`apps/backend/src/app/api/latest/internal/analytics/query/route.ts:59` and `apps/backend/src/lib/ai/tools/sql-query.ts:80`) execute caller-authored SQL under `readonly: "1"` with `SQL_project_id`/`SQL_branch_id` scoping. The ClickHouse client runs under a `limited_user` whose grants restrict most tables — but ClickHouse resolves types **before** enforcing ACL. That means a query like `SELECT if(1, query, 1) FROM system.query_log` surfaces code 386 with a message like `There is no supertype for types String, UInt8 ...`, leaking that `system.query_log.query` is a `String` — schema info from a table the caller can't actually read. This is the same type-before-ACL class as code 43 (`ILLEGAL_TYPE_OF_ARGUMENT`), which is already classified unsafe. Classifying 386 as unsafe keeps the defense-in-depth consistent: if per-customer tables are ever introduced and grants don't block reference-resolution in time, 386 won't leak their schema. Cost: in prod, an admin writing a malformed type-mismatch query sees only `Error during execution of this query.` instead of the supertype hint. Dev and test environments still show the full error via the existing `getNodeEnvironment()` branch, so local iteration is unaffected. ## Test plan - [x] `pnpm test run apps/e2e/tests/backend/endpoints/api/v1/analytics-query.test.ts` — all 64 tests pass, including the two 386 regression tests. - [ ] Monitor Sentry after deploy to confirm the `unknown-clickhouse-error-for-query` events for code 386 stop firing. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Improved handling of a ClickHouse type-mismatch error to prevent exposure of sensitive data and ensure sanitized error responses. * **Tests** * Added regression tests that verify error responses are sanitized, return consistent error codes, and include expected headers without leaking internal details. <!-- end of auto-generated comment: release notes by coderabbit.ai --> |
||
|
|
a132dd23f9
|
fix: refresh-token P2025 race with concurrent sign-out (#1372)
## Summary - Fixes Sentry [STACK-BACKEND-146](https://stackframe-pw.sentry.io/issues/7377768662/): `PrismaClientKnownRequestError` P2025 on `projectUserRefreshToken.update()` during token refresh. - Root cause: `generateAccessTokenFromRefreshTokenIfValid` (`apps/backend/src/lib/tokens.tsx`) reads the refresh-token row upstream, then issues `.update(...)` on it (and on `projectUser`) inside a `Promise.all`. If a concurrent sign-out (`DELETE /auth/sessions/current`), session revoke, password change, or user deletion removes the row between the read and the update, Prisma throws P2025 and the refresh endpoint 500s. ## Changes - `apps/backend/src/lib/tokens.tsx` — swap the two `.update(...)`s for `.updateMany(...)` so a missing row is a no-op, then re-check the refresh token still exists; return `null` if it doesn't. The refresh route already maps `null` -> `KnownErrors.RefreshTokenNotFoundOrExpired` (401), which is the correct user-facing behavior for a just-revoked session. - `apps/backend/src/oauth/model.tsx` — in `generateAccessToken`, replace the "ultra-rare race condition" `throwErr` fallback with `throw new KnownErrors.RefreshTokenNotFoundOrExpired()` so concurrent sign-out during an OAuth `refresh_token` grant returns a clean 401 instead of 500. - `apps/e2e/tests/backend/endpoints/api/v1/auth/sessions/current/refresh-race.test.ts` — new regression test that fires `POST /auth/sessions/current/refresh` and `DELETE /auth/sessions/current` concurrently with the same refresh token. Before the fix it 500s on the first iteration; after, it passes in ~12s. ## Test plan - [x] New regression test passes locally. - [x] Existing `auth/sessions/**` + `auth/oauth/token.test.ts` still pass (27 tests, 3 todo, 0 failed). - [ ] CI green. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Refresh flows now detect a revoked or removed refresh token during concurrent operations and stop cleanly, preventing issuance of an access token from stale data. * A specific refresh-token-not-found/expired error is returned instead of a generic failure when refresh cannot proceed. * **Tests** * Added E2E tests exercising concurrent refresh vs sign-out to prevent race-condition crashes and validate safe handling of competing requests. <!-- end of auto-generated comment: release notes by coderabbit.ai --> |
||
|
|
7957de4182
|
fix(email-queue): recover stuck sending without duplicate retry (#1356)
## Summary Email outbox rows can get stuck in `SENDING` if a worker dies after setting `startedSendingAt` but before finishing or unclaiming. This change adds `recoverEmailsStuckInSending`, which runs each email queue step and marks rows past the stuck timeout as **terminal server errors** with delivery status unknown, **without** scheduling an automatic retry (to avoid duplicate sends if the provider already accepted the message). ## Changes - **`recoverEmailsStuckInSending`**: updates stuck rows with `finishedSendingAt`, `canHaveDeliveryInfo: false`, and server error fields; emits Sentry via `captureError` when any rows are recovered. - **Tests**: `email-queue-step.test.tsx` covers recovery of old `startedSendingAt`, no-op for recent sends, and idempotency (second pass does not re-queue). ## Test plan - [ ] `pnpm` / vitest for `apps/backend/src/lib/email-queue-step.test.tsx` (requires dev DB like other integration tests in this package) Made with [Cursor](https://cursor.com) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Email reliability: messages that remained stuck in sending are now automatically marked as terminal failures, assigned standardized error details, cleared from retry scheduling, prevented from receiving delivery info, and recovery emits an alert only when actual work occurs. Recovery is safe to run repeatedly (idempotent). * **Tests** * Added integration tests validating recovery behavior, proper field updates, and idempotency. <!-- end of auto-generated comment: release notes by coderabbit.ai --> |
||
|
|
f89b97bc54
|
fix connected accounts tokens (#1358)
Some checks failed
all-good: Did all the other checks pass? / all-good (push) Has been cancelled
Ensure Prisma migrations are in sync with the schema / check_prisma_migrations (22.x) (push) Has been cancelled
DB migration compat / Check if migrations changed (push) Has been cancelled
Docker Server Build and Push / Docker Build and Push Server (push) Has been cancelled
Docker Server Build and Run / docker (push) Has been cancelled
Runs E2E API Tests (Local Emulator) / E2E Tests (Local Emulator, Node ${{ matrix.node-version }}) (22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (mock, 22.x) (push) Has been cancelled
Runs E2E API Tests / E2E Tests (Node ${{ matrix.node-version }}, Freestyle ${{ matrix.freestyle-mode }}) (prod, 22.x) (push) Has been cancelled
Runs E2E API Tests with custom port prefix / build (22.x) (push) Has been cancelled
Runs E2E Fallback Tests / E2E Fallback Tests (Node ${{ matrix.node-version }}) (22.x) (push) Has been cancelled
Lint & build / lint_and_build (24) (push) Has been cancelled
TOC Generator / TOC Generator (push) Has been cancelled
DB migration compat / Back-compat — Current branch migrations with ${{ needs.check-migrations-changed.outputs.base_branch }} branch code (push) Has been cancelled
DB migration compat / Forward-compat — Current branch code with ${{ needs.check-migrations-changed.outputs.base_branch }} branch migrations (push) Has been cancelled
DB migration compat / No migration changes (skipped) (push) Has been cancelled
<!-- Make sure you've read the CONTRIBUTING.md guidelines: https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md --> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * OAuth flows now consistently block extra scopes and access tokens for shared OAuth keys, enforcing restrictions earlier in the request processing and across all environments. * **Tests** * Added end-to-end regression tests to verify requests with extra scopes against shared OAuth providers return a 400 response indicating extra scopes/access tokens are not allowed. <!-- end of auto-generated comment: release notes by coderabbit.ai --> |
||
|
|
3ea8052d35 | chore: update package versions | ||
|
|
d9492ac5f1 | Update submodules | ||
|
|
6f1df1a0c7 | Update submodules | ||
|
|
37ee5ec320
|
Fast-start local emulator via RAM snapshot + live secret rotation (#1340)
## Summary
`stack emulator start` now resumes a fully-warm VM snapshot instead of
cold-booting, bringing startup from 30–120s down to ~5–8s with
per-install secret rotation, or ~2.5s with rotation opt-out. The
snapshot is captured **locally on first `stack emulator pull`**, not
shipped from CI — QEMU migration state isn't portable across
accelerators (KVM/HVF/TCG) or `-cpu max` feature sets, so a CI-captured
snapshot couldn't resume reliably on arbitrary user hardware.
Also bundles a pile of CLI QoL fixes (progress bars, PR/run artifact
pulls, PR-build download, native-TS ISO writer replacing
`hdiutil`/`mkisofs`/`genisoimage` host dep, unit tests).
| Scenario | Before | After |
|---|---|---|
| Cold boot (no snapshot) | 30–120s | same, works as fallback |
| `stack emulator pull` (one-time, includes local snapshot capture) |
~30s download | ~30s download + ~1–3 min cold-boot capture |
| Snapshot resume, normal start | — | **~5–8s** |
| Snapshot resume, `EMULATOR_NO_ROTATION=1` | — | **~2.5s** |
Backend (`/health?db=1`) and dashboard (`/handler/sign-in`) return 200
on all paths. Two successive snapshot resumes produce different rotated
PCK/SSK/SAK/CRON_SECRET values per install.
## How it works
**Build (CI)** — `docker/local-emulator/qemu/build-image.sh`:
1. Cloud-init provisioning runs to completion (migrations, seed,
slim-image) producing `stack-emulator-<arch>.qcow2`.
2. Image is built with a topology compatible with later snapshot capture
(pinned SMP=4, phantom seed/bundle ISOs, STACKCFG runtime ISO mounted at
build time, qemu-guest-agent running, placeholder hex secrets baked in
under `STACK_EMULATOR_BUILD_SNAPSHOT=1`).
3. CI publishes **only the qcow2** — no `.savevm.zst` ships.
**Pull (user's machine)** —
`packages/stack-cli/src/commands/emulator.ts` + `run-emulator.sh
capture`:
1. `stack emulator pull` downloads the qcow2 with a progress bar (or
from a PR / workflow run via `--pr` / `--run`).
2. CLI invokes `run-emulator.sh capture`: cold-boots the qcow2 with a
matching device layout (phantom ISOs, fsdev, pcie-root-port, virtfs
detached — migration-incompatible), waits for backend+dashboard health,
then drives QMP: `stop` → set `mapped-ram` + `multifd` caps → `migrate
file:state.raw` → poll `query-migrate` → `quit`. Raw mapped-ram file is
zstd-compressed to `stack-emulator-<arch>.savevm.zst` in the images dir.
3. `--skip-snapshot` opts out (first `start` will then cold-boot).
**Runtime** — `run-emulator.sh start`:
1. Launch QEMU with `-incoming defer` when a `.savevm.zst` is present;
decompress on first use, keep the `.raw` cached for subsequent starts.
2. QMP: same `mapped-ram` + `multifd` caps → `migrate-incoming
file:<.raw>` → poll for `paused` → `cont`.
3. Generate fresh per-install secrets on the host; pipe them
base64-encoded through QGA `guest-exec input-data` →
`trigger-fast-rotate` in the guest → `docker exec -e … rotate-secrets`.
4. `rotate-secrets` in the container: validate keys (hex-only), targeted
`sed` on the placeholder PCK across built JS, `UPDATE ApiKeySet`,
`supervisorctl restart stack-app cron-jobs` (with
`stopasgroup`/`killasgroup` so the Node children actually die and
release their ports).
5. Poll backend+dashboard health; if anything fails, clean up and fall
back to cold boot transparently.
**Security model**: placeholder hex values are baked into the snapshot
(`00…ff` PCK, `00…ee` SSK, `00…dd` SAK, `00…cc` CRON_SECRET). They are
non-secret by construction. Real per-install secrets are generated at
each `emulator start` and never leave the host.
## CLI changes (`packages/stack-cli`)
- **`src/lib/iso.ts`** (new): native TypeScript ISO 9660 + Joliet
writer, replacing the host-side `hdiutil`/`mkisofs`/`genisoimage`
dependency for generating the STACKCFG runtime config disk. Unit tests
in `src/lib/iso.test.ts`.
- **`src/commands/emulator.ts`**:
- `pull`: streamed downloads with progress bar + ETA; `--pr <number>`
and `--run <id>` to pull from a PR build's CI artifacts (uses
`extract-zip` for the nested zip); `--skip-snapshot` to opt out of the
one-time local capture.
- `start` (existing, extended): auto-pulls AND auto-captures when no
image exists, so first-ever `start` is self-bootstrapping; emits
`STACK_EMULATOR_CLI_WROTE_ISO=1` so the shell helper skips its own ISO
regen (avoids the genisoimage host dep).
- `capture` (new, invoked by `pull` and the auto-pull path of `start`):
drives the local snapshot capture via `run-emulator.sh`.
- `status`, `stop`, `reset`, `list-releases`: preflight +
path-resolution tightening (`STACK_EMULATOR_HOME` → images/run dirs).
- Unit tests in `src/commands/emulator.test.ts`.
- **`EMULATOR_NO_ROTATION=1`** env var skips the post-resume rotation
(intended for tests/CI where the placeholder secrets are fine — comes
with a loud warning).
## CI (`.github/workflows/qemu-emulator-build.yaml`)
- Builds **QEMU 10.2.2 from source** (cached), because
`mapped-ram`/`multifd` migration capabilities aren't available in the
distro's QEMU. Enables KVM on ubicloud runners so amd64 boots at
hardware speed.
- amd64 + arm64 both build on the same amd64 matrix
(`ubicloud-standard-8`); arm64 runs under cross-arch TCG (provisioning
only — boot/verify smoke test is amd64-only).
- Verification now runs through the CLI: `emulator start` → `emulator
status` → `emulator stop` against the freshly-built qcow2 (via
`STACK_EMULATOR_HOME` pointing at the workspace, so the CLI doesn't
silently auto-pull a prior release).
- Packages **only** the qcow2. No `.savevm.zst` upload / publish.
- Release notes updated.
## Key files
**Shell / guest:**
- `docker/local-emulator/qemu/build-image.sh` — snapshot-compatible
device topology + STACKCFG runtime ISO at build time
- `docker/local-emulator/qemu/run-emulator.sh` — `start`, `capture`,
`stop`, `reset`, `status`; `-incoming defer`, `.raw` cache, QGA-driven
rotation, cold-boot fallback
- `docker/local-emulator/qemu/common.sh` (new) — shared `qmp_session` +
`capture_vm_state` (factored out so build-image.sh and run-emulator.sh
share the capture path)
- `docker/local-emulator/qemu/cloud-init/emulator/user-data` —
placeholder secrets in snapshot mode, `wait-for-stack-ready`,
`trigger-fast-rotate`, qemu-guest-agent enabled
- `docker/local-emulator/rotate-secrets.sh` (new) — in-container
rotation (sed + UPDATE + supervisorctl)
- `docker/local-emulator/supervisord.conf` — `stopasgroup`/`killasgroup`
on `stack-app` and `cron-jobs`
- `docker/local-emulator/entrypoint.sh` — only mint CRON_SECRET if unset
(placeholder supplied in snapshot mode via --env-file)
- `docker/local-emulator/Dockerfile` — ships `rotate-secrets` to
`/usr/local/bin`
- `docker/server/entrypoint.sh` — source
`/run/stack-auth/rotated-secrets.env`; skip full-tree sentinel scan on
warm restarts via marker
**CLI:**
- `packages/stack-cli/src/lib/iso.ts` (new) + `iso.test.ts` (new)
- `packages/stack-cli/src/commands/emulator.ts` + `emulator.test.ts`
(new)
- `packages/stack-cli/vitest.config.ts` (new)
**CI:**
- `.github/workflows/qemu-emulator-build.yaml`
## Test plan
- [x] `docker/local-emulator/qemu/build-image.sh {amd64,arm64}` produces
`stack-emulator-<arch>.qcow2` with snapshot-compatible topology
- [x] `stack emulator pull` downloads qcow2 with progress, then captures
locally (~1–3 min) and writes `stack-emulator-<arch>.savevm.zst` in the
images dir
- [x] `stack emulator pull --skip-snapshot` stops after download
- [x] `stack emulator pull --pr <n>` / `--run <id>` pull from PR /
workflow run artifacts
- [x] `stack emulator start` on a fresh dir auto-pulls **and**
auto-captures, then starts; subsequent starts fast-resume in ~5–8s;
backend + dashboard return 200
- [x] `EMULATOR_NO_ROTATION=1 stack emulator start` completes in ~2.5s;
backend + dashboard return 200 with warning printed
- [x] Two consecutive `emulator start` invocations produce different PCK
values in the internal `ApiKeySet` row
- [x] `stack emulator status` / `stop` / `reset` resolve paths from
`STACK_EMULATOR_HOME`
- [x] Verified end-to-end on arm64 macOS under HVF (capture ~50s,
fast-resume ~6.5s)
- [x] `pnpm lint` and `pnpm typecheck` pass; stack-cli unit tests (iso +
emulator) pass
- [ ] CI green on this PR (qemu-emulator-build matrix, smoke test)
- [ ] `gh release download emulator-<branch>-latest` contains only
`stack-emulator-<arch>.qcow2` once this PR merges and publish runs
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Snapshot fast-start/resume with optional warm-snapshot assets, runtime
ISO generation, and a cached QEMU build to speed emulator setup.
* CLI: streamed artifact downloads with progress, improved release/asset
handling, stronger preflight checks, and start/status/stop emulator
commands.
* Automated secret rotation and ability to apply rotated secrets at
container startup; supervisor control socket enabled.
* **Bug Fixes**
* More robust start/stop/resume flows with automatic fallback to cold
boot and improved process-group shutdown behavior.
* **Tests**
* New tests for CLI utilities and ISO image generation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
|
||
|
|
85ae4b1c9e
|
Fix ClickHouse OOM in MAU query + optimize /internal/metrics route (#1344)
## Summary Fixes the Sentry `StackAssertionError: Failed to load monthly active users for internal metrics` crash (ClickHouse OOM at the 7.2 GiB per-query cap) and applies two related optimizations to other queries in the same route while here. Adds a local benchmark harness that validates correctness and measures peak memory / duration before & after. ## Root cause (the original Sentry error) `loadMonthlyActiveUsers` was written as `SELECT user_id … GROUP BY user_id` and then counting in Node via a `Set`. On a large project that ships back millions of user_ids. Two failure modes stacked: 1. **Result materialization** — every distinct user_id had to be buffered in the server before streaming to Node (~20 MiB of result for 450k users; much more at real scale). 2. **`JSONExtract(toJSONString(data), 'is_anonymous', 'UInt8')`** — the `toJSONString(data)` per-row re-serialization of the entire nested JSON column, billions of times, just to pull one boolean. Dominates bytes-read. Combined, on a single partition read from S3-backed MergeTree, this can exceed ClickHouse's 7.2 GiB per-query memory cap. That's exactly what the Sentry trace showed. ## Changes ### 1. Fix MAU query (`loadMonthlyActiveUsers`) Moved counting to the server with `uniqExact(sipHash64(normalized_user_id))` and pulled the JS-side normalization (`lower`, `trim`, `isUuid`) into SQL. Picked `sipHash64` after benchmarking 7 variants — it's exact (at <<2³² users) and halves the uniqExact hash-state vs. raw string keys. ### 2. Fix 1 — `JSONExtract(toJSONString(data), …)` → direct `CAST(data.is_anonymous, …)` Applied everywhere the pattern appeared in the metrics route: - `loadDailyActiveUsers` - the `analyticsUserJoin` subquery - the `nonAnonymousAnalyticsUserFilter` - `analyticsOverview:topRegion` - `analyticsOverview:online` Semantics preserved (`coalesce(CAST(data.is_anonymous, 'Nullable(UInt8)'), 0)` matches `JSONExtract(…, 'UInt8')` behavior when the field is missing). ### 3. Fix 3 — server-aggregate the split queries `loadDailyActiveUsersSplit` and `loadDailyActiveTeamsSplit` used to ship 1.2M+ `(day, user_id)` rows back to Node just so the JS could bucket them into new / retained / reactivated. Rewrote both as one CTE-style query that returns 31 rows (one per day in the 30-day window) with the counts precomputed. **Minor semantic shift** (documented inline in `route.tsx`): \"new\" is now based on the user's first-ever `\$token-refresh` event rather than their Postgres `signedUpAt`. Agrees for users who log in immediately after sign-up (the common case). Disagrees for the rare edge case of an account that existed pre-window but never generated a `\$token-refresh` until now — old code classified as \"reactivated,\" new code classifies as \"new.\" Judged acceptable; can be revisited. Postgres round-trips for `ProjectUser.signedUpAt` / `Team.createdAt` are no longer needed for the split, and the 76 MiB-ish wire ship is gone. ### 4. Benchmark harness (`apps/backend/scripts/benchmark-internal-metrics.ts`) Local-only tool. Three modes: - **MAU equivalence matrix** — 13 edge cases (empty, dedup, anonymous filter, window boundary, null user_id, non-UUID user_id, case variation, project isolation, missing/null `is_anonymous`, wrong event_type). Asserts OLD pipeline and NEW query return the **same set** of users, not just the same count. - **MAU perf** — OLD vs NEW plus 6 other candidate variants (inline regex, UUID keys, sipHash64, HLL sketches), reads `memory_usage` / `read_rows` / `result_bytes` from `system.query_log` for each, prints a ranked table. - **Full-route benchmark** (`BENCH_ROUTE_QUERIES=1`) — runs every ClickHouse query in `/internal/metrics` in three stages (BEFORE, AFTER, candidate OPTIMIZED) against the same seed and prints per-query deltas plus endpoint-level totals. Seeds under a synthetic `project_id` so real data is never touched; cleans up on exit via `ALTER TABLE … DELETE`. ## Benchmark results ### MAU query alone Ran at two scales; set-equality verified (new query identifies the same individual users, not just the same count). | seed | MAU | peak memory (old → new) | bytes read | duration | |---|---|---|---|---| | 500k events | 89,939 | 158.7 MiB → 46.7 MiB (**3.4×**, −70%) | 175.7 MiB → 63.0 MiB (2.8×) | 483 ms → 76 ms (**6.4×**) | | 2.5M events | 449,990 | 439.2 MiB → 281.4 MiB (1.56×, −36%) | 865.0 MiB → 310.9 MiB (2.8×) | 783 ms → 126 ms (**6.2×**) | MAU variant bake-off at 2.5M events (all exact, all set-equal to OLD): | variant | memory | duration | notes | |---|---|---|---| | v0_old (baseline) | 440 MiB | 567 ms | — | | v1_uniqExact_string | 284 MiB | 110 ms | naive fix | | v3_uniqExact_toUUID | 244 MiB | 153 ms | UUID keys, slower per-row | | **v4_uniqExact_sipHash64** | **125 MiB** | **95 ms** | **shipped** | | v5_uniq (HLL) ~approx | 30 MiB | 86 ms | −0.25% error | | v6_uniqCombined ~approx | 31 MiB | 67 ms | −0.15% error | ### Full `/internal/metrics` route (2.7M events, 300k users + page-views + clicks + teams) Ranked by BEFORE peak memory: | query | mem BEFORE | mem AFTER | Δ mem | dur BEFORE | dur AFTER | Δ dur | |---|---|---|---|---|---|---| | analyticsOverview:topReferrers | 588.1 MiB | 411.1 MiB | 1.43× | 1833 ms | 110 ms | **16.66×** | | analyticsOverview:totalVisitors | 584.3 MiB | 403.5 MiB | 1.45× | 1829 ms | 121 ms | 15.12× | | analyticsOverview:dailyEvents | 584.1 MiB | 403.7 MiB | 1.45× | 1897 ms | 140 ms | 13.55× | | loadUsersByCountry | 393.1 MiB | 385.4 MiB | ≈same | 74 ms | 80 ms | ≈same | | loadDailyActiveUsersSplit | 363.4 MiB | 396.8 MiB | *+9%* | 1966 ms | 356 ms | 5.52× | | analyticsOverview:topRegion | 269.9 MiB | 106.4 MiB | 2.54× | 1602 ms | 65 ms | 24.65× | | loadDailyActiveUsers | 268.3 MiB | 84.0 MiB | 3.19× | 1111 ms | 44 ms | 25.25× | | loadDailyActiveTeamsSplit | 59.6 MiB | 78.1 MiB | *+31%* | 70 ms | 123 ms | *+76%* | | loadMonthlyActiveUsers | 54.9 MiB | 54.9 MiB | ≈same | 68 ms | 56 ms | ≈same | | analyticsOverview:online | 18.4 MiB | 5.8 MiB | 3.17× | 58 ms | 4 ms | 14.50× | **Endpoint-level totals** | metric | BEFORE | AFTER | Δ | |---|---|---|---| | Sum peak ClickHouse memory | 3.11 GiB | 2.28 GiB | **−27%** | | **Max query duration** (endpoint wall-clock floor) | **1966 ms** | **356 ms** | **−82%** (5.5×) | | Sum query duration (total CPU) | 10508 ms | 1099 ms | **−90%** (9.6×) | | Bytes read | 10.70 GiB | 4.55 GiB | −57% | | Bytes shipped to Node | 94.8 MiB | 44.2 KiB | **−99.95%** | Both split queries show a small memory *regression* at this seed size (the new server-side window-function + self-join has its own state cost that's near break-even with \"materialize + ship\" at 300k users); at prod scale the 76 MiB-ship saving dominates. Duration is unambiguously better. ## Why we don't need to drop the `analyticsUserJoin` in this PR The benchmark includes an OPTIMIZED stage that drops the LEFT JOIN and trusts `e.data.is_anonymous` directly, which would shave another **1.2 GiB / 1.9× duration** off the endpoint. **But we can't ship that here** — an audit of the client tracker (`packages/js/src/lib/stack-app/apps/implementations/event-tracker.ts`) confirmed `is_anonymous` is never set on client-emitted `$page-view` / `$click` events. The JOIN is currently load-bearing. A follow-up PR will enrich `is_anonymous` at the batch ingest endpoint using `auth.user.is_anonymous`; after one metrics-window cycle (~30 days) the JOIN can be dropped. ## Follow-up work (out of scope for this PR) - **Batch-endpoint enrichment** + drop the analytics-overview LEFT JOIN (est. further −53% endpoint memory, −46% duration per the benchmark). - **Teams-split hash-variant count mismatch** — `sipHash64(team_id)` variant of the teams split shows a count discrepancy vs. the string-keyed version in the benchmark. Not blocking since teams-split is only #8 by memory; needs a root-cause pass before shipping that particular optimization. - **`loadUsersByCountry` window bound** — currently scans every `$token-refresh` event ever for the tenancy (no time filter). Bounding to 30 days would bound memory growth with project age, but changes semantics (\"country of latest login ever\" → \"in last 30 days\"). Deferred because it's product-facing. ## Snapshot changes in `internal-metrics.test.ts.snap` The `should return metrics data with users` test signs in 10 users today, then deletes one of them mid-test. Two small snapshot values change on today's date; both are just a reclassification of that single deleted user — the total (10 active users) is unchanged. - **`daily_active_users_split.new[today]`: 9 → 10** All 10 users really did sign in for the first time today. The old code only counted 9 because the deleted user's Postgres row was gone by the time the metrics query ran, so the old classifier couldn't see they were created today. The new query looks at ClickHouse events directly, sees the deleted user's first event was today, and counts them as new like everyone else. - **`daily_active_users_split.reactivated[today]`: 1 → 0** No user was "reactivated" today — nobody was active on an earlier day and came back. The old "1" was the deleted user falling into this bucket by default (the old classifier had no other rule that fit them). The new code correctly reports zero. Totals match either way (9 + 1 = 10 + 0). We're moving one deleted user out of the "returning visitor" bucket and into the "brand-new user" bucket, which is what they actually were. ## Test plan - [x] `pnpm typecheck` and `pnpm lint` pass on the backend package - [x] MAU equivalence matrix: 13/13 cases return the same set of users (not just the same count) between OLD and NEW pipelines - [x] Set-equality verified at 500k-MAU perf scale - [x] Full-route benchmark confirms the expected memory / duration improvements - [ ] Sanity-check the dashboard rendering after deploy (split charts, MAU counter, analytics overview) - [ ] Monitor Sentry for the assertion error — should drop to zero <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Performance Improvements** * Monthly and daily active metrics are now computed entirely server-side for faster queries and reduced client-side processing. * **Bug Fixes** * More consistent handling of anonymous/missing IDs and stricter ID filtering to improve accuracy across edge cases. * **Tests** * Added a comprehensive benchmark and validation harness to measure query performance and verify result equivalence across variants. <!-- end of auto-generated comment: release notes by coderabbit.ai --> |
||
|
|
0621ad2032
|
ai proxy fix (#1343)
<!-- Make sure you've read the CONTRIBUTING.md guidelines: https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md --> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Refactor** * Request sanitization now includes an extra proxy-specific preprocessing step for safer AI proxying. * **New Features** * Initialization prompts centralized into a shared helper, with a web-specific prompt variant. * Authenticated requests can optionally route via a provided external API key to access alternate models. * **Chores** * Added and exposed a preprocessing hook with a default no-op implementation. <!-- end of auto-generated comment: release notes by coderabbit.ai --> |
||
|
|
f0bbdb1c34 | Make access token warning just a log | ||
|
|
82c923e03c | waitUntil Sentry flush is complete | ||
|
|
560ee4c16e | Fix memory leak |