## Summary
Fixes the Sentry `StackAssertionError: Failed to load monthly active
users for internal metrics` crash (ClickHouse OOM at the 7.2 GiB
per-query cap) and applies two related optimizations to other queries in
the same route while here. Adds a local benchmark harness that validates
correctness and measures peak memory / duration before & after.
## Root cause (the original Sentry error)
`loadMonthlyActiveUsers` was written as `SELECT user_id … GROUP BY
user_id` and then counting in Node via a `Set`. On a large project that
ships back millions of user_ids. Two failure modes stacked:
1. **Result materialization** — every distinct user_id had to be
buffered in the server before streaming to Node (~20 MiB of result for
450k users; much more at real scale).
2. **`JSONExtract(toJSONString(data), 'is_anonymous', 'UInt8')`** — the
`toJSONString(data)` per-row re-serialization of the entire nested JSON
column, billions of times, just to pull one boolean. Dominates
bytes-read.
Combined, on a single partition read from S3-backed MergeTree, this can
exceed ClickHouse's 7.2 GiB per-query memory cap. That's exactly what
the Sentry trace showed.
## Changes
### 1. Fix MAU query (`loadMonthlyActiveUsers`)
Moved counting to the server with
`uniqExact(sipHash64(normalized_user_id))` and pulled the JS-side
normalization (`lower`, `trim`, `isUuid`) into SQL. Picked `sipHash64`
after benchmarking 7 variants — it's exact (at <<2³² users) and halves
the uniqExact hash-state vs. raw string keys.
### 2. Fix 1 — `JSONExtract(toJSONString(data), …)` → direct
`CAST(data.is_anonymous, …)`
Applied everywhere the pattern appeared in the metrics route:
- `loadDailyActiveUsers`
- the `analyticsUserJoin` subquery
- the `nonAnonymousAnalyticsUserFilter`
- `analyticsOverview:topRegion`
- `analyticsOverview:online`
Semantics preserved (`coalesce(CAST(data.is_anonymous,
'Nullable(UInt8)'), 0)` matches `JSONExtract(…, 'UInt8')` behavior when
the field is missing).
### 3. Fix 3 — server-aggregate the split queries
`loadDailyActiveUsersSplit` and `loadDailyActiveTeamsSplit` used to ship
1.2M+ `(day, user_id)` rows back to Node just so the JS could bucket
them into new / retained / reactivated. Rewrote both as one CTE-style
query that returns 31 rows (one per day in the 30-day window) with the
counts precomputed.
**Minor semantic shift** (documented inline in `route.tsx`): \"new\" is
now based on the user's first-ever `\$token-refresh` event rather than
their Postgres `signedUpAt`. Agrees for users who log in immediately
after sign-up (the common case). Disagrees for the rare edge case of an
account that existed pre-window but never generated a `\$token-refresh`
until now — old code classified as \"reactivated,\" new code classifies
as \"new.\" Judged acceptable; can be revisited.
Postgres round-trips for `ProjectUser.signedUpAt` / `Team.createdAt` are
no longer needed for the split, and the 76 MiB-ish wire ship is gone.
### 4. Benchmark harness
(`apps/backend/scripts/benchmark-internal-metrics.ts`)
Local-only tool. Three modes:
- **MAU equivalence matrix** — 13 edge cases (empty, dedup, anonymous
filter, window boundary, null user_id, non-UUID user_id, case variation,
project isolation, missing/null `is_anonymous`, wrong event_type).
Asserts OLD pipeline and NEW query return the **same set** of users, not
just the same count.
- **MAU perf** — OLD vs NEW plus 6 other candidate variants (inline
regex, UUID keys, sipHash64, HLL sketches), reads `memory_usage` /
`read_rows` / `result_bytes` from `system.query_log` for each, prints a
ranked table.
- **Full-route benchmark** (`BENCH_ROUTE_QUERIES=1`) — runs every
ClickHouse query in `/internal/metrics` in three stages (BEFORE, AFTER,
candidate OPTIMIZED) against the same seed and prints per-query deltas
plus endpoint-level totals.
Seeds under a synthetic `project_id` so real data is never touched;
cleans up on exit via `ALTER TABLE … DELETE`.
## Benchmark results
### MAU query alone
Ran at two scales; set-equality verified (new query identifies the same
individual users, not just the same count).
| seed | MAU | peak memory (old → new) | bytes read | duration |
|---|---|---|---|---|
| 500k events | 89,939 | 158.7 MiB → 46.7 MiB (**3.4×**, −70%) | 175.7
MiB → 63.0 MiB (2.8×) | 483 ms → 76 ms (**6.4×**) |
| 2.5M events | 449,990 | 439.2 MiB → 281.4 MiB (1.56×, −36%) | 865.0
MiB → 310.9 MiB (2.8×) | 783 ms → 126 ms (**6.2×**) |
MAU variant bake-off at 2.5M events (all exact, all set-equal to OLD):
| variant | memory | duration | notes |
|---|---|---|---|
| v0_old (baseline) | 440 MiB | 567 ms | — |
| v1_uniqExact_string | 284 MiB | 110 ms | naive fix |
| v3_uniqExact_toUUID | 244 MiB | 153 ms | UUID keys, slower per-row |
| **v4_uniqExact_sipHash64** | **125 MiB** | **95 ms** | **shipped** |
| v5_uniq (HLL) ~approx | 30 MiB | 86 ms | −0.25% error |
| v6_uniqCombined ~approx | 31 MiB | 67 ms | −0.15% error |
### Full `/internal/metrics` route (2.7M events, 300k users + page-views
+ clicks + teams)
Ranked by BEFORE peak memory:
| query | mem BEFORE | mem AFTER | Δ mem | dur BEFORE | dur AFTER | Δ
dur |
|---|---|---|---|---|---|---|
| analyticsOverview:topReferrers | 588.1 MiB | 411.1 MiB | 1.43× | 1833
ms | 110 ms | **16.66×** |
| analyticsOverview:totalVisitors | 584.3 MiB | 403.5 MiB | 1.45× | 1829
ms | 121 ms | 15.12× |
| analyticsOverview:dailyEvents | 584.1 MiB | 403.7 MiB | 1.45× | 1897
ms | 140 ms | 13.55× |
| loadUsersByCountry | 393.1 MiB | 385.4 MiB | ≈same | 74 ms | 80 ms |
≈same |
| loadDailyActiveUsersSplit | 363.4 MiB | 396.8 MiB | *+9%* | 1966 ms |
356 ms | 5.52× |
| analyticsOverview:topRegion | 269.9 MiB | 106.4 MiB | 2.54× | 1602 ms
| 65 ms | 24.65× |
| loadDailyActiveUsers | 268.3 MiB | 84.0 MiB | 3.19× | 1111 ms | 44 ms
| 25.25× |
| loadDailyActiveTeamsSplit | 59.6 MiB | 78.1 MiB | *+31%* | 70 ms | 123
ms | *+76%* |
| loadMonthlyActiveUsers | 54.9 MiB | 54.9 MiB | ≈same | 68 ms | 56 ms |
≈same |
| analyticsOverview:online | 18.4 MiB | 5.8 MiB | 3.17× | 58 ms | 4 ms |
14.50× |
**Endpoint-level totals**
| metric | BEFORE | AFTER | Δ |
|---|---|---|---|
| Sum peak ClickHouse memory | 3.11 GiB | 2.28 GiB | **−27%** |
| **Max query duration** (endpoint wall-clock floor) | **1966 ms** |
**356 ms** | **−82%** (5.5×) |
| Sum query duration (total CPU) | 10508 ms | 1099 ms | **−90%** (9.6×)
|
| Bytes read | 10.70 GiB | 4.55 GiB | −57% |
| Bytes shipped to Node | 94.8 MiB | 44.2 KiB | **−99.95%** |
Both split queries show a small memory *regression* at this seed size
(the new server-side window-function + self-join has its own state cost
that's near break-even with \"materialize + ship\" at 300k users); at
prod scale the 76 MiB-ship saving dominates. Duration is unambiguously
better.
## Why we don't need to drop the `analyticsUserJoin` in this PR
The benchmark includes an OPTIMIZED stage that drops the LEFT JOIN and
trusts `e.data.is_anonymous` directly, which would shave another **1.2
GiB / 1.9× duration** off the endpoint. **But we can't ship that here**
— an audit of the client tracker
(`packages/js/src/lib/stack-app/apps/implementations/event-tracker.ts`)
confirmed `is_anonymous` is never set on client-emitted `$page-view` /
`$click` events. The JOIN is currently load-bearing. A follow-up PR will
enrich `is_anonymous` at the batch ingest endpoint using
`auth.user.is_anonymous`; after one metrics-window cycle (~30 days) the
JOIN can be dropped.
## Follow-up work (out of scope for this PR)
- **Batch-endpoint enrichment** + drop the analytics-overview LEFT JOIN
(est. further −53% endpoint memory, −46% duration per the benchmark).
- **Teams-split hash-variant count mismatch** — `sipHash64(team_id)`
variant of the teams split shows a count discrepancy vs. the
string-keyed version in the benchmark. Not blocking since teams-split is
only #8 by memory; needs a root-cause pass before shipping that
particular optimization.
- **`loadUsersByCountry` window bound** — currently scans every
`$token-refresh` event ever for the tenancy (no time filter). Bounding
to 30 days would bound memory growth with project age, but changes
semantics (\"country of latest login ever\" → \"in last 30 days\").
Deferred because it's product-facing.
## Snapshot changes in `internal-metrics.test.ts.snap`
The `should return metrics data with users` test signs in 10 users
today, then deletes one of them mid-test. Two small snapshot values
change on today's date; both are just a reclassification of that single
deleted user — the total (10 active users) is unchanged.
- **`daily_active_users_split.new[today]`: 9 → 10**
All 10 users really did sign in for the first time today. The old code
only counted 9 because the deleted user's Postgres row was gone by the
time the metrics query ran, so the old classifier couldn't see they were
created today. The new query looks at ClickHouse events directly, sees
the deleted user's first event was today, and counts them as new like
everyone else.
- **`daily_active_users_split.reactivated[today]`: 1 → 0**
No user was "reactivated" today — nobody was active on an earlier day
and came back. The old "1" was the deleted user falling into this bucket
by default (the old classifier had no other rule that fit them). The new
code correctly reports zero.
Totals match either way (9 + 1 = 10 + 0). We're moving one deleted user
out of the "returning visitor" bucket and into the "brand-new user"
bucket, which is what they actually were.
## Test plan
- [x] `pnpm typecheck` and `pnpm lint` pass on the backend package
- [x] MAU equivalence matrix: 13/13 cases return the same set of users
(not just the same count) between OLD and NEW pipelines
- [x] Set-equality verified at 500k-MAU perf scale
- [x] Full-route benchmark confirms the expected memory / duration
improvements
- [ ] Sanity-check the dashboard rendering after deploy (split charts,
MAU counter, analytics overview)
- [ ] Monitor Sentry for the assertion error — should drop to zero
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Performance Improvements**
* Monthly and daily active metrics are now computed entirely server-side
for faster queries and reduced client-side processing.
* **Bug Fixes**
* More consistent handling of anonymous/missing IDs and stricter ID
filtering to improve accuracy across edge cases.
* **Tests**
* Added a comprehensive benchmark and validation harness to measure
query performance and verify result equivalence across variants.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
### Object of this PR
This PR is NOT a monolithic series of fixes for the payments suite + a
complete rework. Its aims were
a) introducing and robustly testing the bulldozer db system
b) reworking the payments underlying architecture to use bulldozer for
correctness and scalability
c) Achieving parity with the old payments system excepting a few changes
like ensuring correctness of the ledger algo
There may still be some work to do with handling refunds, decoupling the
concepts of purchases from that of products, and some other things.
### Ledger Algorithm
This has been tuned and fixed. Item removals i.e negative item quantity
changes will apply to the soonest expiring item grant i.e positive item
quantity change. This is what is best for the user. Item grants can also
expire, and when they expire we obviate whatever is left of their
original capacity (meaning after all the removals that were applied to
it). Our ledger algo is applied via Bulldozer, so automatic
re-computation is handled when a new grant/ removal is inserted in the
middle of the existing ones.
### Things we got rid of
* No more automatic support for default products. You can use $0 plan
provisions to accomplish the same effect but it's manual
* Negative item quantity changes (i.e item removals) no longer can have
expiries
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Enhanced payment processing pipeline with improved data consistency
and state management.
* Advanced refund handling with comprehensive transaction tracking.
* Better tracking and management of customer item quantities and owned
products.
* Improved subscription lifecycle management including period-end
handling.
* **Bug Fixes**
* Fixed payment data integrity verification.
* Improved handling of edge cases in refund scenarios.
* **Chores**
* Updated cSpell configuration with additional words.
* Expanded developer documentation for linting workflows.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Konstantin Wohlwend <n2d4xc@gmail.com>
Co-authored-by: Aadesh Kheria <kheriaaadesh@gmail.com>
Co-authored-by: Mantra <87142457+mantrakp04@users.noreply.github.com>
The custom-base-port and db-migration-backwards-compatibility workflows
were running cron jobs with `with-env:dev` instead of `with-env:test`,
causing ClickHouse sync mismatches in verify-data-integrity.
<!--
Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md
-->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Streamlined CI test workflows to standardize background cron job
startup for more consistent test runs.
* **Tests**
* Improved end-to-end test reliability by aligning background process
behavior across suites.
* **Bug Fixes**
* Enhanced data verification reliability by ensuring external database
sync before integrity checks and tightening comparison ordering for
certain records, reducing false mismatch detections.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
- Normalize empty route path to / for valid OpenAPI path keys
- Drop invalid OAS3 top-level type on header parameters
- Write client/server/admin/webhooks JSON to docs-mintlify/openapi on codegen
<!--
Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md
-->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* External DB sync now covers teams, team members, permissions,
invitations, email outbox, session replays, refresh tokens, and
connected accounts.
* New sequence ID fields and automatic change-flagging added to many
record types to enable incremental sync.
* **Improvements**
* Added concurrent indexes, faster/parallelized sync pipelines,
verification tooling, and richer observability.
* Dashboard sequencer stats expanded and end-to-end sync tests
significantly extended.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
- Introduced a fallback mechanism for the private sign-up risk engine,
allowing for zero-score assessments when the primary engine is
unavailable.
- Updated Next.js configuration to support dynamic resolution of the
private risk engine, including aliasing for both Turbopack and Webpack.
- Added a new fallback implementation in
`private-sign-up-risk-engine-fallback.ts` to ensure consistent behavior
during builds.
- Adjusted `risk-scores.tsx` to utilize the new compiled engine,
improving error handling and logging for risk assessment failures.
This update improves the robustness of the sign-up risk scoring system
and enhances the development experience by streamlining engine
resolution.
<!--
Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md
-->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Improvements**
* Sign-up risk engine is initialized and validated at startup for more
predictable performance.
* If the risk engine is unavailable or invalid, the system immediately
returns safe zero-risk scores to avoid runtime failures.
* **Tests**
* End-to-end tests updated to match the new engine initialization and
detection behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Konstantin Wohlwend <n2d4xc@gmail.com>
### Context
We encountered an out of memory error when running verify-data-integrity
against the prod database. This was the error:
`FATAL ERROR: Ineffective mark-compacts near heap limit Allocation
failed - JavaScript heap out of memory`. This was one of the things
preventing verify-data-integrity from running successfully in prod.
### Summary of Changes
Local stress testing with constrained heap and memory telemetry revealed
that the rise in used heap memory was directly proportional to the
number of api calls. Investigation revealed that the `currentOutputData`
array was growing with each api call and was kept in memory. Since it
was still being appended to, it was actively kept in the heap. We
refactor the script to no longer use it, and for the two flags
`--save-output` and `--verify-output` that used it before, we refactor
them to not need to. `--save-output` now streams responses to disk as
JSONL and `--verify-output` now compares each response immediately and
discards it.
We also note a potential source of a future memory leak in the
`allUsers` array that is populated in memory for each project. We
refactor to paginate instead. Note that this didn't cause a memory leak
on local, this is a preventive measure.
### Out of Scope
fetching all transactions in the payments section of the script is
another potential cause for concern, but since the payments section of
the script will be refactored soon, we defer that discussion.
DB migration compat / Back-compat — Current branch migrations with ${{ needs.check-migrations-changed.outputs.base_branch }} branch code (push) Has been cancelled
DB migration compat / Forward-compat — Current branch code with ${{ needs.check-migrations-changed.outputs.base_branch }} branch migrations (push) Has been cancelled
<!--
Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md
-->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Browser-side event tracker with batching, navigation & click capture
and background/keepalive delivery
* Server endpoint to accept batched analytics events and associate them
with session replay segments
* Client APIs to send analytics batches and integrate with session
replay
* **Bug Fixes / UX**
* Pausing replay now uses the UI-facing playback time for more accurate
pause positions
* Replay endpoint now returns a clear analytics-disabled error
(ANALYTICS_NOT_ENABLED) when analytics is off
* **Tests**
* End-to-end tests covering batch ingestion, validation, and replay
timing behavior
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!--
Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md
-->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added new session replay analytics columns to ClickHouse for enhanced
tracking and reporting
* **Refactor**
* Renamed session recording segment identifier across APIs and data
models from `tab_id` to `session_replay_segment_id`
* Updated internal data structures and type definitions to align with
new naming convention
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
DB migrations are backwards-compatible / Test migrations with ${{ needs.check-migrations-changed.outputs.base_branch }} branch code (push) Has been cancelled
<!--
Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md
-->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Real-time AI search with project-scoped analytics and dynamic query
execution; streaming AI responses replace the placeholder flow.
* External DB sync adds ClickHouse support: users sync, sync metadata
tracking, tenancy-aware status, and per-mapping throttling.
* AI assistant UI shows expandable tool-invocation results and streams
via the real AI pipeline.
* **Chores**
* Dashboard dependencies and workspace exclusions updated; development
OpenAI env var added; editor config flag toggled.
* **Tests**
* E2E coverage extended to validate ClickHouse user sync and analytics
queries.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: aadesh18 <110230993+aadesh18@users.noreply.github.com>
Co-authored-by: Konsti Wohlwend <n2d4xc@gmail.com>
<img width="1920" height="969" alt="Screenshot 2026-02-04 at 9 47 16 AM"
src="https://github.com/user-attachments/assets/d7d0cd04-0051-4fc4-b857-e6f87ee97a59"
/>
**This PR revolves around the following components**
1. Sequencer - sequences the updates in the internal db
2. Poller - polls for the latest updates to sync with the external db
3. Outgoing Request Handler - essentially a trigger that can make http
requests based on a change in the internal db
4. Sync Engine - syncs with the latest changes from the internal db to
the external db
**What has been done**
- Added a global sequence id for ProjectUser, ContactChannel and
DeletedRow.
- Added the deletedRow table to keep track of the rows that were deleted
across ProjectUser and ContactChannel.
- Added the OutgoingRequest table to keep track of the outgoing requests
- Added function for the sequencer to call to sequence updates
- Added a sequencer that sequences all the changes in the internal db
every 50 ms
- Added a poller that polls for the latest changes in the internal db
every 50 ms, and adds to a queue
- Added a Vercel cron that calls sequencer and poller every minute
- Added a queue that fulfills the outgoing requests by making http calls
(for external db sync, it calls the sync engine endpoint)
- Added a sync engine that uses the defined sql mapping query in the
user's schema to pull in the changes for the user, and sync them with
the external db
- Added tests to test out each functionality
**How to review this PR:**
1. Review the migrations (sequence id, deletedRow, triggers, backlog
sync) (all files created under the migrations folder)
2. Review sequencer
3. Review poller
4. Review the changes in schema
5. Review sync-engine (the function, and it's helper file)
6. Review the schema changes, and query mappings
7. Review the tests (basic, advanced and race, along with the helper
file)
8. Review the changes made in Dockerfile to support local testing using
the postgres docker
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> Introduces a cron-driven external DB sync pipeline with global
sequencing, internal poller and webhook sync engine, new DB
tables/functions, config schema/mappings, and comprehensive e2e tests.
>
> - **Database (Prisma/Migrations)**:
> - Add global sequence (`global_seq_id`) and
`sequenceId`/`shouldUpdateSequenceId` to `ProjectUser`,
`ContactChannel`, `DeletedRow` with partial indexes.
> - Create `DeletedRow` (capture deletes) and `OutgoingRequest` (queue)
tables; add unique/indexes.
> - Add triggers/functions: `log_deleted_row`,
`reset_sequence_id_on_update`, `backfill_null_sequence_ids`,
`enqueue_tenant_sync`.
> - **Backend/API**:
> - New internal routes: `GET
/api/latest/internal/external-db-sync/sequencer`, `GET /poller`, `POST
/sync-engine` (Upstash-verified) for sync orchestration.
> - Add cron wiring: `vercel.json` schedules and local
`scripts/run-cron-jobs.ts`; start in dev via `dev` script.
> - Tweak route handler (remove noisy logging) without behavior change.
> - **Sync Engine**:
> - Implement `src/lib/external-db-sync.ts` to read tenant mappings and
upsert to external Postgres (schema bootstrap, param checks,
sequencing).
> - Add default mappings `DEFAULT_DB_SYNC_MAPPINGS` and config schema
`dbSync.externalDatabases` in shared config.
> - **Testing/Infra**:
> - Add extensive e2e tests (basics, advanced, race conditions) for
sequencing, idempotency, deletes, pagination, multi-mapping, and
permissions.
> - Docker compose: add `external-db-test` Postgres for tests; e2e deps
for `pg` types.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
3f2a8efcfb. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* External PostgreSQL sync: automatic, batched replication with
mappings, resume/idempotency, and on-demand enqueueing.
* **Admin UI**
* Real-time External DB Sync dashboard and status API showing
per-mapping backlog, sequencer/poller/sync-engine telemetry, and fusebox
controls.
* **Tests**
* Large e2e suite: basic, advanced, race, high-volume tests and test
utilities for external DB sync.
* **Chores**
* DB migrations, CI/workflow updates, background cron runner and
local/dev test support.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Konsti Wohlwend <n2d4xc@gmail.com>
Co-authored-by: Bilal Godil <bg2002@gmail.com>
<!--
Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md
-->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added a comprehensive payments data-integrity verifier, Stripe payout
reconciliation, API validation helpers, and a throttled progress utility
for long-running checks.
* **Bug Fixes**
* Improved subscription/product filtering to correctly respect customer
type during verification.
* **Chores**
* Reorganized verification scripts and updated the verification
entrypoint invocation.
* **Tests**
* Enhanced test fixtures to include full product data for subscriptions.
<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<img width="567" height="249" alt="Screenshot 2025-10-20 at 11 23 10 AM"
src="https://github.com/user-attachments/assets/340df844-f619-489f-8d41-cc26bc165018"
/>
<img width="595" height="255" alt="Screenshot 2025-10-20 at 11 24 00 AM"
src="https://github.com/user-attachments/assets/9321bda1-e6f0-4f53-8c6b-e29d0fc16038"
/>
<!--
Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md
-->
<!-- RECURSEML_SUMMARY:START -->
## High-level PR Summary
This PR optimizes the performance of user list and metrics endpoints by
refactoring SQL queries to use more efficient patterns. The changes
include rewriting queries to use `LATERAL` joins and CTEs with proper
filtering, extracting common user mapping logic into reusable functions,
and adding performance tests with SQL scripts to generate realistic test
data (10,000 mock users and activity events across 100 countries).
⏱️ Estimated Review Time: 30-90 minutes
<details>
<summary>💡 Review Order Suggestion</summary>
| Order | File Path |
|-------|-----------|
| 1 | `apps/e2e/tests/backend/performance/mock-users.sql` |
| 2 | `apps/e2e/tests/backend/performance/mock-metric-events.sql` |
| 3 | `apps/e2e/tests/backend/performance/users-list.test.ts` |
| 4 | `apps/backend/src/app/api/latest/users/crud.tsx` |
| 5 | `apps/backend/src/app/api/latest/internal/metrics/route.tsx` |
</details>
[](https://discord.gg/n3SsVDAW6U)
[
<!-- RECURSEML_SUMMARY:END -->
<!-- ELLIPSIS_HIDDEN -->
----
> [!IMPORTANT]
> Optimize metrics and user list endpoints with SQL refactoring,
caching, and performance tests, adding a `CacheEntry` model and mock
data scripts.
>
> - **Performance Optimization**:
> - Refactor SQL queries in `route.tsx` to use `LATERAL` joins and CTEs
for efficient data retrieval.
> - Implement caching in `route.tsx` using `getOrSetCacheValue()` to
reduce database load.
> - **Database Changes**:
> - Add `CacheEntry` model to `schema.prisma` and create corresponding
table and index in `migration.sql`.
> - Remove auto-migration metadata step from
`check-prisma-migrations.yaml`.
> - **Testing**:
> - Add performance tests in `metrics.test.ts` to benchmark metrics and
user endpoints.
> - Create mock data scripts `mock-users.sql` and
`mock-metric-events.sql` for testing with 10,000 users and events across
100 countries.
> - **Miscellaneous**:
> - Update `db-migrations.ts` to include new migration file generation
logic.
> - Add `cache.tsx` for caching logic implementation.
>
> <sup>This description was created by </sup>[<img alt="Ellipsis"
src="https://img.shields.io/badge/Ellipsis-blue?color=175173">](https://www.ellipsis.dev?ref=stack-auth%2Fstack-auth&utm_source=github&utm_medium=referral)<sup>
for 4d9be71063. You can
[customize](https://app.ellipsis.dev/stack-auth/settings/summaries) this
summary. It will automatically update as commits are pushed.</sup>
----
<!-- ELLIPSIS_HIDDEN -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Metrics now use a cache layer with per-entry TTL and tenancy-aware
loaders.
* **Bug Fixes**
* Improved accuracy of daily active and related metrics with
tenancy-aware counting and more robust last-active computation.
* **Performance**
* Faster metrics responses via batched reads and cache-backed endpoints.
* **Tests**
* Added end-to-end performance benchmarks and SQL seed scripts for
metrics/user load testing.
* **Chores**
* DB migration added support for cached entries; CI migration check flow
adjusted; migration tooling improved.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Konsti Wohlwend <n2d4xc@gmail.com>
<!--
Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md
-->
<!-- RECURSEML_SUMMARY:START -->
## High-level PR Summary
This PR implements a comprehensive renaming of "offer" to "product" and
"offer group" to "product catalog" throughout the codebase. The changes
include database migrations, schema updates, API compatibility layers,
function renames, and updates to client and server implementations.
Backwards compatibility is maintained through migration layers that
handle requests using the old terminology, translating them to the new
terminology before processing. The PR includes documentation of this
approach in CLAUDE-KNOWLEDGE.md. This rename affects multiple parts of
the system including the database schema, API endpoints, error types,
and SDK interfaces.
⏱️ Estimated Review Time: 1-3 hours
<details>
<summary>💡 Review Order Suggestion</summary>
| Order | File Path |
|-------|-----------|
| 1 |
`apps/backend/prisma/migrations/20250923191615_rename_offers_to_products/migration.sql`
|
| 2 |
`apps/backend/src/app/api/migrations/v2beta1/payments/purchases/offers-compat.ts`
|
| 3 |
`apps/backend/src/app/api/migrations/v2beta1/payments/purchases/create-purchase-url/route.ts`
|
| 4 |
`apps/backend/src/app/api/migrations/v2beta1/payments/purchases/validate-code/route.ts`
|
| 5 | `apps/backend/src/lib/payments.tsx` |
| 6 | `.claude/CLAUDE-KNOWLEDGE.md` |
| 7 | `packages/stack-shared/src/schema-fields.ts` |
| 8 | `packages/stack-shared/src/known-errors.tsx` |
| 9 | `packages/stack-shared/src/config/schema.ts` |
| 10 | `packages/template/src/lib/stack-app/customers/index.ts` |
| 11 |
`packages/template/src/lib/stack-app/apps/implementations/client-app-impl.ts`
|
| 12 |
`packages/template/src/lib/stack-app/apps/implementations/server-app-impl.ts`
|
</details>
[](https://discord.gg/n3SsVDAW6U)
<!-- RECURSEML_SUMMARY:END -->
<!-- ELLIPSIS_HIDDEN -->
----
> [!IMPORTANT]
> Renames 'offer' to 'product' and 'offer group' to 'product catalog'
across the codebase, updating database schema, API endpoints, and
application logic for consistency and backward compatibility.
>
> - **Database**:
> - Rename columns `offer` to `product` and `offerId` to `productId` in
`OneTimePurchase` and `Subscription` tables in `migration.sql`.
> - **API & Migrations**:
> - Update API endpoints to accept `product_id`/`product_inline` instead
of `offer_id`/`offer_inline`.
> - Add `v2beta5` compatibility layer to map legacy `offer` fields to
`product` equivalents.
> - **Shared Schemas**:
> - Rename `offerSchema` to `productSchema` and related schemas in
`schema-fields.ts`.
> - **Server Implementation**:
> - Update `createCheckoutUrl` method in `server-app-impl.ts` to use
`productId`/`InlineProduct`.
> - **Tests**:
> - Update tests to reflect renaming in `backend-helpers.ts` and other
test files.
> - **Miscellaneous**:
> - Remove dummy data related to offers in `dummy-data.tsx`.
> - Update documentation and comments to reflect terminology changes.
>
> <sup>This description was created by </sup>[<img alt="Ellipsis"
src="https://img.shields.io/badge/Ellipsis-blue?color=175173">](https://www.ellipsis.dev?ref=stack-auth%2Fstack-auth&utm_source=github&utm_medium=referral)<sup>
for e3227bcbd2. You can
[customize](https://app.ellipsis.dev/stack-auth/settings/summaries) this
summary. It will automatically update as commits are pushed.</sup>
----
<!-- ELLIPSIS_HIDDEN -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Backwards-compatibility: legacy offer_id/offer_inline requests are
accepted, normalized, and routed to product-based handlers.
* **Refactor**
* Global rename from Offer/Group → Product/Catalog across UI, APIs,
types, client/server interfaces, and error codes.
* **Bug Fixes**
* Responses, webhooks and UI consistently surface product_display_name
and product-related metadata.
* **Documentation**
* Migration notes and docs updated to explain compatibility and
parameter changes.
* **Tests**
* Unit and E2E suites updated to cover product/catalog flows.
* **Chores**
* Database schema migration, seed and config updates applied.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> Renames offers→products and groups→catalogs end-to-end (DB, APIs,
schemas, UI, SDK, docs), adding v2beta5 compatibility to accept legacy
offer fields while updating all internals.
>
> - **Backend/DB**:
> - Prisma migration: rename `offer`/`offerId`→`product`/`productId` in
`OneTimePurchase` and `Subscription`.
> - Update Stripe webhook, purchase-session, and internal test-mode
flows to use `product*` metadata/fields.
> - **API & Migrations**:
> - Latest endpoints now accept `product_id`/`product_inline`.
> - Add `v2beta5` compat layer mapping legacy `offer_id`/`offer_inline`
to product equivalents; responses alias conflicting products.
> - **Shared Schemas/Errors/Config**:
> - `offerSchema`→`productSchema`,
`inlineOfferSchema`→`inlineProductSchema`, prices/types renamed.
> - KnownErrors renamed (e.g., `PRODUCT_DOES_NOT_EXIST`).
> - Config: `groups`→`catalogs`, defaults/migrations updated; improved
override validation messages; ID regex loosened; formatter tweaks; add
schema fuzzer tests.
> - **Payments Lib**:
> - Rename APIs and logic (`offers`→`products`, `groupId`→`catalogId`),
subscription and item-quantity computation updated.
> - **Dashboard/UI**:
> - Routes, dialogs, editors, tables, and code samples switched to
products/catalogs; removed offers dummy data.
> - **SDK/Template**:
> - Client/server `createCheckoutUrl` now uses
`productId`/`InlineProduct`.
> - **Tests/Docs/Utilities**:
> - E2E and unit tests updated; add legacy (pre-rename) tests.
> - Docs and knowledge base revised; minor script tweaks (recent-first,
limits).
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
e6e20ecd72. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: BilalG1 <bg2002@gmail.com>
<!--
Make sure you've read the CONTRIBUTING.md guidelines:
https://github.com/stack-auth/stack-auth/blob/dev/CONTRIBUTING.md
-->
<!-- ELLIPSIS_HIDDEN -->
----
> [!IMPORTANT]
> Introduces an automated database migration system, replacing manual
Prisma commands with new scripts and updating workflows, configurations,
and tests accordingly.
>
> - **Auto-Migration System**:
> - Introduces `db-migrations.ts` script for handling database
migrations automatically.
> - Adds utility functions in `utils.tsx` for managing migration files.
> - Implements `applyMigrations` and `runMigrationNeeded` in `index.tsx`
for executing migrations.
> - **Workflow and Scripts**:
> - Updates GitHub workflows (`check-prisma-migrations.yaml`,
`e2e-api-tests.yaml`) to use new migration commands.
> - Replaces `prisma migrate` commands with `db:init`, `db:migrate`,
etc., in `package.json` and `README.md`.
> - **Testing**:
> - Adds `auto-migration.tests.ts` for testing migration logic and
concurrency handling.
> - **Configuration**:
> - Updates `.env.development` and `vitest.config.ts` for new
environment variables and paths.
> - Modifies `turbo.json` and `package.json` to include new migration
tasks and scripts.
>
> <sup>This description was created by </sup>[<img alt="Ellipsis"
src="https://img.shields.io/badge/Ellipsis-blue?color=175173">](https://www.ellipsis.dev?ref=stack-auth%2Fstack-auth&utm_source=github&utm_medium=referral)<sup>
for 2c24183879. You can
[customize](https://app.ellipsis.dev/stack-auth/settings/summaries) this
summary. It will automatically update as commits are pushed.</sup>
<!-- ELLIPSIS_HIDDEN -->
---------
Co-authored-by: Konsti Wohlwend <n2d4xc@gmail.com>
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>