stack/docker/local-emulator/qemu/cloud-init/emulator/user-data
BilalG1 57ff5d3ce9
feat(hexclave): PR 2 — visible rebrand (Hexclave brand goes public) (#1481)
## Summary

**Stacked on [#1475](https://github.com/hexclave/stack-auth/pull/1475)**
(`cl/hexclave-pr1`, the invisible compatibility layer). Diff vs that
base = the actual PR 2 code.

This is **PR 2 of the Stack Auth → Hexclave rebrand: the visible flip**.
Old wire identifiers (cookies, request/response headers, Bearer prefix,
JWT issuers, MCP tool name) keep working indefinitely via PR 1's
dual-accept. This PR flips every user-visible surface — package names
taught in docs, SDK class names in code examples, dashboard setup
snippets, page titles, error messages, email content, CLI binary,
default base URLs, GitHub repo slug, contributor guidance — to the
Hexclave brand.

See [`RENAME-TO-HEXCLAVE.md`](./RENAME-TO-HEXCLAVE.md) → *"PR 2: Rebrand
to Hexclave (visible)"* for the full per-work-area spec.
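
As a rough illustration of what header dual-accept could look like, here is a minimal sketch. It is not PR 1's actual shim: only the `x-stack-*` and `x-hexclave-*` prefixes come from this description, while the function name `normalizeHeaders` and the "canonical wins on conflict" rule are assumptions made for the example.

```typescript
// Hypothetical sketch of dual-accept header normalization.
// Only the two prefixes are taken from the PR text; everything else is assumed.
const LEGACY_PREFIX = "x-stack-";
const CANONICAL_PREFIX = "x-hexclave-";

function normalizeHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  // First pass: copy every non-legacy header, lowercasing the name.
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (!lower.startsWith(LEGACY_PREFIX)) out[lower] = value;
  }
  // Second pass: map legacy names onto canonical ones, unless the caller
  // already sent the canonical header (canonical wins in this sketch).
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (lower.startsWith(LEGACY_PREFIX)) {
      const canonical = CANONICAL_PREFIX + lower.slice(LEGACY_PREFIX.length);
      if (!(canonical in out)) out[canonical] = value;
    }
  }
  return out;
}
```

Under these assumptions, a request carrying only `X-Stack-Publishable-Client-Key` reaches downstream code as `x-hexclave-publishable-client-key`, so handlers only ever need to read the canonical name.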

## What's implemented (per the plan's PR 2 scope)

- **SDK base URLs** flipped: `defaultBaseUrl` and
`defaultAnalyticsBaseUrl` in
[common.ts](packages/template/src/lib/stack-app/apps/implementations/common.ts:127)
→ `https://api.hexclave.com` / `https://r.hexclave.com`. PR 1's
[`getHardcodedFallbackUrls`](packages/stack-shared/src/utils/urls.tsx:199)
table now keys on the Hexclave domain.

- **Domain inventory sweep** (16 subdomains from the plan): every
`api/app/docs/discord/demo/mcp/skill/feedback/test/preview/r/api2/api.staging/idp-jwk-audience/built-with.stack-auth.com`
reference in production code, docs-mintlify, examples, READMEs, and
contributor guidance flipped to `*.hexclave.com`. Carve-outs: PR 1's
intentional JWT issuer dual-accept table in
[tokens.tsx](apps/backend/src/lib/tokens.tsx), the legacy `./docs/`
folder, the `unified-docs-widget` allowlist (deliberately accepts both
during DNS transition), and `url-targets.ts` hosted-component default
(baked into existing customer deploys).

- **`@deprecated` JSDoc** on every `Stack*` public export
([packages/template/src/lib/stack-app/index.ts](packages/template/src/lib/stack-app/index.ts)
+ [packages/template/src/index.ts](packages/template/src/index.ts)) —
`StackClientApp`, `StackServerApp`, `StackAdminApp` + every
constructor/options/JSON type, `StackHandler`, `StackProvider`,
`StackTheme`, `useStackApp`, `defineStackConfig`, `StackConfig`.
Hexclave\* aliases are now canonical.

- **Runtime `console.warn`**
([packages/template/src/internal/deprecation-warning.ts](packages/template/src/internal/deprecation-warning.ts))
— once-per-process when the SDK is loaded from a `@stackframe/*`
artifact. Detection uses the existing
`STACK_COMPILE_TIME_CLIENT_PACKAGE_VERSION_SENTINEL` (rewritten at build
time to e.g. `js @stackframe/stack@2.8.92` or `js
@hexclave/next@1.0.0`); `@hexclave/*` mirror artifacts short-circuit the
warning.

- **Tier 3 data migration**: new idempotent SQL migration
[`20260523000000_rename_internal_project_to_hexclave`](apps/backend/prisma/migrations/20260523000000_rename_internal_project_to_hexclave/migration.sql)
— updates the internal Project `displayName` 'Stack Dashboard' →
'Hexclave Dashboard' and `description` only if both still hold the
pre-rebrand defaults. Operator-renamed projects untouched, missing row
no-ops, re-runs are no-ops. [`seed.ts`](apps/backend/prisma/seed.ts:87)
default flipped. `getSharedEmailConfig("Stack Auth")` → `("Hexclave")`.

- **Tier 4 brand strings** (mechanical sweep, ~340 files):
  - Page + OpenAPI titles (Hexclave API / Dashboard / REST API / Webhooks
    API / Documentation). OpenAPI `info.description` documents
    `X-Hexclave-*` headers as canonical with compat note on `X-Stack-*`.
  - `HexclaveAssertionError` message text
    ([errors.tsx:71](packages/stack-shared/src/utils/errors.tsx:71)): "an
    error in Stack." → "an error in Hexclave."
  - Known-error message templates
    ([known-errors.tsx](packages/stack-shared/src/known-errors.tsx)) flipped
    to lead with `x-hexclave-*` + the new `docs.hexclave.com` URL; legacy
    `x-stack-*` mentioned as compat aliases. **25 e2e test files updated in
    lockstep**.
  - Email content: failed-emails-digest body, sendTestEmail recipient (now
    `sent-with-hexclave.com`), test-email-recipient default.
  - `CHANGELOG.md` title → "Hexclave Changelog".
  - `AGENTS.md` env var convention: new vars prefix `HEXCLAVE_` /
    `NEXT_PUBLIC_HEXCLAVE_` for Category A/B; legacy `STACK_*` explicitly
    noted as accepted via PR 1's dual-read.

- **CLI / init wizard**:
  - Every dashboard setup snippet, init-stack template, and docs-mintlify
    page teaches `npx @hexclave/cli@latest init` (was
    `@stackframe/stack-cli`).
    [setup-page.tsx](apps/dashboard/src/app/(main)/(protected)/projects/[projectId]/(overview)/setup-page.tsx)
    +
    [link-existing-onboarding](apps/dashboard/src/app/(main)/(protected)/(outside-dashboard)/new-project/page-client-parts/link-existing-onboarding.tsx).
  - [init-stack](packages/init-stack/src/index.ts:634)
    `STACK_*_INSTALL_PACKAGE_NAME_OVERRIDE` defaults flipped to
    `@hexclave/*`.
  - Generated `stack/client.ts` / `stack/server.ts` import from
    `@hexclave/next` and reference `HexclaveClientApp` /
    `HexclaveServerApp`.
  - Internal `StackAuthKeys` dashboard component renamed to
    `HexclaveKeys`.

- **docs-mintlify rewrite** (legacy `./docs/` intentionally untouched
  per scoping decision):
  - **78 MDX files swept**.
    `@stackframe/{react,stack,js,tanstack-start,...}` →
    `@hexclave/{react,stack,js,...}` in install snippets and code blocks;
    `Stack*` SDK class names → `Hexclave*` in all code examples; 'Stack
    Auth' brand phrase → 'Hexclave'.
  - `openapi/{server,admin,client,webhooks}.json` titles → 'Hexclave REST
    API' / 'Hexclave Webhooks API'.

- **Generators flipped before regeneration**:
  - [`packages/stack-shared/src/helpers/init-prompt.ts`](packages/stack-shared/src/helpers/init-prompt.ts),
    [`/ai/prompts.ts`](packages/stack-shared/src/ai/prompts.ts),
    [`apps/backend/src/lib/ai/prompts.ts`](apps/backend/src/lib/ai/prompts.ts),
    [`apps/backend/src/lib/ai/tools/create-email-{template,draft}.ts`](apps/backend/src/lib/ai/tools/create-email-template.ts),
    [`apps/skills/src/app/route.ts`](apps/skills/src/app/route.ts) (taught
    MCP tool → `ask_hexclave` with compat note; CLI binary teach →
    `hexclave`),
    [`docs-mintlify/snippets/home-prompt-island.jsx`](docs-mintlify/snippets/home-prompt-island.jsx),
    [`packages/template/README.md`](packages/template/README.md) +
    integrations/convex/component/README.md.
  - `generate-sdks` propagated changes to `packages/{react,stack,js}`.

- **OpenAPI dual-documentation**:
[`apps/backend/src/app/api/latest/route.ts`](apps/backend/src/app/api/latest/route.ts)
now lists `X-Hexclave-*` headers as primary documented schemas with
`X-Stack-*` duplicates marked `.optional()` (both accepted at runtime by
PR 1's normalize-at-proxy shim).

- **`@stackframe/emails` virtual module**: dual-aliased to
`@hexclave/emails` at the bundler boundary
([email-rendering.tsx:89](apps/backend/src/lib/email-rendering.tsx:89)).
Stored email templates continue to import from either name; new
AI-generated templates and the system prompt teach `@hexclave/emails`.

- **Tier 2 mirror-publish wiring** (new this PR, lays the groundwork for
  `@hexclave/*` first publish):
  - [`scripts/rewrite-packages-to-hexclave.ts`](scripts/rewrite-packages-to-hexclave.ts):
    rewrites 9 publishable `@stackframe/*` → `@hexclave/*` `package.json`
    files (reads `HEXCLAVE_VERSION` env or `--version=` flag), pins
    cross-deps to the shared `@hexclave` version, registers `hexclave` bin
    alongside `stack` for `@hexclave/cli`.
  - [`.github/workflows/npm-publish.yaml`](.github/workflows/npm-publish.yaml)
    appended with rewrite-then-republish step. `pnpm publish` skips
    already-on-npm versions so reruns are safe.

- **Sender email domain**: `noreply@stackframe.co` →
`noreply@sent-with-hexclave.com` (the dedicated transactional-sender
domain split per the plan, to isolate bulk deliverability from
`hexclave.com` reputation); `security@` / `team@stack-auth.com` inbound
mailboxes → `@hexclave.com`.

- **Self-host docs**: docker network / container names in the bash
examples flipped from `stack-auth` to `hexclave` (`hexclave-postgres`,
`hexclave-clickhouse`, `hexclave.env`). The docker image tag
`stackauth/server:latest` stays per the plan's locked decision.

- **GitHub repo slug**: `hexclave/stack-auth` → `hexclave/hexclave` in
every `package.json` `repository` field, README link, CHANGELOG
raw-asset URL.
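
To make the once-per-process deprecation warning from the list above more concrete, the following is a minimal sketch and not the contents of `deprecation-warning.ts`. The sentinel format (`js <package>@<version>`), the two npm scopes, and the migration URL are taken from this description; the function names, the module-level flag, and the warning wording are assumptions.

```typescript
// Hypothetical sketch of a once-per-process deprecation warning.
// Module-level flag: flips to true after the first warning is emitted.
let alreadyWarned = false;

function isLegacyArtifact(sentinel: string): boolean {
  // Sentinel is assumed to look like "js @stackframe/stack@2.8.92"
  // or "js @hexclave/next@1.0.0" after the build-time rewrite.
  const packageSpec = sentinel.split(" ")[1] ?? "";
  return packageSpec.startsWith("@stackframe/");
}

function warnIfLegacy(
  sentinel: string,
  warn: (message: string) => void = console.warn,
): boolean {
  // @hexclave/* artifacts short-circuit; repeat calls are silent.
  if (alreadyWarned || !isLegacyArtifact(sentinel)) return false;
  alreadyWarned = true;
  warn(
    "This package has moved to the @hexclave scope. " +
      "See https://docs.hexclave.com/migration (message wording is illustrative).",
  );
  return true;
}
```

Taking the `warn` callback as a parameter is only a convenience for testing the sketch; the PR itself describes a plain `console.warn`.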

## Carve-outs (deliberately untouched)

- **[`apps/backend/src/lib/tokens.tsx`](apps/backend/src/lib/tokens.tsx)**
JWT issuer dual-accept table (PR 1 intentional infrastructure, kept
indefinitely).
- **Legacy `./docs/` folder** — per scoping decision (only
`docs-mintlify/` rewritten).
- **`unified-docs-widget` hostname allowlist** — accepts both
`.hexclave.com` (canonical) and `.stack-auth.com` (transition window)
for DNS rollout.
- **`url-targets.ts`** hosted-domain default
`.built-with-stack-auth.com` — wire identifier baked into existing
customer deploys; indefinite read-fallback.
- **Binary visual assets** (logos, favicons, OG images, README
screenshots) — out of scope for this PR. Need design work; tracked
separately.
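
The dual-domain allowlist carve-out can be pictured with a small suffix check. This is a sketch under assumptions: the two suffixes come from the list above, but the helper name `isAllowedHost` and the exact matching rule used by `unified-docs-widget` are not confirmed by this description.

```typescript
// Hypothetical sketch of a hostname allowlist that accepts both brands.
// The leading dot in each suffix keeps lookalikes such as
// "evilhexclave.com" from matching.
const ALLOWED_SUFFIXES = [".hexclave.com", ".stack-auth.com"];

function isAllowedHost(hostname: string): boolean {
  const host = hostname.toLowerCase();
  return ALLOWED_SUFFIXES.some((suffix) => host.endsWith(suffix));
}
```

Note that this sketch rejects the bare apex (`hexclave.com`) because it only matches subdomains; whether the real allowlist does the same is not stated here.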

## Verification

- **`pnpm typecheck`** on
`packages/{template,stack-shared,react,stack,js}` + `apps/dashboard`:
**all green**. The remaining backend / e-commerce-demo typecheck errors
are pre-existing (Prisma codegen output +
`./generated/api-versions.json` not present in fresh worktrees without
`pnpm run codegen-prisma` + a live DB) and unrelated to this diff.
- **`pnpm lint`** on the same 6 packages: all green.
- **Final grep** for residual `Stack Auth` / `stack-auth.com` /
`@stackframe/stack-cli@latest` references: zero outside the intentional
carve-outs above.
- **25 e2e test files updated in lockstep** with the known-error message
changes (asserted strings flipped to match the new x-hexclave-* +
compat-note messages).

## Deploy blockers (ops sequencing before this rebrand goes live)

This PR is code-complete, but the rebrand's visible surfaces (SDK
default URLs, dashboard links, npm READMEs, REST error messages, runtime
deprecation warning) all point at `*.hexclave.com` / `@hexclave/*`
resources that don't exist yet. None of these are fixable from a PR —
they're ops/registrar/npm work that has to be sequenced before merging
this to a release tag.

Suggested ordering, hardest blockers first:

### Tier 1: required before customer-facing deploy (everything below this line *will visibly break customers on day 1* if skipped)

1. **DNS + TLS for `api.hexclave.com` + `api1./api2.hexclave.com`** →
must point at the same backend that serves `api.stack-auth.com` (or a
backend that mirrors PR 1's dual-accept). The SDK's new `defaultBaseUrl`
is `https://api.hexclave.com`; every customer that relied on the old
default and upgrades to a post-PR2 SDK build sends API requests here.
Until this resolves, every default-config customer's API call NXDOMAINs.
2. **DNS for `app.hexclave.com`** → the dashboard. Referenced in the
SDK's default-error messages ("Please create a project on the Hexclave
dashboard at https://app.hexclave.com"), the init-stack flow's
`wizard-congrats` redirect, and the OAuth dashboard handoff.
3. **DNS for `docs.hexclave.com`** + Mintlify deploy → the SDK runtime
deprecation warning (`https://docs.hexclave.com/migration`), every
README, every "Learn more" link in the dashboard, and every REST API
error body (`/api/overview#authentication`) points here. The MDX is in
this PR; the docs build target needs DNS.
4. **DNS for `mcp.hexclave.com`** → the MCP server endpoint that every
taught agent integration (`claude mcp add ...`, `cursor`, `codex`,
`vscode`) registers. Until this resolves, every `npx
@hexclave/cli@latest init` MCP-registration step fails.
5. **Reserve the `@hexclave` npm scope + set repo variable
`HEXCLAVE_VERSION`** → the mirror-publish step in
`.github/workflows/npm-publish.yaml` is gated on this variable. Without
it, the entire taught onboarding command `npx @hexclave/cli@latest init`
404s from the npm registry, *and* every README that says "install
`@hexclave/next`" leads to install failure. Pick the initial version
intentionally (`1.0.0` or aligned to `@stackframe/stack`); don't accept
a silent default.
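
Item 5 argues against a silent default version. A minimal sketch of that idea, combined with the scope rewrite, might look like the following. It is not the actual `rewrite-packages-to-hexclave.ts`: the `HEXCLAVE_VERSION` variable and `--version=` flag are from this description, while the flag-over-env precedence, the plain one-to-one scope swap, and the function names are assumptions.

```typescript
// Hypothetical sketch: resolve the publish version explicitly, then
// rewrite one package.json object from @stackframe/* to @hexclave/*.
type PackageJson = {
  name: string;
  version: string;
  dependencies?: Record<string, string>;
};

function resolveVersion(argv: string[], env: Record<string, string | undefined>): string {
  const flag = argv.find((arg) => arg.startsWith("--version="));
  // Assumed precedence: the flag overrides the environment variable.
  const version = flag ? flag.slice("--version=".length) : env.HEXCLAVE_VERSION;
  if (!version) {
    throw new Error("Set HEXCLAVE_VERSION or pass --version=; refusing to pick a default");
  }
  return version;
}

function rewritePackage(pkg: PackageJson, version: string): PackageJson {
  const rename = (name: string) => name.replace(/^@stackframe\//, "@hexclave/");
  const dependencies: Record<string, string> = {};
  for (const [dep, range] of Object.entries(pkg.dependencies ?? {})) {
    const renamed = rename(dep);
    // Pin dependencies that moved scopes to the shared version;
    // third-party ranges are left untouched.
    dependencies[renamed] = renamed !== dep ? version : range;
  }
  return { ...pkg, name: rename(pkg.name), version, dependencies };
}
```

A caller would typically pass `process.argv.slice(2)` and `process.env`; failing loudly when neither source is set matches the "pick the initial version intentionally" guidance above.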

### Tier 2: required before announcing the rebrand publicly (lookalike or low-traffic surfaces, but visibly broken)

6. **DNS for `r.hexclave.com`** → the analytics beacon
`defaultAnalyticsBaseUrl`. Silent failure if missing (analytics drops),
but should land alongside Tier 1.
7. **Register `sent-with-hexclave.com` + full email auth (SPF / DKIM /
DMARC)** → the new default sender domain for shared-sender transactional
emails. Without it the dashboard "send test email" path emits bounces,
and shared-sender flows (`getSharedEmailConfig("Hexclave")`) deliver to
spam at best.
8. **MX + SPF / DMARC for `hexclave.com`** → `team@hexclave.com` and
`security@hexclave.com` mailboxes. The security disclosure mailbox is
referenced in [`.github/SECURITY.md`](.github/SECURITY.md);
`team@hexclave.com` is the actual recipient of internal feedback emails
sent at runtime by
[`apps/backend/src/lib/internal-feedback-emails.tsx`](apps/backend/src/lib/internal-feedback-emails.tsx).
Today, every runtime feedback email bounces.
9. **DNS for `skill.hexclave.com`** → the canonical AI-agent skill fetch
URL (the agent bootstrap pivot). Without it, the entire "agent downloads
`SKILL.md` from a known URL" flow taught in
[`packages/stack-shared/src/helpers/init-prompt.ts`](packages/stack-shared/src/helpers/init-prompt.ts)
fails.
10. **Create `github.com/hexclave/hexclave` as a public repo** (even as
a redirect to `hexclave/stack-auth`) **OR** rewrite every `package.json`
`"repository"` field + dashboard footer "view on GitHub" link to point
at `hexclave/stack-auth` (which already exists). Currently every npm
package page's "Repository" link is dead, and the dashboard's GitHub
button + dev-tool repo link are dead.

### Tier 3 — broken but low-visibility / low-traffic

11. **DNS for `discord.hexclave.com`** → Discord invite redirect, used
in every README's chip and the dashboard footer.
12. **DNS for `demo.hexclave.com`** → the "Demo" badge in every npm
package README. Until it resolves, the badge renders as a broken image
on the package page.
13. **DNS + TLS for `built-with-hexclave.com`** → optional
hosted-handler domain (the default reverted to
`.built-with-stack-auth.com` in this PR's carve-outs, so this only
matters for projects that manually flip).
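
Most of the blockers in the three tiers above are DNS records, so a pre-deploy check could simply report which hostnames do not resolve yet. The sketch below is an assumption about how such a check might be written, not something this PR ships; `findUnresolved` and the injected `Resolver` type are hypothetical names.

```typescript
// Hypothetical pre-deploy check: return the hostnames that fail to resolve.
// The resolver is injected so the function can run without network access.
type Resolver = (hostname: string) => Promise<unknown>;

async function findUnresolved(hosts: string[], resolve: Resolver): Promise<string[]> {
  const results = await Promise.all(
    hosts.map(async (host) => {
      try {
        await resolve(host);
        return null; // resolved
      } catch {
        return host; // e.g. NXDOMAIN
      }
    }),
  );
  // Keep only the failures, preserving input order.
  return results.filter((host): host is string => host !== null);
}
```

In a real script the resolver could be `(h) => lookup(h)` using `lookup` from `node:dns/promises`, called with the Tier 1 hostnames (`api.hexclave.com`, `app.hexclave.com`, `docs.hexclave.com`, `mcp.hexclave.com`) before the release is tagged.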

## Other follow-ups (not deploy-blocking)

- **E2E snapshot regen across the full suite** for the dual-emitted
`x-hexclave-*` response headers (PR 1 follow-up; `vitest -u` in CI
absorbs).
- **Binary visual assets** — logos, favicons, OG images, README
screenshots; need design pass.
- **Backend OpenAPI fumadocs regen** in CI flow — the JSON files in
`docs-mintlify/openapi/` are committed but regen runs in CI. Verify the
workflow that does this still works against the post-PR2 source.
- **Backend typecheck infra debt** — needs `codegen-prisma` +
`codegen-route-info` to clear; pre-existing, unaffected by this PR.

## Test plan

- [ ] CI runs full e2e suite (with `vitest -u` to absorb residual
snapshot deltas, then committed back).
- [ ] Spot-check: new `@hexclave/cli init` (once published) generates
`hexclave.config.ts` and works against a fresh project.
- [ ] Spot-check: existing customer with `@stackframe/stack` import sees
the once-per-process `console.warn` recommending `@hexclave/next` on SDK
init.
- [ ] Manual: dashboard setup page renders the `npx @hexclave/cli@latest
init` snippet and the `x-hexclave-publishable-client-key` API header in
the curl example.
- [ ] Manual: a fresh `pnpm run prisma migrate` against a clean DB sets
the internal project displayName to 'Hexclave Dashboard'.

---------

Co-authored-by: Konstantin Wohlwend <n2d4xc@gmail.com>
2026-05-26 19:18:20 -07:00


#cloud-config
hostname: stack-emulator
manage_etc_hosts: true
users:
- name: stack
shell: /bin/bash
sudo: ALL=(ALL) NOPASSWD:ALL
lock_passwd: false
chpasswd:
list: |
root:stack-emulator
stack:stack-emulator
expire: false
ssh_pwauth: false
package_update: true
package_upgrade: false
packages:
- docker.io
- ca-certificates
- curl
- netcat-openbsd
- qemu-guest-agent
write_files:
- path: /usr/local/bin/install-emulator-containers
permissions: '0755'
content: |
#!/bin/bash
set -euo pipefail
mkdir -p /mnt/stack-bundle
bundle_device="$(readlink -f /dev/disk/by-label/STACKBUNDLE)"
mount -o ro "$bundle_device" /mnt/stack-bundle
systemctl enable --now docker
until docker info >/dev/null 2>&1; do sleep 1; done
gzip -dc /mnt/stack-bundle/img.tgz | docker load
if [ -f /mnt/stack-bundle/build.env ]; then
cp /mnt/stack-bundle/build.env /etc/stack-build.env
fi
# build-arch.env lets the guest skip the smoke test on cross-arch TCG.
if [ -f /mnt/stack-bundle/build-arch.env ]; then
cp /mnt/stack-bundle/build-arch.env /etc/stack-build-arch.env
fi
- path: /usr/local/bin/render-stack-env
permissions: '0755'
content: |
#!/bin/bash
set -euo pipefail
mkdir -p /mnt/stack-runtime /run/stack-auth /var/lib/stack-auth
runtime_device="$(readlink -f /dev/disk/by-label/STACKCFG)"
mountpoint -q /mnt/stack-runtime || mount -o ro "$runtime_device" /mnt/stack-runtime
set -a
source /mnt/stack-runtime/runtime.env
source /mnt/stack-runtime/base.env
set +a
# Generate and persist the internal-project keys on first boot; reuse
# across container restarts so the dashboard keeps its internal-project
# session. Reset via `stack emulator reset`.
#
# pck: used by stack-cli to auth against /api/v1/internal/local-emulator/project
# ssk/sak: required by the emulator's own dashboard (StackServerApp
# construction throws without them). Not used by user-app flows; the
# /local-emulator/project route mints separate per-project credentials.
#
# Snapshot-build mode (STACK_EMULATOR_BUILD_SNAPSHOT=1 in /etc/stack-build.env):
# use deterministic placeholder hex strings instead of random values. The
# built image then contains these placeholders; at every `emulator start`
# resume the host generates fresh per-install secrets and
# /usr/local/bin/rotate-secrets (inside the stack container) swaps them in.
umask 077
if [ -f /etc/stack-build.env ] && grep -q '^STACK_EMULATOR_BUILD_SNAPSHOT=1' /etc/stack-build.env 2>/dev/null; then
printf '%s' '00000000000000000000000000000000ffffffffffffffffffffffffffffffff' > /var/lib/stack-auth/internal-pck
printf '%s' '00000000000000000000000000000000eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee' > /var/lib/stack-auth/internal-ssk
printf '%s' '00000000000000000000000000000000dddddddddddddddddddddddddddddddd' > /var/lib/stack-auth/internal-sak
else
for key in internal-pck internal-ssk internal-sak; do
if [ ! -s "/var/lib/stack-auth/$key" ]; then
openssl rand -hex 32 > "/var/lib/stack-auth/$key"
fi
done
fi
INTERNAL_PCK="$(cat /var/lib/stack-auth/internal-pck)"
INTERNAL_SSK="$(cat /var/lib/stack-auth/internal-ssk)"
INTERNAL_SAK="$(cat /var/lib/stack-auth/internal-sak)"
# Container-local dependencies run on localhost. Host-only development
# services (such as the OAuth mock server) are reachable via the QEMU
# user-network host alias.
DEPS_HOST=127.0.0.1
HOST_SERVICES_HOST=10.0.2.2
P="$STACK_EMULATOR_PORT_PREFIX"
# Snapshot-build mode: ship a deterministic placeholder CRON_SECRET so the
# baked VM contains a known-public value that rotate-secrets swaps out on
# every resume. Outside snapshot-build mode, leave CRON_SECRET unset so
# docker/local-emulator/entrypoint.sh generates a fresh random one.
EMULATOR_CRON_SECRET=""
if [ -f /etc/stack-build.env ] && grep -q '^STACK_EMULATOR_BUILD_SNAPSHOT=1' /etc/stack-build.env 2>/dev/null; then
EMULATOR_CRON_SECRET="00000000000000000000000000000000cccccccccccccccccccccccccccccccc"
fi
{
# Static vars from base config and runtime (e.g. API keys, feature flags)
cat /mnt/stack-runtime/base.env
cat /mnt/stack-runtime/runtime.env
printf 'STACK_INTERNAL_PROJECT_PUBLISHABLE_CLIENT_KEY=%s\n' "$INTERNAL_PCK"
printf 'STACK_INTERNAL_PROJECT_SECRET_SERVER_KEY=%s\n' "$INTERNAL_SSK"
printf 'STACK_SEED_INTERNAL_PROJECT_SUPER_SECRET_ADMIN_KEY=%s\n' "$INTERNAL_SAK"
if [ -n "$EMULATOR_CRON_SECRET" ]; then
printf 'CRON_SECRET=%s\n' "$EMULATOR_CRON_SECRET"
fi
# Computed vars — depend on port prefix or deps host
# Host-side ports (for browser URLs — browser runs on host, not in VM)
HP_BACKEND="$STACK_EMULATOR_BACKEND_HOST_PORT"
HP_DASHBOARD="$STACK_EMULATOR_DASHBOARD_HOST_PORT"
HP_MINIO="$STACK_EMULATOR_MINIO_HOST_PORT"
HP_INBUCKET="$STACK_EMULATOR_INBUCKET_HOST_PORT"
# Mock OAuth binds to this port inside the VM and the host forwards the
# same port through, so the OIDC issuer URL is reachable identically
# from the browser and from the backend. Falls back to ${P}14 for
# older ISOs that don't set it.
HP_MOCK_OAUTH="${STACK_EMULATOR_MOCK_OAUTH_HOST_PORT:-${P}14}"
cat <<COMPUTED
STACK_SKIP_MIGRATIONS=true
STACK_SKIP_SEED_SCRIPT=true
NEXT_PUBLIC_HEXCLAVE_PORT_PREFIX=${P}
STACK_RUNTIME_WORK_DIR=/app
STACK_LOCAL_EMULATOR_HOST_MOUNT_ROOT=/host
NEXT_PUBLIC_STACK_API_URL=http://localhost:${HP_BACKEND}
NEXT_PUBLIC_STACK_DASHBOARD_URL=http://localhost:${HP_DASHBOARD}
NEXT_PUBLIC_BROWSER_STACK_API_URL=http://localhost:${HP_BACKEND}
NEXT_PUBLIC_BROWSER_STACK_DASHBOARD_URL=http://localhost:${HP_DASHBOARD}
NEXT_PUBLIC_SERVER_STACK_API_URL=http://127.0.0.1:${P}02
NEXT_PUBLIC_SERVER_STACK_DASHBOARD_URL=http://127.0.0.1:${P}01
NEXT_PUBLIC_STACK_SVIX_SERVER_URL=http://localhost:${HP_BACKEND}
STACK_DATABASE_CONNECTION_STRING=postgres://postgres:PASSWORD-PLACEHOLDER--uqfEC1hmmv@${DEPS_HOST}:5432/stackframe
STACK_EMAIL_HOST=${DEPS_HOST}
STACK_SVIX_SERVER_URL=http://${DEPS_HOST}:8071
STACK_S3_ENDPOINT=http://${DEPS_HOST}:9090
STACK_S3_PUBLIC_ENDPOINT=http://localhost:${HP_MINIO}/stack-storage
STACK_QSTASH_URL=http://${DEPS_HOST}:8080
STACK_CLICKHOUSE_URL=http://${DEPS_HOST}:8123
STACK_EMAIL_MONITOR_VERIFICATION_CALLBACK_URL=http://localhost:${HP_DASHBOARD}/handler/email-verification
STACK_EMAIL_MONITOR_INBUCKET_API_URL=http://${DEPS_HOST}:9001
STACK_OAUTH_MOCK_URL=http://localhost:${HP_MOCK_OAUTH}
STACK_OAUTH_MOCK_PORT=${HP_MOCK_OAUTH}
STACK_FREESTYLE_API_ENDPOINT=http://${DEPS_HOST}:8180
STACK_STRIPE_MOCK_PORT=12111
NEXT_PUBLIC_STACK_STRIPE_PUBLISHABLE_KEY=pk_test_mock_publishable_key_for_local_emulator
BACKEND_PORT=${P}02
DASHBOARD_PORT=${P}01
COMPUTED
} > /run/stack-auth/local-emulator.env
- path: /usr/local/bin/mount-host-fs
permissions: '0755'
content: |
#!/bin/bash
# Mount the host filesystem at /host. Two modes:
# (no args) — cold-boot: bind /host on itself, make it a shared
# mount point, then mount virtio-9p on top. The
# bind+shared step is what lets the docker bind
# mount (-v /host:/host:rshared) receive later
# propagation events.
# --post-resume — snapshot-resume: /host is already shared (set up
# at build time and preserved across the snapshot,
# plus the docker bind mount has rshared
# propagation). The host has just hot-plugged
# virtio-9p; mount it on /host and the new mount
# propagates into the running container.
set -uo pipefail
mkdir -p /host
# Idempotent: bind /host on itself once so it becomes a mount point
# with its own propagation, then make it shared. mount --make-shared
# requires a mount point, hence the bind first.
if ! mountpoint -q /host; then
mount --bind /host /host
fi
mount --make-shared /host
if [ "${1:-}" = "--post-resume" ]; then
if mount -t 9p -o trans=virtio,version=9p2000.L hostfs /host; then
exit 0
fi
echo "post-resume 9p mount failed" >&2
exit 1
fi
# Cold boot. In snapshot-build mode the host detaches virtfs (QEMU
# disallows migration while it's mounted), so the 9p mount may not be
# available — tolerate that and fall through to an empty /host.
if mount -t 9p -o trans=virtio,version=9p2000.L hostfs /host 2>/dev/null; then
exit 0
fi
echo "host filesystem unavailable; continuing with empty /host" >&2
exit 0
- path: /usr/local/bin/run-stack-container
permissions: '0755'
content: |
#!/bin/bash
set -euo pipefail
/usr/local/bin/mount-host-fs
/usr/local/bin/render-stack-env
# Publish the internal publishable client key to the host via 9p so the
# stack-cli can authenticate its bootstrap call to
# /api/v1/internal/local-emulator/project.
set -a
source /mnt/stack-runtime/runtime.env
set +a
if [ -n "${STACK_EMULATOR_VM_DIR_HOST:-}" ] && [ -s /var/lib/stack-auth/internal-pck ]; then
install -m 0600 /var/lib/stack-auth/internal-pck \
"/host${STACK_EMULATOR_VM_DIR_HOST}/internal-pck"
fi
docker rm -f stack >/dev/null 2>&1 || true
# Mirror container stdout/stderr to a host-visible log for debugging.
# The container already bind-mounts /host:/host, so we reuse that path.
# Falls back to stdout (captured by systemd-journald) when no host log is set.
if [ -n "${STACK_EMULATOR_VM_DIR_HOST:-}" ]; then
host_log="/host${STACK_EMULATOR_VM_DIR_HOST}/stack.log"
: > "$host_log" 2>/dev/null || true
exec docker run \
--rm \
--name stack \
--network host \
--add-host host.docker.internal:host-gateway \
--env-file /run/stack-auth/local-emulator.env \
-v stack-postgres-data:/data/postgres \
-v stack-redis-data:/data/redis \
-v stack-clickhouse-data:/data/clickhouse \
-v stack-minio-data:/data/minio \
-v stack-inbucket-data:/data/inbucket \
-v /host:/host:rshared \
stack-local-emulator 2>&1 | tee -a "$host_log"
else
exec docker run \
--rm \
--name stack \
--network host \
--add-host host.docker.internal:host-gateway \
--env-file /run/stack-auth/local-emulator.env \
-v stack-postgres-data:/data/postgres \
-v stack-redis-data:/data/redis \
-v stack-clickhouse-data:/data/clickhouse \
-v stack-minio-data:/data/minio \
-v stack-inbucket-data:/data/inbucket \
-v /host:/host:rshared \
stack-local-emulator
fi
- path: /usr/local/bin/wait-for-deps
permissions: '0755'
content: |
#!/bin/bash
set -uo pipefail
# Hard upper bound across the whole dep wait. Under TCG every service
# init is 5-20x slower than native, so we allow a generous budget, but
# if we cross it something is genuinely stuck and we need to surface it.
DEPS_TIMEOUT="${STACK_DEPS_TIMEOUT:-1500}"
DEPS_CONTAINER="${STACK_DEPS_CONTAINER:-stack-build-init}"
start=$SECONDS
log() { /usr/local/bin/log-provision "wait-for-deps: $*"; }
# name|probe pairs — probe runs through `eval` and must exit 0 when ready.
# No --max-time on these: under slow TCG a service may take >3s to
# respond; let curl wait, outer DEPS_TIMEOUT bounds the whole dep wait.
SERVICES=(
'postgres|nc -z 127.0.0.1 5432'
'clickhouse|curl -sf http://127.0.0.1:8123/ping'
'svix|curl -sf http://127.0.0.1:8071/api/v1/health/'
'minio|curl -sf http://127.0.0.1:9090/minio/health/live'
'qstash|[ "$(curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:8080/ 2>/dev/null || true)" = "401" ]'
)
dump_diagnostics() {
log "dumping diagnostics for stuck dep wait..."
log "--- docker ps -a ---"
docker ps -a 2>&1 | /usr/local/bin/log-provision-stream "wait-for-deps: ps" || true
log "--- docker logs ${DEPS_CONTAINER} (last 300 lines) ---"
docker logs --tail 300 "$DEPS_CONTAINER" 2>&1 | /usr/local/bin/log-provision-stream "wait-for-deps: deps" || true
log "--- per-service probes (3s timeout) ---"
nc -z -w 3 127.0.0.1 5432 >/dev/null 2>&1 && log "postgres:5432 reachable" || log "postgres:5432 NOT reachable"
curl -sf --max-time 3 http://127.0.0.1:8123/ping >/dev/null 2>&1 && log "clickhouse:8123 reachable" || log "clickhouse:8123 NOT reachable"
curl -sf --max-time 3 http://127.0.0.1:8071/api/v1/health/ >/dev/null 2>&1 && log "svix:8071 reachable" || log "svix:8071 NOT reachable"
curl -sf --max-time 3 http://127.0.0.1:9090/minio/health/live >/dev/null 2>&1 && log "minio:9090 reachable" || log "minio:9090 NOT reachable"
code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 3 http://127.0.0.1:8080/ 2>/dev/null || true)
[ "$code" = "401" ] && log "qstash:8080 reachable (401)" || log "qstash:8080 NOT reachable (code=${code:-none})"
}
wait_for() {
local name="$1" probe="$2" elapsed
local svc_start=$SECONDS
local next_heartbeat=$((svc_start + 30))
while true; do
if eval "$probe" >/dev/null 2>&1; then
elapsed=$((SECONDS - svc_start))
log "${name} ready (${elapsed}s)"
return 0
fi
if [ "$SECONDS" -ge "$next_heartbeat" ]; then
log "still waiting for ${name} ($((SECONDS - svc_start))s elapsed)"
next_heartbeat=$((SECONDS + 30))
fi
if [ "$((SECONDS - start))" -ge "$DEPS_TIMEOUT" ]; then
elapsed=$((SECONDS - start))
log "TIMEOUT waiting for ${name} after ${elapsed}s (hard cap ${DEPS_TIMEOUT}s)"
dump_diagnostics
exit 1
fi
sleep 2
done
}
log "starting dep wait (timeout=${DEPS_TIMEOUT}s)"
for entry in "${SERVICES[@]}"; do
wait_for "${entry%%|*}" "${entry#*|}"
done
log "all deps ready ($((SECONDS - start))s total)"
- path: /etc/stack-build-computed.env
content: |
USE_INLINE_ENV_VARS=true
NEXT_PUBLIC_STACK_API_URL=http://localhost:8102
NEXT_PUBLIC_STACK_DASHBOARD_URL=http://localhost:8101
NEXT_PUBLIC_BROWSER_STACK_API_URL=http://localhost:8102
NEXT_PUBLIC_BROWSER_STACK_DASHBOARD_URL=http://localhost:8101
NEXT_PUBLIC_SERVER_STACK_API_URL=http://127.0.0.1:8102
NEXT_PUBLIC_SERVER_STACK_DASHBOARD_URL=http://127.0.0.1:8101
NEXT_PUBLIC_STACK_SVIX_SERVER_URL=http://localhost:8071
NEXT_PUBLIC_HEXCLAVE_PORT_PREFIX=81
STACK_CLICKHOUSE_DATABASE=default
BACKEND_PORT=8102
DASHBOARD_PORT=8101
- path: /usr/local/bin/log-provision
permissions: '0755'
content: |
#!/bin/bash
set -euo pipefail
msg="$*"
echo "STACK_PROVISION: $msg"
if [ -n "${STACK_PROVISION_LOG_FILE:-}" ]; then
printf '%s\n' "$msg" >> "$STACK_PROVISION_LOG_FILE"
fi
- path: /usr/local/bin/log-provision-stream
permissions: '0755'
content: |
#!/bin/bash
set -uo pipefail
prefix="${1:-}"
while IFS= read -r line; do
/usr/local/bin/log-provision "${prefix}: ${line}"
done
- path: /usr/local/bin/run-build-migrations
permissions: '0755'
content: |
#!/bin/bash
set -euo pipefail
log() { /usr/local/bin/log-provision "$*"; }
log "Starting deps container..."
docker run --rm --name stack-build-init \
--network host \
-e STACK_DEPS_ONLY=true \
-v stack-postgres-data:/data/postgres \
-v stack-redis-data:/data/redis \
-v stack-clickhouse-data:/data/clickhouse \
-v stack-minio-data:/data/minio \
-v stack-inbucket-data:/data/inbucket \
-d stack-local-emulator
log "Waiting for deps (postgres, clickhouse, svix, minio, qstash)..."
/usr/local/bin/wait-for-deps
log "Deps ready."
# Wait for init-services.sh (MinIO buckets, ClickHouse DB creation)
log "Waiting for init-services.sh..."
timeout=120
elapsed=0
while [ "$elapsed" -lt "$timeout" ]; do
if docker exec stack-build-init test -f /var/run/stack-local-init-services.done 2>/dev/null; then
break
fi
sleep 1
elapsed=$((elapsed + 1))
done
if [ "$elapsed" -ge "$timeout" ]; then
log "ERROR: init-services.sh did not finish within ${timeout}s"
exit 1
fi
log "init-services done (${elapsed}s)."
log "Running migrations..."
# Cross-arch TCG mistranslates V8's JIT-emitted arm64, and V8's wasm
# tier-up path trips an InnerPointerToCodeCache check deep in the heap
# (Runtime_WasmTriggerTierUp → StackFrameIterator::Advance crashes
# when Wasm code has been freed while a frame still references it).
# --no-opt keeps JS off TurboFan/Maglev
# --no-wasm-tier-up keeps Wasm on Liftoff (no TurboFan)
# --no-wasm-dynamic-tiering suppresses the tier-up decision runtime call
# --no-wasm-code-gc keeps Wasm code alive across stack walks
# All four are no-ops under KVM, and must be passed on node's CLI
# (NODE_OPTIONS rejects them).
migrate_log="$(mktemp)"
set +e
docker exec \
--env-file /etc/stack-build.env \
--env-file /etc/stack-build-computed.env \
stack-build-init \
sh -c 'cd /app/apps/backend && node --no-opt --no-wasm-tier-up --no-wasm-dynamic-tiering --no-wasm-code-gc dist/db-migrations.mjs migrate && node --no-opt --no-wasm-tier-up --no-wasm-dynamic-tiering --no-wasm-code-gc dist/db-migrations.mjs seed' \
> "$migrate_log" 2>&1
migrate_status=$?
set -e
if [ "$migrate_status" -ne 0 ]; then
log "MIGRATIONS FAILED (exit ${migrate_status}) — last 200 lines of migration output:"
tail -n 200 "$migrate_log" | /usr/local/bin/log-provision-stream "migrate" || true
rm -f "$migrate_log"
exit "$migrate_status"
fi
rm -f "$migrate_log"
log "Migrations + seed complete."
log "Stopping deps container..."
docker stop stack-build-init || true
log "run-build-migrations done."
- path: /usr/local/bin/slim-docker-image
permissions: '0755'
content: |
#!/bin/bash
set -euo pipefail
log() { /usr/local/bin/log-provision "$*"; }
log "Building slim Docker image..."
docker build -t stack-local-emulator-slim - <<'DOCKERFILE'
FROM stack-local-emulator
RUN rm -rf /app/node_modules /app/apps/backend/dist && \
mv /app/node_modules.standalone /app/node_modules && \
for entry in /app/node_modules/.pnpm/node_modules/*; do \
name="$(basename "$entry")"; \
[ "$name" = ".bin" ] && continue; \
ln -sf ".pnpm/node_modules/$name" "/app/node_modules/$name" 2>/dev/null || true; \
done
DOCKERFILE
log "Slim image built."
# Determine build arch to decide whether to run the smoke test. Cross-arch
# (TCG) builds can't reliably run the Next.js backend inside the smoke
# test container: V8 JIT ↔ QEMU TCG mistranslations crash the process,
# and even with --jitless the backend is too slow to respond within any
# sane timeout. amd64 builds run under KVM and are unaffected.
BUILD_ARCH=""
if [ -f /etc/stack-build-arch.env ]; then
# shellcheck disable=SC1091
. /etc/stack-build-arch.env
BUILD_ARCH="${STACK_EMULATOR_BUILD_ARCH:-}"
fi
if [ "$BUILD_ARCH" = "arm64" ]; then
log "Skipping smoke test: build arch is arm64 and cross-arch TCG can't reliably run the backend."
else
log "Running smoke test on slim image..."
# build.env sets NEXT_PUBLIC_STACK_IS_LOCAL_EMULATOR=true, which makes
# docker/server/entrypoint.sh require the three internal SEED keys.
# At real-VM boot those come from render-stack-env via
# /run/stack-auth/local-emulator.env, but that path doesn't run during
# the build-time smoke test. Mint throwaway hex keys for this container
# only; they must be hex because entrypoint.sh also validates that
# before the internal ApiKeySet bootstrap SQL.
SMOKE_PCK="$(openssl rand -hex 32)"
SMOKE_SSK="$(openssl rand -hex 32)"
SMOKE_SAK="$(openssl rand -hex 32)"
docker run --rm --name smoke-test \
--network host \
--env-file /etc/stack-build.env \
--env-file /etc/stack-build-computed.env \
-e STACK_INTERNAL_PROJECT_PUBLISHABLE_CLIENT_KEY="$SMOKE_PCK" \
-e STACK_INTERNAL_PROJECT_SECRET_SERVER_KEY="$SMOKE_SSK" \
-e STACK_SEED_INTERNAL_PROJECT_SUPER_SECRET_ADMIN_KEY="$SMOKE_SAK" \
-e STACK_SKIP_MIGRATIONS=true \
-e STACK_SKIP_SEED_SCRIPT=true \
-e STACK_RUNTIME_WORK_DIR=/app \
-v stack-postgres-data:/data/postgres \
-v stack-redis-data:/data/redis \
-v stack-clickhouse-data:/data/clickhouse \
-v stack-minio-data:/data/minio \
-v stack-inbucket-data:/data/inbucket \
-d stack-local-emulator-slim
smoke_timeout=300
smoke_elapsed=0
smoke_passed=false
while [ "$smoke_elapsed" -lt "$smoke_timeout" ]; do
code=$(curl -s -o /dev/null -w "%{http_code}" --max-time 3 "http://127.0.0.1:8102/health?db=1" 2>/dev/null || true)
if [ "$code" = "200" ]; then
smoke_passed=true
break
fi
sleep 2
smoke_elapsed=$((smoke_elapsed + 2))
done
if [ "$smoke_passed" = "false" ]; then
log "SMOKE TEST FAILED: backend /health?db=1 did not return 200 within ${smoke_timeout}s"
log "--- docker ps -a ---"
docker ps -a 2>&1 | /usr/local/bin/log-provision-stream "ps" || true
log "--- smoke-test container logs (last 200 lines) ---"
docker logs --tail 200 smoke-test 2>&1 | /usr/local/bin/log-provision-stream "smoke-test" || true
log "--- free -m ---"
free -m 2>&1 | /usr/local/bin/log-provision-stream "mem" || true
log "--- curl -v /health?db=1 ---"
curl -v --max-time 5 "http://127.0.0.1:8102/health?db=1" 2>&1 | /usr/local/bin/log-provision-stream "curl" || true
docker stop smoke-test 2>/dev/null || true
exit 1
fi
docker stop smoke-test 2>/dev/null || true
sleep 2
log "Smoke test passed (${smoke_elapsed}s)."
fi
log "Flattening image (docker export/import)..."
docker create --name flatten stack-local-emulator-slim /bin/true
docker export flatten | docker import \
--change 'WORKDIR /app' \
--change 'ENTRYPOINT ["/entrypoint.sh"]' \
--change 'EXPOSE 5432 6379 2500 9001 1100 8071 8123 9009 9090 8080 8101 8102' \
--change 'ENV DEBIAN_FRONTEND=noninteractive' \
- stack-local-emulator:final
log "Flatten done."
log "Saving final image to /var/tmp..."
docker rm flatten
docker save stack-local-emulator:final -o /var/tmp/final-image.tar
mv /var/lib/docker/volumes /var/tmp/volumes-backup
log "Nuking Docker storage and reloading..."
systemctl stop docker containerd
rm -rf /var/lib/docker /var/lib/containerd
systemctl start docker containerd
until docker info >/dev/null 2>&1; do sleep 1; done
docker load -i /var/tmp/final-image.tar
docker tag stack-local-emulator:final stack-local-emulator
docker rmi stack-local-emulator:final || true
rm -f /var/tmp/final-image.tar
systemctl stop docker
rm -rf /var/lib/docker/volumes
mv /var/tmp/volumes-backup /var/lib/docker/volumes
systemctl start docker
log "Docker storage rebuilt."
log "Zeroing free space for qcow2 compression..."
dd if=/dev/zero of=/zero.fill bs=1M 2>/dev/null || true
rm -f /zero.fill
sync
fstrim -av 2>/dev/null || true
log "slim-docker-image done."
- path: /usr/local/bin/wait-for-stack-ready
permissions: '0755'
content: |
#!/bin/bash
# Poll the stack container's backend + dashboard on the guest's own
# localhost until both respond healthy. Used at snapshot-build time to
# gate "emit STACK_SERVICES_READY" on the app actually being warm.
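#
# Illustrative invocation with the optional overrides spelled out. The
# values are arbitrary examples; any variable left unset falls back to
# the default assigned below.
#   STACK_READY_TIMEOUT=120 STACK_READY_BACKEND_PORT=8102 \
#     STACK_READY_DASHBOARD_PORT=8101 /usr/local/bin/wait-for-stack-ready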
set -uo pipefail
TIMEOUT="${STACK_READY_TIMEOUT:-600}"
BACKEND_PORT="${STACK_READY_BACKEND_PORT:-8102}"
DASHBOARD_PORT="${STACK_READY_DASHBOARD_PORT:-8101}"
log() { /usr/local/bin/log-provision "wait-for-stack-ready: $*"; }
start=$SECONDS
next_heartbeat=$((start + 30))
log "waiting for backend:$BACKEND_PORT and dashboard:$DASHBOARD_PORT (timeout=${TIMEOUT}s)"
while true; do
backend_code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 3 "http://127.0.0.1:${BACKEND_PORT}/health?db=1" 2>/dev/null || true)
dashboard_code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 3 "http://127.0.0.1:${DASHBOARD_PORT}/handler/sign-in" 2>/dev/null || true)
if [ "$backend_code" = "200" ] && [ "$dashboard_code" = "200" ]; then
log "ready ($((SECONDS - start))s)"
exit 0
fi
if [ "$SECONDS" -ge "$next_heartbeat" ]; then
log "still waiting (backend=$backend_code dashboard=$dashboard_code, $((SECONDS - start))s elapsed)"
next_heartbeat=$((SECONDS + 30))
fi
if [ "$((SECONDS - start))" -ge "$TIMEOUT" ]; then
log "TIMEOUT after $((SECONDS - start))s (backend=$backend_code dashboard=$dashboard_code)"
docker ps -a 2>&1 | /usr/local/bin/log-provision-stream "wait-for-stack-ready: ps" || true
docker logs --tail 200 stack 2>&1 | /usr/local/bin/log-provision-stream "wait-for-stack-ready: stack" || true
systemctl status stack.service --no-pager -l 2>&1 | /usr/local/bin/log-provision-stream "wait-for-stack-ready: svc" || true
journalctl -u stack.service --no-pager -n 100 2>&1 | /usr/local/bin/log-provision-stream "wait-for-stack-ready: jrnl" || true
docker image ls 2>&1 | /usr/local/bin/log-provision-stream "wait-for-stack-ready: img" || true
exit 1
fi
sleep 2
done
- path: /usr/local/bin/trigger-fast-rotate
permissions: '0755'
content: |
#!/bin/bash
# Called via qemu-guest-agent on every snapshot resume. Reads fresh
# secrets from stdin (key=value lines, written by the host via QGA's
# guest-exec input-data) and execs rotate-secrets inside the stack
# container with those values exported.
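#
# Sketch of a manual invocation from inside the guest, for debugging only.
# The random hex values are placeholders (an assumption, not what the host
# sends). The input is sourced as shell, so every value must be shell-safe.
#   printf '%s=%s\n' \
#     STACK_INTERNAL_PROJECT_PUBLISHABLE_CLIENT_KEY "$(openssl rand -hex 32)" \
#     STACK_INTERNAL_PROJECT_SECRET_SERVER_KEY "$(openssl rand -hex 32)" \
#     STACK_SEED_INTERNAL_PROJECT_SUPER_SECRET_ADMIN_KEY "$(openssl rand -hex 32)" \
#     CRON_SECRET "$(openssl rand -hex 32)" \
#     | /usr/local/bin/trigger-fast-rotate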
set -euo pipefail
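tmp="$(mktemp /var/run/stack-fresh-XXXXXX.env)"
# Remove the secrets file even if reading or sourcing it fails.
trap 'rm -f "$tmp"' EXIT
# Restrict permissions before any secret is written to the file.
chmod 0600 "$tmp"
cat > "$tmp"
set -a
# shellcheck disable=SC1090
source "$tmp"
set +a
rm -f "$tmp"
trap - EXIT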
exec docker exec \
-e STACK_INTERNAL_PROJECT_PUBLISHABLE_CLIENT_KEY \
-e STACK_INTERNAL_PROJECT_SECRET_SERVER_KEY \
-e STACK_SEED_INTERNAL_PROJECT_SUPER_SECRET_ADMIN_KEY \
-e CRON_SECRET \
stack /usr/local/bin/rotate-secrets
- path: /etc/systemd/system/stack.service
content: |
[Unit]
Description=Hexclave local emulator
Wants=network-online.target docker.service
After=network-online.target docker.service
[Service]
Restart=always
RestartSec=5
TimeoutStartSec=0
ExecStart=/usr/local/bin/run-stack-container
ExecStop=/usr/bin/docker stop stack
[Install]
WantedBy=multi-user.target
- path: /usr/local/bin/provision-build
permissions: '0755'
content: |
#!/bin/bash
set -euo pipefail
if bash /usr/local/bin/mount-host-fs 2>/dev/null; then
export STACK_PROVISION_LOG_FILE=/host/provision.log
: > "$STACK_PROVISION_LOG_FILE"
else
export STACK_PROVISION_LOG_FILE=""
fi
write_marker_to_consoles() {
local marker="$1"
for dev in /dev/console /dev/ttyAMA0 /dev/ttyS0; do
echo "$marker" > "$dev" 2>/dev/null || true
done
}
cleanup() {
local status=$?
if [ "$status" -ne 0 ]; then
if [ -n "${STACK_PROVISION_LOG_FILE:-}" ]; then
printf 'ERROR: provision-build exited with code %s\n' "$status" >> "$STACK_PROVISION_LOG_FILE"
printf '%s\n' "STACK_CLOUD_INIT_FAILED" >> "$STACK_PROVISION_LOG_FILE"
fi
write_marker_to_consoles "STACK_CLOUD_INIT_FAILED"
sync || true
(sleep 2 && shutdown -P now) &
(sleep 15 && poweroff -f) &
fi
}
trap cleanup EXIT
SERIAL=""
for d in /dev/ttyAMA0 /dev/ttyS0; do
[ -c "$d" ] && SERIAL="$d" && break
done
if [ -n "$SERIAL" ]; then
exec > >(tee -a "$SERIAL") 2>&1
fi
log_provision() {
/usr/local/bin/log-provision "$*"
}
log_provision "runcmd starting"
systemctl disable --now ssh || true
systemctl mask ssh || true
# qemu-guest-agent: used by the host to inject fresh secrets + trigger
# rotate-secrets after a snapshot resume. Must be running INSIDE the VM
# at snapshot capture time — the virtio-serial port's "open" state is
# part of the migrated device state. If QGA wasn't connected at capture,
# the resumed VM's port stays closed and the host can't reach it.
systemctl enable qemu-guest-agent || true
systemctl start qemu-guest-agent || true
log_provision "installing emulator containers"
bash /usr/local/bin/install-emulator-containers
systemctl daemon-reload
systemctl enable stack.service
log_provision "starting build migrations"
bash /usr/local/bin/run-build-migrations
log_provision "starting slim-docker-image"
bash /usr/local/bin/slim-docker-image
# Capture mode: bring the stack container up, wait for full
# readiness, emit STACK_SERVICES_READY, then wait indefinitely for the
# host build script to capture VM state over QMP (stop + migrate + quit).
# The VM never shuts itself down in this path — the host tears it down
# once the savevm file has been written.
#
# CI never sets STACK_EMULATOR_CAPTURE_SAVEVM=1 (snapshots aren't
# portable across accelerators, so they're captured locally on first
# `stack emulator pull`). This branch only fires for opt-in local
# builds run with EMULATOR_CAPTURE_SAVEVM=1.
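#
# Markers this script writes as bare lines (to the provision log when the
# host fs is mounted, and to the consoles): STACK_SERVICES_READY,
# STACK_CLOUD_INIT_DONE and STACK_CLOUD_INIT_FAILED. A host-side wait
# could be sketched as below, where LOG is a hypothetical host path to
# the shared provision.log. grep -x matches whole lines only, so the
# "signalling STACK_SERVICES_READY" log message does not count as the marker.
#   until grep -qxE 'STACK_(SERVICES_READY|CLOUD_INIT_FAILED)' "$LOG" 2>/dev/null; do
#     sleep 1
#   done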
if [ -f /etc/stack-build.env ] && grep -q '^STACK_EMULATOR_CAPTURE_SAVEVM=1' /etc/stack-build.env 2>/dev/null; then
log_provision "capture mode: starting stack.service"
systemctl start stack.service || true
log_provision "waiting for backend + dashboard to be ready"
if ! /usr/local/bin/wait-for-stack-ready; then
log_provision "ERROR: stack services did not become ready"
exit 1
fi
# Ensure qemu-guest-agent is running so its virtio-serial port stays
# "open" in the snapshot — the host needs that port at runtime to
# trigger rotate-secrets.
log_provision "ensuring qemu-guest-agent is up"
systemctl restart qemu-guest-agent || true
sleep 2
if ! systemctl is-active --quiet qemu-guest-agent; then
log_provision "ERROR: qemu-guest-agent failed to start"
systemctl status qemu-guest-agent --no-pager -l 2>&1 | /usr/local/bin/log-provision-stream "qga" || true
exit 1
fi
log_provision "qemu-guest-agent active"
log_provision "services ready; signalling STACK_SERVICES_READY"
if [ -n "${STACK_PROVISION_LOG_FILE:-}" ]; then
printf '%s\n' "STACK_SERVICES_READY" >> "$STACK_PROVISION_LOG_FILE"
fi
write_marker_to_consoles "STACK_SERVICES_READY"
sync || true
# Clear the EXIT trap so the cleanup path doesn't mark this as failed
# when the host powers us off via QMP quit.
trap - EXIT
# Block forever; host will issue qmp quit after migrate completes.
while true; do sleep 3600; done
fi
log_provision "build pipeline complete"
if [ -n "${STACK_PROVISION_LOG_FILE:-}" ]; then
printf '%s\n' "STACK_CLOUD_INIT_DONE" >> "$STACK_PROVISION_LOG_FILE"
fi
write_marker_to_consoles "STACK_CLOUD_INIT_DONE"
shutdown -P now
runcmd:
- [bash, /usr/local/bin/provision-build]