emulator: only use -cpu cortex-a72 for cross-arch TCG

Same-arch TCG (e.g. arm64 guest on the arm64 ubuntu-24.04-arm runner
that has no nested virt) was falling through to -cpu cortex-a72 too.
Empirically, that hangs wait-for-deps indefinitely (services never
reach a ready state), probably because QEMU's TCG emulation of named
CPU models is less well tested than -cpu max, especially the fallback
paths used when LSE atomics are unavailable, which the dep services
exercise.

The cortex-a72 workaround is only needed for cross-arch TCG, where V8
emits JIT instructions the amd64 host's TCG mistranslates. Restrict
it to that case; same-arch TCG now gets -cpu max, matching the known
working config from the diagnostics branch run on ubuntu-24.04-arm.
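A minimal sketch of the selection rule described above, written as a standalone function so it can be exercised without QEMU. The name `select_arm64_accel_cpu` and its three arguments (host arch, host OS, and a "yes"/"no" flag standing in for the `[ -w /dev/kvm ]` check) are hypothetical; the real script reads `$HOST_ARCH`, `$HOST_OS` and `/dev/kvm` directly.

```bash
# Hypothetical helper (not in the original script): prints "<accel> <cpu>"
# for an arm64 guest. Arguments stand in for $HOST_ARCH, $HOST_OS and the
# result of the [ -w /dev/kvm ] check ("yes" or "no").
select_arm64_accel_cpu() {
  local host_arch="$1" host_os="$2" kvm_writable="$3"
  local accel="tcg" cpu="max"

  if [ "$host_arch" = "arm64" ]; then
    # Same-arch: use hardware acceleration when available. Without it we
    # stay on TCG but keep -cpu max.
    case "$host_os" in
      darwin) accel="hvf" ;;
      linux)
        if [ "$kvm_writable" = "yes" ]; then
          accel="kvm"
        fi
        ;;
    esac
  else
    # Cross-arch TCG (e.g. amd64 host): limit V8 to armv8.0-a.
    cpu="cortex-a72"
  fi

  printf '%s %s\n' "$accel" "$cpu"
}
```

With this sketch, an arm64 Linux runner without nested virt (`select_arm64_accel_cpu arm64 linux no`) gets `tcg max`, while an amd64 host gets `tcg cortex-a72` whether or not KVM is present, since KVM cannot accelerate a foreign guest architecture.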
Bilal Godil 2026-04-10 17:02:07 -07:00
parent 54ecd7c554
commit 5c3c436489

@@ -112,17 +112,25 @@ qemu_cmd_prefix_for_arch() {
   case "$arch" in
     arm64)
       local accel="tcg"
-      # Under TCG (software emulation on an amd64 host) -cpu max advertises
-      # armv8.5+ features (PAC, BTI, SVE, LSE atomics…) that V8 happily emits
-      # JIT code for, but QEMU TCG mistranslates some of those instructions
-      # and the node process crashes with SIGTRAP during migrations. Falling
-      # back to cortex-a72 limits V8 to armv8.0-a, which TCG handles cleanly.
-      local cpu="cortex-a72"
+      local cpu="max"
       if [ "$HOST_ARCH" = "arm64" ]; then
+        # Same-arch: prefer hardware acceleration, keep -cpu max. If no
+        # accelerator is available (e.g. Azure arm64 runners with no
+        # nested virt) we fall through to TCG, but same-arch TCG handles
+        # -cpu max correctly and more named CPU models have TCG bugs
+        # than -cpu max does.
         case "$HOST_OS" in
-          darwin) accel="hvf"; cpu="max" ;;
-          linux) [ -w /dev/kvm ] && { accel="kvm"; cpu="max"; } ;;
+          darwin) accel="hvf" ;;
+          linux) [ -w /dev/kvm ] && accel="kvm" ;;
         esac
+      else
+        # Cross-arch TCG (amd64 host emulating arm64 guest): -cpu max
+        # advertises armv8.5+ features (PAC, BTI, SVE, LSE…) that V8
+        # emits JIT code for, but the host's TCG mistranslates some of
+        # those instructions across architectures and node crashes with
+        # SIGTRAP during migrations. Dropping to cortex-a72 limits V8
+        # to armv8.0-a which cross-arch TCG handles cleanly.
+        cpu="cortex-a72"
       fi
       local firmware
       firmware="$(find_aarch64_firmware)"
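One way to confirm which of the features named in these comments a guest actually advertises is the `Features` line of `/proc/cpuinfo` inside an arm64 Linux guest. There, the kernel reports LSE atomics as `atomics`, pointer authentication as `paca`/`pacg`, and BTI and SVE as `bti` and `sve`. The helper below is a hypothetical diagnostic, not part of the script, and assumes the guest kernel is recent enough to expose those hwcaps.

```bash
# Hypothetical diagnostic (not in the original script): given the value of
# the "Features" line from an arm64 guest's /proc/cpuinfo, print "<flag> yes"
# or "<flag> no" for each armv8.1+ feature mentioned in the comments.
#   atomics = LSE atomics, paca/pacg = pointer authentication, bti, sve
report_arm64_features() {
  # Pad with spaces so each flag can be matched as a whole word.
  local features=" $1 "
  local flag
  for flag in atomics paca pacg bti sve; do
    case "$features" in
      *" $flag "*) printf '%s yes\n' "$flag" ;;
      *) printf '%s no\n' "$flag" ;;
    esac
  done
}

# Inside the guest, one might run:
#   report_arm64_features "$(grep -m1 '^Features' /proc/cpuinfo | cut -d: -f2)"
```

Under `-cpu cortex-a72` (armv8.0-a) one would expect all five flags to report `no`. Under `-cpu max` with a recent QEMU, most or all of them would likely report `yes`, which is the feature set V8 can then target in its JIT output.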