Three independent correctness bugs that all violated "decide before
publishing": each surface (wire reply, sysfs path, broadcast state)
committed to a state the code later had to contradict.
darwin: switch IOUSBHostPipe abort to IOUSBHostAbortOptionSynchronous and
add the EP-0 path via abortDeviceRequestsWithOption (previously a silent
no-op for control transfers). After abort returns, await a per-pending
drained channel before writing RET_UNLINK so the late RetSubmit
suppression in finishSubmit has actually taken effect — the drained
channel is allocated lazily by markSubmitUnlinked, so the hot submit
path is unchanged.
linux: collapse the per-controller VHCI abstraction. The kernel exposes
attach/detach/status/status.N only on vhci_hcd.0 with globally unique
port numbers; the previous code wrote vhci_hcd.N/attach for N>0 and
silently failed past the primary controller's port range. Glob status*
on the primary, key reservations on bare port int, and carry the
status-suffix as a diagnostic-only secondary index for Description().
export_ledger: ConsumeLeaseAndReserve no longer publishes busy=true and
then conditionally rolls it back. Read seq under fast first, then make
every decision (including the lease.Generation check) inside one slow
critical section before any busy mutation. This removes the window
where a concurrent BroadcastIfChanged could observe transient busy and
poison l.state with no corrective broadcast on rollback.
Address `golangci-lint` modernize and unused warnings:
- replace map copy loops with `maps.Copy`
- collapse `if x > y { x = y }` clamps to `min`/`max`
- switch `for i := 0; i < n; i++` to `for i := range n`
- use `t.Context()` in the relay handoff test
- delete the unused `removeOnce` field and `openBinaryDevice` helper
`darwinExportHost.Events` closed its producer channel after destroying
the IOKit notification port, but `IONotificationPortDestroy` does not
synchronously drain the dispatch queue: a callback block already
executing or enqueued could still call `signal()` on the closed
channel, or resolve the `cgo.Handle` after `ref.Delete()`. Retain the
serial dispatch queue inside the watcher and `dispatch_sync` an empty
block before releasing it, so no native callback is in flight when
the Go side proceeds.
`ServerService.Close` cancelled the context and immediately closed the
`ExportHost`. On Darwin that calls `IOUSBHostDevice destroy` for every
capture while `darwinServerDataSession.handleSubmit` goroutines may
still be inside `IOUSBHostPipe` requests on the same device. Track
active sessions, close them explicitly, and wait for the per-session
cleanup goroutines via a WaitGroup before tearing down the host.
Darwin URBs handled in concurrent goroutines mutated BoxUSBHostDevice's
pipes/interfaces dictionaries without serialization, so SetConfiguration
racing with endpoint IO could corrupt them or hand back stale pipes.
Guard the collections with os_unfair_lock; keep the slow IOKit calls
outside the lock so unrelated endpoints don't stall.
Iso IN responses were flat-copied buffer[:actual], losing offsets for
multi-packet transfers. Pack the wire payload per IsoPackets[i].Offset/
ActualLength to match Linux's stub_tx.c, and mirror the scatter on the
Darwin client side so a future multi-packet emitter is correct.
On Linux the server was closing the userspace conn inside
NewServerDataSession (handoff.Start closes h.conn synchronously in
direct-TCP mode), so the subsequent WriteOpRepImport always wrote to a
closed socket and every default-mode import failed. Split DataSession
into prepare + Start: the server now writes the reply on the live conn
and only then calls session.Start() to hand the wire to the kernel.
Close also closes h.conn now so the prepare-then-Close error path
doesn't leak the fd.
On Darwin FinishImport returned false after successfully re-capturing
a pending replacement for a stale busy export, so the server skipped
reconcileAndBroadcast and the ledger kept the device hidden until
another IOKit topology event. Return true to trigger reconcile; the
next Reconcile fast-paths the matching registryID back into the
snapshot.
Three independent regressions that silently degraded the USB/IP feature:
- Darwin: when a busy device re-enumerated (same busid, new IOUSBHost
registry ID), Reconcile marked the export stale but never captured
the replacement, and nothing reconciled after FinishImport — the
device only reappeared on an unrelated future topology event. Now
Reconcile records the pending registry ID on the stale entry,
FinishImport rehydrates inline, and the server triggers a
reconcileAndBroadcast after release as a safety net.
- DEVLIST: SerialString and entryDeviceKey expected a "\0serial=..."
trailer in the 256-byte Path field, but encodePathField never wrote
one, so serial-based matching silently failed against plain usbipd
peers and against sing-box peers that fell back to DEVLIST polling.
encodePathField now writes the trailer (wire-compatible with kernel
usbip-utils, which NUL-terminate the path) and ReadOpRepDevListBody
populates DeviceEntry.Serial on decode.
- Control: any write/read I/O failure during the CONTROL handshake was
wrapped as errControlUnsupported, so a single TCP RST or 5s deadline
on startup permanently downgraded the client to 5-second DEVLIST
polling. Now only io.EOF on the ACK read and frame-validation
failures stay unsupported; other I/O failures retry with
exponential backoff (1s→30s) up to 3 consecutive transients before
committing to the polling fallback.
Drop the ConfirmImport reference left over from a prior design (the
function never existed) and remove three doc-comments that restated
the function name or narrated visible lock dances.
Apply the <10-LoC / <3-non-test-caller heuristic across the package:
inline writeControlFrame, removeLeaseWaiter, vhciAttach, vhciHubForSpeed,
applyControlSnapshot, applyRemoteEntries, shouldRetryBusID, entrySerial,
and the trivial darwin/linux import-host constructors at their call
sites.
Delete the darwinClientSession wrapper -- darwinVirtualController already
satisfies DataSession, so attach the Description() method to it directly
and return the controller from darwinImportHost.Attach.
Fold the 11-line data_session.go into host.go beside AttachedSession,
which embeds DataSession.
Inline thin pass-through helpers (< 10 lines, < 3 non-test callers) and
the writeSysfs path wrappers; delete the test-only vhciUsedPorts plus
its singleflight machinery and the never-called clientAssignment.IsActive.
Remove four private single-impl interfaces (usbEventListener,
darwinUSBHostDeviceWatch, darwinServerDataDevice, darwinEndpointStateMachine)
and both function-table DI seams (usbipOps, darwinServerOps); production calls
the underlying functions directly. Inline ~25 single-call helpers and fuse
subordinate routines into their sole caller. Collapse single-field helper
structs into their underlying types. Fold server_linux.go/server_darwin.go
into host_linux.go/host_darwin.go.
Delete change-detector tests per .claude/rules/code-test.md: parse round-trip,
mirror, input-validation, and builder-property suites that never exercised
real syscalls, network, or cgo. Six test files removed; linux_test.go slimmed
to three real-system tests; darwin_integration_test.go retains the six
real-cgo tests. Surviving suite: 16 tests that all exercise real OS APIs or
spawn the official Linux usbip server.
Net: -5100 LoC (~37% of the package).
The ExportHost.Reconcile changed bool is computed by both Linux and
Darwin implementations but discarded at the only call site
(reconcileAndBroadcast); the ledger already detects change in
BroadcastIfChanged. linuxImportHost.trackPort had a dead add=true
branch — all callers passed false. darwinExport.Snapshot was setting
StatusReason to the same string as State (tautological). LeaseCheck
for stale captures returned the State name as the reason.
Also trim narrative comments in export_ledger.go that referenced
refactor history or the deleted lease.go file, inline the trivial
checkAvailableUnderLock helper at its only caller, and replace
cloneIsoPackets with slices.Clone.
ServerService used to own three mutexes coordinating six pieces of
mutable state (exports, busy, controlState, controlSeq, controlSubs,
LeaseManager). The lock-ordering rule that fixed the import/attach
race in 0c892411a lived only as a comment. Two-mutex (fast/slow)
ledger absorbs all of it; the two mutexes are never held
simultaneously, so the ordering rule becomes a property of the type
instead of a discipline callers must observe. TryReserveForImport
closes the window where two concurrent OpReqImports could both pass
the busy check before either marked busy.
Collapse ServerService and ClientService to one definition each by
hiding Linux vs Darwin behind ExportHost, ImportHost, and Export
seams. Along the way, extract three state machines that previously
lived as scattered fields on the service structs:
- LeaseManager owns its own mutex and closes the
availability-vs-insert TOCTOU by checking export busy inside Issue
under the same lock as the insert.
- DataSession gives the three per-import data-plane implementations
(Linux kernel handoff, Darwin server data session, Darwin virtual
controller) a uniform Done/Err/Close interface.
- clientAssignment encapsulates the matched/import-all target state
and exposes ApplyMatched/ApplyAll diffs to ClientService, which
keeps worker goroutine lifecycle.
Service busy tracking moves off the per-platform serverExport struct
onto ServerService.busy, since it follows the lease/import lifecycle
rather than physical claim/release. linux_test.go is migrated to
construct ServerService and ClientService through the new host
interfaces.