mirror of https://github.com/tailscale/tailscale.git synced 2026-06-06 21:01:11 +08:00

Brad Fitzpatrick b75921a7cb derp/derpserver: cache local peer lookups per client

Avoid taking Server.mu for repeated sends from a client to the same small set of local peers. Each sclient now keeps a bounded, goroutine-local LRU of destination public key to clientSet.

To cap memory for idle clients, cache entries track a coarsely updated last-used time. The hot path refreshes that timestamp only when it is older than 30 seconds, and incoming ping frames trim entries idle for more than 10 minutes. This keeps cleanup on the client run goroutine without adding another mutex or background goroutine.

cmd/derper gets new --peer-cache-max-entries and --peer-cache-max-idle flags. Their zero values use the automatic defaults, and --peer-cache-max-entries=-1 disables the cache.

The peer_lookup_cache_misses counter tracks how often lookupDest falls back to the authoritative Server.mu lookup. We do not count hits on the hot path; when the cache is enabled, hits can be derived from packets_received minus peer_lookup_cache_misses.

This optimization is pulled out of the larger #13510 DERP flow-tracking work from 2024, which did a bunch more. We can rebase that bigger PR later and discuss its stats and memory impact on its own merits without losing this standalone optimization.

The benchmark compares the same code with TS_DEBUG_DERP_DISABLE_PEER_CACHE set true for the before run and the default cached path for the after run:

    TS_DEBUG_DERP_DISABLE_PEER_CACHE=true go test ./derp/derpserver -run '^$' -bench '^BenchmarkLookupDestPeerCache$' -benchtime=2s -count=10 > before
    go test ./derp/derpserver -run '^$' -bench '^BenchmarkLookupDestPeerCache$' -benchtime=2s -count=10 > after
    go run golang.org/x/perf/cmd/benchstat@latest before after

    goos: linux
    goarch: amd64
    pkg: tailscale.com/derp/derpserver
    cpu: Intel(R) Xeon(R) 6975P-C
                           │    before     │                after                │
                           │    sec/op     │   sec/op     vs base                │
    LookupDestPeerCache-16   180.400n ± 0%   5.720n ± 1%  -96.83% (p=0.000 n=10)

                           │   before   │             after              │
                           │    B/op    │    B/op     vs base            │
    LookupDestPeerCache-16   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
    ¹ all samples are equal

                           │   before   │             after              │
                           │ allocs/op  │ allocs/op   vs base            │
    LookupDestPeerCache-16   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
    ¹ all samples are equal

Updates #3560

Change-Id: Ie31b540447211fd9415eea6cc235b83a87930093
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>

2026-05-12 13:04:07 -07:00
derpconst	all: remove AUTHORS file and references to it	2026-01-23 15:49:45 -08:00
derphttp	all: use Go 1.26 things, run most gofix modernizers	2026-03-06 13:32:03 -08:00
derpserver	derp/derpserver: cache local peer lookups per client	2026-05-12 13:04:07 -07:00
xdp	.github, tool/listpkgs: automatically find tests which use tstest.RequireRoot	2026-04-10 16:22:05 -07:00
client_test.go	derp: use AvailableBuffer for WriteFrameHeader, consolidate tests (#19101)	2026-03-24 18:08:01 -04:00
derp_client.go	derp: align FrameType docs casing	2026-04-07 16:14:26 -07:00
derp_test.go	cmd/derper,derp: add metrics for rate limit hits (#19560)	2026-04-29 10:29:09 -07:00
derp.go	derp: align FrameType docs casing	2026-04-07 16:14:26 -07:00
export_test.go	all: remove AUTHORS file and references to it	2026-01-23 15:49:45 -08:00
README.md	derp: add a README.md with some docs	2023-05-02 13:42:25 -07:00

README.md

DERP

This directory and its subdirectories contain the DERP code. The server itself is in ../cmd/derper.

DERP is a packet relay system (client and servers) where peers are addressed using WireGuard public keys instead of IP addresses.

It relays two types of packets:

"Disco" discovery messages (see ../disco) as a side channel during NAT traversal.
Encrypted WireGuard packets as the fallback of last resort when UDP is blocked or NAT traversal fails.
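
To make the addressing model concrete, here is a minimal sketch of a relay that looks up destinations by public key rather than by IP address. It is not the derpserver implementation: `Relay`, `Register`, `Send`, and both error values are hypothetical names chosen for illustration, and the sketch is deliberately not safe for concurrent use.

```go
package main

import (
	"errors"
	"fmt"
)

// PublicKey stands in for a 32-byte WireGuard public key.
type PublicKey [32]byte

// Hypothetical errors for this sketch.
var (
	ErrUnknownPeer = errors.New("unknown peer")
	ErrQueueFull   = errors.New("peer queue full")
)

// Relay is a hypothetical in-memory relay. Clients are found by public
// key only; no IP address appears anywhere in the routing table.
type Relay struct {
	clients map[PublicKey]chan []byte
}

func NewRelay() *Relay {
	return &Relay{clients: make(map[PublicKey]chan []byte)}
}

// Register records a connected client and returns its inbox.
func (r *Relay) Register(k PublicKey) <-chan []byte {
	ch := make(chan []byte, 16)
	r.clients[k] = ch
	return ch
}

// Send relays an opaque payload (a disco message or an encrypted
// WireGuard packet) to dst without inspecting it.
func (r *Relay) Send(dst PublicKey, payload []byte) error {
	ch, ok := r.clients[dst]
	if !ok {
		return ErrUnknownPeer
	}
	select {
	case ch <- payload:
		return nil
	default:
		// Drop rather than block when the receiver is slow.
		return ErrQueueFull
	}
}

func main() {
	r := NewRelay()
	alice := PublicKey{1}
	inbox := r.Register(alice)

	if err := r.Send(alice, []byte("opaque encrypted bytes")); err != nil {
		fmt.Println("send failed:", err)
		return
	}
	fmt.Printf("delivered %d bytes\n", len(<-inbox))

	// Sending to a key that never registered fails with ErrUnknownPeer.
	fmt.Println(r.Send(PublicKey{2}, nil))
}
```

Because the payload is opaque to the relay, the same path can carry both packet types listed above.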

DERP Map

Each client receives a "DERP Map" from the coordination server describing the DERP servers the client should try to use.

The client picks its "DERP home" based on latency. This keeps costs low by avoiding cloud load balancers (pricey) or anycast, which would necessarily require server-side routing between DERP regions.

Clients pick their DERP home and report it to the coordination server, which shares it with all the peers in the tailnet. When a peer wants to send a packet and doesn't already have a WireGuard session open, it sends disco messages (some direct, and some over DERP) to attempt NAT traversal. The client will make connections to multiple DERP regions as needed. Only the connection to the DERP home region needs to be kept alive permanently.

DERP Regions

Tailscale runs one or more DERP nodes (instances of cmd/derper) in various geographic regions to make sure users have low latency to their DERP home.

Regions generally have multiple nodes "meshed" (routing to each other) together for redundancy: this allows for cloud failures or upgrades without kicking users out to a higher-latency region. Instead, clients reconnect to the next node in the region. Each node in the region is required to be meshed with every other node in the region and to forward packets to the other nodes in the region. Packets are forwarded only one hop within the region. There is no routing between regions. The assumption is that the mesh TCP connections are over a VPC that's very fast, low latency, and not charged per byte.

The coordination server assigns the list of nodes in a region as a function of the tailnet, so all clients within a tailnet should generally be on the same DERP node and not require forwarding. Only after a failure do clients of a particular tailnet get split between nodes in a region and require inter-node forwarding, but over time it balances back out. There's also an admin-only DERP frame type that force-closes the TCP connection of a particular client, making it reconnect to its primary, if the operator wants things to balance out sooner. This is done using the (*derphttp.Client).ClosePeer method, as used by Tailscale's internal, rarely used cmd/derpprune maintenance tool.
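
The one-hop forwarding rule within a region can be sketched as a small routing decision. This is an illustrative model only: `Node`, `Route`, the `Action` values, and the string keys are hypothetical and do not mirror the derpserver types.

```go
package main

import "fmt"

// Key stands in for a client's public key.
type Key string

// Action is the routing decision for one packet.
type Action int

const (
	Drop Action = iota
	DeliverLocal
	ForwardToPeer
)

// Node is a hypothetical model of one DERP node in a meshed region.
type Node struct {
	Local  map[Key]bool   // clients connected directly to this node
	OnPeer map[Key]string // clients known to be on a mesh peer, by peer name
}

// Route decides what to do with a packet addressed to dst. fromMesh
// reports whether the packet already arrived from another node in the
// region; such packets are never forwarded again (one hop only).
func (n *Node) Route(dst Key, fromMesh bool) (Action, string) {
	if n.Local[dst] {
		return DeliverLocal, ""
	}
	if fromMesh {
		return Drop, ""
	}
	if peer, ok := n.OnPeer[dst]; ok {
		return ForwardToPeer, peer
	}
	return Drop, ""
}

func main() {
	a := &Node{
		Local:  map[Key]bool{"alice": true},
		OnPeer: map[Key]string{"bob": "node-b"},
	}
	act, peer := a.Route("bob", false)
	fmt.Printf("action=%d peer=%q\n", act, peer)
}
```

Because every node is meshed with every other node in the region, a single hop is always enough to reach a client connected anywhere in that region.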

We generally run a minimum of three nodes in a region, not for quorum reasons (there's no voting), but because two is uncomfortably few for cascading-failure reasons: if you're running two nodes at 51% load (CPU, memory, etc.) and one fails, the survivor inherits the full load (102%) and fails too. With three or more nodes, you can run each node a bit hotter.
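
The arithmetic behind that rule of thumb can be checked with a short sketch. It assumes that the load of a failed node is spread evenly across the survivors, which is a simplification; `loadAfterFailure` and `maxSafeLoad` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"math"
)

// loadAfterFailure returns the per-node load after one of n equally
// loaded nodes fails, assuming survivors share its load evenly.
// load is a fraction of capacity (0.51 means 51%).
func loadAfterFailure(n int, load float64) float64 {
	if n < 2 {
		return math.Inf(1) // no survivor to absorb the load
	}
	return load * float64(n) / float64(n-1)
}

// maxSafeLoad returns the highest per-node load at which a region of n
// nodes can lose one node without any survivor exceeding 100%.
func maxSafeLoad(n int) float64 {
	if n < 2 {
		return 0
	}
	return float64(n-1) / float64(n)
}

func main() {
	for _, n := range []int{2, 3, 4} {
		fmt.Printf("n=%d: 51%% load becomes %.1f%% after one failure; safe up to %.1f%%\n",
			n, 100*loadAfterFailure(n, 0.51), 100*maxSafeLoad(n))
	}
}
```

Under this assumption, two nodes are only safe up to 50% load each, while three nodes are safe up to about 66.7%, which is what "a bit hotter" amounts to.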