Fix tc class/filter leaks on the bridge and reconcile them periodically#289
Fix tc class/filter leaks on the bridge and reconcile them periodically#289hiroTamada wants to merge 2 commits into
Conversation
Per-VM HTB classes and basic filters leaked on busy hosts, and the filter list is walked linearly for every VM-egress packet, degrading all VM network throughput as leaks accumulate. Root causes fixed: - Cleanup paths with no persisted class ID fell back to the hash-derived ID, deleting the wrong class/filter whenever the original add had been collision-probed to a different ID. The class is now resolved from the bridge filter table via the TAP's ifindex (the authoritative tap->class link); when nothing resolves, cleanup defers to the reconciler instead of guessing. - removeVMClass swallowed every delete error; failures are now logged and counted (hypeman_network_tc_cleanup_failures_total). - CleanupOrphanedClasses only ran at process startup and never swept dangling filters. It now runs every TAP GC tick under the tc mutex, sweeps filters not anchored to a live TAP, and counts removals (hypeman_network_tc_orphans_swept_total). It bails if no rt_iif matches parse, so an unexpected tc output format can't trigger a sweep of live state. - The persisted classid is cleared after instance release so the routine stop-then-delete double release doesn't re-delete the class. scripts/sweep-stale-tc.sh applies the same sweep out-of-band (dry-run by default) to recover already-polluted hosts without a restart. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Created a monitoring plan for this PR. What this PR does: Stops VM network throughput from degrading on busy hosts by fixing stale tc class/filter leaks on the hypeman bridge and running periodic reconciliation every 60s. Intended effect:
Risks:
Status updates will be posted automatically on this PR as monitoring progresses. |
A persisted classID can go stale (saveClassID failing after a collision-probed add), in which case deleting by it removes the wrong class — or worse, one belonging to another live VM if the stale ID collides. Resolve the class from the TAP's own filter when one exists, and skip flowid-matched deletes whose anchor is a live interface. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 4ea5f81. Configure here.
| "bridge", bridgeName, "tap", tapName, "class", fullClassID, | ||
| "error", err, "output", strings.TrimSpace(string(output))) | ||
| m.recordTCCleanupFailure(ctx, "class_del") | ||
| } |
There was a problem hiding this comment.
Filter list failure skips safety
Medium Severity
When listBridgeFilters fails in removeVMClass, cleanup still proceeds to delete the HTB class using only the persisted classID. That bypasses the new filter-table checks (including classClaimedByOther) meant to avoid removing another VM’s class after collision probing, reintroducing the wrong-ID delete this change is meant to fix.
Reviewed by Cursor Bugbot for commit 4ea5f81. Configure here.


Summary
Per-VM HTB classes and
basicfilters leak on the bridge on busy hosts. Since the bridge filter list is evaluated linearly for every VM-egress packet, accumulated leaks degrade network throughput for every VM on the host (observed: ~1,880 stale class/filter pairs vs ~6 live VMs on one production host).Leak mechanisms fixed
Wrong-ID deletes. Cleanup paths that have no persisted class ID (periodic TAP reaper, orphaned-TAP cleanup, recreate-existing-TAP, TAP-create failure) fell back to the hash-derived class ID. Whenever the original
tc class addhad been collision-probed tohash+N, these paths deleted the wrong ID and leaked the real class and its filter — a feedback loop, since more stale classes cause more probing.removeVMClassnow resolves the class from the bridge filter table via the TAP's ifindex (meta(rt_iif eq N)is the authoritative tap→class link). When neither a persisted ID nor a filter resolves, it deletes nothing and defers to the reconciler instead of guessing.Silent delete failures.
removeVMClassran everytcdelete best-effort with no logging. Failures are now logged and counted viahypeman_network_tc_cleanup_failures_total{operation}.Reconcile never ran and missed filters.
CleanupOrphanedClassesonly ran at process startup and iterated classes only, so dangling filters (class deleted, filter left, or TAP gone) were invisible to it. It now:rt_iifematch parses, so an unexpectedtcoutput format can't cause a sweep of live statehypeman_network_tc_orphans_swept_total{kind}Double-release noise. The persisted classid file is cleared after an instance-scoped release, so the routine stop→delete double release no longer re-deletes the class.
Out-of-band recovery
scripts/sweep-stale-tc.shapplies the same sweep logic standalone (dry-run by default,--applyto delete) to recover already-polluted hosts immediately, without waiting for a deploy/restart. Candidates are snapshotted before keep-sets so anything allocated mid-run is never considered stale.Test plan
go build/go vetonlib/networkandlib/instancesgo test ./lib/network/(includes newparseBridgeFiltersunit tests)bash -n+ dry-run ofscripts/sweep-stale-tc.shmake test-linuxintegration suite — not run locally (requires KVM + HTB kernel modules not available in my environment); needs CI or a metal hosttc class show dev vmbr0 | wc -l, then--applyand confirmhypeman_network_tc_class_collisions_totaland softirq CPU drop🤖 Generated with Claude Code
Note
Medium Risk
Changes Linux bridge
tccleanup and periodic sweeps that delete kernel state; safeguards exist but mis-parsing or races could still affect live VMs’ rate limits.Overview
Fixes leaked bridge HTB classes and
tcfilters that slow every VM’s egress by growing the per-packet filter walk.Release and delete paths now resolve the real class via bridge filters (
meta(rt_iif eq …)), avoid deleting another VM’s class when collision-probed IDs differ from the hash, log failedtcdeletes, and clear persistedclassidafter instance release so stop→delete does not repeat cleanup.Periodic reconciliation: the 60s TAP GC tick also runs
CleanupOrphanedClasses(undertcMu), sweeping dangling filters then orphan classes using live TAP ifindexes, allocations, and parse safety checks. New metrics track cleanup failures and swept orphans;sweep-stale-tc.shoffers the same logic for one-off host recovery (dry-run by default).Reviewed by Cursor Bugbot for commit 4ea5f81. Bugbot is set up for automated code reviews on this repo. Configure here.