Pace bulk refresh against the GitHub API quota by sctice-ifixit · Pull Request #473 · iFixit/pulldasher

sctice-ifixit · 2026-06-23T18:18:25Z

Connects #467

Why

The bulk refreshes (bin/refresh-*) share one GitHub token, and one hourly quota, with the live webhook/socket refreshes that keep Pulldasher current. A backfill is a tight loop, so it drains the quota to zero and starves normal operation, taking Pulldasher down. Pulldasher already had @octokit/plugin-throttling and @octokit/plugin-retry, but both only react at exhaustion (429/5xx), which is too late. This adds a proactive pacer that reserves quota headroom and spreads bulk work across the rate-limit reset window.

Where this landed

Only the CLI backfill processes pace, and they gate at the point of real API spend (the queue consumer). The server, the webhook/socket path, single-item refreshes, and the startup open-pulls refresh all run an unpaced noopPacer. Nothing paces the server, so it can't be starved by its own pacing, and the reserve floor is guaranteed headroom for live traffic rather than something the server has to compete for.

New lib/pacer.js, a per-process pacer. observe(headers) records x-ratelimit-remaining/-reset/-used from every response the process makes (so its view stays current). gate() waits before a bulk request. The pure computePace even-spreads over the reset window (interval = (reset - now) / (remaining - reserve)) and hard-pauses below a configurable reserve floor (config.github.bulkReserve, default 1000, 20% of the 5000/hr token). The token-bucket cursor advances by the x-ratelimit-used delta between gates, so a pull's multi-call fan-out pushes bulk's next slot out. Pacing tracks real consumption, and bulk yields automatically.
The pacer is injected per-process, not toggled by a global flag. refresh.js grows a createPacedRefresh() helper (createPacer() + observeRateLimit(pacer) + createRefresh({ pacer })); the 4 bulk bins call it, the single-item refresh-pull/refresh-issue bins get the default unpaced instance. observeRateLimit(pacer) registers the Octokit after/error hooks closing over that pacer, with an install-once guard so a second call doesn't stack duplicate hooks.
Gating happens in the consumer drain and in pacedPaginate between fetch pages. pacedPaginate only gates when there's a next page (rel="next"), so there's no wasted trailing gate that could floor-pause a full reset window after the last page.
config.example.js documents bulkReserve.

How we got here (two rounds)

Round 1: pace the server, at enqueue. The first version paced inside the server process as well an inside the CLI processes, gating as bulk items were pushed onto the shared NotifyQueue. It got there by elimination: gating in the queue's pop handler parks the single shared consumer, so every webhook item queued behind a paused bulk item stalls too (up to the roughly 1h floor-pause). The single queue exists (commit 041e000, 2016) to serialize refreshes and to never refresh the same issue twice concurrently (racing DB writes), so splitting into a separate bulk queue would reintroduce that hazard. Enqueue-side gating kept both guarantees and kept the wait off the consumer. Gating once per item under-paces pulls ~8x (each item spends multiple calls), which is where the x-ratelimit-used delta calibration came from. QA'd on pulldasher-dev, opened as this PR.

Round 2: pace the consumer, CLI-only. Danny's review (r3478201984) pushed toward pacing the actual API spend rather than the enqueue. Reworked so the pacer gates the consumer drain and only runs in the CLI backfill bins, which made the server's startup openPulls unpaced (acceptable: it's startup-only and under a minute). This also dropped the need for enqueue-side gating to protect the consumer, since the server no longer paces at all. The planned usePacer module-global setter became the per-call observeRateLimit(pacer) + createPacedRefresh() shape above (no mutable module global). Commits 67ca46f (consumer-side move), 548eb29 (fold the queue pushes into the drain promise's executor).

Validation (QA)

Deployed to pulldasher-dev. A paced refresh-open-pulls iFixit/server-templates drained 1.5–8s per pull, widening under live webhook bursts (bulk yielding as intended). 58 webhooks received / 53 processed alongside it, 0 restarts, quota moved 4615 → 4561 (the 1000 floor never hit). One transient 401 on the actions-jobs endpoint (unrelated to pacing) was logged and the sweep continued.

The refresh-* backfills and the startup open-pulls refresh share one GitHub token — and one hourly quota — with the live webhook/socket refreshes that keep Pulldasher current. A bulk run is a tight loop, so it drains the quota to zero and starves normal operation, taking Pulldasher down. The throttle plugin only reacts once the quota is already exhausted, which is too late to protect live traffic. Add a proactive pacer (lib/pacer.js). It observes x-ratelimit-* on every response (live calls included, so its view stays fresh) and gates only the bulk path: requests are spread across the rate-limit reset window, and bulk pauses entirely below a configurable reserve floor (config.github.bulkReserve, default 1000). The token-bucket cursor advances by the x-ratelimit-used delta between gates, so a pull's multi-call fan-out — and any live-traffic spike — pushes bulk's next slot out; bulk yields automatically on both signals. Gate at enqueue (pushAllOnQueue) and between fetch pages (pacedPaginate), never inside the shared queue consumer. The wait happens off the queue, so a paced or floor-paused backfill can't park the single consumer that webhook refreshes also depend on — live traffic keeps draining throughout. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

djmetzle

Cool. CR ⏱️

Picks up master (merged into the base branch) under the stacked pacing work. Clean auto-merge. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ianrohde

CR ⚪

The base branch was changed.

sctice-ifixit · 2026-06-25T19:34:15Z

QA 🌷 per the description.

djmetzle

CR 🏃

danielbeardsley · 2026-06-25T21:16:40Z

I had a feeling that I remember doing this before.. or something very similar (with a quota buffer). Hmm... Looks like we deleted it when we upgraded the octokit api: #211

https://github.com/iFixit/pulldasher/blob/811a7a222b38c43e2a85009f4c282141aba4713e/lib/rate-limit.js

Anyway, the octokit plugin supports throttling across instances but it requires a common storage medium (redis or similar) to coordinate.

sctice-ifixit · 2026-06-25T22:00:19Z

Anyway, the octokit plugin supports throttling across instances but it requires a common storage medium (redis or similar) to coordinate.

I believe this pull's approach should work across instances using the headers we get back from GH as the coordination mechanism.

danielbeardsley · 2026-06-26T19:49:49Z

deploy_block 👍 on the discussion above.

The first pass gated at enqueue (in `pushAllOnQueue`) and ran the server's startup `openPulls()` refresh through that same paced path. Both share the one queue consumer, so a floor-pause (waiting out the reserve once quota drops below the floor) could park the consumer that live webhook and socket refreshes also drain through, taking pulldasher down for the duration. Move the gate onto the consumer drain instead, where each pull's `parse()` fans out to the dozen-plus API calls that are the bulk of a backfill's spend. Only the CLI backfill bins install a real pacer (`createPacedRefresh`); the server, webhooks, and socket refreshes run an unpaced no-op pacer, so the gate never fronts live traffic. In a bin process the consumer drains nothing but bulk, so parking it on a paced slot starves nothing. Drop the mutable module-global pacer from `git-manager` in favor of explicit injection. `observeRateLimit(pacer)` installs a one-time quota observer on the shared Octokit client so the deep `parse()` fan-out (which the caller can't see) still feeds the pacer; the list functions take an optional pacer to pace pagination. `pacedPaginate` gates only between pages, never after the last. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

sctice-ifixit · 2026-06-30T23:04:40Z

un_deploy_block 🌷

The split (push the items, then `return new Promise`) was load-bearing only while enqueueing was async: the old loop awaited `pacer.gate()` between pushes, which can't run inside a synchronous Promise executor. With pacing moved to the consumer the loop is a plain `forEach`, so the pushes fold into the executor. It runs synchronously, so the queue ordering (items, then the resolve sentinel) is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ianrohde

No blockers.

Optional follow-up: lib/git-manager.js:22 console.log(config.github) dumps the oauth token to stdout. Not introduced here, but adjacent to the diff.

danielbeardsley · 2026-07-01T00:03:05Z

I read deeply through this and I approve!

However, unless I'm missing something, we'd still want to use a larger bulkReserve for cli-launched scripts vs the config used for the server.

Right now, I think it's possible that a cli script can starve out the front-end (Two separate pacers). If they have different bulk-reserves then the the cli refresh will only proceed while the frontend is idle and the token bucket builds back up.

sctice-ifixit · 2026-07-01T00:22:30Z

Right now, I think it's possible that a cli script can starve out the front-end (Two separate pacers).

Right now, the CLI is the only path that is paced, and it will never (modulo bugs) permit using any of the last 1000 quota available per hour. So once a CLI script sees there are less than 1000 requests remaining, it will pause until the quota is refilled (GH tells us when that will be in the headers). So the front-end should always have at least 1000 requests to work with in any given hour (in practice more), and it always goes as fast as it can. If we think 1000 isn't conservative enough, we can up it.

danielbeardsley · 2026-07-01T17:07:29Z

Right now, the CLI is the only path that is paced

Ah! Good point:

function createRefresh({ pacer = noopPacer }

The default is a noopPacer and thus pacing is opt-in and only the cli commands opt-in.

CR 👍

sctice-ifixit · 2026-07-01T18:07:59Z

QA 🌷 I updated the description to match where we ended up and re-ran QA per the description.

djmetzle previously approved these changes Jun 23, 2026

View reviewed changes

Merge backfill-transient-error-tolerance into backfill-quota-pacing

751c600

Picks up master (merged into the base branch) under the stacked pacing work. Clean auto-merge. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ianrohde previously approved these changes Jun 23, 2026

View reviewed changes

Base automatically changed from backfill-transient-error-tolerance to master June 25, 2026 16:29

ianrohde previously approved these changes Jun 25, 2026

View reviewed changes

djmetzle previously approved these changes Jun 25, 2026

View reviewed changes

danielbeardsley reviewed Jun 25, 2026

View reviewed changes

Comment thread lib/refresh.js Outdated

danielbeardsley reviewed Jun 25, 2026

View reviewed changes

Comment thread lib/refresh.js Outdated

sctice-ifixit dismissed stale reviews from djmetzle and ianrohde via 67ca46f June 30, 2026 23:04

ianrohde approved these changes Jun 30, 2026

View reviewed changes

sctice-ifixit merged commit d2b1b77 into master Jul 1, 2026
1 check passed

sctice-ifixit deleted the backfill-quota-pacing branch July 1, 2026 18:08

Uh oh!

Conversation

sctice-ifixit commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

Where this landed

Validation (QA)

Uh oh!

djmetzle left a comment

Choose a reason for hiding this comment

Uh oh!

ianrohde left a comment

Choose a reason for hiding this comment

Uh oh!

sctice-ifixit commented Jun 25, 2026

Uh oh!

djmetzle left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

danielbeardsley commented Jun 25, 2026

Uh oh!

sctice-ifixit commented Jun 25, 2026

Uh oh!

Uh oh!

danielbeardsley commented Jun 26, 2026

Uh oh!

sctice-ifixit commented Jun 30, 2026

Uh oh!

ianrohde left a comment

Choose a reason for hiding this comment

Uh oh!

danielbeardsley commented Jul 1, 2026

Uh oh!

sctice-ifixit commented Jul 1, 2026

Uh oh!

danielbeardsley commented Jul 1, 2026

Uh oh!

sctice-ifixit commented Jul 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

sctice-ifixit commented Jun 23, 2026 •

edited

Loading