Add S3-FIFO RAM cache eviction algorithm (ram_cache.algorithm = 2)#13255
Open
phongn wants to merge 2 commits into
S3-FIFO (Yang et al., SOSP 2023) is a FIFO-based eviction policy: a small admission queue and a main queue (a 2-bit clock), plus a ghost queue of recently evicted keys. The small queue and ghost filter one-hit-wonders, giving scan resistance and strong hit rates on CDN and key-value workloads at low cost -- a hit needs no list reordering. Selectable as ram_cache.algorithm = 2 alongside CLFUS (0) and LRU (1). The policy is byte-budgeted; its eviction metadata (the ghost included, bounded by object size and an entry-count cap) is accounted within ram_cache.size so total memory stays within the configured budget. Like LRU and CLFUS it enforces one resident copy per key (a put with a new aux key discards the stale one) and allocates entries from a per-thread ProxyAllocator (Thread.h). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pull request overview
This PR adds S3-FIFO as a third selectable RAM cache eviction policy (proxy.config.cache.ram_cache.algorithm = 2) alongside existing CLFUS (0) and LRU (1), wires it into cache initialization, extends the regression test to cover it, and updates admin documentation accordingly.
Changes:
- Introduces a new `RamCache` implementation (`RamCacheS3FIFO`) and integrates it into the RAM-cache factory selection.
- Extends configuration validation / constants to support algorithm value `2`, and logs the chosen algorithm at cache init.
- Updates the nightly `ram_cache` regression test and admin docs to include S3-FIFO.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/records/RecordsConfig.cc | Extends the allowed config range for proxy.config.cache.ram_cache.algorithm to include 2. |
| src/iocore/cache/RamCacheS3FIFO.cc | Adds the S3-FIFO RAM cache policy implementation. |
| src/iocore/cache/P_RamCache.h | Declares the S3-FIFO factory function. |
| src/iocore/cache/CMakeLists.txt | Adds RamCacheS3FIFO.cc to the cache build. |
| src/iocore/cache/CacheTest.cc | Extends the existing RAM cache regression test to run S3-FIFO. |
| src/iocore/cache/CacheProcessor.cc | Wires algorithm 2 into initialization and adds a debug log of the selected policy. |
| include/iocore/eventsystem/Thread.h | Adds a per-thread allocator freelist slot for S3-FIFO entries. |
| include/iocore/cache/Cache.h | Defines RAM_CACHE_ALGORITHM_S3FIFO as 2. |
| doc/admin-guide/storage/index.en.rst | Documents S3-FIFO as a supported RAM cache eviction algorithm and clarifies seen-filter applicability. |
| doc/admin-guide/files/records.yaml.en.rst | Updates the ram_cache.algorithm record documentation to describe the new 2 = S3-FIFO option. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Add S3-FIFO RAM cache eviction algorithm (`ram_cache.algorithm = 2`)
Summary
This adds S3-FIFO as a selectable RAM cache eviction policy via `proxy.config.cache.ram_cache.algorithm = 2`, alongside the existing CLFUS (0) and LRU (1). The default (LRU) is unchanged.
S3-FIFO (Yang, Zhang, Qiu, Yue & Rashmi, "FIFO queues are all you need for cache eviction", SOSP 2023) is a FIFO-based policy: a small admission queue and a main queue, plus a ghost queue that remembers the keys of recently evicted objects. The small queue and ghost filter one-hit-wonders, which makes the policy scan-resistant and yields strong hit rates on CDN and key-value workloads, while keeping every operation cheap, since a hit only bumps a 2-bit counter and nothing is ever reordered.
Motivation
The two current RAM cache policies split sharply on real traffic: LRU favors recency and loses on frequency-skewed CDN workloads, while CLFUS favors frequency/size, is comparatively expensive, and can collapse on recency-heavy traffic. In benchmarking, S3-FIFO beats both LRU and CLFUS in hit rate on every production trace tested (CDN and key-value), with a `put` cost on par with LRU and far below CLFUS, and a `get` cost between the two. That makes it an attractive general-purpose option to offer operators.
What's in this PR
- `RamCacheS3FIFO.cc`: the new `RamCache` implementation.
- `Cache.h`, `CacheProcessor.cc`, `RecordsConfig.cc`, `P_RamCache.h`, `CMakeLists.txt`: the algorithm enum, the factory wiring (with a one-line `cache_init` log of the selected algorithm), the config range `[0-2]`, the declaration, and the build entry.
- `CacheTest.cc`: S3-FIFO added to the existing `ram_cache` regression comparison.
- Admin documentation updates (`records.yaml.en.rst`, `storage/index.en.rst`).
Design
S3-FIFO keeps three FIFO queues: a small admission queue (~10% of capacity), a main queue (~90%, run as a 2-bit CLOCK), and a ghost queue holding only keys of recently evicted objects. A new object enters the small queue. When an object is evicted from the small queue it is promoted to the main queue if it was reused (frequency ≥ 2) and otherwise demoted to the ghost; a subsequent miss whose key is still in the ghost is admitted straight to the main queue. This "quick demotion" of one-hit-wonders is what gives S3-FIFO its scan resistance and CDN-friendly behavior.
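The queue mechanics described above can be sketched in a few dozen lines. This is a minimal illustration, not the code in `RamCacheS3FIFO.cc`: it budgets by entry count rather than bytes, uses integer keys with no payload, and every name in it (`S3FifoSketch`, `access`, `in_main`, and the private helpers) is hypothetical. Resetting the counter to zero on promotion and sizing the ghost to match the main queue are choices made for this sketch only.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <iterator>
#include <list>
#include <unordered_map>

// Illustrative S3-FIFO over integer keys, budgeted by entry count (not bytes).
// Hypothetical names throughout; this is not the RamCacheS3FIFO interface.
class S3FifoSketch
{
public:
  explicit S3FifoSketch(std::size_t capacity)
    : small_cap_(std::max<std::size_t>(1, capacity / 10)),           // ~10% admission queue
      main_cap_(capacity > small_cap_ ? capacity - small_cap_ : 1),  // ~90% main queue
      ghost_cap_(main_cap_)                                          // assumption of this sketch
  {
  }

  // Returns true on a hit. A hit only bumps a saturating 2-bit counter (0..3).
  bool access(std::uint64_t key)
  {
    auto it = resident_.find(key);
    if (it != resident_.end()) {
      it->second.freq = static_cast<std::uint8_t>(std::min(it->second.freq + 1, 3));
      return true;
    }
    auto g = ghost_index_.find(key);
    if (g != ghost_index_.end()) { // key was evicted recently: admit straight to main
      ghost_.erase(g->second);
      ghost_index_.erase(g);
      insert_main(key);
    } else {
      insert_small(key);
    }
    return false;
  }

  bool in_main(std::uint64_t key) const
  {
    auto it = resident_.find(key);
    return it != resident_.end() && it->second.in_main;
  }

private:
  struct Entry {
    std::uint8_t freq;
    bool         in_main;
  };

  void insert_small(std::uint64_t key)
  {
    while (small_.size() >= small_cap_) evict_small();
    small_.push_back(key);
    resident_[key] = Entry{0, false};
  }

  void insert_main(std::uint64_t key)
  {
    while (main_.size() >= main_cap_) evict_main();
    main_.push_back(key);
    resident_[key] = Entry{0, true}; // counter reset on entry to main (sketch choice)
  }

  void evict_small()
  {
    std::uint64_t key = small_.front();
    small_.pop_front();
    if (resident_[key].freq >= 2) {
      insert_main(key); // reused while in the small queue: promote
    } else {
      resident_.erase(key); // one-hit-wonder: keep only its key in the ghost
      remember_ghost(key);
    }
  }

  void evict_main()
  {
    for (;;) { // 2-bit CLOCK: entries with a non-zero counter get another pass
      std::uint64_t key = main_.front();
      main_.pop_front();
      Entry &e = resident_[key];
      if (e.freq > 0) {
        --e.freq;
        main_.push_back(key);
      } else {
        resident_.erase(key);
        return;
      }
    }
  }

  void remember_ghost(std::uint64_t key)
  {
    if (ghost_.size() >= ghost_cap_) {
      ghost_index_.erase(ghost_.front());
      ghost_.pop_front();
    }
    ghost_.push_back(key);
    ghost_index_[key] = std::prev(ghost_.end());
  }

  std::size_t small_cap_, main_cap_, ghost_cap_;
  std::deque<std::uint64_t> small_, main_;
  std::list<std::uint64_t> ghost_;
  std::unordered_map<std::uint64_t, std::list<std::uint64_t>::iterator> ghost_index_;
  std::unordered_map<std::uint64_t, Entry> resident_;
};
```

In this sketch, keys that are hit twice while in the small queue reach the main queue, and a later one-time scan of cold keys passes through the small queue and the ghost without displacing them.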
The policy is byte-budgeted to fit the ATS RAM cache. The ghost stores keys, not data, but each remembered key still costs real memory, so the ghost is bounded both by its object-size footprint and by an entry-count cap. That metadata is counted against `proxy.config.cache.ram_cache.size`, so total resident memory (data plus all eviction metadata) stays within the configured budget regardless of object cardinality. Access is serialized per stripe by the existing stripe lock, exactly like LRU and CLFUS; no new concurrency primitives are introduced.
The implementation follows the original paper and the reference implementation in libCacheSim (`S3FIFO.c`).
Benchmarks
S3-FIFO was selected after an independent evaluation by @bryancall across all five candidate policies (the incumbents plus W-TinyLFU, SIEVE, and S3-FIFO), using both an in-process microbenchmark and an end-to-end h2load sweep through a real ATS proxy. S3-FIFO was the best all-rounder: top or tied-top hit rate on every workload, passing the scan-resistance and adaptivity checks, with the cheapest `put`. The full writeup and methodology are in @bryancall's benchmark on the evaluation PR: phongn#2. The tables below compare S3-FIFO with the two incumbent policies, LRU and CLFUS.
Hit rate: real production traces
Full-trace replay (libCacheSim `oracleGeneral` traces), hit rate over the second half after warmup, the same stream for every policy; higher is better.
S3-FIFO has the best hit rate on every real trace and at every size measured.
Microbenchmark (@bryancall, Ryzen 9950X3D)
Per-operation cost, ns/op, lower is better:
S3-FIFO's `put` is the cheapest measured (on par with LRU, ~35% below CLFUS); its `get` sits between LRU and CLFUS. On the correctness suite it passes scan resistance (hot-set retention 1.000 under a one-time scan), adaptivity after an abrupt working-set shift (new-set hit rate 1.000, stale-set retention 15/112), and gradual drift (0.978), and its synthetic-Zipf hit rate is top or tied-top at every cache size.
End-to-end throughput (@bryancall)
In the h2load sweep through a real proxy, end-to-end req/s barely moved between algorithms — 3–5% on 1 KB objects (where S3-FIFO was the top performer) and flat on bandwidth-saturated large objects. The reason is the test box's tiering: a RAM-cache miss falls through to a fast local NVMe disk-cache hit rather than an origin fetch, so the eviction policy only chooses which tier serves a hit. The policy's value is in hit ratio under memory pressure, which surfaces as throughput only where a RAM miss is expensive (origin-bound or slow-tier deployments) — so this change is offered on its hit-ratio and per-operation-cost merits, not as a throughput win on every topology.
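One simplified way to read this argument, as a model rather than a measurement: let $h$ be the RAM cache hit ratio, $T_{\mathrm{RAM}}$ the cost of serving a request from the RAM cache, and $T_{\mathrm{miss}}$ the cost of serving it after a RAM miss. These symbols are assumptions introduced here, not quantities from the benchmark.

$$
\bar{T} = h\,T_{\mathrm{RAM}} + (1-h)\,T_{\mathrm{miss}},
\qquad
\Delta\bar{T} = -\,\Delta h\,\bigl(T_{\mathrm{miss}} - T_{\mathrm{RAM}}\bigr)
$$

A hit-ratio gain $\Delta h$ therefore shortens the average service time only in proportion to the gap $T_{\mathrm{miss}} - T_{\mathrm{RAM}}$. When a miss lands on a fast local NVMe tier that gap is small; when a miss goes to the origin or a slow tier it is large.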
Configuration
The selected algorithm is logged at cache initialization (`cache_init` debug tag): `ram_cache algorithm = 2 (S3-FIFO)`.
Testing
The NIGHTLY-gated `ram_cache` regression test (`traffic_server -R 2 -r ram_cache`) now exercises S3-FIFO alongside LRU and CLFUS on the synthetic Zipfian workload. Booting with `ram_cache.algorithm: 2` was verified to construct the S3-FIFO cache and log the selection.
Notes
🤖 Generated with Claude Code