Fix compute_head_fingerprint folding case-only title/meta changes #2036

Open
jichaowang02-lang wants to merge 1 commit into unclecode:develop from jichaowang02-lang:fix/head-fingerprint-case-sensitive-values

@jichaowang02-lang

Summary

compute_head_fingerprint lowercases the entire <head> before extracting signals:

head_lower = head_html.lower()
title_match = re.search(r'<title[^>]*>(.*?)</title>', head_lower, re.DOTALL)   # value is lowercased
...
match = re.search(pattern, head_lower)                                          # value is lowercased

So the captured values are lowercased, not just the tag/attribute matching. Two heads that differ only in the case of a title or meta value hash to the same fingerprint:

compute_head_fingerprint("<head><title>iPhone</title></head>")
  == compute_head_fingerprint("<head><title>IPHONE</title></head>")   # ❌ same hash

CacheValidator treats an equal fingerprint as "unchanged", so a genuinely updated page is reported FRESH and stale cached content is served.
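
To see the folding in isolation, here is a minimal sketch that applies the quoted title regex to a lowercased head. The helper name `title_from_lowercased_head` is hypothetical and only mirrors the two lines shown above; it is not the project's implementation.

```python
import re


def title_from_lowercased_head(head_html: str) -> str:
    # Hypothetical helper: the head is lowercased before matching,
    # so the captured title value is lowercased as well.
    head_lower = head_html.lower()
    match = re.search(r'<title[^>]*>(.*?)</title>', head_lower, re.DOTALL)
    return match.group(1) if match else ""


a = title_from_lowercased_head("<head><title>iPhone</title></head>")
b = title_from_lowercased_head("<head><title>IPHONE</title></head>")
print(a == b)  # True: both values were folded to "iphone"
```

Any hash computed from these extracted values cannot tell the two heads apart.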

Fix

Match tags/attributes case-insensitively with re.IGNORECASE against the original head, so extracted values keep their case. Tag/attribute case-insensitivity is preserved (e.g. <TITLE> / META NAME=...CONTENT= still parse), and identical content still hashes identically.
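
The approach can be sketched roughly as follows. This is an illustration under assumptions, not the patched function: `sketch_head_fingerprint`, the `META_KEYS` subset, the meta pattern, and the choice of SHA-256 are all hypothetical stand-ins for whatever `compute_head_fingerprint` actually uses.

```python
import hashlib
import re

# Assumed, illustrative subset of meta keys; the real signal list may differ.
META_KEYS = ("description", "og:title")


def sketch_head_fingerprint(head_html: str) -> str:
    signals = []
    # Match the tag case-insensitively against the ORIGINAL head,
    # so the captured value keeps its case.
    title = re.search(
        r'<title[^>]*>(.*?)</title>', head_html, re.IGNORECASE | re.DOTALL
    )
    if title:
        signals.append("title:" + title.group(1).strip())
    for key in META_KEYS:
        # Assumed pattern: name/property attribute followed by content.
        pattern = (
            r'<meta[^>]*(?:name|property)=["\']' + re.escape(key)
            + r'["\'][^>]*content=["\']([^"\']*)["\']'
        )
        match = re.search(pattern, head_html, re.IGNORECASE)
        if match:
            signals.append(key + ":" + match.group(1).strip())
    # Hash choice is an assumption made for this sketch.
    return hashlib.sha256("|".join(signals).encode("utf-8")).hexdigest()
```

With this shape, `<TITLE>iPhone</TITLE>` and `<title>iPhone</title>` produce the same hash, while `iPhone` and `IPHONE` do not.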

Testing

$ pytest tests/cache_validation/test_head_fingerprint.py -q
14 passed

Adds two regression tests:

  • test_value_case_change_changes_fingerprint — a case-only value change now changes the fingerprint (fails on the current code).
  • test_tag_and_attribute_case_does_not_change_fingerprint — markup-only case differences still yield the same fingerprint.

compute_head_fingerprint lowercased the entire <head> (`head_html.lower()`)
before extracting the title and meta values, so the captured signal values
were lowercased too. Two heads that differ only in the case of a title or
meta value (e.g. "iPhone" vs "IPHONE", "Buy Now" vs "BUY NOW") therefore
hashed to the same fingerprint. CacheValidator treats an equal fingerprint as
unchanged, so a genuinely updated page was reported FRESH and stale cached
content was served.

Match tags/attributes case-insensitively (re.IGNORECASE) against the original
head instead, so the extracted values keep their original case. Tag/attribute
case-insensitivity is preserved; identical content still hashes identically.

Adds regression tests: a case-only value change now changes the fingerprint
(fails on the old code), while tag/attribute-only case differences do not.
Copilot AI review requested due to automatic review settings June 23, 2026 01:02

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.
