Fix geocoder returning empty when a longer prefix lacks the requested language#323
Open
gaoflow wants to merge 1 commit into
Open
Fix geocoder returning empty when a longer prefix lacks the requested language#323gaoflow wants to merge 1 commit into
gaoflow wants to merge 1 commit into
Conversation
_prefix_description_for_number immediately returned an empty string when a
longer prefix was found in the geocoding data but did not carry an entry for
the requested language. This prevented the loop from trying shorter (less
specific) prefixes that do have the language data.
For example, Finnish Helsinki-area numbers (+3589x…) were mapped to 'Finland'
rather than 'Helsinki' when English or Finnish was requested, because the
5-digit prefix '35891' exists with only a Swedish ('se') entry while the
4-digit prefix '3589' carries 'en', 'fi', and 'sv' data.
Fix: instead of returning U_EMPTY_STRING when _find_lang yields None, allow
the loop to continue to shorter prefixes. Add a regression test that drives
this path using synthetic test geodata so that the fix is verifiable without
depending on the full production data files.
daviddrysdale
approved these changes
Jun 25, 2026
daviddrysdale
left a comment
Owner
There was a problem hiding this comment.
Thank you for spotting and fixing!
| TEST_GEOCODE_DATA['12123'] = {'de': u("TestDistrict")} | ||
| try: | ||
| # US_NUMBER3 is +1 212-812-0000; its prefix digits start with '12128' so | ||
| # it won't hit '12123'. Use a fabricated number whose digits begin '12123'. |
Owner
There was a problem hiding this comment.
nit: the first sentence of the comment is unhelpful, please remove it.
| self.assertEqual("South Korea", description_for_number(KO_MOBILE, _ENGLISH)) | ||
|
|
||
| def testPrefixFallsBackToShorterPrefixWhenLanguageMissing(self): | ||
| # When a longer prefix exists in the geocoding data but lacks the requested |
Owner
There was a problem hiding this comment.
Please could you add "Python version extra test" at the top of the comment as reminder that this has no equivalent Java test.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
_prefix_description_for_number(inprefix.py) immediately returned anempty string when a longer (more-specific) prefix existed in the geocoding
data but did not carry an entry for the requested language. This prevented
the loop from falling through to shorter prefixes that do have the language
data, causing
description_for_numberto fall back all the way to the countryname.
Concrete example
Finnish Helsinki-area numbers (
+3589x…) are described as'Finland'insteadof
'Helsinki'when English or Finnish is requested:Root cause: the production geocoding data contains a 5-digit prefix
'35891'with only{'se': 'Helsingfors'}(Swedish via the 'se' locale key),and a 4-digit prefix
'3589'with{'en': 'Helsinki', 'fi': 'Helsinki', 'sv': 'Helsingfors'}. Because'35891'is the longer (more-specific) matchand is found first, the old code returned
''without ever reaching'3589'.The same defect affects a handful of other prefixes (DRC, Armenia, Italy,
Taiwan) where a sub-prefix carries an alternative-language-only entry while the
parent prefix has the common languages.
Fix
Remove the
else: return U_EMPTY_STRINGbranch so that the loop continues toshorter prefixes when
_find_langreturnsNone. The function still returnsU_EMPTY_STRINGat the end of the loop when no prefix at all has usable data.Tests
A regression test (
testPrefixFallsBackToShorterPrefixWhenLanguageMissing)is added to
geocodertest.py. It injects a synthetic longer-prefix entryinto the test geodata (one that has only German) and verifies that a French
request correctly falls through to the shorter prefix's English data. All
280 existing tests continue to pass.
This pull request was prepared with the assistance of AI, under my direction and review.