Skip to content

Fix geocoder returning empty when a longer prefix lacks the requested language#323

Open
gaoflow wants to merge 1 commit into
daviddrysdale:devfrom
gaoflow:fix-geocoder-prefix-language-fallback
Open

Fix geocoder returning empty when a longer prefix lacks the requested language#323
gaoflow wants to merge 1 commit into
daviddrysdale:devfrom
gaoflow:fix-geocoder-prefix-language-fallback

Conversation

@gaoflow

@gaoflow gaoflow commented Jun 25, 2026

Copy link
Copy Markdown

Problem

_prefix_description_for_number (in prefix.py) immediately returned an
empty string when a longer (more-specific) prefix existed in the geocoding
data but did not carry an entry for the requested language. This prevented
the loop from falling through to shorter prefixes that do have the language
data, causing description_for_number to fall back all the way to the country
name.

Concrete example

Finnish Helsinki-area numbers (+3589x…) are described as 'Finland' instead
of 'Helsinki' when English or Finnish is requested:

>>> import phonenumbers
>>> from phonenumbers import geocoder
>>> num = phonenumbers.parse("+358912345678")
>>> geocoder.description_for_number(num, "en")
'Finland'   # ← wrong; expected 'Helsinki'
>>> geocoder.description_for_number(num, "fi")
'Suomi'     # ← wrong; expected 'Helsinki'

Root cause: the production geocoding data contains a 5-digit prefix
'35891' with only {'se': 'Helsingfors'} (Swedish via the 'se' locale key),
and a 4-digit prefix '3589' with {'en': 'Helsinki', 'fi': 'Helsinki', 'sv': 'Helsingfors'}. Because '35891' is the longer (more-specific) match
and is found first, the old code returned '' without ever reaching '3589'.

The same defect affects a handful of other prefixes (DRC, Armenia, Italy,
Taiwan) where a sub-prefix carries an alternative-language-only entry while the
parent prefix has the common languages.

Fix

Remove the else: return U_EMPTY_STRING branch so that the loop continues to
shorter prefixes when _find_lang returns None. The function still returns
U_EMPTY_STRING at the end of the loop when no prefix at all has usable data.

-            if name is not None:
-                return name
-            else:
-                return U_EMPTY_STRING
+            if name is not None:
+                return name
+            # Language not available for this prefix; try a less-specific (shorter) prefix.

Tests

A regression test (testPrefixFallsBackToShorterPrefixWhenLanguageMissing)
is added to geocodertest.py. It injects a synthetic longer-prefix entry
into the test geodata (one that has only German) and verifies that a French
request correctly falls through to the shorter prefix's English data. All
280 existing tests continue to pass.


This pull request was prepared with the assistance of AI, under my direction and review.

_prefix_description_for_number immediately returned an empty string when a
longer prefix was found in the geocoding data but did not carry an entry for
the requested language.  This prevented the loop from trying shorter (less
specific) prefixes that do have the language data.

For example, Finnish Helsinki-area numbers (+3589x…) were mapped to 'Finland'
rather than 'Helsinki' when English or Finnish was requested, because the
5-digit prefix '35891' exists with only a Swedish ('se') entry while the
4-digit prefix '3589' carries 'en', 'fi', and 'sv' data.

Fix: instead of returning U_EMPTY_STRING when _find_lang yields None, allow
the loop to continue to shorter prefixes.  Add a regression test that drives
this path using synthetic test geodata so that the fix is verifiable without
depending on the full production data files.

@daviddrysdale daviddrysdale left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for spotting and fixing!

TEST_GEOCODE_DATA['12123'] = {'de': u("TestDistrict")}
try:
# US_NUMBER3 is +1 212-812-0000; its prefix digits start with '12128' so
# it won't hit '12123'. Use a fabricated number whose digits begin '12123'.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the first sentence of the comment is unhelpful, please remove it.

self.assertEqual("South Korea", description_for_number(KO_MOBILE, _ENGLISH))

def testPrefixFallsBackToShorterPrefixWhenLanguageMissing(self):
# When a longer prefix exists in the geocoding data but lacks the requested

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please could you add "Python version extra test" at the top of the comment as reminder that this has no equivalent Java test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants