
MDEV-39995 JSON_CONTAINS and JSON_EQUALS do not compare strings based on semantic#5223

Open
gengtianuiowa wants to merge 1 commit into MariaDB:12.3 from gengtianuiowa:MDEV-39995


Conversation

@gengtianuiowa

Contributor

Description

JSON_CONTAINS, JSON_EQUALS, and JSON_OVERLAPS used raw byte-level comparison (memcmp) for JSON string values, which meant semantically equivalent strings like "A" and "\u0041" were incorrectly treated as different.

Fix: add json_string_compare() that decodes Unicode escape sequences before comparing, and fix json_normalize to produce a canonical form for strings with escapes so JSON_EQUALS works correctly.
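
To illustrate the idea of decoding escapes before comparing, here is a minimal standalone sketch in C. It is not the json_string_compare() from this PR: the names hex4, put_utf8, decode_json_string and json_strings_equal_sketch are hypothetical, the inputs are assumed to be NUL-terminated JSON string bodies without the surrounding quotes, and the fixed 256-byte buffers are an arbitrary choice for the example.

```c
#include <stddef.h>
#include <string.h>

/* Parse 4 hex digits at p; returns the value, or -1 on an invalid digit.
   A NUL byte counts as invalid, so this never reads past the terminator. */
static int hex4(const char *p)
{
  int v = 0;
  for (int i = 0; i < 4; i++)
  {
    char c = p[i];
    int d;
    if (c >= '0' && c <= '9') d = c - '0';
    else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
    else return -1;
    v = v * 16 + d;
  }
  return v;
}

/* Encode one code point as UTF-8 into out (at least 4 bytes); returns length. */
static size_t put_utf8(unsigned long cp, char *out)
{
  if (cp < 0x80) { out[0] = (char) cp; return 1; }
  if (cp < 0x800)
  {
    out[0] = (char) (0xC0 | (cp >> 6));
    out[1] = (char) (0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000)
  {
    out[0] = (char) (0xE0 | (cp >> 12));
    out[1] = (char) (0x80 | ((cp >> 6) & 0x3F));
    out[2] = (char) (0x80 | (cp & 0x3F));
    return 3;
  }
  out[0] = (char) (0xF0 | (cp >> 18));
  out[1] = (char) (0x80 | ((cp >> 12) & 0x3F));
  out[2] = (char) (0x80 | ((cp >> 6) & 0x3F));
  out[3] = (char) (0x80 | (cp & 0x3F));
  return 4;
}

/* Decode a JSON string body into UTF-8. Returns the decoded length,
   or -1 on a malformed escape or when out is too small. */
static long decode_json_string(const char *s, char *out, size_t cap)
{
  size_t n = 0;
  while (*s)
  {
    char buf[4];
    size_t len = 1;
    if (*s != '\\')
      buf[0] = *s++;                      /* literal byte, copied as-is */
    else
    {
      unsigned long cp;
      s++;                                /* skip the backslash */
      switch (*s)
      {
      case '"': cp = '"'; break;
      case '\\': cp = '\\'; break;
      case '/': cp = '/'; break;
      case 'b': cp = '\b'; break;
      case 'f': cp = '\f'; break;
      case 'n': cp = '\n'; break;
      case 'r': cp = '\r'; break;
      case 't': cp = '\t'; break;
      case 'u':
      {
        int hi = hex4(s + 1);
        if (hi < 0) return -1;
        s += 4;                           /* now at the last hex digit */
        cp = (unsigned long) hi;
        if (hi >= 0xD800 && hi <= 0xDBFF) /* high surrogate: need a low one */
        {
          int lo;
          if (s[1] != '\\' || s[2] != 'u') return -1;
          lo = hex4(s + 3);
          if (lo < 0xDC00 || lo > 0xDFFF) return -1;
          cp = 0x10000UL + ((unsigned long) (hi - 0xD800) << 10) + (lo - 0xDC00);
          s += 6;
        }
        else if (hi >= 0xDC00 && hi <= 0xDFFF)
          return -1;                      /* unpaired low surrogate */
        break;
      }
      default: return -1;
      }
      s++;
      len = put_utf8(cp, buf);
    }
    if (n + len > cap) return -1;
    memcpy(out + n, buf, len);
    n += len;
  }
  return (long) n;
}

/* Returns 1 if equal after decoding, 0 if different, -1 on bad input. */
static int json_strings_equal_sketch(const char *a, const char *b)
{
  char da[256], db[256];
  long la = decode_json_string(a, da, sizeof da);
  long lb = decode_json_string(b, db, sizeof db);
  if (la < 0 || lb < 0) return -1;
  return la == lb && memcmp(da, db, (size_t) la) == 0;
}
```

With this sketch, the body `A` and the body `\u0041` decode to the same single byte and compare equal, whereas a plain memcmp on the raw text would report them as different.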

Release Notes

N/A

How can this PR be tested?

All MTR tests should pass. New MTR test func_json_unicode_escape should also pass.

Before this change:


main.func_json_unicode_escape            [ fail ]
        Test ended at 2026-06-12 00:40:36

CURRENT_TEST: main.func_json_unicode_escape
--- /quick-rebuilds/mariadb-server/mysql-test/main/func_json_unicode_escape.result	2026-06-11 22:59:28.818558778 +0000
+++ /quick-rebuilds/mariadb-server/mysql-test/main/func_json_unicode_escape.reject	2026-06-12 00:40:36.684985232 +0000
@@ -10,18 +10,18 @@
 # JSON_CONTAINS: should return 1 for semantically equal strings
 SELECT JSON_CONTAINS('"A"', '"\\u0041"');
 JSON_CONTAINS('"A"', '"\\u0041"')
-1
+0
 SELECT JSON_CONTAINS('"\\u0041"', '"A"');
 JSON_CONTAINS('"\\u0041"', '"A"')
-1
+0
 # JSON_OVERLAPS: should return 1 for semantically equal strings
 SELECT JSON_OVERLAPS('"A"', '"\\u0041"');
 JSON_OVERLAPS('"A"', '"\\u0041"')
-1
+0
 # JSON_EQUALS: should return 1 for semantically equal strings
 SELECT JSON_EQUALS('"A"', '"\\u0041"');
 JSON_EQUALS('"A"', '"\\u0041"')
-1
+0
 # JSON_UNQUOTE correctly resolves the escape (proving they are the same)
 SELECT JSON_UNQUOTE('"A"') = JSON_UNQUOTE('"\\u0041"');
 JSON_UNQUOTE('"A"') = JSON_UNQUOTE('"\\u0041"')
@@ -38,7 +38,7 @@
 A
 SELECT JSON_CONTAINS('"A"', CAST(0x225C753030343122 AS CHAR));
 JSON_CONTAINS('"A"', CAST(0x225C753030343122 AS CHAR))
-1
+0
 SELECT JSON_CONTAINS(JSON_QUOTE(JSON_UNQUOTE('"A"')),
 JSON_QUOTE(JSON_UNQUOTE(CAST(0x225C753030343122 AS CHAR))));
 JSON_CONTAINS(JSON_QUOTE(JSON_UNQUOTE('"A"')),
@@ -50,32 +50,32 @@
 # \u0048\u0065\u006C\u006C\u006F = "Hello"
 SELECT JSON_CONTAINS('"Hello"', '"\\u0048\\u0065\\u006C\\u006C\\u006F"');
 JSON_CONTAINS('"Hello"', '"\\u0048\\u0065\\u006C\\u006C\\u006F"')
-1
+0
 SELECT JSON_EQUALS('"Hello"', '"\\u0048\\u0065\\u006C\\u006C\\u006F"');
 JSON_EQUALS('"Hello"', '"\\u0048\\u0065\\u006C\\u006C\\u006F"')
-1
+0
 SELECT JSON_OVERLAPS('"Hello"', '"\\u0048\\u0065\\u006C\\u006C\\u006F"');
 JSON_OVERLAPS('"Hello"', '"\\u0048\\u0065\\u006C\\u006C\\u006F"')
-1
+0
 # Mixed literal and escape in the same string: "H\u0065llo" = "Hello"
 SELECT JSON_EQUALS('"Hello"', '"H\\u0065llo"');
 JSON_EQUALS('"Hello"', '"H\\u0065llo"')
-1
+0
 #
 # Test within arrays and objects
 #
 SELECT JSON_CONTAINS('["A", "B"]', '["\\u0041"]');
 JSON_CONTAINS('["A", "B"]', '["\\u0041"]')
-1
+0
 SELECT JSON_CONTAINS('{"key": "A"}', '{"key": "\\u0041"}');
 JSON_CONTAINS('{"key": "A"}', '{"key": "\\u0041"}')
-1
+0
 SELECT JSON_EQUALS('["A", "B"]', '["\\u0041", "\\u0042"]');
 JSON_EQUALS('["A", "B"]', '["\\u0041", "\\u0042"]')
-1
+0
 SELECT JSON_EQUALS('{"key": "A"}', '{"key": "\\u0041"}');
 JSON_EQUALS('{"key": "A"}', '{"key": "\\u0041"}')
-1
+0
 #
 # Surrogate pairs: characters above U+FFFF encoded as two \uXXXX escapes.
 # U+1F600 (😀) = \uD83D\uDE00
@@ -84,28 +84,28 @@
 SET NAMES utf8mb4;
 SELECT JSON_EQUALS('"😀"', '"\\uD83D\\uDE00"');
 JSON_EQUALS('"?"', '"\\uD83D\\uDE00"')
-1
+0
 SELECT JSON_CONTAINS('"😀"', '"\\uD83D\\uDE00"');
 JSON_CONTAINS('"?"', '"\\uD83D\\uDE00"')
-1
+0
 SELECT JSON_OVERLAPS('"😀"', '"\\uD83D\\uDE00"');
 JSON_OVERLAPS('"?"', '"\\uD83D\\uDE00"')
-1
+0
 SELECT JSON_EQUALS('"😊"', '"\\uD83D\\uDE0A"');
 JSON_EQUALS('"?"', '"\\uD83D\\uDE0A"')
-1
+0
 SELECT JSON_CONTAINS('["😀", "hello"]', '["\\uD83D\\uDE00"]');
 JSON_CONTAINS('["?", "hello"]', '["\\uD83D\\uDE00"]')
-1
+0
 SELECT JSON_EQUALS('{"emoji": "😀"}', '{"emoji": "\\uD83D\\uDE00"}');
 JSON_EQUALS('{"emoji": "?"}', '{"emoji": "\\uD83D\\uDE00"}')
-1
+0
 #
 # Escaped object keys: \u006B\u0065\u0079 = "key"
 #
 SELECT JSON_EQUALS('{"key":"A"}', '{"\\u006B\\u0065\\u0079":"A"}');
 JSON_EQUALS('{"key":"A"}', '{"\\u006B\\u0065\\u0079":"A"}')
-1
+0
 SELECT JSON_CONTAINS('{"key":"A"}', '{"\\u006B\\u0065\\u0079":"A"}');
 JSON_CONTAINS('{"key":"A"}', '{"\\u006B\\u0065\\u0079":"A"}')
 1
@@ -114,16 +114,16 @@
 #
 SELECT JSON_EQUALS('"é"', '"\\u00E9"');
 JSON_EQUALS('"é"', '"\\u00E9"')
-1
+0
 SELECT JSON_CONTAINS('"é"', '"\\u00E9"');
 JSON_CONTAINS('"é"', '"\\u00E9"')
-1
+0
 SELECT JSON_OVERLAPS('["é"]', '["\\u00E9"]');
 JSON_OVERLAPS('["é"]', '["\\u00E9"]')
-1
+0
 #
 # CJK: 中 = U+4E2D
 #
 SELECT JSON_EQUALS('"中"', '"\\u4E2D"');
 JSON_EQUALS('"中"', '"\\u4E2D"')
-1
+0

After this change:

main.func_json_unicode_escape            [ pass ]

Basing the PR against the correct MariaDB version

  • This is a bug fix, and the PR is based against the 12.3 branch.

Copyright

All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request addresses MDEV-39995 by introducing semantic comparison for JSON strings, ensuring that Unicode escape sequences (e.g., "\u0041" and "A") are treated as equal. It implements a new json_string_compare function, integrates it into JSON_CONTAINS and JSON_OVERLAPS logic, and updates JSON normalization to decode and re-encode escaped strings and keys. The review feedback highlights critical security vulnerabilities in strings/json_normalize.c, where the use of my_alloca for stack allocation based on arbitrary JSON string and key lengths could lead to stack overflows. It is recommended to use heap allocation with my_malloc and my_free instead.
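
The concern about stack allocation sized by untrusted input can be sketched as follows. This is only an illustration of the general pattern (a small fixed stack buffer with a heap fallback for large inputs), not the code in strings/json_normalize.c. It uses the standard malloc and free as stand-ins for my_malloc and my_free, and SKETCH_STACK_LIMIT and sum_via_scratch are hypothetical names chosen for the example.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_STACK_LIMIT 256  /* arbitrary threshold for this example */

/* Copies len bytes of src into scratch space and sums them.
   Small inputs use a fixed stack buffer; larger ones go to the heap,
   so the stack usage never depends on the input length.
   Sets *used_heap to 1 if the heap was used. Returns -1 if malloc fails. */
static long sum_via_scratch(const char *src, size_t len, int *used_heap)
{
  char small[SKETCH_STACK_LIMIT];
  char *buf = small;
  long sum = 0;

  *used_heap = 0;
  if (len > sizeof small)
  {
    buf = malloc(len);
    if (!buf)
      return -1;
    *used_heap = 1;
  }

  memcpy(buf, src, len);
  for (size_t i = 0; i < len; i++)
    sum += (unsigned char) buf[i];

  if (*used_heap)
    free(buf);
  return sum;
}
```

The design point is that an attacker-controlled JSON string length only ever influences a heap allocation, which can fail cleanly, rather than the size of a stack frame.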


Comment thread strings/json_normalize.c
Comment thread strings/json_normalize.c Outdated
@gkodinov added the External Contribution label Jun 12, 2026

@gkodinov gkodinov left a comment

Member


Thank you for your contribution! This is a preliminary review.

Please look into the my_alloca() use and the compile failures.
There are also some stylistic suggestions.

Comment thread strings/json_lib.c Outdated
Comment thread strings/json_lib.c Outdated
Comment thread strings/json_normalize.c
Comment thread strings/json_normalize.c
Comment thread strings/json_normalize.c Outdated
Comment thread strings/json_normalize.c Outdated
@gkodinov gkodinov self-assigned this Jun 12, 2026
… on semantic

JSON_CONTAINS, JSON_EQUALS, and JSON_OVERLAPS used raw byte-level
comparison (memcmp) for JSON string values, which meant semantically
equivalent strings like "A" and "\u0041" were incorrectly treated
as different.

Fix: add json_string_compare() that decodes Unicode escape sequences
before comparing, and fix json_normalize to produce a canonical form
for strings with escapes so JSON_EQUALS works correctly.

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
@gengtianuiowa

Contributor Author

Comments resolved. Please review again, thanks!

@gengtianuiowa gengtianuiowa requested a review from gkodinov June 12, 2026 16:47

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.

2 participants