MDEV-39995 JSON_CONTAINS and JSON_EQUALS do not compare strings based on semantic #5223
gengtianuiowa wants to merge 1 commit into
Code Review
This pull request addresses MDEV-39995 by introducing semantic comparison for JSON strings, ensuring that Unicode escape sequences (e.g., "\u0041" and "A") are treated as equal. It implements a new json_string_compare function, integrates it into JSON_CONTAINS and JSON_OVERLAPS logic, and updates JSON normalization to decode and re-encode escaped strings and keys. The review feedback highlights critical security vulnerabilities in strings/json_normalize.c, where the use of my_alloca for stack allocation based on arbitrary JSON string and key lengths could lead to stack overflows. It is recommended to use heap allocation with my_malloc and my_free instead.
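The allocation concern above can be illustrated with a minimal sketch. It assumes a hypothetical helper, copy_key_to_heap(), that is not part of the pull request, and it uses the standard malloc() and free() as stand-ins for my_malloc() and my_free(), whose exact signatures are not shown in this conversation. The point is only that a buffer sized by an arbitrary JSON key length goes on the heap, where an oversized request fails with a NULL return instead of overrunning the stack.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
  Hypothetical helper (not from the patch): make a NUL-terminated heap
  copy of a JSON key whose length comes from untrusted input.
  Returns NULL if the size would overflow or the allocation fails.
  The caller owns the result and must free() it.
*/
static char *copy_key_to_heap(const char *key, size_t len)
{
  char *buf;

  if (len == SIZE_MAX)               /* len + 1 would wrap around to 0 */
    return NULL;

  buf= (char *) malloc(len + 1);     /* stand-in for my_malloc() */
  if (buf == NULL)
    return NULL;                     /* report failure, do not crash */

  memcpy(buf, key, len);
  buf[len]= '\0';
  return buf;
}
```

With stack allocation there is no equivalent of the NULL check: a very long key simply moves the stack pointer past the end of the stack.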
gkodinov left a comment
Thank you for your contribution! This is a preliminary review.
Please look into the my_alloca() use and the compile failures.
There are also some stylistic suggestions.
… on semantic JSON_CONTAINS, JSON_EQUALS, and JSON_OVERLAPS used raw byte-level comparison (memcmp) for JSON string values, which meant semantically equivalent strings like "A" and "\u0041" were incorrectly treated as different. Fix: add json_string_compare() that decodes Unicode escape sequences before comparing, and fix json_normalize to produce a canonical form for strings with escapes so JSON_EQUALS works correctly. All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.
e37a2ee to beaba10
Comments resolved. Please review again, thanks!
Description
JSON_CONTAINS, JSON_EQUALS, and JSON_OVERLAPS used raw byte-level comparison (memcmp) for JSON string values, which meant semantically equivalent strings like "A" and "\u0041" were incorrectly treated as different.
Fix: add json_string_compare() that decodes Unicode escape sequences before comparing, and fix json_normalize to produce a canonical form for strings with escapes so JSON_EQUALS works correctly.
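As a rough illustration of the idea, here is a self-contained sketch of decode-then-compare. It is not the patch's json_string_compare(): the names parse_u4, put_utf8, decode_json_string and json_strings_equal_sketch are hypothetical, the input is assumed to be the string body without the surrounding quotes, the output is plain UTF-8 compared byte by byte (no collation handling), and inputs longer than a small fixed buffer are rejected rather than allocated for.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Parse exactly four hex digits at s; returns the value or -1. */
static long parse_u4(const char *s, const char *end)
{
  long v= 0;
  int i;
  if (end - s < 4)
    return -1;
  for (i= 0; i < 4; i++)
  {
    char c= s[i];
    int h= (c >= '0' && c <= '9') ? c - '0' :
           (c >= 'a' && c <= 'f') ? c - 'a' + 10 :
           (c >= 'A' && c <= 'F') ? c - 'A' + 10 : -1;
    if (h < 0)
      return -1;
    v= v * 16 + h;
  }
  return v;
}

/* Write code point cp as UTF-8 into out; returns bytes written (1..4). */
static size_t put_utf8(unsigned long cp, unsigned char *out)
{
  if (cp < 0x80)
  {
    out[0]= (unsigned char) cp;
    return 1;
  }
  if (cp < 0x800)
  {
    out[0]= (unsigned char) (0xC0 | (cp >> 6));
    out[1]= (unsigned char) (0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000)
  {
    out[0]= (unsigned char) (0xE0 | (cp >> 12));
    out[1]= (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[2]= (unsigned char) (0x80 | (cp & 0x3F));
    return 3;
  }
  out[0]= (unsigned char) (0xF0 | (cp >> 18));
  out[1]= (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
  out[2]= (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
  out[3]= (unsigned char) (0x80 | (cp & 0x3F));
  return 4;
}

/*
  Decode JSON escapes in s[0..len) into out, which needs room for len
  bytes (an escape is never shorter than the UTF-8 it decodes to).
  Returns the decoded length, or -1 on a malformed escape.
*/
static long decode_json_string(const char *s, size_t len, unsigned char *out)
{
  static const char esc_in[]= "\"\\/bfnrt";
  static const char esc_out[]= "\"\\/\b\f\n\r\t";
  const char *end= s + len, *p;
  size_t n= 0;
  long cp, lo;
  assert(s != NULL && out != NULL);
  while (s < end)
  {
    char c= *s++;
    if (c != '\\')
    {
      out[n++]= (unsigned char) c;      /* raw bytes pass through */
      continue;
    }
    if (s == end)
      return -1;                        /* dangling backslash */
    c= *s++;
    if (c != '\0' && (p= strchr(esc_in, c)) != NULL)
    {
      out[n++]= (unsigned char) esc_out[p - esc_in];
      continue;
    }
    if (c != 'u' || (cp= parse_u4(s, end)) < 0)
      return -1;
    s+= 4;
    if (cp >= 0xD800 && cp <= 0xDBFF)   /* high surrogate needs a low one */
    {
      if (end - s < 6 || s[0] != '\\' || s[1] != 'u' ||
          (lo= parse_u4(s + 2, end)) < 0xDC00 || lo > 0xDFFF)
        return -1;
      s+= 6;
      cp= 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
    }
    else if (cp >= 0xDC00 && cp <= 0xDFFF)
      return -1;                        /* lone low surrogate */
    n+= put_utf8((unsigned long) cp, out + n);
  }
  return (long) n;
}

/* Returns 1 if equal after decoding, 0 if not, -1 if malformed or too long. */
static int json_strings_equal_sketch(const char *a, size_t alen,
                                     const char *b, size_t blen)
{
  unsigned char bufa[256], bufb[256];
  long na, nb;
  if (alen > sizeof(bufa) || blen > sizeof(bufb))
    return -1;
  if ((na= decode_json_string(a, alen, bufa)) < 0 ||
      (nb= decode_json_string(b, blen, bufb)) < 0)
    return -1;
  return na == nb && memcmp(bufa, bufb, (size_t) na) == 0;
}
```

With this sketch, json_strings_equal_sketch("A", 1, "\\u0041", 6) returns 1, whereas a plain memcmp of the two inputs reports them as different. The same decoding step is what makes a canonical form possible: once both spellings decode to the same bytes, re-encoding them consistently gives identical normalized output.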
Release Notes
N/A
How can this PR be tested?
All MTR tests should pass. The new MTR test func_json_unicode_escape should also pass.
Before this change:
After this change:
Basing the PR against the correct MariaDB version
Copyright
All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.