MDEV-10526: Add binary string support to bitwise operators#5190
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces support for binary-to-binary bitwise operations (AND, OR, XOR, NOT, and bitwise shifts) on binary strings by implementing new handler classes and adding corresponding error messages for mismatched operand sizes. The review feedback highlights several critical issues: a potential integer overflow/wrap-around in the left-shift index calculation on 32-bit systems, incorrect ASCII string conversion of numeric arguments in mixed-type bitwise operations, and potential undefined behavior when handling empty binary strings due to passing a null pointer to memset.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| size_t src_idx= i + byte_shift; | ||
| if (src_idx < len) | ||
| { |
There was a problem hiding this comment.
The calculation size_t src_idx = i + byte_shift is susceptible to integer overflow/wrap-around on 32-bit systems if len is very large (e.g., close to size_t max). If i + byte_shift overflows, src_idx wraps around and can evaluate to a value less than len, leading to incorrect data copying from wrapped-around indices instead of zero-filling. To prevent this, perform the bounds check using subtraction (byte_shift < len - i) before computing src_idx.
if (byte_shift < len - i)
{
size_t src_idx= i + byte_shift;| StringBuffer<128> b_buf; | ||
| String *b= item->arguments()[1]->val_str(&b_buf); |
There was a problem hiding this comment.
In mixed-type bitwise operations (e.g., binary_string & numeric), calling val_str on the numeric argument converts it to its decimal string representation (e.g., 255 becomes the 3-byte ASCII string "255" / 0x323535), rather than its binary representation (e.g., 0xFF). This causes mismatched length errors (e.g., VARBINARY(1) & 255 compares length 1 with length 3) or incorrect bitwise operations if the lengths happen to match. To align with standard bitwise behavior (and MySQL 8.0 compatibility), numeric arguments should be converted to a binary string representation of their integer value, padded or truncated to match the length of the other binary string operand.
| size_t len= a->length(); | ||
|
|
||
| if (to->realloc(len)) |
There was a problem hiding this comment.
When the input binary string a is empty (len == 0), calling to->realloc(0) can return a null pointer or do nothing. Subsequently, passing a null pointer to memset (e.g., memset(out_ptr, 0, len)) is technically undefined behavior in C/C++, even if the length is 0. Adding an early exit for len == 0 avoids this potential undefined behavior and improves efficiency by bypassing unnecessary allocation and loop overhead.
size_t len= a->length();
if (len == 0)
{
to->length(0);
to->set_charset(&my_charset_bin);
item->null_value= false;
return to;
}
if (to->realloc(len))fa5cd7c to
862a606
Compare
grooverdan
left a comment
There was a problem hiding this comment.
Should be off the main branch as a new feature.
from @abarkov
"Would be nice to add tests that uuid, inet6, inet4, geometry are not allowed for bit operations. Later we can probably implement bit operations for uuid, inet6, inet4. But to avoid compatibility problems we need to make sure they return error now."
| bool fix_length_and_dec(Item_handled_func *item) const override | ||
| { | ||
| item->max_length= item->arguments()[0]->max_length; | ||
| item->collation.set(&my_charset_bin, DERIVATION_IMPLICIT); |
There was a problem hiding this comment.
from @abarkov
"I think should be DERIVATION_COERCIBLE instead of DERIVATION_IMPLICIT"
|
|
||
| bool fix_length_and_dec(Item_handled_func *item) const override | ||
| { | ||
| item->max_length= item->arguments()[0]->max_length; |
There was a problem hiding this comment.
from @abarkov
"check that max_length is calculated correctly in all functions"
This will show up in mtr --cursor-protocol on test that have truncated test result output on fields.
| DBUG_ASSERT(item->fixed()); | ||
| StringBuffer<128> a_buf; | ||
| String *a= item->arguments()[0]->val_str(&a_buf); | ||
| if (item->arguments()[0]->null_value || a == nullptr) |
There was a problem hiding this comment.
from @abarkov
"The test for item->arguments()[0]->null_value is redundant"
| } | ||
|
|
||
| Longlong_null shift_count_null= item->arguments()[1]->to_longlong_null(); | ||
| if (item->arguments()[1]->null_value || shift_count_null.is_null()) |
There was a problem hiding this comment.
from @abarkov "same here. It enough to test shift_count_null.is_null()"
Bitwise operators (&, |, ^, ~, <<, >>) previously cast all arguments to BIGINT, silently truncating values wider than 64 bits. This broke operations on BINARY, VARBINARY, BLOB, INET6, and UUID columns. Introduces binary_mode detection in fix_length_and_dec(). When any non-literal argument has STRING_RESULT with binary charset, operators switch to byte-by-byte processing via a new Handler_str subclass, returning LONGBLOB of the same length as the input. Bare hex literals (x'FF', 0xFF) and bit literals (b'1010') retain integer mode for backward compatibility. Existing int/decimal handler classes for Item_func_bit_or and Item_func_bit_and are moved from item_cmpfunc.cc to item_func.cc for consistency. New error codes: ER_INVALID_BITWISE_OPERANDS_SIZE ER_INVALID_BITWISE_AGGREGATE_OPERANDS_SIZE Aggregate function support (BIT_AND/BIT_OR/BIT_XOR) to follow in a subsequent commit. Closes: MDEV-10526
- Change DERIVATION_IMPLICIT to DERIVATION_COERCIBLE in all binary handler fix_length_and_dec() methods - Update return_type_handler() to vary by max_length using blob_type_handler/type_handler_varchar/type_handler_string pattern, matching Item_char_typecast_func_handler_fbt_to_binary - Fix max_length calculation for two-operand operators (&, |, ^) to use MY_MAX of both operand lengths - Remove redundant null_value checks in val_str() — nullptr check on val_str() return value is sufficient - Add early exit for empty string (len == 0) in all handlers - Fix potential size_t overflow in shift left bounds check: use (byte_shift < len - i) instead of (i + byte_shift < len) - Remove duplicate size_t len variable in xor/and/or handlers
…ates
Item_sum_bit::reset_field() and update_field() previously assumed
integer mode, using int8store/uint8korr to read/write 8 raw bytes
to result_field. In binary mode result_field is a string-typed
field, so this corrupted temp table memory and crashed the server
(SIGSEGV in ha_maria::write_block_record) during GROUP BY queries.
Fix uses Field::store()/val_str() for binary mode, matching the
pattern used by Item_sum_min_max for string aggregates.
Add formal test file mysql-test/main/func_bitops_binary.test
covering:
- All 6 scalar operators on VARBINARY
- INET6_ATON subnet masking (primary use case)
- NULL handling
- Mismatched length errors
- Current hex/binary literal behavior (pending mentor decision
on x'FF' semantics)
- INET6/UUID CAST restrictions preserved (per Alexander's request)
- BIT_AND/BIT_OR/BIT_XOR aggregates including GROUP BY
- Empty result set returns neutral elements (not NULL)
- 512-byte aggregate size guard
- Integer mode backward compatibility
- CREATE TABLE AS SELECT type preservation
Per Alexander Barkov's request, Section 6 verified that uuid and inet6 cast types continue to error on bitwise operations. This extends coverage to inet4 and geometry types as well, confirming all four types (uuid, inet6, inet4, geometry) correctly raise ER_ILLEGAL_PARAMETER_DATA_TYPE_FOR_OPERATION (4079) and remain out of scope for this patch.
221664d to
6b9dd9e
Compare
|
Alexey Botchkov seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
Draft for review. Implements byte-by-byte binary string mode for all scalar bitwise operators. Aggregate function
support (BIT_AND/BIT_OR/BIT_XOR) to follow.
Tested: