MDEV-38992 TABLESAMPLE Clause Implementation#5208
Conversation
There was a problem hiding this comment.
Code Review
This pull request implements the SQL Standard TABLESAMPLE clause (supporting SYSTEM and BERNOULLI methods) for MariaDB. The changes introduce parser support, parameter handling for prepared statements and stored procedures, and mark these queries as unsafe for statement-based replication. The review feedback highlights several critical issues: a compilation or conversion bug in String::append_double due to an unsupported qs_append overload, safety and validation issues in fix_tablesample_fields (including unsafe reference pointers and incorrect handling of unbound parameters during PREPARE), silent conversion of NULL parameters to 0.0 in prepared statements, and misleading error reporting (ER_SYNTAX_ERROR instead of ER_NOT_SUPPORTED_YET) for semantic violations on views, derived tables, or system tables.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| bool String::append_double(double d) | ||
| { | ||
| if (realloc(str_length+FLOATING_POINT_BUFFER+2)) | ||
| return TRUE; | ||
|
|
||
| qs_append(d); | ||
|
|
||
| return FALSE; | ||
| } |
There was a problem hiding this comment.
Using qs_append(d) with a double argument is highly problematic. qs_append does not have an overload for double in String or Binary_string. This will result in either a compilation failure or an implicit conversion of the double to a char or integer type, leading to corrupted output in query logging and replication. Instead, use my_gcvt to format the floating-point number into a temporary buffer and then append it.
bool String::append_double(double d)
{
char buf[FLOATING_POINT_BUFFER + 3];
size_t len= my_gcvt(d, MY_GCVT_ARG_DOUBLE, sizeof(buf) - 1, buf, NULL);
return append(buf, len);
}| int fix_tablesample_fields(THD *thd) | ||
| { | ||
| DBUG_ENTER("Lex_tablesample::fix_tablesample_fields"); | ||
| DBUG_ASSERT(thd); | ||
| bool err= sampling_percentage->fix_fields_if_needed(thd, NULL); | ||
| if(err) | ||
| DBUG_RETURN(1); | ||
| if (sampling_percentage->const_item()) | ||
| { | ||
| double d= sampling_percentage->val_real(); | ||
| if (d < 0.0 || d > 100.0) | ||
| DBUG_RETURN(1); | ||
| } | ||
|
|
||
| DBUG_RETURN(0); | ||
| } |
There was a problem hiding this comment.
There are multiple issues in fix_tablesample_fields:
- Passing
NULLas the reference pointer tofix_fields_if_neededis unsafe. If the item needs to be replaced during resolution, it will fail or cause a crash. Pass&sampling_percentageinstead. - The method does not check
sampling_percentage->null_value. If a parameter is bound toNULL, it will be silently accepted as0.0instead of raising an error. - For prepared statements,
sampling_percentage->const_item()is true duringPREPAREbut the parameter has no value yet (state == Item_param::NO_VALUE). We must skip validation for unbound parameters duringPREPAREto avoid incorrect failures.
int fix_tablesample_fields(THD *thd)
{
DBUG_ENTER("Lex_tablesample::fix_tablesample_fields");
DBUG_ASSERT(thd);
bool err= sampling_percentage->fix_fields_if_needed(thd, &sampling_percentage);
if (err)
DBUG_RETURN(1);
if (sampling_percentage->const_item())
{
if (sampling_percentage->type() == Item::PARAM_ITEM)
{
Item_param *param= static_cast<Item_param*>(sampling_percentage);
if (param->state == Item_param::NO_VALUE)
DBUG_RETURN(0);
}
double d= sampling_percentage->val_real();
if (sampling_percentage->null_value || d < 0.0 || d > 100.0)
DBUG_RETURN(1);
}
DBUG_RETURN(0);
}| if (param->tablesample_clause_param && !param->has_double_value()) | ||
| { | ||
| if (param->set_tablesample_clause_param(param->val_real())) | ||
| DBUG_RETURN(true); | ||
| } |
There was a problem hiding this comment.
If the tablesample parameter is NULL, calling set_tablesample_clause_param will convert the parameter's state to SHORT_DATA_VALUE with a value of 0.0. This silently converts NULL to 0.0 instead of preserving the NULL value to be caught as an error during execution. Check !param->is_null() before setting the parameter value.
if (param->tablesample_clause_param && !param->has_double_value())
{
if (!param->is_null() && param->set_tablesample_clause_param(param->val_real()))
DBUG_RETURN(true);
}| if (param->tablesample_clause_param && !param->has_double_value()) | ||
| { | ||
| if (param->set_tablesample_clause_param(param->val_real())) | ||
| DBUG_RETURN(true); | ||
| } |
There was a problem hiding this comment.
If the tablesample parameter is NULL, calling set_tablesample_clause_param will convert the parameter's state to SHORT_DATA_VALUE with a value of 0.0. This silently converts NULL to 0.0 instead of preserving the NULL value to be caught as an error during execution. Check !param->is_null() before setting the parameter value.
if (param->tablesample_clause_param && !param->has_double_value())
{
if (!param->is_null() && param->set_tablesample_clause_param(param->val_real()))
DBUG_RETURN(true);
}| if (table_list->tablesample_clause) { | ||
| if (table_list->is_view_or_derived() || | ||
| get_table_category(table_list->db, table_list->table_name) != TABLE_CATEGORY_USER || | ||
| table_list->tablesample_clause->fix_tablesample_fields(thd)) | ||
| { | ||
| my_error(ER_SYNTAX_ERROR, MYF(0)); | ||
| DBUG_RETURN(1); | ||
| } | ||
| } |
There was a problem hiding this comment.
Using ER_SYNTAX_ERROR for semantic errors (such as attempting to use TABLESAMPLE on views, derived tables, or system tables, or when the sampling percentage is out of range) is misleading because the SQL syntax itself is correct. Instead, more specific errors should be reported. For unsupported table types, ER_NOT_SUPPORTED_YET is much more appropriate. Furthermore, if fix_tablesample_fields fails, it might have already set an error, so we should avoid blindly overwriting it with ER_SYNTAX_ERROR.
if (table_list->tablesample_clause) {
if (table_list->is_view_or_derived())
{
my_error(ER_NOT_SUPPORTED_YET, MYF(0), "TABLESAMPLE on views or derived tables");
DBUG_RETURN(1);
}
if (table_list->table && table_list->table->s->table_category != TABLE_CATEGORY_USER)
{
my_error(ER_NOT_SUPPORTED_YET, MYF(0), "TABLESAMPLE on system tables");
DBUG_RETURN(1);
}
if (table_list->tablesample_clause->fix_tablesample_fields(thd))
{
if (!thd->is_error())
my_error(ER_WRONG_ARGUMENTS, MYF(0), "TABLESAMPLE");
DBUG_RETURN(1);
}
}There was a problem hiding this comment.
Pull request overview
Adds initial parser + execution-time syntax validation support for the SQL:2016 TABLESAMPLE clause (SYSTEM/BERNOULLI) across regular statements, stored routines, and prepared statements, including basic binlog-unsafe marking and MTR coverage. Optimizer integration is explicitly out of scope for this stage.
Changes:
- Extend the SQL grammar to parse
TABLESAMPLE {SYSTEM|BERNOULLI} (<pct>)after table references and attach it toTABLE_LIST. - Add
Lex_tablesampleplus validation insetup_tables()to reject sampling on views/derived/CTE/system tables and to validate percentage range. - Add prepared-statement/SP plumbing for TABLESAMPLE parameters (binding + query rewrite logging), plus new MTR tests.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| sql/table.h | Adds TABLE_LIST::tablesample_clause pointer to carry parsed TABLESAMPLE info. |
| sql/sql_yacc.yy | Introduces TABLESAMPLE tokens and grammar rules; attaches Lex_tablesample to table references; marks statement binlog-unsafe for non-trivial sampling. |
| sql/sql_type.h | Renames LIMIT-type predicate to a generalized numeric-clause predicate (now used by TABLESAMPLE variable handling). |
| sql/sql_tablesample.h | New Lex_tablesample container and fix_tablesample_fields() validation helper. |
| sql/sql_string.h | Adds String::append_double() declaration (used for rewritten query logging). |
| sql/sql_string.cc | Implements String::append_double(). |
| sql/sql_prepare.cc | Adds TABLESAMPLE parameter binding/conversion to double for prepared statements (incl. logging path). |
| sql/sql_lex.h | Adds BINLOG_STMT_UNSAFE_TABLESAMPLE and new LEX helpers to create TABLESAMPLE SP-variable items. |
| sql/sql_lex.cc | Refactors LIMIT variable creation into a shared numeric helper and adds TABLESAMPLE SP-variable creators. |
| sql/sql_base.cc | Validates TABLESAMPLE usage in setup_tables() (disallow view/derived/system, validate percentage). |
| sql/sp_head.cc | Extends SP variable query rewrite logging to print TABLESAMPLE parameters as doubles. |
| sql/lex.h | Adds TABLESAMPLE and BERNOULLI keywords. |
| sql/item.h | Tracks TABLESAMPLE parameter-ness on rewritable items; adds double-typed binding helpers on Item_param; renames SP-variable numeric validation helper. |
| sql/item.cc | Adds TABLESAMPLE handling in Item_param::set_from_item(). |
| mysql-test/main/tablesample.test | New MTR coverage for parsing/validation across basic queries, PS, SP, and disallowed table types. |
| mysql-test/main/tablesample.result | Expected output for the new TABLESAMPLE MTR test. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (type_handler()->is_numeric_clause_valid_type()) | ||
| return true; | ||
| my_error(ER_WRONG_SPVAR_TYPE_IN_LIMIT, MYF(0)); | ||
| return false; |
| virtual bool is_numeric_clause_valid_type() const | ||
| { | ||
| return false; | ||
| } |
…lause 1. Inspired from the implementation of the limit clause, added support for TABLESAMPLE clause in queries, stored procedures and prepared statements. 2. Added syntax validation for sampling percentage, correct usage with tables(system tables or derived table cannot be sampled). 3. Added tests validating the implementation of the above two.
Initial stage of the TABLESAMPLE (MDEV-38992) clause implementation
Parsing and syntax validation
Optimizer integration
(to be added)