Testing:WL245
From Askmonty.org
Contents |
Testing:WL245
mwl:245 EXISTS to IN transformation
Inputs
- mwl:245 EXISTS to IN transformation
- E-mail from Sanja of 2011-12-04:
The tree is here: lp:~maria-captains/maria/5.3-exists2in exists2in happens when: 1) Correct NULL returning is not important (i.e. "top" item of WHERE clause or its top AND/OR construction) 2) EXISTS subquery has not aggregate functions, ordering, LIMIT or UNION subquery. 3) WHERE clause of the EXISTS subquery contains (in top AND construction or alone) expression <local_field> = <outer_expression>, (something like this: EXISTS (SELECT 1 FROM inner_table WHERE inner_table.field = 2*outer_table.field AND maybe_something_else). 4) The dependency is the only outer reference Conversion is like this: ... WHERE EXISTS (SELECT 1 FROM inner_table WHERE inner_table.field = 2*outer_table.field AND maybe_something_else)... ... WHERE 2*outer_table.field IN (SELECT inner_table.field FROM inner_table WHERE 1 = 1 AND maybe_something_else).. after which optimizations applied. Note: As far as ON condition could be moved to WHERE it night have sense to check it also.
- E-mail from Sanja of 2011-12-07:
-- EXIST to IN then semijoin (has priority over IN to EXISTS)
set optimizer_switch='exists_to_in=on,in_to_exists=on,semijoin=on,materialization=off,subquery_cache=off';
SELECT * FROM t1 WHERE EXISTS ( SELECT a FROM t3 WHERE t3.b = t1.a);
a
c
explain extended
SELECT * FROM t1 WHERE EXISTS ( SELECT a FROM t3 WHERE t3.b = t1.a);
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY t1 system aa NULL NULL NULL 1 100.00
1 PRIMARY t3 ref bb bb 4 const 1 100.00 FirstMatch(t1)
Warnings:
Note 1276 Field or reference 'test.t1.a' of SELECT #2 was resolved in SELECT #1
Note 1003 select 'c' AS `a` from `test`.`t1` semi join (`test`.`t3`) where (`test`.`t3`.`b` = 'c')
-- EXIST to IN then IN to EXISTS
set optimizer_switch='exists_to_in=on,in_to_exists=on,semijoin=off,materialization=off,subquery_cache=off';
SELECT * FROM t1 WHERE EXISTS ( SELECT a FROM t3 WHERE t3.b = t1.a);
a
c
explain extended
SELECT * FROM t1 WHERE EXISTS ( SELECT a FROM t3 WHERE t3.b = t1.a);
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY t1 system NULL NULL NULL NULL 1 100.00
2 DEPENDENT SUBQUERY t3 ALL bb NULL NULL NULL 2 100.00 Using where
Warnings:
Note 1276 Field or reference 'test.t1.a' of SELECT #2 was resolved in SELECT #1
Note 1003 select 'c' AS `a` from `test`.`t1` where <in_optimizer>('c',<exists>(select `test`.`t3`.`b` from `test`.`t3` where (<cache>('c') = `test`.`t3`.`b`)))
-- EXIST2IN then MATERIALIZATION
set optimizer_switch='exists_to_in=on,in_to_exists=off,semijoin=off,materialization=on,subquery_cache=off';
SELECT * FROM t1 WHERE EXISTS ( SELECT a FROM t3 WHERE t3.b = t1.a);
a
c
explain extended
SELECT * FROM t1 WHERE EXISTS ( SELECT a FROM t3 WHERE t3.b = t1.a);
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY t1 system NULL NULL NULL NULL 1 100.00
2 DEPENDENT SUBQUERY t3 ALL NULL NULL NULL NULL 2 100.00
Warnings:
Note 1276 Field or reference 'test.t1.a' of SELECT #2 was resolved in SELECT #1
Note 1003 select 'c' AS `a` from `test`.`t1` where <in_optimizer>('c','c' in ( <materialize> (select `test`.`t3`.`b` from `test`.`t3` where 1 ), <primary_index_lookup>('c' in <temporary table> on distinct_key where (('c' = `<subquery2>`.`b`)))))
-- NO EXIST2IN
set optimizer_switch='exists_to_in=off,subquery_cache=off';
SELECT * FROM t1 WHERE EXISTS ( SELECT a FROM t3 WHERE t3.b = t1.a);
a
c
explain extended
SELECT * FROM t1 WHERE EXISTS ( SELECT a FROM t3 WHERE t3.b = t1.a);
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY t1 system NULL NULL NULL NULL 1 100.00
2 DEPENDENT SUBQUERY t3 ref bb bb 4 const 1 100.00
Warnings:
Note 1276 Field or reference 'test.t1.a' of SELECT #2 was resolved in SELECT #1
Note 1003 select 'c' AS `a` from `test`.`t1` where exists(select `test`.`t3`.`a` from `test`.`t3` where (`test`.`t3`.`b` = 'c'))
Usage scenario
Transform queries like:
... WHERE EXISTS
(SELECT 1 FROM inner_table
WHERE inner_table.field = 2*outer_table.field AND maybe_something_else)...
and
... WHERE NOT EXISTS
(SELECT 1 FROM inner_table
WHERE inner_table.field = 2*outer_table.field AND maybe_something_else)...
into
... WHERE 2*outer_table.field IN
(SELECT inner_table.field FROM inner_table
WHERE 1 = 1 AND maybe_something_else)..
... WHERE 2*outer_table.field IS NULL OR 2*outer_table.field IN
(SELECT inner_table.field FROM inner_table
WHERE inner_table.field IS NOT NULL AND maybe_something_else)..
To allow optimizations made for IN/ALL/ANY subqueries.
Conversion is possible only if:
1)real NULL is not important (top element of WHERE/ON AND/OR list, i.e. NULL equal to FALSE)
2)the subquery has the only dependence which we bring out of it
3)the subquery is simple (has no aggregate function, GROUp BY, ORDER BY, LIMIT HAVING and so on)
For NOT EXISTS conversion the subquery should be marked that its left part can't be NULL.
Design, architecture and specification
- WorkLog number and title:
mwl:245 EXISTS to IN transformation
- Are HLS and LLS present?
None
- Reviewer name:
None
- Customer requirements:
None known
- Other implementations:
6.0 that has never been released, probably the same one...
Testability and observability
- Does the feature have an on/off switch?
exists_to_in
- Is the operation of the feature observable?
No trace in SHOW STATUS or EXPLAIN, can only be guessed by the final optimizations
The patch
- List of modified files:
mysql-test/r/optimizer_switch.result mysql-test/r/subselect_exists2in.result mysql-test/r/subselect_exists_to_in.result mysql-test/suite/pbxt/r/subselect.result mysql-test/t/subselect_exists2in.test mysql-test/t/subselect_exists_to_in.test sql/item.cc sql/item.h sql/item_cmpfunc.cc sql/item_cmpfunc.h sql/item_subselect.cc sql/item_subselect.h sql/item_sum.cc sql/mysql_priv.h sql/mysqld.cc sql/opt_subselect.cc sql/sql_select.cc
- Review comments:
Paste a link to the review and read it for any concerns that the reviewer may have raised
- MTR tests:
Does the patch include enough MTR tests?
- DGCov output:
Generate a report or obtain it from a developer and make sure it is acceptable for all parties
Feedback from interested parties
- Feedback from developer:
Paste here any useful information provided by the developer in informal discussions, emails or chat Make sure any concerns raised by the developer are recorded in this document
- Feedback from reviewer:
Paste here any discussions with the reviewer related to concerns, risks or testing, if not included in the formal review comments for the patch itself
- Feedback from customer:
Summarize here any relevant discussions with the customer
Query Patterns
If the feature relates to the optimization of a particular set of queries, provide here examples that were extracted from the sources of information listed above, in particular the Worklog and any MTR test cases provided with the patch
Risks
This section lists any risks inherent to the Worklog, as identified by reviewing the inputs mentioned
Refactoring
- Areas that have been refactored:
List areas that have been refactored in order to accommodate the new functionality
Interactions
- Areas of potential interaction:
List areas in the server that may have an interaction with the feature
- Does the feature contain engine-specific code, hooks or hacks?
List any engine-specific considerations there may be
- Does the feature define a new API?
- Does the feature relate to transactions or transactional integrity?
If yes, is testing required on all storage engines, or using cross-engine transactions or binlog involvement?
- Does the feature relate to any of the core server features?
Do we need to test with views, triggers, stored procedures, functions, events, partitions, foreign keys, time zones
Upgrade/Downgrade
- Does the feature change any on-disk formats?
If yes, testing would be required that upgrading to the new format works properly
- Does the feature change the format of the system tables in the mysql database?
If yes, testing would be required to make sure that any upgrade/downgrade scripts are able to handle the change gracefully
- Does the feature change or retire or deprecate any mysqld configuration options?
List options here
- Is downgrade to be supported?
If using the new feature makes the database non-downgradable, this must be a conscious design decision
User-visible changes
- Does the feature change the mysql protocol?
If yes, full connectors testing is required to make sure the change is compatible
- Does the feature change any result formatting or data types?
If yes, testing via the mysql client is required to verify that the new format or data type properly arrives at the client
- Does the feature change the syntax or the output of a SHOW command?
If yes, clients, connectors and tools that issue that SHOW command and process its output should be tested
- Does the feature change anything with respect to INFORMATION_SCHEMA and PERFORMANCE_SCHEMA ?
Performance and Optimizer
- Does the feature have any performance implications?
Is the feature expected to deliver any performance improvement? Is the feature feared to have a performance impact?
- Does the feature interact with the Optimizer or indexes?
Does the feature change anything related to indexes or data stored in them? Does the feature change anything with respect to any optimizations?
Replication
Does the feature generate or transmit data that may be sent or processed by the slave or tools such as mysqlbinlog?
- Which binary log formats should be tested?
Certain optimizations or scenarios may only trigger for a particular binary log mode
- Are existing replication RQG grammars sufficient to trigger the new code?
- Do we need to test DDL in replication, log rotation and such?
- Does the change affect the master as well, or just the slave? Is a transactional integrity test indicated?
- Do we need to simulate replication failures, server restarts and such?
- Do we need to test cloning new slaves from scratch, binary log replay and other replication maintenance activities?
- Do we need to cover exotic data types, overly large records, statements that are only safe in a particular binlog mode, etc?
Extreme conditions and corner cases
- Does the feature deal with extreme conditions and corner cases?
e.g. huge values, huge tables, huge row lengths, blobs, high concurrencies
- Is any form of stress testing indicated for the feature?
Does the feature have any thread interactions, locking or concurrency mechanisms?
Testing Plan
This section identifies the testing that will be performed based on the risks identified above
Additional MTR tests
Describe any MTR tests that will be implemented in addition to the ones already present in the patch
RQG tests
- Are new Random Query Generator grammars required?
If existing grammars do not cover the new feature
- Are new Validators or Reporters required?
If the existing RQG validation code can not verify the proper operation of the feature
- RQG tests lines to run:
provide actual RQG command lines, along with a description of the desired effect
Manual testing
Describe any manual testing that will be performed involving connectors, clients or installers
Benchmarks and stress tests
Describe any benchmarks and stress tests that will be performed. Include the exact parameters and command lines that will be used
Hand-off
This section identifies final checks to be performed on the test content when it is delivered
Requirements for MTR tests
- Are all new mysqld options, SHOW STATUS variables, I_S tables and such covered with an MTR test?
- Are all new error messages covered with a test?
- Is code coverage acceptable to all parties involved?
Requirements for Random Query Generator tests
- Has the proper set of RQG grammars been chosen or created?
- Do we have a .CC file listing all interesting mysqld options and RQG variants, in order to run a combinations test? Was the CC file pushed into RQG's LP repository?
- Has an at least 24-hour combination test been run?
- Was code coverage run on the RQG test?
- Were all new grammars and Perl modules pushed into the RQG LP repository?
- Were the RQG tests prepared to be buildbot-ready, as opposed to manually-monitored?
- Does the test include any safeguards against false negatives?
e.g. all queries generated are bad, which causes no actual replication, which is still reported as test success
- Are the tests running stably and green in the feature tree's BuildBot?
- Are any bug reports missing reproducible test cases, core files, RQG command lines and the like?
Requirements for Code Coverage
This makes it much faster to review code as one knows what it tested and what one needs to review more carefully
- Ideally all code should be tested by MTR;
- If code can only be reproduced by an RQG test or other external test, sufficient information must be provided in the Worklog for the test to be reproducible;
- All code should either be tested or reviewed and marked with /* purecov inspected */.
- Code that is suspected to be unreachable should have an DBUG_ASSERT(0) and be marked with /* purecov: deadcode */.
- Was an MTR code coverage report for the feature tree generated by fedora13.selfip.org/lcov?
- Has the dgcov coverage tool been run?
- Are there any notable regressions in the code coverage?
- Was code coverage run on the RQG tests alone, along with an effort to increase it?
- Is the code covered by comments?
Requirements for the code trees
- Was the stable tree merged into the feature tree recently?
- Were all RQG tests pushed into lp:randgen and documented on the RQG manual pages?
- Were all RQG or other automatable tests installed in BuildBot?
- Is the feature tree BuildBot green?
Requirements for closing the Worklog as complete
- Was the Worklog updated with information on the testing performed?
- Was the QA check box in the Worklog ticked?
- Were all related bugs on Launchpad closed? This includes A) bugs that were tagged to belong to the feature tree, B) bugs assigned to the developer regardless of tree, C) bugs assigned to the tester and D) and orphan bugs just hanging around the same theme or time frame;
- Did the WL complete the entire review procedure, including any back-and-forth emails?
- Has the new feature been documented?

