WorkLog Frontpage Log in / Register
High-Level Description | Task Dependencies | High-Level Specification | Low-Level Design | File Attachments | User Comments | Time Estimates | Funding and Votes | Progress Reports

 DS-MRR for clustered PKs: more efficient buffer use
Title
Task ID123
Queue
Version N/A
Status
PriorityN/A
Copies to
Created byPsergey03 Jul 2010Done
Supervisor N/A  
Lead Architect    
Architecture Review  
Implementor  
Code Review  
QA  
Documentation  
 High-Level Description
Let DS-MRR/CPK implementation use buffer space more efficiently. See HLS for
details.
 Task Dependencies
Others waiting for Task 123Task 123 is waiting forGraph
91 MariaDB 5.3
125 Make DS-MRR sort the ranges before scanning the index
 
 High-Level Specification
1. Context
----------
After looking at MWL#121 implementation, Igor had a comment that DS-MRR/CPK
code does unnecessary data copying, as shown in this example:

Consider a query:

  SELECT * FROM t1, t2 WHERE t2.primary_key=t1.col

Suppose the join order is (t1, t2), and t2's access method is 
eq_ref(t2.primary_key=t1.col), which done with Batched Key Access + DS-MRR/CPK.
The execution will proceed according to this scenario:

1. Scan table t1, accumulate values of t1.* in the join buffer 
2. Start an MRR scan on table t2 on t2.primary_key=t1.col.  If t2's primary key
   is clustered, DS-MRR/CPK will be used.
2.1 DS-MRR/CPK will run through the provided sequence(t1.col), collect values
    of t1.col in an array and then sort it.
2.2 DS-MRR/CPK will then do a set of t2.primary_key=t1.col lookups in disk
    order.

It was stated that we have a problem on step 2.1, where we run through the 
sequence of lookup keys and copy them into array: This step is 
 - necessary when BKA module calculates lookup keys on the fly, but
 - looks redundant when BKA module's buffer contains 
       
          array{ t1.colX, ... t1.col[=needed lookup key] ,... }

   in its buffer, i.e. there is already a directly-addressable array (w/ gaps)
   of lookup keys somewhere in the memory.

2. Suggested remedy
-------------------

The suggested remedy was as follows:
- Let BKA module inform MRR that lookup keys are directly addressable in its
  buffers
- Let DS-MRR/CPK implementation take advantage of that by storing/sorting key
  pointers.

This strategy will be used only when BKA code actually has the lookup keys in
the array. If it doesn't, current MWL#121 code is used. 

2.1 SQL Layer part
~~~~~~~~~~~~~~~~~~
Within the scope of this WL
- multi_range_read_info() will inform the SQL layer about whether MRR
  implementation would benefit from materialized materialized keys

- SQL layer will inform MRR implementation in multi_range_read_init() about
  whether the sequence has materialized ranges. If that is the case, this WL's
  code will be used.


2.2 MRR part
~~~~~~~~~~~~
When MRR implementation is provided with materialized keys, it will take
advantage of that by sorting not (key,range_id) pairs but (key_pointer,
range_id) pairs.
 Low-Level Design
 File Attachments
 NameTypeSizeByDate
 User Comments
 Time Estimates
NameHours WorkedLast Updated
Total0 
 Hrs WorkedProgressCurrentOriginal
Total000
 
 Funding and Votes
Votes: 0: 0%
 Make vote: Useless    Nice to have    Important    Very important    

Funding: 0 offers, total 0 Euro
 Progress Reports
(Psergey - Sat, 09 Oct 2010, 10:58
    
Dependency created: WL#125 now depends on WL#123

(Psergey - Sat, 09 Oct 2010, 10:54
    
Dependency created: WL#91 now depends on WL#123

(Psergey - Sat, 09 Oct 2010, 10:54
    
Code Review updated:  -> Igor

(Psergey - Tue, 13 Jul 2010, 22:45
    
High-Level Specification modified.
--- /tmp/wklog.123.old.4441	2010-07-13 22:45:09.000000000 +0000
+++ /tmp/wklog.123.new.4441	2010-07-13 22:45:09.000000000 +0000
@@ -1,75 +1,59 @@
+1. Context
+----------
+After looking at MWL#121 implementation, Igor had a comment that DS-MRR/CPK
+code does unnecessary data copying, as shown in this example:
 
-1. Buffer usage
----------------
+Consider a query:
 
-1.1 Efficient buffer space use problem
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Suppose we're given a sequence of $N ranges, and $SIZE kbytes of memory. The
-buffer will be used for
-1. storage/sorting of lookup keys
-2. storage/sorting of rowids
-
-If we take a simple approach, we divide the buffer into two parts, one for
-lookup keys and another for rowids. The problem is that we don't know for 
-certain how many rowids we will get for each lookup key, so we will either
-not be able use the full buffer or will run out of buffer space and will
-have to do two disk-ordered reads instead of one.
-
-We could re-use the space occupied by lookup keys to store rowids (actually,
-<rowid, range_id> pairs) but that doesn't solve the problem of not being able
-to tell how many lookup keys we should have in one sort+sweep-read step.
+  SELECT * FROM t1, t2 WHERE t2.primary_key=t1.col
 
-1.2 Solution
-~~~~~~~~~~~~
-The buffer should be divided in proportion of expected sizes: lookup keys 
-occupy 
-
-   (sizeof(key) + sizeof(range_id)) * n_keys                      (1)
-
-rowids will occupy
+Suppose the join order is (t1, t2), and t2's access method is 
+eq_ref(t2.primary_key=t1.col), which done with Batched Key Access + DS-MRR/CPK.
+The execution will proceed according to this scenario:
 
-   (sizeof(rowid) + sizeof(range_id)) * rec_per_key * n_keys      (2)
+1. Scan table t1, accumulate values of t1.* in the join buffer 
+2. Start an MRR scan on table t2 on t2.primary_key=t1.col.  If t2's primary key
+   is clustered, DS-MRR/CPK will be used.
+2.1 DS-MRR/CPK will run through the provided sequence(t1.col), collect values
+    of t1.col in an array and then sort it.
+2.2 DS-MRR/CPK will then do a set of t2.primary_key=t1.col lookups in disk
+    order.
 
-(TODO: what to do if rec_per_key value is unknown?)
-We can divide both (1) and (2) by n_keys and that will give the needed
-proportion.
+It was stated that we have a problem on step 2.1, where we run through the 
+sequence of lookup keys and copy them into array: This step is 
+ - necessary when BKA module calculates lookup keys on the fly, but
+ - looks redundant when BKA module's buffer contains 
 
+          array{ t1.colX, ... t1.col[=needed lookup key] ,... }
 
-We will take advantage of the fact that {key, range_id} pairs for which we got 
-{rowid, range_id} pairs are not needed anymore. This will be done as follows:
+   in its buffer, i.e. there is already a directly-addressable array (w/ gaps)
+   of lookup keys somewhere in the memory.
 
-- Store {key, range_id} pairs from the start of the buffer
-- sort them in reverse order
-- start index-ordered read (that would require traversing the sorted array
-  backwards)
-- keep track of how much more space becomes free after we've done lookup for
-  the next {key, range_id} pair.
-- let the accumulated array of {rowid, range_id} pairs to grow from the end of
-  the buffer.
+2. Suggested remedy
+-------------------
 
-Using this scheme, one array will be at start of the buffer and gradually
-shrinking and the other at end of the buffer and gradually growing.  Overflow
-detection code should make sure that arrays never have any intersection with
-one another.
+The suggested remedy was as follows:
+- Let BKA module inform MRR that lookup keys are directly addressable in its
+  buffers
+- Let DS-MRR/CPK implementation take advantage of that by storing/sorting key
+  pointers.
 
-2. Grouping identical keys 
---------------------------
-BKA could possibly supply multiple identical keys, that is, MRR implementation
-will get many pairs like:
+This strategy will be used only when BKA code actually has the lookup keys in
+the array. If it doesn't, current MWL#121 code is used. 
    
-   {keyX, rangeY1}, {keyX, rangeY2}, {keyX, rangeY3}, ...
+2.1 SQL Layer part
+~~~~~~~~~~~~~~~~~~
+Within the scope of this WL
+- multi_range_read_info() will inform the SQL layer about whether MRR
+  implementation would benefit from materialized materialized keys
 
-since we do index reads in key order, we'll be able to detect this case 
-(compare current key with the next/previous) and avoid making multiple lookups
-with the same value of keyX.
+- SQL layer will inform MRR implementation in multi_range_read_init() about
+  whether the sequence has materialized ranges. If that is the case, this WL's
+  code will be used.
 
-This, however, will not be implemented within scope of this WL entry. We'll
-rely on the fact that storage engine makes multiple repetitive index lookups to
-be sufficiently fast.
-
-3. User control
----------------
-Introduce an @@optimizer_switch flag to control the behavior.
-Flag name:      mrr_sort_keys
-Default value:  ON
 
+2.2 MRR part
+~~~~~~~~~~~~
+When MRR implementation is provided with materialized keys, it will take
+advantage of that by sorting not (key,range_id) pairs but (key_pointer,
+range_id) pairs.

(Psergey - Tue, 13 Jul 2010, 22:38
    
High-Level Specification modified.
--- /tmp/wklog.123.old.4251	2010-07-13 22:38:08.000000000 +0000
+++ /tmp/wklog.123.new.4251	2010-07-13 22:38:08.000000000 +0000
@@ -1,44 +1,75 @@
-1. Context
-----------
-After looking at MWL#121 implementation, Igor had a comment that DS-MRR/CPK
-code does unnecessary data copying, as shown in this example:
-
-Consider a query:
-
-  SELECT * FROM t1, t2 WHERE t2.primary_key=t1.col
-
-Suppose the join order is (t1, t2), and t2's access method is 
-eq_ref(t2.primary_key=t1.col), which done with Batched Key Access + DS-MRR/CPK.
-The execution will proceed according to this scenario:
-
-1. Scan table t1, accumulate values of t1.* in the join buffer 
-2. Start an MRR scan on table t2 on t2.primary_key=t1.col.  If t2's primary key
-   is clustered, DS-MRR/CPK will be used.
-2.1 DS-MRR/CPK will run through the provided sequence(t1.col), collect values
-    of t1.col in an array and then sort it.
-2.2 DS-MRR/CPK will then do a set of t2.primary_key=t1.col lookups in disk
-    order.
-
-It was stated that we have a problem on step 2.1, where we run through the 
-sequence of lookup keys and copy them into array: This step is 
- - necessary when BKA module calculates lookup keys on the fly, but
- - looks redundant when BKA module's buffer contains 
-       
-          array{ t1.colX, ... t1.col[=needed lookup key] ,... }
-
-   in its buffer, i.e. there is already a directly-addressable array (w/ gaps)
-   of lookup keys somewhere in the memory.
-
-The suggested remedy was as follows:
-
-1.1 AccessKeysByPointersProposal
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- Let BKA module inform MRR that lookup keys are directly addressable in its
-  buffers
-- Let DS-MRR/CPK implementation take advantage of that by storing/sorting key
-  pointers.
 
-This strategy will be used only when BKA code actually has the lookup keys in
-the array. If it doesn't, current MWL#121 code is used. 
+1. Buffer usage
+---------------
 
+1.1 Efficient buffer space use problem
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Suppose we're given a sequence of $N ranges, and $SIZE kbytes of memory. The
+buffer will be used for
+1. storage/sorting of lookup keys
+2. storage/sorting of rowids
+
+If we take a simple approach, we divide the buffer into two parts, one for
+lookup keys and another for rowids. The problem is that we don't know for 
+certain how many rowids we will get for each lookup key, so we will either
+not be able use the full buffer or will run out of buffer space and will
+have to do two disk-ordered reads instead of one.
+
+We could re-use the space occupied by lookup keys to store rowids (actually,
+<rowid, range_id> pairs) but that doesn't solve the problem of not being able
+to tell how many lookup keys we should have in one sort+sweep-read step.
+
+1.2 Solution
+~~~~~~~~~~~~
+The buffer should be divided in proportion of expected sizes: lookup keys 
+occupy 
+
+   (sizeof(key) + sizeof(range_id)) * n_keys                      (1)
+
+rowids will occupy
+
+   (sizeof(rowid) + sizeof(range_id)) * rec_per_key * n_keys      (2)
+
+(TODO: what to do if rec_per_key value is unknown?)
+We can divide both (1) and (2) by n_keys and that will give the needed
+proportion.
+
+
+We will take advantage of the fact that {key, range_id} pairs for which we got 
+{rowid, range_id} pairs are not needed anymore. This will be done as follows:
+
+- Store {key, range_id} pairs from the start of the buffer
+- sort them in reverse order
+- start index-ordered read (that would require traversing the sorted array
+  backwards)
+- keep track of how much more space becomes free after we've done lookup for
+  the next {key, range_id} pair.
+- let the accumulated array of {rowid, range_id} pairs to grow from the end of
+  the buffer.
+
+Using this scheme, one array will be at start of the buffer and gradually
+shrinking and the other at end of the buffer and gradually growing.  Overflow
+detection code should make sure that arrays never have any intersection with
+one another.
+
+2. Grouping identical keys 
+--------------------------
+BKA could possibly supply multiple identical keys, that is, MRR implementation
+will get many pairs like:
+   
+   {keyX, rangeY1}, {keyX, rangeY2}, {keyX, rangeY3}, ...
+
+since we do index reads in key order, we'll be able to detect this case 
+(compare current key with the next/previous) and avoid making multiple lookups
+with the same value of keyX.
+
+This, however, will not be implemented within scope of this WL entry. We'll
+rely on the fact that storage engine makes multiple repetitive index lookups to
+be sufficiently fast.
+
+3. User control
+---------------
+Introduce an @@optimizer_switch flag to control the behavior.
+Flag name:      mrr_sort_keys
+Default value:  ON
 

(Psergey - Mon, 05 Jul 2010, 00:13
    
Privacy level updated.
--- /tmp/wklog.123.old.8178	2010-07-05 00:13:46.000000000 +0000
+++ /tmp/wklog.123.new.8178	2010-07-05 00:13:46.000000000 +0000
@@ -1 +1 @@
-y
+n

(Psergey - Sat, 03 Jul 2010, 16:41
    
High-Level Specification modified.
--- /tmp/wklog.123.old.27909	2010-07-03 16:41:16.000000000 +0000
+++ /tmp/wklog.123.new.27909	2010-07-03 16:41:16.000000000 +0000
@@ -39,6 +39,6 @@
   pointers.
 
 This strategy will be used only when BKA code actually has the lookup keys in
-the array. If it doesn't, current MWL#21 code is used. 
+the array. If it doesn't, current MWL#121 code is used. 
 
 

(Psergey - Sat, 03 Jul 2010, 16:38
    
High-Level Specification modified.
--- /tmp/wklog.123.old.27673	2010-07-03 16:38:04.000000000 +0000
+++ /tmp/wklog.123.new.27673	2010-07-03 16:38:04.000000000 +0000
@@ -1 +1,44 @@
+1. Context
+----------
+After looking at MWL#121 implementation, Igor had a comment that DS-MRR/CPK
+code does unnecessary data copying, as shown in this example:
+
+Consider a query:
+
+  SELECT * FROM t1, t2 WHERE t2.primary_key=t1.col
+
+Suppose the join order is (t1, t2), and t2's access method is 
+eq_ref(t2.primary_key=t1.col), which done with Batched Key Access + DS-MRR/CPK.
+The execution will proceed according to this scenario:
+
+1. Scan table t1, accumulate values of t1.* in the join buffer 
+2. Start an MRR scan on table t2 on t2.primary_key=t1.col.  If t2's primary key
+   is clustered, DS-MRR/CPK will be used.
+2.1 DS-MRR/CPK will run through the provided sequence(t1.col), collect values
+    of t1.col in an array and then sort it.
+2.2 DS-MRR/CPK will then do a set of t2.primary_key=t1.col lookups in disk
+    order.
+
+It was stated that we have a problem on step 2.1, where we run through the 
+sequence of lookup keys and copy them into array: This step is 
+ - necessary when BKA module calculates lookup keys on the fly, but
+ - looks redundant when BKA module's buffer contains 
+       
+          array{ t1.colX, ... t1.col[=needed lookup key] ,... }
+
+   in its buffer, i.e. there is already a directly-addressable array (w/ gaps)
+   of lookup keys somewhere in the memory.
+
+The suggested remedy was as follows:
+
+1.1 AccessKeysByPointersProposal
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+- Let BKA module inform MRR that lookup keys are directly addressable in its
+  buffers
+- Let DS-MRR/CPK implementation take advantage of that by storing/sorting key
+  pointers.
+
+This strategy will be used only when BKA code actually has the lookup keys in
+the array. If it doesn't, current MWL#21 code is used. 
+
 


Report Generator:
 
Saved Reports:

WorkLog v4.0.0
  © 2010  Sergei Golubchik and Monty Program AB
  © 2004  Andrew Sweger <yDNA@perlocity.org> and Addnorya
  © 2003  Matt Wagner <matt@mysql.com> and MySQL AB