WorkLog Frontpage Log in / Register
High-Level Description | Task Dependencies | High-Level Specification | Low-Level Design | File Attachments | User Comments | Time Estimates | Funding and Votes | Progress Reports

 Cross-engine consistency for START TRANSACTION WITH CONSISTENT SNAPSHOT
Title
Task ID136
Queue
Version N/A
Status
PriorityN/A
Copies toSergei

Created byKnielsen03 Sep 2010Done
Supervisor N/A  
Lead Architect    
Architecture Review  
Implementor  
Code Review  
QA 17 Feb 2011
Documentation  
 High-Level Description
With current MySQL/MariaDB, there is no imposed commit ordering. So even with
START TRANSACTION WITH CONSISTENT SNAPSHOT, you can end up with a snapshot
where in engine E, A is committed but not B; and in engine F, B is committed
but not A.

MWL#116 introduces consistent commit order, making this inconsistency
impossible. A and B will be consistently ordered (say A commits before B), and
it is not possible to see <B committed but not A> in any engine.

But there is still another inconsistency possible, where we get a snapshot
where both A and B are committed in one engine, but only A is committed in
another.

This worklog is about making START TRANSACTION WITH CONSISTENT SNAPSHOT always
create a snapshot with a consistent picture between engines about which
transaction is committed and which is not.

Additionally, the binlog should participate in START TRANSACTION WITH
CONSISTENT SNAPSHOT so that the user can get a binlog position consistent with
the snapshot created (cheaply, without any further locking). This will allow
to implement `mysqldump --consistent-snapshot` backup (or similar) that will
be able to create a backup suitable for provisioning a slave without the need
to issue FLUSH TABLES WITH READ LOCK (which can blocks queries for some time).
Such binlog position might be exposed as new binlog_snapshot_file and
binlog_snapshot_position status variables; it is probably a bad idea to
introduce transactional semantics for SHOW MASTER STATUS.

The change will work for storage engines which support MVCC, and which
implement the commit_ordered() handler method of MWL#116 (but not for other
engines).

In order to test this, it will be necessary that the PBXT engine (or some
other engine, but PBXT seems the obvious candidate) is made to implement
commit_ordered().
 Task Dependencies
Others waiting for Task 136Task 136 is waiting forGraph
116 Efficient group commit for binary log 
 
 High-Level Specification
When the user runs START TRANSACTION WITH CONSISTENT SNAPSHOT, every MVCC
engine that implements this (currently PBXT and InnoDB/XtraDB) as well as
binlog will atomically record a snapshot of the state of committed
transactions.

Then subsequent SELECTs against PBXT and InnoDB will see exactly those
transactions committed before taking the snapshot, and no others.

Introduce two new status variables for the binlog:

    binlog_snapshot_file
    binlog_snapshot_position

These provide the binlog position like SHOW MASTER STATUS, but if running in a
transaction started with START TRANSACTION WITH CONSISTENT SNAPSHOT, they
provide the binlog position corresponding to the exact point in time that the
snapshot was taken. Thus this binlog position includes exactly those commits
that are visible from the given snapshot in InnoDB and PBXT, and none other.

Then `mysqldump --single-transaction --master-data` does not need to take a
global read lock (LOCK TABLES WITH READ LOCK); just reading the consistent
binary position from above status variables is enough.

This allows a fully non-locking mysqldump backup for provisioning a slave. It
avoids the problem with LOCK TABLES WITH READ LOCK and long-running queries;
LOCK TABLES WITH READ LOCK waits for all queries to finish, and while it does
this, new queries (even read-only queries) cannot start.
 Low-Level Design
With MWL#116, for engines that implement commit_ordered(), the part of the
code that makes the committed transaction visible runs like this:

    lock(LOCK_commit_ordered);
    for e in <all participating engines>
        e->commit_ordered()
    unlock(LOCK_commit_ordered);

This means that we should be able to ensure cross-engine consistency for
START TRANSACTION WITH CONSISTENT SNAPSHOT simply by holding
LOCK_commit_ordered() while taking the snapshot in each engine.

Then also the second type of inconsistency is impossible: the short window
where B is visible in one engine and invisible in another is protected by
LOCK_commit_ordered, so START TRANSACTION WITH CONSISTENT SNAPSHOT will wait
for all engines to make the transaction visible before proceeding (and
subsequent commits will wait for START TRANSACTION WITH CONSISTENT SNAPSHOT
before making new commits visible).

Consistent snapshot for binlog is implemented as follows:

1. Introduce a global "last committed binlog position". This is updated
   whenever events are flushed to the binary log, eg. after group commit
   and in write(Event *).

2. Introduce in binlog_trx_data a "last committed binlog position" for
   a given transaction.

3. Implement the start_consistent_snapshot method for the binlog
   handlerton. This will copy the last committed binlog position from the
   global state into the transaction local one. Since START TRANSACTION WITH
   CONSISTENT SNAPSHOT runs under protection of LOCK_commit_ordered, this will
   ensure that this binlog position is consistent with the snapshot taken in
   other engines.

4. For the status variables binlog_snapshot_file and binlog_snapshot_pos, copy
   the values from the transaction-local state set by START TRANSACTION WITH
   CONSISTENT SNAPSHOT. If not set, just return the values from the global
   state.

Finally, modify mysqldump to avoid FLUSH TABLES WITH READ LOCK when
possible. If run with --single-transaction and --master-data, mysqldump will
run SHOW STATUS LIKE "binlog_snapshot_%" against the server. If it finds the
new status variables, this means that the feature in this worklog is present
on the server. In this case, mysqldump will not run FLUSH TABLES WITH READ
LOCK, and it will obtain the necessary binlog position using SHOW STATUS LIKE
"binlog_snapshot_%" rather than SHOW MASTER STATUS.
 File Attachments
 NameTypeSizeByDate
 User Comments
 Time Estimates
NameHours WorkedLast Updated
All Sub Tasks74 
Total74 
 Hrs WorkedProgressCurrentOriginal
Sub Tasks7400
Total7400
 
 Funding and Votes
Votes: 0: 0%
 Make vote: Useless    Nice to have    Important    Very important    

Funding: 0 offers, total 0 Euro
 Progress Reports
(Knielsen - Tue, 17 Apr 2012, 12:58
    
Status updated.
--- /tmp/wklog.136.old.29470	2012-04-17 12:58:55.000000000 +0000
+++ /tmp/wklog.136.new.29470	2012-04-17 12:58:55.000000000 +0000
@@ -1,2 +1,2 @@
-Code-Review
+Complete
 

(Knielsen - Thu, 17 Feb 2011, 10:51
    
QA updated: Psergey -> Pstoev

(Pstoev - Thu, 17 Feb 2011, 10:32
    
QA updated:  -> Psergey
QA signoff

(Knielsen - Mon, 08 Nov 2010, 10:58
    
Low Level Design modified.
--- /tmp/wklog.136.old.25206	2010-11-08 10:58:21.000000000 +0000
+++ /tmp/wklog.136.new.25206	2010-11-08 10:58:21.000000000 +0000
@@ -33,17 +33,17 @@
    ensure that this binlog position is consistent with the snapshot taken in
    other engines.
 
-4. For the status variables binlog_trx_file and binlog_trx_pos, copy the
-   values from the transaction-local state set by START TRANSACTION WITH
+4. For the status variables binlog_snapshot_file and binlog_snapshot_pos, copy
+   the values from the transaction-local state set by START TRANSACTION WITH
    CONSISTENT SNAPSHOT. If not set, just return the values from the global
    state.
 
 Finally, modify mysqldump to avoid FLUSH TABLES WITH READ LOCK when
 possible. If run with --single-transaction and --master-data, mysqldump will
-run SHOW STATUS LIKE "binlog_trx_%" against the server. If it finds the new
-status variables, this means that the feature in this worklog is present on
-the server. In this case, mysqldump will not run FLUSH TABLES WITH READ LOCK,
-and it will obtain the necessary binlog position using SHOW STATUS LIKE
-"binlog_trx_%" rather than SHOW MASTER STATUS.
+run SHOW STATUS LIKE "binlog_snapshot_%" against the server. If it finds the
+new status variables, this means that the feature in this worklog is present
+on the server. In this case, mysqldump will not run FLUSH TABLES WITH READ
+LOCK, and it will obtain the necessary binlog position using SHOW STATUS LIKE
+"binlog_snapshot_%" rather than SHOW MASTER STATUS.
 
 

(Knielsen - Mon, 08 Nov 2010, 10:57
    
High-Level Specification modified.
--- /tmp/wklog.136.old.25043	2010-11-08 10:57:33.000000000 +0000
+++ /tmp/wklog.136.new.25043	2010-11-08 10:57:33.000000000 +0000
@@ -8,8 +8,8 @@
 
 Introduce two new status variables for the binlog:
 
-    binlog_trx_file
-    binlog_trx_position
+    binlog_snapshot_file
+    binlog_snapshot_position
 
 These provide the binlog position like SHOW MASTER STATUS, but if running in a
 transaction started with START TRANSACTION WITH CONSISTENT SNAPSHOT, they

(Knielsen - Mon, 08 Nov 2010, 10:57
    
High Level Description modified.
--- /tmp/wklog.136.old.24953	2010-11-08 10:57:11.000000000 +0000
+++ /tmp/wklog.136.new.24953	2010-11-08 10:57:11.000000000 +0000
@@ -21,9 +21,9 @@
 to implement `mysqldump --consistent-snapshot` backup (or similar) that will
 be able to create a backup suitable for provisioning a slave without the need
 to issue FLUSH TABLES WITH READ LOCK (which can blocks queries for some time).
-Such binlog position might be exposed as new binlog_trx_file and
-binlog_trx_position status variables; it is probably a bad idea to introduce
-transactional semantics for SHOW MASTER STATUS.
+Such binlog position might be exposed as new binlog_snapshot_file and
+binlog_snapshot_position status variables; it is probably a bad idea to
+introduce transactional semantics for SHOW MASTER STATUS.
 
 The change will work for storage engines which support MVCC, and which
 implement the commit_ordered() handler method of MWL#116 (but not for other

(Knielsen - Sun, 07 Nov 2010, 21:53
    
Status updated.
--- /tmp/wklog.136.old.24244	2010-11-07 21:53:45.000000000 +0000
+++ /tmp/wklog.136.new.24244	2010-11-07 21:53:45.000000000 +0000
@@ -1,2 +1,2 @@
-Un-Assigned
+Code-Review
 

(Knielsen - Sun, 07 Nov 2010, 21:53
    
Lead Architect updated:  -> Knielsen
Implementor updated:  -> Knielsen
Code Review updated:  -> Sergei

(Knielsen - Sun, 07 Nov 2010, 21:45
    
Low Level Design modified.
--- /tmp/wklog.136.old.23591	2010-11-07 21:45:09.000000000 +0000
+++ /tmp/wklog.136.new.23591	2010-11-07 21:45:09.000000000 +0000
@@ -38,5 +38,12 @@
    CONSISTENT SNAPSHOT. If not set, just return the values from the global
    state.
 
+Finally, modify mysqldump to avoid FLUSH TABLES WITH READ LOCK when
+possible. If run with --single-transaction and --master-data, mysqldump will
+run SHOW STATUS LIKE "binlog_trx_%" against the server. If it finds the new
+status variables, this means that the feature in this worklog is present on
+the server. In this case, mysqldump will not run FLUSH TABLES WITH READ LOCK,
+and it will obtain the necessary binlog position using SHOW STATUS LIKE
+"binlog_trx_%" rather than SHOW MASTER STATUS.
 
 

(Knielsen - Fri, 05 Nov 2010, 12:29
    
High-Level Specification modified.
--- /tmp/wklog.136.old.1393	2010-11-05 12:29:38.000000000 +0000
+++ /tmp/wklog.136.new.1393	2010-11-05 12:29:38.000000000 +0000
@@ -1,20 +1,29 @@
-With MWL#116, for engines that implement commit_ordered(), the part of the
-code that makes the committed transaction visible runs like this:
+When the user runs START TRANSACTION WITH CONSISTENT SNAPSHOT, every MVCC
+engine that implements this (currently PBXT and InnoDB/XtraDB) as well as
+binlog will atomically record a snapshot of the state of committed
+transactions.
 
-    lock(LOCK_commit_ordered);
-    for e in <all participating engines>
-        e->commit_ordered()
-    unlock(LOCK_commit_ordered);
-
-This means that we should be able to ensure cross-engine consistency for
-START TRANSACTION WITH CONSISTENT SNAPSHOT simply by holding
-LOCK_commit_ordered() while taking the snapshot in each engine.
-
-Then also the second type of inconsistency is impossible: the short window
-where B is visible in one engine and invisible in another is protected by
-LOCK_commit_ordered, so START TRANSACTION WITH CONSISTENT SNAPSHOT will wait
-for all engines to make the transaction visible before proceeding (and
-subsequent commits will wait for START TRANSACTION WITH CONSISTENT SNAPSHOT
-before making new commits visible).
+Then subsequent SELECTs against PBXT and InnoDB will see exactly those
+transactions committed before taking the snapshot, and no others.
+
+Introduce two new status variables for the binlog:
+
+    binlog_trx_file
+    binlog_trx_position
+
+These provide the binlog position like SHOW MASTER STATUS, but if running in a
+transaction started with START TRANSACTION WITH CONSISTENT SNAPSHOT, they
+provide the binlog position corresponding to the exact point in time that the
+snapshot was taken. Thus this binlog position includes exactly those commits
+that are visible from the given snapshot in InnoDB and PBXT, and none other.
+
+Then `mysqldump --single-transaction --master-data` does not need to take a
+global read lock (LOCK TABLES WITH READ LOCK); just reading the consistent
+binary position from above status variables is enough.
+
+This allows a fully non-locking mysqldump backup for provisioning a slave. It
+avoids the problem with LOCK TABLES WITH READ LOCK and long-running queries;
+LOCK TABLES WITH READ LOCK waits for all queries to finish, and while it does
+this, new queries (even read-only queries) cannot start.
 
 
-- View All Progress Notes (15 total) --


Report Generator:
 
Saved Reports:

WorkLog v4.0.0
  © 2010  Sergei Golubchik and Monty Program AB
  © 2004  Andrew Sweger <yDNA@perlocity.org> and Addnorya
  © 2003  Matt Wagner <matt@mysql.com> and MySQL AB