The world's most popular open source database
Robin Schumacher is MySQL's Director of Product Management and has over 13 years of database experience in DB2, MySQL, Oracle, SQL Server and other database engines. Before joining MySQL, Robin was Vice President of Product Management at Embarcadero Technologies.
In Part 1 of this article series, we looked at how the Falcon transactional storage engine was architected and how it compared to some of the other MySQL storage engines. In this article, I’ll focus on how Falcon performs transaction management, including some special points on where Falcon differs from some of MySQL’s other transactional engines.
The first thing to understand about transactions and Falcon is that the engine takes a multi-generational approach to managing both transactions and concurrency. This means that the engine keeps multiple iterations/generations of rows available in memory to ensure the highest possible levels of uninterrupted data access.
Falcon supports ACID-level transactional operations, which are handled in memory via the record cache. Once in-process transactions are committed, the operations are flushed to the Falcon Log for asynchronous application to the database files. The only exceptions to this rule are new BLOBs, which are immediately applied to the database files.
Falcon differs from proprietary databases such as Oracle as well as other open source RDBMS’s like Firebird in that all transactional operations are kept in memory inside the record cache. Falcon maintains a 4-byte transaction ID in the record header of the rows involved in the transaction. When a transaction commits, its transaction id is stored in the log at the front of the block containing the committed version of every record it changed, and the block containing its index changes. To provide each transaction a stable snapshot of the data that existed the moment the transaction was submitted, the Falcon record cache may contain multiple versions of data.
As was mentioned, each row in the record cache contains a transaction ID of the transaction that created it. Rows with older versions of data include a pointer to the older version of the data, and deleted rows are represented in memory by a row header with no data and contain a flag that is set to indicate a deleted row. The log entry for a deleted record is a record number with a flag indicating that the record was deleted by a committed transaction.
Falcon offers auto/non-auto commit (which is accomplished above the storage engine layer in MySQL), and provides savepoints and group commit for intelligent transaction control. Distributed transactions/two-phase commit are not in the alpha version of Falcon, but will be supported before the GA release.
Falcon is a multi-generational transaction engine that uses MVCC (multi-version concurrency control) as its primary concurrency control mechanism. This means that readers never block writers and vice versa, with the end result being access to data whenever it is needed.
The Falcon engine defaults to the repeatable read isolation mode level with no phantoms being possible. When a transaction attempts to modify or delete a row that another in-process transaction has already obtained, the previous transaction will wait until the other transaction either commits or rolls back. If the first transaction commits, then the other transaction receives an error and must try again. However, if the first transaction rolls back, then the other transaction will succeed. It’s important to note that all transactions will always read and operate on consistent data, which mirrors how many proprietary databases like Oracle operate.
For example, suppose one MySQL session performs the following actions:
mysql> create table t (c1 int) engine=falcon; Query OK, 0 rows affected (0.05 sec) mysql> insert into t values (1),(2),(3); Query OK, 3 rows affected (0.00 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> set autocommit=0; Query OK, 0 rows affected (0.00 sec) mysql> update t set c1=4 where c1=1; Query OK, 1 row affected (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0
Note that the session above has not committed its update. Now suppose another session logs on and attempts to update the same row as session one:
mysql> update t set c1=5 where c1=1;
The second session will wait until the first session either commits or rolls back its change. Suppose the first session commits its change; the following will occur in the second session:
mysql> update t set c1=5 where c1=1; ERROR 1020 (HY000): Record has changed since last read in table 't'
As shown, an error is returned to the second session because the first session altered the same data that was the second session’s target. Had the first session rolled back its change, Falcon would have allowed the second session’s update to go through. Note that this behavior does differ from the InnoDB storage engine in that InnoDB would not have returned an error back to session two – the update would have gone through (if a timeout had not occurred) with zero rows having been affected.
InnoDB implements MVCC with a combination of multiple versions of record and write locks on key ranges and records. In Repeatable Read transaction isolation mode, InnoDB has update anomalies that require the use of the non-standard
SELECT … FOR UPDATE syntax. The result of a simple
SELECT and a
SELECT … FOR UPDATE may differ within the same repeatable read transaction.
Falcon uses versions of records and transaction identifiers to provide repeatable reads and to recognize and prevent inconsistent updates by concurrent transactions. Repeatable read is handled by providing each transaction with the version of each record that was committed when the transaction started.
Falcon manages concurrent updates by recognizing that the most current version of a record is not visible to the transaction that wants to modify or delete the record. In that situation, Falcon causes the second transaction to wait for the first to finish. If the first transaction commits, the second transaction gets an error indicating that its update or delete will violate transaction isolation. If the first transaction rolls back, the second transaction’s update or delete succeeds.
What this means is that under some cases where InnoDB allows an update or delete to wait and succeed, Falcon will cause it to wait and fail. On the other hand, Falcon’s read is actually repeatable with no special clauses needed to allow it to return different values for a record it had previously read. Nor is it possible in Falcon, as it is in InnoDB, to overwrite changes made by a concurrent transaction – changes which are not visible to the overwriting transaction.
Besides repeatable read, the most common lock isolation level is read committed (which Falcon also supports in alpha; the only planned isolation level for GA that is not in the Alpha version is serializable).
With respect to locking conflicts, the MySQL Falcon engine uses a sophisticated graphing model for deadlock detection that uses a lock table to resolve lock dependencies and deadlock issues. This form of deadlock resolution is superior to the standard timeout and similar models that other databases systems use.
Because Falcon is a memory-based MVCC database (yes, it can page transactions to disk if necessary), the engine offers very fast rollback capabilities should the need to undo data changes arise. Being an Oracle DBA for years, I can’t tell you the number of times I suffered through agonizingly slow rollback times for major data changes – rollbacks that sometimes took double the time of the actual transaction itself.
Such is not the case with Falcon. To prove this point, let’s work through a quick exercise. Let’s pump a little over one million rows into an InnoDB table, perform a rollback, and then do the same thing with Falcon:
mysql> show create table big_table\G *************************** 1. row *************************** Table: big_table Create Table: CREATE TABLE `big_table` ( `client_transaction_id` int(11) NOT NULL DEFAULT '0', `client_id` int(11) NOT NULL DEFAULT '0', `investment_id` int(11) NOT NULL DEFAULT '0', `action` varchar(10) NOT NULL DEFAULT '', `price` decimal(12,2) NOT NULL DEFAULT '0.00', `number_of_units` int(11) NOT NULL DEFAULT '0', `transaction_status` varchar(10) NOT NULL DEFAULT '', `transaction_sub_timestamp` datetime NOT NULL, `transaction_comp_timestamp` datetime NOT NULL, `description` varchar(200) DEFAULT NULL, `broker_id` bigint(10) DEFAULT NULL, `broker_commission` decimal(10,2) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0.00 sec) mysql> select count(*) from big_table; +----------+ | count(*) | +----------+ | 1120500 | +----------+ 1 row in set (1.36 sec) mysql> create table test_rollback like big_table; Query OK, 0 rows affected (0.02 sec) mysql> set autocommit=0; Query OK, 0 rows affected (0.00 sec) mysql> insert into test_rollback select * from big_table; Query OK, 1120500 rows affected (23.70 sec) Records: 1120500 Duplicates: 0 Warnings: 0 mysql> rollback; Query OK, 0 rows affected (22.16 sec) mysql> alter table test_rollback engine=falcon; Query OK, 0 rows affected (0.55 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> insert into test_rollback select * from big_table; Query OK, 1120500 rows affected (17.00 sec) Records: 1120500 Duplicates: 0 Warnings: 0 mysql> rollback; Query OK, 0 rows affected (1.66 sec) mysql> select count(*) from test_rollback; +----------+ | count(*) | +----------+ | 0 | +----------+ mysql> show global variables like '%fal%'; +--------------------------+-----------+ | Variable_name | Value | +--------------------------+-----------+ | falcon_log_dir | | | falcon_max_record_memory | 209715200 | | falcon_min_record_memory | 104857600 | | falcon_page_cache_size | 104857600 | | falcon_log_mask | 0 | | falcon_debug_server | OFF | | falcon_page_size | 4096 | | have_falcon | YES | +--------------------------+-----------+ 8 rows in set (0.00 sec)
Falcon does a little better than InnoDB on the 1.1 million row insert, but the rollback operation is where the big difference is seen – a 92% reduction in rollback time. Of course, well put together applications should do all they can to minimize rollbacks, but since they do happen, it’s nice to know that Falcon is optimized to get them over with as quickly as possible.
Those needing ACID transaction support, smart concurrency control, and crash recovery will find that Falcon should fit the bill for the type of online applications being developed today. The next article in this series will look at how Falcon manages tables and indexes as well as a few other things, so be sure to look for that soon. In the meantime, please download the latest Falcon alpha release and let us know what you think. You can download a binary for Linux and Windows at http://dev.mysql.com/downloads/mysql/6.0.html and also compile from source if you’d like. And don’t forget to visit the Falcon forum to post questions and such.
Thanks, as always, for supporting MySQL!