December 2 - ndgriffeth/Class-Notes-and-Lectures GitHub Wiki

Advanced MySQL Programming

Reading: Chapter 13

Transactions

A transaction is a sequence of database accesses that are to be treated as a single access, and that must act like they happen all at once.

Motivation

Race Conditions

Transactions are related to simple race conditions, but more complicated. So let's start with a classical race condition:

Two programs are executing, and each of them needs to change the same operating system resource. For example, each needs to add a file name to the print queue. Suppose the following steps are used to update the print queue:

1. Create a print queue entry.
2. Read the pointer to the print queue.
3. Change the pointer to point to the new print queue entry.
4. Write the pointer to the print queue.

As long as these steps are all implemented correctly, and only one program is running, this will work. However... with two programs, A and B, suppose the steps are executed as follows:

A does 1 (create the new queue entry), 2 (reads the pointer), and 3 (changes the pointer)  
B does 1, 2, and 3.  
A does 4 (writes the new value of the point).
B does 4.

What value ends up in the pointer? (What happens to A's print job?)

The usual solution to the above is locking. A obtains a lock on the object while it reads and updates it, then releases the lock so that B can obtain it and update the print queue.

Discussion.

  1. What other race conditions do you know about?

  2. What problems can arise with locking?

Transactions

The classical example of a transaction is a transfer of money between two bank accounts. In a really simple database, the account information would be stored in two rows of an accounts table, and requires modifying both rows. This is fine as long as no one else is accessing the table. If multiple users may be accessing the table (think thousands of ATMs...) then there can be lots of problems.

You might consider locking the entire table. However, that's over-kill. You only need to lock the two rows. Now, is it enough to lock them one at a time?

Answer.

Discussion.

  1. Consider the report transaction described above, which totals the amounts in the database. Suppose that when it starts, it locks every row for reading. Suppose also that multiple new accounts are added while it is running. How is it likely to behave? How would you fix this?
  2. What types of databases will need transaction support the most?
  3. Describe some other modifications to a database that should be treated as transactions.

Properties

ACID:

* Atomicity
* Consistency
* Isolation
* Durability

Atomicity means that each transaction happens completely or not at all.

Attempting to do a transfer shouldn't leave a record of a deposit without the corresponding withdrawal, 
or vice-versa.

Consistency means that each transaction takes the database from one valid state to another.

Suppose the report transaction, described above, also stores the total in a table in the database.  This total 
should always be equal to the sum of the amounts in all the accounts -- but it won't be.  The database would be 
inconsistent.

Isolation means that even if multiple transactions actually execute at the same time, the result of executing all of them is the same as if they had run one after another.

The above example illustrates this problem: The sum should be taken before the transfer begins or after it ends, 
never in the middle of the transfer.

Durability means that once a transaction has been executed, it is there forever.

Discussion.
What will go wrong if atomicity is violated, but nothing else?
What will go wrong if just one of the other properties is violated?

Using transactions with MySQL and PHP

MySQL

Important commands, but only if you're using MySQL directly:

* START TRANSACTION: any MySQL statement will be provisional, until the COMMIT or ROLLBACK is issued.  
All will be made permanent in the database at the same time by COMMIT, or will be removed
by ROLLBACK.  This turns autocommit off (alternately, you can set autocommit to OFF directly), so that 
changes to the database will not be visible to anyone else until you commit.
* COMMIT: make permanent all of the SQL updates that happened since the last START TRANSACTION.
* ROLLBACK: roll back all of the SQL updates that happened since the last START TRANSACTION.

Exercise:
See how a transaction works:

  1. Open up two connections to MySQL (you can use the SQL monitor from two terminal windows, or use two different connections in the Workbench, or use one SQL monitor and one Workbench).
  2. Turn off autocommit.
  3. Using bookorama, add a new book to the books table.
  4. Look at all rows in books. Can you see the new book using the connection where you created it? Can you see it using a different connection?
  5. Using the connection where you added the book, commit the transaction. Now where can you see the new book?

PHP

Using transactions from PHP -- don't issue the MySQL commands above, instead use the PHP versions:

* mysqli_autocommit(false): to turn off AUTOCOMMIT, thereby starting a transaction.
* mysqli_commit(): this is the COMMIT
* mysqli_rollback(): this is the ROLLBACK

Exercise and Assignment.

  1. [Optional if you use transactions in your project.] Create a form and a PHP transaction to add a new employee to the employees database, along with a title, salary, and department assignment.
  2. [Optional if you use sessions in your project.] Create a form and a PHP session to move a collection of employees to a new department. The form should accept one employee at a time, and have two buttons, "Next" and "Confirm". When the "Next" button is pressed, the form is re-displayed to accept the next employee number; when the "Confirm" button is pressed, the list of employees is displayed along with a form that asks for the from and to departments and that contains a "Commit" button. When the commit button is pressed, the employees are moved as a single transaction.

How transactions are implemented

Note that you must use the InnoDB engine with a table in order to use transactions on it. This is actually the default engine in MySQL 5.5 and above (MyISAM was the old default).

The most common, and usually the best, implementation is actually one of the simplest:

Obtain a lock on each row before you change it.
Hold the locks until you have committed or rolled back all changes.
Release all the locks.

This is called two-phase locking, because in the first phase you can only obtain locks, and in the second phase you can only release them. It provides isolation. Transactions may, however, get into a deadlock situation, in which each is waiting on the other to release a lock.

To get atomicity and durability, you have to make sure the changes will be applied even if the database server crashes while applying the updates for the transaction. There is a fairly straightforward way of doing this:

As you perform each update (probably on an in-memory copy of the table for efficiency), 
you also write a "re-do" and an "un-do" on a permanent, serial log.  
     The *re-do* will make sure the update was made -- it should contain a copy of the new row.
     The *un-do* will restore the original row.
Once all of these are in the permanent log, you can safely choose to either commit or rollback.

The above is called a two-phase commit. In the first phase (prepare to commit), you make sure the log is safely stored on permanent storage. The final step of this is to store the decision (commit or rollback) as the decision about the transaction. If the server fails, there is a permanent record from which the database can be restored.

The above process for guaranteeing that transactions are atomic and durable also provides a solution to the deadlock problem. When two transactions are deadlocked, the only solution is to abort one of them. THis requires rolling back any changes it may have made. The above procedure guarantees that changes can always be rolled back.

An interesting side-note is that the Unix file system was vulnerable to inconsistencies after a system failed, because recent writes were in buffers instead of in files at the time of the crash. A program called "fsck" would be run when the operating system was re-booted to check file system consistency. Sometimes you would lose files as a result.

However, Mac OS X used the logging idea described above to implement a journaling file system -- writes to files are stored in safe but fast storage, and so they can be recovered after a crash.

Long-running transactions

Since transactions must lock all the rows that they access, long-running transactions present performance problems -- they may hold up lots of the activity on a system. The report introduced at the beginning of this discussion would lock every record in the database. That may be necessary, but sometimes exact results are not important -- perhaps an approximate total is enough. In that case, you could just run the report without locking anything. You just have to recognize that the result may not be the exact amount.

Storage Engines

Link: http://dev.mysql.com/doc/refman/5.0/en/storage-engines.html

  • MyISAM: simple, non-transactional engine; default until 5.5. Fast sequential scanning, fast access when indexes are provided.
  • Memory: in-memory, fastest, good for temporary tables
  • InnoDB: most general purpose, recommended unless there's a good reason to use a different one. Default in 5.6. Supports transactions and foreign key constraints. Various levels of ACID properties are provided.
  • CSV: Stores tables in a form readable by spreadsheet programs.
  • Archive: for large amounts of unindexed data, add- and read-only. For data that you have to keep but never update and rarely use. Data is compressed for small footprint.
  • Merge: Allows you to split large tables for faster searching of smaller individual tables, assuming you can easily identify the ones you need.
  • Federated: Allows you to include tables of remote databases automatically. Requires a local table definition including a reference to the remote table.

Additional notes about how engines behave and which to use

Setting the engine:

CREATE TABLE <tablename> ENGINE=<enginename>

or

ALTER TABLE < tablename> ENGINE=<enginename>

Discussion:
Pick the right engine to use for each type of database activity:

  1. Storing and retrieving current locations of users for a Find-My-Friend type application.
  2. Storing and updating accounts information in a bank.
  3. Tracking temperature changes around the world every minute.

Review Questions

  1. What is a transaction?
  2. What are the ACID properties? Explain each.
  3. What actions are needed to use transactions? When would you use each?
  4. How does the system implement the ability for the user to either rollback or commit the sequence of actions in a transaction?
  5. How does the system implement isolation, i.e., no other transaction can see partial results of a transaction?
  6. What engine would you use:
    1. if you want to use transactions?
    2. if you don't need transactions but want an efficient engine?
⚠️ **GitHub.com Fallback** ⚠️