整理自CSE 444 Database Internals, Spring 2019 的课程Lectures,课程地址:https://courses.cs.washington.edu/courses/cse444/19sp/
Lectures 13 Transaction Schedules
Transaction
Definition: a transaction is a sequence of updates to the database with the property that either all complete, or none completes (all-or-nothing)
ROLLBACK
If the app gets to a place where it can’t complete the transaction successfully, it can execute ROLLBACK
This causes the system to “abort” the transaction
- – Database returns to a state without any of the changes made by the transaction
Several reasons: user, application, system
Transactions
Major component of database systems
Critical for most applications; arguably more so than SQL
ACID Properties
Atomicity: Either all changes performed by transaction occur or none occurs
Consistency: A transaction as a whole does not violate integrity constraints
Isolation: Transactions appear to execute one after the other in sequence
Durability: If a transaction commits, its changes will survive failures
What Could Go Wrong?
Why is it hard to provide ACID properties?
Concurrent operations
- Isolation problems
- We saw one example earlier
Failures can occur at any time
- Atomicity and durability problems
- Later lectures
Transaction may need to abort
Terminology Needed For Lab 3 Buffer Manager Policies
STEAL or NO-STEAL
- Can an update made by an uncommitted transaction overwrite the most recent committed value of a data item on disk?
FORCE or NO-FORCE
- Should all updates of a transaction be forced to disk before the transaction commits?
Concurrent Execution Problems
Write-read conflict: dirty read, inconsistent read
- A transaction reads a value written by another transaction that has not yet committed
Read-write conflict: unrepeatable read
- A transaction reads the value of the same object twice. Another transaction modifies that value in between the two reads
Write-write conflict: lost update
- Two transactions update the value of the same object. The second one to write the value overwrites the first change
Schedules
A schedule is a sequence of interleaved actions from all transactions
A schedule is serializable if it is equivalent to a serial schedule
The role of the scheduler is to ensure that the schedule is serializable
Q: Why not run only serial schedules ? I.e. run one transaction after the other ?
A: Because of very poor throughput due to disk latency.
Lesson: main memory databases may schedule TXNs serially
Conflict Serializability
Definition A schedule is conflict serializable if it can be transformed into a serial schedule by a series of swappings of adjacent non-conflicting actions
Testing for Conflict-Serializability
Precedence graph:
- A node for each transaction Ti ,
- An edge from Ti to Tj whenever an action in Ti conflicts with, and comes before an action in Tj
- No edge for actions in the same transaction
- The schedule is serializable if the precedence graph is acyclic
Example 1
Schedules with Aborted Transactions
When a transaction aborts, the recovery manager undoes its updates
But some of its updates may have affected other transactions !
Recoverable Schedules
A schedule is recoverable if:
- It is conflict-serializable, and
- Whenever a transaction T commits, all transactions that have written elements read by T have already committed
Cascading Aborts
Review of Schedules
Lectures 14 Transactions: Locking
Scheduler
The scheduler:
Module that schedules the transaction’s actions, ensuring serializability
Two main approaches
- Pessimistic: locks
- Optimistic: timestamps, multi-version, validation
Pessimistic Scheduler
Simple idea:
- Each element has a unique lock
- Each transaction must first acquire the lock before reading/writing that element
- If the lock is taken by another transaction, then wait
- The transaction must release the lock(s)
Notation
Two Phase Locking (2PL)
The 2PL rule:
- In every transaction, all lock requests must precede all unlock requests
- This ensures conflict serializability ! (will prove this shortly)
Example with Multiple Transactions
Theorem: 2PL ensures conflict serializability
Strict 2PL
Strict 2PL: All locks held by a transaction are released when the transaction is completed; release happens at the time of COMMIT or ROLLBACK
Schedule is recoverable
Schedule avoids cascading aborts
Summary of Strict 2PL
- Ensures serializability, recoverability, and avoids cascading aborts
- Issues: implementation, lock modes, granularity, deadlocks, performance
The Locking Scheduler
Task 1: -- act on behalf of the transaction
Add lock/unlock requests to transactions
- Examine all READ(A) or WRITE(A) actions
- Add appropriate lock requests
- On COMMIT/ROLLBACK release all locks
- Ensures Strict 2PL !
Task 2: -- act on behalf of the system
Execute the locks accordingly
- Lock table: a big, critical data structure in a DBMS !
- When a lock is requested, check the lock table – Grant, or add the transaction to the element’s wait list
- When a lock is released, re-activate a transaction from its wait list
- When a transaction aborts, release all its locks
- Check for deadlocks occasionally
Lock Modes
Lock Granularity
Fine granularity locking (e.g., tuples)
- High concurrency
- High overhead in managing locks
Coarse grain locking (e.g., tables, predicate locks)
- Many false conflicts
- Less overhead in managing locks
Deadlocks
Cycle in the wait-for graph:
- – T1 waits for T2
- – T2 waits for T3
- – T3 waits for T1
Deadlock detection
- – Timeouts
- – Wait-for graph
Deadlock avoidance
- – Acquire locks in pre-defined order
- – Acquire all locks at once before starting
Phantom Problem
- So far we have assumed the database to be a static collection of elements (=tuples)
- If tuples are inserted/deleted then the phantom problem appears
A “phantom” is a tuple that is invisible during part of a transaction execution but not invisible during the entire execution
In our example:
- – T1: reads list of products
- – T2: inserts a new product
- – T1: re-reads: a new product appears !
In a static database
- Conflict serializability implies serializability
In a dynamic database, this may fail due to phantoms
Strict 2PL guarantees conflict serializability, but not serializability
1. Isolation Level: Dirty Reads
2. Isolation Level: Read Committed
3. Isolation Level: Repeatable Read
4. Isolation Level Serializable