Introduction to transaction processing

(Press ? for help, n and p for next and previous slide)

Christopher Olbrich (License Information)

1 Introduction

1.1 Learning Objectives

  • Give examples for dirty read and lost update anomalies
  • Explain reasons for the development of the transaction concept
  • Name and explain the ACID properties of transactions
  • Explain the idea behind serializability of transactions
  • Discuss advantages and limitations of locking scheduling algorithms

1.2 Prerequisites

  • Basic understanding of data management
  • No need to be familiar with SQL
    • This presentation will focus on transaction processing on an abstraction level
    • SQL is therefore not part of this introduction to transaction processing

1.3 Background: Database-management System (DBMS)

  • A DBMS gives access to the data within a database

    DBMS

    DBMS by Christopher Olbrich under CC BY-SA 4.0; from GitLab

  • The DBMS controls the access to the database
    • Creation and administration of the database
    • Read and write operations
  • A DBMS uses transactions to control access to the database

1.4 Core Questions

  • What is a transaction?
  • How can transactions be characterized?
  • What are database anomalies and how to avoid them?
  • How to enable concurrent execution of transactions?

1.5 A Note on Literature

  • To ensure a certain quality old literature from some of the pioneers in transaction processing is cited:
    • In particular Jim Gray [EGLT76], Andreas Reuter and Theo Härder [HR83]
  • Besides the book by Gottfried Vossen and Gerhard Weikum is used [WV01]
    • This is one of the best and extensive books about transaction processing
    • I recommend this book if you have a further interest in transaction processing

1.6 Review Questions

  • After each chapter one or more review question are asked
  • You should try to find an answer to the questions, in order to check if you understand the chapter
  • The correct answer is provided when you continue with the presentation

2 Motivation: Database Anomalies

2.1 Exemplary Use-case

  • Banking is frequently used to explain the need for transaction processing

    Cash-mashine

    Cash-mashine by 3dman_eu under CC0; from Pixabay

    • Data-access happens concurrently, because multiple users need access to data (e.g., account balance)
    • A failure of the system or of a transaction should not affect the database
    • The database system should always reflect the correct account balance for each customer

2.1.1 The Database System

  • Example: Let’s assume that a bank only has one database system for its three customers
  • The database only has one table
    • Within this table the account balance of each customer is saved
  • Data access happens through transactions
 Id   Name    Balance 
  1 
  2 
  3 
 Alice 
 Bob   
 Eve   
 100     
 200     
 50      

2.1.2 Exemplary Transaction

  • A transaction symbolizes a unit of work
    • The unit of work is limited by begin and commit
    • Transactions can be aborted and are not valid anymore
  • Transactions can read data and manipulate data by writing a new value
  • Example: Add 100 to the balance of Alice
Time  Transaction T1         
  1 
  2 
  3 
  4 
  5 
 begin                  
 read(Balance of Alice) 
 Balance = Balance + 100
 write(Balance of Alice)
 commit                 

2.1.3 Execution of Transaction

  • In the following examples transactions are not executed one after another, but concurrently
    • Concurrent execution is important to decrease the waiting and response time and to increase efficiency
    • Example: Bob should be able to get money from a bank machine, while Alice is transferring money to him
  • Result: The database can show wrong behaviour, which is called a database anomaly
    • Two examples of database anomalies are explained in this chapter: lost update & dirty read

2.2 Lost Update

  • T1 & T2 read the same value for the balance of Alice
  • T1 writes its changes, followed by T2
  • Problem: T2 overrides the update by T1, which is therefore a lost update
Time  Transaction T1          Transaction T2          
  1 
  2 
  3 
 begin                  
 read(Balance of Alice) 
 Balance = Balance - 10 
                        
                        
                        
  4 
  5 
  6 
                        
                        
                        
 begin                  
 read(Balance of Alice) 
 Balance = Balance + 100
  7 
  8 
 write(Balance of Alice)
 read(Balance of Bob)   
                        
                        
  9 
 10 
                        
                        
 write(Balance of Alice)
 commit                 
 11 
 12 
 13 

2.3 Dirty Read

  • T1 writes 90 as the balance of Alice
  • T2 reads this balance, writes 190 and commits
  • T1 is aborted and is no longer valid
  • Problem: T2 was executed with a wrong value for the balance of Alice
Time  Transaction T1           Transaction T2         
   1
   2
   3
   4
 begin                  
 read(Balance of Alice) 
 Balance = Balance - 10 
 write(Balance of Alice)
                        
                        
                        
                        
   5
   6
   7
   8
   9
                        
                        
                        
                        
                        
 begin                  
 read(Balance of Alice) 
 Balance = Balance + 100
 write(Balance of Alice)
 commit                 
  10  failure & aborted                               

2.4 Review Questions

  • What is the name of the database anomaly?
    • lost-update
  • What is the resulting value for the Balance of Bob?
    • 240
  • What would be the value, if the transactions were executed one after another?
    • 210

Balance of Bob = 200

Time  Transaction T1          Transaction T2          
  1 
  2 
  3 
 begin                  
 read(Balance of Bob)   
 Balance = Balance - 30 
                        
                        
                        
  4 
  5 
  6 
                        
                        
                        
 begin                  
 read(Balance of Bob)   
 Balance = Balance + 40 
  7 
  8 
 write(Balance of Bob)  
 read(Balance of Eve)   
                        
                        
  9 
 10 
                        
 ...                    
 write(Balance of Bob)  
 ...                    

3 Transaction Concept

3.1 Requirements for Transactions

  • Database anomalies show, that especially concurrent execution of database operations leads to problems
  • Key challenge: Keep data in a database consistent despite concurrent data access and different types of failures [WV01]
    • Concurrent or parallel execution of programs and data access can threaten the database consistency
      • The lost update anomaly would be an example for this challenge
    • Process and computer failures can interrupt programs and should not influence the database
      • The dirty read anomaly is an example for a transaction failure

3.2 History of the Transaction Concept

  • After Edgar F. Codd published his influential paper about the relational model for database systems in 1969, the first practical systems based on this model were implemented

    • These systems had problems with concurrent execution and crash recovery

    Jim Gray speeking (editet)

    Jim Gray speeking (editet) by Tony Hey, Stewart Tansley, Kristin Tolle (Eds.) under CC BY-SA 3.0; from Wikimedia

  • James “Jim” Nicholas Gray played a huge role in inventing and defining the transaction as an answer to this problem
    • He won the Turing award in 1998 for his work on the transaction concept

3.3 Definition of a Transaction

  • Transactions were defined in [EGLT76], co-authored by Jim Gray
  • A transaction is a unit of work respectively a unit of consistency
    • Like sending money from to another bank account
  • “… each transaction, when executed alone, transforms a consistent state into a new consistent state; that is, transactions preserve consistency.” [EGLT76]
  • Besides, the ACID properties (discussed in the next chapter) are used to characterize transactions and extend this definition

3.4 Transactions as an Abstraction Concept

  • Transactions are an abstraction concept
  • The DBMS deals with concurrency and failure, so application programs can be developed as if they were executed in a sequential manner in a failure-free environment [WV01]
  • In particular the scheduler as part of the DBMS is responsible for the administration of read and write operations

3.5 Key Terms

  • Transactions need to be demarcated from other transactions, which is done by the terms begin and commit
    • Begin: Represents the starting point of a transaction
    • Commit: Marks the successful end of a transaction
  • A rollback marks the unsuccessful end of a transaction
    • The transaction is then aborted
Time  A simple transaction 
  1 
  2 
  4 
  5 
 begin                
 read(X)              
 write(Y)             
 commit               

3.6 Review Questions

  • Which of the following statements is correct?
    • The relational model for database systems was developed for better transaction processing
    • A transaction transforms a consistent state into another consistent state
    • The application program cares about transaction processing and not the DBMS
    • A transaction always ends with a commit

4 ACID Properties of Transactions

4.1 ACID

  • The ACID properties were first mentioned in a paper by computer scientists Theo Härder and Andreas Reuter in 1983 and used to characterize the transaction concept
    • They built on the earlier work by Jim Gray
  • ACID properties must be fulfilled to achieve indivisibility of a transaction
  • Due to the importance of this properties, the description of each property is cited from their paper (with bold face added) [HR83]

4.2 Atomicity

  • “It must be of the all-or-nothing type […] and the user must, whatever happens, know which state he or she is in.” [HR83]
  • All-or-nothing principle
    • Either all steps of a transaction should be reflected in the database or no step
    • If an error occurs before the commit of a transaction, no changes should be reflected in the database
      • In this case a rollback is performed and the changes will be reset
      • Transaction recovery deals with undoing the changes a transaction made until it was aborted

4.3 Consistency

  • “A transaction reaching its normal end (EOT, end of transaction), thereby committing its results, preserves the consistency of the database.” [HR83]
  • A transaction moves a database from a consistent state into another consistent state
  • It follows that every transaction ending successfully only commits legal changes to the database

4.4 Isolation

  • “Events within a transaction must be hidden from other transactions running concurrently.” [HR83]
  • Concurrently running transactions should not influence each other
    • When transactions influence each other database anomalies happen
  • How transactions can be executed concurrently without influencing each other is discussed in the chapter concurrency control

4.5 Durability

  • “Once a transaction has been completed and has committed its results to the database, the system must guarantee that these results survive any subsequent malfunctions.” [HR83]
  • The saving of data should be guaranteed
  • A subsequent malfunction could be a hardware or system failure

4.6 Review Questions

  • Which of the following statements fits to which ACID principle?
    • Think about your answer before you check out the solutions!
  • Concurrently running transactions should not influence each other.
    • Isolation
  • Changes by a transaction that does not reach it’s normal end should not be reflected in the database.
    • Atomicity

5 Concurrency Control

5.1 Concurrency

  • Concurrency is an important principle in Computer Science
  • It deals with the execution of…
    • parts or units of a program
    • algorithms
    • or transactions
  • …out of order or in partial order without affecting the final outcome.

5.2 Serializability of Transactions

  • Database anomalies show that controlled execution of concurrent transactions is needed
    • This can be done mathematically and theoretically with the concept of schedules and histories
    • A schedule is an abstract model to describe how transactions are executed in a database system
  • Fundamental idea: Each individual transaction maintains the consistency of the underlying data. It follows that a serial history (Transactions are executed sequential) is correct. A history (Transactions may be executed concurrently) is called serializable if it is equivalent to a serial history. [WV01]

5.3 Concurrency Control Algorithms

  • The scheduler as part of the DBMS receives an input schedule and transforms it into an serializable output schedule [WV01]
  • A scheduler uses a scheduling algorithm to produce serializable schedules for concurrent transactions [WV01]
    • Locking, non-locking, and hybrid scheduling algorithms can be distinguished
  • Locking scheduling algorithms are the most prominent type of scheduling algorithm
    • Therefore they are explained on the following slides

5.4 Locking Scheduling Algorithms

  • Core idea: Synchronize access to shared data by using locks on data items [WV01]
    • Locks can be set or released for transactions by the scheduler
    • If a lock is set by a transaction it is not available to other transactions
    • Example for a data item: Balance of Alice in the chapter on Database anomalies
  • A locking scheduling algorithm may use two types of locks, since transactions can read or write data
    • Read-lock: can be obtained by multiple transactions
    • Write-lock: can only be obtained by one transaction

5.4.1 Key Terms

  • Lock request: A scheduler checks if a transaction can acquire a lock for a data item
  • Lock conflict: Happens when a transaction acquires a lock for a data item that is already locked
  • Lock wait: A transaction that could not acquire a lock is blocked
  • Lock release: Transaction releases one of its locks. Then the scheduler checks whether a waiting transaction can now acquire the lock

5.5 Two-phase Locking Protocol (2PL)

  • The 2PL is the most famous locking protocol, used in many commercial database systems. [WV01]

    Lock aquiring and release under 2PL

    Lock aquiring and release under 2PL by Christopher Olbrich under CC BY-SA 4.0; from GitLab

  • Idea of the 2PL: Lock acquisition and lock release are strictly separated for a transaction
    • Growing phase: Locks are acquired
    • Shrinking phase: Locks are released

5.5.1 2PL Deadlock Example

 Time   Transaction T1                Transaction T2             
    1 
    2 
 begin                       
 writelock(Balance of Alice) 
                            
                            
    3 
    4 
                             
                             
 begin                      
 writelock(Balance of Bob)  
    5 
    6 
    7 
 read(Balance of Alice)      
 Balance = Balance - 10      
 write(Balance of Alice)     
                            
                            
                            
    8 
    9 
   10 
                             
                             
                             
 read(Balance of Bob)       
 Balance = Balance - 5      
 write(Balance of Bob)      
   11   writelock(Balance of Bob)                                
   12                                 writelock(Balance of Alice)

Deadlock: T1 waiting for a lock held by T2 and vice versa

5.6 Deadlock Handling

  • The example from the previous page shows a Deadlock, because T1 requests a lock held by T2 and vice versa
    • The 2PL is not deadlock free
  • Deadlock Handling is the main challenge of locking scheduling algorithms
  • This presentation describes the core ideas and strategies for dealing with deadlocks in general

5.6.1 Conservative 2PL

  • The Conservative 2PL (preclaiming) is a variant of the 2PL which addresses the problem of deadlocks

    Lock aquiring and release under conservative 2PL

    Lock aquiring and release under conservative 2PL by Christopher Olbrich under CC BY-SA 4.0; from GitLab

  • Idea: Transactions can only acquire locks prior to execution [WV01]
    • Advantage: Deadlocks are avoided
    • Disadvantage: Transactions are limited, because they need to declare all their required locks before execution
  • Besides other variants of the 2PL exist, which address other challenges: Strict 2PL & strong 2PL

5.7 Review Questions

  • What is correct answer concerning the task of a scheduler for concurrency control?
    • Notify programs, which data items are locked
    • Produce a serializable output schedule
    • Select transactions that don’t effect each other
    • Execute transactions one after another
  • The second statement is the correct answer. If you don’t understand why, revisit this slide.

6 Conclusions

6.1 Summary

  • Transaction processing is an important concept of database systems
    • Transactions move a database from a consistent into another consistent state
    • ACID properties describe requirements for transactions
  • Concurrency control is a main challenges of transaction processing
    • Enable the serializability of transactions through concurrency control algorithms
    • 2PL as an example for locking schedule algorithms

Bibliography

  • [EGLT76] Eswaran, Gray, Lorie & Traiger, The Notions of Consistency and Predicate Locks in a Database System, Commun. ACM 19(11), 624-633 (1976). http://doi.acm.org/10.1145/360363.360369
  • [HR83] Haerder & Reuter, Principles of Transaction-oriented Database Recovery, ACM Comput. Surv. 15(4), 287-317 (1983). http://doi.acm.org/10.1145/289.291
  • [WV01] Weikum & Vossen, Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery, Morgan Kaufmann Publishers Inc., 2001.

License Information

Except where otherwise noted, this work, “Introduction to transaction processing”, is © 2018 by Christopher Olbrich, published under the Creative Commons license CC BY-SA 4.0.

No warranties are given. The license may not give you all of the permissions necessary for your intended use.

In particular, trademark rights are not licensed under this license. Thus, rights concerning third party logos (e.g., on the title slide) and other (trade-) marks (e.g., “Creative Commons” itself) remain with their respective holders.