Transactions
Seattle Report 2022
What changes?
- Increased complexity & variability of failure scenarios
- Increased communication latency
- Increased performance variability
This has resulted in trade-offs in:
- Consistency level
- Isolation level
- Availability
- Latency
- Throughput under contention
- Elasticity & scalability
Two schools of thought:
- Xacts are hard to process at scale, so reduce consistency & isolation guarantees, at the cost of increased developer complexity.
- The cost of a bug-free application is high unless the system provides strong consistency & isolation. Therefore don't sacrifice correctness guarantees, and increase perf as much as you can.
Why isn't Serializable Isolation the default?
Dan Ports
- Academic Researchers: SERIALIZABLE isn’t the default? But everyone enables it first thing, right?
- Database users: SERIALIZABLE exists?
Daniel J. Abadi
Part 1
Database Isolation
- Each xact behaves as if no other xacts were running, even though xacts actually run concurrently.
- Problems of perfect isolation, even in well-designed systems:
  - Latency cost (how long it takes to complete xacts)
  - Throughput (xacts/sec)
- Isolation levels: trade-offs
Anomalies
1. Read old inventory
2. New inventory = old inventory - 1
3. Orders = orders + 1
If the initial inventory was 42 and there were no orders yet, then inventory + orders = 42 must hold at all times.
- Lost update anomaly. Two concurrent xacts both read 42 and write 41, and each increases orders by 1, so inventory + orders = 43.
- Dirty write anomaly
- Dirty read anomaly. Another xact sees the updated inventory between steps 2 & 3, before orders has been updated.
- Non-repeatable read anomaly. The same read returns different values within one xact.
- Phantom read anomaly. Additional records appear when the same query is repeated.
- Write-skew
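The lost update case can be replayed by hand. The sketch below is a deterministic interleaving with no isolation at all, not a real database; `db`, `read_inventory`, and `write_back` are hypothetical names introduced for illustration, and orders is assumed to start at 0.

```python
# Two purchase xacts interleaved so that both read before either writes.
# All names here are hypothetical; this models no particular database.
db = {"inventory": 42, "orders": 0}

def read_inventory(db):
    # Step 1: read old inventory
    return db["inventory"]

def write_back(db, old_inventory):
    # Step 2: new inventory = old inventory - 1
    db["inventory"] = old_inventory - 1
    # Step 3: orders = orders + 1
    db["orders"] += 1

# Both xacts run step 1 first, so both see 42.
seen_by_t1 = read_inventory(db)
seen_by_t2 = read_inventory(db)

# Each xact then writes 41 and bumps orders.
write_back(db, seen_by_t1)
write_back(db, seen_by_t2)

total = db["inventory"] + db["orders"]
print(db, total)  # inventory 41, orders 2, total 43
```

One decrement of the inventory is lost, so the invariant inventory + orders = 42 no longer holds.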
Caveats
- Not all databases mean the same by SERIALIZABLE
- In a well-designed system perf difference between SERIALIZABLE and READ COMMITTED can be negligible. (paper)
- In distributed systems, additional classes of anomalies are possible even under SERIALIZABLE.
Related paper: ACIDRain
Part 2. Correctness Anomalies Under Serializable Isolation
In distributed & replicated systems new bugs appeared. We needed stronger guarantees than Serializable Isolation.
Example: Alice has a balance of $50, replicated in 2 regions. Replication usually isn't synchronous. What happens if withdrawals happen concurrently in both regions?
An anomaly can happen even if each region processes its xacts in a serial order.
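A minimal sketch of the Alice example, assuming each region runs its local xacts serially and ships its writes to the other region later. The `Region` class, the `replicate` helper, the region names, and the $50 withdrawal amount are all assumptions made for illustration.

```python
class Region:
    """One replica of Alice's account (hypothetical model)."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.outbox = []  # local withdrawals not yet replicated

    def withdraw(self, amount):
        # Local xact: checks only the local copy of the balance.
        if self.balance < amount:
            return False
        self.balance -= amount
        self.outbox.append(amount)
        return True

    def apply_remote(self, amount):
        # Apply a withdrawal that already committed in the other region.
        self.balance -= amount


def replicate(src, dst):
    # Asynchronous replication, run some time after the local commits.
    for amount in src.outbox:
        dst.apply_remote(amount)
    src.outbox.clear()


us = Region("us", 50)
eu = Region("eu", 50)

# Both withdrawals run before either region hears about the other one.
ok_us = us.withdraw(50)
ok_eu = eu.withdraw(50)

replicate(us, eu)
replicate(eu, us)
print(ok_us, ok_eu, us.balance, eu.balance)
```

Both withdrawals succeed locally, and after replication both copies show a negative balance, although no single region ever ran two xacts at the same time.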
One Copy Serializability (1SR)
Equivalent to serializability in an unreplicated system with "one copy" of every item.
Guarantees some serial order, with no constraints on what that serial order is. Later xacts are allowed to be processed before earlier ones.
Anomalies
- Stale Read/Immortal Write. A xact is ordered back in time: a read returns an outdated value, or a later write is ordered before an earlier one.
- Causal Reverse. A later write that was caused by an earlier write is ordered before that earlier write. Can happen in sharded/partitioned databases.
Examples:
- 2021/08: Aurora returning stale read
- 2015/04: MongoDB stale reads
- Spanner allows stale reads for performance
Strict Serializable
Avoids the above anomalies
Oracle Serializable ?
create table t(a int);
insert into t values(1);
insert into t values(2);
insert into t values(3);
-- session 1
set transaction isolation level serializable;
select * from t where a = 1;
-- session 2
set transaction isolation level serializable;
select * from t where a = 2;
-- session 1
update t set a = 4 where a = 2;
-- session 2
update t set a = 5 where a = 1;
-- session 1
commit;
-- session 2
commit;
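The above fails in postgres (one of the two xacts is aborted with a serialization failure) but succeeds in oracle, where both xacts commit. Oracle's SERIALIZABLE is snapshot isolation, which allows this write skew. Note that in postgres each session must first open a transaction block with begin; otherwise set transaction has no effect.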
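The difference can be illustrated with a toy model of the two-session script above. This is a sketch under stated assumptions, not how either database is implemented: `Store`, `Txn`, and `run` are hypothetical names, snapshot isolation is modeled as "abort if a row I wrote was changed after my snapshot", and the stricter mode also validates rows that were only read. Postgres actually uses Serializable Snapshot Isolation, and this model ignores phantoms.

```python
class Store:
    def __init__(self, rows):
        self.rows = dict(rows)               # row id -> value of column a
        self.version = {k: 0 for k in rows}  # row id -> commit time of last write
        self.clock = 0


class Txn:
    def __init__(self, store, check_reads):
        self.store = store
        self.check_reads = check_reads   # False: snapshot isolation only
        self.start = store.clock         # snapshot timestamp
        self.snapshot = dict(store.rows)
        self.reads = set()
        self.writes = {}

    def select_where(self, value):
        ids = [k for k, a in self.snapshot.items() if a == value]
        self.reads.update(ids)
        return ids

    def update_where(self, old, new):
        for k, a in self.snapshot.items():
            if a == old:
                self.writes[k] = new

    def commit(self):
        touched = set(self.writes)
        if self.check_reads:
            touched |= self.reads
        # Abort if any validated row changed after our snapshot was taken.
        if any(self.store.version[k] > self.start for k in touched):
            return False
        self.store.clock += 1
        for k, a in self.writes.items():
            self.store.rows[k] = a
            self.store.version[k] = self.store.clock
        return True


def run(check_reads):
    store = Store({1: 1, 2: 2, 3: 3})    # rows of t, keyed by a made-up row id
    s1 = Txn(store, check_reads)
    s2 = Txn(store, check_reads)
    s1.select_where(1)
    s2.select_where(2)
    s1.update_where(2, 4)
    s2.update_where(1, 5)
    return s1.commit(), s2.commit(), sorted(store.rows.values())


si_result = run(check_reads=False)
strict_result = run(check_reads=True)
print(si_result, strict_result)
```

With write-set validation only, both sessions commit because they wrote different rows. Once reads are validated too, session 2 is rejected, since the row it read was changed by session 1 after its snapshot.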
Saga
Problem
In microservices we want to have a database-per-microservice. Encapsulating domain data lets each service:
- Use its best data store type and schema
- Scale its own data store as necessary
- Be insulated from other services' failures
Problem: Ensuring data consistency across service-specific databases.
Why not 2PC?
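- Locking: participants hold locks until the coordinator decides, which hurts throughput and availability
- All parties must support the 2PC model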
Solution
- Series of local transactions
- Each local xact updates db & publishes a message or event to trigger the next xact
- Each xact has a compensating xact
Pivot:
- Go/no-go point in a saga. If the pivot xact commits, the saga runs until completion. The pivot can be a xact that is neither compensable nor retriable, or it can be the last compensable xact or the first retriable xact.
- All xacts after pivot are retriable and guaranteed to succeed.
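A minimal sketch of the three phases around the pivot, assuming local xacts are plain Python callables that raise on failure. `run_with_pivot`, its parameters, and the bounded retry count are assumptions made for illustration; a real saga would retry retriable xacts until they succeed.

```python
def run_with_pivot(compensable, pivot, retriable, max_attempts=5):
    """compensable: list of (action, undo) pairs; pivot: callable;
    retriable: list of callables that may fail transiently."""
    undo_stack = []
    try:
        for action, undo in compensable:
            action()
            undo_stack.append(undo)
        pivot()  # go/no-go point
    except Exception:
        # Failure at or before the pivot: compensate in reverse order.
        for undo in reversed(undo_stack):
            undo()
        return "rolled back"

    # After the pivot there is no rollback; retry each xact until it succeeds.
    for action in retriable:
        for _ in range(max_attempts):
            try:
                action()
                break
            except Exception:
                continue
        else:
            raise RuntimeError("retriable xact did not succeed")
    return "completed"


log = []
attempts = {"n": 0}

def flaky_ship():
    # Fails twice with a transient error, then succeeds.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient")
    log.append("ship")

result = run_with_pivot(
    compensable=[(lambda: log.append("reserve"), lambda: log.append("unreserve"))],
    pivot=lambda: log.append("charge"),
    retriable=[flaky_ship],
)
print(result, log)
```

Here the charge is the pivot: once it commits, the shipping step is retried rather than compensated.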
Implementation
- Choreography: participants exchange messages without a central coordinator
- Orchestration: a central coordinator tells each participant which local xact to run
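A minimal sketch of the orchestration style, assuming each participant exposes its local xact and its compensating xact as callables. `Step`, `run_saga`, and the step names are hypothetical, and messaging between services is replaced by direct function calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    action: Callable[[], None]                       # local xact; raises on failure
    compensate: Optional[Callable[[], None]] = None  # compensating xact, if any


def run_saga(steps: List[Step]) -> bool:
    """Central coordinator: run steps in order, undo completed ones on failure."""
    done: List[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # Compensate the already committed local xacts in reverse order.
            for prev in reversed(done):
                if prev.compensate is not None:
                    prev.compensate()
            return False
        done.append(step)
    return True


log: List[str] = []

def charge_card() -> None:
    raise RuntimeError("payment declined")

steps = [
    Step("reserve_stock", lambda: log.append("reserve_stock"),
         lambda: log.append("release_stock")),
    Step("create_order", lambda: log.append("create_order"),
         lambda: log.append("cancel_order")),
    Step("charge_card", charge_card),
]
ok = run_saga(steps)
print(ok, log)
```

In a choreography the same compensation logic would be spread across the services, each reacting to the failure event of the next one, instead of living in one `run_saga` loop.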
More Anomaly examples
- A Read-Only Transaction Anomaly Under Snapshot Isolation
- Automating the Detection of Snapshot Isolation Anomalies
