L24 Review - simonzhang0158/scribenotes GitHub Wiki
We will show how to check serializable and conflict serializable for the examples. And we will give a not conflict serializable but serializable example.
- T1:
R(A)
W(A)
W(A)
- T2:
W(A)``R(A)
Recall definition of Serializable:
- If execute the schedule the end state of the database is equivalent as some serial schedule.
For this example the end result should be equivalent to either
(T1,T2)
- T1:
R(A)``W(A)``W(A)
- T2:
W(A)``R(A)
(T2,T1)
- T2:
W(A)``R(A)
- T1:
R(A)``W(A)``W(A)
Define T1.W(A): A+1
T2.W(A): A=10
- T1:
R(A=0)``W(A=1+0)
W(A=1+0)
- T2:
W(A=10)``R(A=10)
Outcome A=1
*Note that T2.W(A) is called blind write since it is writing without reading.
- T1:
R(A=10)``W(A=1+10)``W(A=1+10)
- T2:
W(A=10)``R(A=10)
Outcome A=10
- T1:
R(A=10)``W(A=1+10)``W(A=1+10)
- T2:
W(A=10)``R(A=10)
Outcome A=11
Since there is a dependency cycle between T1 and T2, then it is not conflict serializable. The cycle is from operation T1.W(A)1 and T2.W(A) being in conflict, and operations T2.R(A) and T2.W(A)2 being in conflict.
- T1:
R(A)``W(A)
1W(A)
2 - T2:
⇘``W(A)``R(A)``⇗
- T1:
W(A)
1W(A)
2 - T2:
⇘``R(A)``⇗
Since there is a dependency cycle between T1 and T2, then it is not conflict serializable. The cycle is from operation T1.W(A)1 and T2.R(A) being in conflict, and operations T2.R(A) and T2.W(A)2 being in conflict.
However it is serializable. Since it is equivalent to T2,T1.
- Serial ⊆ Conflict Serializable ⊆ Serializable
In this example, we will cover access costs computing for a table in primary/secondary, candidate key/not candidate key, πid, and overflow page cases.
- Fill factor: 50%
- DE size: 10 bytes
- Tuple size: 50 bytes
- Page Size: 500 bytes
- T1: 20,000 tuples
- Primary B+ INDEX T1(id)
- 𝜎id=10(T1) id is key
We first need to compute how many tuples one page can contain. Note that page size is 500 bytes with 50% fill factor, and tuple size is 50 bytes. Therefore the amount of tuples one page can contain is
- 500/2/50=5 tuple/page
We have 20,000 tuples in total, so we need
- 20k/5=4,000 pages
If we do not use index, then on average we need half of it which is 2,000.
We can say T1 = 20,000 tuples ≅ 4,000 pages
We first need to find fan out to help us get the height of the tree. The size of index is DE size: 10 bytes and the page size is 500 bytes with 50% fill factor. Therefore
- Fan out: 500/2/10=25 index/page
Then we can use fan out to get the Height of tree
- Height of tree=⌈log254,000⌉=3
The access cost will be the height plus one more leaf page:
- Cost= h+1(leaf page)=4
We need to know selectivity, the default selectivity is 5%.
Note that the selectivity is defined as the number of output records / number of input records. The reason why we can use the selectivity also for page calculations is because the leaf pages are sorted (which is assumed by Primary B+ INDEX). If they were not, then it could be one page per tuple.
We first compute how much leaf pages we need to read, note at above we computed that we need 4,000 pages in total. Thus 5% of it is:
- Data pages=4,000*0.05=200 pages
The access cost will be the height plus the amount of leaf pages we need to read:
- cost=h+200=203
At above we computed fan out is 25 and we have 20,000 tuples in total.
- Leaf pages: 20,000/25=800
Therefore the new height of tree is:
- new height of tree=⌈log25800⌉=3 (this is different from the previous one even they are both 3)
The access cost will be the height plus one more leaf page and since we are in secondary case we need one more page on disk, note that this is under assumption that there is no overflow page (we covered overflow case below):
- Cost = 3(new height)+1(leaf page)+1(disk)=5
The difference between this problem and the previous one is that since we are only looking for the id which is the index we do not need the information from disk. Thus
- Cost = 3(height)+1(leaf page)=4
Assume on average each bucket has 10 pages, then the page we need to read in disk will be 10
- Cost = 3(height)+1(leaf page)+10(disk)=14