1. If we don't create an index, we must examine every pair of tuples t1 in deposit and t2 in customer. This means examining 10,000 * 200 = 2,000,000 pairs!
2. If we execute this query cleverly, we can cut down the number of block accesses. We use the following method:
for each tuple d 2 deposit do
begin
for each tuple c 2 customer do
begin
examine pair (d; c) to see if a
tuple should be added to the result
end
end
3.
- We read each tuple of deposit once.
- This could require 10,000 block accesses.
- The total number of block access, if the tuples are not stored together physically, would be 10,000 + 10,000 * 200 = 2,010,000.
- If we put customer in the outer loop, we get 2,000,200 accesses.
- If the tuples of deposit are stored together physically, fewer accesses are required (at 20 per block,10,000/20 = 500 block accesses).
- We read each tuple of customer once for each tuple of deposit.
- This suggests we read each tuple of customer 10,000 times, giving as many as 2,000,000 accesses to read customer tuples!
- This would give a total of 2,000,500 accesses.
- We can reduce accesses signi cantly if we store customer tuples together physically.
- At 20 tuples per block, only 10 accesses are required to read the entire relation (as opposed to 200).
- Then we only need 10 * 10,000 = 100,000 block accesses for customer.
- This gives a total of 100,500.
4. Text says further savings are possible if we use customer in the outer loop.
- Now we reference each tuple of deposit once for each tuple of customer.
- If deposit tuples are stored together physically, then since 20 tuples t on one block, ndeposit=20 = 500 accesses are needed to read the entire relation.
- Since customer has 200 tuples, we read the deposit relation 200 times.
- Earlier printings of the text say at most 200 * 500 = 10,000 block accesses to deposit are required.
- WRONG! 200 * 500 is 100,000.
- Total cost is then 100,000 for inner loop plus 10 accesses to read the customer relation once for a total of 100,010.
- Compared to previous estimate of 100,500, the savings are small (490).
5. Note that we are considering worst-case number of block reads, where every time a block is needed it is not in the buffer.
Good buffer management can reduce this considerably.
