1. We now consider the strategies for computing a 3-way join:
branch ∝ deposit ∝ customer
2. We'll assume that
- ndeposit = 10,000
- ncustomer = 200
- nbranch = 50
3. Strategy 1:
- Compute deposit 1 customer using one of the previous methods.
- As cname is a key for customer, the result of this join has at most 10,000 tuples (the number in deposit).
- If we then build an index on branch for bname, we can compute
branch ∝ (deposit ∝ customer)
by considering each tuple t of
(deposit ∝ customer)
and looking up tuples in branch with the same bname value using the index.
- (How many accesses, roughly?)
4. Strategy 2:
- Compute the join without any indices.
- This requires checking 50 * 10,000 * 200 possibilities = 100,000,000.
- Not too good of an idea...
5. Strategy 3:
- Perform the pair of joins at once.
- Construct two indices:
- One on branch for bname.
- One on customer for cname. - Then consider each tuple t in deposit.
- For each t, look up corresponding tuples in customer and in branch with equal values on respective join attributes.
- Thus we examine each tuple of deposit exactly once. Using strategy 3 it is often possible to perform a three-way join more effciently than by using two two-way joins.
6. It is hard to calculate exact costs for 3-way joins. Costs depend on how the relations are stored, distribution of values and presence of indices.
