1. Consider the query to nd the assets and branch-names of all banks who have depositors living in Port Chester. In relational algebra, this is
â…¡bname,assets( ∑ccity="Port Chester" (customer ∝ deposit ∝ branch))
- This expression constructs a huge relation,
customer ∝ deposit ∝ branch
of which we are only interested in a few tuples.
- We also are only interested in two attributes of this relation.
- We can see that we only want tuples for which ccity = \Port Chester".
- Thus we can rewrite our query as:
∑ccity="Port Chester"(customer)) ∝ deposit ∝ branch)
- This should considerably reduce the size of the intermediate relation.
2. Suggested Rule for Optimization:
- Perform select operations as early as possible.
- If our original query was restricted further to customers with a balance over $1000, the selection cannot be done directly to the customer relation above.
- The new relational algebra query is
∑ccity="PortChester" ∧ balance>1000 (customer ∝ deposit ∝ branch))
- The selection cannot be applied to customer, as balance is an attribute of deposit.
- We can still rewrite as
∑ccity="Port Chester" ∧ balance>1000 (customer ∝ deposit)) ∝ branch)
- If we look further at the subquery (middle two lines above), we can split the selection predicate in two:
∑ccity="Port Chester"( ∑balance>1000
(customer ∝ deposit))
- This rewriting gives us a chance to use our "perform selections early" rule again.
- We can now rewrite our subquery as:
∑ccity="Port Chester"(customer) ∝ ∑balance>1000(deposit)
3. Second Transformational Rule:
- Replace expressions of the form P1 ^ P2 (e) by P1(P2(e)) where P1 and P2 are predicates and e is a relational algebra expression.
- Generally,
∑P1( ∑P2 (e)) = ∑P2( ∑P1(e)) = ∑P1 ∧ P2 (e)
