1. Parallel-join: split the pairs to be tested over several processors. Each processor computes part of the join,and then the results are assembled (merged).
2. Ideally, the overall work of computing join is partitioned evenly over all processors. If such a split is achieved without any overhead, a parallel join using N processors will take 1/N times as long as the same join would take on a single processor.
3. In practice, the speedup is less dramatic because
(a) Overhead is incurred in partitioning the work among the processors.
(b) Overhead is incurred in collecting the results computed by each processor.
(c) If the split is not even, the final result cannot be obtained until the last processor has finished.
(d) The processors may compete for shared system resources, e.g., for A ∝ B (e.g., deposit 1 customer),if each processor uses its own partition of A, and the main memory cannot hold the entire B, the processors need to synchronize the access of B so as to reduce the number of times that each block of B must be read in from disk.
4. A parallel hash algorithm to reduce memory contention.
Choose a hash function whose range is {1, . . .,N} which allows us to assign each of the N processors to exactly one hash bucket. Since the final outer for-loop of the hash-join algorithm iterates over buckets, each processor can process the iteration that corresponds to its assigned bucket. Since no tuple is assigned to more than one bucket, so there is no contention for B tuples. Since each processor considers one pair of tuples
at a time, the total main memory requirements of the parallel hash join algorithm are suffciently low that contention for space in main memory is unlikely.
