1. To make a wise choice between the methods seen, database designer must consider the following issues:
- Is the cost of periodic re-organization of index or hash structure acceptable?
- What is the relative frequence of insertion and deletion?
- Is it desirable to optimize average access time at the expense of increasing worst-case access time?
- What types of queries are users likely to pose?
2. The last issue is critical to the choice between indexing and hashing. If most queries are of the form
select A1,A2,...,An
from r
where Ai = c
then to process this query the system will perform a lookup on an index or hash structure for attribute Ai with value c.
3. For these sorts of queries a hashing scheme is preferable.
- Index lookup takes time proportional to log of number of values in R for Ai.
- Hash structure provides lookup average time that is a small constant (independent of database size).
4. However, the worst-case favors indexing:
- Hash worst-case gives time proportional to the number of values in R for Ai.
- Index worst case still log of number of values in R.
5. Index methods are preferable where a range of values is specified in the query, e.g.
select A1,A2,...,An
from r
where Ai ≤ c2 and Ai ≥ c1
This query nds records with Ai values in the range from c1 to c2.
6.
- Using an index structure, we can nd the bucket for value c1, and then follow the pointer chain to read the next buckets in alphabetic (or numeric) order until we nd c2.
- If we have a hash structure instead of an index, we can nd a bucket for c1 easily, but it is not easy to find the "next bucket".
- A good hash function assigns values randomly to buckets.
- Also, each bucket may be assigned many search key values, so we cannot chain them together.
- To support range queries using a hash structure, we need a hash function that preserves order.
- For example, if K1 and K2 are search key values and K1 < K2 then h(K1) < h(K2).
- Such a function would ensure that buckets are in key order.
- Order-preserving hash functions that also provide randomness and uniformity are extremely diffcult to find.
- Thus most systems use indexing in preference to hashing unless it is known in advance that range queries will be infrequent.
