中文网站
  Advanced Search
Read the latest Blogs from IT professionals in the field. Read and write community created documents. Need IT help? Ask our staff. Connect with your peers. Check our Tech Shop for posters, books and software tools. Home

12.3 Database Systems: Estimation of Query-Processing Cost

1. To choose a strategy based on reliable information, the database system may store statistics for each relation r:

  • nr - the number of tuples in r
  • sr - the size in bytes of a tuple of r (for fixed-length records).
  • V (A, r) - the number of distinct values that appear in relation r for attribute A.

2. The first two quantities allow us to estimate accurately the size of a Cartesian product.

  • The Cartesian product r × s contains nrns tuples.
  • Each tuple of r × s occupies sr + ss bytes.
  • The third statistic is used to estimate how many tuples satisfy a selection predicate of the form <attribute-name> = <value>
  • We need to know how often each value appears in a column.
  • If we assume each value appears with equal probability, then

∑A=a(r)

is estimated to have                  

        tuples.

  • This may not be the case, but it is a good approximation of reality in many relations.
  • We assume such a uniform distribution for the rest of this chapter.
  • Estimation of the size of a natural join is more diffcult
  • Let r1(R1) and r2(R2) be relations on schemes R1 and R2.
  • If R1 ∩ R2 = Φ (no common attributes), then r1 ∝ r2 is the same as r1 ∝ r2 and we can estimate the size of this accurately.
  • If R1 ∩ R2 is a key for R1, then we know that a tuple of r2 will join with exactly one tuple of r1.
  • Thus the number of tuples in r1 ∝ r2 will be no greater than nr2 .
  • If R1 ∩ R2 is not a key for R1 or R2, things are more dicult.
  • We use the third statistic and the assumption of uniform distribution.
  • Assume R1 ∩ R2 = fAg.
  • We assume there are

tuples in r2 with an A value of t[A] for tuple t in r1.

  • So tuple t of r1 produces

tuples in r1 ∝ r2

3. Considering all the tuples in r1, we estimate that there are

 

tuples in total in r1

4. If we reverse the roles of r1 and r2 in this equation, we get a different estimate

 

if V (A,r1) 6= V (A, r2).

  • If this occurs, there are likely to be some dangling tuples that do not participate in the join.
  • Thus the lower estimate is probably the better one.
  • This estimate may still be high if the V (A; r1) values in r1 have few values in common with the V (A; r2) values in r2.
  • However, it is unlikely that the estimate is far o , as dangling tuples are likely to be a small fraction of the tuples in a real world relation.

5. To maintain accurate statistics, it is necessary to update the statistics whenever a relation is modified.This can be substantial, so most systems do this updating during periods of light load on the system.

Database System Structure: