hybrid histogram - cheeyoung/sqlplus-public GitHub Wiki

An Example of Querying a Hybrid Histogram

...  
NUM_DISTINCT NUM_BUCKETS HISTOGRAM  
------------ ----------- ---------------  
    21309440         254 HYBRID  
...  
-----------------------------------------  
ENDPOINT_NUMBER ENDPOINT_VALUE  
--------------- --------------  
           4275     2459008.88  
...  
           4296     2459009.03  
...  
           4318     2459009.16  
.  
.  
.
           4209     2459008.37  
...  
           4231     2459008.61  
...  
           4253     2459008.73  

254 rows selected.

$ tail -n +60 query05.out | grep -c ' [ 0-9][ 0-9][ 0-9][0-9] '
254
$ tail -n +60 query05.out | grep ' [ 0-9][ 0-9][ 0-9][0-9] '  | sort -n | awk '{ printf "%s,%s\n",$2,$1 }' > histogram_col01.csv
$ vi -c "set number" histogram_col01.csv
      1 2458900.71,1
      2 2458928.94,23
      3 2458950.45,45
      4 2458963.45,67
      5 2458972.33,89
      6 2458972.79,110
      7 2458973.06,132
      8 2458973.34,154
      9 2458973.56,176
     10 2458973.85,198
...
    245 2459016.4,5321
    246 2459016.52,5343
    247 2459016.67,5365
    248 2459016.84,5386
    249 2459017.02,5408
    250 2459017.17,5430
    251 2459017.46,5452
    252 2459017.65,5474
    253 2459017.81,5496
    254 2459017.92,5517
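
In a hybrid histogram, ENDPOINT_NUMBER is a cumulative count, so the number of (sampled) rows in each bucket is the difference between consecutive endpoint numbers. The following is a minimal sketch of that subtraction, using only the first six rows of the CSV shown above. The file names histogram_head.csv and bucket_sizes.csv are hypothetical and chosen for illustration; to process the full histogram, point awk at histogram_col01.csv instead.

```shell
# First six rows of histogram_col01.csv as listed above (value,endpoint_number).
# histogram_head.csv is a hypothetical file name used only for this sketch.
cat > histogram_head.csv <<'EOF'
2458900.71,1
2458928.94,23
2458950.45,45
2458963.45,67
2458972.33,89
2458972.79,110
EOF

# Bucket size = current cumulative endpoint number minus the previous one.
# prev is unset on the first row, which awk treats as 0.
awk -F, '{ printf "%s,%d\n", $1, $2 - prev; prev = $2 }' histogram_head.csv > bucket_sizes.csv
cat bucket_sizes.csv
```

For these rows the bucket sizes come out as 1, 22, 22, 22, 22 and 21, which shows the roughly equal bucket heights after the first endpoint.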

11.8 Hybrid Histograms

The height-based histogram sometimes produces inaccurate estimates for values that are almost popular.
For example, a value that occurs as an endpoint value of only one bucket but almost occupies two buckets is not considered popular.
To solve this problem, a hybrid histogram distributes values so that no value occupies more than one bucket. It then stores, for each endpoint (bucket) in the histogram, the endpoint repeat count, which is the number of times the endpoint value is repeated. By using the repeat count, the optimizer can obtain accurate estimates for almost popular values.
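
The idea can be illustrated with a small sketch. This is a simplified model and not Oracle's actual algorithm: it ignores sampling and the special handling of the minimum value, and the data, the target bucket size of 5, and the file names values.txt and hybrid_sketch.csv are all assumptions made for illustration. Each bucket is closed only after all occurrences of the current value have been added, so no value spans two buckets, and the repeat count of the endpoint value is recorded alongside the cumulative endpoint number.

```shell
# Hypothetical sample of 20 values; the value 5 occurs 7 times ("almost popular").
printf '%s\n' 1 2 2 3 4 5 5 5 5 5 5 5 6 7 8 8 9 10 11 12 > values.txt

# uniq -c emits "count value" per distinct value, in sorted order.
# Output columns: endpoint_value,endpoint_number (cumulative),repeat_count
sort -n values.txt | uniq -c | awk -v size=5 '
  {
    cum += $1            # cumulative row count up to and including this value
    inbucket += $1       # rows accumulated in the current bucket
    last = $2; rep = $1  # candidate endpoint value and its repeat count
    pending = 1
    if (inbucket >= size) {   # close the bucket on a value boundary
      print last "," cum "," rep
      inbucket = 0; pending = 0
    }
  }
  END { if (pending) print last "," cum "," rep }   # flush the final bucket
' > hybrid_sketch.csv
cat hybrid_sketch.csv
```

In this sketch the value 5 ends exactly one bucket with a repeat count of 7, so an estimate for `= 5` can use 7 of 20 rows directly instead of assuming the value is spread evenly within the bucket.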