hybrid histogram - cheeyoung/sqlplus-public GitHub Wiki
An example of querying a hybrid histogram
...
NUM_DISTINCT NUM_BUCKETS HISTOGRAM
------------ ----------- ---------------
21309440 254 HYBRID
...
-----------------------------------------
ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
4275 2459008.88
...
4296 2459009.03
...
4318 2459009.16
.
.
.
4209 2459008.37
...
4231 2459008.61
...
4253 2459008.73
254 rows selected.
$ tail -n +60 query05.out | grep -c ' [ 0-9][ 0-9][ 0-9][0-9] '
254
$ tail -n +60 query05.out | grep ' [ 0-9][ 0-9][ 0-9][0-9] ' | sort -n | awk '{ printf "%s,%s\n",$2,$1 }' > histogram_col01.csv
$ vi -c "set number" histogram_col01.csv
1 2458900.71,1
2 2458928.94,23
3 2458950.45,45
4 2458963.45,67
5 2458972.33,89
6 2458972.79,110
7 2458973.06,132
8 2458973.34,154
9 2458973.56,176
10 2458973.85,198
...
245 2459016.4,5321
246 2459016.52,5343
247 2459016.67,5365
248 2459016.84,5386
249 2459017.02,5408
250 2459017.17,5430
251 2459017.46,5452
252 2459017.65,5474
253 2459017.81,5496
254 2459017.92,5517
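In a hybrid histogram, ENDPOINT_NUMBER is a cumulative row count over the sample, so the number of sampled rows in each bucket is the difference between consecutive endpoint numbers. The following is a minimal sketch of that arithmetic, using the first six lines of histogram_col01.csv shown above. The function name `bucket_sizes` is hypothetical and only serves this illustration.

```python
import csv
import io

def bucket_sizes(csv_text):
    """Turn 'endpoint_value,endpoint_number' lines into (value, rows_in_bucket) pairs."""
    sizes = []
    previous = 0  # cumulative count before the first bucket
    for value, endpoint_number in csv.reader(io.StringIO(csv_text)):
        cumulative = int(endpoint_number)
        # Rows in this bucket = growth of the cumulative count.
        sizes.append((float(value), cumulative - previous))
        previous = cumulative
    return sizes

# First six lines of histogram_col01.csv, copied from the listing above.
sample = """2458900.71,1
2458928.94,23
2458950.45,45
2458963.45,67
2458972.33,89
2458972.79,110
"""

for value, size in bucket_sizes(sample):
    print(value, size)
```

The last endpoint number in the file (5517) is the total number of sampled rows. Spread over 254 buckets, that gives about 21.7 rows per bucket, which matches the differences of 21 and 22 seen in the listing.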
A height-based (height-balanced) histogram sometimes produces inaccurate estimates for values that are almost popular.
For example, a value that is the endpoint of only one bucket, but nearly fills two buckets, is not considered popular.
To solve this problem, a hybrid histogram distributes values so that no value occupies more than one bucket. It then stores, for each endpoint (bucket), the endpoint repeat count: the number of times the endpoint value is repeated. Using the repeat count, the optimizer can obtain accurate estimates for almost popular values.
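The idea can be illustrated with a small, simplified sketch. This is not Oracle's actual algorithm. It only shows the two properties described above: buckets are closed on value boundaries, so a value never spans two buckets, and each endpoint records its repeat count. The function names `hybrid_histogram` and `estimate_rows` are hypothetical. The estimate formula (table rows times repeat count divided by sample size) is an assumption about how the repeat count is used.

```python
from itertools import groupby

def hybrid_histogram(values, num_buckets):
    """Return (endpoint_number, endpoint_value, endpoint_repeat_count) tuples."""
    ordered = sorted(values)
    target = len(ordered) / num_buckets  # nominal rows per bucket
    buckets = []
    cumulative = 0  # rows consumed so far (plays the role of ENDPOINT_NUMBER)
    in_bucket = 0   # rows in the bucket currently being filled
    for value, group in groupby(ordered):
        repeat = len(list(group))
        cumulative += repeat
        in_bucket += repeat
        # Close a bucket only after all copies of a value are in it,
        # so no value is split across two buckets.
        if in_bucket >= target or cumulative == len(ordered):
            buckets.append((cumulative, value, repeat))
            in_bucket = 0
    return buckets

def estimate_rows(buckets, value, num_rows):
    """Assumed estimate for an endpoint value: num_rows * repeat_count / sample_size."""
    sample_size = buckets[-1][0]  # last cumulative count = sampled rows
    for _, endpoint_value, repeat in buckets:
        if endpoint_value == value:
            return num_rows * repeat / sample_size
    return None  # not an endpoint; this sketch does not cover that case

sample = [1, 1, 2, 3, 3, 3, 3, 4, 5, 5, 6, 7]
hist = hybrid_histogram(sample, 4)
for bucket in hist:
    print(bucket)
```

In this sample the value 3 occurs four times, more than the nominal three rows per bucket. It ends one bucket with a repeat count of 4 instead of spilling into the next bucket, so its frequency can be read directly from the endpoint.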