histogram - cheeyoung/sqlplus-public GitHub Wiki

SQL> execute dbms_stats.gather_table_stats( ownname => 'SH' , tabname => 'COUNTRIES' , method_opt => 'FOR COLUMNS COUNTRY_SUBREGION_ID' ) ;

default
254	1..2048	12.1
75 ?	1..254	11.2

References

4.105 ALL_TAB_COL_STATISTICS
4.110 ALL_TAB_HISTOGRAMS 21c

HYBRID  
FREQUENCY 
TOP FREQUENCY
HEIGHT BALANCED 
NONE

11 Histograms 21c
11.1 Purpose of Histograms

By default the optimizer assumes a uniform distribution of rows across the distinct values in a column.

For columns that contain data skew (a nonuniform distribution of data within the column), a histogram enables the optimizer to generate accurate cardinality estimates for filter and join predicates that involve these columns.

For example, a California-based book store ships 95% of the books to California, 4% to Oregon, and 1% to Nevada. The book orders table has 300,000 rows. A table column stores the state to which orders are shipped. A user queries the number of books shipped to Oregon. Without a histogram, the optimizer assumes an even distribution of 300000/3 (the NDV is 3), estimating cardinality at 100,000 rows. With this estimate, the optimizer chooses a full table scan. With a histogram, the optimizer calculates that 4% of the books are shipped to Oregon, and chooses an index scan.

11.2 When Oracle Database Creates Histograms

If DBMS_STATS gathers statistics for a table, and if queries have referenced the columns in this table, then Oracle Database creates histograms automatically as needed according to the previous query workload.

11.3 How Oracle Database Chooses the Histogram Type

11.4 Cardinality Algorithms When Using Histograms
11.4.1 Endpoint Numbers and Values

11.4.2 Popular and Nonpopular Values

Popular values
Nonpopular valuesAny value that is not popular is a nonpopular value. The optimizer calculates the cardinality estimates for nonpopular values using the following formula:

cardinality of nonpopular value = (num of rows in table) * density

The optimizer calculates density using an internal algorithm based on factors such as the number of buckets and the NDV. Density is expressed as a decimal number between 0 and 1. Values close to 1 indicate that the optimizer expects many rows to be returned by a query referencing this column in its predicate list. Values close to 0 indicate that the optimizer expects few rows to be returned.

11.4.3 Bucket Compression

11.5 Frequency Histogram
11.6 Top Frequency Histogram
11.7 Height Balanced Histogram
11.8 Hybrid Histogram
Frequency + Height Balanced

Cumulative histogram