histogram - cheeyoung/sqlplus-public GitHub Wiki
SQL> execute dbms_stats.gather_table_stats( ownname => 'SH' , tabname => 'COUNTRIES' , method_opt => 'FOR COLUMNS COUNTRY_SUBREGION_ID' ) ;
Default buckets | Bucket range | Release |
---|---|---|
254 | 1..2048 | 12.1 |
75 ? | 1..254 | 11.2 |
References
- 4.105 ALL_TAB_COL_STATISTICS
- 4.110 ALL_TAB_HISTOGRAMS 21c

Histogram types (values of the HISTOGRAM column in ALL_TAB_COL_STATISTICS):
- HYBRID
- FREQUENCY
- TOP-FREQUENCY
- HEIGHT BALANCED
- NONE
11 Histograms 21c
11.1 Purpose of Histograms
By default the optimizer assumes a uniform distribution of rows across the distinct values in a column.
For columns that contain data skew (a nonuniform distribution of data within the column), a histogram enables the optimizer to generate accurate cardinality estimates for filter and join predicates that involve these columns.
For example, a California-based book store ships 95% of the books to California, 4% to Oregon, and 1% to Nevada. The book orders table has 300,000 rows. A table column stores the state to which orders are shipped. A user queries the number of books shipped to Oregon. Without a histogram, the optimizer assumes an even distribution of 300000/3 (the NDV is 3), estimating cardinality at 100,000 rows. With this estimate, the optimizer chooses a full table scan. With a histogram, the optimizer calculates that 4% of the books are shipped to Oregon, and chooses an index scan.
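The bookstore arithmetic can be reproduced in a few lines. This is a simplified model, not Oracle's optimizer code. It assumes that the no-histogram estimate is rows / NDV and that the histogram estimate is rows multiplied by the value's share of the table. The names `uniform_estimate`, `histogram_estimate`, and `SHIP_STATE_PCT` are hypothetical.

```python
# Simplified model of the bookstore example (not Oracle's internal code).
# Assumption: without a histogram, estimate = rows / NDV;
# with a histogram, estimate = rows * (percentage of rows holding the value).

TOTAL_ROWS = 300_000

# Percentage of orders shipped to each state, as given in the example.
SHIP_STATE_PCT = {"CA": 95, "OR": 4, "NV": 1}


def uniform_estimate(total_rows: int, ndv: int) -> int:
    """Cardinality if every distinct value is assumed equally frequent."""
    return total_rows // ndv


def histogram_estimate(total_rows: int, pct: dict, value: str) -> int:
    """Cardinality using the per-value percentage a histogram records."""
    return total_rows * pct[value] // 100


if __name__ == "__main__":
    ndv = len(SHIP_STATE_PCT)  # NDV = 3
    print("uniform:  ", uniform_estimate(TOTAL_ROWS, ndv))
    print("histogram:", histogram_estimate(TOTAL_ROWS, SHIP_STATE_PCT, "OR"))
```

For Oregon, the uniform model gives 100,000 rows, while the histogram model gives 12,000. That gap is what pushes the optimizer from a full table scan toward an index scan.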
11.2 When Oracle Database Creates Histograms
If DBMS_STATS gathers statistics for a table, and if queries have referenced the columns in this table, then Oracle Database creates histograms automatically as needed according to the previous query workload.
11.3 How Oracle Database Chooses the Histogram Type
11.4 Cardinality Algorithms When Using Histograms
11.4.1 Endpoint Numbers and Values
11.4.2 Popular and Nonpopular Values
- Popular values
- Nonpopular values
  Any value that is not popular is a nonpopular value. The optimizer calculates the cardinality estimates for nonpopular values using the following formula:
cardinality of nonpopular value = (num of rows in table) * density
The optimizer calculates density using an internal algorithm based on factors such as the number of buckets and the NDV. Density is expressed as a decimal number between 0 and 1. Values close to 1 indicate that the optimizer expects many rows to be returned by a query referencing this column in its predicate list. Values close to 0 indicate that the optimizer expects few rows to be returned.
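Once density is known, the nonpopular-value formula reduces to a single multiplication. The notes state that Oracle derives density with an internal algorithm, so this sketch takes density as an input and does not try to compute it. The function name `nonpopular_cardinality` is hypothetical.

```python
# Sketch of: cardinality of nonpopular value = (num of rows in table) * density
# Density is supplied by the caller. Oracle derives it with an internal
# algorithm that this sketch does not attempt to reproduce.


def nonpopular_cardinality(num_rows: int, density: float) -> float:
    """Estimated rows for a predicate on a nonpopular value."""
    if not 0.0 <= density <= 1.0:
        raise ValueError("density must be between 0 and 1")
    return num_rows * density


if __name__ == "__main__":
    # A density near 0 means few rows are expected; near 1, many rows.
    for density in (0.001, 0.5):
        print(density, nonpopular_cardinality(300_000, density))
```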
11.5 Frequency Histogram
11.6 Top Frequency Histogram
11.7 Height Balanced Histogram
11.8 Hybrid Histogram
Combines characteristics of frequency and height-balanced histograms.