IMPALA SQL_Performance - loukenny/atme GitHub Wiki
QUERY PERFORMANCE
STATISTICS TIME
-
command reports number of milliseconds, required to parse, compile, and execute a query
SET STATISTICS TIME ONSET STATISTICS TIME OFF
-
Page Read Statistics
SET STATISTICS IO ONSET STATISTICS IO OFF
-
Elapsed time
- total duration of the query
- May be variable when analyzing query time statistics
- Time best time statistics measure for the fastest running query
-
CPU time
- time taken by server processors to process the query
- Should vary little when analyzing query time statistics
- May not be a useful measure if server processors are running in parallel
INDEXES
- Structure to improve speed of accessing data from a table
- Used to locate data quickly without having to scan the entire table
- Useful for improving performance of queries with filter conditions
- Applied to table columns
- Typically added by a database administrator
-
Clustured Index = B-tree structure
- Analogy: dictionary
- Table data pages are ordered by the column(s) with the index
- Only one allowed per table
- Speeds up search operations
- Root Node / Branch Nodes / Page Nodes
-
Non-clustered Index
- Analogy: text book with an index at the back
- Structure contains an ordered layer of index pointers to unordered table data pages
- A table can have more than one
- Improves insert and update operations
EXECUTION PLANS
-
Optimization Phase
- Evaluates multiple execution plans and selects the one optimized for the lowest cost
- processor / memory / data page usage are evaluated as cost parameters ↓
-
Execution Engine
- Executes the query based on the best execution plan
-
Info from the execution plans
- whether indexes were used
- types of joins used
- location and relative costs of
- filter conditions
- sorting
- aggregations