IMPALA SQL_Performance - loukenny/atme GitHub Wiki

QUERY PERFORMANCE

STATISTICS TIME

  • command reports number of milliseconds, required to parse, compile, and execute a query

    • SET STATISTICS TIME ON
    • SET STATISTICS TIME OFF
  • Page Read Statistics

    • SET STATISTICS IO ON
    • SET STATISTICS IO OFF
  • Elapsed time

    • total duration of the query
    • May be variable when analyzing query time statistics
    • Time best time statistics measure for the fastest running query
  • CPU time

    • time taken by server processors to process the query
    • Should vary little when analyzing query time statistics
    • May not be a useful measure if server processors are running in parallel

INDEXES

  • Structure to improve speed of accessing data from a table
  • Used to locate data quickly without having to scan the entire table
  • Useful for improving performance of queries with filter conditions
  • Applied to table columns
  • Typically added by a database administrator
  • Clustured Index = B-tree structure

    • Analogy: dictionary
    • Table data pages are ordered by the column(s) with the index
    • Only one allowed per table
    • Speeds up search operations
    • Root Node / Branch Nodes / Page Nodes
  • Non-clustered Index

    • Analogy: text book with an index at the back
    • Structure contains an ordered layer of index pointers to unordered table data pages
    • A table can have more than one
    • Improves insert and update operations

EXECUTION PLANS

  • Optimization Phase

    • Evaluates multiple execution plans and selects the one optimized for the lowest cost
    • processor / memory / data page usage are evaluated as cost parameters ↓
  • Execution Engine

    • Executes the query based on the best execution plan
  • Info from the execution plans

    • whether indexes were used
    • types of joins used
    • location and relative costs of
      • filter conditions
      • sorting
      • aggregations