SQL query execution order - ignacio-alorre/Hive GitHub Wiki
Execution order of a SQL query:
- FROM, JOIN
- WHERE
- GROUP BY
- HAVING
- SELECT
- DISTINCT
- ORDER BY
- LIMIT, OFFSET
Let's use an example to clarify some of this.

select school_year, student_id, avg(gpa) as gpa_avg
from gpa_history
where is_required = TRUE
group by school_year, student_id
having avg(gpa) > 3.5
Sinve we want to filter out optional courses before computing average GPAs, we need to add the WHERE is_required = TRUE
instead HAVING, because WHERE is executed before GROUP BY and HAVING. The reason why we can not write HAVING avg_gpa >= 3.5
is that avg_gpa
is defined as part of SELECT
, so it cannot be referred to in steps executed before SELECT.
The EXPLAIN
command shows the execution plan for a query
EXPLAIN
SELECT id, COLLECT(DISTINCT name)
FROM D_customer
WHERE segment IS NOT null
GROUP BY id
ORDER BY id
LIMIT 2
Summarising the output of this query:
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
filterExpr: segment is not null (type: boolean) [WHERE segment IS NOT null]
Select Operator
expressions: id (type: int), name (type: string) [SELECT id, name] !! No DISTINCT or COLLECT here still
Reduce Operator Tree:
Group By Operator [GROUP BY id]
aggregations: collect(DISTINCT KEY._col1:0._col0) [COLLECT DISTINCT name]
Stage: Stage-2
Map Reduce
Map Operator Tree: col0
sort order: + [SORT BY id]
Reduce Operator Tree:
Select Operator
Limit: 2 [LIMIT 2]
Stage: Stage-0
Fetch Operator
limit: 2
The full output of this query is:
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-2 depends on stages: Stage-1
Stage-0 depends on stages: Stage-2
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: d_customer
filterExpr: segment is not null (type: boolean)
Statistics: Num rows: 1 Data size: 93 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: segment is not null (type: boolean)
Statistics: Num rows: 1 Data size: 93 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: id (type: int), name (type: string)
outputColumnNames: id, name
Statistics: Num rows: 1 Data size: 93 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: collect(DISTINCT name)
keys: id (type: int), name (type: string)
mode: hash
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 93 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int), _col1 (type: string)
sort order: ++
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 1 Data size: 93 Basic stats: COMPLETE Column stats: NONE
Reduce Operator Tree:
Group By Operator
aggregations: collect(DISTINCT KEY._col1:0._col0)
keys: KEY._col0 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 93 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Stage: Stage-2
Map Reduce
Map Operator Tree:
TableScan
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Statistics: Num rows: 1 Data size: 93 Basic stats: COMPLETE Column stats: NONE
TopN Hash Memory Usage: 0.1
value expressions: _col1 (type: array<string>)
Reduce Operator Tree:
Select Operator
expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: array<string>)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 93 Basic stats: COMPLETE Column stats: NONE
Limit
Number of rows: 2
Statistics: Num rows: 1 Data size: 93 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 93 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: 2
Processor Tree:
ListSink