Joins in Apache Hive [Delete this one, better the new verison] - ignacio-alorre/Hive GitHub Wiki

INPUT TABLES

Table customers

select * from customers;

Output

 customers.id customers.name
 1 John
 2 Kevin
 19 Alex
 3 Mark
 4 Jenna
 5 Robert
 6 Zoya
 7 Sam
 8 George
 9 Peter

Table Orders

select * from orders;

Output

 order_id orders.order_date orders.customer_id orders.amount
 101 2016-01-01 7 3540
 102 2016-03-01 1 240
 103 2016-03-02 6 2340
 104 2016-02-12 3 5000
 105 2016-02-12 3 5500
 106 2016-02-14 9 3005
 107 2016-02-14 1 20
 108 2016-02-29 2 2000
 109 2016-02-29 3 2500
 110 2016-02-27 1 200

1- INNER JOIN

Selecting records that have matching values in both tables:

select c.id, c.name, o.order_date, o.amount from customers c inner join orders o ON (c.id = o.customer_id);

Output

c.id c.name o.order_date o.amount
 7 Sam 2016-01-01 3540
 1 John 2016-03-01 240
 6 Zoya 2016-03-02 2340
 3 Mark 2016-02-12 5000
 3 Mark 2016-02-12 5500
 9 Peter 2016-02-14 3005
 1 John 2016-02-14 20
 2 Kevin 2016-02-29 2000
 3 Mark 2016-02-29 2500
 1 John 2016-02-27 200

We can some customers are not in the output like: George, Alex or Jenna, since there are not orders tagged to these customers.

2- LEFT JOIN (LEFT OUTER JOIN)

Returns all the values from the left table, plus the matched values from the right table, or NULL in case of no matching join predicate.

SELECT c.id, c.name, o.order_date, o.amount 
FROM customers c LEFT OUTER JOIN orders o 
ON (c.id = o.customer_id);

Output

c.id    c.name    o.order_date    o.amount
 1    John    2016-03-01    240
 1    John    2016-02-14    20
 1    John    2016-02-27    200
 2    Kevin    2016-02-29    2000
 19    Alex    NULL    NULL
 3    Mark    2016-02-12    5000
 3    Mark    2016-02-12    5500
 3    Mark    2016-02-29    2500
 4    Jenna    NULL    NULL
 5    Robert    NULL    NULL
 6    Zoya    2016-03-02    2340
 7    Sam    2016-01-01    3540
 8    George    NULL    NULL
 9    Peter    2016-02-14    3005

3- RIGHT JOIN (RIGHT OUTER JOIN)

Returns all the values from the right table, plus the matched values from the left table, or NULL in case of no matching join predicate

SELECT c.id, c.name, o.order_date, o.amount 
FROM customers c 
LEFT OUTER JOIN orders o ON (c.id = o.customer_id);

Output

 c.id    c.name    o.order_date    o.amount
 7    Sam    2016-01-01    3540
 1    John    2016-03-01    240
 6    Zoya    2016-03-02    2340
 3    Mark    2016-02-12    5000
 3    Mark    2016-02-12    5500
 9    Peter    2016-02-14    3005
 1    John    2016-02-14    20
 2    Kevin    2016-02-29    2000
 3    Mark    2016-02-29    2500
 1    John    2016-02-27    200

4- FULL JOIN (FULL OUTER JOIN)

Selects all records that match either left or right table records

SELECT c.id, c.name, o.order_date, o.amount 
FROM customers c 
FULL OUTER JOIN orders o ON (c.id = o.customer_id);

Output

c.id c.name o.order_date o.amount
 1 John 2016-02-27 200
 1 John 2016-02-14 20
 1 John 2016-03-01 240
 19 Alex NULL NULL
 2 Kevin 2016-02-29 2000
 3 Mark 2016-02-29 2500
 3 Mark 2016-02-12 5500
 3 Mark 2016-02-12 5000
 4 Jenna NULL NULL
 5 Robert NULL NULL
 6 Zoya 2016-03-02 2340
 7 Sam 2016-01-01 3540
 8 George NULL NULL
 9 Peter 2016-02-14 3005

5- LEFT SEMI JOIN

Find all the customers where at least one order exist or find all customer who has placed an order. But return only information related to the customer, not to the order itself.

select *  from customers  left semi join orders  ON 
(customers.id = orders.customer_id);

Output

customers.id    customers.name
1    John
2    Kevin
3    Mark
6    Zoya
7    Sam
9    Peter

*Sources