18.1.1 IN vs EXISTS - mergen-sergio/DBest GitHub Wiki
The first example retrieves movies that have at least one cast member. The query below uses EXISTS
, where the subquery checks if a matching movie_id
exists in the movie_cast
table. If a match is found, the movie is included in the result set.
SELECT * FROM movie m
WHERE EXISTS (
SELECT 1 FROM movie_cast mc
WHERE m.movie_id = mc.movie_id
);
Since the correlated condition uses equality, the query can also be expressed with IN
, as shown below. The movie is considered relevant if its movie_id
appears in a list of movie_ids
associated with cast members.
SELECT * FROM movie
WHERE movie_id IN (
SELECT movie_id FROM movie_cast
);
We present two primary ways to represent this query in DBest:
-
Without Materialization (left approach): The execution is fully pipelined, with the Nested Loop Semi Join directly searching for matching
movie_id
s in themovie_cast
table. -
With Materialization (right approach): The Hash operator materializes
movie_cast
tuples in a hash table, enabling the Nested Loop Semi Join to perform efficient lookups.- Alternatively, a Hash Left Semi Join can replace the Nested Loop Semi Join + Hash combination.

Materialization is preferable when there are many lookups since it avoids repeated index accesses. Pipelined execution is better for a small number of lookups, as it eliminates materialization overhead.
Historically, SQL optimizers used materialization for IN
subqueries and pipelining for EXISTS
subqueries. However, modern optimizers can dynamically select the best execution plan, regardless of the SQL syntax used.
The following SQL query retrieves movies released in the same year as Casablanca. Although it uses an IN
subquery, an equivalent version with EXISTS
is also possible.
SELECT * FROM movie
WHERE release_year IN (
SELECT release_year FROM movie
WHERE title = 'Casablanca'
);
One way to represent this query in DBest is by materializing the movie_cast
tuples. Since the number of movies titled Casablanca is expected to be small, the cost of materialization remains low.
Two different approaches to materialization are discussed:
-
Materialized Operator (Left Approach):
- Stores Casablanca movies in memory.
- Requires a full scan over the materialized tuples for each lookup, as indexed search is not supported.
-
Hash Operator (Right Approach):
- Stores Casablanca movies in a hash table for quick lookups.
- Given a movie, the hash table is queried directly to find matching tuples.
Since the number of materialized tuples is small, both strategies should perform similarly.

The next query retrieves movies released before Casablanca. Unlike previous examples, this cannot be expressed using IN
because the condition is a range comparison, not an equality check. However, EXISTS
can handle it. This example demonstrates that EXISTS
has greater expressive power than IN
.
SELECT * FROM movie2 m1
WHERE EXISTS (
SELECT 1 FROM movie2 m2
WHERE m2.title = 'Casablanca'
AND m1.release_year < m2.release_year
);
There are multiple ways to represent this query in DBest, one of which involves materialization:
- Materialize Operation: Stores Casablanca movies in memory for efficient access.
-
Filter Operator: Evaluates whether each movie's
release_year
meets the specified condition. - Nested Loop Semi Join: Executes without an explicit condition, as the range check is performed by the internal filter.

Since only a small number of tuples are materialized, the filtering step remains efficient. Interestingly, the materialized approach is applied to subqueries expressed with EXISTS
. While IN
subqueries are traditionally associated with materialization, this example demonstrates that materialization can also be beneficial for other types of subqueries.