ApacheCalciteEnumerable - eellpp/pubScratchpad GitHub Wiki

Enumerable Adapter

The Enumerable Adapter in Apache Calcite serves a specific and important purpose: it provides a default implementation for executing queries in Calcite when no other adapter or data source is explicitly configured. It is a fallback mechanism that allows Calcite to execute queries in-memory using the Linq4j library, which is Calcite's implementation of LINQ (Language Integrated Query) for Java.

Purpose of the Enumerable Adapter

  1. Fallback Execution Engine:

    • When Calcite cannot push down a query to an external data source (e.g., a database or a custom adapter), it falls back to the Enumerable Adapter.
    • The Enumerable Adapter executes the query in-memory using the Linq4j library.
  2. In-Memory Query Execution:

    • The Enumerable Adapter allows Calcite to execute queries on data that is already loaded into memory (e.g., Java collections or in-memory tables).
    • This is useful for testing, prototyping, or when working with small datasets.
  3. Reference Implementation:

    • The Enumerable Adapter serves as a reference implementation for how to implement query execution in Calcite.
    • It demonstrates how to translate Calcite's relational algebra into executable code using Linq4j.
  4. Support for Advanced Features:

    • The Enumerable Adapter supports advanced SQL features like joins, aggregations, and sorting, even when the underlying data source does not.

How the Enumerable Adapter Works

The Enumerable Adapter works by translating Calcite's relational algebra into Linq4j expressions, which are then executed in-memory. Here’s a high-level overview of the process:

  1. Query Parsing and Optimization:

    • Calcite parses the SQL query and optimizes it into a relational algebra plan.
  2. Enumerable Convention:

    • If the query cannot be pushed down to an external data source, Calcite converts the relational algebra plan into the Enumerable convention.
    • The Enumerable convention represents the plan as a series of Linq4j expressions.
  3. Code Generation:

    • Calcite generates Java code (using Linq4j) to execute the query in-memory.
    • This code is typically a series of nested loops and function calls that implement the relational operators (e.g., filter, project, join).
  4. Execution:

    • The generated code is executed on the data, producing the query results.

Use Cases for the Enumerable Adapter

  1. Testing and Prototyping:

    • The Enumerable Adapter is useful for testing and prototyping SQL queries without needing to connect to an external data source.
  2. Small Datasets:

    • For small datasets that can fit into memory, the Enumerable Adapter provides a simple and efficient way to execute queries.
  3. Custom Adapters:

    • When building custom adapters, the Enumerable Adapter can be used as a fallback for operations that the adapter does not support natively.
  4. Standalone Query Engine:

    • The Enumerable Adapter allows Calcite to function as a standalone in-memory query engine, independent of external data sources.

Example: Enumerable Adapter in Action

Here’s an example of how the Enumerable Adapter works in practice:

1. Define an In-Memory Table
import org.apache.calcite.DataContext;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.linq4j.Linq4j;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.schema.ScannableTable;
import org.apache.calcite.schema.impl.AbstractTable;

import java.util.Arrays;
import java.util.List;

public class InMemoryTable extends AbstractTable implements ScannableTable {
    private final List<Object[]> data;

    public InMemoryTable(List<Object[]> data) {
        this.data = data;
    }

    @Override
    public RelDataType getRowType(RelDataTypeFactory typeFactory) {
        return typeFactory.builder()
                .add("id", typeFactory.createSqlType(SqlTypeName.INTEGER))
                .add("name", typeFactory.createSqlType(SqlTypeName.VARCHAR))
                .add("age", typeFactory.createSqlType(SqlTypeName.INTEGER))
                .build();
    }

    @Override
    public Enumerable<Object[]> scan(DataContext root) {
        return Linq4j.asEnumerable(data);
    }
}
2. Register the Table and Execute a Query
import org.apache.calcite.jdbc.CalciteConnection;
import org.apache.calcite.schema.SchemaPlus;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

public class EnumerableAdapterExample {
    public static void main(String[] args) throws Exception {
        // Create an in-memory table
        List<Object[]> data = Arrays.asList(
                new Object[]{1, "Alice", 30},
                new Object[]{2, "Bob", 25},
                new Object[]{3, "Charlie", 35}
        );
        InMemoryTable table = new InMemoryTable(data);

        // Register the table with Calcite
        Connection connection = DriverManager.getConnection("jdbc:calcite:");
        CalciteConnection calciteConnection = connection.unwrap(CalciteConnection.class);
        SchemaPlus rootSchema = calciteConnection.getRootSchema();
        rootSchema.add("MEMORY_TABLE", table);

        // Execute a query
        Statement statement = calciteConnection.createStatement();
        ResultSet resultSet = statement.executeQuery("SELECT id, name, age FROM MEMORY_TABLE WHERE age > 30");
        while (resultSet.next()) {
            System.out.println(
                    resultSet.getInt("id") + ", " +
                            resultSet.getString("name") + ", " +
                            resultSet.getInt("age"));
        }

        // Clean up
        resultSet.close();
        statement.close();
        connection.close();
    }
}
Output
3, Charlie, 35

Advantages of the Enumerable Adapter

  1. Simplicity:
    • Provides a straightforward way to execute queries in-memory.
  2. Flexibility:
    • Supports advanced SQL features without requiring external data sources.
  3. Fallback Mechanism:
    • Ensures that queries can always be executed, even if no adapter is available.

Disadvantages of the Enumerable Adapter

  1. Performance:
    • In-memory execution can be slow for large datasets.
  2. Memory Usage:
    • Requires all data to fit into memory, which may not be feasible for large datasets.
  3. Limited Push-Down:
    • Does not leverage the capabilities of external data sources (e.g., indexes, distributed processing).

Conclusion

The Enumerable Adapter is a key component of Calcite that provides a fallback mechanism for query execution. It is particularly useful for testing, prototyping, and working with small datasets. However, for production use cases with large datasets, it is recommended to use adapters that can push down queries to external data sources for better performance and scalability.


Enumerable Interface

Enumerable Interface

Exposes the enumerator, which supports a simple iteration over a collection

public interface Enumerable extends RawEnumerable, Iterable, ExtendedEnumerable

  • Enumerable<T> is Calcite's equivalent of a relational iterator, inspired by .NET's LINQ Enumerable.
  • It allows row-by-row or batch execution of relational queries.
  • It is fully in-memory, meaning it operates on Java collections instead of a database.

Example: Query Execution in Enumerable

Enumerable<Object[]> result = EnumerableInterpretable.toEnumerable(optimizedPlan);

for (Object[] row : result) {
    System.out.println(Arrays.toString(row));
}
  • Here, result is an Enumerable that executes the query in memory, without a database.
  • Each row is represented as an Object[], just like a SQL row.

How Does the Enumerable Runtime Work?

Calcite translates SQL queries into Java-based Enumerable expressions, which are then executed dynamically.

Execution Flow

1️⃣ SQL Parsing & Logical Plan Generation

  • SQL query is converted into a RelNode (Relational Algebra Expression).
  • No actual data is accessed yet.

2️⃣ Optimization

  • Logical optimizations (e.g., predicate pushdown, projections).
  • Conversion to a physical plan using Enumerable operators.

3️⃣ Execution Using EnumerableRel (Calcite’s Physical Execution Plan)

  • The final execution plan consists of Enumerable-based operations (e.g., filtering, sorting, aggregations).
  • These operations are executed in-memory using Java.

What Does This Mean in Practice?

1. Queries Run in Java, Without a Database

Unlike traditional databases that execute queries using a disk-based engine, Calcite:

  • Executes directly in memory using Java’s Enumerable API.
  • Processes data using function-style operations (like Java Streams).

2. Query Execution is Lazy (Row-by-Row or Batch)

  • Enumerable<Object[]> fetches rows lazily.
  • This prevents loading the entire dataset into memory at once.

3. Supports Efficient Query Execution Without JDBC

  • You don’t need a database engine—Calcite executes queries in-memory.
  • Works well for structured data in JSON, CSV, or programmatically defined tables.

** Example: SQL Query Execution Using EnumerableRel**

Consider this SQL query:

SELECT name, salary FROM employees WHERE salary > 50000;

Step 1: Convert SQL to RelNode

Calcite first parses and optimizes it into a relational algebra expression (RelNode):

LogicalProject(name, salary)
  LogicalFilter(condition=[salary > 50000])
    LogicalTableScan(table=[employees])

No data is accessed yet—Calcite is still building the execution plan.

Step 2: Convert RelNode to EnumerableRel

Calcite converts it into an EnumerableRel physical plan, which uses Java iterators:

EnumerableProject(name, salary)
  EnumerableFilter(condition=[salary > 50000])
    EnumerableTableScan(table=[employees])

Here, EnumerableFilter and EnumerableTableScan process rows in-memory.

Step 3: Execute the Query Using Enumerable

Enumerable<Object[]> result = EnumerableInterpretable.toEnumerable(optimizedPlan);

for (Object[] row : result) {
    System.out.println(Arrays.toString(row));
}

This code lazily fetches rows and applies the WHERE and SELECT filters dynamically.


How is This Different from a Traditional Database Runtime?

Feature Calcite Enumerable Runtime Traditional Database (PostgreSQL, MySQL)
Execution Model Java-based in-memory execution Disk-based execution engine
Optimization Rule-based optimization (RBO) Cost-based optimization (CBO)
Data Access Operates on Java collections, in-memory tables Reads from disk-based tables
Laziness Fetches rows lazily (Enumerable.next()) Uses disk buffers, indexes, or materialized execution
Use Case Query execution on CSV, JSON, in-memory collections Traditional relational databases

When Should You Use Calcite’s Enumerable Runtime?

Best for:

  • In-memory analytics on structured data (CSV, JSON, Parquet, Java collections).
  • Embedding SQL-like query capabilities inside Java applications.
  • Querying without a database (Calcite acts as a SQL engine).

Not ideal for:

  • Very large datasets that don’t fit in memory (because Enumerable is in-memory).
  • Transactional workloads (Calcite is not an OLTP database).

Summary

Calcite's runtime is based on Enumerable, meaning it executes queries in-memory using Java.
It allows SQL-like operations without a database, using function-style execution.
This makes it ideal for embedded analytics, querying in-memory structures, and working with non-database sources.
It does not store data—it only processes structured data when needed.

⚠️ **GitHub.com Fallback** ⚠️