ApacheCalciteEnumerable - eellpp/pubScratchpad GitHub Wiki
The Enumerable Adapter in Apache Calcite serves a specific and important purpose: it provides a default implementation for executing queries in Calcite when no other adapter or data source is explicitly configured. It is a fallback mechanism that allows Calcite to execute queries in-memory using the Linq4j library, which is Calcite's implementation of LINQ (Language Integrated Query) for Java.
-
Fallback Execution Engine:
- When Calcite cannot push down a query to an external data source (e.g., a database or a custom adapter), it falls back to the Enumerable Adapter.
- The Enumerable Adapter executes the query in-memory using the Linq4j library.
-
In-Memory Query Execution:
- The Enumerable Adapter allows Calcite to execute queries on data that is already loaded into memory (e.g., Java collections or in-memory tables).
- This is useful for testing, prototyping, or when working with small datasets.
-
Reference Implementation:
- The Enumerable Adapter serves as a reference implementation for how to implement query execution in Calcite.
- It demonstrates how to translate Calcite's relational algebra into executable code using Linq4j.
-
Support for Advanced Features:
- The Enumerable Adapter supports advanced SQL features like joins, aggregations, and sorting, even when the underlying data source does not.
The Enumerable Adapter works by translating Calcite's relational algebra into Linq4j expressions, which are then executed in-memory. Here’s a high-level overview of the process:
-
Query Parsing and Optimization:
- Calcite parses the SQL query and optimizes it into a relational algebra plan.
-
Enumerable Convention:
- If the query cannot be pushed down to an external data source, Calcite converts the relational algebra plan into the Enumerable convention.
- The Enumerable convention represents the plan as a series of Linq4j expressions.
-
Code Generation:
- Calcite generates Java code (using Linq4j) to execute the query in-memory.
- This code is typically a series of nested loops and function calls that implement the relational operators (e.g., filter, project, join).
-
Execution:
- The generated code is executed on the data, producing the query results.
-
Testing and Prototyping:
- The Enumerable Adapter is useful for testing and prototyping SQL queries without needing to connect to an external data source.
-
Small Datasets:
- For small datasets that can fit into memory, the Enumerable Adapter provides a simple and efficient way to execute queries.
-
Custom Adapters:
- When building custom adapters, the Enumerable Adapter can be used as a fallback for operations that the adapter does not support natively.
-
Standalone Query Engine:
- The Enumerable Adapter allows Calcite to function as a standalone in-memory query engine, independent of external data sources.
Here’s an example of how the Enumerable Adapter works in practice:
import org.apache.calcite.DataContext;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.linq4j.Linq4j;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.schema.ScannableTable;
import org.apache.calcite.schema.impl.AbstractTable;
import java.util.Arrays;
import java.util.List;
public class InMemoryTable extends AbstractTable implements ScannableTable {
private final List<Object[]> data;
public InMemoryTable(List<Object[]> data) {
this.data = data;
}
@Override
public RelDataType getRowType(RelDataTypeFactory typeFactory) {
return typeFactory.builder()
.add("id", typeFactory.createSqlType(SqlTypeName.INTEGER))
.add("name", typeFactory.createSqlType(SqlTypeName.VARCHAR))
.add("age", typeFactory.createSqlType(SqlTypeName.INTEGER))
.build();
}
@Override
public Enumerable<Object[]> scan(DataContext root) {
return Linq4j.asEnumerable(data);
}
}
import org.apache.calcite.jdbc.CalciteConnection;
import org.apache.calcite.schema.SchemaPlus;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;
public class EnumerableAdapterExample {
public static void main(String[] args) throws Exception {
// Create an in-memory table
List<Object[]> data = Arrays.asList(
new Object[]{1, "Alice", 30},
new Object[]{2, "Bob", 25},
new Object[]{3, "Charlie", 35}
);
InMemoryTable table = new InMemoryTable(data);
// Register the table with Calcite
Connection connection = DriverManager.getConnection("jdbc:calcite:");
CalciteConnection calciteConnection = connection.unwrap(CalciteConnection.class);
SchemaPlus rootSchema = calciteConnection.getRootSchema();
rootSchema.add("MEMORY_TABLE", table);
// Execute a query
Statement statement = calciteConnection.createStatement();
ResultSet resultSet = statement.executeQuery("SELECT id, name, age FROM MEMORY_TABLE WHERE age > 30");
while (resultSet.next()) {
System.out.println(
resultSet.getInt("id") + ", " +
resultSet.getString("name") + ", " +
resultSet.getInt("age"));
}
// Clean up
resultSet.close();
statement.close();
connection.close();
}
}
3, Charlie, 35
-
Simplicity:
- Provides a straightforward way to execute queries in-memory.
-
Flexibility:
- Supports advanced SQL features without requiring external data sources.
-
Fallback Mechanism:
- Ensures that queries can always be executed, even if no adapter is available.
-
Performance:
- In-memory execution can be slow for large datasets.
-
Memory Usage:
- Requires all data to fit into memory, which may not be feasible for large datasets.
-
Limited Push-Down:
- Does not leverage the capabilities of external data sources (e.g., indexes, distributed processing).
The Enumerable Adapter is a key component of Calcite that provides a fallback mechanism for query execution. It is particularly useful for testing, prototyping, and working with small datasets. However, for production use cases with large datasets, it is recommended to use adapters that can push down queries to external data sources for better performance and scalability.
Exposes the enumerator, which supports a simple iteration over a collection
public interface Enumerable extends RawEnumerable, Iterable, ExtendedEnumerable
-
Enumerable<T>
is Calcite's equivalent of a relational iterator, inspired by .NET's LINQ Enumerable. - It allows row-by-row or batch execution of relational queries.
- It is fully in-memory, meaning it operates on Java collections instead of a database.
Enumerable<Object[]> result = EnumerableInterpretable.toEnumerable(optimizedPlan);
for (Object[] row : result) {
System.out.println(Arrays.toString(row));
}
- Here,
result
is anEnumerable
that executes the query in memory, without a database. - Each row is represented as an
Object[]
, just like a SQL row.
Calcite translates SQL queries into Java-based Enumerable
expressions, which are then executed dynamically.
1️⃣ SQL Parsing & Logical Plan Generation
- SQL query is converted into a RelNode (Relational Algebra Expression).
- No actual data is accessed yet.
2️⃣ Optimization
- Logical optimizations (e.g., predicate pushdown, projections).
- Conversion to a physical plan using Enumerable operators.
3️⃣ Execution Using EnumerableRel
(Calcite’s Physical Execution Plan)
- The final execution plan consists of Enumerable-based operations (e.g., filtering, sorting, aggregations).
- These operations are executed in-memory using Java.
Unlike traditional databases that execute queries using a disk-based engine, Calcite:
-
Executes directly in memory using Java’s
Enumerable
API. - Processes data using function-style operations (like Java Streams).
-
Enumerable<Object[]>
fetches rows lazily. - This prevents loading the entire dataset into memory at once.
- You don’t need a database engine—Calcite executes queries in-memory.
- Works well for structured data in JSON, CSV, or programmatically defined tables.
Consider this SQL query:
SELECT name, salary FROM employees WHERE salary > 50000;
Calcite first parses and optimizes it into a relational algebra expression (RelNode
):
LogicalProject(name, salary)
LogicalFilter(condition=[salary > 50000])
LogicalTableScan(table=[employees])
No data is accessed yet—Calcite is still building the execution plan.
Calcite converts it into an EnumerableRel
physical plan, which uses Java iterators:
EnumerableProject(name, salary)
EnumerableFilter(condition=[salary > 50000])
EnumerableTableScan(table=[employees])
Here, EnumerableFilter
and EnumerableTableScan
process rows in-memory.
Enumerable<Object[]> result = EnumerableInterpretable.toEnumerable(optimizedPlan);
for (Object[] row : result) {
System.out.println(Arrays.toString(row));
}
This code lazily fetches rows and applies the WHERE
and SELECT
filters dynamically.
Feature | Calcite Enumerable Runtime | Traditional Database (PostgreSQL, MySQL) |
---|---|---|
Execution Model | Java-based in-memory execution | Disk-based execution engine |
Optimization | Rule-based optimization (RBO) | Cost-based optimization (CBO) |
Data Access | Operates on Java collections, in-memory tables | Reads from disk-based tables |
Laziness | Fetches rows lazily (Enumerable.next() ) |
Uses disk buffers, indexes, or materialized execution |
Use Case | Query execution on CSV, JSON, in-memory collections | Traditional relational databases |
Best for:
- In-memory analytics on structured data (CSV, JSON, Parquet, Java collections).
- Embedding SQL-like query capabilities inside Java applications.
- Querying without a database (Calcite acts as a SQL engine).
Not ideal for:
- Very large datasets that don’t fit in memory (because
Enumerable
is in-memory). - Transactional workloads (Calcite is not an OLTP database).
✅ Calcite's runtime is based on Enumerable
, meaning it executes queries in-memory using Java.
✅ It allows SQL-like operations without a database, using function-style execution.
✅ This makes it ideal for embedded analytics, querying in-memory structures, and working with non-database sources.
✅ It does not store data—it only processes structured data when needed.