ApacheCalcite contains many of the pieces that comprise a typical database management system, except :

no storage of data,
no algorithms to process data,
no repository for storing metadata.

Tutorial : https://calcite.apache.org/docs/tutorial.html

Calcite can handle various data formats.

in memory
JDBC
custom adapters can be added for other data formats. An adapter creates tables from data sources

Calcite provides optimisers for better query plan.
You can write optimizer rules allow you to register new operators (such as a better join algorithm), and allow Calcite to optimize how queries are translated to operators. Calcite will combine your rules and operators with built-in rules and operators, apply cost-based optimization, and generate an efficient plan.

Properties

Properties info = new Properties();
info.setProperty("lex", "JAVA");

This line sets a property in the Properties object. The key is "lex", and the value is "JAVA".

In the context of Apache Calcite, the lex property is used to specify the lexical policy, which determines how identifiers (like table names, column names, etc.) are quoted and interpreted. Setting lex to "JAVA" means that Calcite will use Java-like rules for parsing identifiers (e.g., case-sensitive identifiers).

In Apache Calcite, the Properties object is used to configure various aspects of the SQL parser and other components. The lex property is one of the most commonly used properties, but there are several other properties that can be set to customize the behavior of Calcite. Here are some of the common properties:

caseSensitive:
- Determines whether identifiers are case-sensitive.
- Values: true or false.
- Example: info.setProperty("caseSensitive", "true");
quoting:
- Specifies the quoting style for identifiers.
- Values: DOUBLE_QUOTE, BACK_QUOTE, BRACKET, etc.
- Example: info.setProperty("quoting", "DOUBLE_QUOTE");
unquotedCasing:
- Determines how unquoted identifiers are treated in terms of casing.
- Values: TO_UPPER, TO_LOWER, UNCHANGED.
- Example: info.setProperty("unquotedCasing", "TO_UPPER");
quotedCasing:
- Determines how quoted identifiers are treated in terms of casing.
- Values: TO_UPPER, TO_LOWER, UNCHANGED.
- Example: info.setProperty("quotedCasing", "UNCHANGED");
conformance:
- Specifies the SQL conformance level (e.g., SQL standard compliance).
- Values: DEFAULT, STRICT_92, STRICT_99, PRAGMATIC_99, ORACLE_10, SQL_SERVER_2008, etc.
- Example: info.setProperty("conformance", "STRICT_99");
parserFactory:
- Specifies the factory class for creating the SQL parser.
- Values: Fully qualified class name of the parser factory.
- Example: info.setProperty("parserFactory", "org.apache.calcite.sql.parser.impl.SqlParserImplFactory");
fun:
- Specifies the initial catalog to use.
- Values: Catalog name.
- Example: info.setProperty("fun", "my_catalog");
model:
- Specifies the path to the model file (JSON or YAML) that defines the schema.
- Values: Path to the model file.
- Example: info.setProperty("model", "path/to/model.json");
typeSystem:
- Specifies the type system to use.
- Values: Fully qualified class name of the type system.
- Example: info.setProperty("typeSystem", "org.apache.calcite.rel.type.DelegatingTypeSystem");
timeZone:
- Specifies the time zone to use for date/time operations.
- Values: Time zone ID (e.g., "UTC", "America/New_York").
- Example: info.setProperty("timeZone", "UTC");
locale:
- Specifies the locale to use for string comparisons and other locale-sensitive operations.
- Values: Locale string (e.g., "en_US", "fr_FR").
- Example: info.setProperty("locale", "en_US");
defaultNullCollation:
- Specifies how NULL values are sorted in ORDER BY clauses.
- Values: FIRST, LAST, HIGH, LOW.
- Example: info.setProperty("defaultNullCollation", "LAST");
forceDecorrelate:
- Forces decorrelation of subqueries.
- Values: true or false.
- Example: info.setProperty("forceDecorrelate", "true");
materializationsEnabled:
- Enables or disables the use of materialized views.
- Values: true or false.
- Example: info.setProperty("materializationsEnabled", "true");
createMaterializations:
- Allows the creation of materialized views.
- Values: true or false.
- Example: info.setProperty("createMaterializations", "true");

These properties can be set in the Properties object passed to Calcite's configuration methods, such as when creating a SqlParser or FrameworkConfig. The exact set of properties you need to set will depend on your specific use case and requirements.

Adapters

Apache Calcite provides ready-made adapters to integrate with third-party data sources, including Cassandra, Elasticsearch, MongoDB, to name a few

The Apache Calcite adapters provide classes, such as ElasticsearchSchemaFactory, MongoSchemaFactory, FileSchemaFactory, implementing the interface SchemaFactory. The SchemaFactory helps connect the underlying data sources in a unified manner by creating a virtual Schema defined in a JSON/YAML model file.

Custom Adapters

Calcite allows you to create custom table adapters to read data from various sources. You can implement a custom adapter that reads CSV files from S3 using the AWS SDK for Java.

Steps: Use the AWS SDK for Java (aws-java-sdk-s3) to interact with S3 and read the CSV file. Implement a TableFactory or Table interface in Calcite to expose the CSV data as a table. Parse the CSV file and convert it into Calcite's relational data model (e.g., Enumerable or ScannableTable).

Relational data model

Apache Calcite's relational data model is at the core of its functionality, enabling it to process SQL queries and perform optimizations. The relational data model in Calcite is based on the principles of relational algebra, which provides a mathematical foundation for representing and manipulating data in tables (relations). Here's a detailed explanation of Calcite's relational data model:

1. Core Concepts

Calcite's relational data model revolves around the following key concepts:

a. Relational Algebra

Relational algebra is a formal system for manipulating relations (tables). It consists of operations like:
- Selection (σ): Filter rows based on a condition.
- Projection (π): Select specific columns.
- Join (⨝): Combine rows from two tables based on a condition.
- Union (∪), Intersection (∩), Difference (-): Set operations.
- Aggregation: Compute summaries (e.g., SUM, COUNT).
Calcite translates SQL queries into relational algebra expressions, which are then optimized and executed.

b. Relational Operators

Calcite represents relational algebra operations as relational operators (e.g., Filter, Project, Join, Aggregate).
These operators form a tree-like structure called a relational expression tree, which represents the logical plan of a query.

c. Relational Data Types

Calcite supports standard SQL data types, such as:
- Scalar types: INTEGER, VARCHAR, BOOLEAN, DATE, TIMESTAMP, etc.
- Composite types: ARRAY, MAP, ROW.
These types are defined using the RelDataType interface.

2. Key Components of the Relational Data Model

a. Table (Relation)

A table is a collection of rows and columns, where each row represents a record and each column represents an attribute.
In Calcite, a table is represented by the Table interface.
Tables can be:
- Physical tables (e.g., from a database or file).
- Virtual tables (e.g., views or derived tables).

b. Schema

A schema is a container for tables, views, and other database objects.
In Calcite, a schema is represented by the Schema interface.
Schemas can be nested, allowing for hierarchical organization of tables.

c. Relational Expression (RelNode)

A relational expression (RelNode) represents an operation in relational algebra.
Examples:
- TableScan: Reads data from a table.
- Filter: Applies a condition to filter rows.
- Project: Selects specific columns.
- Join: Combines rows from two tables.
- Aggregate: Computes summaries (e.g., SUM, COUNT).
These expressions form a tree structure, representing the logical plan of a query.

d. Relational Data Type (RelDataType)

Defines the structure of a table or expression, including column names and types.

Example: setting the datatype for a row in table

RelDataTypeFactory typeFactory = new SqlTypeFactoryImpl(RelDataTypeSystem.DEFAULT);
RelDataType rowType = typeFactory.builder()
    .add("id", SqlTypeName.INTEGER)
    .add("name", SqlTypeName.VARCHAR)
    .add("age", SqlTypeName.INTEGER)
    .build();

e. Relational Trait (Trait)

Traits represent physical properties of relational expressions, such as:
- Convention: The calling convention (e.g., Enumerable, Jdbc).
- Collation: The sort order of rows.
- Distribution: How data is distributed across nodes (e.g., in a distributed system).
Traits are used during query optimization to ensure the plan meets certain physical requirements.

3. Query Processing in Calcite

a. Logical Plan

When a SQL query is parsed, Calcite converts it into a logical plan (a tree of RelNode expressions).

Example:

SELECT name, age FROM users WHERE age > 30;

Logical plan:

Project(name, age)
  Filter(age > 30)
    TableScan(users)

b. Optimization

Calcite applies a series of optimization rules to the logical plan to improve performance.
Examples of optimizations:
- Predicate pushdown: Push filters closer to the data source.
- Join reordering: Reorder joins to minimize intermediate results.
- Projection pruning: Remove unnecessary columns.

c. Physical Plan

After optimization, the logical plan is converted into a physical plan by applying traits (e.g., Enumerable, Jdbc).
The physical plan is executable and can be translated into code (e.g., Java, SQL).

d. Execution

The physical plan is executed to produce the query results.
Calcite supports multiple execution engines, including:
- Enumerable: Executes plans in memory using Java iterators.
- Jdbc: Executes plans using JDBC connections to databases.
- Custom adapters: Execute plans using custom data sources (e.g., CSV, S3).

4. Example: Relational Data Model in Action

a. Define a Schema

SchemaPlus rootSchema = Frameworks.createRootSchema(true);
rootSchema.add("users", new AbstractTable() {
    @Override
    public RelDataType getRowType(RelDataTypeFactory typeFactory) {
        return typeFactory.builder()
            .add("id", SqlTypeName.INTEGER)
            .add("name", SqlTypeName.VARCHAR)
            .add("age", SqlTypeName.INTEGER)
            .build();
    }
});

b. Parse and Execute a Query

FrameworkConfig config = Frameworks.newConfigBuilder()
    .defaultSchema(rootSchema)
    .build();
Planner planner = Frameworks.getPlanner(config);
SqlNode sqlNode = planner.parse("SELECT name, age FROM users WHERE age > 30");
SqlNode validatedNode = planner.validate(sqlNode);
RelNode logicalPlan = planner.rel(validatedNode).project();

c. Optimize and Execute

RelOptPlanner optPlanner = logicalPlan.getCluster().getPlanner();
RelNode optimizedPlan = optPlanner.findBestExp();
Enumerable<Object[]> result = EnumerableInterpretable.toEnumerable(optimizedPlan);
for (Object[] row : result) {
    System.out.println(Arrays.toString(row));
}

The line:

Enumerable<Object[]> result = EnumerableInterpretable.toEnumerable(optimizedPlan);

is performing the final step of query execution in Apache Calcite, where a relational algebra plan (optimizedPlan) is being converted into an executable enumerable result set.

Breakdown of What's Happening:

optimizedPlan is a RelNode (Relational Expression)
- Before this step, a SQL query is parsed, validated, and converted into a relational expression (RelNode).
- The RelNode (optimizedPlan) has already been optimized by Calcite’s query optimizer.
EnumerableInterpretable.toEnumerable(...) Interprets the Relational Plan
- EnumerableInterpretable is a Calcite utility that converts a logical relational expression (RelNode) into an executable Enumerable<Object[]>.
- This means Calcite is interpreting the query plan rather than compiling it into a more optimized execution plan.
Enumerable<Object[]> is the Final Executable Result Set
- Enumerable (from Calcite’s LINQ-like API) represents an iterable sequence of results.
- Each row in the result set is represented as an Object[] (an array of objects), where:
  - Each element in the Object[] array corresponds to a column value in that row.

Equivalent Execution Flow:

If we compare this to a traditional database:

SQL Query → Logical Plan (RelNode) → Optimized Plan (RelNode) → Execution Plan (Enumerable<Object[]>) → Results.

Example:

Let's say we have a table employees with columns (id, name, salary). If we execute:

SELECT id, name FROM employees;

The following happens in Calcite:

SQL is parsed → Converted into a SqlNode.
SQL is validated → Checked against schema definitions.
SQL is converted into a relational plan (RelNode).
Optimization is applied (logical to physical plan transformation).
Execution happens via EnumerableInterpretable.toEnumerable(optimizedPlan), returning:

Enumerable<Object[]> result = EnumerableInterpretable.toEnumerable(optimizedPlan);

The resulting Enumerable<Object[]> could produce rows like:

[ [1, "Alice"], [2, "Bob"], [3, "Charlie"] ]

When to Use `EnumerableInterpretable.toEnumerable(...)`?

When you want to execute an optimized relational algebra plan without compiling it into Java bytecode.
Useful in scenarios like interactive query execution where an immediate result is needed without JIT (Just-In-Time) compilation overhead.

Alternative:

For better performance, Calcite can compile queries into bytecode instead of interpreting them, using EnumerableCompiler. But for many use cases, EnumerableInterpretable is sufficient.

5. Advantages of Calcite's Relational Data Model

Flexibility: Can represent data from diverse sources (e.g., databases, files, streams).
Extensibility: Supports custom operators, data types, and optimizations.
Optimization: Provides a rich set of optimization rules for efficient query execution.
Interoperability: Integrates with multiple execution engines and data sources.

Lazy Data Fetch

Calcite accesses physical data only at execution time, specifically when iterating over the Enumerable<Object[]>. This means that data access happens lazily, not during query compilation or optimization.

Step-by-Step Breakdown

SQL Parsing, Validation, and Conversion to Relational Plan (RelNode)
- No data access occurs at this stage.
- The SQL is transformed into an abstract relational algebra representation.
Optimization (Logical & Physical Plan Generation)
- Still no actual data access.
- The optimizer applies rule-based and cost-based transformations.
Execution: Generating Enumerable<Object[]>
- If using EnumerableInterpretable.toEnumerable(), execution is still lazy.
- The query is converted into an iterable execution plan but doesn’t fetch data yet.
Accessing Data During Enumeration (for-each loop)
- Physical data is accessed when iteration starts:
```
for (Object[] row : compiledQuery) { 
    System.out.println(Arrays.toString(row)); 
}
```
- This triggers the underlying data access layer (e.g., JDBC, file system, in-memory storage).
- Data is read row-by-row, processed, and returned.

When Exactly Does Calcite Access Data?

✅ Only when next() is called on the Enumerable<Object[]> iterator.

This means data is retrieved on demand, not all at once.

Example: Lazy Data Fetching

If you have a large dataset, Calcite does not load all records into memory. Instead, it processes them as needed:

Enumerable<Object[]> result = EnumerableInterpretable.toEnumerable(optimizedPlan);
Iterator<Object[]> iterator = result.iterator();

while (iterator.hasNext()) {  // Data is accessed lazily here
    Object[] row = iterator.next();
    System.out.println(Arrays.toString(row));
}

Key Takeaways

🔹 Data is not accessed at query compilation time.
🔹 Data access begins when you iterate over Enumerable<Object[]>.
🔹 Execution is lazy, fetching only the required rows on demand.
🔹 This minimizes memory usage for large datasets.

ApacheCalcite - eellpp/pubScratchpad GitHub Wiki

Properties

Adapters

Custom Adapters

Relational data model

1. Core Concepts

a. Relational Algebra

b. Relational Operators

c. Relational Data Types

2. Key Components of the Relational Data Model

a. Table (Relation)

b. Schema

c. Relational Expression (RelNode)

d. Relational Data Type (RelDataType)

e. Relational Trait (Trait)

3. Query Processing in Calcite

a. Logical Plan

b. Optimization

c. Physical Plan

d. Execution

4. Example: Relational Data Model in Action

a. Define a Schema

b. Parse and Execute a Query

c. Optimize and Execute

Breakdown of What's Happening:

Equivalent Execution Flow:

Example:

When to Use `EnumerableInterpretable.toEnumerable(...)`?

Alternative:

5. Advantages of Calcite's Relational Data Model

Lazy Data Fetch

Step-by-Step Breakdown

When Exactly Does Calcite Access Data?

Example: Lazy Data Fetching

Key Takeaways

⚠️ GitHub.com Fallback ⚠️

ApacheCalcite - eellpp/pubScratchpad GitHub Wiki

Properties

Adapters

Custom Adapters

Relational data model

1. Core Concepts

a. Relational Algebra

b. Relational Operators

c. Relational Data Types

2. Key Components of the Relational Data Model

a. Table (Relation)

b. Schema

c. Relational Expression (RelNode)

d. Relational Data Type (RelDataType)

e. Relational Trait (Trait)

3. Query Processing in Calcite

a. Logical Plan

b. Optimization

c. Physical Plan

d. Execution

4. Example: Relational Data Model in Action

a. Define a Schema

b. Parse and Execute a Query

c. Optimize and Execute

Breakdown of What's Happening:

Equivalent Execution Flow:

Example:

When to Use EnumerableInterpretable.toEnumerable(...)?

Alternative:

5. Advantages of Calcite's Relational Data Model

Lazy Data Fetch

Step-by-Step Breakdown

When Exactly Does Calcite Access Data?

Example: Lazy Data Fetching

Key Takeaways

⚠️ **GitHub.com Fallback** ⚠️

When to Use `EnumerableInterpretable.toEnumerable(...)`?

⚠️ GitHub.com Fallback ⚠️