ApacheCalciteDataTypes - eellpp/pubScratchpad GitHub Wiki

Calcite, also known as Apache Calcite, is a dynamic data management framework that provides SQL parsing, query optimization, and execution capabilities. It is widely used in big data systems like Apache Hive, Apache Flink, and Apache Beam. Calcite supports a variety of data types that are commonly used in SQL and relational databases. These data types are categorized into scalar types, complex types, and other specialized types.

Here’s a breakdown of the different data types supported by Calcite:

1. Scalar Data Types

Scalar data types represent single values and are the most commonly used types in SQL queries.

Data Type	Description
`BOOLEAN`	Represents true/false values.
`TINYINT`	A 1-byte signed integer (range: -128 to 127).
`SMALLINT`	A 2-byte signed integer (range: -32,768 to 32,767).
`INTEGER`	A 4-byte signed integer (range: -2,147,483,648 to 2,147,483,647).
`BIGINT`	An 8-byte signed integer (range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807).
`DECIMAL(p, s)`	A fixed-precision decimal number with `p` total digits and `s` decimal places.
`FLOAT`	A 4-byte single-precision floating-point number.
`REAL`	Synonym for `FLOAT`.
`DOUBLE`	An 8-byte double-precision floating-point number.
`CHAR(n)`	A fixed-length character string with `n` characters.
`VARCHAR(n)`	A variable-length character string with a maximum length of `n` characters.
`DATE`	Represents a date (year, month, day).
`TIME`	Represents a time of day (hour, minute, second).
`TIMESTAMP`	Represents a date and time (year, month, day, hour, minute, second).
`INTERVAL`	Represents a span of time (e.g., `INTERVAL '1' DAY`).
`BINARY(n)`	A fixed-length binary string with `n` bytes.
`VARBINARY(n)`	A variable-length binary string with a maximum length of `n` bytes.

2. Complex Data Types

Complex data types represent structured or nested data, which are common in modern big data systems.

Data Type	Description
`ARRAY`	An ordered collection of elements of the same type (e.g., `ARRAY<INT>`).
`MAP`	A collection of key-value pairs (e.g., `MAP<STRING, INT>`).
`ROW`	A structured type with named fields (e.g., `ROW<name STRING, age INT>`).
`MULTISET`	An unordered collection of elements that allows duplicates.

3. Other Specialized Data Types

These are less common but useful in specific scenarios.

Data Type	Description
`ANY`	A generic type that can represent any data type.
`NULL`	Represents a null value (unknown or missing data).
`SYMBOL`	Used internally by Calcite for optimization and planning.
`CURSOR`	Represents a result set returned by a query (used in stored procedures).

4. User-Defined Types (UDTs)

Calcite also supports user-defined types, which allow users to define custom data types based on their specific requirements. These types can be scalar or complex.

Example Usage in SQL

Here’s an example of how these data types might be used in a SQL query:

CREATE TABLE employees (
    id BIGINT,
    name VARCHAR(100),
    salary DECIMAL(10, 2),
    hire_date DATE,
    skills ARRAY<VARCHAR(50)>,
    contact_info ROW<email VARCHAR(100), phone VARCHAR(20)>
);

Notes

Calcite’s type system is designed to be extensible, so it can integrate with external systems that have their own custom data types.
The availability of certain data types may depend on the underlying data source (e.g., a database or file system) being queried through Calcite.

In summary, Calcite supports a rich set of data types, ranging from simple scalar types to complex nested types, making it versatile for handling diverse data management tasks.