1.6. Symbol columns

1.6.1. Definition

Symbol Columns are a specialized column type for string data in our database system. Under the hood, each distinct string is mapped to an integer, and the column stores those integers rather than the strings themselves. This combines the compact storage and fast comparison of integers with the convenience of reading, writing, and querying the values as ordinary strings.
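The general idea behind a string-to-integer mapping is usually called dictionary encoding. The following is a minimal Python sketch of that idea, not the database's actual implementation: `SymbolTable`, `encode`, and `decode` are hypothetical names chosen for illustration, and the mapping is assumed to live in memory.

```python
class SymbolTable:
    """Hypothetical dictionary encoder: maps each distinct string to a small integer code."""

    def __init__(self) -> None:
        self._code_of: dict[str, int] = {}  # string -> integer code
        self._strings: list[str] = []       # integer code -> string (index = code)

    def encode(self, value: str) -> int:
        # Reuse the code of a string seen before; otherwise assign the next integer.
        code = self._code_of.get(value)
        if code is None:
            code = len(self._strings)
            self._code_of[value] = code
            self._strings.append(value)
        return code

    def decode(self, code: int) -> str:
        # Reverse lookup: integer code back to the original string.
        return self._strings[code]


table = SymbolTable()
codes = [table.encode(v) for v in ["BUY", "SELL", "BUY", "BUY"]]
print(codes)                             # [0, 1, 0, 0]
print([table.decode(c) for c in codes])  # ['BUY', 'SELL', 'BUY', 'BUY']
```

Only the integer codes need to be stored per row; each distinct string is kept once in the mapping, which is where the savings for repetitive data come from.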

1.6.2. Purpose

  1. Space Efficiency: Symbol Columns store each value as an integer instead of a full string, which reduces storage when the same strings appear in many rows.

  2. Faster Queries: Comparing and grouping integers is cheaper than comparing strings, so queries run faster, particularly filters in WHERE clauses and GROUP BY aggregations.

  3. String-Like Interaction: Although values are stored as integers internally, queries and API interactions still read and write them as strings, so you never have to translate between strings and integers yourself.

  4. Versatile Usage: Symbol Columns suit any dataset in which string values repeat, such as order types, machine states, or country codes.
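To make the space-efficiency point concrete, the back-of-the-envelope Python sketch below compares the raw UTF-8 bytes of a repetitive string column with fixed-width integer codes plus one copy of each distinct string. The column values are made up for illustration, and the estimate deliberately ignores per-row string offsets, compression, and other storage-engine overheads, so treat it as an approximation of the principle rather than a measurement of the database.

```python
from array import array

# Hypothetical column: 300,000 rows drawn from only three distinct strings.
values = ["RUNNING", "STOPPED", "MAINTENANCE"] * 100_000

# Dictionary-encode the column: each distinct string gets the next integer code.
code_of: dict[str, int] = {}
codes = array("i")  # fixed-width C ints; itemsize is platform-dependent (typically 4 bytes)
for v in values:
    codes.append(code_of.setdefault(v, len(code_of)))

string_bytes = sum(len(v.encode("utf-8")) for v in values)  # plain strings, payload only
code_bytes = codes.itemsize * len(codes)                    # one integer per row
dict_bytes = sum(len(s.encode("utf-8")) for s in code_of)   # each distinct string stored once

print("plain strings:", string_bytes, "bytes")
print("symbol codes + mapping:", code_bytes + dict_bytes, "bytes")
```

The gap widens as the strings get longer or repeat more often; for very short strings the per-row saving is smaller.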

1.6.3. Usage

Symbol Columns pay off only for the right kind of data. When deciding whether to use one, consider the following:

  • If string values recur throughout your dataset, the column is typically a good candidate for a Symbol Column.

  • If string values are mostly unique within your dataset (e.g., a unique ID), the column is not a good candidate: nearly every row adds a new entry to the string-to-integer mapping, so little space is saved.
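One rough way to apply these guidelines is to look at the ratio of distinct values to total rows. The Python sketch below is a hypothetical heuristic, not a rule enforced by the database; in particular, the `max_ratio` threshold of 1% is an arbitrary illustrative choice.

```python
def distinct_ratio(values: list[str]) -> float:
    """Fraction of rows holding a distinct value: close to 0 means heavy repetition."""
    if not values:
        return 0.0
    return len(set(values)) / len(values)


def looks_like_symbol_candidate(values: list[str], max_ratio: float = 0.01) -> bool:
    # Hypothetical heuristic: max_ratio is an illustrative threshold, not a documented limit.
    return distinct_ratio(values) <= max_ratio


order_types = ["BUY", "SELL"] * 5_000             # 2 distinct values in 10,000 rows
order_ids = [f"ord-{i}" for i in range(10_000)]   # every value is unique

print(looks_like_symbol_candidate(order_types))  # True
print(looks_like_symbol_candidate(order_ids))    # False
```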

1.6.4. Use Cases

1.6.4.1. Finance and Orderbooks

Symbol Columns are well suited to finance and order book data. A column such as the order type holds only a handful of distinct values (e.g., BUY or SELL) repeated across many rows, so storing it as a Symbol Column saves space and speeds up queries that filter or group by order type.

Creating a table with a Symbol Column for order type
CREATE TABLE Orders (
    order_id INT64,
    order_type SYMBOL(my_order_type),
    ...
);
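The speed-up for filters comes from comparing integers instead of strings. The Python sketch below illustrates the idea only, under the assumption of a simple in-memory mapping; it is not how the database executes queries. A filter such as `WHERE order_type = 'SELL'` can look up the code for the literal once and then compare one integer per row.

```python
# Hypothetical order_type column, dictionary-encoded into integer codes.
rows = ["BUY", "SELL", "SELL", "BUY", "SELL"]
code_of: dict[str, int] = {}
codes = [code_of.setdefault(v, len(code_of)) for v in rows]  # [0, 1, 1, 0, 1]

# Resolve the string literal to its code once (None if no row ever stored it) ...
target = code_of.get("SELL")

# ... then each row needs only an integer comparison, never a string comparison.
matching_rows = [i for i, c in enumerate(codes) if c == target]
print(matching_rows)  # [1, 2, 4]
```

If the literal was never stored, `target` is `None`, no integer code equals it, and the filter correctly matches no rows.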

1.6.4.2. State Representation

Symbol Columns excel at representing the state of entities, processes, or machines. A state column usually takes one of a small set of values that repeat across many rows, which makes state information cheap to store and fast to retrieve.

Storing and querying the state of machines
CREATE TABLE Machines (
    machine_id INT64,
    machine_state SYMBOL(my_machine_state),
    ...
);
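Grouping benefits in a similar way. In the Python sketch below (an illustration with made-up state values, not the database's execution engine), rows are counted per integer code, and only the handful of group keys are translated back to strings at the end, roughly what a `GROUP BY machine_state` with a count could reduce to.

```python
from collections import Counter

# Hypothetical machine_state column.
states = ["RUNNING", "IDLE", "RUNNING", "FAULT", "RUNNING", "IDLE"]

# Dictionary-encode the column and build the reverse mapping (code -> string).
code_of: dict[str, int] = {}
codes = [code_of.setdefault(s, len(code_of)) for s in states]
string_of = {code: s for s, code in code_of.items()}

# Group and count on integer codes; decode only the group keys, not every row.
counts_by_code = Counter(codes)
counts_by_state = {string_of[c]: n for c, n in counts_by_code.items()}
print(counts_by_state)  # {'RUNNING': 3, 'IDLE': 2, 'FAULT': 1}
```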

1.6.4.3. Identifiers

Symbol Columns also work well for identifiers drawn from a limited set of values, such as country codes. Mapping country codes to integers reduces storage and speeds up customer queries that filter or group by country.

Using a Symbol Column for country code
CREATE TABLE Customers (
    customer_id INT64,
    customer_name STRING,
    country_code SYMBOL(country_code),
    ...
);

Note

Each Symbol Column takes a name, given in parentheses after SYMBOL (for example, my_order_type), which makes your schema easier to understand. A name can be reused within the same table, but names are not shared between different tables.
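One way to picture this scoping rule is a per-table registry of named mappings. The Python sketch below is a hypothetical model based on one reading of the note, namely that columns sharing a symbol name within a table share one string-to-integer mapping. It is not the database's implementation, and `Table`, `encode`, and the `Trades` table are invented for illustration.

```python
class Table:
    """Hypothetical model: each table owns its own set of named symbol mappings."""

    def __init__(self, name: str) -> None:
        self.name = name
        # symbol name -> (string -> integer code); scoped to this table only
        self._symbols: dict[str, dict[str, int]] = {}

    def encode(self, symbol_name: str, value: str) -> int:
        mapping = self._symbols.setdefault(symbol_name, {})
        return mapping.setdefault(value, len(mapping))


trades = Table("Trades")
# Two columns that reuse the same symbol name share one mapping within the table.
buyer_code = trades.encode("country_code", "DE")
seller_code = trades.encode("country_code", "DE")

customers = Table("Customers")
customers.encode("country_code", "US")
# The same name in another table is a separate mapping, so "DE" gets its own code here.
other_code = customers.encode("country_code", "DE")

print(buyer_code == seller_code)  # True
print(other_code == buyer_code)   # False
```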