2. Create table#

2.1. Synopsis#

CREATE TABLE [ IF NOT EXISTS ] <table_name> ( [
  { <column_name> <data_type> }
  [, ... ]
] )
[ SHARD_SIZE [=] <duration> ]
[ WITH TAGS='[' <tag> [, ...] ']' ]
[ TTL <duration> ]

2.2. Description#

CREATE TABLE will create a new table in a QuasarDB cluster with the specified schema. A special timestamp column is automatically created, which is what new data will be indexed on. If table_name already exists, CREATE TABLE will fail, unless IF NOT EXISTS is specified, in which case it will succeed and keep the original table intact.

2.3. Parameters#

table_name

The name of the table to be created. Can be alphanumeric, but is not allowed to start with a number.

column_name

The name of a column to be created in the new table. Can be alphanumeric, but is not allowed to start with a number.

data_type

The data type to be associated with the column. Can be any of INT64, DOUBLE, BLOB, TIMESTAMP, STRING or SYMBOL(<symbol_table_name>).

duration

The size (as time duration) of a single shard (bucket). The syntax is described in section Durations.

[ TTL <duration> ]

In the context of TTL, the <duration> parameter specifies the Time-To-Live for the data in the created table. It represents the duration after which the data will be automatically removed or discarded from the table.

tag

The name of a tag to be attached to the newly created table. Can be alphanumeric, but is not allowed to start with a number.

2.4. Examples#

Create a table with only a single column:

CREATE TABLE example (my_int INT64)

Create a table with only a single symbol column:

CREATE TABLE example (my_int SYMBOL(my_symbol_table))

Create a table with multiple columns:

CREATE TABLE example (my_int INT64, my_double DOUBLE, my_blob BLOB, my_ts TIMESTAMP, my_symbol SYMBOL(my_symtable), my_string STRING)

Create a table with a custom shard size:

CREATE TABLE example (my_double DOUBLE) SHARD_SIZE = 1hour 2min 3s

Creates the table as specified if it does not exists, leaves the existing table untouched if it does exist:

CREATE TABLE IF NOT EXISTS example (my_int INT64)

Create a table with with 3 tags attached:

CREATE TABLE example (my_double DOUBLE) WITH TAGS = ['tag1', 'tag2', 'tag3']

Create a table with a shard size and TTL:

CREATE TABLE example (my_int INT64) SHARD_SIZE = 1min TTL = 1min

2.5. Time to Live (TTL)#

2.5.1. Understanding TTL in Contrast to Traditional Methods#

Currently, data retention periods are often implemented using queries like “DELETE FROM table IN RANGE (1970, now() - 3 months).” However, this practice is less efficient as it leads to a phenomenon known as “write amplification.” This occurs when deleting data triggers more data to be written than actually removed.

2.5.2. Introducing the TTL Function#

The new “TTL” function presents an efficient alternative to traditional data deletion methods. Unlike manual “DELETE FROM” queries, TTL operates in a more intelligent and automated manner.

2.5.3. The “Lazy” TTL Approach#

TTL employs a “lazy” strategy. This means that TTL activates, and data retention is executed when the underlying storage engine performs its “merging” of SST files (see Storage Optimization Techniques). This strategy comes with minimal cost, making it an almost 0-cost solution for data retention.

2.5.4. Benefits of TTL#

  • Automatic Data Retention: TTL offers an efficient and effective way to implement data retention policies. It eliminates the need for manual intervention, ensuring outdated data is automatically removed.

  • Efficiency: By integrating with the compaction process (see Storage Optimization Techniques), TTL optimizes storage utilization without impacting performance.

2.6. TTL Usage Guide#

2.6.1. Configuring TTL During Table Creation#

To implement TTL during table creation, use the following syntax:

CREATE TABLE example (my_int INT64) SHARD_SIZE = 1min TTL = [TTL time range];

Replace [TTL time range] with your desired data retention interval, such as a day, week, or month

2.6.2. Modifying TTL with ALTER TABLE#

For existing tables, adjust TTL using the ALTER TABLE command:

ALTER TABLE [table name] SET TTL [TTL time range];

This flexibility allows you to adapt data retention settings as your needs evolve.

2.7. Time to Live (TTL) Behavior and Mechanisms#

2.7.1. Flexible TTL Behavior#

When a TTL is set for data, it signifies automatic deletion after the specified time has elapsed. Nonetheless, it’s crucial to grasp that TTL doesn’t assure immediate data removal upon expiration (see Storage Optimization Techniques).

Furthermore, it’s important to comprehend that TTL operates with flexibility, not rigid adherence to a fixed schedule. During data compaction, a slight delay might occur, causing the process to extend beyond the initially defined TTL period.

When immediate TTL enforcement is necessary, there are two approaches available:

  • Comprehensive Cluster Compaction: Execute cluster_compact full via the shell to initiate a thorough cluster compaction. While this action efficiently rewrites all data for notable storage gains, it does exert a considerable load on the cluster. Depending on dataset size, this process may span multiple days.

  • Manual DELETE FROM: Opt for manual intervention by triggering a DELETE FROM query on the table within a specific timeframe.

2.7.2. TTL Clearing Timing#

With TTL, data is efficiently cleared based on the defined time range. For instance, setting a TTL of 1 minute leads to data being automatically cleared after approximately 30 seconds.

2.7.3. Understanding TTL Conditions#

TTL-driven deletion occurs when new data is added within the set timeframe. Without new additions, existing data remains unchanged after the TTL period, applicable to the entire database, not just specific tables.

Note:

For further understanding, refer to Storage Optimization Techniques