1.4. Shards#
1.4.1. Purpose#
QuasarDB organizes its data into tables, and logically partitions tables into shards. Each shard has a fixed size, and covers a unique time range within a table: that is, shards cannot overlap, and are part of a table’s schema.
Each shard is assigned to one or more nodes in a cluster, depending upon your configuration replication level.
Shards enable indexing of data inside a table based on the timestamp. Upon querying, the QuasarDB client can determine which shards to retrieve in constant time, and contact the nodes accordingly.
Shard-based indexing operates in constant-time and do not consume any memory: whether you’re storing 1 million or 1 billion rows in a table, the performance of these indexes remains constant and do not use any memory.
1.4.2. Example#
Given a table called log
with a shard size of 1h, a client executes the following query:
SELECT COUNT(*) FROM log IN RANGE (2022-01-01, +1d)
Based on the shard size of 1 hour, the client knows which shards it needs in advance: it only needs to query the nodes that have the 24 shards for each hour of January 1st, 2022. It will contact the appropriate node(s) accordingly, and request from them to process the COUNT(*) statement.
1.4.3. Choosing the correct shard size#
Choosing the correct shard size is instrumental to reaching your performance goals: ingestion prefers smaller shard sizes, where queries prefer larger shard sizes.
As a rule of thumb, aim for between 50,000 and 500,000 rows per shard. When unsure, err on the side of smaller shard sizes: larger shard sizes can cause unintended pressure on the storage engine.