6.2. Reaching your performance goals

6.2. Reaching your performance goals

We have designed the default settings to work on as many environments as possible, to allow you to test the fitness of QuasarDB for your needs quickly. To reach peak performance, you may need to review your system architecture, deamon configuration, and API usage. A worthwhile investment: this review can yield one to two orders of magnitude performance improvement. In this section, we will go through all the essential parts.

6.2.1. Optimizing your data model

Choosing the correct data model for your use case is critical in getting high performance. This section guides you through tuning your data model for high performance.

6.2.1.1. Partition key

QuasarDB embraces a model where you create many smaller tables. This has several advantages:

  • Higher write performance

    • Writes can be parallelized to multiple tables

    • Writes have less contention

  • Reads can be parallelized

  • Improved compression

As such, the first step is to identify a key you can use to partition your data. A few example scenarios:

  • For a finance use case that ingests ticker data from many stocks, partitioning the data by stock id works best;

  • For an industrial / IoT use case that collects data from many sensors, partitioning the data by sensor_id and/or facility_id works best.

This may very well mean you end up with many different tables: more than 100k is not uncommon, and actually a good thing.

6.2.1.2. Shard size

QuasarDB divides all data data stored in its tables into shards: for more information about shards, see the documentation about shards.

Choosing the optimal shard size is instrumental to reaching your performance goals: ingestion prefers smaller shard sizes, where queries prefer larger shard sizes:

  • Every insert into an existing shard triggers an “incremental insert”: the entire shard is re-sorted and re-written to disk. As such, too large a shard size causes many (large) incremental inserts into the same shard, and as such creates a lot of write amplification;

  • Every query processes each shard individually, so having having too small a shard size can create a lot of overhead.

As a rule of thumb, aim for between 50,000 and 500,000 rows per shard, with some notable exceptions:

  • If you do periodic batch inserts (for example, load data once a day), you will not have any issues with incremental inserts / write amplification. As such, it is possible (and often beneficial) to choose a much larger shard size;

  • If you do many small, incremental inserts (for example, write into the same table every 3s), you will want to reduce the size of your shards even further to reduce write amplification.

6.2.1.3. Column types

Variable-length data such as strings and blob should be avoided where possible:

  • They cause a significant performance degradation in query processing (at least one order of magnitude):

    • GROUP BY performance is significantly impacted;

    • WHERE or PREWHERE performance is significantly impacted.

  • They take up more disk space, and are more difficult to compress;

  • They take up more memory for client APIs:

    • It is not uncommon for even small strings to consume 10x as much memory than integers or doubles;

    • They are a frequent cause of memory fragmentation.

If your dataset contains strings that frequently repeat (for example, textual identifiers, names / locations, etc), consider using a SYMBOL column type instead: symbol columns map each identifier to a unique integer, and store these integers instead.

6.2.2. Ingestion

As mentioned above, shard size and table partitioning are important factors in your ingestion performance, especially when you have many small incremental inserts. In addition to this, there are some common techniques to improve ingestion performance.

6.2.2.1. CSV, TSV

When your source data is in the CSV or TSV format, consider using qdb_import to ingest the data: it enables you to ingest these formats in the fastest way possible.

While using this tool, keep the following in mind:

  • Do not run qdb_import on the same machine as you’re running the quasardb daemon on: it causes qdb_import and qdbd to compete for memory, and leads to severely degraded performance;

  • Ensure your source files are sorted by timestamp: unsorted input files cause a significant performance impact.

  • If sorted input files are not feasible, use either of these two options:

    • --async insertion mode, which is an asynchronous insertion mode that buffers data in the QuasarDB daemon’s async pipelines;

    • --jobs 1, which ensures single-threaded insertion mode. You can then invoke qdb_import on multiple files simultaneously to ensure parallelism.

6.2.2.2. APIs

When ingesting data using the regular APIs, do not use SQL INSERT statements, but make use of the batch inserter instead: this ensures data ingestion in the most efficient way possible.

When using the batch inserter, consider the following:

  • When loading data in batch:

    • Use the FAST insertion mode: this avoids expensive Copy-on-Write and uses in-place writes instead, and significantly reduces write amplification and lock contention;

    • Split your data in smaller chunks that aligns with your shard size: for example, if you’re using a 1h shard size, push separate batches for each hour of the day. This enables smaller inserts, which reduces lock contention on the server;

    • Increase the timeout:

      • Large batches can sometimes take a while to process, which may be longer than the default timeout of 60 seconds;

      • In case of such a timeout, the data is often still written to disk, causing additional write amplification, which increases the chance of additional timeouts.

    • Enable deduplication: when ingesting data, retries are often built in various places in the data pipeline, which can cause duplicates to be inserted. This can, in turn, cause write amplification, which can be avoided when using automatic deduplication.

  • When streaming data:

    • Use the ASYNC insertion mode: this buffers data in the async pipelines in the QuasarDB daemon, and will reduce write amplification;

    • Use small batches of data: flushing once every 3 seconds will yield much better performance than flushing every 100 milliseconds.

6.2.3. Retrieval & Queries

As mentioned earlier, table partitioning, shard size and column types are very important for query performance. This section describes some common techniques on how to improve your query performance.

6.2.3.1. Bulk retrieval

Bulk retrieval, e.g. loading / exporting all data from a certain table, is best done with either of these tools:

  • Using qdb_export to export the data to CSV or TSV is the fastest way; it features a highly optimized engine that is able to export a time range from a single table to a local file;

  • For APIs, make use of the bulk reader, which provides streaming access to the data in QuasarDB. Please consult the language tutorials for more information on how to use this:

6.2.3.2. Queries

Use these techniques to optimize your SQL queries:

  • Offload computation to QuasarDB:

    • QuasarDB excels in slicing & dicing large datasets into smaller buckets, while doing the “last mile” of computation in language APIs;

    • For example, when training models over years of data, pre-aggregate the data per hour, and do the model inference in your language API.

  • Avoid large data transfers in a single query:

    • Split a single query over a large time range up into smaller time ranges if possible;

    • Split a multi-table queries up into multiple single-table queries.

  • Use native query parallelism to speed up multi-table queries:

    • Native, high-performance query parallelism provided by the QuasarDB APIs;

    • Parallelism only works on queries with multiple tables at this moment.

6.2.4. Server tuning

Properly tuning your server for QuasarDB and vice versa is essential for getting great performance. In this section, we guide you through various aspects and how to tune them.

6.2.4.1. Storage

The following tips will help you make sure your storage is properly tuned for QuasarDB:

  • QuasarDB needs write throughput more than it needs IOPS; ensure you properly tune your storage for this, depending on your deployment environment:

    • Amazon Web Services:

      • We recommend using GP3 with 1000MB/sec write throughput;

      • Make sure your selected instance type has enough write bandwidth as well; for example, an m6a.8xlarge instance has only 825MB/sec of EBS bandwidth, so you will never get more than that.

    • Microsoft Azure:

      • We recommend premium SSD;

      • If possible, enable instance-local caching.

    • On-premise:

      • We recommend NVMe SSD if the budget allows for it;

      • Alternatively, pool multiple SATA SSDs in a raid array.

  • Pool multiple disks into a single raid array:

    • A single SATA channel only allows up to 600MB/sec, iSCSI disks in the cloud have comparable limitations;

    • To mitigate these limitations, use software raid (e.g. mdraid on Linux) so that the maximum available bandwidth becomes the sum of bandwidth of all disks together:

    • For more information, follow this tutorial to set up software RAID on Linux

6.2.4.2. Memory

Memory has a significant impact on performance: ideally, you want your working set to fit in RAM, and as such make sure your cluster remains fast and responsive.

6.2.4.2.1. Limiter

QuasarDB’s memory allocator has a soft and a hard limit:

  • QuasarDB will attemp to maintain memory limit around the soft limit: upon reaching the soft limit, any time a new block of data is cached, an old block is purged from the cache;

  • When the hard limit is reached, a full purge of all caches occurs: all entries are removed from memory, and is considered a fail-stop last resort to avoid crossing the upper memory limits.

Under normal operation, it’s normal that the soft limit is reached, and you’ll see messages such as these in your logs:

2022.08.16T09.58.19.498135897   34029   34097     warning       memory usage is 62.163 GiB, greater than the soft limit of 61.959 GiB (233,343 entries in memory - 44.355 GiB approximate usage)

In this example, QuasarDB is maintaining the soft limit, and 44.355GB of data blocks are in memory. This is normal, desired behavior.

Undesired, however, is when the hard limit is reached: this means that memory is consumed faster than QuasarDB can deallocate, and is a strong indication that your cluster is under too much memory pressure. As a last restort, all caches will be purged:

2022.08.16T13.24.06.241534838   34029   34097     warning       memory usage is 111.554 GiB, greater than the hard limit of 111.526 GiB (574,625 entries in memory - 101.239 GiB approximate usage)
2022.08.16T13.24.06.241541846   34029   34097     warning       tidying memory
2022.08.16T13.24.06.241542407   34029   34097     warning       purging allocators
2022.08.16T13.24.06.663456973   34029   34097      info         memory usage is now 101.485 GiB - 10.069 GiB collected
2022.08.16T13.24.06.663458563   34029   34097     warning       tidying memory was not sufficient - evicting everything
2022.08.16T13.24.06.669219834   34029   34097      info         clearing all caches
2022.08.16T13.24.13.593800424   34029   34097      info         1542187 entries and 101.391 GiB bytes removed from memory
2022.08.16T13.24.13.593841338   34029   34097      info         removed all entries from memory - mem usage 27.349 GiB - delta 74.137 GiB

In this example, you can see that QuasarDB reached the hard limit, and eventually ends up purging 101GB of memory. All caches will have to be re-filled after this.

6.2.4.2.2. Usage

There are various components of QuasarDB that make up the bulk of its memory usage, which you can tune to control the memory usage:

  • Async pipelines: server-side in-memory staging area used to buffer writes before being written to disk, and only used for ASYNC batch writer push modes. The server configuration variables that control these are local.depot.async_ts.pipelines and local.depot.async_ts.pipeline_buffer_size: the pipelines will use (up to) pipelines * pipeline_buffer_size memory. For example, if you have allocated 16 async pipelines of 1GB each, they can account for a total of 16GB memory.

  • Network buffers: server-side buffers allocated to each unique session, and controlled by the server configuration variables local.network.max_in_buffer_size, local.network.max_out_buffer_size. If you have many concurrent sessions all reading/writing large chunks of data to or from the server, they can account for a lot of memory.

  • Compaction: a continuous background process that sorts & cleans up data. If you are using too large a shard size, it will cause very large blocks of data that need to be sorted in-memory, and cause significant memory pressure. A strong indicator for this is if you see warnings in the logs writing a large column of 1,419,264 rows [..] consider using a smaller shard size, and you should use a smaller shard size for that table.

  • Entries cache: blocks of data being cached by QuasarDB, which have a direct impact on the performance of your queries. As mentioned above, QuasarDB will try to keep as many entries in cache as possible, provided that the soft_limit is not reached. Ideally, about half of all memory in use is allocated to these entries cache, otherwise you may want to consider allocating more memory to your cluster instances.