We have designed the default settings to work on as many environments as possible, to allow you to test the fitness of QuasarDB for your needs quickly. To reach peak performance, you may need to review your system architecture, deamon configuration, and API usage. A worthwhile investment: this review can yield one to two orders of magnitude performance improvement. In this section, we will go through all the essential parts.
In general, it is advised to ensure that the client and the server reside on two separate machines. Client and server may indeed fight for resources.
The rest server should not be installed on the same machine as the QuasarDB to ensure the daemon has all memory available for caching.
QuasarDB’s write and read speed are limited by the underlying storage technology that you choose.
Storage can be physically connected to the computer, a remote attached file system, or object storage.
In theory, a physical connection is the fastest, then remote attached, then object storage. However, this much depends on the underlying media.
In the cloud, it is even made more complicated by I/O rate limiting, which may be a function of your instance type.
That is why it is highly recommended that you first benchmark your storage and ensure the raw speed is sufficient for your needs before doing any QuasarDB tuning.
Memory has a significant impact on performance. Ideally, you want your working set to fit in RAM. Since QuasarDB is distributed, that doesn’t mean you need a machine with a large amount of RAM: the load will be distributed across the cluster.
For more information about daemon configuration, see daemon configuration.
By default, QuasarDB will use half of the system memory for caching. You can change this value in the limiter section of the configuration.
The default persistence layer is RocksDB. Several parameters have a massive impact on performance:
disable_waloption: If set to true, the WAL will be in-memory only, significantly increasing the capacity of a node to absorb write spikes, at the cost of durability.
threadsoption: Increasing the number of threads can speed-up RocksDB compaction
Persistence cache: You can use a local disk as a cache of remote disk to speed up reads.
The caching size of RocksDB has a limited impact on performance. That is because the QuasarDB cache that sits on top of RocksDB significantly reduces its importance.
Traditionally, in a Relational DataBase Management System (RDBMS), you would want to limit the number of tables in the database.
However, QuasarDB has unique mechanisms that enable you to have thousands of tables.
Higher write performance
Higher read performance
Higher ASOF join performance
Faster WHERE clause
Potentially better compression
Before 3.10, slower to read all the data from many tables than from a single table.
Some joins may be slower
For more information, see managing tables.
When possible, it is recommended to use integers or floats for data types. Compared to strings or blobs, these types are:
More compact: they use less memory and disk space
Have dedicated, high-performance compression algorithms
Have optimized SIMD aggregation functions
A faster scan speed which means faster WHERE clause execution
Starting with QuasarDB 3.10, QuasarDB supports a dynamic symbol table that is an excellent replacement for strings and blobs without the hassle of manually handling a mapping table.
When loading a CSV (or a TSV) into QuasarDB, we strongly recommend using Railgun.
The bottleneck is reading the file, not writing it into QuasarDB. Railgun has been heavily optimized to deliver optimal performance.
While Railgun can load unsorted files, it greatly increases performance if the file is sorted by the primary timestamp.
An experimental version of Railgun for parquet files is available upon request.
For more information, see the railgun documentation.
The Batch API solves a lot of headaches related to efficiently loading data into QuasarDB and is the best choice when writing custom loaders.
When writing to multiple tables at once, using the “pin column” API is recommended for optimal performance.
When you run a query, QuasarDB splits it into sub-queries and run it on all relevant nodes of the cluster. This process is transparent.
If you are not satisfied with the speed at which your query is running, you need to find out where you spend most of your time.
Is it in the client or on the server? If it is on the server, where is it?
Enabling performance profiling in the server and the shell will give you a comprehensive overview.
If you have identified the server’s bottleneck, go through or list recommendations for server processing.
If the bottleneck is on the client-side, ensure custom code is not the cause.
Your solutions architect will be happy to help you debug your performance problem.
Avoid creating and destroying handles. Keep your handles long-lived. Creating a handle and establishing a connection to the cluster is a very expensive operation.
By default, the client API uses a buffer size of 128 MiB. Increasing the buffer size can improve performance if you intend to have large replies.
Starting with 3.10, the client API can dynamically multi-thread sub queries to leverage the parallel architecture of QuasarDB.
Before that, each sub-query would be run in sequence on a single thread. That could result in a very long-running query despite a low CPU usage on the cluster.