quasardb daemon#
Introduction#
The quasardb daemon is a highly scalable data repository that handles requests from multiple clients. The data is cached in memory and persisted on disk. It can be distributed on several servers to form a cluster.
The persistence layer is based on RocksDB (c) RocksDB authors. All rights reserved. The network distribution uses the Chord protocol.
The quasardb daemon does not require privileges (unless listening on a port under 1024) and can be launched from the command line. From this command line it can safely be stopped with CTRL-C. On UNIX, CTRL-Z will also result in the daemon being suspended.
Important
Without a valid license (see License), quasardb will run in “community edition” mode. The community edition is limited to 16 GiB of storage and 4 GiB of RAM per node, with a maximum of two nodes per cluster.
Quick Reference#
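| Option | Usage | Default | Global | Req. Version |
|---|---|---|---|---|
| -h, --help | display help | | No | |
| -v, --version | display version information | | No | |
| --gen-config | generate default config file | | No | >=1.1.3 |
| -c, --config | specify config file | | No | >=1.1.3 |
| -d, --daemonize | daemonize | | No | |
| --license-file | specify license | qdb_license.txt | No | |
| -a, --address | address to listen on | 127.0.0.1:2836 | No | |
| -s, --sessions | max client sessions | 20000 | No | |
| --idle-timeout | idle timeout in ms | 600,000 | No | |
| --request-timeout | request timeout in ms | 60,000 | No | |
| --peer | one peer to form a cluster | | No | |
| --id | set the node id | generated | No | |
| --storage_engine | select the storage engine | rocksdb | Yes | >=3.1.0 |
| -r, --rocksdb-root | persistence directory | ./db | Yes | |
| --replication | sets the replication factor | 1 | Yes | |
| --rocksdb-max-open-files | rocksdb maximum open files | 256 | No | >=3.0.0 |
| --limiter-max-bytes-soft | max bytes in cache (soft) | Automatic | No | >=3.5.0 |
| --limiter-max-bytes-hard | max bytes in cache (hard) | Automatic | No | >=3.5.0 |
| -l, --log-directory | log in the given directory | | No | |
| --log-syslog | log on syslog | | No | |
| --log-level | change log level | info | No | |
| --log-flush-interval | change log flush | 3 | No | |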
Configuration#
Global and local options#
When a node connects to a ring, it will first download the configuration of this ring and overwrite its parameters with the ring’s parameters.
This way, you can be sure that parameters are consistent over all the nodes. This is especially important for parameters such as replication where you need all nodes to agree on a single replication factor.
This is also important for persistence, as having a mix of transient and non-transient nodes will result in undefined behaviour and unwanted data loss.
However, not all options are taken from the ring. It makes sense to have a heterogeneous logging threshold, for example, as you may want to analyze the behaviour of a specific part of your cluster.
In addition, some parameters are node specific, such as the listening address or the node ID.
An option that applies cluster-wide is said to be global, whereas other options are said to be local. The value of a global option is set by the first node that creates the ring; all other nodes copy it. Local options, on the other hand, are read from the command line when you run the daemon.
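The rule above can be illustrated with a minimal sketch. It assumes the configuration is a dictionary with top-level "local" and "global" sections, as in the JSON configuration file; the function name effective_config is hypothetical and is not part of quasardb.

```python
import copy


def effective_config(node_config, ring_config=None):
    """Return the configuration a node would run with.

    Hypothetical helper, not a quasardb API: local options always come
    from the node itself, global options come from the ring if one exists.
    """
    result = copy.deepcopy(node_config)
    if ring_config is not None:
        # Joining an existing ring: global options are copied from the ring.
        result["global"] = copy.deepcopy(ring_config["global"])
    return result


first_node = {
    "local": {"logger": {"log_level": 2}},
    "global": {"cluster": {"replication_factor": 2}},
}
joining_node = {
    "local": {"logger": {"log_level": 1}},
    "global": {"cluster": {"replication_factor": 1}},
}

merged = effective_config(joining_node, ring_config=first_node)
print(merged["global"]["cluster"]["replication_factor"])  # 2, taken from the ring
print(merged["local"]["logger"]["log_level"])  # 1, kept from the joining node
```

The joining node keeps its own logging threshold but ends up with the replication factor chosen by the node that created the ring.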
Network distribution#
qdbd distribution is peer-to-peer. This means:
The unavailability of one server does not compromise the whole cluster
The memory load is automatically distributed amongst all instances within a cluster
Each server within one cluster needs:
Note
It’s counter-productive to run several instances on the same node. qdbd is hyper-scalar and will use all the memory and processors of your server. The same applies to virtual machines: running quasardb in multiple virtual machines on a single physical server will not improve performance.
The daemon will automatically launch an appropriate number of threads to handle connection accepts and requests, depending on the actual hardware configuration of your server.
Logging#
By default, a non-daemonized qdbd will log to the console. If daemonized, logging is disabled unless configured to log to files (--log-directory) or to the syslog (--log-syslog) on Unix.
There are six different log levels: detailed, debug, info, warning, error and panic. You can change the log level (--log-level); it defaults to info.
You can also change the log flush interval (--log-flush-interval), which defaults to 3,000 ms.
Cache#
QuasarDB caches data in RAM in addition to the cache provided by the persistence layers. QuasarDB avoids loading data if it can answer a query based on information contained in the indexes, and uses an LRU strategy for timeseries buckets. Inserting data in parallel with querying it is well supported and will not pollute the cache.
The daemon will start to evict entries when the process memory usage reaches the soft limit. It will stop all processing and evict as much as it can when the process memory usage hits the hard limit. When QuasarDB evicts entries, it will throttle down queries to prevent a situation where users would load data faster than QuasarDB could evict it.
Thus, the memory usage is kept between the soft and the hard limit, possibly below the soft limit.
The memory usage measurement is based on the process size, which means that file system caches, and persistence layer caches are included in this measurement. It is thus important to ensure that the soft limit is well above the sum of all caches of the persistence layer. Failure to do so may result in continuous eviction and poor performance.
By default, the hard limit will be 80% of the physical RAM present on the machine, and the soft limit will be 75% of the hard limit.
On a machine with 64 GiB of RAM, the hard limit will thus be around 51 GiB, and the soft limit will be 38 GiB.
Each parameter can be configured independently, but the hard limit must always be greater than the soft limit.
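The default limits can be reproduced with a few lines of arithmetic. This is a sketch based only on the percentages stated above; the function name is hypothetical, and the exact rounding performed by the daemon is an assumption.

```python
GIB = 1024 ** 3


def default_cache_limits(physical_ram_bytes):
    """Return (soft, hard) limits in bytes using the documented defaults."""
    hard = physical_ram_bytes * 80 // 100  # hard limit: 80% of physical RAM
    soft = hard * 75 // 100                # soft limit: 75% of the hard limit
    return soft, hard


soft, hard = default_cache_limits(64 * GIB)
print(f"hard: {hard / GIB:.1f} GiB, soft: {soft / GIB:.1f} GiB")
# hard: 51.2 GiB, soft: 38.4 GiB
```

The resulting byte counts are the kind of values one would pass to --limiter-max-bytes-hard and --limiter-max-bytes-soft when overriding the automatic settings.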
Ideally, you want your working set to fit in memory. The working set for a single node is close to the total working set divided by the number of nodes.
Note
The cache size has a huge impact on performance. Your QuasarDB solutions architect will be happy to assist you in finding the optimal setting for your use case.
Data Storage#
Note
Data storage options are global for any given ring.
QuasarDB has two storage engines:
RocksDB (default)
Transient (no data is written to disk)
Entries are often kept resident in a write cache so the daemon can rapidly serve a large number of simultaneous requests. Data may not be synced to the disk at all times.
The RocksDB persistence engine allows you to sync every write to disk, if needed, thanks to the “sync” option setting.
For more information, see Data Storage and Data Transfer.
RocksDB#
RocksDB is an open-source, persistent, key-value store for fast storage environments. It is based on LevelDB and uses LSM trees.
To enable RocksDB, set the storage engine configuration parameter to “rocksdb” and set a “root” directory in the rocksdb configuration section. QuasarDB will then write the data under this directory using our tuned RocksDB implementation. The data stored is 100% compatible with RocksDB and can be used via the RocksDB API, if needed.
The directory can be absolute or relative; for production we recommend using an absolute path.
It is possible to limit the amount of space a node will occupy with the “max_bytes” option. Writes to the node will fail when the disk usage reaches that limit, with warnings being emitted before that point. The write-ahead log is not counted in the space usage, meaning that the actual disk usage may be greater than the limit. Compression may also reduce the actual disk usage.
Note
RocksDB is a safe default, however it can limit the performance of your QuasarDB cluster.
Persistent read cache#
Note
The persistent read cache is only available for the RocksDB persistence layer.
The persistent read cache optimizes I/O by buffering data from a (potentially remote) storage into a faster, local storage (see Persistent read cache). The persistent read cache does not buffer writes.
There are three configuration settings:
The path of the persistent read cache. It should be used on local, fast storage only (SSD, NVMe, or Optane). Using remote or slow storage will have a detrimental effect on performance.
The maximum size of the persistent read cache
Whether the persistent read cache should be optimized for NVMe.
The persistent read cache is disabled by default.
Partitions#
A partition can be seen as a worker thread. The more partitions, the more work can be done in parallel. However if the number of partitions is too high relative to your server capabilities to actually do parallel work, performance will decrease.
quasardb is highly scalable and partitions do not interfere with each other. The daemon’s scheduler will assign incoming requests to the partition with the least workload.
The ideal number of partitions is close to the number of physical cores your server has. By default the daemon chooses the best compromise it can. If this value is not satisfactory, you can use the partitions_count config file option to set the value manually.
Note
Unless a performance issue is identified, it is best to let the daemon compute the partition count.
Asynchronous time series inserter#
The server has a built-in asynchronous timeseries inserter that is accessed through the batch insert API. The inserter buffers updates in order to commit them in batches for optimal performance.
When an asynchronous request arrives, the following happens:
The server validates the request
The request will be dispatched to one of the pipelines through an asynchronous queue, if there is room.
If the queue is full, it will wait a random amount of time and try again. If it still fails after three attempts, the server will return the error “busy, try again later” to the client.
The asynchronous workers run in a configurable number of pipelines which run in independent threads.
Each pipeline does the following:
If the queue is half full, or if a configured amount of time has elapsed, the content of the queue will be inspected.
Requests updating the same timeseries bucket will be merged in a single request
Merged requests will be written to disk
The following parameters are configurable:
The number of pipelines (by default, 1)
The amount of data a queue may contain
The number of requests a queue may contain
The maximum amount of time before the content of a queue is inspected and written to disk
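The pipeline steps described above can be sketched as follows. This is an illustration only, not the server's implementation: the function names, the request shape (a bucket name paired with a list of rows) and the bucket names are all hypothetical.

```python
from collections import defaultdict


def should_flush(queue_length, queue_capacity, elapsed_ms, flush_deadline_ms):
    """Inspect the queue when it is half full or the deadline has elapsed."""
    return queue_length * 2 >= queue_capacity or elapsed_ms >= flush_deadline_ms


def merge_by_bucket(requests):
    """Merge queued (bucket, rows) requests so each bucket is written once."""
    merged = defaultdict(list)
    for bucket, rows in requests:
        merged[bucket].extend(rows)
    return dict(merged)


# Three queued requests, two of which update the same timeseries bucket.
queue = [
    ("ts1/bucket0", [(1, 10.0)]),
    ("ts1/bucket1", [(5, 12.5)]),
    ("ts1/bucket0", [(2, 11.0)]),
]

if should_flush(len(queue), queue_capacity=4, elapsed_ms=100, flush_deadline_ms=4000):
    for bucket, rows in merge_by_bucket(queue).items():
        # A real pipeline would write the merged request to disk here.
        print(bucket, rows)
```

Merging before writing means that a bucket receiving many small updates costs one write per flush rather than one write per request.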
Statistics#
By default the server will collect statistics. Statistics are stored in blobs or integer keys for convenient consumption. They represent the value since the latest refresh. The refresh interval is configurable and by default 5,000 milliseconds (5 seconds).
Statistics key names are of the form “$qdb.statistics.{node id}.{stat name}”. For example, for the node ID 1-0-0-0, the key holding the cumulated number of bytes written to disk is “$qdb.statistics.1-0-0-0.persistence.bytes_written”.
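As a small illustration of the naming scheme, the key for any metric can be built from the node ID and the metric name. The helper below is hypothetical and only formats the string; reading the value requires a quasardb client, which is not shown.

```python
def statistics_key(node_id, stat_name):
    """Build a statistics key name from a node ID and a metric name."""
    return f"$qdb.statistics.{node_id}.{stat_name}"


print(statistics_key("1-0-0-0", "persistence.bytes_written"))
# $qdb.statistics.1-0-0-0.persistence.bytes_written
```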
Supported metrics#
The currently supported metrics are:
engine_build_date: blob, the QuasarDB engine build date
engine_version: blob, the QuasarDB engine version
hardware_concurrency: int64, the detected concurrent number of hardware threads supported on the system
startup: int64, the startup timestamp
node_id: blob, a string representing the node id
operating_system: blob, a string representing the operating system
partitions_count: int64, the number of partitions
cpu.idle: int64, the cumulated CPU idle time
cpu.system: int64, the cumulated CPU system time
cpu.user: int64, the cumulated CPU user time
disk.path: blob, the persistence path
disk.bytes_free: int64, the bytes free on the persistence path
disk.bytes_total: int64, the bytes total on the persistence path
memory.bytes_resident_size: int64, the computed amount of RAM used for data by QuasarDB
memory.physmem.bytes_total: int64, physical RAM total bytes count
memory.physmem.bytes_used: int64, physical RAM used bytes count
memory.resident_count: int64, the number of entries in RAM
memory.vm.bytes_total: int64, virtual memory total bytes count
memory.vm.bytes_used: int64, virtual memory used bytes count
network.current_users_count: int64, the current users count
network.sessions.max_count: int64, the configured maximum number of sessions
network.sessions.available_count: int64, the current number of available sessions
network.sessions.unavailable_count: int64, the current number of used sessions
persistence.bytes_capacity: int64, the persistence layer storage capacity, in bytes. May be 0 if the value is unknown.
persistence.bytes_utilized: int64, how many bytes are currently used in the persistence layer.
persistence.bytes_read: int64, the cumulated number of bytes read
persistence.bytes_written: int64, the cumulated number of bytes written
persistence.entries_count: int64, the current number of entries in the persistence layer
requests.total_count: int64, the cumulated number of requests
requests.successes_count: int64, the cumulated number of successful operations
requests.bytes_out: int64, the cumulated number of bytes sent by the server
Performance data#
When performance profiling is enabled, each request will store an accumulator of the time spent, in nanoseconds, in each step of the process.
The performance metrics are stored in the “$qdb.statistics.{node id}.perf.” subfield.
Operating limits#
Theoretical limits#
- Entry size
An entry cannot be larger than the amount of virtual memory available on a single node. This ranges from several megabytes to several gigabytes depending on the amount of physical memory available on the system. It is recommended to keep entry sizes well below the amount of available physical memory.
- Key size
As is the case for entries, a key cannot be larger than the amount of virtual memory available on a single node.
- Number of nodes in a grid
The maximum number of nodes is \(2^{63}\) (9,223,372,036,854,775,808)
- Number of entries on a single grid
The maximum number of entries is \(2^{63}\) (9,223,372,036,854,775,808)
- Node maximum capacity
The node capacity depends on the available disk space on a given node. The community edition is limited to 16 GiB on disk and 4 GiB in RAM.
- Total amount of data
The total amount of data a single grid may handle is 16 EiB (that’s 18,446,744,073,709,551,616 bytes).
Practical limits#
- Entry size
Very small entries (below a hundred bytes) do not offer very good throughput because the network overhead is larger than the payload. This is a limitation of TCP. Very large entries (larger than 10% of the node RAM) impact performance negatively and are probably not optimal to store on a quasardb cluster “as is”. It is generally recommended to slice very large entries into smaller entries and handle reassembly in the client program. If you have a lot of RAM (several gigabytes per node), do not be afraid to add large entries to a quasardb cluster. For optimal performance, it’s better if the “hot data”, the data that is frequently accessed, can fit in RAM.
- Simultaneous clients
A single instance can serve thousands of clients simultaneously. The actual limit is the network bandwidth, not the server. You can set the -s option to a higher number to handle more simultaneous clients per node. You should also make sure that clients connect to the nodes of the cluster in a load-balanced fashion.
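The recommendation to slice very large entries and reassemble them in the client can be sketched as follows. This is a client-side illustration only: the function names and the “{key}.{index}” naming scheme are hypothetical, and storing or fetching the chunks through a quasardb client is not shown.

```python
def slice_entry(key, payload, chunk_size):
    """Split payload into (chunk_key, chunk) pairs of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Ceiling division; an empty payload still produces one (empty) chunk.
    count = max(1, -(-len(payload) // chunk_size))
    return [
        (f"{key}.{i}", payload[i * chunk_size:(i + 1) * chunk_size])
        for i in range(count)
    ]


def reassemble(chunks):
    """Concatenate chunks in the order produced by slice_entry."""
    return b"".join(chunk for _, chunk in chunks)


payload = bytes(range(256)) * 10  # 2,560 bytes of sample data
chunks = slice_entry("video.frame", payload, chunk_size=1024)
print([(name, len(chunk)) for name, chunk in chunks])
# [('video.frame.0', 1024), ('video.frame.1', 1024), ('video.frame.2', 512)]
assert reassemble(chunks) == payload
```

The client needs to know how many chunks exist in order to fetch them all; storing the count under a separate key is one possible approach.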
Parameters Reference#
Parameters can be supplied in any order and are prefixed with --.
The arguments format is parameter dependent.
Instance specific parameters only apply to the instance, while global parameters are for the whole ring. Global parameters are applied when the first instance of a ring is launched.
Instance specific#
- -h, --help#
Displays basic usage information.
- Example
To display the online help, type:
qdbd --help
- -v, --version#
Displays qdbd version information.
- --gen-config#
Generates a JSON configuration file with default values and prints it to STDOUT.
- Example
To create a new config file with the name “qdbd_default_config.json”, type:
qdbd --gen-config > qdbd_default_config.json
Note
The --gen-config argument is only available with quasardb 1.1.3 or higher.
- -c, --config#
Specifies a configuration file to use. See Config File Reference.
Any other command-line options will be ignored.
If an option is omitted in the config file, the default will be used.
If an option is malformed in the config file, it will be ignored.
- Argument
The path to a valid configuration file.
- Example
To use a configuration file named “qdbd_default_config.json”, type:
qdbd --config=qdbd_default_config.json
Note
The --config argument is only available with quasardb 2.0.0 or higher.
- -d, --daemonize#
Runs the server as a daemon (UNIX only). In this mode, the process will fork and prevent console interactions. This is the recommended running mode for UNIX environments.
- Example
To run as a daemon:
qdbd -d
Note
Logging to the console is not allowed when running as a daemon.
- --license-file#
Specifies the location of the license file. A valid license is required to run the daemon (see License).
- Argument
The path to a valid license file.
- Default value
qdb_license.txt
- Example
Load the license from license.txt:
qdbd --license-file=license.txt
- -a <address>:<port>, --address=<address>:<port>#
Specifies the address and port on which the server will listen.
- Argument
A string representing one address the server listens on and a port. The address string can be a host name, an IP address or an interface (BSD and Linux only).
- Default value
127.0.0.1:2836, the IPv4 localhost and the port 2836
- Example
Listen on localhost and the port 5910:
qdbd --address=localhost:5910
Listen on eth0 port 2836:
qdbd --address=eth0:2836
Note
The unspecified address (0.0.0.0 for IPv4, :: for IPv6) is not allowed.
- -s <count>, --sessions=<count>#
Specifies the number of simultaneous sessions per partition.
- Argument
A number greater than or equal to fifty (50) representing the number of allowed simultaneous sessions.
- Default value
64
- Example
Allow 10,000 simultaneous sessions:
qdbd --sessions=10000
Note
The sessions count determines the number of simultaneous clients the server may handle at any given time. Increasing the value increases the memory load. This value may be limited by your license.
- --idle-timeout=<duration>#
Sets the timeout after which inactive sessions will be considered for termination.
- Argument
An integer representing the number of milliseconds after which an idle session will be considered for termination.
- Default value
600,000 (600 seconds, 10 minutes)
- Example
Set the timeout to one minute:
qdbd --idle-timeout=60000
- --request-timeout=<timeout>#
Sets the timeout after which a request from the server to another server must be considered to have timed out.
- Argument
An integer representing the number of milliseconds after which a request must be considered to have timed out.
- Default value
60,000 (60 seconds, 1 minute)
- Example
Set the timeout to two minutes:
qdbd --request-timeout=120000
- --peer=<address>:<port>#
The address and port of a peer to which to connect within the cluster. It can be any server belonging to the cluster.
- Argument
The address and port of a machine where a quasardb daemon is running. The address string can be a host name or an IP address.
- Default value
None
- Example
Join a cluster where the machine 192.168.1.1 listening on the port 2836 is already connected:
qdbd --peer=192.168.1.1:2836
- --id=<id string>#
Sets the node ID.
- Argument
A string representing the ID of the node. This can be a 256-bit number in hexadecimal form, the value “random”, or the indexed syntax. This value may not be zero (0-0-0-0). You are strongly encouraged to use the indexed syntax. See Clustering.
- Default value
Unique random value.
- Example
Set the node ID to 1-a-2-b:
qdbd --id=1-a-2-b
Set the node ID to a random value:
qdbd --id=random
Set the node ID to the ideal value for the third node of a cluster totalling 8 nodes:
qdbd --id=3/8
Warning
Having two nodes with the same ID on the ring leads to undefined behaviour. By default the daemon generates an ID that is guaranteed to be unique on any given ring. Only modify the node ID if the topology of the ring is unsatisfactory and you are certain no two node IDs are the same.
- -l <path>, --log-directory=<path>#
Logs in the designated directory.
- Argument
A string representing a path to a directory where log files will be created.
- Example
Log in /var/log/qdb:
qdbd --log-directory=/var/log/qdb
- --log-syslog#
UNIX only, activates logging to syslog.
- --log-level=<value>#
Specifies the log verbosity.
- Argument
A string representing the amount of logging required. Must be one of:
detailed (most output)
debug
info
warning
error
panic (least output)
- Default value
info
- Example
Request debug level logging:
qdbd --log-level=debug
- --log-flush-interval=<delay>#
How frequently log messages are flushed to output, in milliseconds.
- Argument
An integer representing the number of milliseconds between each flush.
- Default value
3,000
- Example
Flush the log every minute:
qdbd --log-flush-interval=60000
- --rocksdb-max-open-files=<count>#
Sets the maximum number of open files for the RocksDB persistence layer. The default value is configured in a way that it will not stress the operating system, at a potential performance cost. Increasing the default value is recommended for production servers. This only applies to the RocksDB persistence layer.
- Argument
An integer representing the maximum number of open files at any point in time.
- Default value
256
- Example
Increase the number to 10,240:
qdbd --rocksdb-max-open-files=10240
- --rocksdb-max-bytes=<size-in-bytes>#
Sets the maximum amount of disk usage for each node’s database in bytes. Any write operation that would overflow the database will return a qdb_e_system_remote error stating “disk full”. The write-ahead log is not counted in the disk usage.
Due to excessive meta-data or uncompressed db entries, the actual database size may exceed this set value by up to 20%.
Only applies to the RocksDB storage engine.
- Argument
An integer representing the maximum size of the database on disk in bytes. The minimum value is 134,217,728 (128 MB).
- Default value
0 (disabled)
- Example A
To limit the database size on each node to 12 Terabytes:
\[\begin{split}\text{Max Depot Size Value} &= \text{12 Terabytes} \: * \: \frac{1024^4 \: \text{Bytes}}{\text{1 Terabyte}}\\ &= \text{13194139533312 Bytes}\end{split}\]
And thus the command:
qdbd --rocksdb-max-bytes=13194139533312
This database may expand out to approximately 14.4 Terabytes due to meta-data and uncompressed db entries.
- Example B
This example will limit the database size to ensure it fits within 1 Terabyte of free space. Since limiting to a specific overhead is important in this example, the filesystem cluster size is also taken into account; the default for most filesystems is 4096 bytes.
\[\begin{split}\text{Max Depot Size Value} &= \text{1099511627776 Bytes} - \text{(1099511627776 Bytes} \: * \: 0.2 \text{)} - \text{Cluster Size of 4096} \\ &= \text{1099511627776 Bytes} - \text{219902325555.2 Bytes} - \text{4096 Bytes} \\ &= \text{879609298124.8 Bytes}\end{split}\]
And thus the command, truncating down to an integer:
qdbd --rocksdb-max-bytes=879609298124
This database should not exceed 1 Terabyte.
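The two calculations above can be reproduced with a short script. This is a sketch of the arithmetic only; the function names are hypothetical, and the 20% overhead and 4096-byte cluster size are the figures used in the examples.

```python
TIB = 1024 ** 4  # bytes in one terabyte, as counted in the examples above


def max_bytes_for_size(terabytes):
    """Example A: convert a size in terabytes to a byte count."""
    return terabytes * TIB


def max_bytes_within(free_bytes, cluster_size=4096):
    """Example B: reserve 20% overhead plus one filesystem cluster.

    Integer arithmetic truncates the result down to a whole number of bytes.
    """
    return free_bytes * 80 // 100 - cluster_size


print(max_bytes_for_size(12))    # 13194139533312
print(max_bytes_within(1 * TIB)) # 879609298124
```

Either result can be passed directly to --rocksdb-max-bytes.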
Note
The --rocksdb-max-bytes argument is only available with quasardb 1.1.2 or higher.
Note
Using a max depot size may cause a slight performance penalty on writes.
- --limiter-max-bytes-hard=<value>#
The hard limit after which the system will take drastic measures to lower memory usage. When the hard limit is reached, processing is temporarily stopped, every entry is evicted from memory, the file cache is flushed, and every buffer is purged to free memory.
The total process memory usage is measured, including system caches.
- Argument
An integer representing the hard limit, in bytes.
- Default value
0 (automatic, 4/5 of the available physical memory).
- Example
To allow only 100 KiB of entries:
qdbd --limiter-max-bytes-hard=102400
To allow up to 8 GiB:
qdbd --limiter-max-bytes-hard=8589934592
Note
This value has a huge impact on performance.
- --limiter-max-bytes-soft=<value>#
The limit after which the system will start to evict entries. When the soft limit is reached, queries will be throttled down to let the system clear memory. The soft limit must always be lower than the hard limit.
The total process memory usage is measured, including system caches.
- Argument
An integer representing the soft limit, in bytes.
- Default value
0 (automatic, 3/4 of the hard limit).
- Example
To allow only 100 KiB of entries:
qdbd --limiter-max-bytes-soft=102400
To allow up to 8 GiB:
qdbd --limiter-max-bytes-soft=8589934592
Note
This value has a huge impact on performance.
- -r <path>, --rocksdb-root=<path>#
Specifies the directory where data will be persisted when using the RocksDB storage engine for the node where the process has been launched.
- Argument
A string representing a full path to the directory where data will be persisted.
- Default value
The “db” subdirectory relative to the current working directory.
- Example
Persist data in /var/quasardb/db
qdbd --rocksdb-root=/var/quasardb/db
Note
Although this parameter is global, the directory refers to the local node of each instance.
- --security=<boolean>#
Enables or disables cluster security.
- Argument
A boolean specifying whether or not security should be enabled.
- Default value
True
- Example
To disable security completely:
qdbd --security=false
Note
To work, security needs a cluster private key and a users list.
- --cluster-private-file=<path>#
A path to the cluster private key file.
- Argument
A string representing a full path to the cluster private key file.
- Example
Use the file /etc/qdbd/cluster_private.key:
qdbd --cluster-private-file=/etc/qdbd/cluster_private.key
Note
A cluster private key file is required for security to work (see quasardb cluster key generator).
- --user-list=<path>#
A path to the user list containing the user names and their respective public keys in JSON format.
- Example
Use the file /etc/qdbd/users.cfg:
qdbd --user-list=/etc/qdbd/users.cfg
Note
A users list is required for security to work (see quasardb user adder).
Global#
- --replication=<factor>#
Specifies the replication factor (global parameter). For more information, see Data replication.
- Argument
A positive integer between 1 and 4 (inclusive) specifying the replication factor. If the integer is higher than the number of nodes in the cluster, it will be automatically reduced to the cluster size.
- Default value
1 (replication disabled)
- Example
Have one additional copy of every entry in the cluster (two copies in total):
qdbd --replication=2
- --storage_engine=<engine>#
Specifies the storage engine.
- Argument
A string representing the storage engine to use. Can be either ‘transient’ or ‘rocksdb’.
- Default value
“rocksdb”
- Example
Use the transient persistence engine:
qdbd --storage_engine=transient
Config File Reference#
As of quasardb version 1.1.3, the qdbd daemon can read its parameters from a JSON configuration file provided by the -c command-line argument. Using a configuration file is recommended.
Some things to note when working with a configuration file:
If a configuration file is specified, all other command-line options will be ignored. Only values from the configuration file will be used.
The configuration file must be valid JSON in ASCII format.
If a key or value is missing from the configuration file or malformed, the default value will be used.
If a key or value is unknown, it will be ignored.
The default configuration file is shown below:
{
    "local": {
        "depot": {
            "rocksdb": {
                "sync_every_write": false,
                "root": "db",
                "max_bytes": 0,
                "storage_warning_level": 90,
                "storage_warning_interval": 3600000,
                "disable_wal": false,
                "direct_read": false,
                "direct_write": false,
                "max_total_wal_size": 1073741824,
                "metadata_mem_budget": 268435456,
                "data_cache": 134217728,
                "threads": 4,
                "hi_threads": 2,
                "max_open_files": 0,
                "persistent_cache_path": "",
                "persistent_cache_size": 0,
                "persistent_cache_nvme_optimization": false
            },
            "async_ts": {
                "pipelines": 1,
                "pipeline_buffer_size": 1073741824,
                "pipeline_queue_length": 1000000,
                "flush_deadline": 4000
            }
        },
        "user": {
            "license_file": "",
            "license_key": "",
            "daemon": false
        },
        "limiter": {
            "max_resident_entries": 0,
            "max_bytes_soft": 0,
            "max_bytes_hard": 0
        },
        "logger": {
            "log_level": 2,
            "flush_interval": 3000,
            "log_directory": "",
            "log_to_console": false,
            "log_to_syslog": false
        },
        "network": {
            "server_sessions": 64,
            "partitions_count": 4,
            "idle_timeout": 600000,
            "client_timeout": 60000,
            "max_in_buffer_size": 134217728,
            "max_out_buffer_size": 134217728,
            "listen_on": "127.0.0.1:2836",
            "advertise_as": "127.0.0.1:2836",
            "profile_performance": false
        },
        "chord": {
            "node_id": "0-0-0-0",
            "no_stabilization": false,
            "bootstrapping_peers": [],
            "min_stabilization_interval": 100,
            "max_stabilization_interval": 60000
        }
    },
    "global": {
        "cluster": {
            "storage_engine": "rocksdb",
            "enable_statistics": true,
            "statistics_refresh_interval": 5000,
            "replication_factor": 1,
            "max_versions": 3,
            "max_transaction_duration": 15000,
            "acl_cache_duration": 60000,
            "acl_cache_size": 100000,
            "persisted_firehose": "$qdb.firehose"
        },
        "security": {
            "enable_stop": false,
            "enable_purge_all": false,
            "enabled": false,
            "encrypt_traffic": false,
            "cluster_private_file": "",
            "user_list": ""
        }
    }
}
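The option names in the reference below use a “::” notation that mirrors the nesting of the JSON file. As a minimal sketch, a configuration loaded as a dictionary can be read and edited with that notation; the helper names get_option and set_option are hypothetical and not part of quasardb, and the inline JSON stands in for a file produced by qdbd --gen-config.

```python
import json


def get_option(config, path):
    """Read a value addressed as 'section::subsection::key'."""
    node = config
    for name in path.split("::"):
        node = node[name]
    return node


def set_option(config, path, value):
    """Set a value addressed as 'section::subsection::key', creating sections as needed."""
    *parents, leaf = path.split("::")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value


# A small excerpt standing in for a full generated configuration file.
config = json.loads('{"local": {"network": {"listen_on": "127.0.0.1:2836"}}}')

set_option(config, "local::network::listen_on", "192.168.1.2:2836")
set_option(config, "local::depot::rocksdb::root", "/var/quasardb/db")

print(get_option(config, "local::network::listen_on"))
print(json.dumps(config, indent=4))
```

The edited dictionary can then be written back with json.dump and passed to the daemon with --config.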
- local::depot::rocksdb::sync_every_write
A boolean representing whether or not the node should sync to disk every write. This option has a huge negative impact on performance, especially on high latency media and adds only marginal safety compared to the sync option. Disabled by default. This setting only applies to RocksDB.
- local::depot::rocksdb::root
A string representing the relative or absolute path to the directory where data will be stored. Specifying this string will enable RocksDB as the persistence layer.
- local::depot::rocksdb::max_bytes
An integer representing the maximum amount of disk usage for each node’s database in bytes. Any write operation that would overflow the database will return a qdb_e_system_remote error stating “disk full”.
Due to excessive meta-data or uncompressed db entries, the actual database size may exceed this set value by up to 20%.
See --rocksdb-max-bytes for more details and examples to calculate the max_bytes value.
- local::depot::storage_warning_level
An integer between 50 and 100 (inclusive) specifying the percentage of disk usage at which a warning about depleting disk space will be emitted. See also local::depot::storage_warning_interval.
- local::depot::storage_warning_interval
An integer representing how often quasardb will emit a warning about depleting disk space, in milliseconds. See also local::depot::storage_warning_level.
- local::depot::rocksdb::disable_wal
A boolean representing whether or not the write-ahead log should be disabled. When you write data to quasardb, it is added to a buffer which is backed by a disk file called the write-ahead log. In case of failure, quasardb is able to recover by reading from the write-ahead log. Applications that are looking for maximum write performance may want to disable the write-ahead log. However, disabling the write-ahead log means that you can lose data should a failure occur before the buffer is flushed into the database. Set to false by default (that is, by default, buffers are backed by disk).
- local::depot::rocksdb::direct_read
A boolean representing whether or not reads from the disk should be direct (i.e. bypass OS buffers). This setting has an impact on performance and memory usage depending on the hardware configuration. It is generally advised not to use direct reads with spinning disks.
- local::depot::rocksdb::direct_write
This option currently does not have any effect.
- local::depot::rocksdb::max_total_wal_size
The maximum size, in bytes, of the write-ahead log.
- local::depot::rocksdb::metadata_mem_budget
An integer representing the approximate amount of memory (RAM) that should be dedicated to the management of metadata.
- local::depot::rocksdb::data_cache
An integer representing the approximate amount of memory (RAM) that should be used for caching data blocks. This setting only applies to RocksDB.
- local::depot::rocksdb::threads
An integer representing the number of threads dedicated to the persistence layer.
- local::depot::rocksdb::hi_threads
An integer representing the number of high-priority threads dedicated to the persistence layer, in addition to the normal priority threads.
- local::depot::rocksdb::max_open_files
An integer representing the maximum number of files to keep open at a time. 0 for auto-detection.
- local::depot::rocksdb::persistent_cache_path
A string representing the path to the persistent read cache. An empty value means the persistent read cache is disabled.
- local::depot::rocksdb::persistent_cache_size
An integer representing the maximum size of the persistent read cache. This value cannot be lower than 10 MiB.
- local::depot::rocksdb::persistent_cache_nvme_optimization
A boolean specifying whether the persistent read cache should be optimized for NVMe or not. Disabled by default.
- local::depot::async_ts::pipelines
The number of asynchronous time series pipelines (see Asynchronous time series inserter).
- local::depot::async_ts::pipeline_buffer_size
The maximum amount of data in a pipeline buffer before requests are refused (see Asynchronous time series inserter).
- local::depot::async_ts::pipeline_queue_length
The maximum count of requests in the pipeline buffer before requests are refused (see Asynchronous time series inserter).
- local::depot::async_ts::flush_deadline
The maximum amount of time before the pipeline buffer gets inspected and flushed to disk (see Asynchronous time series inserter).
- local::user::license_file
A string representing the relative or absolute path to the license file. Providing an empty string to both license_key and license_file runs quasardb in evaluation mode.
- local::user::license_key
A string representing the license. Providing an empty string to both license_key and license_file runs quasardb in evaluation mode.
- local::user::daemon
A boolean value representing whether or not the quasardb daemon should daemonize on launch.
- local::limiter::max_bytes_soft
The limit after which the system will start to evict entries. When the soft limit is reached, queries will be throttled down to let the system clear memory. The soft limit must always be lower than the hard limit. The total process memory usage is measured, including system caches.
- local::limiter::max_bytes_hard
The hard limit after which the system will take drastic measures to lower memory usage. When the hard limit is reached, processing is temporarily stopped, every entry is evicted from memory, the file cache is flushed, and every buffer is purged to free memory.
- local::limiter::max_trim_queue_length
An integer representing the total maximum number of updated entries that may be queued for asynchronous trimming. Trimming is a background process that optimizes disk space.
- local::logger::log_level
An integer representing the verbosity of the log output. Acceptable values are:
0 = detailed (most output)
1 = debug
2 = info (default)
3 = warning
4 = error
5 = panic (least output)
- local::logger::flush_interval
An integer representing how frequently quasardb log messages should be flushed to the log locations, in milliseconds. Default value: 3,000 ms.
- local::logger::log_directory
A string representing the relative or absolute path to the directory where log files will be created.
- local::logger::log_to_console
A boolean value representing whether or not the quasardb daemon should log to the console it was spawned from. This value is ignored if local::user::daemon is true.
- local::logger::log_to_syslog
A boolean value representing whether or not the quasardb daemon should log to the syslog.
- local::network::server_sessions
An integer representing the number of server sessions the quasardb daemon can provide.
- local::network::partitions_count
An integer representing the number of partitions, or worker threads, quasardb can spawn to perform operations. The ideal number of partitions is close to the number of physical cores your server has. If set to 0, the daemon will choose the best compromise it can.
- local::network::idle_timeout
An integer representing the number of milliseconds after which an inactive session will be considered for termination.
- local::network::client_timeout
An integer representing the number of milliseconds after which a client session will be considered for termination.
- local::network::max_in_buffer_size
The maximum input size that will be allowed by the server in a single message. Any incoming message larger than this value may be dropped by the server.
- local::network::max_out_buffer_size
The maximum output size that will be allowed by the server in a single message. Any outgoing message larger than this value may be dropped by the server.
- local::network::max_ts_buffered_queue_length
The maximum length of the asynchronous write queue. When doing asynchronous updates to a timeseries, updates are queued to be processed later. If the length of the queue exceeds this value, updates may be refused by the server.
- local::network::max_ts_async_writer_interval
The interval, in milliseconds, at which the asynchronous timeseries updates queue will be processed.
- local::network::listen_on
A string representing an address and port the daemon should listen on. The string can be a host name or an IP address. Must have name or IP separated from port with a colon.
- local::network::advertise_as
A string representing the address and port the daemon will advertise itself as to other daemons. This setting is mainly used for cloud and container deployments. Must have name or IP separated from port with a colon.
- local::network::performance
Enable performance metrics collection and return the data to client for every call. Disabled by default for performance and security reasons.
- local::chord::node_id
A string representing the ID of the node. This can be a 256-bit number in hexadecimal form, the value “random”, or the indexed syntax. This value may not be zero (0-0-0-0). If left at the default of 0-0-0-0, the daemon will assign a random node ID at startup. You are strongly encouraged to use the indexed syntax. See Clustering.
- local::chord::no_stabilization
A read-only boolean value representing whether or not this node should stabilize upon startup. Even if set to true, stabilization will still occur.
- local::chord::min_stabilization_interval
The minimum wait interval between two stabilizations, in milliseconds. The default value is 100 ms; it rarely needs to be changed. This value cannot be zero.
- local::chord::max_stabilization_interval
The maximum wait interval between two stabilizations, in milliseconds. The disappearance of a node will take at least that amount of time to be noticed. The default value is 60,000 ms (one minute). This value must be greater than the minimum stabilization interval, and cannot be lower than 10 ms.
- local::chord::bootstrapping_peers
An array of strings representing other nodes in the cluster which will bootstrap this node upon startup. The string can be a host name or an IP address. Must have name or IP separated from port with a colon.
- global::cluster::storage_engine
A string representing the storage engine to use for persistence. Must be either “rocksdb” or “transient”. The default is “rocksdb”.
- global::cluster::enable_statistics
A boolean setting whether or not the cluster should gather statistics. Small performance impact. Enabled by default.
- global::cluster::statistics_refresh_interval
An integer representing the refresh interval of the statistics, in milliseconds. May not be below 1,000 milliseconds (1 second).
- global::cluster::replication_factor
An integer between 1 and 4 (inclusive) specifying the replication factor for the cluster. A higher value means more copies of each entry are kept across the nodes of the cluster.
- global::cluster::max_versions
An integer representing the maximum number of copies the cluster keeps for transaction history. If an entry has more versions than this value, the oldest versions are garbage collected.
- global::cluster::max_transaction_duration
An integer representing the maximum guaranteed duration of a transaction, in milliseconds. Transactions lasting longer than this interval will be rolled back. Default value: 15,000 ms.
- global::cluster::acl_cache_duration
An integer representing the maximum guaranteed duration of cached ACL information. Nodes may cache ACL information to improve performance.
- global::security::enable_stop
Allows a node to be remotely stopped via an API call. False by default.
- global::security::enable_purge_all
Allows the cluster to be remotely purged via an API call. False by default.
- global::security::enabled
Require cryptographically strong authentication to connect to the cluster. True by default.
- global::security::encrypt_traffic
In addition to requiring authentication, encrypt all network traffic. This setting can have a negative performance impact. False by default.
- global::security::cluster_private_file
Specifies the path to the cluster private key file (see quasardb cluster key generator). This file must be accessible to the daemon only.
- global::security::user_list
Specifies the path to the users list (see quasardb user adder). This file must be writable by the administrator only.
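Several of the options above have documented ranges or must agree with one another. The following Python sketch checks a handful of them before a configuration is deployed. It assumes the configuration has been loaded into nested dictionaries whose keys follow the `::` paths used above; that layout, and the helper names `get_path` and `check_config`, are illustrative assumptions and not part of quasardb.

```python
MIB = 1024 * 1024


def get_path(config, path):
    """Resolve 'a::b::c' against nested dicts; None when any key is absent."""
    node = config
    for key in path.split("::"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def check_config(config):
    """Return a list of violations of the ranges documented above."""
    problems = []

    def check(path, is_valid, rule):
        # Options that are absent are skipped, not reported.
        value = get_path(config, path)
        if value is not None and not is_valid(value):
            problems.append(f"{path} {rule} (got {value!r})")

    check("local::depot::storage_warning_level",
          lambda v: 50 <= v <= 100, "must be between 50 and 100")
    check("local::logger::log_level",
          lambda v: 0 <= v <= 5, "must be between 0 and 5")
    check("local::chord::min_stabilization_interval",
          lambda v: v > 0, "cannot be zero")
    check("local::chord::max_stabilization_interval",
          lambda v: v >= 10, "cannot be lower than 10 ms")
    check("global::cluster::statistics_refresh_interval",
          lambda v: v >= 1000, "may not be below 1,000 ms")
    check("global::cluster::replication_factor",
          lambda v: 1 <= v <= 4, "must be between 1 and 4")

    # The size limit only matters when a persistent cache path is set.
    if get_path(config, "local::depot::rocksdb::persistent_cache_path"):
        check("local::depot::rocksdb::persistent_cache_size",
              lambda v: v >= 10 * MIB, "cannot be lower than 10 MiB")

    # Assumption: a zero or missing limit means "automatic", so only
    # explicitly set, non-zero limits are compared.
    soft = get_path(config, "local::limiter::max_bytes_soft")
    hard = get_path(config, "local::limiter::max_bytes_hard")
    if soft and hard and soft >= hard:
        problems.append("local::limiter::max_bytes_soft must be lower "
                        "than max_bytes_hard")

    lo = get_path(config, "local::chord::min_stabilization_interval")
    hi = get_path(config, "local::chord::max_stabilization_interval")
    if lo is not None and hi is not None and hi <= lo:
        problems.append("local::chord::max_stabilization_interval must be "
                        "greater than min_stabilization_interval")

    return problems
```

An empty list means none of the checked rules were violated; it does not mean the daemon will accept the file, since only the constraints stated in this section are covered.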
Performance considerations#
Persistence#
When using RocksDB as a persistence layer, a number of parameters must be set properly. Default values were engineered to be “safe” to use without any system-wide tuning. For production environments, it’s generally advised to change these default values.
Important settings to look at:
- local::depot::disable_wal: Disabling the WAL means that fresh updates will be kept in memory and not written to disk. This can greatly increase performance at the cost of potential data loss.
- local::depot::max_open_files: The default value of 0 asks QuasarDB to use as many open files as is safe on the current system. If the automatic value isn’t satisfactory, ensure that your file descriptor limit is high enough. If the value is still incorrect, you can specify a value manually. A low value will make RocksDB open and close files more often than necessary.
- local::depot::sync_every_write: Disabled by default. Enabling it will greatly reduce performance for a small reliability benefit.
It is not recommended to touch the caching and threading parameters without the assistance of a QuasarDB Solutions Architect.
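Since the open files setting depends on the file descriptor limit of the process, it can help to inspect that limit on the host before choosing a manual value. The sketch below uses Python’s standard `resource` module (Unix only). The budget rule in `fits_descriptor_budget`, including the `reserve` margin, is a rough illustrative assumption and not a formula published by QuasarDB.

```python
import resource  # Unix-only standard library module


def open_file_limits():
    """Return the (soft, hard) open file descriptor limits of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fits_descriptor_budget(max_open_files, server_sessions, soft_limit,
                           reserve=64):
    """Rough check that database files plus client sockets fit the limit.

    Assumption: each open database file and each client session uses one
    descriptor, and `reserve` descriptors are kept for logs and internals.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return max_open_files + server_sessions + reserve <= soft_limit


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"soft limit: {soft}, hard limit: {hard}")
    print("256 files + 64 sessions fit:",
          fits_descriptor_budget(256, 64, soft))
```

Run this as the user that launches the daemon, since descriptor limits are usually set per user or per service.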
Network#
The default network settings ensure QuasarDB will not exhaust sockets and can be used for testing purposes. In production environments where the connections can be in the realm of thousands, the default settings will not be adequate.
Important settings to look at:
- local::network::server_sessions: The default value of 64 will be insufficient for serious production use. Provided you have increased the descriptor limits, this value can be greater than 1,024. Each partition preallocates its sessions, meaning high values can result in high memory usage as well.
- local::network::partitions_count: The number of independent processing partitions. Having more partitions than cores is generally counter-productive. Allocating half of your cores is a safe start, as QuasarDB will consume additional cores for persistence and upkeep tasks.
- local::network::idle_timeout: The default value of 10 minutes means that a daemon guarantees it will not initiate termination of a client connection before 10 minutes of inactivity have elapsed. This setting may be too high if the daemon has to handle misbehaving applications that do not properly terminate the connection, resulting in premature session exhaustion.
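The “half of your cores” starting point for the partitions count can be computed as in the following Python sketch. The function name `suggested_partitions` is hypothetical. Note that `os.cpu_count()` reports logical cores, which may be higher than the number of physical cores on machines with simultaneous multithreading, so treat the result as a starting value to adjust.

```python
import os


def suggested_partitions(core_count):
    """Half of the given cores, but never fewer than one partition."""
    return max(1, core_count // 2)


if __name__ == "__main__":
    # cpu_count() can return None when the count cannot be determined.
    cores = os.cpu_count() or 1
    print(f"{cores} logical cores -> start with "
          f"{suggested_partitions(cores)} partitions")
```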
Memory management#
QuasarDB manages memory paging for you, meaning it will load data from disk if it is missing from memory and evict it from memory if usage exceeds the configured thresholds. Memory usage measurement is always approximate, which is why QuasarDB has an additional parameter that bases eviction on the number of entries in memory.
If the eviction thresholds are too low, QuasarDB will not properly use all the memory available and may spend a lot of time paging in and out entries.
If the eviction thresholds are too high, system memory may be exhausted, which at best results in dramatic performance loss and at worst in operating system failure.
Important settings to look at:
- local::limiter::max_bytes: The maximum memory usage the daemon is allowed to have. When this setting is zero, QuasarDB will use half of the available memory. In the community edition, this setting is capped to 4 GiB. Ensure the baseline memory usage of QuasarDB is significantly lower than this value, otherwise QuasarDB will keep evicting entries.
- local::limiter::max_resident_entries: The maximum number of entries allowed in memory. When this value is exceeded, QuasarDB will evict entries until the count is down to half of this setting. When this setting is zero, QuasarDB will use max_bytes divided by 1,024. This setting exists to protect QuasarDB and the OS should QuasarDB, for some reason, be unable to accurately measure memory usage.
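The defaults described above can be worked through with a short Python sketch. The function names are hypothetical, and `available_memory` is supplied by the caller because this section does not specify how QuasarDB measures available memory.

```python
GIB = 1024 ** 3


def effective_max_bytes(max_bytes, available_memory, community_edition=False):
    """Zero means half of the available memory; community edition caps at 4 GiB."""
    limit = max_bytes if max_bytes > 0 else available_memory // 2
    if community_edition:
        limit = min(limit, 4 * GIB)
    return limit


def effective_max_resident_entries(max_resident_entries, max_bytes):
    """Zero means max_bytes divided by 1,024."""
    if max_resident_entries > 0:
        return max_resident_entries
    return max_bytes // 1024


def eviction_target(max_resident_entries):
    """Entry count that eviction aims for once the maximum is exceeded."""
    return max_resident_entries // 2


if __name__ == "__main__":
    limit = effective_max_bytes(0, 64 * GIB)
    entries = effective_max_resident_entries(0, limit)
    print(f"max_bytes: {limit}, max_resident_entries: {entries}, "
          f"evict down to: {eviction_target(entries)}")
```

On a host where 64 GiB is available and both settings are left at zero, this gives a 32 GiB byte limit and 33,554,432 resident entries, with eviction bringing the count down to 16,777,216.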
Do not hesitate to contact your QuasarDB Solutions Architect should you be unsure of your memory management settings.
Security#
Authentication has little to no impact on server performance; however, encrypting the traffic will cap the bandwidth to the encryption speed of your server. QuasarDB uses AEGIS-256 for traffic encryption. On 10 Gbit networks this can result in an observable drop in maximum transfer bandwidth.