4. Routine maintenance tasks#
QuasarDB requires certain tasks to be performed regularly to maintain optimal performance. The tasks discussed in this chapter are required, but they are repetitive in nature and can easily be automated using standard tools.
4.1. Graceful shutdown#
QuasarDB ensures data consistency by using a Write-Ahead Log (WAL) <https://en.wikipedia.org/wiki/Write-ahead_logging> on top of its data store. This ensures that, in the event of a fatal system crash, a node can be restored to a known good state, thus achieving single-node durability.
This recovery procedure can be expensive, may trigger a full database repair on the next boot, and should be avoided in production whenever possible.
Additionally, forcefully disconnecting a node from a cluster will lead to timeouts on clients, and a delayed cluster rebalancing.
To work around these limitations, QuasarDB supports a graceful shutdown. On UNIX-like systems, this can be achieved by sending the QuasarDB daemon the SIGTERM or SIGQUIT signal. This will ensure QuasarDB gracefully leaves the cluster, and does the necessary housekeeping to ensure the data store is in a consistent state.
If you have installed QuasarDB through one of our packages, our packaging scripts typically take care of this. However, if you are using custom scripts to start/stop QuasarDB, please ensure a SIGTERM is sent before termination. For example, with systemd, you will want to ensure you use an infinite termination timeout as follows:
[Service]
Type=simple
User=qdb
Group=qdb
ExecStart=/usr/bin/qdbd -c /etc/qdb/qdbd.conf
Restart=on-failure
LimitNOFILE=65536
TimeoutStopSec=infinity
Here we explicitly tell systemd to wait as long as it takes to have QuasarDB terminate gracefully.
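Outside of systemd, the same graceful stop can be scripted by hand. The sketch below sends SIGTERM to a process and then waits, without a timeout, until it has exited (mirroring TimeoutStopSec=infinity). The graceful_stop helper name is our own, and a background sleep stands in for the qdbd process so the script is self-contained:

```shell
#!/bin/sh
# graceful_stop: hypothetical helper that sends SIGTERM and then
# polls until the process has fully exited, however long that takes.
graceful_stop() {
  pid="$1"
  kill -TERM "$pid" 2>/dev/null
  while kill -0 "$pid" 2>/dev/null; do
    sleep 1
  done
}

# Demonstration with a stand-in background process; in production
# you would pass the daemon's pid, e.g. graceful_stop "$(pidof qdbd)".
sleep 60 &
graceful_stop $!
echo "process stopped"
```

The polling loop is what makes this usable for a non-child process such as a daemon, where the shell built-in wait would not apply.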
4.2. Cluster trimming#
QuasarDB requires periodic trimming of the dataset. When trimming, it performs several operations:
it compacts the Write-Ahead Log and merges it into the database;
it removes MVCC-transaction metadata;
it reclaims any unused space.
While these tasks are also performed automatically by the database, we strongly recommend periodic explicit trimming, for example using a daily or weekly cron job. As trimming can be an expensive operation, we recommend avoiding it during peak hours.
You can issue a cluster trim using the QuasarDB shell:
$ qdbsh
qdbsh > cluster_trim
If you observe the logs of the database daemon, you should see something like this:
2019.11.05-16.37.42.953826008 3339 3438 info trimming all 412,160 entries (325,396 versions in memory)
2019.11.05-16.37.57.683002389 3339 3438 debug successfully cleaned allocator buffers
2019.11.05-16.37.57.683022210 3339 3438 info 434,001/325,594 entries were trimmed (for a size of 14.91 GiB). we now have 428,327 entries in total and 309,597 entries in memory for a total size of 60.238 GiB (16.0 EiB in memory)
2019.11.05-16.37.57.713021310 3339 3438 info requesting complete database compaction
2019.11.05-16.37.57.720022210 3339 3438 info complete database compaction took 47 ms
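The explicit trim described above can be scheduled with cron. The entry below is an illustrative sketch: it assumes the daemon runs as the qdb user and that qdbsh accepts commands on standard input; verify both against your installation before deploying.

```
# /etc/cron.d/qdb-trim (hypothetical file): trim every Sunday at 03:00,
# outside peak hours, as recommended above.
0 3 * * 0 qdb echo "cluster_trim" | qdbsh
```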
4.3. Log rotation#
QuasarDB does not support log rotation out of the box, and its log files will grow over time. We recommend:
weekly rotation of all log files stored in /var/log/qdb;
removal of all logs older than 7 days.
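This policy maps naturally onto logrotate. The following is a sketch; the file path and log file pattern are assumptions, so adjust them to match your logger configuration:

```
# /etc/logrotate.d/qdb (hypothetical path): rotate weekly, and
# remove rotated logs older than 7 days.
/var/log/qdb/*.log {
    weekly
    rotate 1
    maxage 7
    compress
    missingok
    notifempty
}
```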
Additionally, you might want to set up a centralized log aggregator to ease the management of your logs.
4.4. Backup Guide#
Backups ensure you can recover from operator errors, hardware failures, and other disasters. How often you should create backups depends on how much data you can afford to lose in the event of a failure.
4.4.1. Overview#
QuasarDB supports two complementary backup workflows:
Local incremental backups rely on RocksDB’s direct backup subsystem to maintain multiple point-in-time snapshots on a local filesystem.
S3 incremental backups stream the current dataset to Amazon S3, keeping only the most recent snapshot for fast redeployment in the cloud.
4.4.2. Choosing a backup workflow#
| Aspect | Local backups (direct_backup) | S3 backups (direct_s3_backup) |
|---|---|---|
| Retention | Incremental, keeps the last N snapshots | Incremental, keeps a single snapshot |
| Restore tool | qdb_dbtool | Copy snapshot to a new bucket/path and start the node on it |
| Abort support | direct_backup_abort | Not supported; allow the operation to finish |
| Typical use case | On-premises or attached storage with multiple restore points | Cloud-native redeployments with minimal operational overhead |
| Node availability | Runs while the node is online; stop only for restore | Runs while the node is online |
4.4.3. Local backups (direct_backup)#
Local backups rely on RocksDB’s direct backup component. They are incremental, keep track of the last N versions, and provide true point-in-time recovery on the filesystem where they are stored.
4.4.3.1. Workflow summary#
Schedule backups during off-hours to reduce load on the cluster.
Issue a cluster trim to compact metadata and the WAL.
From qdbsh (or via the management API), trigger a backup:
direct_backup /path/to/backups full for the initial snapshot. The destination directory will be created or overwritten.
direct_backup /path/to/backups checkpoint for subsequent incremental snapshots.
direct_backup /path/to/backups checkpoint_trim <count> retains only the most recent <count> snapshots, automatically pruning older versions.
Optionally archive or copy completed snapshots to secondary storage once the command succeeds.
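Put together, a typical session looks like the following; the destination path and retention count are example values:

```
$ qdbsh
qdbsh > direct_backup /var/lib/qdb-backups full
qdbsh > direct_backup /var/lib/qdb-backups checkpoint
qdbsh > direct_backup /var/lib/qdb-backups checkpoint_trim 7
```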
4.4.3.2. Monitoring and aborting local backups#
Use direct_backup_get_progress to monitor how much data has been written during a backup. If you need to cancel a running operation, issue direct_backup_abort; the backup directory will be left in an undefined state, so remove it before re-running the job.
4.4.3.3. Restoring local backups with qdb_dbtool#
Restoration is performed offline to guarantee consistency:
Gracefully stop the target node using SIGTERM (for example: systemctl stop qdbd).
Run the database tool to restore the chosen snapshot into the data directory:
$ qdb_dbtool --database /var/lib/qdb --restore /path/to/backups/latest
Adjust --database to match your storage configuration, and point --restore to the snapshot you want to recover.
Start the daemon again once the command completes.
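On a systemd-managed node, the full restore sequence therefore looks like this (the paths are example values, to be adjusted as described above):

```
$ systemctl stop qdbd
$ qdb_dbtool --database /var/lib/qdb --restore /path/to/backups/latest
$ systemctl start qdbd
```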
4.4.4. S3 backups (direct_s3_backup)#
S3 backups offer a lighter-weight workflow for clusters configured with Amazon S3. They are incremental under the hood, but only the most recent snapshot is retained: starting a new backup replaces the previous one.
4.4.4.1. Workflow summary#
Issue a cluster trim before initiating the backup to minimize redundant data.
Run the command:
$ direct_s3_backup bucket path_prefix [region [thread_count [flush]]]
bucket, path_prefix, region: identify where the snapshot is stored.
thread_count (default 1) controls parallelism; higher values speed up uploads at the cost of memory usage.
flush (0 or 1) determines whether to flush in-memory data before starting the backup.
Monitor progress with direct_s3_backup_get_progress. This reports the total number of files and how many have been uploaded.
Only one direct_s3_backup can run per node at a time, and it cannot be aborted mid-flight. Wait for completion before starting a new backup.
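For example, backing up to an assumed bucket qdb-backups under the prefix node-1, in us-east-1, with four upload threads and a flush beforehand (all values illustrative):

```
$ qdbsh
qdbsh > direct_s3_backup qdb-backups node-1 us-east-1 4 1
qdbsh > direct_s3_backup_get_progress
```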
4.4.4.2. Restoring S3 backups#
S3 backups are effectively copies of the live dataset. To “restore” one:
Copy the saved snapshot to a new S3 location (for example, another bucket or prefix) or download it to local storage.
Configure the node’s storage settings to point to the restored data (or sync it to a fresh data directory).
Start the daemon. Because only a single snapshot is retained, there is no multi-version selection step.
This workflow is intentionally simpler than the local backup process; use it when you only need the latest consistent image of each node.
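The copy step can be performed with the AWS CLI; the bucket names and prefixes below are illustrative:

```
$ aws s3 sync s3://qdb-backups/node-1 s3://qdb-restore/node-1
```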