4. Routine maintenance tasks#

QuasarDB requires certain tasks to be performed regularly to maintain optimal performance. The tasks discussed in this chapter are required, but they are repetitive in nature and can easily be automated using standard tools.

4.1. Graceful shutdown#

QuasarDB ensures data consistency by using a Write-Ahead Log (WAL) <https://en.wikipedia.org/wiki/Write-ahead_logging> on top of its data store. This ensures that, in the event of a fatal system crash, we can restore a node to a known good state and as such achieve single-node durability.

This recovery procedure can be expensive, may trigger a full database repair upon the next boot, and should be avoided in production whenever possible.

Additionally, forcefully disconnecting a node from a cluster will lead to client timeouts and delayed cluster rebalancing.

To work around these limitations, QuasarDB supports a graceful shutdown. On UNIX-like systems, this can be achieved by sending the QuasarDB daemon the SIGTERM or SIGQUIT signal. This will ensure QuasarDB gracefully leaves the cluster, and does the necessary housekeeping to ensure the data store is in a consistent state.

If you have installed QuasarDB through one of our packages, our packaging scripts typically take care of this. However, if you are using custom scripts to start/stop QuasarDB, please ensure the daemon is sent a SIGTERM and allowed to exit gracefully, rather than being forcefully killed. For example, with systemd, you will want to ensure you use an infinite termination timeout as follows:

[Service]
Type=simple
User=qdb
Group=qdb
ExecStart=/usr/bin/qdbd -c /etc/qdb/qdbd.conf
Restart=on-failure
LimitNOFILE=65536
TimeoutStopSec=infinity

Here we explicitly tell systemd to wait as long as it takes to have QuasarDB terminate gracefully.
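
If you start and stop QuasarDB with custom scripts instead of systemd, a minimal stop script could look like the sketch below; the pidof lookup and the polling loop are illustrative assumptions, so adapt them to however you track the daemon's process:

#!/bin/sh
# Gracefully stop qdbd: send SIGTERM and wait until the process has exited.
PID=$(pidof -s qdbd) || exit 0    # nothing to do if qdbd is not running

kill -TERM "$PID"

# Wait as long as it takes for qdbd to leave the cluster and flush its state.
while kill -0 "$PID" 2>/dev/null; do
    sleep 1
done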

4.2. Cluster trimming#

QuasarDB requires periodic trimming of the dataset. When trimming, it performs several operations:

  • it compacts the Write-Ahead-Log and merges this into the database;

  • it removes MVCC-transaction metadata;

  • it reclaims any unused space.

While these tasks are also performed automatically by the database, we strongly recommend periodic explicit trimming, for example using a daily or weekly cron job. As trimming can be an expensive operation, we recommend avoiding it during peak hours.

You can issue a cluster trim using the QuasarDB shell:

$ qdbsh
qdbsh > cluster_trim

If you observe the logs of the database daemon, you should see something like this:

2019.11.05-16.37.42.953826008   3339    3438       info         trimming all 412,160 entries (325,396 versions in memory)
2019.11.05-16.37.57.683002389   3339    3438       debug        successfully cleaned allocator buffers
2019.11.05-16.37.57.683022210   3339    3438       info         434,001/325,594 entries were trimmed (for a size of 14.91 GiB). we now have 428,327 entries in total and 309,597 entries in memory for a total size of 60.238 GiB (16.0 EiB in memory)
2019.11.05-16.37.57.713021310   3339    3438       info         requesting complete database compaction
2019.11.05-16.37.57.720022210   3339    3438       info         complete database compaction took 47 ms
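
To automate this with a cron job as recommended above, a sketch along the following lines could be used; the qdbsh path, the qdb user, and the assumption that qdbsh reads a command from standard input should all be verified against your installation:

# /etc/cron.d/qdb-trim (hypothetical): trim the cluster every Sunday at 03:00
0 3 * * 0    qdb    echo "cluster_trim" | /usr/bin/qdbsh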

4.3. Log rotation#

QuasarDB does not support log rotation out of the box, and its log files will grow over time. We recommend:

  • weekly rotation of all log files stored in /var/log/qdb;

  • removal of all logs older than 7 days (see the example rule after this list).
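
A minimal logrotate sketch along those lines is shown here; the file name is hypothetical and copytruncate is chosen so that qdbd does not need to reopen its log file, so adjust both the path and the options to your installation:

# /etc/logrotate.d/qdb (hypothetical)
/var/log/qdb/*.log {
    weekly
    rotate 1
    maxage 7
    compress
    missingok
    notifempty
    copytruncate
}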

Additionally, you might want to set up a centralised log aggregator to ease the management of your logs.

4.4. Backup Guide#

Backups are a way to restore the system in case of any failure. How often you should create backups depends on how much data you can afford to lose in the event of a failure.

Although it is not required, we recommend taking periodic backups of your database and storing them in a safe, offsite location.

4.5. Backup Strategy#

QuasarDB does not take a backup of the entire cluster as a single unit; instead, backups are taken per node. Strictly speaking, this can be a problem, since the contents of individual node backups may not be mutually consistent. In practice, however, the differences should be negligible with a well-organized backup policy.

We recommend the following backup process:

  1. Schedule your backups during off-hours, for example at night;

  2. Before making a backup, issue a cluster trim in order to remove any unnecessary metadata and compact the Write-Ahead-Log (WAL);

  3. Shut down the QuasarDB daemon you are about to back up, for example using systemctl stop qdbd. Make sure the normal termination signal SIGTERM is sent so that the process shuts down gracefully.

  4. Copy the data directory, including the WAL, to your backup location. The precise location of your data directory depends upon your storage configuration, but is typically /var/lib/qdb.

  5. After the copy operation has completed, you can restart the QuasarDB daemon.

Repeat this process for each of the nodes you want to back up.
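
Putting these steps together, a per-node backup script could look like the following sketch; the backup destination, the data directory, the use of qdbsh over standard input, and rsync for the copy are all assumptions to adapt to your environment:

#!/bin/sh
# Hypothetical per-node backup script; run on each node during off-hours.
set -e

BACKUP_DEST=/mnt/backup/qdb-$(hostname)-$(date +%Y%m%d)   # placeholder destination
DATA_DIR=/var/lib/qdb                                     # adjust to your storage configuration

# 1. Trim the cluster to compact the WAL and remove unnecessary metadata.
echo "cluster_trim" | qdbsh

# 2. Stop the daemon gracefully (systemd sends SIGTERM by default).
systemctl stop qdbd

# 3. Copy the data directory, including the WAL, to the backup location.
mkdir -p "$BACKUP_DEST"
rsync -a "$DATA_DIR/" "$BACKUP_DEST/"

# 4. Restart the daemon once the copy has completed.
systemctl start qdbd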

4.6. Backup Storage Options#

QuasarDB can work with or without AWS S3 storage, so backups can be made either locally or to S3.

4.6.1. Local Backups#

To initiate a local backup, use the following commands:

  1. direct_backup path full: This creates a full backup. Anything stored at the specified path will be overwritten.

  2. direct_backup path checkpoint: Initiates an incremental backup. Only changes relative to the previous backup are saved; if this is the first run, a full backup is created.

  3. direct_backup path checkpoint_trim count: Removes old increments, keeping only the number specified by count.

You can monitor the backup process with the direct_backup_get_progress command. This command returns the amount of data written to the backup so far. It is purely informational; it cannot be used to estimate the percentage completed or the time remaining.

If, for any reason, you need to abort a backup, you can use the direct_backup_abort command. It will terminate the current backup, and the state in the backup will be undefined.

Please note that only one backup command can run on a node at a time. Attempting to start a second backup command will result in an error.
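
For example, a full local backup can be started and monitored from qdbsh as follows; the backup path is a placeholder:

$ qdbsh
qdbsh > direct_backup /mnt/backup/qdb full
qdbsh > direct_backup_get_progress

On subsequent runs you would typically issue a checkpoint backup against the same path instead, so that only the changes since the previous backup are written.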

4.6.2. S3 Backups#

If the cluster is configured to work with AWS S3, you can create cloud backups. Cloud backups also work on each cluster node individually. A cloud backup is a plain copy of the files in their current active state; copying occurs directly from cloud to cloud, and incremental backups are not supported.

Just like with local backups, only one backup command can run on a node at a time. Attempting to start a second backup command will result in an error.

To initiate a backup, use the following command:

direct_s3_backup bucket path_prefix [region [thread_count [flush]]]

  • bucket, path_prefix, region: Specify the destination in Amazon S3 where the backup should be stored.

  • thread_count (default: 1, maximum: 1024): Specifies the number of threads for the backup process. Increasing the number of threads speeds up the backup but also increases memory usage.

  • flush: Determines whether to save data stored in memory before executing the backup.

To check the status of a backup, you can use the following command:

direct_s3_backup_get_progress

This command provides the total number of files for the backup and how many files have already been saved. It can be used to estimate the progress of the backup.
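
For example, the following session starts an S3 backup with eight threads and then checks its progress; the bucket name, prefix, and region are placeholders:

$ qdbsh
qdbsh > direct_s3_backup my-backup-bucket backups/node-01 eu-west-1 8
qdbsh > direct_s3_backup_get_progress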

Please note that once a backup has been initiated, it cannot be interrupted.

4.6.3. Backup Structure in RocksDB#

RocksDB backup consists of several directories:

  1. Data Directory: This directory contains data and SST files, and it is shared among all backups. In other words, there is no distinction between the first backup and any subsequent backups.

  2. Special Directories: These directories contain information about each backup, including the list of files included in it (see the layout sketch below).
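
As an illustration, a RocksDB backup location typically contains a layout roughly like the sketch below; the exact directory names depend on the RocksDB version and backup options, so treat them as an assumption:

backup_root/
    shared/      <- SST data files, shared by all backups
    private/     <- files that belong to a single backup only
    meta/        <- one metadata file per backup, listing the files it references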

When a backup is created, a new checkpoint is established, comprising a set of files and directories for that specific backup. Data files are copied to a shared storage location.

When old backups are removed, their entry points are deleted, but the data files are retained. A garbage collection pass is then performed to delete the SST files that are no longer referenced by any remaining backup; files still referenced by another backup are kept.

This process can be summarized as follows:

  • When a backup is created, a new checkpoint with a set of files is established.

  • When old backups are removed, entry points are deleted.

  • Garbage collection identifies and deletes unreferenced SST files.

It’s worth noting that during a backup, RocksDB works with the current list of files in use by the database and saves only those files that are not yet present in the backup. How much data is added or removed between backups is independent of the backup process itself; it is a user-level operation rather than a storage-level one.

4.6.4. Managing Multiple Snapshots in RocksDB#

RocksDB’s “backup” feature serves not only as a traditional backup but also allows you to efficiently manage multiple snapshots of your data over time. These snapshots may share common data, and this mechanism provides flexibility in data recovery. Here’s how it works:

  • Snapshot Tracking: RocksDB’s “backup” feature doesn’t create entirely distinct backups every time. Instead, it keeps track of a series of snapshots. These snapshots may have overlapping data, and they share the same SST files.

  • Example Scenario:

    1. The first snapshot captures 100GB of SST files.

    2. A day later, data is added and removed, and the next snapshot includes 20GB of “new” data in the SST files, while 10GB is removed. The total “backed up” data is now 120GB.

    3. You can recover from either the first or second snapshot.

  • Flexible Recovery Points: With this mechanism, you can have multiple recovery points at different moments in time. For instance, creating a third snapshot adds another 20GB and removes another 10GB. Now, the total size of all backed-up data is 140GB, but you can recover from three snapshots at different points in time.

  • Retention Policy: RocksDB allows you to configure the “total number of snapshots to keep track of.” When this limit is reached, creating a new snapshot that adds data and removes some will trigger the removal of SST files that were referred to by the earliest snapshot but not by any other snapshots. This ensures efficient disk space management.

This snapshot management approach is a common mechanism found in various systems, such as tools used for daily incremental backups like Time Machine: it tracks differences over time and removes files uniquely referenced by historical backups once those backups are discarded.
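
In QuasarDB, this retention maps onto the checkpoint and checkpoint_trim variants of direct_backup described in the local backups section; for example, a nightly job might take a checkpoint and then prune old increments (the path and the retention count below are placeholders):

$ qdbsh
qdbsh > direct_backup /mnt/backup/qdb checkpoint
qdbsh > direct_backup /mnt/backup/qdb checkpoint_trim 7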