1.2. Compaction and Trimming

QuasarDB uses a Log-Structured Merge Tree (LSM-tree) for high-throughput writes and efficient data organization. In addition, because QuasarDB uses multi-version concurrency control (MVCC), it periodically trims old data versions that are no longer needed. Understanding these processes, and how to control them manually, can help keep your database performing optimally.

1.2.1. LSM-Tree Fundamentals

QuasarDB’s underlying storage uses RocksDB, which implements an LSM-tree architecture. This is a natural choice for data warehouses and timeseries applications because of its ability to handle very high write loads efficiently.

1.2.1.1. Why LSM-Tree for Timeseries Databases

  1. High Write Throughput

    LSM-trees buffer new writes in an in-memory structure (the memtable) before flushing them to disk. This allows QuasarDB to handle high ingestion rates with minimal performance impact (see the sketch after this list).

  2. Optimized Disk I/O

    Data is flushed to disk sequentially, minimizing random writes. This is vital for timeseries workloads with many continuous inserts.

  3. Scalability

    As data grows, the LSM-tree organizes files into levels, merging smaller files over time. This design prevents write performance from degrading significantly as your dataset expands. The bottom-most level (the data that has lived in the database the longest) is automatically compressed with the strongest compression algorithm, while upper levels use faster compression algorithms.
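
To make this concrete, here is a minimal, illustrative sketch of an LSM-style write path in Python. This is not QuasarDB or RocksDB code: the class, the flush threshold, and the key format are hypothetical, and real engines flush based on memtable size rather than entry count.

# Illustrative LSM write path (hypothetical sketch, not QuasarDB code):
# writes are buffered in an in-memory memtable and flushed to disk as
# immutable, sorted, sequentially written runs.

MEMTABLE_LIMIT = 4  # flush threshold; real engines use megabytes, not counts

class LsmWriter:
    def __init__(self):
        self.memtable = {}  # in-memory write buffer: key -> value
        self.runs = []      # immutable on-disk sorted runs, newest last

    def put(self, key, value):
        self.memtable[key] = value  # cheap in-memory write
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self.flush()

    def flush(self):
        # The buffer goes out as one sorted, sequential run; this I/O
        # pattern is what keeps ingestion fast under heavy write loads.
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

writer = LsmWriter()
for i in range(10):
    writer.put(f"ts/{i:04d}", i * i)
print(f"{len(writer.runs)} runs flushed, {len(writer.memtable)} keys buffered")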

1.2.2. Compaction

Compaction is a key process in LSM-trees, merging and reorganizing smaller on-disk files to remove obsolete data. QuasarDB automatically performs compactions in the background, but manual intervention is also possible.
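
As a rough illustration of what a compaction accomplishes, the Python sketch below merges several sorted runs into one, keeping only the newest version of each key and discarding deletion markers. It is a simplified model, not QuasarDB's actual compaction logic; the TOMBSTONE marker and run layout are hypothetical.

# Simplified compaction sketch (hypothetical, not QuasarDB code): merge
# sorted runs, keep the newest version of each key, drop tombstones.

TOMBSTONE = object()  # marker a delete leaves behind until compaction

def compact(runs):
    """Merge runs (ordered oldest to newest) into one obsolete-free run."""
    merged = {}
    for run in runs:            # later (newer) runs overwrite older entries
        for key, value in run:
            merged[key] = value
    # Keys whose newest version is a tombstone vanish from disk entirely.
    return sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)

older = [("a", 1), ("b", 2), ("c", 3)]
newer = [("b", 20), ("c", TOMBSTONE)]  # "b" updated, "c" deleted
print(compact([older, newer]))         # -> [('a', 1), ('b', 20)]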

1.2.2.1. Auto-Compaction

  • Continuous Background Process

    QuasarDB automatically initiates compactions in the background. This constant low-level activity avoids large, sudden I/O spikes in most cases.

  • Minimal User Intervention

    In many environments, auto-compaction is sufficient. You usually do not need to manually trigger compaction for standard workloads.

1.2.2.2. Manual Compaction

You may sometimes want direct control over the compaction process, for example:

  • Mass Data Deletions or Updates

    A large batch deletion can free up space more quickly if you trigger a compaction soon afterward.

  • Maintenance Windows

    You might run a manual compaction during low-usage periods to aggressively consolidate data.

Note

Manual compaction can be time-consuming (potentially days for very large datasets) and incurs heavy I/O. To mitigate the load, QuasarDB lets you compact only a subset of tables at a time.

Manual Compaction Examples

-- Trigger a full compaction of the entire QuasarDB instance.
qdbsh > cluster_compact full
-- Trigger compaction on all tables that start with a certain prefix.
qdbsh > cluster_compact prefix stocks/nasdaq/
-- Get the progress of the current compaction.
qdbsh > cluster_compact_get_progress
Compact: not active

1.2.3. Trimming (MVCC Cleanup)

QuasarDB employs MVCC transactions, which results in multiple versions of the same data accumulating over time. Trimming is the process of removing old or stale versions that are no longer needed by any active transaction.
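
As a rough model of what trimming does, the Python sketch below keeps, for a single key's version history, only the versions that remain visible to the oldest active transaction. The timestamps and snapshot mechanics are hypothetical and far simpler than QuasarDB's actual MVCC implementation.

# Hypothetical MVCC trimming sketch (not QuasarDB code): drop every
# version that no active transaction can still see.

def trim(versions, oldest_active_snapshot):
    """versions: (commit_ts, value) pairs sorted ascending by commit_ts."""
    keep_from = 0
    for i, (ts, _) in enumerate(versions):
        if ts <= oldest_active_snapshot:
            # The newest version at or below the snapshot stays visible.
            keep_from = i
    return versions[keep_from:]

history = [(10, "v1"), (20, "v2"), (30, "v3"), (40, "v4")]
# A transaction holding snapshot 25 still reads v2, so only v1 can go.
print(trim(history, oldest_active_snapshot=25))
# -> [(20, 'v2'), (30, 'v3'), (40, 'v4')]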

1.2.3.1. Auto-Trimming

  • Runs Automatically in the Background

    Just like compaction, trimming occurs periodically behind the scenes.

  • Ensures Consistent Data

    By removing older versions that are no longer visible to any transaction, auto-trimming keeps the database tidy and efficient.

1.2.3.2. Manual Trimming

In certain use cases, such as after large transactional updates or before a major maintenance event, you may want to trigger trimming manually. Keep the following in mind:

  1. Heavy Disk Writes

    Trimming reclaims space by rewriting and removing old versions, causing high I/O load.

  2. Table-Level or Subset Trimming

    You can choose to trim only certain tables to limit the scope of disk activity.

  3. Paced Trimming

    QuasarDB offers a “paced trimming” feature that pauses for a certain duration after each table is trimmed, helping maintain overall system performance.

A typical use case for manual trimming is to trim a set of tables, or a specific time range, right after a large deletion or cleanup of a dataset. This ensures that stale data gets cleaned up swiftly.

Manual Trimming Examples

-- Trigger a full manual trimming operation on the entire cluster.
qdbsh > cluster_trim
-- Trigger trimming on a set of tables by tag:
qdbsh > TRIM TABLE FIND(tag='nasdaq')
-- Trigger trimming on a set of tables by tag, limited to a certain time range:
qdbsh > TRIM TABLE FIND(tag='nasdaq') IN RANGE (2025, +1y)
-- Paced trimming with a pause of 1s after each table:
qdbsh > cluster_trim 1s
-- Get the progress of the current trimming operation.
qdbsh > cluster_trim_get_progress
Trim: not active

1.2.4. Best Practices: Trimming Before Compaction

Trimming permanently removes old MVCC versions, while compaction rewrites the surviving data into consolidated on-disk files; compacting data that is about to be trimmed would rewrite it needlessly. To avoid unnecessary write amplification:

  1. Trim First

    Clear out old MVCC versions across the entire database or targeted tables.

  2. Compact Second

    Merge and reorganize the data after trimming, ensuring stale entries aren’t included in the on-disk files. This sequence minimizes additional rewrite overhead.
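
Putting the two steps together, a maintenance window could look like the following session, reusing the commands shown earlier in this section:

-- Step 1: trim first, removing stale MVCC versions.
qdbsh > cluster_trim
-- Step 2: compact second, so the rewritten files no longer carry stale entries.
qdbsh > cluster_compact full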