3. Metrics reference
This document lists all kernel-level metrics available in QuasarDB, categorized by subsystem. Each metric includes its name, type (accumulator or gauge), and a concise description. This structure is optimized for both human readability and AI model training purposes.
3.1. Cache
| Metric Name | Type | Description |
| --- | --- | --- |
|  | accumulator | How many evictions were performed (from the QuasarDB cache) |
|  | accumulator | How many bytes were evicted |
|  | gauge | The size of all block caches, in bytes (RocksDB); the row cache is not used at this time |
|  | gauge | The number of bytes in the memtables (RocksDB) |
|  | gauge | The number of unflushed bytes in the memtables (RocksDB) |
|  | gauge | Memory usage of all RocksDB table readers |
|  | gauge | Memory usage of the persistence layer: memtable bytes, table reader bytes, and cache bytes |
|  | gauge | The total amount of physical memory, in bytes, detected on the machine |
|  | gauge | The amount of physical memory, in bytes, used on the machine |
|  | gauge | The size, in bytes, of all entries currently in memory |
|  | gauge | The number of entries in memory; an entry is a value in the internal hash table, which is correlated with, but not identical to, table counts |
|  | gauge | Advanced internal statistic, used only for debugging complex memory issues |
|  | gauge | Whether the allocator will use huge pages (if supported) |
|  | gauge | The total bytes of large objects (e.g. big allocations that do not fit in the optimized structures) |
|  | gauge | The total count of large objects (e.g. big allocations that do not fit in the optimized structures) |
|  | gauge | Advanced internal statistic, used only for debugging complex memory issues |
|  | gauge | The largest allocation request ever made |
|  | gauge | The threshold, in bytes, above which TBB returns memory to the OS; below that threshold, TBB holds the bytes |
|  | gauge | Total bytes currently allocated (managed) by TBB; note that not every allocation in QuasarDB goes through TBB |
|  | gauge | The number of allocations made through TBB |
|  | gauge | How many bytes of virtual memory the process can use; this value is usually extremely high on 64-bit operating systems |
|  | gauge | How many bytes of virtual memory the process is currently using; can be much higher than the actual memory usage when memory is reserved but not used |
|  | accumulator | How many page-ins were performed by QuasarDB (from disk to the QuasarDB cache) |
|  | accumulator | How many bytes were paged in |
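Accumulators such as the eviction and page-in counters above only increase, so they are most useful as rates between two samples. A minimal sketch of that computation, assuming metric names like `cache.evictions.bytes` (hypothetical placeholders, not the actual QuasarDB statistic keys):

```python
# Accumulators are monotonically increasing counters; a useful view is the
# rate of change between two samples taken a known interval apart.
# The metric names below are hypothetical placeholders.

def rate_per_second(prev_sample, curr_sample, elapsed_seconds):
    """Compute per-second rates for every accumulator present in both samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return {
        name: (curr_sample[name] - prev_sample[name]) / elapsed_seconds
        for name in curr_sample
        if name in prev_sample
    }

# Two samples of hypothetical cache accumulators taken 10 seconds apart:
t0 = {"cache.evictions.count": 100, "cache.evictions.bytes": 1_000_000}
t1 = {"cache.evictions.count": 150, "cache.evictions.bytes": 3_500_000}

rates = rate_per_second(t0, t1, 10.0)
# 5 evictions/s, 250,000 bytes/s evicted
```

Gauges, by contrast, can be read directly at any point in time.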
3.2. Cache - LRU2
QuasarDB uses a two-level LRU (LRU2) caching strategy consisting of a cold and hot layer. New entries are first placed in the cold cache and only promoted to the hot cache on repeated access. This design improves hit rates for frequently accessed data while avoiding pollution by one-time reads.
The LRU2 metrics help observe:
- Cold/hot cache pressure (evictions, promotions)
- Cache efficiency (hit ratios)
- I/O load due to cache misses (page-ins)
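The cold-to-hot promotion policy described above can be sketched in a few lines. This is an illustration of the general LRU2 idea, not QuasarDB's actual implementation:

```python
from collections import OrderedDict

# Minimal LRU2 sketch: entries enter the cold layer on first access and are
# promoted to the hot layer on a repeated access, so one-time reads never
# pollute the hot layer. Illustrative only, not QuasarDB's implementation.

class LRU2:
    def __init__(self, cold_capacity, hot_capacity):
        self.cold = OrderedDict()  # first-touch entries
        self.hot = OrderedDict()   # entries accessed more than once
        self.cold_capacity = cold_capacity
        self.hot_capacity = hot_capacity

    def access(self, key, value):
        if key in self.hot:
            self.hot.move_to_end(key)          # refresh recency in hot layer
            return "hot_hit"
        if key in self.cold:
            self.cold.pop(key)                 # second access: promote to hot
            self.hot[key] = value
            if len(self.hot) > self.hot_capacity:
                self.hot.popitem(last=False)   # evict least recently used hot entry
            return "promoted"
        self.cold[key] = value                 # first access goes to cold
        if len(self.cold) > self.cold_capacity:
            self.cold.popitem(last=False)      # one-time reads fall out of cold
        return "cold_insert"

cache = LRU2(cold_capacity=2, hot_capacity=2)
cache.access("a", 1)   # cold_insert
cache.access("a", 1)   # promoted
cache.access("b", 2)   # cold_insert
cache.access("c", 3)   # cold_insert
cache.access("d", 4)   # cold_insert; "b" is evicted from cold, never promoted
```

Note how "b", read only once, is evicted from the cold layer without ever reaching the hot layer; that eviction is what the cold-eviction accumulators below count.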
| Metric Name | Type | Description |
| --- | --- | --- |
|  | accumulator | Total number of entries read from disk into the cold cache layer |
|  | accumulator | Total bytes read from disk into the cold cache layer |
|  | accumulator | Number of entries removed from the cold cache before promotion to hot |
|  | accumulator | Bytes removed from the cold cache before promotion to hot |
|  | gauge | Current number of entries in the cold cache |
|  | accumulator | Total number of evictions from the hot cache |
|  | accumulator | Total bytes evicted from the hot cache |
|  | accumulator | Total number of entries promoted from the cold to the hot cache |
|  | accumulator | Total bytes promoted from the cold to the hot cache |
|  | accumulator | Number of cache hits in the hot layer (entry already promoted) |
|  | accumulator | Total bytes hit in the hot cache |
|  | gauge | Current number of entries in the hot cache |
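A hot-layer hit ratio can be derived from these accumulators by comparing hot hits against total lookups that touched the cache (hot hits plus cold page-ins). The parameter names below are illustrative, not actual metric keys:

```python
# Derived cache-efficiency metric: the fraction of lookups served by the hot
# layer versus lookups that had to page data in from disk into the cold layer.
# Parameter names are illustrative placeholders for the accumulators above.

def hot_hit_ratio(hot_hit_count, cold_pagein_count):
    """Fraction of cache lookups served by the hot layer; 0.0 when idle."""
    total = hot_hit_count + cold_pagein_count
    if total == 0:
        return 0.0
    return hot_hit_count / total

ratio = hot_hit_ratio(hot_hit_count=900, cold_pagein_count=100)
# 0.9: 90% of lookups were served without going to disk
```

A ratio trending down over time suggests the working set no longer fits in the hot layer.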
3.3. Clustering
| Metric Name | Type | Description |
| --- | --- | --- |
|  | accumulator | How many times the client sent a request to the wrong node |
|  | accumulator | How many times the predecessor changed; more than a couple of times indicates cluster issues |
|  | accumulator | Same as the predecessor metric, but for the successor |
|  | accumulator | How many times we returned "unstable cluster" to the user |
|  | accumulator | Cluster-to-cluster time elapsed, in seconds |
|  | accumulator | Cluster-to-cluster error count |
|  | accumulator | Cluster-to-cluster success count |
3.4. Environment
Metrics related to the environment in which QuasarDB is running, such as the OS, license, and QuasarDB version.
| Metric Name | Type | Description |
| --- | --- | --- |
|  | gauge | The value returned by std::thread::hardware_concurrency(); very useful for diagnosing problems |
|  | gauge | When the license was attributed, in seconds from epoch |
|  | gauge | License expiration date, in seconds from epoch |
|  | gauge | The maximum number of bytes allowed by the node |
|  | gauge | Number of days left until the license expires |
|  | gauge | When support will expire, in seconds from epoch |
|  | gauge | Startup timestamp, in seconds from epoch |
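Several of these gauges are Unix-epoch timestamps. A monitoring script can turn the expiration timestamp into a "days remaining" alert threshold; the function and variable names here are illustrative:

```python
import time

# Timestamps in this section are seconds since the Unix epoch. This sketch
# converts an epoch timestamp into whole days remaining; negative values mean
# the timestamp is already in the past. Names are illustrative.

def days_until(expiration_epoch_seconds, now_epoch_seconds=None):
    """Whole days remaining before an epoch timestamp; negative if past."""
    if now_epoch_seconds is None:
        now_epoch_seconds = time.time()
    return int((expiration_epoch_seconds - now_epoch_seconds) // 86_400)

# Example: a license expiring exactly 30 days from a fixed "now"
now = 1_700_000_000
expiry = now + 30 * 86_400
remaining = days_until(expiry, now)
# remaining == 30
```

The same conversion applies to the support-expiration and startup timestamps.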
3.5. Indexes
Metrics that relate to the microindex subsystem of QuasarDB, which speeds up queries.
| Metric Name | Type | Description |
| --- | --- | --- |
|  | accumulator | How many times an aggregation successfully leveraged the microindex |
|  | accumulator | How many times an aggregation could not leverage the microindex |
|  | accumulator | How many times a filter (e.g. WHERE) successfully leveraged the microindex |
|  | accumulator | How many times a filter (e.g. WHERE) could not leverage the microindex |
3.6. Network
Network-related metrics, useful for understanding the number of requests, simultaneous users, and network throughput.
| Metric Name | Type | Description |
| --- | --- | --- |
|  | gauge | How many users currently have an active session |
|  | gauge | How many partitions there are |
|  | gauge | How many sessions are available |
|  | gauge | How many sessions are available in total |
|  | gauge | How many sessions are currently busy |
|  | gauge | How many threads each partition has |
|  | accumulator | How many bytes in, across all calls |
|  | accumulator | How many bytes out, across all calls |
|  | accumulator | How many requests lasted longer than the "log slow operation" setting |
|  | accumulator | How many requests we have received (across all calls): successes + failures |
|  | accumulator | How many successes (across all calls) |
|  | accumulator | How many failures/errors, across all calls |
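Since total requests equal successes plus failures, an overall error rate falls out of the last three accumulators. A sketch, with illustrative parameter names:

```python
# Derived network health metric: the fraction of requests that failed,
# computed from the request and failure accumulators in the table above.
# Parameter names are illustrative placeholders.

def error_rate(total_requests, failures):
    """Fraction of requests that failed, across all calls; 0.0 when idle."""
    if total_requests == 0:
        return 0.0
    return failures / total_requests

rate = error_rate(total_requests=10_000, failures=25)
# 0.0025, i.e. 0.25% of requests failed
```

As with all accumulators, compute this over a sampling interval (deltas between two reads) to see the current error rate rather than the lifetime average.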
3.7. Performance profiling
These metrics are only collected when network.profile_performance is enabled in qdbd. They are useful for understanding how busy the cluster is and where the majority of the time is spent.
| Metric Name | Type | Description |
| --- | --- | --- |
|  | accumulator | Time spent, in nanoseconds, for the given perf metric of the named function |
|  | accumulator | Aggregated total for the function |
|  | accumulator | Total of all measured functions in the current performance trace; helpful for computing the ratio of a given function |
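As the last row suggests, per-function totals are most meaningful as a share of the trace total. A sketch, assuming you have collected per-function nanosecond totals into a dict (the function names are hypothetical):

```python
# Derived profiling metric: each function's share of the total measured time
# in a performance trace. Function names are hypothetical examples.

def time_share(per_function_ns, total_ns):
    """Ratio of each function's measured time to the trace total."""
    if total_ns <= 0:
        return {}
    return {name: ns / total_ns for name, ns in per_function_ns.items()}

shares = time_share({"ts_insert": 750_000, "ts_lookup": 250_000}, 1_000_000)
# ts_insert accounts for 75% of the measured time
```

A function with a consistently dominant share is the natural place to start when investigating cluster load.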
3.8. Storage
| Metric Name | Type | Description |
| --- | --- | --- |
|  | accumulator | How many bytes were written to disk for buckets, including large buckets |
|  | accumulator | How many times we wrote buckets to disk, including large buckets |
|  | accumulator | How many microseconds we spent writing buckets, including large buckets |
|  | accumulator | How many times we deleted from a bucket |
|  | accumulator | How many times we inserted into a bucket |
|  | accumulator | How many times we read from a bucket |
|  | accumulator | How many times we updated a bucket |
|  | gauge | The current size, in bytes, of the cloud cache (RocksDB + S3) |
|  | gauge | The number of entries in the persistence layer; correlated with the number of tables/buckets, but usually higher |
|  | accumulator | How many bytes were written to disk for all the large buckets |
|  | accumulator | How many times we wrote a large bucket |
|  | accumulator | How many microseconds we spent writing a large bucket |
|  | gauge | The current size, in bytes, of the persisted cache. The persisted cache is used to cache slower I/O on faster I/O; not to be confused with the cloud cache |
|  | accumulator | How many "writes" (all ts operations) failed |
|  | accumulator | How many "writes" (all ts operations) succeeded |
|  | gauge | How many bytes are used on disk; low-level RocksDB metric |
|  | gauge | How many bytes were read from disk; low-level RocksDB metric |
|  | gauge | How many bytes were written to disk; low-level RocksDB metric |
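Combining the bucket-write accumulators gives an average write latency: total microseconds spent writing divided by the number of writes. The parameter names mirror the descriptions above and are illustrative:

```python
# Derived storage metric: average bucket write latency, computed from the
# "microseconds spent writing buckets" and "times we wrote buckets"
# accumulators. Parameter names are illustrative placeholders.

def avg_bucket_write_latency_us(total_write_us, write_count):
    """Average microseconds per bucket write; 0.0 when no writes occurred."""
    if write_count == 0:
        return 0.0
    return total_write_us / write_count

latency = avg_bucket_write_latency_us(total_write_us=5_000_000, write_count=2_500)
# 2000.0 microseconds (2 ms) per bucket write on average
```

The same division applied to the large-bucket accumulators isolates the latency of large-bucket writes specifically.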
3.9. Storage - Async Pipelines
These metrics relate to the async pipelines storage subsystem, which can be CPU- and memory-intensive and is typically used in streaming data use cases.
| Metric Name | Type | Description |
| --- | --- | --- |
|  | gauge | How many bytes we have in the "merge" map of the async pipelines (a buffer) |
|  | gauge | How many entries we have in the "merge" map of the async pipelines (a buffer) |
|  | accumulator | The number of bytes merged by the async pipelines (e.g. smaller requests merged into a larger one) |
|  | accumulator | The number of merge operations |
|  | accumulator | Denied writes because the pipe is full, for a given user |
|  | accumulator | Denied writes because the pipe is full, for all users |
|  | accumulator | Errors for the current user id |
|  | accumulator | Errors for all users |
|  | accumulator | The time elapsed writing the state of the low-priority async pipes |
|  | accumulator | How many bytes were pulled from the pipelines by the merger |
|  | accumulator | How many times data was pulled from the pipelines by the merger |
|  | accumulator | How many bytes were pushed to the pipelines by a user |
|  | accumulator | How many times data was pushed to the pipelines by a user |
|  | accumulator | How many bytes were written to disk |
|  | accumulator | How much time was spent writing to disk; this includes serialization, inserting into the in-memory timeseries structure, etc. |
|  | accumulator | How many failures for the given user |
|  | accumulator | How many failures for all users |
|  | accumulator | How many successes for the given user |
|  | accumulator | How many successes for all users |
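Since the pipelines merge smaller requests into larger ones, comparing bytes pushed by users with bytes ultimately written to disk gives a rough measure of how much coalescing the merger achieved. This interpretation and the names below are assumptions for illustration, not a documented formula:

```python
# Rough merge-efficiency sketch: ratio of bytes written to disk versus bytes
# pushed by users into the pipelines. Values below 1.0 suggest the merger
# coalesced overlapping or adjacent writes. Names and the interpretation are
# illustrative assumptions, not documented QuasarDB semantics.

def merge_efficiency(bytes_pushed, bytes_written):
    """Bytes written to disk per byte pushed by users; 0.0 when idle."""
    if bytes_pushed == 0:
        return 0.0
    return bytes_written / bytes_pushed

eff = merge_efficiency(bytes_pushed=10_000_000, bytes_written=4_000_000)
# 0.4: disk writes were 40% of the pushed volume
```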
3.10. Storage - Backups
These metrics relate to backups of the storage subsystem.
| Metric Name | Type | Description |
| --- | --- | --- |
|  | accumulator | How much time we spent backing up |
|  | accumulator | Background backup errors |
|  | accumulator | Background backup successes |
|  | accumulator | How many bytes were written to disk |
3.11. Storage - Optimization
These metrics relate to background tasks and operations for the storage subsystem that help maintain performance and manage data lifecycle.
| Metric Name | Type | Description |
| --- | --- | --- |
|  | accumulator | Background compaction cancellations |
|  | accumulator | How much time we spent compacting |
|  | accumulator | Background compaction failures |
|  | accumulator | Background compaction successes (not automatic; explicit calls) |
|  | accumulator | Background trim cancellations |
|  | accumulator | Background trim duration |
|  | accumulator | Background trim failures |
|  | accumulator | Background trim successes |
3.12. Metric Unit Interpretation
Metric names use suffixes to indicate the unit or value type:
| Suffix | Meaning |
| --- | --- |
|  | Duration in nanoseconds |
|  | Duration in microseconds |
|  | Duration in seconds |
|  | Timestamp (seconds since Unix epoch) |
|  | Byte count (e.g., memory or I/O) |
|  | Count of operations or events |
|  | Cumulative count or size |
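A monitoring pipeline can dispatch on the unit suffix of a metric name. The suffix strings below (`_ns`, `_bytes`, and so on) are assumptions chosen to match the meanings in the table; verify them against your actual metric names before relying on this mapping:

```python
# Sketch of suffix-based unit dispatch. The suffix strings are assumptions
# matching the meanings in the table above, not verified QuasarDB suffixes.

SUFFIX_UNITS = {
    "_ns": "nanoseconds",
    "_us": "microseconds",
    "_sec": "seconds",
    "_epoch": "epoch seconds",
    "_bytes": "bytes",
    "_count": "count",
    "_total": "total",
}

def unit_of(metric_name):
    """Return the unit implied by a metric name's suffix, or None if unknown."""
    for suffix, unit in SUFFIX_UNITS.items():
        if metric_name.endswith(suffix):
            return unit
    return None

unit_of("pipeline.write_duration_us")  # "microseconds" under this mapping
```

Keeping this mapping in one place makes dashboards consistent: durations can be normalized to seconds and byte counts to a single scale before plotting.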