5. Observability
5.1. Statistics
If you have enabled statistics, each QuasarDB node collects its own runtime statistics. QuasarDB stores these statistics through its direct key/value API, under keys that start with the prefix $qdb.statistics.
| Key | Type | Description |
|---|---|---|
|  | int64 | The total number of bytes pulled by the asynchronous pipelines |
|  | int64 | The total count of pull operations performed by the asynchronous pipelines |
|  | blob | The QuasarDB engine build date |
|  | blob | The QuasarDB engine version |
|  | int64 | The total count of items evicted from the in-memory cache of on-disk data |
|  | int64 | The total number of bytes evicted |
|  | int64 | The detected number of concurrent hardware threads supported by the system |
|  | int64 | The date when the license was attributed, as a Unix epoch timestamp |
|  | int64 | The date of license expiration |
|  | int64 | The memory limit set by the license |
|  | int64 | The number of days remaining until the license expires |
|  | int64 | The date until which support is valid |
|  | int64 | The startup timestamp |
|  | int64 | The startup time |
|  | blob | A string representing the node ID |
|  | blob | A string representing the operating system |
|  | int64 | The total count of data read from disk into the in-memory cache |
|  | int64 | The total number of bytes read from disk into the in-memory cache |
|  | int64 | The number of partitions |
|  | int64 | The number of times a request was incorrectly routed to this node by a client |
|  | int64 | The number of times the node’s predecessor changed |
|  | int64 | The number of times the node’s successor changed |
|  | int64 | The number of times the node returned an “unstable” error to a client |
|  | int64 | The cumulative CPU idle time |
|  | int64 | The cumulative CPU system time |
|  | int64 | The cumulative CPU user time |
|  | blob | The persistence path |
|  | int64 | The number of free bytes on the persistence path |
|  | int64 | The total number of bytes on the persistence path |
|  | int64 | The number of bytes cached in memory for persistence |
|  | int64 | The number of bytes used by the memtable for persistence |
|  | int64 | The number of bytes in the memtable waiting to be flushed for persistence |
|  | int64 | The number of bytes used by the table reader for persistence |
|  | int64 | The total number of bytes used in memory for persistence |
|  | int64 | The computed amount of RAM used for data by QuasarDB |
|  | int64 | The number of free bytes of physical RAM |
|  | int64 | The number of used bytes of physical RAM |
|  | int64 | The number of bytes currently resident in memory |
|  | int64 | The number of entries in RAM |
|  | int64 | The number of free bytes of virtual memory |
|  | int64 | The number of used bytes of virtual memory |
|  | int64 | The current count of connected users |
|  | int64 | The configured maximum number of sessions |
|  | int64 | The current number of available sessions |
|  | int64 | The current number of used sessions |
|  | int64 | The total count of bucket reads |
|  | int64 | The total count of bucket updates |
|  | int64 | The persistence layer storage capacity, in bytes. May be 0 if the value is unknown. |
|  | int64 | The number of bytes in the cloud local cache |
|  | int64 | The number of bytes currently used in the persistence layer |
|  | int64 | The cumulative number of bytes read |
|  | int64 | The cumulative number of bytes written |
|  | int64 | The current number of entries in the persistence layer |
|  | blob | Additional information about the persistence layer |
|  | int64 | The number of bytes in the persistent cache |
|  | int64 | The total count of successful writes to the persistence layer |
|  | int64 | The total number of matching aggregation queries |
|  | int64 | The total number of matching filter queries |
|  | int64 | The total number of bytes received in requests |
|  | int64 | The total count of request errors |
|  | int64 | The cumulative number of requests |
|  | int64 | The cumulative number of successful operations |
|  | int64 | The total number of bytes sent in responses by the server |
|  | int64 | The shutdown timestamp |
|  | int64 | The online status check result (1 for online, 0 for offline) |
|  | int64 | The duration of the check operation, in milliseconds |
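Several of the statistics above are dates stored as int64 values. The sketch below converts them into Python datetime objects. It assumes that every listed field is a Unix epoch timestamp in seconds (the table states this explicitly only for the license attribution date) and that 0 means “not set”, which is how license.expiration_date appears in the example output in 5.1.1.1. The EPOCH_KEYS tuple and the helper name are illustrative, not part of the QuasarDB API.

```python
from datetime import datetime, timezone

# Illustrative selection of date-like statistics. Assumption: all of them are
# Unix epoch timestamps in seconds, and 0 means "not set".
EPOCH_KEYS = (
    "license.attribution_date",
    "license.expiration_date",
    "license.support_until",
    "startup",
)


def epoch_fields_to_datetime(cumulative):
    """Map each key in EPOCH_KEYS to a UTC datetime, or to None when the value is 0 or missing."""
    converted = {}
    for key in EPOCH_KEYS:
        value = cumulative.get(key, 0)
        converted[key] = datetime.fromtimestamp(value, tz=timezone.utc) if value else None
    return converted


# Values taken from the example output of a single node
sample = {"license.attribution_date": 1701900515, "license.expiration_date": 0, "startup": 1701900515}
for key, when in epoch_fields_to_datetime(sample).items():
    print(key, when)
```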
5.1.1. Retrieving Statistics from a QuasarDB Cluster
5.1.1.1. Using the QuasarDB Python Client Library
This section provides a Python script that uses the QuasarDB Python API to connect to a QuasarDB cluster and retrieve the statistics of all nodes in the cluster. For each node, the statistics are returned as “cumulative” and “by_uid” dictionaries; section 5.1.2 explains how to interpret them.
Prerequisites
Before running the script, make sure the following are installed:
Python 3
QuasarDB Python API (quasardb)
Python Script
import quasardb
import quasardb.stats as qdbst
import json

# Connect to the cluster and fetch the statistics of every node
with quasardb.Cluster('qdb://127.0.0.1:2836') as conn:
    stats = qdbst.by_node(conn)
    print(json.dumps(stats, indent=4))

Example output:
{
"127.0.0.1:2836": {
"by_uid": {},
"cumulative": {
"async_pipelines.pulled.total_bytes": 0,
"async_pipelines.pulled.total_count": 0,
"cpu.idle": 6012100000,
"cpu.system": 166390000,
"cpu.user": 318260000,
"disk.bytes_free": 221570932736,
"disk.bytes_total": 274865303552,
"disk.path": "insecure/db/0-0-0-1",
"engine_build_date": "2023.12.06T20.59.15.000000000 UTC",
"engine_version": "3.15.x",
"evicted.count": 1,
"evicted.total_bytes": 209,
"hardware_concurrency": 8,
"license.attribution_date": 1701900515,
"license.expiration_date": 0,
"license.memory": 8589934592,
"license.remaining_days": 31337,
"license.support_until": 0,
"memory.persistence.cache_bytes": 104,
"memory.persistence.memtable_bytes": 1071088,
"memory.persistence.memtable_unflushed_bytes": 1071088,
"memory.persistence.table_reader_bytes": 0,
"memory.persistence.total_bytes": 1071192,
"memory.physmem.bytes_total": 33105100800,
"memory.physmem.bytes_used": 12619468800,
"memory.resident_bytes": 0,
"memory.resident_count": 0,
"memory.vm.bytes_total": 140737488351232,
"memory.vm.bytes_used": 1776730112,
"network.current_users_count": 0,
"network.sessions.available_count": 510,
"network.sessions.max_count": 512,
"network.sessions.unavailable_count": 2,
"node_id": "0-0-0-1",
"operating_system": "Linux 5.10.199-190.747.amzn2.x86_64",
"partitions_count": 8,
"perf.common.get_type_for_removal.deserialization.total_ns": 1374,
"perf.common.get_type_for_removal.processing.total_ns": 8101,
"perf.common.get_type_for_removal.total_ns": 9475,
"perf.control.get_system_info.deserialization.total_ns": 4851,
"perf.control.get_system_info.processing.total_ns": 599,
"perf.control.get_system_info.total_ns": 5450,
"perf.total_ns": 249523,
"perf.ts.create_root.content_writing.total_ns": 16186,
"perf.ts.create_root.deserialization.total_ns": 1717,
"perf.ts.create_root.entry_writing.total_ns": 53294,
"perf.ts.create_root.processing.total_ns": 131766,
"perf.ts.create_root.serialization.total_ns": 128,
"perf.ts.create_root.total_ns": 203091,
"perf.ts.get_columns.deserialization.total_ns": 1409,
"perf.ts.get_columns.processing.total_ns": 30098,
"perf.ts.get_columns.total_ns": 31507,
"persistence.capacity_bytes": 0,
"persistence.cloud_local_cache_bytes": 0,
"persistence.entries_count": 1,
"persistence.info": "RocksDB 6.27",
"persistence.persistent_cache_bytes": 0,
"persistence.read_bytes": 10245,
"persistence.utilized_bytes": 0,
"persistence.written_bytes": 25929,
"requests.bytes_in": 851,
"requests.bytes_out": 32,
"requests.errors_count": 2,
"requests.successes_count": 4,
"requests.total_count": 6,
"startup": 1701900515,
"startup_time": 1701900515,
"check.online": 1,
"check.duration_ms": 4
}
}
}
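Raw byte counts are hard to read at a glance. As a sketch, the helpers below derive a few percentages from a dictionary shaped like the by_node() output above. The helper names are illustrative, and treating network.sessions.unavailable_count as the “used sessions” statistic is an assumption based on the example output (510 available + 2 unavailable = 512 maximum).

```python
def summarize_node(cumulative):
    """Derive a few readable health indicators from one node's "cumulative" statistics."""
    disk_total = cumulative["disk.bytes_total"]
    disk_free = cumulative["disk.bytes_free"]
    max_sessions = cumulative["network.sessions.max_count"]
    # Assumption: unavailable sessions are the sessions currently in use
    used_sessions = cumulative["network.sessions.unavailable_count"]
    return {
        "online": cumulative.get("check.online") == 1,
        "disk_used_pct": round(100.0 * (disk_total - disk_free) / disk_total, 2),
        "sessions_used_pct": round(100.0 * used_sessions / max_sessions, 2),
        "license_remaining_days": cumulative.get("license.remaining_days"),
    }


def summarize_cluster(stats):
    """Apply summarize_node() to every node in a by_node()-shaped dictionary."""
    return {node: summarize_node(node_stats["cumulative"]) for node, node_stats in stats.items()}


# A trimmed copy of the example output above
stats = {
    "127.0.0.1:2836": {
        "by_uid": {},
        "cumulative": {
            "check.online": 1,
            "disk.bytes_free": 221570932736,
            "disk.bytes_total": 274865303552,
            "license.remaining_days": 31337,
            "network.sessions.max_count": 512,
            "network.sessions.unavailable_count": 2,
        },
    }
}
print(summarize_cluster(stats))
```

Against a live cluster, pass the result of qdbst.by_node(conn) instead of the trimmed copy.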
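5.1.1.2. Using qdbsh
In qdbsh, you can use the direct_prefix_get command to list the keys of all statistics, then read an individual statistic with direct_int_get (for int64 values) or direct_blob_get (for blob values). Here’s an example: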
qdbsh > direct_prefix_get $qdb.statistics 100
1. $qdb.statistics.async_pipelines.pulled.total_bytes
2. $qdb.statistics.async_pipelines.pulled.total_count
3. $qdb.statistics.cpu.idle
4. $qdb.statistics.cpu.system
...
52. $qdb.statistics.startup_time
qdbsh > direct_int_get $qdb.statistics.cpu.user
1575260000
qdbsh > direct_blob_get $qdb.statistics.node_id
c5fe30bf0154acc-5d63bd06e7878b9c-5f635b3cf7fc3560-dbe35df7b5080651
Note
The direct_prefix_get command returns the keys that start with a given prefix; here, it fetches the statistics keys that start with $qdb.statistics. The second argument, 100, is the maximum number of keys the command returns.
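In the example output of 5.1.1.1, the Python API reports cpu.user and node_id where qdbsh shows $qdb.statistics.cpu.user and $qdb.statistics.node_id. Assuming that correspondence holds for the other node-wide statistics, the sketch below turns a pasted direct_prefix_get listing into the statistic IDs used by the Python API. The function name is hypothetical.

```python
PREFIX = "$qdb.statistics."


def stat_ids_from_listing(listing):
    """Convert numbered direct_prefix_get lines ("1. $qdb.statistics.cpu.idle") into
    statistic IDs ("cpu.idle"). Lines that do not match, such as "...", are skipped."""
    ids = []
    for line in listing.splitlines():
        number, _, key = line.strip().partition(". ")
        if number.isdigit() and key.startswith(PREFIX):
            ids.append(key[len(PREFIX):])
    return ids


listing = """\
1. $qdb.statistics.async_pipelines.pulled.total_bytes
2. $qdb.statistics.async_pipelines.pulled.total_count
3. $qdb.statistics.cpu.idle
4. $qdb.statistics.cpu.system
...
52. $qdb.statistics.startup_time"""
print(stat_ids_from_listing(listing))
```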
5.1.2. Understanding “by_uid” and “cumulative” Statistics
For each node, the retrieved statistics are organized into two dictionaries: “by_uid” and “cumulative”.
“by_uid”: On a secure cluster with users, this dictionary contains per-user statistics, grouped by user ID. It is empty on a cluster without users, as in the example output in 5.1.1.1.
“cumulative”: This dictionary holds the statistics of the node as a whole. These statistics are aggregated across all users and are typically global values for the entire node.
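Because each node reports its own “cumulative” dictionary, a cluster-wide figure has to be summed over the nodes. The sketch below does this for one statistic, and also totals per-user values from “by_uid”. It assumes that “by_uid” maps each user ID to a dictionary of statistics shaped like “cumulative”; the node addresses and user IDs in the sample are made up.

```python
def cluster_total(stats, stat_id):
    """Sum one statistic over the "cumulative" dictionaries of all nodes, skipping nodes that lack it."""
    return sum(
        node_stats["cumulative"][stat_id]
        for node_stats in stats.values()
        if stat_id in node_stats["cumulative"]
    )


def per_user_total(stats, stat_id):
    """Sum one statistic per user ID over all nodes.
    Assumption: "by_uid" maps user ID -> {statistic ID: value}."""
    totals = {}
    for node_stats in stats.values():
        for uid, user_stats in node_stats.get("by_uid", {}).items():
            if stat_id in user_stats:
                totals[uid] = totals.get(uid, 0) + user_stats[stat_id]
    return totals


# Hypothetical two-node secure cluster
stats = {
    "10.0.0.1:2836": {
        "by_uid": {"1": {"requests.total_count": 4}, "2": {"requests.total_count": 2}},
        "cumulative": {"requests.total_count": 6},
    },
    "10.0.0.2:2836": {
        "by_uid": {"1": {"requests.total_count": 10}},
        "cumulative": {"requests.total_count": 10},
    },
}
print(cluster_total(stats, "requests.total_count"))
print(per_user_total(stats, "requests.total_count"))
```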
5.1.2.1. Interpreting Cumulative Statistics
Many statistics, such as the CPU user time, are cumulative: they keep increasing while the node runs. To interpret them correctly, take the difference between two measurements and relate it to the time elapsed between them.
For example, if you retrieve the CPU user time twice:
qdbsh > direct_int_get $qdb.statistics.cpu.user
1576680000
qdbsh > direct_int_get $qdb.statistics.cpu.user
1576740000
In this scenario, the CPU user time increased from 1576680000 to 1576740000, that is, by 60000, between the two measurements. To compute such differences for two successive statistics snapshots, you can use the calculate_delta(prev, curr) function of the Python API (quasardb.stats).
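The arithmetic itself is simple. The sketch below computes the delta between two readings of a counter and, given the wall-clock time between them, its average rate per second. The 10-second interval is a made-up value (the qdbsh session above does not record the timing), and the result is in whatever unit cpu.user itself uses. This only illustrates the calculation; it is not the implementation of calculate_delta.

```python
def counter_rate(prev_value, curr_value, elapsed_seconds):
    """Return (delta, delta per second) for two readings of a counter taken elapsed_seconds apart."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = curr_value - prev_value
    return delta, delta / elapsed_seconds


# Readings from the qdbsh session above; the 10 s interval is hypothetical
delta, per_second = counter_rate(1576680000, 1576740000, elapsed_seconds=10)
print(delta, per_second)  # 60000 6000.0
```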
5.1.3. Statistic Type
Statistics fall into two types: “gauge” and “counter”.
“Gauge” Statistics: Gauge statistics are instantaneous measurements of a value at a particular point in time. For example, “memory.vm.bytes_used” indicates the amount of virtual memory currently used.
“Counter” Statistics: Counter statistics are monotonically increasing values that track totals over time. For example, “requests.total_count” records the total number of requests received since the node started.
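When sampling statistics periodically, the two types call for different handling: a gauge can be reported as-is, while a counter must be differenced between samples. The sketch below encodes that rule. The STAT_KINDS table is hypothetical and only covers the two examples above; with the Python API, the first element of the tuple returned by qdbst.stat_type() (as in the example that follows) gives the type. It also assumes that a counter which goes down means the node restarted, since counters track values since node startup.

```python
# Hypothetical type table for the two examples above. With the Python API,
# the first element of qdbst.stat_type(stat_id) holds the type, e.g. "counter".
STAT_KINDS = {
    "memory.vm.bytes_used": "gauge",
    "requests.total_count": "counter",
}


def interval_value(kind, prev, curr):
    """Value to report for one sampling interval.

    Gauges are instantaneous, so the current reading is returned unchanged.
    Counters are differenced; if a counter decreased, assume the node restarted
    and counted up from zero, so the current reading is the whole increase."""
    if kind == "gauge":
        return curr
    if kind == "counter":
        return curr - prev if curr >= prev else curr
    raise ValueError(f"unknown statistic kind: {kind!r}")


print(interval_value(STAT_KINDS["memory.vm.bytes_used"], 1776730112, 1776000000))
print(interval_value(STAT_KINDS["requests.total_count"], 6, 10))
print(interval_value(STAT_KINDS["requests.total_count"], 10, 3))
```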
To retrieve the type of a specific statistic, you can use the QuasarDB Python API:
import quasardb
import quasardb.stats as qdbst

with quasardb.Cluster('qdb://127.0.0.1:2836') as conn:
    stat_id = "requests.successes_count"
    # Look up the type and unit of the statistic
    stat_type = qdbst.stat_type(stat_id)
    print(f"The statistic type for '{stat_id}' is: {stat_type}")

Output:
The statistic type for 'requests.successes_count' is: ('counter', 'count')
The first element of the returned tuple is the statistic type (here, counter) and the second is its unit (here, count). Across the by_node() output, you will find statistics related to many aspects of the QuasarDB cluster, such as system resources, persistence, network, and queries.
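The statistic IDs are dotted names whose first segment roughly indicates the area they belong to (cpu, disk, memory, network, persistence, requests, and so on). As a sketch, grouping a node’s “cumulative” dictionary by that first segment makes large outputs easier to browse; the grouping rule is a convention chosen for this example, not something the API defines.

```python
from collections import defaultdict


def group_by_category(cumulative):
    """Group statistics by the first dotted segment of their ID.
    IDs without a dot, such as "node_id", form a category of their own."""
    groups = defaultdict(dict)
    for stat_id, value in cumulative.items():
        category = stat_id.split(".", 1)[0]
        groups[category][stat_id] = value
    return dict(groups)


# A few entries from the example output in 5.1.1.1
cumulative = {
    "cpu.user": 318260000,
    "memory.vm.bytes_used": 1776730112,
    "network.sessions.available_count": 510,
    "network.sessions.max_count": 512,
    "node_id": "0-0-0-1",
}
for category, entries in sorted(group_by_category(cumulative).items()):
    print(category, sorted(entries))
```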
5.2. Performance Tracing
If you have enabled performance profiling on the server side, you can run detailed performance traces of the operations that you execute on your cluster. A trace gives you a detailed, step-by-step breakdown of the low-level operations involved, which is usually most useful when you are debugging together with your Solution Architect.
If you have enabled statistics in conjunction with performance traces, your statistics will also include performance trace metrics (the perf.* keys in the example output in 5.1.1.1).
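In the example output in 5.1.1.1, these metrics appear as perf.<namespace>.<function>.total_ns keys, with per-phase breakdowns such as perf.ts.create_root.processing.total_ns. The sketch below relies only on that observed naming pattern: it extracts the per-function totals and ranks them. With the sample values, the per-function totals add up to perf.total_ns (249523 ns).

```python
def perf_totals_by_function(cumulative):
    """Return {"<namespace>.<function>": total_ns}, largest first.

    Only keys with exactly four segments, perf.<namespace>.<function>.total_ns,
    are kept; per-phase keys (five segments) and perf.total_ns (two) are skipped."""
    totals = {}
    for key, value in cumulative.items():
        parts = key.split(".")
        if len(parts) == 4 and parts[0] == "perf" and parts[3] == "total_ns":
            totals[f"{parts[1]}.{parts[2]}"] = value
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


# perf.* entries from the example output in 5.1.1.1 (some per-phase keys omitted)
cumulative = {
    "perf.common.get_type_for_removal.total_ns": 9475,
    "perf.control.get_system_info.total_ns": 5450,
    "perf.total_ns": 249523,
    "perf.ts.create_root.processing.total_ns": 131766,
    "perf.ts.create_root.total_ns": 203091,
    "perf.ts.get_columns.deserialization.total_ns": 1409,
    "perf.ts.get_columns.total_ns": 31507,
}
for function, total_ns in perf_totals_by_function(cumulative).items():
    print(f"{function}: {total_ns} ns")
```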
To use performance traces from the QuasarDB shell, use the enable_perf_trace command as follows:
$ qdbsh
quasardb shell version 3.5.0master build ce779a9 2019-10-23 00:04:01 +0000
Copyright (c) 2009-2019 quasardb. All rights reserved.
Need some help? Check out our documentation here: https://doc.quasardb.net
qdbsh > enable_perf_trace
qdbsh > create table testable(col1 int64, col2 double)
*** begin performance trace
total time: 175 us
- function + ts.create_root - 175 us [100.00 %]
|
| data received: 0 us - delta: 0 us [00.00 % - 00.00 %]
| deserialization starts: 2 us - delta: 2 us [01.14 % - 01.14 %]
| deserialization ends: 7 us - delta: 5 us [02.86 % - 02.86 %]
| entering chord: 9 us - delta: 2 us [01.14 % - 01.14 %]
| dispatch: 14 us - delta: 5 us [02.86 % - 02.86 %]
| deserialization starts: 24 us - delta: 10 us [05.71 % - 05.71 %]
| deserialization ends: 26 us - delta: 2 us [01.14 % - 01.14 %]
| processing starts: 28 us - delta: 2 us [01.14 % - 01.14 %]
| entry trimming starts: 79 us - delta: 51 us [29.14 % - 29.14 %]
| entry trimming ends: 81 us - delta: 2 us [01.14 % - 01.14 %]
| content writing starts: 82 us - delta: 1 us [00.57 % - 00.57 %]
| content writing ends: 141 us - delta: 59 us [33.71 % - 33.71 %]
| entry writing starts: 141 us - delta: 0 us [00.00 % - 00.00 %]
| entry writing ends: 172 us - delta: 31 us [17.71 % - 17.71 %]
| processing ends: 175 us - delta: 3 us [01.71 % - 01.71 %]
*** end performance trace
In the trace above, for example, we can see the performance trace of the CREATE TABLE statement and get a detailed idea of where it spends most of its time. In this case, the whole operation lasted 175 microseconds, and the breakdown shows the timing of each low-level step: content writing alone took 59 microseconds, about 34% of the total.
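For longer traces, it can help to rank the steps by their delta. The sketch below parses lines in the format shown above with a regular expression and returns the steps with the largest deltas. The pattern is inferred from this one trace, so other qdbsh versions may format traces differently. Each delta is the time since the previous line, so on an “... ends” line that directly follows its “... starts” line, it is the duration of that phase.

```python
import re

# Inferred from the trace above, e.g.
# "| content writing ends: 141 us - delta: 59 us [33.71 % - 33.71 %]"
STEP_RE = re.compile(r"^\|\s*(?P<step>[^:]+):\s*(?P<at>\d+) us - delta:\s*(?P<delta>\d+) us")


def slowest_steps(trace_text, top=3):
    """Return the `top` (step, delta_us) pairs with the largest deltas."""
    steps = []
    for line in trace_text.splitlines():
        match = STEP_RE.match(line.strip())
        if match:
            steps.append((match.group("step").strip(), int(match.group("delta"))))
    return sorted(steps, key=lambda step: step[1], reverse=True)[:top]


trace = """\
- function + ts.create_root - 175 us [100.00 %]
|
| processing starts: 28 us - delta: 2 us [01.14 % - 01.14 %]
| entry trimming starts: 79 us - delta: 51 us [29.14 % - 29.14 %]
| entry trimming ends: 81 us - delta: 2 us [01.14 % - 01.14 %]
| content writing starts: 82 us - delta: 1 us [00.57 % - 00.57 %]
| content writing ends: 141 us - delta: 59 us [33.71 % - 33.71 %]
| entry writing starts: 141 us - delta: 0 us [00.00 % - 00.00 %]
| entry writing ends: 172 us - delta: 31 us [17.71 % - 17.71 %]
| processing ends: 175 us - delta: 3 us [01.71 % - 01.71 %]"""
print(slowest_steps(trace))
```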
5.3. User properties
User properties are a way to attach metadata to a specific connection. They are key-value pairs that can store additional information, which is useful when you need to identify a connection for debugging or logging purposes.
Currently, user properties can be set from the QuasarDB Python API.
User properties are logged on the server side when JSON log output is enabled. Information on how to enable JSON logging can be found here.
5.3.1. Identifying which connection is pushing data in small increments
Let’s say you have multiple applications writing data to the same server, and one of those applications is writing data in small increments.
Small incremental writes can hurt server performance and are best avoided when you are not using async pipelines. User properties can help you identify the connection responsible.
If your server is set to detect and log small incremental writes (controlled by the log_small_append_percentage setting in the qdbd configuration file), you can give each application a unique user property to identify which one is causing the issue.
5.3.1.1. Setting user properties from the Python API and triggering logging for small incremental writes
import quasardb
import datetime
import quasardb.pandas as qdbpd
import pandas as pd

# A single-row DataFrame: writing it is a small incremental insert
data = {
    "column_1": [42]
}
idx = [datetime.datetime.now()]
df = pd.DataFrame(data, index=idx)

with quasardb.Cluster("qdb://127.0.0.1:2836") as conn:
    # first make sure that user properties are enabled
    conn.options().enable_user_properties()
    # now you can set a user property
    conn.properties().put("application_id", "0")
    # write a single row
    qdbpd.write_dataframe(df, conn, "t", create=True)
You should now see a small incremental insert warning from this connection in the server logs, with the application_id user property included in the log record:
{"timestamp":"2025-01-16T08:58:06.664698300Z","process_id":16548,"thread_id":33640,"level":"warning","message":"small incremental insert detected: append increased data size for shard t/0ms by only 0.012% (below threshold of 10%). This negatively affects write performance. To turn off this message, set 'log_small_append_percentage' to 0 in your qdbd config file.","$client_hostname":"hal-9000","$client_version":"3.14.2 3.14.2.dev0 d82b8b86d71c9334951b442b937abf9a598eda64 2025-01-14 10:18:10 -0500","$client_target":"AMD64 core2 64-bit","application_id":"0","$client_timestamp":"2025-01-16T09:58:06+01:00","$client_platform":"Microsoft Windows 11 (build 26100), 64-bit"}
With this information you can now identify which application is causing the issue and take appropriate action.
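When several applications write to the same server, counting the warnings per property value can point to the culprit quickly. The sketch below reads JSON log lines, keeps the small incremental insert warnings, and counts them per application_id. It assumes one JSON record per line, with the level, message, and user property fields seen in the log record above; the sample records are trimmed and partly made up.

```python
import json
from collections import Counter

WARNING_PREFIX = "small incremental insert detected"


def small_insert_warnings_by_property(log_lines, property_name="application_id"):
    """Count small incremental insert warnings per value of a user property.
    Non-JSON lines are skipped; records without the property are counted under None."""
    counts = Counter()
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        if record.get("level") == "warning" and record.get("message", "").startswith(WARNING_PREFIX):
            counts[record.get(property_name)] += 1
    return counts


# Trimmed, partly hypothetical log records in the format shown above
log_lines = [
    '{"level":"warning","message":"small incremental insert detected: ...","application_id":"0"}',
    '{"level":"warning","message":"small incremental insert detected: ...","application_id":"0"}',
    '{"level":"warning","message":"small incremental insert detected: ...","application_id":"1"}',
    '{"level":"info","message":"some other message","application_id":"1"}',
    "not a JSON line",
]
print(small_insert_warnings_by_property(log_lines).most_common())
```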
More examples for the Python API can be found in the Python API documentation.