1. Overview

1.1. Why Observability Matters

Observability is the foundation of operating QuasarDB reliably and predictably. It enables administrators and engineers to answer critical questions such as:

  • Is the cluster healthy?

  • Is data being ingested at the expected rate?

  • Why are some queries slower than usual?

  • Are any nodes under- or over-utilized?

By collecting and analyzing system metrics, logs, and trace data, operators can detect anomalies early, identify performance bottlenecks, and respond to incidents before they impact users or downstream systems.

QuasarDB is a distributed, high-performance time series database, and its distributed architecture introduces additional complexity: components may fail independently, ingestion rates may spike without warning, and queries may stress unexpected parts of the system. Observability makes these behaviors visible and actionable.

1.2. What Observability Includes

In QuasarDB, observability spans several key domains:

  • Monitoring: Collect and analyze metrics that provide insight into system health, resource usage, ingestion rates, and query latency.

  • Logging: Capture structured and unstructured log output from nodes and clients to understand system events and diagnose issues.

  • Performance Tracing: Record detailed execution traces of slow queries to identify where time is spent.

  • User-defined Properties: Tag queries with metadata (e.g., user IDs, system names) to improve traceability in logs and metrics.

Together, these tools provide a full picture of how the system behaves over time and under different conditions.

1.3. How This Chapter Is Organized

This chapter is organized as follows:

  • Monitoring QuasarDB Clusters: A top-down guide to cluster health monitoring, from business goals to practical dashboards and alerts.

  • Metrics Reference: A comprehensive, categorized listing of all QuasarDB metrics, their types, and units.

  • Dashboards: Panels and visualizations for common monitoring patterns, with guidance on building effective dashboards.

  • Logs and Troubleshooting: Techniques for identifying and resolving problems using logs, slow query diagnostics, and system events.

  • Performance Tracing: Instructions for enabling and using trace output to debug slow queries and system behavior.

  • User-defined Properties: How to set and use custom query metadata for improved correlation and debugging.

By following these guides, you can build a robust monitoring and diagnostic setup that scales with your deployment and operational needs.