5. Functions & Operators#
5.1. Aggregate functions#
QuasarDB supports many server-side operations that can be applied on timeseries data. Below is an overview of the functions available and the value types they can operator on.
Calculations performed using these functions are distributed over the cluster, and many of them are vectorized. Since 3.0.0 QuasarDB uses light-weight indexes to speed up lookups and computations where possible.
Using an aggregation function on a column always requires all other columns to be grouped by.
The following query demonstrates a valid aggregation:
SELECT instrument_id, AVG(value) FROM measurements GROUP BY instrument_id
In this case, we use the AVG
aggregation function, and group by instrument_id. This query, however, is invalid:
SELECT instrument_id, AVG(value) FROM measurements
- Grouping based on durations always requires the special
$timestamp
field to be selected as follows:: SELECT $timestamp, AVG(value) FROM measurements GROUP BY 1h
This query will then yield the aggregation in groups of 1h.
Operation
Description
Input value
Return value
Complexity
Vectorized
Basic operations
first
First element
Any
Any
Constant
No
last
Last element
Any
Any
Constant
No
min
Minimum value
Double, int64
Double, int64
Linear
Yes
max
Maximum value
Double, int64
Double, int64
Linear
Yes
count
Number of elements
Any
Int64
Constant
No
sum
Sum of values
Double, int64
Double, int64
Linear
Yes
Averages
avg
arithmetic_mean
Arithmetic mean
Double, int64
Double, int64
Linear
Yes
harmonic_mean
Harmonic mean
Double, int64
Double, int64
Linear
No
geometric_mean
Geometric mean
Double, int64
Double, int64
Linear
No
quadratic_mean
Quadratic mean
Double, int64
Double, int64
Linear
Yes
Statistics
distinct_count
Number of distinct elements
Any
Int64
Linear
No
adjacent_count
Number of differences between successive elements
Any
Int64
Linear
No
product
Product
Double, int64
Double, int64
Linear
Yes
spread
Spread
Double, int64
Double, int64
Linear
Yes
skewness
Skewness
Double, int64
Double, int64
Linear
No
kurtosis
Kurtosis
Double, int64
Double, int64
Linear
No
abs_min
min(abs(x))
Double, int64
Double, int64
Linear
No
abs_max
max(abs(x))
Double, int64
Double, int64
Linear
No
sum_of_squares
abs_energy
Sum of squares
Double, int64
Double, int64
Linear
Yes
sample_variance
Sample variance
Double, int64
Double, int64
Linear
No
sample_stddev
Sample standard deviation
Double, int64
Double, int64
Linear
No
population_variance
Population variance
Double, int64
Double, int64
Linear
No
population_stddev
Population standard deviation
Double, int64
Double, int64
Linear
No
correlation
Population correlation
Double, int64
Double
Linear
No
covariance
Population covariance
Double, int64
Double
Linear
No
sum_of_abs
Sum of absolute values
Double, int64
Double, int64
Linear
No
sum_of_abs_changes
Sum of absolute value changes
Double, int64
Double, int64
Linear
No
slope
Slope
Double, int64
Double, int64
Linear
No
macd
- Moving Average Convergence
Divergence
Double, int64
Double, int64
Linear
No
rsi
Relative Strength Index
Double, int64
Double, int64
Linear
No
array_accum
Accamulate arrays
Any
Array
Linear
No
Note
Retrieving timestamps for first/last/min/max
While most aggregation functions simply operate only on the group they operate in and return a single value, the FIRST
, LAST
, MIN
and MAX
aggregations have an additional trait: they always point to the value of a single row.
As such, when you get the result of running e.g. MAX(value)
, you often also want to know when this occured. For this reason, these specific aggregation functions also allow you to select the associated timestamp by running e.g. MAX(value).$timestamp
.
- For Example
SELECT MAX(value), MAX(value).$timestamp, MIN(value), MIN(value).$timestamp FROM measurements
This enables you not only calculate these aggregations, but also exactly when they occurred.
5.2. Window functions#
Window functions are similar to aggregation functions, but instead of aggregating “buckets”, they instead operate over a sliding window of multiple rows, and accumulate their results in the process.
Operation
Description
Input value
Return value
Complexity
Vectorized
cumsum
cumulative_sum
Cumulative sum
Double, int64
Double, int64
Linear
No
cumprod
cumulative_product
Cumulative product
Double, int64
Double, int64
Linear
No
ma
||moving_average
Moving average
Double, int64
Double, int64
Linear
No
moving_min
Moving minimum
Double, int64
Double, int64
Linear
No
moving_max
Moving maximum
Double, int64
Double, int64
Linear
No
5.2.1. OVER Clause#
“OVER” is a feature that takes inspiration from SQL window functions. It’s like having a moving window over your data. Imagine this window sliding through your dataset, collecting and summarizing information as it goes.
With “OVER,” you can specify how this window behaves, defining the scope and limits of its movement. This allows you to perform all sorts of calculations and aggregations on your data.
It’s essential to know that it doesn’t yet fully support the official SQL syntax. Currently, it only works with time-based ranges. As this feature develops, the existing ‘transformations’ will eventually become alternative ways to use ‘OVER.
5.2.2. Examples#
Cumulative Sum Over All Rows
- In this example, we calculate the cumulative sum of the y column over all rows without specifying a time range.
CREATE TABLE table_name (x DOUBLE, y DOUBLE); INSERT INTO table_name ($timestamp, x, y) VALUES (2023-01-01, 1, 2), (2023-01-02, 2, 4), (2023-01-03, 3, 6)
SELECT $timestamp, x, sum(y) OVER() FROM table_name; $timestamp x sum(y) OVER() ----------------------------------------------------------------- 2023-01-01T00:00:00.000000000Z 1 12 2023-01-02T00:00:00.000000000Z 2 12 2023-01-03T00:00:00.000000000Z 3 12 Returned 3 rows in 17,830 us Scanned 6 points in 17,830 us (336 rows/sec)
- Specifying a Time-Based Range
SELECT $timestamp, x, sum(y) OVER(RANGE($unbounded, $unbounded)) AS z FROM table_name; $timestamp x z ----------------------------------------------------------------- 2023-01-01T00:00:00.000000000Z 1 12 2023-01-02T00:00:00.000000000Z 2 12 2023-01-03T00:00:00.000000000Z 3 12 Returned 3 rows in 1,976 us Scanned 6 points in 1,976 us (3,035 rows/sec)
When you use RANGE($unbounded, $unbounded) in the “OVER” clause, it means you’re including all the rows in your dataset, allowing you to calculate aggregates or cumulative values without any time-based limitations. It’s a way to analyze the entire dataset as a whole.
Defining the time-based constraints for the analytical operation can be achieved in different ways using ‘RANGE’. For example, you can use ‘($unbounded, $timestamp)’ (range that starts from the beginning and goes up to the current timestamp) or ‘(‘2023-01-02’, $timestamp)’ (range that starts from a specific timestamp (January 2, 2023) and goes up to the current timestamp).
5.2.3. Usage:#
Think of “window functions” as a group of special math operations that you can do with your data. They include things like adding up numbers, finding averages, or picking the biggest or smallest values.
Now, here’s where “OVER” comes in: It’s like putting a frame around a part of your data. This frame tells those math operations exactly which data you want to work with.
Imagine you have a list of numbers. If you want to add up only the numbers in the last three positions, you use “OVER” to set up that frame. It’s like telling the math to only look at those last three numbers.
So, “window functions” are the math operations, and “OVER” helps you focus those operations on a specific part of your data. It’s super handy when you want to do things like adding up values as they slide through your data or finding the average in a moving window of time.
5.3. Distinguishing ‘OVER’ for Timeseries Tables and Aggregated Tables#
QuasarDB uses the ‘OVER’ function as a window function, allowing you to perform various calculations within defined windows. It’s crucial to understand that ‘OVER’ applies to both Timeseries Tables and Aggregated Tables, with variations in their implementations.
5.3.1. ‘OVER’ for Sliding Windows (Timeseries Tables)#
What It Does: Think of “OVER” in window functions like a spotlight on your data. It helps you perform calculations and find insights by focusing on specific portions of your dataset as it moves through time.
How It Works:
Sliding Windows: In this scenario, “OVER” defines time-based windows that move with your data. It’s like a spotlight sliding over your information, highlighting different parts as it goes.
Row-Centric: The calculations are done on individual rows, like analyzing a single frame within the sliding spotlight. Each row gets its calculations based on the data within its specific window.
Ordered by Time: When you’re dealing with time-based windows, the order of the calculations always follows the time of your data points. It ensures that everything aligns properly.
One unique feature of “OVER” in window functions is the ability to create “sliding” windows. For instance, using an SQL-like expression such as:
MAX(value) OVER($timestamp - 15 minutes)
means that, for each row, the system calculates the maximum value within a 15-minute window relative to that row’s timestamp. This calculation is based on the position of each row, and the window slides as you move through the dataset.
5.3.2. ‘OVER’ for Fixed Windows (Aggregated Tables)#
What It Does: With “OVER” in aggregated tables, you’re setting up fixed windows to analyze your data. These windows don’t move like a spotlight but rather remain in specific positions.
How It Works:
Fixed Windows: In this context, ‘OVER’ creates absolute time-based windows. These windows are predetermined by your table’s configuration. They don’t shift or change position as data flows in.
Window Configuration: The windows in aggregated tables are established by your table’s configuration. You don’t need to set a unique window for each data point. The window size and location are defined in advance.
Different Result Counts: In window functions, you get as many results as there are rows, each with its own window. In aggregated tables, the number of results matches the number of predefined windows you’ve set for the table.
In a nutshell, ‘OVER’ for Sliding Windows (Timeseries Tables) allows you to perform dynamic calculations within sliding time-based windows that are relative to each data point. On the other hand, ‘OVER’ for Fixed Windows (Aggregated Tables) allows you to work with predetermined, fixed windows that are set by your table’s configuration.
Remember, the terminology like “sliding” or “tumbling” windows might sound confusing, but it’s primarily related to aggregated tables and how their predefined windows behave. In window functions, the term “sliding” may also be used, but it refers to windows centered around your data and moving as time progresses.
5.4. Comparison operators#
For WHERE clauses, QuasarDB supports different boolean operators to filter rows from the results. Below is an overview of the operators available and the value types they can operator on.
Operator
Description
Applies to
==
Equal
Any
!=
Not equal
Any
<
Less than
Double, int64
>
Greater than
Double, int64
<=
Less than or equal to
Double, int64
>=
Greater than or equal to
Double, int64
in find(...)
Is contained in the resulting set
Blob, double, int64, string
You can use inline key/value lookups to check if a value is contained in the set of blobs/ints resulting from it. See IN FIND example.
5.5. Logical operators#
You can compose boolean operators in WHERE clause with logical operators to filter rows from the results. Below is an overview of the operators available.
Operator
Description
and
Logical and
or
Logical or
not
Logical not
5.6. Arithmetic operator#
Bitwise operators allow you to combine values in SELECT and WHERE clauses. Below is an overview of the operators available.
Operator
Description
Applies to
+
Addition
double, int64
-
Substraction
double, int64
*
Multiplication
double, int64
/
Division
double, int64
%
Modulus
double, int64
||
String concatenation
(string, string), (string, int64), (int64, string)
Aggregations with a single result can be combined with arithmetic, which allows to use expressions like sum(b - min(a))
.
5.7. Bitwise operators#
Bitwise operators allow you to combine values in SELECT and WHERE clauses. Below is an overview of the operators available.
Operator
Description
Applies to
&
Bitwise and
int64
5.8. Regular expressions#
Regular expressions on blobs are possible within WHERE clauses. QuasarDB uses the extended POSIX regular expression grammar.
Operator
Description
Applies to
~
Regex case sensitive match
blob
!~
Regex case sensitive no match
blob
~*
Regex case insensitive match
blob
!~*
Regex case insensitive no match
blob
5.10. Geographical functions#
Built-in functions are available for your convenience to store and query effiently geographical data.
Profile
Description
Input value
Return value
geohash64(double, double) -> int64
Transforms a latitude and a longitude into a sub-meter precise geohash.
Latitude and longitude
int64 hash
5.11. Lookup function#
The Lookup function enables you to map the result of an expression to the key-value store inside QuasarDB.
It stringifies the result of an expression. It will then return the content of keys matching the entry.
For example, if the row contains the integer 1, it will be transformed to the string 1.
Profile
Description
Input value
Return value
lookup(expression, type) -> column
Lookup keys based on the stringified result of the expression.
Expression and type
column of <type>s
lookup(expression) -> column
Equivalent to
lookup(expression, STRING)
Expression
column of strings
Type can be any of BLOB
, DOUBLE
, INT64
, STRING
, or TIMESTAMP
.
If the key is of the wrong type or missing, no row will be returned.
For compatibility, the form taking a prefix string lookup(string, expression)
is implicitly transformed
into string concatenation lookup(string || expression)
.
5.11.1. Examples#
For the following table trades
:
timestamp
agent
value
2018-01-01T10:01:02
3
4
2018-01-01T10:01:04
2
5
2018-01-01T10:01:08
1
4
And with the following blob keys (added through the key/value API):
Key
Value
‘agentid.1’
‘Bob’
‘agentid.2’
‘Terry’
‘agentid.3’
‘Alice’
The following query:
select lookup('agentid.' || agent), value from trades where value=4
Will output:
timestamp
lookup((‘agentid.’||agent))
value
2018-01-01T10:01:02
‘Alice’
4
2018-01-01T10:01:08
‘Bob’
4
For the following table basket
:
timestamp
item
quantity
2021-01-01T10:01:02
‘apples’
3
2021-01-01T10:01:04
‘oranges’
2
2021-01-01T10:01:08
‘bananas’
1
And with the following blob keys (added through the key/value API):
Key
Value
‘price.apples’
‘0.54’
‘price.bananas’
‘0.80’
‘price.oranges’
‘0.63’
The following query:
select item, quantity * lookup('price.' || item, DOUBLE) as price from basket
Will output:
timestamp
item
price
2021-01-01T10:01:02
‘apples’
1.62
2021-01-01T10:01:04
‘oranges’
1.26
2021-01-01T10:01:08
‘bananas’
0.8
5.12. Cast function#
The Cast function enables you to transform the result of an expression.
Profile
Description
Input value
Return value
cast(expression as EPOCH_NS) -> int64
Number of nanoseconds since 1 January 1970.
timestamp
int64
cast(expression as EPOCH_US) -> int64
Number of microseconds since 1 January 1970.
timestamp
int64
cast(expression as EPOCH_MS) -> int64
Number of milliseconds since 1 January 1970.
timestamp
int64
cast(expression as EPOCH_S) -> int64
Number of seconds since 1 January 1970.
timestamp
int64
5.13. Rounding Functions#
5.13.1. round#
The round
function is used to round a number to a specified number of decimal places using various rounding methods.
Parameters:
number
(required): The number to round.decimals
(optional, defaults to 0): The number of decimal places to round to.method
(optional, defaults to “nearest”): The rounding method to use. It can be one of the following:
Method |
Description |
Example |
---|---|---|
“nearest” |
Rounds to the nearest value. |
SELECT ROUND(3.45); Result: 3
|
“inward” |
Rounds towards zero (floor for negative numbers, ceiling for positive numbers). |
SELECT ROUND(3.5, 0, 'inward'); Result: 3
|
“outward” |
Rounds away from zero (ceiling for negative numbers, floor for positive numbers). |
SELECT ROUND(3.5, 0, 'outward'); Result: 4
|
“lower” |
Always rounds down (towards negative infinity). |
SELECT ROUND(3.45, 0, 'lower'); Result: 3
|
“upper” |
Always rounds up (towards positive infinity). |
SELECT ROUND(3.45, 0, 'upper'); Result: 4
|
5.13.2. ceil#
The ceil
function is a shortcut for rounding up (towards positive infinity). It takes the same parameters as round
and uses the “upper” method by default.
SELECT CEIL(3.456, 1); -- Equivalent to: SELECT ROUND(3.456, 1, 'upper');
Expected Result: 3.5
5.13.3. floor#
The floor
function is a shortcut for rounding down (towards negative infinity). It takes the same parameters as round
and uses the “lower” method by default.
SELECT FLOOR(3.456, 1); -- Equivalent to: SELECT ROUND(3.456, 1, 'lower');
Expected Result: 3.4
5.13.4. Examples#
Here are some examples of how to use these rounding functions:
Rounding to the nearest integer:
select round(3.55);
// Result: 4
Rounding to a specific number of decimal places:
select round(3.456, 2);
// Result: 3.46select round(3.456, 1);
// Result: 3.5
Using different rounding methods:
select round(-3.5, 0, "outward");
// Result: -4
These functions allow you to control how numbers are rounded in your queries, whether you need precise rounding, always rounding up or down, or using different rounding methods.