5. Functions & Operators#

5.1. Aggregate functions#

QuasarDB supports many server-side operations that can be applied on timeseries data. Below is an overview of the functions available and the value types they can operator on.

Calculations performed using these functions are distributed over the cluster, and many of them are vectorized. Since 3.0.0 QuasarDB uses light-weight indexes to speed up lookups and computations where possible.

Using an aggregation function on a column always requires all other columns to be grouped by.

The following query demonstrates a valid aggregation:

SELECT instrument_id, AVG(value) FROM measurements GROUP BY instrument_id

In this case, we use the AVG aggregation function, and group by instrument_id. This query, however, is invalid:

SELECT instrument_id, AVG(value) FROM measurements
Grouping based on durations always requires the special $timestamp field to be selected as follows::

SELECT $timestamp, AVG(value) FROM measurements GROUP BY 1h

This query will then yield the aggregation in groups of 1h.

Operation

Description

Input value

Return value

Complexity

Vectorized

Basic operations

first

First element

Any

Any

Constant

No

last

Last element

Any

Any

Constant

No

min

Minimum value

Double, int64

Double, int64

Linear

Yes

max

Maximum value

Double, int64

Double, int64

Linear

Yes

count

Number of elements

Any

Int64

Constant

No

sum

Sum of values

Double, int64

Double, int64

Linear

Yes

Averages

avg arithmetic_mean

Arithmetic mean

Double, int64

Double, int64

Linear

Yes

harmonic_mean

Harmonic mean

Double, int64

Double, int64

Linear

No

geometric_mean

Geometric mean

Double, int64

Double, int64

Linear

No

quadratic_mean

Quadratic mean

Double, int64

Double, int64

Linear

Yes

Statistics

distinct_count

Number of distinct elements

Any

Int64

Linear

No

adjacent_count

Number of differences between successive elements

Any

Int64

Linear

No

product

Product

Double, int64

Double, int64

Linear

Yes

spread

Spread

Double, int64

Double, int64

Linear

Yes

skewness

Skewness

Double, int64

Double, int64

Linear

No

kurtosis

Kurtosis

Double, int64

Double, int64

Linear

No

abs_min

min(abs(x))

Double, int64

Double, int64

Linear

No

abs_max

max(abs(x))

Double, int64

Double, int64

Linear

No

sum_of_squares abs_energy

Sum of squares

Double, int64

Double, int64

Linear

Yes

sample_variance

Sample variance

Double, int64

Double, int64

Linear

No

sample_stddev

Sample standard deviation

Double, int64

Double, int64

Linear

No

population_variance

Population variance

Double, int64

Double, int64

Linear

No

population_stddev

Population standard deviation

Double, int64

Double, int64

Linear

No

correlation

Population correlation

Double, int64

Double

Linear

No

covariance

Population covariance

Double, int64

Double

Linear

No

sum_of_abs

Sum of absolute values

Double, int64

Double, int64

Linear

No

sum_of_abs_changes

Sum of absolute value changes

Double, int64

Double, int64

Linear

No

slope

Slope

Double, int64

Double, int64

Linear

No

macd

Moving Average Convergence

Divergence

Double, int64

Double, int64

Linear

No

rsi

Relative Strength Index

Double, int64

Double, int64

Linear

No

array_accum

Accamulate arrays

Any

Array

Linear

No

Note

Retrieving timestamps for first/last/min/max

While most aggregation functions simply operate only on the group they operate in and return a single value, the FIRST, LAST, MIN and MAX aggregations have an additional trait: they always point to the value of a single row.

As such, when you get the result of running e.g. MAX(value), you often also want to know when this occured. For this reason, these specific aggregation functions also allow you to select the associated timestamp by running e.g. MAX(value).$timestamp.

For Example
SELECT MAX(value), MAX(value).$timestamp, MIN(value), MIN(value).$timestamp FROM measurements

This enables you not only calculate these aggregations, but also exactly when they occurred.

5.2. Window functions#

Window functions are similar to aggregation functions, but instead of aggregating “buckets”, they instead operate over a sliding window of multiple rows, and accumulate their results in the process.

Operation

Description

Input value

Return value

Complexity

Vectorized

cumsum cumulative_sum

Cumulative sum

Double, int64

Double, int64

Linear

No

cumprod cumulative_product

Cumulative product

Double, int64

Double, int64

Linear

No

ma || moving_average

Moving average

Double, int64

Double, int64

Linear

No

moving_min

Moving minimum

Double, int64

Double, int64

Linear

No

moving_max

Moving maximum

Double, int64

Double, int64

Linear

No

5.2.1. OVER Clause#

“OVER” is a feature that takes inspiration from SQL window functions. It’s like having a moving window over your data. Imagine this window sliding through your dataset, collecting and summarizing information as it goes.

With “OVER,” you can specify how this window behaves, defining the scope and limits of its movement. This allows you to perform all sorts of calculations and aggregations on your data.

It’s essential to know that it doesn’t yet fully support the official SQL syntax. Currently, it only works with time-based ranges. As this feature develops, the existing ‘transformations’ will eventually become alternative ways to use ‘OVER.

5.2.2. Examples#

Cumulative Sum Over All Rows

In this example, we calculate the cumulative sum of the y column over all rows without specifying a time range.
CREATE TABLE table_name (x DOUBLE, y DOUBLE);
INSERT INTO table_name ($timestamp, x, y) VALUES
  (2023-01-01, 1, 2),
  (2023-01-02, 2, 4),
  (2023-01-03, 3, 6)
SELECT $timestamp, x, sum(y) OVER() FROM table_name;

$timestamp                                    x    sum(y) OVER()
-----------------------------------------------------------------
2023-01-01T00:00:00.000000000Z                1               12
2023-01-02T00:00:00.000000000Z                2               12
2023-01-03T00:00:00.000000000Z                3               12

Returned 3 rows in 17,830 us
Scanned 6 points in 17,830 us (336 rows/sec)
Specifying a Time-Based Range
SELECT $timestamp, x, sum(y) OVER(RANGE($unbounded, $unbounded)) AS z FROM table_name;

$timestamp                                    x                z
-----------------------------------------------------------------
2023-01-01T00:00:00.000000000Z                1               12
2023-01-02T00:00:00.000000000Z                2               12
2023-01-03T00:00:00.000000000Z                3               12

Returned 3 rows in 1,976 us
Scanned 6 points in 1,976 us (3,035 rows/sec)

When you use RANGE($unbounded, $unbounded) in the “OVER” clause, it means you’re including all the rows in your dataset, allowing you to calculate aggregates or cumulative values without any time-based limitations. It’s a way to analyze the entire dataset as a whole.

Defining the time-based constraints for the analytical operation can be achieved in different ways using ‘RANGE’. For example, you can use ‘($unbounded, $timestamp)’ (range that starts from the beginning and goes up to the current timestamp) or ‘(‘2023-01-02’, $timestamp)’ (range that starts from a specific timestamp (January 2, 2023) and goes up to the current timestamp).

5.2.3. Usage:#

Think of “window functions” as a group of special math operations that you can do with your data. They include things like adding up numbers, finding averages, or picking the biggest or smallest values.

Now, here’s where “OVER” comes in: It’s like putting a frame around a part of your data. This frame tells those math operations exactly which data you want to work with.

Imagine you have a list of numbers. If you want to add up only the numbers in the last three positions, you use “OVER” to set up that frame. It’s like telling the math to only look at those last three numbers.

So, “window functions” are the math operations, and “OVER” helps you focus those operations on a specific part of your data. It’s super handy when you want to do things like adding up values as they slide through your data or finding the average in a moving window of time.

5.3. Distinguishing ‘OVER’ for Timeseries Tables and Aggregated Tables#

QuasarDB uses the ‘OVER’ function as a window function, allowing you to perform various calculations within defined windows. It’s crucial to understand that ‘OVER’ applies to both Timeseries Tables and Aggregated Tables, with variations in their implementations.

5.3.1. ‘OVER’ for Sliding Windows (Timeseries Tables)#

What It Does: Think of “OVER” in window functions like a spotlight on your data. It helps you perform calculations and find insights by focusing on specific portions of your dataset as it moves through time.

How It Works:

  • Sliding Windows: In this scenario, “OVER” defines time-based windows that move with your data. It’s like a spotlight sliding over your information, highlighting different parts as it goes.

  • Row-Centric: The calculations are done on individual rows, like analyzing a single frame within the sliding spotlight. Each row gets its calculations based on the data within its specific window.

  • Ordered by Time: When you’re dealing with time-based windows, the order of the calculations always follows the time of your data points. It ensures that everything aligns properly.

One unique feature of “OVER” in window functions is the ability to create “sliding” windows. For instance, using an SQL-like expression such as:

MAX(value) OVER($timestamp - 15 minutes)

means that, for each row, the system calculates the maximum value within a 15-minute window relative to that row’s timestamp. This calculation is based on the position of each row, and the window slides as you move through the dataset.

5.3.2. ‘OVER’ for Fixed Windows (Aggregated Tables)#

What It Does: With “OVER” in aggregated tables, you’re setting up fixed windows to analyze your data. These windows don’t move like a spotlight but rather remain in specific positions.

How It Works:

  • Fixed Windows: In this context, ‘OVER’ creates absolute time-based windows. These windows are predetermined by your table’s configuration. They don’t shift or change position as data flows in.

  • Window Configuration: The windows in aggregated tables are established by your table’s configuration. You don’t need to set a unique window for each data point. The window size and location are defined in advance.

  • Different Result Counts: In window functions, you get as many results as there are rows, each with its own window. In aggregated tables, the number of results matches the number of predefined windows you’ve set for the table.

In a nutshell, ‘OVER’ for Sliding Windows (Timeseries Tables) allows you to perform dynamic calculations within sliding time-based windows that are relative to each data point. On the other hand, ‘OVER’ for Fixed Windows (Aggregated Tables) allows you to work with predetermined, fixed windows that are set by your table’s configuration.

Remember, the terminology like “sliding” or “tumbling” windows might sound confusing, but it’s primarily related to aggregated tables and how their predefined windows behave. In window functions, the term “sliding” may also be used, but it refers to windows centered around your data and moving as time progresses.

5.4. Comparison operators#

For WHERE clauses, QuasarDB supports different boolean operators to filter rows from the results. Below is an overview of the operators available and the value types they can operator on.

Operator

Description

Applies to

==

Equal

Any

!=

Not equal

Any

<

Less than

Double, int64

>

Greater than

Double, int64

<=

Less than or equal to

Double, int64

>=

Greater than or equal to

Double, int64

in find(...)

Is contained in the resulting set

Blob, double, int64, string

You can use inline key/value lookups to check if a value is contained in the set of blobs/ints resulting from it. See IN FIND example.

5.5. Logical operators#

You can compose boolean operators in WHERE clause with logical operators to filter rows from the results. Below is an overview of the operators available.

Operator

Description

and

Logical and

or

Logical or

not

Logical not

5.6. Arithmetic operator#

Bitwise operators allow you to combine values in SELECT and WHERE clauses. Below is an overview of the operators available.

Operator

Description

Applies to

+

Addition

double, int64

-

Substraction

double, int64

*

Multiplication

double, int64

/

Division

double, int64

%

Modulus

double, int64

||

String concatenation

(string, string), (string, int64), (int64, string)

Aggregations with a single result can be combined with arithmetic, which allows to use expressions like sum(b - min(a)).

5.7. Bitwise operators#

Bitwise operators allow you to combine values in SELECT and WHERE clauses. Below is an overview of the operators available.

Operator

Description

Applies to

&

Bitwise and

int64

5.8. Regular expressions#

Regular expressions on blobs are possible within WHERE clauses. QuasarDB uses the extended POSIX regular expression grammar.

Operator

Description

Applies to

~

Regex case sensitive match

blob

!~

Regex case sensitive no match

blob

~*

Regex case insensitive match

blob

!~*

Regex case insensitive no match

blob

5.10. Geographical functions#

Built-in functions are available for your convenience to store and query effiently geographical data.

Profile

Description

Input value

Return value

geohash64(double, double) -> int64

Transforms a latitude and a longitude into a sub-meter precise geohash.

Latitude and longitude

int64 hash

5.11. Lookup function#

The Lookup function enables you to map the result of an expression to the key-value store inside QuasarDB.

It stringifies the result of an expression. It will then return the content of keys matching the entry.

For example, if the row contains the integer 1, it will be transformed to the string 1.

Profile

Description

Input value

Return value

lookup(expression, type) -> column

Lookup keys based on the stringified result of the expression.

Expression and type

column of <type>s

lookup(expression) -> column

Equivalent to lookup(expression, STRING)

Expression

column of strings

Type can be any of BLOB, DOUBLE, INT64, STRING, or TIMESTAMP. If the key is of the wrong type or missing, no row will be returned.

For compatibility, the form taking a prefix string lookup(string, expression) is implicitly transformed into string concatenation lookup(string || expression).

5.11.1. Examples#

For the following table trades:

timestamp

agent

value

2018-01-01T10:01:02

3

4

2018-01-01T10:01:04

2

5

2018-01-01T10:01:08

1

4

And with the following blob keys (added through the key/value API):

Key

Value

‘agentid.1’

‘Bob’

‘agentid.2’

‘Terry’

‘agentid.3’

‘Alice’

The following query:

select lookup('agentid.' || agent), value from trades where value=4

Will output:

timestamp

lookup((‘agentid.’||agent))

value

2018-01-01T10:01:02

‘Alice’

4

2018-01-01T10:01:08

‘Bob’

4

For the following table basket:

timestamp

item

quantity

2021-01-01T10:01:02

‘apples’

3

2021-01-01T10:01:04

‘oranges’

2

2021-01-01T10:01:08

‘bananas’

1

And with the following blob keys (added through the key/value API):

Key

Value

‘price.apples’

‘0.54’

‘price.bananas’

‘0.80’

‘price.oranges’

‘0.63’

The following query:

select item, quantity * lookup('price.' || item, DOUBLE) as price from basket

Will output:

timestamp

item

price

2021-01-01T10:01:02

‘apples’

1.62

2021-01-01T10:01:04

‘oranges’

1.26

2021-01-01T10:01:08

‘bananas’

0.8

5.12. Cast function#

The Cast function enables you to transform the result of an expression.

Profile

Description

Input value

Return value

cast(expression as EPOCH_NS) -> int64

Number of nanoseconds since 1 January 1970.

timestamp

int64

cast(expression as EPOCH_US) -> int64

Number of microseconds since 1 January 1970.

timestamp

int64

cast(expression as EPOCH_MS) -> int64

Number of milliseconds since 1 January 1970.

timestamp

int64

cast(expression as EPOCH_S) -> int64

Number of seconds since 1 January 1970.

timestamp

int64

5.13. Rounding Functions#

5.13.1. round#

The round function is used to round a number to a specified number of decimal places using various rounding methods.

  • Parameters:

    • number (required): The number to round.

    • decimals (optional, defaults to 0): The number of decimal places to round to.

    • method (optional, defaults to “nearest”): The rounding method to use. It can be one of the following:

Method

Description

Example

“nearest”

Rounds to the nearest value.

SELECT ROUND(3.45);               Result: 3

“inward”

Rounds towards zero (floor for negative numbers, ceiling for positive numbers).

SELECT ROUND(3.5, 0, 'inward');       Result: 3

“outward”

Rounds away from zero (ceiling for negative numbers, floor for positive numbers).

SELECT ROUND(3.5, 0, 'outward');       Result: 4

“lower”

Always rounds down (towards negative infinity).

SELECT ROUND(3.45, 0, 'lower');       Result: 3

“upper”

Always rounds up (towards positive infinity).

SELECT ROUND(3.45, 0, 'upper');       Result: 4

5.13.2. ceil#

The ceil function is a shortcut for rounding up (towards positive infinity). It takes the same parameters as round and uses the “upper” method by default.

SELECT CEIL(3.456, 1); -- Equivalent to: SELECT ROUND(3.456, 1, 'upper');
Expected Result: 3.5

5.13.3. floor#

The floor function is a shortcut for rounding down (towards negative infinity). It takes the same parameters as round and uses the “lower” method by default.

SELECT FLOOR(3.456, 1); -- Equivalent to: SELECT ROUND(3.456, 1, 'lower');
Expected Result: 3.4

5.13.4. Examples#

Here are some examples of how to use these rounding functions:

  1. Rounding to the nearest integer:

    • select round(3.55); // Result: 4

  2. Rounding to a specific number of decimal places:

    • select round(3.456, 2); // Result: 3.46

    • select round(3.456, 1); // Result: 3.5

  3. Using different rounding methods:

    • select round(-3.5, 0, "outward"); // Result: -4

These functions allow you to control how numbers are rounded in your queries, whether you need precise rounding, always rounding up or down, or using different rounding methods.