# 5. Functions & Operators#

## 5.1. Aggregate functions#

QuasarDB supports many server-side operations that can be applied on timeseries data. Below is an overview of the functions available and the value types they can operator on.

Calculations performed using these functions are distributed over the cluster, and many of them are vectorized. Since 3.0.0 QuasarDB uses light-weight indexes to speed up lookups and computations where possible.

Using an aggregation function on a column always requires all other columns to be grouped by.

The following query demonstrates a valid aggregation:

```
SELECT instrument_id, AVG(value) FROM measurements GROUP BY instrument_id
```

In this case, we use the `AVG`

aggregation function, and group by instrument_id. This query, however, is invalid:

```
SELECT instrument_id, AVG(value) FROM measurements
```

- Grouping based on durations always requires the special
`$timestamp`

field to be selected as follows:: SELECT $timestamp, AVG(value) FROM measurements GROUP BY 1h

This query will then yield the aggregation in groups of 1h.

Operation

Description

Input value

Return value

Complexity

Vectorized

Basic operations

`first`

First element

Any

Any

Constant

No

`last`

Last element

Any

Any

Constant

No

`min`

Minimum value

Double, int64

Double, int64

Linear

Yes

`max`

Maximum value

Double, int64

Double, int64

Linear

Yes

`count`

Number of elements

Any

Int64

Constant

No

`sum`

Sum of values

Double, int64

Double, int64

Linear

Yes

Averages

`avg`

`arithmetic_mean`

Arithmetic mean

Double, int64

Double, int64

Linear

Yes

`harmonic_mean`

Harmonic mean

Double, int64

Double, int64

Linear

No

`geometric_mean`

Geometric mean

Double, int64

Double, int64

Linear

No

`quadratic_mean`

Quadratic mean

Double, int64

Double, int64

Linear

Yes

Statistics

`distinct_count`

Number of distinct elements

Any

Int64

Linear

No

`adjacent_count`

Number of differences between successive elements

Any

Int64

Linear

No

`product`

Product

Double, int64

Double, int64

Linear

Yes

`spread`

Spread

Double, int64

Double, int64

Linear

Yes

`skewness`

Skewness

Double, int64

Double, int64

Linear

No

`kurtosis`

Kurtosis

Double, int64

Double, int64

Linear

No

`abs_min`

min(abs(x))

Double, int64

Double, int64

Linear

No

`abs_max`

max(abs(x))

Double, int64

Double, int64

Linear

No

`sum_of_squares`

`abs_energy`

Sum of squares

Double, int64

Double, int64

Linear

Yes

`sample_variance`

Sample variance

Double, int64

Double, int64

Linear

No

`sample_stddev`

Sample standard deviation

Double, int64

Double, int64

Linear

No

`population_variance`

Population variance

Double, int64

Double, int64

Linear

No

`population_stddev`

Population standard deviation

Double, int64

Double, int64

Linear

No

`correlation`

Population correlation

Double, int64

Double

Linear

No

`covariance`

Population covariance

Double, int64

Double

Linear

No

`sum_of_abs`

Sum of absolute values

Double, int64

Double, int64

Linear

No

`sum_of_abs_changes`

Sum of absolute value changes

Double, int64

Double, int64

Linear

No

`slope`

Slope

Double, int64

Double, int64

Linear

No

`macd`

- Moving Average Convergence
Divergence

Double, int64

Double, int64

Linear

No

`rsi`

Relative Strength Index

Double, int64

Double, int64

Linear

No

`array_accum`

Accamulate arrays

Any

Array

Linear

No

Note

**Retrieving timestamps for first/last/min/max**

While most aggregation functions simply operate only on the group they operate in and return a single value, the `FIRST`

, `LAST`

, `MIN`

and `MAX`

aggregations have an additional trait: they always point to the value of a single row.

As such, when you get the result of running e.g. `MAX(value)`

, you often also want to know *when* this occured. For this reason, these specific aggregation functions also allow you to select the associated timestamp by running e.g. `MAX(value).$timestamp`

.

- For Example
SELECT MAX(value), MAX(value).$timestamp, MIN(value), MIN(value).$timestamp FROM measurements

This enables you not only calculate these aggregations, but also exactly when they occurred.

## 5.2. Window functions#

Window functions are similar to aggregation functions, but instead of aggregating “buckets”, they instead operate over a sliding window of multiple rows, and accumulate their results in the process.

Operation

Description

Input value

Return value

Complexity

Vectorized

`cumsum`

`cumulative_sum`

Cumulative sum

Double, int64

Double, int64

Linear

No

`cumprod`

`cumulative_product`

Cumulative product

Double, int64

Double, int64

Linear

No

`ma`

||`moving_average`

Moving average

Double, int64

Double, int64

Linear

No

`moving_min`

Moving minimum

Double, int64

Double, int64

Linear

No

`moving_max`

Moving maximum

Double, int64

Double, int64

Linear

No

### 5.2.1. OVER Clause#

“OVER” is a feature that takes inspiration from SQL window functions. It’s like having a moving window over your data. Imagine this window sliding through your dataset, collecting and summarizing information as it goes.

With “OVER,” you can specify how this window behaves, defining the scope and limits of its movement. This allows you to perform all sorts of calculations and aggregations on your data.

It’s essential to know that it doesn’t yet fully support the official SQL syntax. Currently, it only works with time-based ranges. As this feature develops, the existing ‘transformations’ will eventually become alternative ways to use ‘OVER.

### 5.2.2. Examples#

Cumulative Sum Over All Rows

- In this example, we calculate the cumulative sum of the y column over all rows without specifying a time range.
CREATE TABLE table_name (x DOUBLE, y DOUBLE); INSERT INTO table_name ($timestamp, x, y) VALUES (2023-01-01, 1, 2), (2023-01-02, 2, 4), (2023-01-03, 3, 6)

SELECT $timestamp, x, sum(y) OVER() FROM table_name; $timestamp x sum(y) OVER() ----------------------------------------------------------------- 2023-01-01T00:00:00.000000000Z 1 12 2023-01-02T00:00:00.000000000Z 2 12 2023-01-03T00:00:00.000000000Z 3 12 Returned 3 rows in 17,830 us Scanned 6 points in 17,830 us (336 rows/sec)

- Specifying a Time-Based Range
SELECT $timestamp, x, sum(y) OVER(RANGE($unbounded, $unbounded)) AS z FROM table_name; $timestamp x z ----------------------------------------------------------------- 2023-01-01T00:00:00.000000000Z 1 12 2023-01-02T00:00:00.000000000Z 2 12 2023-01-03T00:00:00.000000000Z 3 12 Returned 3 rows in 1,976 us Scanned 6 points in 1,976 us (3,035 rows/sec)

When you use RANGE($unbounded, $unbounded) in the “OVER” clause, it means you’re including all the rows in your dataset, allowing you to calculate aggregates or cumulative values without any time-based limitations. It’s a way to analyze the entire dataset as a whole.

Defining the time-based constraints for the analytical operation can be achieved in different ways using ‘RANGE’. For example, you can use ‘($unbounded, $timestamp)’ (range that starts from the beginning and goes up to the current timestamp) or ‘(‘2023-01-02’, $timestamp)’ (range that starts from a specific timestamp (January 2, 2023) and goes up to the current timestamp).

### 5.2.3. Usage:#

Think of “window functions” as a group of special math operations that you can do with your data. They include things like adding up numbers, finding averages, or picking the biggest or smallest values.

Now, here’s where “OVER” comes in: It’s like putting a frame around a part of your data. This frame tells those math operations exactly which data you want to work with.

Imagine you have a list of numbers. If you want to add up only the numbers in the last three positions, you use “OVER” to set up that frame. It’s like telling the math to only look at those last three numbers.

So, “window functions” are the math operations, and “OVER” helps you focus those operations on a specific part of your data. It’s super handy when you want to do things like adding up values as they slide through your data or finding the average in a moving window of time.

## 5.3. Distinguishing ‘OVER’ for Timeseries Tables and Aggregated Tables#

QuasarDB uses the ‘OVER’ function as a window function, allowing you to perform various calculations within defined windows. It’s crucial to understand that ‘OVER’ applies to both Timeseries Tables and Aggregated Tables, with variations in their implementations.

### 5.3.1. ‘OVER’ for Sliding Windows (Timeseries Tables)#

**What It Does:**
Think of “OVER” in window functions like a spotlight on your data. It helps you perform calculations and find insights by focusing on specific portions of your dataset as it moves through time.

**How It Works:**

**Sliding Windows:**In this scenario, “OVER” defines time-based windows that move with your data. It’s like a spotlight sliding over your information, highlighting different parts as it goes.**Row-Centric:**The calculations are done on individual rows, like analyzing a single frame within the sliding spotlight. Each row gets its calculations based on the data within its specific window.**Ordered by Time:**When you’re dealing with time-based windows, the order of the calculations always follows the time of your data points. It ensures that everything aligns properly.

One unique feature of “OVER” in window functions is the ability to create “sliding” windows. For instance, using an SQL-like expression such as:

```
MAX(value) OVER($timestamp - 15 minutes)
```

means that, for each row, the system calculates the maximum value within a 15-minute window relative to that row’s timestamp. This calculation is based on the position of each row, and the window slides as you move through the dataset.

### 5.3.2. ‘OVER’ for Fixed Windows (Aggregated Tables)#

**What It Does:**
With “OVER” in aggregated tables, you’re setting up fixed windows to analyze your data. These windows don’t move like a spotlight but rather remain in specific positions.

**How It Works:**

**Fixed Windows:**In this context, ‘OVER’ creates absolute time-based windows. These windows are predetermined by your table’s configuration. They don’t shift or change position as data flows in.**Window Configuration:**The windows in aggregated tables are established by your table’s configuration. You don’t need to set a unique window for each data point. The window size and location are defined in advance.**Different Result Counts:**In window functions, you get as many results as there are rows, each with its own window. In aggregated tables, the number of results matches the number of predefined windows you’ve set for the table.

In a nutshell, ‘OVER’ for Sliding Windows (Timeseries Tables) allows you to perform dynamic calculations within sliding time-based windows that are relative to each data point. On the other hand, ‘OVER’ for Fixed Windows (Aggregated Tables) allows you to work with predetermined, fixed windows that are set by your table’s configuration.

Remember, the terminology like “sliding” or “tumbling” windows might sound confusing, but it’s primarily related to aggregated tables and how their predefined windows behave. In window functions, the term “sliding” may also be used, but it refers to windows centered around your data and moving as time progresses.

## 5.4. Comparison operators#

For WHERE clauses, QuasarDB supports different boolean operators to filter rows from the results. Below is an overview of the operators available and the value types they can operator on.

Operator

Description

Applies to

`==`

Equal

Any

`!=`

Not equal

Any

`<`

Less than

Double, int64

`>`

Greater than

Double, int64

`<=`

Less than or equal to

Double, int64

`>=`

Greater than or equal to

Double, int64

`in find(...)`

Is contained in the resulting set

Blob, double, int64, string

You can use inline key/value lookups to check if a value is contained in the set of blobs/ints resulting from it. See IN FIND example.

## 5.5. Logical operators#

You can compose boolean operators in WHERE clause with logical operators to filter rows from the results. Below is an overview of the operators available.

Operator

Description

`and`

Logical and

`or`

Logical or

`not`

Logical not

## 5.6. Arithmetic operator#

Bitwise operators allow you to combine values in SELECT and WHERE clauses. Below is an overview of the operators available.

Operator

Description

Applies to

`+`

Addition

double, int64

`-`

Substraction

double, int64

`*`

Multiplication

double, int64

`/`

Division

double, int64

`%`

Modulus

double, int64

`||`

String concatenation

(string, string), (string, int64), (int64, string)

Aggregations with a single result can be combined with arithmetic, which allows to use expressions like `sum(b - min(a))`

.

## 5.7. Bitwise operators#

Bitwise operators allow you to combine values in SELECT and WHERE clauses. Below is an overview of the operators available.

Operator

Description

Applies to

`&`

Bitwise and

int64

## 5.8. Regular expressions#

Regular expressions on blobs are possible within WHERE clauses. QuasarDB uses the extended POSIX regular expression grammar.

Operator

Description

Applies to

`~`

Regex case sensitive match

blob

`!~`

Regex case sensitive no match

blob

`~*`

Regex case insensitive match

blob

`!~*`

Regex case insensitive no match

blob

## 5.10. Geographical functions#

Built-in functions are available for your convenience to store and query effiently geographical data.

Profile

Description

Input value

Return value

`geohash64(double, double) -> int64`

Transforms a latitude and a longitude into a sub-meter precise geohash.

Latitude and longitude

int64 hash

## 5.11. Lookup function#

The Lookup function enables you to map the result of an expression to the key-value store inside QuasarDB.

It stringifies the result of an expression. It will then return the content of keys matching the entry.

For example, if the row contains the integer 1, it will be transformed to the string 1.

Profile

Description

Input value

Return value

`lookup(expression, type) -> column`

Lookup keys based on the stringified result of the expression.

Expression and type

column of <type>s

`lookup(expression) -> column`

Equivalent to

`lookup(expression, STRING)`

Expression

column of strings

Type can be any of `BLOB`

, `DOUBLE`

, `INT64`

, `STRING`

, or `TIMESTAMP`

.
If the key is of the wrong type or missing, no row will be returned.

For compatibility, the form taking a prefix string `lookup(string, expression)`

is implicitly transformed
into string concatenation `lookup(string || expression)`

.

### 5.11.1. Examples#

For the following table `trades`

:

timestamp

agent

value

`2018-01-01T10:01:02`

3

4

`2018-01-01T10:01:04`

2

5

`2018-01-01T10:01:08`

1

4

And with the following blob keys (added through the key/value API):

Key

Value

‘agentid.1’

‘Bob’

‘agentid.2’

‘Terry’

‘agentid.3’

‘Alice’

The following query:

```
select lookup('agentid.' || agent), value from trades where value=4
```

Will output:

timestamp

lookup((‘agentid.’||agent))

value

`2018-01-01T10:01:02`

‘Alice’

4

`2018-01-01T10:01:08`

‘Bob’

4

For the following table `basket`

:

timestamp

item

quantity

`2021-01-01T10:01:02`

‘apples’

3

`2021-01-01T10:01:04`

‘oranges’

2

`2021-01-01T10:01:08`

‘bananas’

1

And with the following blob keys (added through the key/value API):

Key

Value

‘price.apples’

‘0.54’

‘price.bananas’

‘0.80’

‘price.oranges’

‘0.63’

The following query:

```
select item, quantity * lookup('price.' || item, DOUBLE) as price from basket
```

Will output:

timestamp

item

price

`2021-01-01T10:01:02`

‘apples’

1.62

`2021-01-01T10:01:04`

‘oranges’

1.26

`2021-01-01T10:01:08`

‘bananas’

0.8

## 5.12. Cast function#

The Cast function enables you to transform the result of an expression.

Profile

Description

Input value

Return value

`cast(expression as EPOCH_NS) -> int64`

Number of nanoseconds since 1 January 1970.

timestamp

int64

`cast(expression as EPOCH_US) -> int64`

Number of microseconds since 1 January 1970.

timestamp

int64

`cast(expression as EPOCH_MS) -> int64`

Number of milliseconds since 1 January 1970.

timestamp

int64

`cast(expression as EPOCH_S) -> int64`

Number of seconds since 1 January 1970.

timestamp

int64

## 5.13. Rounding Functions#

### 5.13.1. round#

The `round`

function is used to round a number to a specified number of decimal places using various rounding methods.

Parameters:

`number`

(required): The number to round.`decimals`

(optional, defaults to 0): The number of decimal places to round to.`method`

(optional, defaults to “nearest”): The rounding method to use. It can be one of the following:

Method |
Description |
Example |
---|---|---|

“nearest” |
Rounds to the nearest value. |
```
SELECT ROUND(3.45); Result: 3
``` |

“inward” |
Rounds towards zero (floor for negative numbers, ceiling for positive numbers). |
```
SELECT ROUND(3.5, 0, 'inward'); Result: 3
``` |

“outward” |
Rounds away from zero (ceiling for negative numbers, floor for positive numbers). |
```
SELECT ROUND(3.5, 0, 'outward'); Result: 4
``` |

“lower” |
Always rounds down (towards negative infinity). |
```
SELECT ROUND(3.45, 0, 'lower'); Result: 3
``` |

“upper” |
Always rounds up (towards positive infinity). |
```
SELECT ROUND(3.45, 0, 'upper'); Result: 4
``` |

### 5.13.2. ceil#

The `ceil`

function is a shortcut for rounding up (towards positive infinity). It takes the same parameters as `round`

and uses the “upper” method by default.

```
SELECT CEIL(3.456, 1); -- Equivalent to: SELECT ROUND(3.456, 1, 'upper');
Expected Result: 3.5
```

### 5.13.3. floor#

The `floor`

function is a shortcut for rounding down (towards negative infinity). It takes the same parameters as `round`

and uses the “lower” method by default.

```
SELECT FLOOR(3.456, 1); -- Equivalent to: SELECT ROUND(3.456, 1, 'lower');
Expected Result: 3.4
```

### 5.13.4. Examples#

Here are some examples of how to use these rounding functions:

Rounding to the nearest integer:

`select round(3.55);`

// Result: 4

Rounding to a specific number of decimal places:

`select round(3.456, 2);`

// Result: 3.46`select round(3.456, 1);`

// Result: 3.5

Using different rounding methods:

`select round(-3.5, 0, "outward");`

// Result: -4

These functions allow you to control how numbers are rounded in your queries, whether you need precise rounding, always rounding up or down, or using different rounding methods.