12. Select

    12. Select

    12.1. Synopsis

    SELECT { <expression> | * } [, ... ]
    
      FROM { <table_name> | <find_expression> } [, ... ]
    
      [ <asof_type> ASOF JOIN { <table_name> | <find_expression> } [, ... ] [ <interpolation> ] [ <lookback> ] [ PREWHERE <condition> ]]
    
      [ ASOF JOIN <range_generator> [ <interpolation> ] [ PREWHERE <condition> ] ]
    
      [ IN { RANGE <range_spec> | '[' RANGE <range_spec> [, ...] ']' }
        [ WITH MONTHS IN ( <month>, <month> ) ]
        [ WITH DAYS IN ( <day>, <day> ) ]
        [ WITH TIME IN ( <time>, <time> ) ] ]
    
      [ WHERE <condition> ]
    
      [ RESTRICT TO <restriction> ]
    
      [ GROUP BY <group> [, ... ] ]
    
      [ PIVOT <column> ]
    
      [ HAVING <condition> ]
    
      [ ORDER BY { <expression> [ { ASC | DESC } ] } [, ... ] ]
    
      [ LIMIT <limit> ]
    
      [ OFFSET <offset> ]
    
    asof_type ::=
     FULL | LEFT | RIGHT
    
    expression ::=
      <column_name>
      | <function> ( <column_name> )
      | ( <sub_query> )
    
    interpolation ::=
      INTERPOLATE (method=<interpolation_method> [, limit_direction=<interpolation_limit_direction>])
    
    interpolation_limit_direction ::=
     FORWARD | BACKWARD | BOTH | NONE
    
    interpolation_method ::=
      CONSTANT | LINEAR
    
    lookback ::=
      WITH LOOKBACK <duration>
    
    range_spec ::=
      ( <timestamp>, <timestamp> )
      ( <timestamp>, <time_offset> )
    
    range_generator ::=
      ( <timestamp>, <timestamp>, <time_offset> )
      ( <timestamp>, <time_offset>, <time_offset> )
    
    restriction ::=
      <function> ( <column_name> )
    
    group ::=
      <expression>
      | <duration>
    
    find_expression ::=
      FIND ( { <tag_expression> | NOT <tag_expression> } [ AND ... ] )
    
    tag_expression ::=
      TAG = 'tag_name'
    
    sub_query ::=
      SELECT <function> ( <column_name> ) FROM <table_name> IN RANGE <sub_range_spec>
    
    sub_range_spec ::=
      ( <timestamp> | <column_name>, <timestamp> | <column_name> )
      ( <timestamp> | <column_name>, <time_offset> )
    

    12.2. Description

    SELECT retrieves rows from one or more tables. A SELECT statement can perform calculations on rows prior to returning the result which are performed server-side and distributed over the entire cluster.

    12.3. Parameters

    table_name

    The name of the table to retrieve rows from.

    find_expression

    When your tables are tagged, you can use inline key/value lookups to perform your query over multiple tables. To match all tables that have the tag “stocks”, you can use FIND(tag='stocks' AND type=ts).

    asof_type

    The type of ASOF JOIN to perform, it can be FULL, LEFT, or RIGHT.

    column_name

    A column name to read data from. Must be part of the table’s schema or will throw an error otherwise.

    function

    An aggregate function to apply over. Can only be used in combination with GROUP BY. For valid functions, please refer to the function reference.

    condition

    A WHERE or HAVING condition is any expression that evaluates to a boolean. Any row that does not match this predicate will be filtered from the results. For an overview of valid operators, please refer to the comparison operators reference. The WHERE condition is applied before the SELECT, whereas the HAVING condition is applied on the SELECT results.

    restriction

    A RESTRICT TO restriction selects only one row from each set of rows with duplicated $timestamp values. The restriction function can be one of first, last, min, max, abs_min, abs_max. For a description of these functions, please refer to the function reference.

    range_generator

    An expression to generate range intervals.

    limit

    Accepts a non-negative integer to limit the maximum number of rows returned by the query. Typical use cases include pagination and returning the top-X results.

    offset

    Accepts a non-negative integer to skip rows of the result. Most common use cases include pagination in combination with limit.

    timestamp

    An absolute timestamp. This can be either a date or a date + time. Supports precision for days, seconds or nanoseconds. For more information, please refer to the documentation for timestamps.

    time_offset

    A relative offset, can only be used in combination with an absolute_timestamp. For more information, please refer to the documentation for timestamps.

    month

    A month. Can be one of jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov or dec. When defining a subrange using WITH MONTHS IN, is both left and right inclusive: WITH MONTHS IN (jan, apr) will match January to April.

    day

    A weekday. Can be one of sun, mon, tue, wed, thu, fri or sat. When defining a subrange using WITH DAYS IN, is both left and right inclusive: WITH DAYS IN (mon, wed) will match Monday to Wednesday.

    time

    Time of day, having a precision of either minutes, seconds or nanoseconds. When defining a subrange using WITH TIME IN, is left inclusive and right exclusive: WITH TIME IN (09:22:00, 09:22:01) will not match a row whose time is exactly 09:22:01 AM.

    duration

    A time duration to group aggregates by. Valid values are any durations as specified in the documentation for durations.

    12.4. Examples

    Select all columns from all rows:

    SELECT * FROM example
    

    Note

    The examples below assume typical open, high, low, close, volume stocks time series.

    Get everything between January 1st 2007 and January 1st 2008 (left inclusive) for the time series “stocks.apple”:

    SELECT * FROM stocks.apple IN RANGE(2007, 2008)
    

    Get everything between November 3rd 2017, 20:01:22 and December 2nd, 2017, 06:20:10 (left inclusive) for the time series “stocks.apple”:

    SELECT * FROM stocks.apple IN RANGE(2017-11-03T20:01:22, 2017-12-02T06:20:10)
    

    Get the first 10 days of 2007 for “stocks.apple”:

    SELECT * FROM stocks.apple IN RANGE(2007, +10d)
    

    Get the last second of 2016 for “stocks.apple”:

    SELECT * FROM stocks.apple IN RANGE(2017, -1s)
    

    Get the close and open values that are greater than 3 of “stocks.apple” for the first 10 days of 2016:

    SELECT close, open FROM stocks.apple IN RANGE(2016, +10d) WHERE (close > 3) AND (open > 3)
    

    Get the last close value for March 28th 2016:

    SELECT LAST(close) FROM stocks.apple IN RANGE(2016-03-28, +1d)
    

    12.5. Advanced examples

    Note

    The examples below assume typical open, high, low, close, volume stocks time series.

    Get the open and close value when volume is greater than 0 for the first 5 days of 2016 and 2017 for “stocks.apple”:

    SELECT open, close FROM stocks.apple IN [range(2016, +5d), range(2017, +5d)] WHERE volume > 0
    

    Get the hourly arithmetic mean of volume exchanged for all nasdaq stocks for yesterday:

    SELECT arithmetic_mean(volume) FROM find(tag='nasdaq' AND type=ts) IN RANGE(yesterday, +1d) GROUP BY hour
    

    Get the sum of volumes for every Friday of January 2008 between 16:00 and 17:00 for “stocks.apple”:

    SELECT sum(volume) FROM stocks.apple IN RANGE(2008, +month) WITH DAYS IN (fri, fri) WITH TIME IN (16:00, 17:00)
    

    Get the daily open, high, low, close, volume for “stocks.apple” for the last 30 days:

    SELECT first(open), max(high), min(low), last(close), sum(volume) FROM stocks.apple IN RANGE(today, -30d) GROUP BY day
    

    Get the sum of volume and the number of lines for the last hour by 10 seconds group:

    SELECT sum(volume), count(volume) FROM stocks.apple IN RANGE(now, -1h) GROUP BY 10s
    

    Get the sum of volumes for “stocks.apple” the year 2008 and 2010, grouped by month (gregorian calendar):

    SELECT sum(volume) FROM stocks.apple IN [RANGE(2008, +1y), RANGE(2010, +1y)] GROUP BY month
    

    If we assume we have an additional ‘deal_timestamp’ column on our timeseries, we apply a filter on it:

    SELECT sum(volume) FROM stocks.apple IN RANGE(now, -1h) WHERE deal_timestamp=datetime(2009-11-23T09:30)
    

    12.6. IN FIND example

    Example: tagging stock tables by exchange:

    CREATE TABLE stocks (symbol INT64, open DOUBLE, close DOUBLE, high DOUBLE, low DOUBLE, volume INT64)
    
    timestamp                      symbol   open             close            high             low              volume
    -----------------------------------------------------------------------------------------------------------------------------
    2018-01-01T00:00:00.000000000Z 1        10.240000        10.170000        10.360000        10.070000        1027389
    2018-01-01T00:00:00.000000000Z 2        902.750000       909.510000       925.000000       890.120000       8437
    2018-01-01T00:00:00.000000000Z 3        45.770000        49.750000        53.000000        42.820000        9820
    

    And with the following int keys (added through the key/value API):

    Key

    Value

    Tags

    ‘symbol.apple’

    1

    ‘nasdaq’

    ‘symbol.facebook’

    2

    ‘nasdaq’

    ‘symbol.tesco’

    3

    ‘lse’

    The following query:

    select * from stocks where symbol in find(tag='nasdaq')
    

    Will output:

    timestamp                      symbol   open             close            high             low              volume
    -----------------------------------------------------------------------------------------------------------------------------
    2018-01-01T00:00:00.000000000Z 1        10.240000        10.170000        10.360000        10.070000        1027389
    2018-01-01T00:00:00.000000000Z 2        902.750000       909.510000       925.000000       890.120000       8437
    

    12.7. SUBQUERY example

    A subquery performs an aggregation for each resulting row of the outer query.

    Assuming these tables

    parent

    $timestamp

    contract_end

    2021-01-01T00:00:00

    2021-01-02T00:00:00

    2021-01-02T00:00:00

    2021-01-04T00:00:00

    child

    $timestamp

    c

    2021-01-01T00:00:00

    1

    2021-01-02T00:00:00

    2

    2021-01-03T00:00:00

    3

    2021-01-04T00:00:00

    4

    2021-01-05T00:00:00

    5

    The following query:

    SELECT $timestamp, contract_end, (SELECT SUM(c) FROM child IN RANGE (parent.$timestamp, parent.contract_end)) AS sum FROM parent;

    Will output:

    timestamp                                        contract_end              sum
    -------------------------------------------------------------------------------
    2021-01-01T00:00:00.000000000Z 2021-01-02T00:00:00.000000000Z                1
    2021-01-02T00:00:00.000000000Z 2021-01-04T00:00:00.000000000Z                5
    

    12.8. ASOF joins

    An ASOF joins several tables based on their timestamps. ASOF joins use the last value for the merge, based in the range of the query. If values exist before the range of the select, they will be ignored.

    There are four types of ASOF joins:

    • Left: The left table will be used as a reference. The timestamps for the right table will be using the last value for each timestamps of all columns of the left table.

    • Right: The right table will be used as a reference. The timestamps for the left table will be using the last value for each timestamps of all columns of the right table.

    • Full: Both table will be used as references.

    • Range: Joins the table on the left against a generated range.

    Currently, ASOF joins support joining against exactly one table. However, the number of tables to join is not limited. This means, for example, that if you are doing a LEFT ASOF JOIN, only one table may be specified on the left, but several can be specified on the right.

    This is correct:

    SELECT * FROM table_left LEFT ASOF JOIN table_right1, table_right2;

    This is not supported:

    SELECT * FROM table_left1, table_left2 LEFT ASOF JOIN table_right;

    It is possible to join the result of a find. However, the result of the FIND must yield exactly one value for the table to join against. The number of results for the tables being joined is not limited.

    This is correct if FIND(tag='left_tag') returns exactly one table, and FIND(tag='right_tag') returns at least one table:

    SELECT * FROM FIND(tag='left_tag') LEFT ASOF JOIN FIND(tag='right_tag');

    12.8.1. Filtering ASOF join results

    The WHERE clause is applied after the ASOF JOIN is executed. What if you want to filter the data before you join the two tables? QuasarDB has a specific PREWHERE clause to solve that problem.

    This query will filter out negative values before joining the two tables:

    SELECT * FROM table_left LEFT ASOF JOIN table_right PREWHERE table_left.value >= 0

    This query will filter out negative values after joining the two tables:

    SELECT * FROM table_left LEFT ASOF JOIN table_right WHERE table_left.value >= 0

    This query will filter out negative value before joining the two tables and filter out values greater than 10 after joining the two tables:

    SELECT * FROM table_left LEFT ASOF JOIN table_right PREWHERE table_left.value >= 0 WHERE table_left.value < 10

    12.8.2. Examples

    Assuming these tables

    table_left

    Timestamp

    Pressure

    2019-11-23T13:02:01

    100

    2019-11-23T13:03:03

    110

    2019-11-23T13:03:59

    105

    2019-11-23T13:05:00

    115

    table_right

    Timestamp

    Temperature

    2019-11-23T13:01:58

    56

    2019-11-23T13:03:03

    59

    2019-11-23T13:04:02

    58

    2019-11-23T13:05:02

    56

    2019-11-23T13:05:22

    57

    Here are the results for the three possible ASOF joins:

    SELECT $timestamp, pressure, temperature FROM table_left LEFT ASOF JOIN table_right;

    Timestamp

    Pressure

    Temperature

    2019-11-23T13:02:01

    100

    56

    2019-11-23T13:03:03

    110

    59

    2019-11-23T13:03:59

    105

    59

    2019-11-23T13:05:00

    115

    58

    SELECT $timestamp, pressure, temperature FROM table_left RIGHT ASOF JOIN table_right;

    Timestamp

    Pressure

    Temperature

    2019-11-23T13:01:58

    (void)

    56

    2019-11-23T13:03:03

    110

    59

    2019-11-23T13:04:02

    105

    58

    2019-11-23T13:05:02

    115

    56

    2019-11-23T13:05:22

    115

    57

    SELECT $timestamp, pressure, temperature FROM table_left FULL ASOF JOIN table_right;

    Timestamp

    Pressure

    Temperature

    2019-11-23T13:01:58

    (void)

    56

    2019-11-23T13:02:01

    100

    56

    2019-11-23T13:03:03

    110

    59

    2019-11-23T13:03:59

    105

    59

    2019-11-23T13:04:02

    105

    58

    2019-11-23T13:05:00

    115

    58

    2019-11-23T13:05:02

    115

    56

    2019-11-23T13:05:22

    115

    57

    SELECT $timestamp, pressure FROM table_left ASOF JOIN RANGE(2019-11-23T13:02:00, +5min, +1min);

    Timestamp

    Pressure

    2019-11-23T13:02:00

    (void)

    2019-11-23T13:03:00

    100

    2019-11-23T13:04:00

    105

    2019-11-23T13:05:00

    115

    2019-11-23T13:06:00

    115

    Using interpolation, we could obtain the following results:

    SELECT $timestamp, temperature FROM table_left FULL ASOF JOIN table_right INTERPOLATE(method=linear);

    Timestamp

    Temperature

    2019-11-23T13:01:58

    56

    2019-11-23T13:02:01

    56.13846154

    2019-11-23T13:03:03

    59

    2019-11-23T13:03:59

    58.05084746

    2019-11-23T13:04:02

    58

    2019-11-23T13:05:00

    56.06666667

    2019-11-23T13:05:02

    56

    2019-11-23T13:05:22

    57

    SELECT $timestamp, pressure FROM %{:1} FULL ASOF JOIN %{:2} INTERPOLATE(method=linear, limit_direction=none);

    Timestamp

    Pressure

    2019-11-23T13:01:58

    (void)

    2019-11-23T13:02:01

    100

    2019-11-23T13:03:03

    110

    2019-11-23T13:03:59

    105

    2019-11-23T13:04:02

    105.4918033

    2019-11-23T13:05:00

    115

    2019-11-23T13:05:02

    (void)

    2019-11-23T13:05:22

    (void)

    12.9. PIVOT

    A PIVOT creates a new table whose columns are selected timestamps and the product (pivot column value) X (selected columns). It’s often combined with ASOF, to treat pivot each column value as a table and align their content to the ASOF reference table. PIVOT queries supports lookup for pivot and pivoted columns. Usual filters (PREWHERE, WHERE and HAVING) and GROUP BY clauses are supported.

    12.9.1. Examples

    Assuming this table

    sells

    Timestamp

    Country

    Id

    2010-01-01

    France

    14

    2011-01-01

    Italy

    42

    2012-01-01

    Italy

    30

    2013-01-01

    France

    28

    2014-01-01

    France

    65

    A first simple query can be done:

    SELECT $timestamp, Country, Id FROM sells PIVOT ON Country ORDER BY Id;

    Timestamp

    France.Id

    Italy.Id

    2010-01-01

    14

    (void)

    2013-01-01

    28

    (void)

    2012-01-01

    (void)

    30

    2011-01-01

    (void)

    42

    2014-01-01

    65

    (void)

    Now, assuming these three tables

    ref_table

    Timestamp

    Label

    2017-01-01

    A

    2018-01-01

    B

    2019-01-01

    C

    north_sells

    Timestamp

    Vendor

    Benefits

    2017-01-01

    Bob

    10

    2018-01-01

    Alice

    20

    2019-01-01

    Bob

    15

    2020-01-01

    Alice

    40

    east_sells

    Timestamp

    Vendor

    Benefits

    2017-01-01

    Carol

    30

    2018-01-01

    Dave

    5

    2019-01-01

    Dave

    10

    A PIVOT ASOF query can be done:

    SELECT $timestamp, label, vendor, benefits FROM ref_table LEFT ASOF JOIN north_sells, east_sells PIVOT ON vendor;

    Timestamp

    Label

    Alice.benefits

    Bob.benefits

    Carol.benefits

    Dave.benefits

    2017-01-01

    A

    (void)

    10

    30

    (void)

    2018-01-01

    B

    20

    10

    30

    5

    2019-01-01

    C

    20

    15

    30

    10

    With a PREWHERE and GROUP BY clause, we can get average benefits on several years of some vendors:

    SELECT $timestamp, vendor, avg(benefits) FROM ref_table LEFT ASOF JOIN north_sells, east_sells IN RANGE(2016, +4y) PREWHERE vendor != 'Alice' GROUP BY 2y, vendor PIVOT ON vendor;

    Timestamp

    arithmetic_mean(Bob.benefits)

    arithmetic_mean(Carol.benefits)

    arithmetic_mean(Dave.benefits)

    2016-01-01

    10

    30

    (void)

    2018-01-01

    12.5

    30

    7.5