Primer#
What is quasardb?#
quasardb is a column-oriented, distributed timeseries database.
It has been designed to reliably store large amounts of data, while delivering a performance level that enables analysts to work interactively.
Where would you want to use quasardb? Here are a couple of use cases:
Market data store for back testing and analysis
Trades store for compliance
Time deviation database for compliance
Sensors data repository for predictive maintenance
Monitoring database for large deployments
Shall we dance?#
To start a quasardb server, just run it! We provide packages for many platforms, but you can always work in a local directory where you manually extracted your quasardb binaries. We also support docker for your convenience.
Let’s assume we extracted the quasardb archive in a local directory. The default configuration listens on the localhost, port 2836. If you type:
$ docker run -d --name qdb-server bureau14/qdb
This will launch the QuasarDB daemon as a docker container. You should be able to see your container running:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
716abb5bf98c bureau14/qdb "/opt/qdb/scripts/qd…" 2 minutes ago Up 2 minutes 0.0.0.0:2836->2836/tcp qdb-server
quasardb is typically used via its multi-language API. For the purpose of this introduction, we will restrict ourselves to the Python API (see APIs) and the shell (see quasardb shell). The shell also comes with a query language similar to SQL (see Query language).
You can now start a quasardb shell in another terminal:
$ docker run -ti --link qdb-server:qdb-server bureau14/qdbsh --cluster qdb://qdb-server:2836
Inserting timeseries data#
We’re going to create a single-column timeseries and insert data. The name of the timeseries will be stocks and the name of its unique column will be close. The type of the data we will insert in the column is 64-bit floating point numbers:
CREATE TABLE stocks (close DOUBLE)
This is the code you need to write:
import quasardb
import datetime
# connecting, default port is 2836
c = quasardb.Cluster("qdb://127.0.0.1:2836")
# creating a timeseries object
my_ts = c.ts("stocks")
# creating the timeseries in the database
# it will throw an error if the timeseries already exist
my_ts.create([quasardb.ColumnInfo(quasardb.ColumnType.Double, "close")])
You create the timeseries only once, like you create a table in a SQL database only once. If you wish to delete (drop) the whole timeseries, you do the following.
Using the query language in a shell:
DROP TABLE stocks
Using the Python API:
c = quasardb.Cluster("qdb://127.0.0.1:2836")
my_ts = c.ts("stocks")
# remove the whole timeseries, in one call
my_ts.remove()
Note
Operations performed on timeseries are transactional. quasardb uses MVCC and 2PC to provide you with a high level of reliability.
Let’s add 3 values into our timeseries using a simple query:
INSERT INTO stocks ($timestamp, close) VALUES (2017-01-01, 1.0), (2017-01-02, 2.0), (2017-01-03, 3.0)
Moreover, one can leverage the power of numpy and insert numpy arrays directly into the timeseries when using the Python interface:
import numpy as np
c = quasardb.Cluster("qdb://127.0.0.1:2836")
# we don't create the timeseries again, we create an object to access it
my_ts = c.ts("stocks")
# creating the timeseries in the database if it doesn't exists
if not my_ts.exists():
my_ts.create([quasardb.ColumnInfo(quasardb.ColumnType.Double, "close")])
# generate values:
#
# 2017-01-01 - 1.0
# 2017-01-02 - 2.0
# 2017-01-03 - 3.0
dates = np.arange(np.datetime64('2017-01-01'), np.datetime64('2017-01-04')).astype('datetime64[ns]')
values = np.arange(1.0, 4.0)
# insert directly the numpy arrays
my_ts.double_insert("close", dates, values)
In this example we added 3 points, 1 point per day, for simplicity sake. In the examples directory of the Python API, you will find several examples showcasing the different ways you can insert data into a quasardb cluster.
Working with the data#
The data we inserted with the Python API or the shell has been normalized and is now accessible from all other APIs. We will now use our shell to visualize it:
Note
Everything we do with the shell can be done via the API of your choice.
The following command:
qdbsh
If you fancy some color, and your terminal supports it, you may want to try:
qdbsh --color-output=yes
Will start the shell and the following prompt should be displayed. Keep in mind you must have the daemon running on the localhost, as by default the shell will attempt to connect to the localhost:
qdbsh >
Let’s first check that our timeseries exists:
qdbsh > show stocks
2 columns
0. timestamp - nanosecond timestamp
1. close - 64-bit double
We can also dump the content of our timeseries:
qdbsh > select * from stocks in range(2017-01-01, 2017-01-10)
timestamp close
--------------------------------------------
2017-01-01T00:00:00.000000000Z 1.000000
2017-01-02T00:00:00.000000000Z 2.000000
2017-01-03T00:00:00.000000000Z 3.000000
As you can see the timestamp allows for nanosecond precision. Time definition syntax in quasardb is very flexible:
qdbsh > select * from stocks in range(2017, +10d)
timestamp close
--------------------------------------------
2017-01-01T00:00:00.000000000Z 1.000000
2017-01-02T00:00:00.000000000Z 2.000000
2017-01-03T00:00:00.000000000Z 3.000000
You can narrow down your searches with the WHERE keyword, exactly as you would in a regular SQL query:
qdbsh > select * from stocks in range(2017, +10d) where close < 2
timestamp close
--------------------------------------------
2017-01-01T00:00:00.000000000Z 1.000000
The database is also able to perform server-side aggregations for maximum performance. For example, you can ask for the average value of a timeseries, without having to retrieve all the data. Aggregations are able to leverage the enhanced instruction set of your CPU, when available.
For example, we can request the arithmetic mean of our stocks for the same interval:
qdbsh > select arithmetic_mean(close) from stocks in range(2017, +10d)
timestamp arithmetic_mean(close)
------------------------------------------------------
2017-01-01T00:00:00.000000000Z 2.00
Here are some queries you can try for yourself:
Show the minimum and maximum open value for the last 20 years:
select min(open), max(open) from stocks_A in range(now(), -20y)Display the open and close of two different time series:
select open, close from stocks_A, stocks_B in range(2017, +10d)Daily averages over a year:
select arithmetic_mean(close) from stocks in range(2017, +1y) group by dayDisplay the last known value of a timeseries with respect to the timestamps of another:
select value from bids in range(2017, +1d) asof(trades)
Note
For a list of supported functions, see Query language.
Organizing your data#
When you start to have a lot of timeseries, you probably want to find them based on criteria different than the name.
Out of the box, all timeseries are searchable by prefix and suffix:
qdbsh > prefix_get sto 100
1. stocks
qdbsh > suffix_get cks 100
1. stocks
This will return the 100 first matches for entries starting with “sto”, and the 100 first matches for entries ending with “cks”.
However, sometimes you want to organize your timeseries according to arbitraty criteria. For example, for our stocks, we may want to say that it comes from the NYSE and that the currency is USD.
quasardb has a powerful feature named “tags” that enables you to tag timeseries and do reverse lookups based on those tags:
qdbsh > attach_tag stocks nyq
qdbsh > attach_tag stocks usd
Then you can look up based on those tags:
qdbsh > get_tagged nyq
1. stocks - timeseries
qdbsh > get_tagged usd
1. stocks - timeseries
It’s also possible to ask more complex questions, such as “give me everything that is tagged with ‘usd’ but not ‘nyq’”:
qdbsh >find(tag='usd' and not tag='nyq')
An entry matching the provided alias cannot be found.
Which is, in our case, the correct answer!
Can you inject the result of a find in a select? Of course you can:
qdbsh > select arithmetic_mean(close) from find(tag='nyq') in range(2017, +10d)
timestamp arithmetic_mean(close)
------------------------------------------------------
2017-01-01T00:00:00.000000000Z 2.00
That demo is nice, but what happens when I go to production?#
A fair question which has a simple answer: the size and configuration of the cluster has no impact on the client code. quasardb will take care of the sharding and distribution of the data transparently, whether you are writing to and reading from the database.
The only thing that may change is the connection string. For example if you have a cluster of four machines, your connection string can be:
c = quasardb.Cluster("qdb://127.0.0.1:2836,192.168.1.2:2836,192.168.1.3:2836,192.168.1.4:2836", timeout=datetime.timedelta(minutes=1))
or:
c = quasardb.Cluster("qdb://127.0.0.1:2836,192.168.1.2:2836,192.168.1.3:2836,192.168.1.4:2836", timeout=datetime.timedelta(minutes=1))
and even:
c = quasardb.Cluster("qdb://127.0.0.1:2836", timeout=datetime.timedelta(minutes=1))
That’s because the quasardb protocol has built-in discovery! Just give any node in the cluster and we take care of the rest. The more nodes the better as we can try another node if the one provided is down at the moment of the connection.
Going further#
We hope this quick tour left you wanting for more! quasardb is feature-rich yet simple to use and operate. If you want to go further, the best course of action is to start with the documentation of the API for the language of your choice (APIs).
If you’d like to learn more about building a quasardb cluster, head over to the administrative section (Server Administration).
Curious about the underlying concepts, we have a section dedicated to it (Internals).
Wrap up#
Things to remember about quasardb:
Fast and scalable
High-performance binary protocol
Multi-platform: FreeBSD, Linux 2.6+, OS X and Windows (32-bit and 64-bit)
Peer-to-peer network distribution
Transparent persistence
Native timeseries support
Distributed transactions
Rich typing
Tag-based search
Fire and forget: deploy, run and return to your core business.