4.2. Creating OHLC charts with Quasar

In finance, order book data is captured for a variety of purposes. In this guide we will walk through one specific, common use case: converting raw order and trade data into OHLC charts.

The guide will cover:

  • Modeling order and trade data with Quasar;

  • Using Quasar’s query language to convert raw orderbook data to OHLC in real-time;

  • Using the Quasar Pandas API for ingestion and querying.

Completing this guide will take 30-60 minutes.

4.2.1. Preparation

If you wish to run this guide locally, there are two ways to prepare your local development environment:

4.2.1.1. Docker

We have prepared a pre-loaded Docker image which contains everything needed to run this guide. To get started, please launch the bureau14/howto-crypto-ohlc Docker container as follows:

$ docker run -ti --net host bureau14/howto-crypto-ohlc:3.13.1
Launching QuasarDB in background..
Launching Jupyter lab...
[I 13:20:59.346 NotebookApp] Writing notebook server cookie secret to /home/qdb/.local/share/jupyter/runtime/notebook_cookie_secret
[I 13:20:59.501 NotebookApp] Serving notebooks from local directory: /work/notebook
[I 13:20:59.501 NotebookApp] Jupyter Notebook 6.4.6 is running at:
[I 13:20:59.501 NotebookApp] http://localhost:8888/?token=...
[I 13:20:59.501 NotebookApp]  or http://127.0.0.1:8888/?token=...

You can now navigate your browser to the URL provided by Jupyter and continue with this exercise.

4.2.1.2. Standalone installation

4.2.1.2.1. Install and launch QuasarDB

Please install & launch QuasarDB for your environment; the free community edition is sufficient.

For more information and instructions on how to install QuasarDB, please follow our installation guide.
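For example, one quick way to launch a standalone daemon locally is with Docker. The image name and tag below are an assumption on our part; the installation guide has the authoritative instructions for your platform:

# Launch a single QuasarDB daemon on the default port 2836 (image name assumed)
$ docker run -d --name qdbd -p 2836:2836 bureau14/qdb:latest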

4.2.1.2.2. Download this Jupyter notebook

You can download this notebook prepackaged from this location:

crypto-ohlc-env.tar.gz

Please download the archive and extract it into a folder on your local machine.
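For example, from a terminal on Linux or macOS:

# Extract the downloaded archive into the current directory
$ tar xzf crypto-ohlc-env.tar.gz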

4.2.1.2.3. Prepare Python environment

Assuming you have downloaded and extracted the Jupyter notebook, please set up your Python environment as follows:

# Install requirements

$ python3 -m pip install -r requirements.txt

# Launch local notebook

$ jupyter notebook ./crypto-ohlc.ipynb

[I 17:16:02.496 NotebookApp] Serving notebooks from local directory: /home/user/qdb-guides/
[I 17:16:02.496 NotebookApp] Jupyter Notebook 6.4.6 is running at:
[I 17:16:02.496 NotebookApp] http://localhost:8888/?token=...
[I 17:16:02.496 NotebookApp]  or http://127.0.0.1:8888/?token=...
[I 17:16:02.496 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

A new Jupyter notebook should open automatically; otherwise, please navigate your browser to http://localhost:8888/ manually.

You have now completed the preparations.


4.2.2. Open-High-Low-Close

In this tutorial, we will be working with raw orderbook market data: for each mutation to the orderbook, we receive and store a new record in the database.

Example mutations that can occur:

  • A new order is submitted to the exchange;

  • A trade is made between two parties;

  • An order is cancelled.

The volume of this data is typically huge. To make analysis and visualization more tractable, the raw data is often converted into an Open-High-Low-Close (OHLC) representation.

According to the Wikipedia page about OHLC charts:

An open-high-low-close chart (also OHLC) is a type of chart typically used to illustrate movements in the price of a financial instrument over time. Each vertical line on the chart shows the price range (the highest and lowest prices) over one unit of time, e.g., one day or one hour.

Tick marks project from each side of the line indicating the opening price (e.g., for a daily bar chart this would be the starting price for that day) on the left, and the closing price for that time period on the right.

[Figure: an example OHLC bar, showing the open, high, low, and close prices for a single time interval]

For us, this means we want to take the raw market data and process each symbol as follows:

  • Aggregate the market data into regular intervals, e.g. 1 minute;

  • For each interval, aggregate each of the OHLC dimensions.

In this guide you will learn how to ingest, model, and query this data.
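To make this concrete before we do the same inside the database, here is a minimal pandas-only sketch of the idea; it assumes a dataframe with a datetime index and a price column, like the one we build later in this notebook:

import pandas as pd

def ohlc_in_pandas(df, bucket="5min"):
    # resample() groups rows into fixed time buckets;
    # .ohlc() then computes open/high/low/close per bucket
    return df["price"].resample(bucket).ohlc()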


4.2.3. Imports and setup

Now that we have explained what OHLC data is, we can start working on the actual solution.

First let’s go through some boilerplate to import libraries and establish a connection with the QuasarDB cluster.

[1]:
import datetime
import json
import random
import pprint
import copy
from tqdm import tqdm

import requests
import numpy as np
import pandas as pd
import quasardb
import quasardb.pandas as qdbpd

# Utilities for plotting waveforms
from utils import plot_stock, plot_ohlc
pp = pprint.PrettyPrinter(indent=2, sort_dicts=False)

def get_conn():
    return quasardb.Cluster("qdb://127.0.0.1:2836")

# This connection will be used for the remainder of the notebook
conn = get_conn()
Loading BokehJS ...

4.2.4. Load sample data

For this demo, we will pull live data from the public Coinbase API. We will be focusing on trade data only, but the same concepts apply to order data as well.

[2]:
# Pull live trade data from Coinbase
def pull_data(product, max_pages=25):
    after = None

    for i in tqdm(range(max_pages)):
        if after is None:
            url = 'https://api.pro.coinbase.com/products/{}/trades'.format(product)
        else:
            url = 'https://api.pro.coinbase.com/products/{}/trades?after={}'.format(product, after)

        r = requests.get(url, timeout=15)
        results = r.json()
        for result in results:
            yield result

        if not r.headers.get('cb-after'):
            break
        else:
            after = r.headers['cb-after']

product = 'BTC-USD'
df = pd.DataFrame(pull_data(product, max_pages=100)) # decrease max_pages to pull less data
df
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:32<00:00,  3.09it/s]
[2]:
time trade_id price size side
0 2022-01-10T16:13:05.32308Z 261952992 41151.00000000 0.01214698 sell
1 2022-01-10T16:13:05.273736Z 261952991 41151.00000000 0.00116054 sell
2 2022-01-10T16:13:05.257686Z 261952990 41151.00000000 0.00992776 sell
3 2022-01-10T16:13:05.257686Z 261952989 41150.90000000 0.00100000 sell
4 2022-01-10T16:13:05.257686Z 261952988 41150.08000000 0.00100000 sell
... ... ... ... ... ...
99995 2022-01-10T14:18:52.928663Z 261852997 40000.00000000 0.00995025 buy
99996 2022-01-10T14:18:52.928663Z 261852996 40000.00000000 0.00237249 buy
99997 2022-01-10T14:18:52.928663Z 261852995 40000.00000000 0.00125000 buy
99998 2022-01-10T14:18:52.928663Z 261852994 40000.00000000 0.01617267 buy
99999 2022-01-10T14:18:52.928663Z 261852993 40000.00000000 0.10000000 buy

100000 rows × 5 columns

4.2.5. Ingesting into Quasar

We can now ingest this data into Quasar. We can make use of the QuasarDB Pandas integration to easily store the dataframe as a table inside the database.

To do this, we need to perform the following steps:

  • Ensure that each trade timestamp is the dataframe’s index;

  • Ensure that each timestamp is a nanosecond precision datetime64;

  • Create a new table for this product’s trades, which we will name products/btcusd/trades;

  • Load the data into the table.

[3]:
# Ensure our timestamps are of type datetime64[ns]
df['time'] = pd.to_datetime(df['time'])

# Price and size as well
df = df.astype({'price': 'float64',
                'size': 'float64'})


# Update our dataframe index
df = df.set_index('time', drop=True)
df
[3]:
trade_id price size side
time
2022-01-10 16:13:05.323080+00:00 261952992 41151.00 0.012147 sell
2022-01-10 16:13:05.273736+00:00 261952991 41151.00 0.001161 sell
2022-01-10 16:13:05.257686+00:00 261952990 41151.00 0.009928 sell
2022-01-10 16:13:05.257686+00:00 261952989 41150.90 0.001000 sell
2022-01-10 16:13:05.257686+00:00 261952988 41150.08 0.001000 sell
... ... ... ... ...
2022-01-10 14:18:52.928663+00:00 261852997 40000.00 0.009950 buy
2022-01-10 14:18:52.928663+00:00 261852996 40000.00 0.002372 buy
2022-01-10 14:18:52.928663+00:00 261852995 40000.00 0.001250 buy
2022-01-10 14:18:52.928663+00:00 261852994 40000.00 0.016173 buy
2022-01-10 14:18:52.928663+00:00 261852993 40000.00 0.100000 buy

100000 rows × 4 columns

[4]:
# We can now use the QuasarDB pandas interface to create the table and store the data.
#
# To ensure we don't insert the same data multiple times, let's first drop the table if
# it already existed.
table = conn.table('products/btcusd/trades')

try:
    table.remove()
    print("removed table {}".format(table))
except Exception:
    # Table did not yet exist
    pass

# By setting create=True, we explicitly tell the Pandas API to create the table if
# it did not yet exist.
qdbpd.write_dataframe(df, conn, table, create=True)

print("inserted dataframe")
inserted dataframe
[5]:
# We can now use the same pandas API to read back the data
qdbpd.read_dataframe(table)
[5]:
trade_id price size side
2022-01-10 14:18:52.928663 261855692 40000.00 0.118255 buy
2022-01-10 14:18:52.928663 261855691 40000.00 0.046940 buy
2022-01-10 14:18:52.928663 261855690 40000.00 0.464255 buy
2022-01-10 14:18:52.928663 261855689 40000.00 0.002500 buy
2022-01-10 14:18:52.928663 261855688 40000.00 0.009900 buy
... ... ... ... ...
2022-01-10 16:13:05.257686 261952990 41151.00 0.009928 sell
2022-01-10 16:13:05.257686 261952989 41150.90 0.001000 sell
2022-01-10 16:13:05.257686 261952988 41150.08 0.001000 sell
2022-01-10 16:13:05.273736 261952991 41151.00 0.001161 sell
2022-01-10 16:13:05.323080 261952992 41151.00 0.012147 sell

100000 rows × 4 columns

4.2.6. Querying the data

Now that the data is inserted into Quasar, we can query it using the query interface.

We’ll first do a simple plot of all recent trade prices.

[6]:
q = """
SELECT
  $timestamp,
  price
FROM
  "products/btcusd/trades"
"""
df = qdbpd.query(conn, q)
plot_stock(df, x_axis_type='datetime')
[6]:

<Bokeh Notebook handle for In[6]>

We can turn this data into a classic candlestick OHLC chart in real time by making use of Quasar’s aggregation functionality. Below is an overview of which Quasar aggregation function to use for each of the OHLC features:

Feature   Function
Open      FIRST()
High      MAX()
Low       MIN()
Close     LAST()

When using aggregation functions, we need to add two clauses to the query:

  • A “group by” clause, which determines the buckets to aggregate on. In our case we specify a time duration, e.g. 5min if we want OHLC charts with 5-minute buckets;

  • A time range clause: this tells Quasar which time range we are interested in, and is required when using aggregations.

The query we will use looks like this:

SELECT
   $timestamp,
   FIRST(price),
   MAX(price),
   MIN(price),
   LAST(price)
FROM ..
IN RANGE <timespan>
GROUP BY <duration>

We can then turn the results of this query into OHLC candlesticks as demonstrated in the code below:

[7]:
q = """
SELECT
   $timestamp,
   FIRST(price) AS open,
   MAX(price) AS high,
   MIN(price) AS low,
   LAST(price) AS close
FROM "products/btcusd/trades"
IN RANGE (NOW(), -6h)
GROUP BY 5min
"""
df = qdbpd.query(conn, q).set_index('$timestamp')
plot_ohlc(df)
[7]:

<Bokeh Notebook handle for In[7]>

4.2.7. Conclusion

And that’s all you need to do to get the data in a shape that allows for drawing candlesticks.

In this guide we covered:

  • How to interact with QuasarDB’s pandas API;

  • How to create tables, ingest data and query the data;

  • How to perform aggregations necessary to render OHLC charts.
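For reference, the essential steps from this guide fit in a few lines (the connection URI, table name and bucket size are the ones used above; adapt them to your own data):

import quasardb
import quasardb.pandas as qdbpd

# Connect, ingest the prepared dataframe, and compute 5-minute OHLC buckets
conn = quasardb.Cluster("qdb://127.0.0.1:2836")
table = conn.table("products/btcusd/trades")
qdbpd.write_dataframe(df, conn, table, create=True)  # df: datetime64[ns] index + price/size columns

ohlc = qdbpd.query(conn, """
SELECT
   $timestamp,
   FIRST(price) AS open,
   MAX(price) AS high,
   MIN(price) AS low,
   LAST(price) AS close
FROM "products/btcusd/trades"
IN RANGE (NOW(), -6h)
GROUP BY 5min
""").set_index('$timestamp')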