5.4. Crypto triangular arbitrage using ASOF JOIN#

From Investopedia:

Triangular arbitrage is the result of a discrepancy between three foreign currencies that occurs when the currency’s exchange rates do not exactly match up.

In this guide we are going to see how you can leverage advanced joining capabilities of Quasar to find triangular arbitrage opportunities in cryptocurrencies.

More precisely, in this guide, we will:

  • Craft a suitable data model for cryptocurrency trade data

  • Capture and store the data into a Quasar cluster

  • Query and plot trade data for a single cryptocurrency

  • Query, align, and plot trade data for multiple single cryptocurrencies

We will be using Python, Pandas, and plotly. You can achieve similar results with any programming language or tool that can query Quasar.

5.4.1. Preparation#

5.4.1.1. Install and launch QuasarDB#

Please install & launch QuasarDB for your environment; the free community edition is sufficient.

For more information and instructions on how to install QuasarDB, please refer to our installation guide.

5.4.1.2. Prepare your Jupyter notebook#

Please follow the steps below to create and set up your Jupyter notebook:

# Install requirements
$ python3 -m pip install jupyter \
                         pandas \
                         numpy \
                         quasardb

# Launch local notebook
$ jupyter notebook ./qdb-howto.ipynb

[I 17:16:02.496 NotebookApp] Serving notebooks from local directory: /home/user/qdb-howto/
[I 17:16:02.496 NotebookApp] Jupyter Notebook 6.4.6 is running at:
[I 17:16:02.496 NotebookApp] http://localhost:8888/?token=...
[I 17:16:02.496 NotebookApp]  or http://127.0.0.1:8888/?token=...
[I 17:16:02.496 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

A new jupyter notebook should automatically open, otherwise please manually navigate and you can navigate your browser to http://localhost:8888/ with the environment prepared.

You are now completed with the preparations.

5.4.2. Triangular arbitrage#

Triangular arbitrage is done by analyzing the discrepancy between three currencies. We will look at USD, BTC, and ETH in our example.

In theory, converting from ETH to USD, should be equivalent to converting from ETH to BTC, and then BTC to USD. However, in carefully analyzing rates, one can find opportunities that result in small, consistent, and low-risk profit.

To find these opportunities, you need to analyze changes in rates and execute trades as soon as the discrepancy is large enough to profit.

This guide will focus on collecting data and ensuring the values are aligned to plot curves of each rate to find arbitrage opportunities. We will not take into account fees.

Warning

This guide is for educational purposes only. In particular, it doesn’t take into account trading fees and relies on trade data. We offer no guarantee regarding its accuracy or completeness.

5.4.3. Getting started#

Coinbase offers a free, public API to pull realtime trade data from. For this demonstration, we will be retrieving realtime trade data for three different pairs:

  • BTC-USD

  • ETH-BTC

  • ETH-USD

Here is how we define a function to load the data for a single symbol into a Pandas Dataframe:

import numpy as np
import pandas as pd
import requests
import io

def pull_coinbase(product, max_pages=25):
    after = None

    xs = list()

    for i in range(max_pages):
        if after is None:
            url = 'https://api.pro.coinbase.com/products/{}/trades'.format(product)
        else:
            url = 'https://api.pro.coinbase.com/products/{}/trades?after={}'.format(product, after)

        r = requests.get(url, timeout=30)
        xs.extend(r.json())

        if not r.headers.get('cb-after'):
            break
        else:
            after = r.headers['cb-after']


    # Coinbase sometimes emits non-JSON rows for some reason, remove those
    is_dict = lambda x: isinstance(x, dict)
    xs_ = filter(is_dict, xs)

    df = pd.DataFrame(xs_)
    # Convert 'time' column to timezone-naive format
    df['time'] = pd.to_datetime(df['time']).dt.tz_localize(None)

    dtypes = {'time': np.dtype('datetime64[ns]'),
              'price': np.dtype('float64'),
              'size': np.dtype('float64'),
              'side': np.dtype('U')}

    return df.astype(dtypes).set_index('time').reindex()

We can invoke this function to load BTC-USD symbol data:

btcusd = pull_coinbase('BTC-USD')
btcusd

trade_id

side

size

price

time

2023-12-06 22:08:59.571599

583806221

buy

0.000628

43916.52

2023-12-06 22:08:59.571599

583806220

buy

0.000372

43916.52

2023-12-06 22:08:59.516604

583806219

buy

0.018400

43916.52

2023-12-06 22:08:57.972088

583806218

sell

0.010839

43916.37

2023-12-06 22:08:57.971081

583806217

buy

0.002289

43914.11

2023-12-06 20:35:25.073088

583781226

sell

0.009953

44015.73

2023-12-06 20:35:25.068514

583781225

buy

0.043600

44014.39

2023-12-06 20:35:25.058233

583781224

buy

0.068242

44015.72

2023-12-06 20:35:24.958960

583781223

buy

0.043600

44017.20

2023-12-06 20:35:24.923809

583781222

buy

0.069991

44017.54

This data looks like this:

import plotly.express as px
fig = px.line(x=btcusd.index, y=btcusd.price)
fig.update_layout(template='plotly_dark',
                  title='BTC-USD')

fig.show()
../../_images/plot-1.png

Before we can get started with ingesting this data into Quasar, we must import the required libraries and establish a connection with the database.

5.4.4. Writing to Quasar#

import quasardb
import quasardb.pandas as qdbpd

conn = quasardb.Cluster('qdb://127.0.0.1:2836')
conn
<quasardb.quasardb.Cluster at 0x7f33cb3f35b0>

The QuasarDB pandas API, imported as qdbpd, has a write_dataframe function we can leverage to store a Pandas dataframe directly into QuasarDB. The following code shows you how to use it to write the data the dataframe into a table called btcusd:

table = conn.table('btcusd')
qdbpd.write_dataframe(btcusd,
                      conn,
                      table,
                      # Automatically create the table if not found
                      create=True)

In addition to writing (and reading) dataframes from entire tables, the QuasarDB pandas API exposes the query function which can be used to use our query language. For example, we can select all data from the btcusd table using the following query:

SELECT * FROM btcusd

In Pandas, we can do this as follows:

df = qdbpd.query(conn, "SELECT * FROM btcusd")
df

$timestamp

$table

trade_id

side

size

price

0

2023-12-06 20:35:24.923809

btcusd

583781222

buy

0.069991

44017.54

1

2023-12-06 20:35:24.958960

btcusd

583781223

buy

0.043600

44017.20

2

2023-12-06 20:35:25.058233

btcusd

583781224

buy

0.068242

44015.72

3

2023-12-06 20:35:25.068514

btcusd

583781225

buy

0.043600

44014.39

4

2023-12-06 20:35:25.073088

btcusd

583781226

sell

0.009953

44015.73

24995

2023-12-06 22:08:57.971081

btcusd

583806217

buy

0.002289

43914.11

24996

2023-12-06 22:08:57.972088

btcusd

583806218

sell

0.010839

43916.37

24997

2023-12-06 22:08:59.516604

btcusd

583806219

buy

0.018400

43916.52

24998

2023-12-06 22:08:59.571599

btcusd

583806220

buy

0.000372

43916.52

24999

2023-12-06 22:08:59.571599

btcusd

583806221

buy

0.000628

43916.52

As you can see above, the trade data is indeed available in this table.

We now have loaded the data into Quasar. We can repeat this process for the two other products we want to track, ETH-USD and ETH-BTC:

xs = [('ethusd', 'ETH-USD'),
      ('ethbtc', 'ETH-BTC')]

for (tablename, symbol) in xs:
    qdbpd.write_dataframe(pull_coinbase(symbol),
                          conn,
                          conn.table(tablename),
                          create=True)

So now we have three tables: btcusd, ethusd and ethbtc.

We can use an asof join between these tables as follows:

SELECT
  "$timestamp",
  btcusd.price * ethbtc.price AS btcethusd,
  ethusd.price AS ethusd
FROM btcusd
LEFT ASOF JOIN ethusd, ethbtc

This will instruct QuasarDB to join the data as follows:

  • For each row of btcusd, get the $timestamp.

  • Look for the last known price inside ethusd as of $timestamp

  • Look for the last known price inside ethbtc as of $timestamp

  • Calculate the price of acquiring ETH by first buying BTC for USD and then acquiring ETH for BTC;

  • Merge the results into a single row.

This will then return a dataframe that looks like this:

import plotly.graph_objects as go

q = """
SELECT
  $timestamp,
  btcusd.price * ethbtc.price AS btcethusd,
  ethusd.price AS ethusd
FROM ethusd
LEFT ASOF JOIN btcusd, ethbtc
WHERE
  price IS NOT NULL
"""
df = qdbpd.query(conn, q).set_index('$timestamp').reindex()

fig = go.Figure()
fig.add_trace(go.Scatter(x=df.index, y=df['btcethusd'], name='ETH-BTC-USD'))
fig.add_trace(go.Scatter(x=df.index, y=df['ethusd'], name='ETH-USD'))
fig.update_layout(template='plotly_dark',
                  title='ETH-USD vs ETH-BTC-USD')
fig.show()
../../_images/plot-2.png

5.4.5. Conclusion#

Aligning timeseries is crucial for many financial analyses, and triangular arbitrage is no exception.

Thanks to the built-in support for ASOF JOINS, Quasar enables you to quickly and efficiently join multiple timeseries.

The approach we used in this guide will work for any timeseries dataset; Quasar will optimize the query on the fly.

5.4.5.1. Further readings#