5.7. How to handle corporate actions#

In trading, a corporate action is an event initiated by a publicly traded company that could affect a stock’s price.

According to investopedia:

A corporate action is any activity that brings material change to an organization and impacts its stakeholders, including shareholders, both common and preferred, as well as bondholders. These events are generally approved by the company’s board of directors; shareholders may be permitted to vote on some events as well. Some corporate actions require shareholders to submit a response.

Examples of corporate actions include stock splits, dividends and acquisitions.

Accounting for corporate actions in your database is a common data modelling issue.

In this guide, we will:

  • Explain the implications of corporate actions on stock trading data;

  • Propose a model on how to account for corporate actions;

  • Demonstrate queries with price adjusted for corporate actions.

We will be using Python, Pandas, and plotly. You can achieve similar results with any programming language or tool that can query Quasar.

5.7.1. Preparation#

5.7.1.1. Install and launch QuasarDB#

Please install & launch QuasarDB for your environment; the free community edition is sufficient.

For more information and instructions on how to install QuasarDB, please refer to our installation guide.

5.7.1.2. Prepare your Jupyter notebook#

Please follow the steps below to create and set up your Jupyter notebook:

# Install requirements
$ python3 -m pip install jupyter \
                         pandas \
                         numpy \
                         quasardb

# Launch local notebook
$ jupyter notebook ./qdb-howto.ipynb

[I 17:16:02.496 NotebookApp] Serving notebooks from local directory: /home/user/qdb-howto/
[I 17:16:02.496 NotebookApp] Jupyter Notebook 6.4.6 is running at:
[I 17:16:02.496 NotebookApp] http://localhost:8888/?token=...
[I 17:16:02.496 NotebookApp]  or http://127.0.0.1:8888/?token=...
[I 17:16:02.496 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

A new jupyter notebook should automatically open, otherwise please manually navigate and you can navigate your browser to http://localhost:8888/ with the environment prepared.

You are now completed with the preparations.

5.7.2. A stock’s price#

For this example, we’ll use Microsoft’s dividend of Feb 16, 2022 as a case study. For this, we have prepared an OHLC dataset of public MSFT stock data. Let’s grab this data:

import pandas as pd
import requests
import io

def grab_csv(fname):
   url = 'https://doc.quasar.ai/howto/corporate-actions/{}'.format(fname)
   data = requests.get(url, timeout=15).content
   df = pd.read_csv(io.StringIO(data.decode('utf-8')))
   df = df.rename(columns={'date': '$timestamp'})
   df = df.set_index('$timestamp')
   df = df.reindex()
   return df

ohlc = grab_csv('msft.csv')
ohlc

$timestamp

open

high

low

close

volume

2022-02-07

306.170013

307.839996

299.899994

300.950012

28533300

2022-02-08

301.250000

305.559998

299.950012

304.559998

32421200

2022-02-09

309.869995

311.929993

307.390015

311.209991

31284700

2022-02-10

304.040009

309.119995

300.700012

302.380005

45386200

2022-02-11

303.190002

304.290009

294.220001

295.040009

39175600

2022-02-14

293.769989

296.760010

291.350006

295.000000

36359500

2022-02-15

300.010010

300.799988

297.019989

300.470001

27058300

2022-02-16

298.369995

300.869995

293.679993

299.500000

29982100

2022-02-17

296.359985

296.799988

290.000000

290.110011

32461600

2022-02-18

293.049988

293.859985

286.309998

287.309993

34264000

2022-02-22

285.000000

291.540009

284.500000

287.100001

41736100

2022-02-23

290.179993

291.700012

280.100006

279.649989

37811200

2022-02-24

272.510010

295.160004

271.519989

293.969996

56989700

2022-02-25

295.140015

297.630005

291.649994

296.689998

32546700

Even though we know that a $0.62 dividend payout occurred on Feb 16, you cannot see it in this data: that is because what you’re looking at are the prices adjusted for corporate actions.

Often, you will want to go back in time and see the unadjusted price: the price without any adjustments for corporate actions. How can we achieve this?

5.7.3. Storing price adjustments#

Let’s assume it’s early February, Microsoft has announced they will pay out $0.62 dividend per share after trading closes on Feb 16, and you are tasked with preparing accordingly. The prices you have in your database at that point are, of course, the unadjusted prices.

There are a few approaches we can take here:

  1. After trading closes on Feb 16, we will update our entire history and adjust all prices/volumes.

  2. For every row, we store a “price” and an “adjusted_price”: we modify our ingestion engine to be aware of this stock split, and activate it after trading closes on Feb 16;

  3. We keep track of all corporate actions in a separate table, and if the user desires, calculate the adjusted price on the fly.

Generally, we discourage the first option: not only is it fragile and time-sensitive, but you will not be able to easily recalculate the unadjusted, original price based on it. This is a problem for e.g. back-testing algorithms, and as any corporate action may have an influence on the price, you generally want to be aware if and when this happened.

The second option could work, but increases complexity: instead of just recording that an adjustment happened at a certain date, it now becomes logic we need to maintain in our ingestion process. This increases the chance of bugs, and really makes things more complicated than they should be.

As such, we recommend the third option: for each stock, we keep track of the prices in one table, and the adjustments in another table.

The next section describes a data model for this approach.

5.7.4. Types of corporate actions#

There are many types of corporate actions, but generally speaking, they can be classified as follows:

  • Is it a multiplicative corporate action (e.g. stock splits), or additive (e.g. dividend) ?

  • Does it apply only to price, or both price and volume?

With these classifications in mind, we can model all corporate actions using the following table structure:

CREATE TABLE msft_adjustments (
   price_add DOUBLE,
   price_mult DOUBLE,
   volume_add DOUBLE,
   volume_mult DOUBLE)

For each corporate action, we have a single value on how it affects the price or volume.

For example:

  • As a $0.62 dividend is additive and only affects price, we would only set price_add=0.62, and keep the rest of the values unset / null.

  • A 5:1 stock split is multiplicative, and affects both price and volume. As such, we would set price_mult=0.2 and volume_mult=5.

5.7.5. Implementing the data model#

5.7.5.1. A single adjustment#

Let’s first represent our dividend as a table:

import numpy as np

adjustments = [{'$timestamp': np.datetime64('2022-02-16', 'ns'),
                'price_add': 0.62}]
adjustments = pd.DataFrame(adjustments).set_index('$timestamp').reindex()
adjustments

$timestamp

price_add

2022-02-16

0.62

We now insert both the OHLC data as well as the adjustments as two separate tables: microsoft.ohlc and microsoft.adjustments. For this, we’ll establish a connection with QuasarDB, and use our Pandas integration to directly write dataframes into Quasar:

import quasardb
import quasardb.pandas as qdbpd

conn = quasardb.Cluster('qdb://127.0.0.1:2836')
try:
  conn.table('microsoft.ohlc').remove()
  conn.table('microsoft.adjustments').remove()
except:
  pass

# Write OHLC table to `microsoft.ohlc`, and automatically create the table
qdbpd.write_dataframe(ohlc, conn, conn.table('microsoft.ohlc'), create=True)

# Explicitly create the adjustments table to ensure we have the correct columns
cols = [quasardb.ColumnInfo(quasardb.ColumnType.Double, "price_add"),
        quasardb.ColumnInfo(quasardb.ColumnType.Double, "price_mult"),
        quasardb.ColumnInfo(quasardb.ColumnType.Double, "volume_add"),
        quasardb.ColumnInfo(quasardb.ColumnType.Double, "volume_mult")]
conn.table('microsoft.adjustments').create(cols)

# Write corporate adjustments table to `microsoft.adjustments`, do not
# automatically create the table
qdbpd.write_dataframe(adjustments, conn, conn.table('microsoft.adjustments'), create=False)

For good measure, let’s query both of these to verify the data is stored correctly. For this we will use the read_dataframe function, which reads an entire table as a dataframe:

qdbpd.read_dataframe(conn.table('microsoft.ohlc'))

open

high

low

close

volume

2022-02-07

306.170013

307.839996

299.899994

300.950012

28533300

2022-02-08

301.250000

305.559998

299.950012

304.559998

32421200

2022-02-09

309.869995

311.929993

307.390015

311.209991

31284700

2022-02-10

304.040009

309.119995

300.700012

302.380005

45386200

2022-02-11

303.190002

304.290009

294.220001

295.040009

39175600

2022-02-14

293.769989

296.760010

291.350006

295.000000

36359500

2022-02-15

300.010010

300.799988

297.019989

300.470001

27058300

2022-02-16

298.369995

300.869995

293.679993

299.500000

29982100

2022-02-17

296.359985

296.799988

290.000000

290.110011

32461600

2022-02-18

293.049988

293.859985

286.309998

287.309993

34264000

2022-02-22

285.000000

291.540009

284.500000

287.100001

41736100

2022-02-23

290.179993

291.700012

280.100006

279.649989

37811200

2022-02-24

272.510010

295.160004

271.519989

293.969996

56989700

2022-02-25

295.140015

297.630005

291.649994

296.689998

32546700

Looks good, the corporate adjustments:

qdbpd.read_dataframe(conn.table('microsoft.adjustments'))

price_add

price_mult

volume_add

volume_mult

2022-02-16

0.62

NaN

NaN

NaN

Great! Now on to the good parts, how can we combine these two?

For this, we will use an ASOF JOIN. Effectively, what we will do:

  • For every row in the OHLC table, determine the most recent “adjustments” value;

  • Dynamically calculate the adjusted price based on the adjustments.

Let’s first join the data together:

q = """
SELECT
   $timestamp,
   "microsoft.ohlc".close AS price,
   "microsoft.adjustments".price_add
FROM "microsoft.ohlc"
LEFT ASOF JOIN "microsoft.adjustments"
ORDER BY $timestamp DESC
"""
df = qdbpd.query(conn, q, index='$timestamp')
df

price

microsoft.adjustments.price_add

2022-02-25

296.689998

0.62

2022-02-24

293.969996

0.62

2022-02-23

279.649989

0.62

2022-02-22

287.100001

0.62

2022-02-18

287.309993

0.62

2022-02-17

290.110011

0.62

2022-02-16

299.500000

0.62

2022-02-15

300.470001

NaN

2022-02-14

295.000000

NaN

2022-02-11

295.040009

NaN

2022-02-10

302.380005

NaN

2022-02-09

311.209991

NaN

2022-02-08

304.559998

NaN

2022-02-07

300.950012

NaN

Great, we have the price adjustment aligned next to the close price. Next, we’ll use some arithmetic to dynamically calculate the adjusted price:

q = """
SELECT
   $timestamp,
   "microsoft.ohlc".close AS price,
   ("microsoft.ohlc".close + "microsoft.adjustments".price_add) AS adjusted_price
FROM "microsoft.ohlc"
LEFT ASOF JOIN "microsoft.adjustments"
ORDER BY $timestamp DESC
"""
df = qdbpd.query(conn, q, index='$timestamp')
df

price

adjusted_price

2022-02-25

296.689998

297.309998

2022-02-24

293.969996

294.589996

2022-02-23

279.649989

280.269989

2022-02-22

287.100001

287.720001

2022-02-18

287.309993

287.929993

2022-02-17

290.110011

290.730011

2022-02-16

299.500000

300.120000

2022-02-15

300.470001

300.470001

2022-02-14

295.000000

295.000000

2022-02-11

295.040009

295.040009

2022-02-10

302.380005

302.380005

2022-02-09

311.209991

311.209991

2022-02-08

304.559998

304.559998

2022-02-07

300.950012

300.950012

As you can see, ASOF JOIN is an elegant and effective tool to handle this.

5.7.5.2. Multiple adjustments#

Does this work with multiple corporate adjustments? Let’s assume, hypothetically, what would happen if we had a second adjustment a few days later, what would the adjustment value be when doing an ASOF JOIN?

# Insert additional adjustment of $0.15 at 2022-02-18
q = """
INSERT INTO \"microsoft.adjustments\" (
    $timestamp,
    price_add)
VALUES (
    2022-02-18,
    0.15)
"""
qdbpd.query(conn, q)

q = """
SELECT
   $timestamp,
   "microsoft.ohlc".close AS price,
   "microsoft.adjustments".price_add
FROM "microsoft.ohlc"
LEFT ASOF JOIN "microsoft.adjustments"
ORDER BY $timestamp DESC
"""
df = qdbpd.query(conn, q, index='$timestamp')
df

price

microsoft.adjustments.price_add

2022-02-25

296.689998

0.15

2022-02-24

293.969996

0.15

2022-02-23

279.649989

0.15

2022-02-22

287.100001

0.15

2022-02-18

287.309993

0.15

2022-02-17

290.110011

0.62

2022-02-16

299.500000

0.62

2022-02-15

300.470001

NaN

2022-02-14

295.000000

NaN

2022-02-11

295.040009

NaN

2022-02-10

302.380005

NaN

2022-02-09

311.209991

NaN

2022-02-08

304.559998

NaN

2022-02-07

300.950012

NaN

That’s not exactly what we’re looking for: it appears as if the join just picks the last value at a certain point in time, but what we’re looking for are cumulative adjustments.

At this point, we have two options:

  1. Instead of recording just the individual corporate actions, record the cumulative values instead: for example, rather than storing $0.15 in the example above, store $0.62 + $0.15 = $0.77.

This will work, but it’s not as elegant and increases chance on error.

  1. Dynamically accumulate the sum of all price adjustments.

As we would prefer the latter option, we’re in luck: QuasarDB has the ability to dynamically accumulate the sums using the CUMSUM function.

Let’s see how it works by applying it on the adjustments table:

q = """
SELECT
   $timestamp,
   price_add,
   CUMSUM(price_add)
FROM "microsoft.adjustments"
"""
df = qdbpd.query(conn, q, index='$timestamp')
df

price_add

cumulative_sum(price_add)

2022-02-16

0.62

0.62

2022-02-18

0.15

0.77

Exactly what we’re looking for! Now, let’s integrate this into our ASOF JOIN to get a fully dynamic acucmulation of all price adjustments.

q = """
SELECT
   $timestamp,
   "microsoft.ohlc".close AS price,
   "microsoft.ohlc".close + CUMSUM("microsoft.adjustments".price_add) AS adusted_price
FROM "microsoft.ohlc"
LEFT ASOF JOIN "microsoft.adjustments"
ORDER BY $timestamp DESC
"""
df = qdbpd.query(conn, q, index='$timestamp')
df

price

adusted_price

2022-02-25

296.689998

297.459998

2022-02-24

293.969996

294.739996

2022-02-23

279.649989

280.419989

2022-02-22

287.100001

287.870001

2022-02-18

287.309993

288.079993

2022-02-17

290.110011

290.730011

2022-02-16

299.500000

300.120000

2022-02-15

300.470001

300.470001

2022-02-14

295.000000

295.000000

2022-02-11

295.040009

295.040009

2022-02-10

302.380005

302.380005

2022-02-09

311.209991

311.209991

2022-02-08

304.559998

304.559998

2022-02-07

300.950012

300.950012

5.7.5.3. Multiplications#

In addition to simple price additions, we also have price multiplications. Now that we know how to do it, integrating these is straightforward by using the built-in CUMPROD function.

The example below simulates two stock splits, where the price of each share drops, and the volume increases.

# First "seed" the multiplication columns, so that we know for sure they will start
# with 1.00 (instead of 0.00)
q = """
INSERT INTO \"microsoft.adjustments\" (
    $timestamp,
    price_mult,
    volume_mult)
VALUES (
    2022-02-01,
    1.0,
    1.0)
"""
qdbpd.query(conn, q)

# 2x stock split at 2022-02-21
q = """
INSERT INTO \"microsoft.adjustments\" (
    $timestamp,
    price_mult,
    volume_mult)
VALUES (
    2022-02-21,
    0.5,
    2.0)
"""
qdbpd.query(conn, q)

# 4x stock split at 2022-02-23
q = """
INSERT INTO \"microsoft.adjustments\" (
    $timestamp,
    price_mult,
    volume_mult)
VALUES (
    2022-02-23,
    0.25,
    4.0)
"""
qdbpd.query(conn, q)

# For completeness, let's inspect all our adjustments
df = qdbpd.query(conn, "SELECT * FROM \"microsoft.adjustments\"")
df

$timestamp

$table

price_add

price_mult

volume_add

volume_mult

0

2022-02-01

microsoft.adjustments

NaN

1.00

NaN

1.0

1

2022-02-16

microsoft.adjustments

0.62

NaN

NaN

NaN

2

2022-02-18

microsoft.adjustments

0.15

NaN

NaN

NaN

3

2022-02-21

microsoft.adjustments

NaN

0.50

NaN

2.0

4

2022-02-23

microsoft.adjustments

NaN

0.25

NaN

4.0

We can see the CUMPROD function at work by applying it over the price_mult column:

q = """
SELECT
   $timestamp,
   price_mult,
   CUMPROD(price_mult)
FROM "microsoft.adjustments"
"""
df = qdbpd.query(conn, q, index='$timestamp')
df

price_mult

cumulative_product(price_mult)

2022-02-01

1.00

1.000

2022-02-16

NaN

1.000

2022-02-18

NaN

1.000

2022-02-21

0.50

0.500

2022-02-23

0.25

0.125

Now, let’s integrate this into our ASOF JOIN:

q = """
SELECT
   $timestamp,
   "microsoft.ohlc".close AS price,
   ("microsoft.ohlc".close
     * CUMPROD("microsoft.adjustments".price_mult))
    + CUMSUM("microsoft.adjustments".price_add) AS adusted_price
FROM "microsoft.ohlc"
LEFT ASOF JOIN "microsoft.adjustments"
ORDER BY $timestamp DESC
"""
df = qdbpd.query(conn, q, index='$timestamp')
df

price

adusted_price

2022-02-25

296.689998

37.856250

2022-02-24

293.969996

37.516250

2022-02-23

279.649989

35.726249

2022-02-22

287.100001

144.320001

2022-02-18

287.309993

288.079993

2022-02-17

290.110011

290.730011

2022-02-16

299.500000

300.120000

2022-02-15

300.470001

300.470001

2022-02-14

295.000000

295.000000

2022-02-11

295.040009

295.040009

2022-02-10

302.380005

302.380005

2022-02-09

311.209991

311.209991

2022-02-08

304.559998

304.559998

2022-02-07

300.950012

300.950012

As you can see, the adjustment is taken into account properly, and our price now reflects the stock splits.

5.7.5.4. Putting it all together#

To put it together, let’s apply the same logic to the volume column to calculate the adjusted volume.

q = """
SELECT
   $timestamp,

   "microsoft.ohlc".close AS price,
   ("microsoft.ohlc".close
     * CUMPROD("microsoft.adjustments".price_mult))
    + CUMSUM("microsoft.adjustments".price_add) AS adusted_price,

   "microsoft.ohlc".volume AS volume,
   ("microsoft.ohlc".volume
     * CUMPROD("microsoft.adjustments".volume_mult))
    + CUMSUM("microsoft.adjustments".volume_add) AS adusted_volume

FROM "microsoft.ohlc"
LEFT ASOF JOIN "microsoft.adjustments"
ORDER BY $timestamp DESC
"""
df = qdbpd.query(conn, q, index='$timestamp')
df

price

adusted_price

volume

adusted_volume

2022-02-25

296.689998

37.856250

32546700

260373600.0

2022-02-24

293.969996

37.516250

56989700

455917600.0

2022-02-23

279.649989

35.726249

37811200

302489600.0

2022-02-22

287.100001

144.320001

41736100

83472200.0

2022-02-18

287.309993

288.079993

34264000

34264000.0

2022-02-17

290.110011

290.730011

32461600

32461600.0

2022-02-16

299.500000

300.120000

29982100

29982100.0

2022-02-15

300.470001

300.470001

27058300

27058300.0

2022-02-14

295.000000

295.000000

36359500

36359500.0

2022-02-11

295.040009

295.040009

39175600

39175600.0

2022-02-10

302.380005

302.380005

45386200

45386200.0

2022-02-09

311.209991

311.209991

31284700

31284700.0

2022-02-08

304.559998

304.559998

32421200

32421200.0

2022-02-07

300.950012

300.950012

28533300

28533300.0

5.7.6. Conclusion#

In this tutorial you learned:

  • how to integrate corporate actions into your data model as a separate “adjustments” table;

  • use ASOF JOIN to join both the adjustments and OHLC tables together;

  • leverage the CUMSUM and CUMPROD functions to aggregate all adjustments over time.