5.7. How to handle corporate actions#
In trading, a corporate action is an event initiated by a publicly traded company that could affect a stock’s price.
According to Investopedia:
A corporate action is any activity that brings material change to an organization and impacts its stakeholders, including shareholders, both common and preferred, as well as bondholders. These events are generally approved by the company’s board of directors; shareholders may be permitted to vote on some events as well. Some corporate actions require shareholders to submit a response.
Examples of corporate actions include stock splits, dividends and acquisitions.
Accounting for corporate actions in your database is a common data modelling issue.
In this guide, we will:
Explain the implications of corporate actions on stock trading data;
Propose a model on how to account for corporate actions;
Demonstrate queries with price adjusted for corporate actions.
We will be using Python, Pandas, and plotly. You can achieve similar results with any programming language or tool that can query Quasar.
5.7.1. Preparation#
5.7.1.1. Install and launch QuasarDB#
Please install & launch QuasarDB for your environment; the free community edition is sufficient.
For more information and instructions on how to install QuasarDB, please refer to our installation guide.
5.7.1.2. Prepare your Jupyter notebook#
Please follow the steps below to create and set up your Jupyter notebook:
# Install requirements
$ python3 -m pip install jupyter \
pandas \
numpy \
quasardb
# Launch local notebook
$ jupyter notebook ./qdb-howto.ipynb
[I 17:16:02.496 NotebookApp] Serving notebooks from local directory: /home/user/qdb-howto/
[I 17:16:02.496 NotebookApp] Jupyter Notebook 6.4.6 is running at:
[I 17:16:02.496 NotebookApp] http://localhost:8888/?token=...
[I 17:16:02.496 NotebookApp] or http://127.0.0.1:8888/?token=...
[I 17:16:02.496 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
A new Jupyter notebook should open automatically; otherwise, navigate your browser to http://localhost:8888/ with the environment prepared.
You have now completed the preparations.
5.7.2. A stock’s price#
For this example, we’ll use Microsoft’s dividend of Feb 16, 2022 as a case study. For this, we have prepared an OHLC dataset of public MSFT stock data. Let’s grab this data:
import pandas as pd
import requests
import io
def grab_csv(fname):
    # Download a CSV from the howto assets and return it as a
    # timestamp-indexed dataframe
    url = 'https://doc.quasar.ai/howto/corporate-actions/{}'.format(fname)
    data = requests.get(url, timeout=15).content
    df = pd.read_csv(io.StringIO(data.decode('utf-8')))
    df = df.rename(columns={'date': '$timestamp'})
    df = df.set_index('$timestamp')
    df = df.reindex()
    return df
ohlc = grab_csv('msft.csv')
ohlc
| $timestamp | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2022-02-07 | 306.170013 | 307.839996 | 299.899994 | 300.950012 | 28533300 |
| 2022-02-08 | 301.250000 | 305.559998 | 299.950012 | 304.559998 | 32421200 |
| 2022-02-09 | 309.869995 | 311.929993 | 307.390015 | 311.209991 | 31284700 |
| 2022-02-10 | 304.040009 | 309.119995 | 300.700012 | 302.380005 | 45386200 |
| 2022-02-11 | 303.190002 | 304.290009 | 294.220001 | 295.040009 | 39175600 |
| 2022-02-14 | 293.769989 | 296.760010 | 291.350006 | 295.000000 | 36359500 |
| 2022-02-15 | 300.010010 | 300.799988 | 297.019989 | 300.470001 | 27058300 |
| 2022-02-16 | 298.369995 | 300.869995 | 293.679993 | 299.500000 | 29982100 |
| 2022-02-17 | 296.359985 | 296.799988 | 290.000000 | 290.110011 | 32461600 |
| 2022-02-18 | 293.049988 | 293.859985 | 286.309998 | 287.309993 | 34264000 |
| 2022-02-22 | 285.000000 | 291.540009 | 284.500000 | 287.100001 | 41736100 |
| 2022-02-23 | 290.179993 | 291.700012 | 280.100006 | 279.649989 | 37811200 |
| 2022-02-24 | 272.510010 | 295.160004 | 271.519989 | 293.969996 | 56989700 |
| 2022-02-25 | 295.140015 | 297.630005 | 291.649994 | 296.689998 | 32546700 |
Even though we know that a $0.62 dividend payout occurred on Feb 16, you cannot see it in this data: that is because what you’re looking at are the prices adjusted for corporate actions.
Often, you will want to go back in time and see the unadjusted price: the price without any adjustments for corporate actions. How can we achieve this?
5.7.3. Storing price adjustments#
Let’s assume it’s early February: Microsoft has announced it will pay out a $0.62 dividend per share after trading closes on Feb 16, and you are tasked with preparing accordingly. The prices you have in your database at that point are, of course, the unadjusted prices.
There are a few approaches we can take here:
After trading closes on Feb 16, we update our entire history and adjust all prices/volumes;
For every row, we store a “price” and an “adjusted_price”: we modify our ingestion engine to be aware of this dividend, and activate it after trading closes on Feb 16;
We keep track of all corporate actions in a separate table, and, if the user desires, calculate the adjusted price on the fly.
Generally, we discourage the first option: not only is it fragile and time-sensitive, but you will not be able to easily recalculate the unadjusted, original price based on it. This is a problem for e.g. back-testing algorithms, and as any corporate action may have an influence on the price, you generally want to know if and when one happened.
The second option could work, but increases complexity: instead of just recording that an adjustment happened at a certain date, it now becomes logic we need to maintain in our ingestion process. This increases the chance of bugs, and really makes things more complicated than they should be.
As such, we recommend the third option: for each stock, we keep track of the prices in one table, and the adjustments in another table.
The next section describes a data model for this approach.
5.7.4. Types of corporate actions#
There are many types of corporate actions, but generally speaking, they can be classified as follows:
Is it a multiplicative corporate action (e.g. a stock split), or additive (e.g. a dividend)?
Does it apply only to price, or both price and volume?
With these classifications in mind, we can model all corporate actions using the following table structure:
CREATE TABLE msft_adjustments (
price_add DOUBLE,
price_mult DOUBLE,
volume_add DOUBLE,
volume_mult DOUBLE)
For each corporate action, we store a single value describing how it affects the price and/or the volume.
For example:
As a $0.62 dividend is additive and only affects price, we would only set price_add=0.62, and keep the rest of the values unset / null.
A 5:1 stock split is multiplicative, and affects both price and volume. As such, we would set price_mult=0.2 and volume_mult=5.
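To make this concrete, the two examples above can be expressed as rows of such a table. Below is a minimal sketch in Pandas; the dividend row matches the one we will insert in the next section, while the stock split row and its date are purely illustrative:
import numpy as np
import pandas as pd

# Two illustrative corporate actions, one row each; columns that do not apply
# are simply left unset (null).
example_adjustments = pd.DataFrame([
    # $0.62 dividend: additive, affects price only
    {'$timestamp': np.datetime64('2022-02-16', 'ns'), 'price_add': 0.62},
    # 5:1 stock split: multiplicative, affects price and volume (hypothetical date)
    {'$timestamp': np.datetime64('2022-03-01', 'ns'),
     'price_mult': 0.2, 'volume_mult': 5.0},
]).set_index('$timestamp')
example_adjustments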
5.7.5. Implementing the data model#
5.7.5.1. A single adjustment#
Let’s first represent our dividend as a table:
import numpy as np
adjustments = [{'$timestamp': np.datetime64('2022-02-16', 'ns'),
                'price_add': 0.62}]
adjustments = pd.DataFrame(adjustments).set_index('$timestamp').reindex()
adjustments
| $timestamp | price_add |
|---|---|
| 2022-02-16 | 0.62 |
We now insert both the OHLC data as well as the adjustments as two separate tables: microsoft.ohlc and microsoft.adjustments. For this, we’ll establish a connection with QuasarDB, and use our Pandas integration to directly write dataframes into Quasar:
import quasardb
import quasardb.pandas as qdbpd
conn = quasardb.Cluster('qdb://127.0.0.1:2836')
# Remove any tables left over from a previous run
try:
    conn.table('microsoft.ohlc').remove()
    conn.table('microsoft.adjustments').remove()
except:
    pass
# Write OHLC table to `microsoft.ohlc`, and automatically create the table
qdbpd.write_dataframe(ohlc, conn, conn.table('microsoft.ohlc'), create=True)
# Explicitly create the adjustments table to ensure we have the correct columns
cols = [quasardb.ColumnInfo(quasardb.ColumnType.Double, "price_add"),
        quasardb.ColumnInfo(quasardb.ColumnType.Double, "price_mult"),
        quasardb.ColumnInfo(quasardb.ColumnType.Double, "volume_add"),
        quasardb.ColumnInfo(quasardb.ColumnType.Double, "volume_mult")]
conn.table('microsoft.adjustments').create(cols)
# Write corporate adjustments table to `microsoft.adjustments`, do not
# automatically create the table
qdbpd.write_dataframe(adjustments, conn, conn.table('microsoft.adjustments'), create=False)
For good measure, let’s query both of these to verify the data is stored correctly. For this we will use the read_dataframe function, which reads an entire table as a dataframe:
qdbpd.read_dataframe(conn.table('microsoft.ohlc'))
| $timestamp | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2022-02-07 | 306.170013 | 307.839996 | 299.899994 | 300.950012 | 28533300 |
| 2022-02-08 | 301.250000 | 305.559998 | 299.950012 | 304.559998 | 32421200 |
| 2022-02-09 | 309.869995 | 311.929993 | 307.390015 | 311.209991 | 31284700 |
| 2022-02-10 | 304.040009 | 309.119995 | 300.700012 | 302.380005 | 45386200 |
| 2022-02-11 | 303.190002 | 304.290009 | 294.220001 | 295.040009 | 39175600 |
| 2022-02-14 | 293.769989 | 296.760010 | 291.350006 | 295.000000 | 36359500 |
| 2022-02-15 | 300.010010 | 300.799988 | 297.019989 | 300.470001 | 27058300 |
| 2022-02-16 | 298.369995 | 300.869995 | 293.679993 | 299.500000 | 29982100 |
| 2022-02-17 | 296.359985 | 296.799988 | 290.000000 | 290.110011 | 32461600 |
| 2022-02-18 | 293.049988 | 293.859985 | 286.309998 | 287.309993 | 34264000 |
| 2022-02-22 | 285.000000 | 291.540009 | 284.500000 | 287.100001 | 41736100 |
| 2022-02-23 | 290.179993 | 291.700012 | 280.100006 | 279.649989 | 37811200 |
| 2022-02-24 | 272.510010 | 295.160004 | 271.519989 | 293.969996 | 56989700 |
| 2022-02-25 | 295.140015 | 297.630005 | 291.649994 | 296.689998 | 32546700 |
Looks good. Now the corporate adjustments:
qdbpd.read_dataframe(conn.table('microsoft.adjustments'))
| $timestamp | price_add | price_mult | volume_add | volume_mult |
|---|---|---|---|---|
| 2022-02-16 | 0.62 | NaN | NaN | NaN |
Great! Now on to the good part: how can we combine these two?
For this, we will use an ASOF JOIN. Effectively, what we will do is:
For every row in the OHLC table, determine the most recent “adjustments” value;
Dynamically calculate the adjusted price based on the adjustments.
Let’s first join the data together:
q = """
SELECT
$timestamp,
"microsoft.ohlc".close AS price,
"microsoft.adjustments".price_add
FROM "microsoft.ohlc"
LEFT ASOF JOIN "microsoft.adjustments"
ORDER BY $timestamp DESC
"""
df = qdbpd.query(conn, q, index='$timestamp')
df
| $timestamp | price | microsoft.adjustments.price_add |
|---|---|---|
| 2022-02-25 | 296.689998 | 0.62 |
| 2022-02-24 | 293.969996 | 0.62 |
| 2022-02-23 | 279.649989 | 0.62 |
| 2022-02-22 | 287.100001 | 0.62 |
| 2022-02-18 | 287.309993 | 0.62 |
| 2022-02-17 | 290.110011 | 0.62 |
| 2022-02-16 | 299.500000 | 0.62 |
| 2022-02-15 | 300.470001 | NaN |
| 2022-02-14 | 295.000000 | NaN |
| 2022-02-11 | 295.040009 | NaN |
| 2022-02-10 | 302.380005 | NaN |
| 2022-02-09 | 311.209991 | NaN |
| 2022-02-08 | 304.559998 | NaN |
| 2022-02-07 | 300.950012 | NaN |
Great, we have the price adjustment aligned next to the close price. Next, we’ll use some arithmetic to dynamically calculate the adjusted price:
q = """
SELECT
$timestamp,
"microsoft.ohlc".close AS price,
("microsoft.ohlc".close + "microsoft.adjustments".price_add) AS adjusted_price
FROM "microsoft.ohlc"
LEFT ASOF JOIN "microsoft.adjustments"
ORDER BY $timestamp DESC
"""
df = qdbpd.query(conn, q, index='$timestamp')
df
| $timestamp | price | adjusted_price |
|---|---|---|
| 2022-02-25 | 296.689998 | 297.309998 |
| 2022-02-24 | 293.969996 | 294.589996 |
| 2022-02-23 | 279.649989 | 280.269989 |
| 2022-02-22 | 287.100001 | 287.720001 |
| 2022-02-18 | 287.309993 | 287.929993 |
| 2022-02-17 | 290.110011 | 290.730011 |
| 2022-02-16 | 299.500000 | 300.120000 |
| 2022-02-15 | 300.470001 | 300.470001 |
| 2022-02-14 | 295.000000 | 295.000000 |
| 2022-02-11 | 295.040009 | 295.040009 |
| 2022-02-10 | 302.380005 | 302.380005 |
| 2022-02-09 | 311.209991 | 311.209991 |
| 2022-02-08 | 304.559998 | 304.559998 |
| 2022-02-07 | 300.950012 | 300.950012 |
As you can see, ASOF JOIN is an elegant and effective tool to handle this.
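If you want to sanity-check this result on the client side, the same backward-looking join can be approximated in plain Pandas with merge_asof, using the ohlc and adjustments dataframes we built earlier. This is just a verification sketch, not part of the QuasarDB API:
# Approximate the ASOF JOIN locally: for each OHLC row, pick the most recent
# adjustment at or before its timestamp, then add it to the close price.
left = ohlc.copy()
left.index = pd.to_datetime(left.index)   # make sure both indices are datetimes

check = pd.merge_asof(left.sort_index(), adjustments.sort_index(),
                      left_index=True, right_index=True,
                      direction='backward')
check['adjusted_price'] = check['close'] + check['price_add'].fillna(0.0)
check[['close', 'price_add', 'adjusted_price']]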
5.7.5.2. Multiple adjustments#
Does this work with multiple corporate adjustments? Let’s assume, hypothetically, that a second adjustment occurs a few days later: what would the adjustment value be when doing an ASOF JOIN?
# Insert additional adjustment of $0.15 at 2022-02-18
q = """
INSERT INTO \"microsoft.adjustments\" (
$timestamp,
price_add)
VALUES (
2022-02-18,
0.15)
"""
qdbpd.query(conn, q)
q = """
SELECT
$timestamp,
"microsoft.ohlc".close AS price,
"microsoft.adjustments".price_add
FROM "microsoft.ohlc"
LEFT ASOF JOIN "microsoft.adjustments"
ORDER BY $timestamp DESC
"""
df = qdbpd.query(conn, q, index='$timestamp')
df
| $timestamp | price | microsoft.adjustments.price_add |
|---|---|---|
| 2022-02-25 | 296.689998 | 0.15 |
| 2022-02-24 | 293.969996 | 0.15 |
| 2022-02-23 | 279.649989 | 0.15 |
| 2022-02-22 | 287.100001 | 0.15 |
| 2022-02-18 | 287.309993 | 0.15 |
| 2022-02-17 | 290.110011 | 0.62 |
| 2022-02-16 | 299.500000 | 0.62 |
| 2022-02-15 | 300.470001 | NaN |
| 2022-02-14 | 295.000000 | NaN |
| 2022-02-11 | 295.040009 | NaN |
| 2022-02-10 | 302.380005 | NaN |
| 2022-02-09 | 311.209991 | NaN |
| 2022-02-08 | 304.559998 | NaN |
| 2022-02-07 | 300.950012 | NaN |
That’s not exactly what we’re looking for: it appears as if the join just picks the most recent value as of each point in time, but what we’re looking for are cumulative adjustments.
At this point, we have two options:
Instead of recording just the individual corporate actions, record the cumulative values: for example, rather than storing $0.15 in the example above, store $0.62 + $0.15 = $0.77. This will work, but it’s less elegant and increases the chance of error.
Dynamically accumulate the sum of all price adjustments.
As we prefer the latter option, we’re in luck: QuasarDB can dynamically accumulate the sums using the CUMSUM function.
Let’s see how it works by applying it on the adjustments table:
q = """
SELECT
$timestamp,
price_add,
CUMSUM(price_add)
FROM "microsoft.adjustments"
"""
df = qdbpd.query(conn, q, index='$timestamp')
df
| $timestamp | price_add | cumulative_sum(price_add) |
|---|---|---|
| 2022-02-16 | 0.62 | 0.62 |
| 2022-02-18 | 0.15 | 0.77 |
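As a quick client-side cross-check (a sketch using the read_dataframe function shown earlier), the same running total can be reproduced with Pandas:
# Re-read the adjustments table (the $0.15 row was inserted through SQL, so our
# local dataframe is out of date), then accumulate price_add client-side.
adj = qdbpd.read_dataframe(conn.table('microsoft.adjustments'))
adj['price_add'].cumsum()   # 0.62, then 0.62 + 0.15 = 0.77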
Exactly what we’re looking for! Now, let’s integrate this into our ASOF JOIN to get a fully dynamic accumulation of all price adjustments.
q = """
SELECT
$timestamp,
"microsoft.ohlc".close AS price,
"microsoft.ohlc".close + CUMSUM("microsoft.adjustments".price_add) AS adjusted_price
FROM "microsoft.ohlc"
LEFT ASOF JOIN "microsoft.adjustments"
ORDER BY $timestamp DESC
"""
df = qdbpd.query(conn, q, index='$timestamp')
df
| $timestamp | price | adjusted_price |
|---|---|---|
| 2022-02-25 | 296.689998 | 297.459998 |
| 2022-02-24 | 293.969996 | 294.739996 |
| 2022-02-23 | 279.649989 | 280.419989 |
| 2022-02-22 | 287.100001 | 287.870001 |
| 2022-02-18 | 287.309993 | 288.079993 |
| 2022-02-17 | 290.110011 | 290.730011 |
| 2022-02-16 | 299.500000 | 300.120000 |
| 2022-02-15 | 300.470001 | 300.470001 |
| 2022-02-14 | 295.000000 | 295.000000 |
| 2022-02-11 | 295.040009 | 295.040009 |
| 2022-02-10 | 302.380005 | 302.380005 |
| 2022-02-09 | 311.209991 | 311.209991 |
| 2022-02-08 | 304.559998 | 304.559998 |
| 2022-02-07 | 300.950012 | 300.950012 |
5.7.5.3. Multiplications#
In addition to simple price additions, we also have price multiplications. Now that we know how to do it, integrating these is straightforward by using the built-in CUMPROD function.
The example below simulates two stock splits, where the price of each share drops, and the volume increases.
# First "seed" the multiplication columns, so that we know for sure they will start
# with 1.00 (instead of 0.00)
q = """
INSERT INTO \"microsoft.adjustments\" (
$timestamp,
price_mult,
volume_mult)
VALUES (
2022-02-01,
1.0,
1.0)
"""
qdbpd.query(conn, q)
# 2x stock split at 2022-02-21
q = """
INSERT INTO \"microsoft.adjustments\" (
$timestamp,
price_mult,
volume_mult)
VALUES (
2022-02-21,
0.5,
2.0)
"""
qdbpd.query(conn, q)
# 4x stock split at 2022-02-23
q = """
INSERT INTO \"microsoft.adjustments\" (
$timestamp,
price_mult,
volume_mult)
VALUES (
2022-02-23,
0.25,
4.0)
"""
qdbpd.query(conn, q)
# For completeness, let's inspect all our adjustments
df = qdbpd.query(conn, "SELECT * FROM \"microsoft.adjustments\"")
df
| | $timestamp | $table | price_add | price_mult | volume_add | volume_mult |
|---|---|---|---|---|---|---|
| 0 | 2022-02-01 | microsoft.adjustments | NaN | 1.00 | NaN | 1.0 |
| 1 | 2022-02-16 | microsoft.adjustments | 0.62 | NaN | NaN | NaN |
| 2 | 2022-02-18 | microsoft.adjustments | 0.15 | NaN | NaN | NaN |
| 3 | 2022-02-21 | microsoft.adjustments | NaN | 0.50 | NaN | 2.0 |
| 4 | 2022-02-23 | microsoft.adjustments | NaN | 0.25 | NaN | 4.0 |
We can see the CUMPROD function at work by applying it over the price_mult column:
q = """
SELECT
$timestamp,
price_mult,
CUMPROD(price_mult)
FROM "microsoft.adjustments"
"""
df = qdbpd.query(conn, q, index='$timestamp')
df
| $timestamp | price_mult | cumulative_product(price_mult) |
|---|---|---|
| 2022-02-01 | 1.00 | 1.000 |
| 2022-02-16 | NaN | 1.000 |
| 2022-02-18 | NaN | 1.000 |
| 2022-02-21 | 0.50 | 0.500 |
| 2022-02-23 | 0.25 | 0.125 |
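The same running product can be reproduced on the client side as well. In this sketch, fillna(1.0) plays the same role as the seed row above: rows without a price_mult are treated as a neutral multiplier of 1.0:
# Re-read the adjustments table and accumulate the multipliers locally.
adj = qdbpd.read_dataframe(conn.table('microsoft.adjustments'))
adj['price_mult'].fillna(1.0).cumprod()   # 1.0, 1.0, 1.0, 0.5, 0.125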
Now, let’s integrate this into our ASOF JOIN:
q = """
SELECT
$timestamp,
"microsoft.ohlc".close AS price,
("microsoft.ohlc".close
* CUMPROD("microsoft.adjustments".price_mult))
+ CUMSUM("microsoft.adjustments".price_add) AS adjusted_price
FROM "microsoft.ohlc"
LEFT ASOF JOIN "microsoft.adjustments"
ORDER BY $timestamp DESC
"""
df = qdbpd.query(conn, q, index='$timestamp')
df
| $timestamp | price | adjusted_price |
|---|---|---|
| 2022-02-25 | 296.689998 | 37.856250 |
| 2022-02-24 | 293.969996 | 37.516250 |
| 2022-02-23 | 279.649989 | 35.726249 |
| 2022-02-22 | 287.100001 | 144.320001 |
| 2022-02-18 | 287.309993 | 288.079993 |
| 2022-02-17 | 290.110011 | 290.730011 |
| 2022-02-16 | 299.500000 | 300.120000 |
| 2022-02-15 | 300.470001 | 300.470001 |
| 2022-02-14 | 295.000000 | 295.000000 |
| 2022-02-11 | 295.040009 | 295.040009 |
| 2022-02-10 | 302.380005 | 302.380005 |
| 2022-02-09 | 311.209991 | 311.209991 |
| 2022-02-08 | 304.559998 | 304.559998 |
| 2022-02-07 | 300.950012 | 300.950012 |
As you can see, the adjustment is taken into account properly, and our price now reflects the stock splits.
5.7.5.4. Putting it all together#
To put it together, let’s apply the same logic to the volume column to calculate the adjusted volume.
q = """
SELECT
$timestamp,
"microsoft.ohlc".close AS price,
("microsoft.ohlc".close
* CUMPROD("microsoft.adjustments".price_mult))
+ CUMSUM("microsoft.adjustments".price_add) AS adjusted_price,
"microsoft.ohlc".volume AS volume,
("microsoft.ohlc".volume
* CUMPROD("microsoft.adjustments".volume_mult))
+ CUMSUM("microsoft.adjustments".volume_add) AS adjusted_volume
FROM "microsoft.ohlc"
LEFT ASOF JOIN "microsoft.adjustments"
ORDER BY $timestamp DESC
"""
df = qdbpd.query(conn, q, index='$timestamp')
df
| $timestamp | price | adjusted_price | volume | adjusted_volume |
|---|---|---|---|---|
| 2022-02-25 | 296.689998 | 37.856250 | 32546700 | 260373600.0 |
| 2022-02-24 | 293.969996 | 37.516250 | 56989700 | 455917600.0 |
| 2022-02-23 | 279.649989 | 35.726249 | 37811200 | 302489600.0 |
| 2022-02-22 | 287.100001 | 144.320001 | 41736100 | 83472200.0 |
| 2022-02-18 | 287.309993 | 288.079993 | 34264000 | 34264000.0 |
| 2022-02-17 | 290.110011 | 290.730011 | 32461600 | 32461600.0 |
| 2022-02-16 | 299.500000 | 300.120000 | 29982100 | 29982100.0 |
| 2022-02-15 | 300.470001 | 300.470001 | 27058300 | 27058300.0 |
| 2022-02-14 | 295.000000 | 295.000000 | 36359500 | 36359500.0 |
| 2022-02-11 | 295.040009 | 295.040009 | 39175600 | 39175600.0 |
| 2022-02-10 | 302.380005 | 302.380005 | 45386200 | 45386200.0 |
| 2022-02-09 | 311.209991 | 311.209991 | 31284700 | 31284700.0 |
| 2022-02-08 | 304.559998 | 304.559998 | 32421200 | 32421200.0 |
| 2022-02-07 | 300.950012 | 300.950012 | 28533300 | 28533300.0 |
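To round things off, we can visualize the raw and adjusted close prices from this final result. The sketch below uses plotly, which we mentioned in the introduction (install it with pip if it is not already available); it assumes the df returned by the query above, with its price and adjusted_price columns:
import plotly.express as px

# Plot the raw close price against the dynamically adjusted price; the stock
# splits show up as large drops in the adjusted series.
fig = px.line(df.reset_index(), x='$timestamp',
              y=['price', 'adjusted_price'],
              title='MSFT close: raw vs. adjusted for corporate actions')
fig.show()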
5.7.6. Conclusion#
In this tutorial you learned:
how to integrate corporate actions into your data model as a separate “adjustments” table;
how to use ASOF JOIN to join both the adjustments and OHLC tables together;
how to leverage the CUMSUM and CUMPROD functions to aggregate all adjustments over time.