5.7. How to handle corporate actions#
In trading, a corporate action is an event initiated by a publicly traded company that could affect a stock’s price.
According to Investopedia:
A corporate action is any activity that brings material change to an organization and impacts its stakeholders, including shareholders, both common and preferred, as well as bondholders. These events are generally approved by the company’s board of directors; shareholders may be permitted to vote on some events as well. Some corporate actions require shareholders to submit a response.
Examples of corporate actions include stock splits, dividends and acquisitions.
Accounting for corporate actions in your database is a common data modelling issue.
In this guide, we will:
Explain the implications of corporate actions on stock trading data;
Propose a model on how to account for corporate actions;
Demonstrate queries with price adjusted for corporate actions.
We will be using Python, Pandas, and plotly. You can achieve similar results with any programming language or tool that can query Quasar.
5.7.1. Preparation#
5.7.1.1. Install and launch QuasarDB#
Please install & launch QuasarDB for your environment; the free community edition is sufficient.
For more information and instructions on how to install QuasarDB, please refer to our installation guide.
5.7.1.2. Prepare your Jupyter notebook#
Please follow the steps below to create and set up your Jupyter notebook:
# Install requirements
$ python3 -m pip install jupyter \
pandas \
numpy \
quasardb
# Launch local notebook
$ jupyter notebook ./qdb-howto.ipynb
[I 17:16:02.496 NotebookApp] Serving notebooks from local directory: /home/user/qdb-howto/
[I 17:16:02.496 NotebookApp] Jupyter Notebook 6.4.6 is running at:
[I 17:16:02.496 NotebookApp] http://localhost:8888/?token=...
[I 17:16:02.496 NotebookApp] or http://127.0.0.1:8888/?token=...
[I 17:16:02.496 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
A new Jupyter notebook should open automatically; otherwise, navigate your browser to http://localhost:8888/ with the environment prepared.
You have now completed the preparations.
5.7.2. A stock’s price#
For this example, we’ll use Microsoft’s dividend of Feb 16, 2022 as a case study. For this, we have prepared an OHLC dataset of public MSFT stock data. Let’s grab this data:
import pandas as pd
import requests
import io
def grab_csv(fname):
    # Download a CSV from the howto assets and return it as a
    # timestamp-indexed dataframe
    url = 'https://doc.quasar.ai/howto/corporate-actions/{}'.format(fname)
    data = requests.get(url, timeout=15).content
    df = pd.read_csv(io.StringIO(data.decode('utf-8')))
    df = df.rename(columns={'date': '$timestamp'})
    df = df.set_index('$timestamp')
    df = df.reindex()
    return df
ohlc = grab_csv('msft.csv')
ohlc
| $timestamp | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2022-02-07 | 306.170013 | 307.839996 | 299.899994 | 300.950012 | 28533300 |
| 2022-02-08 | 301.250000 | 305.559998 | 299.950012 | 304.559998 | 32421200 |
| 2022-02-09 | 309.869995 | 311.929993 | 307.390015 | 311.209991 | 31284700 |
| 2022-02-10 | 304.040009 | 309.119995 | 300.700012 | 302.380005 | 45386200 |
| 2022-02-11 | 303.190002 | 304.290009 | 294.220001 | 295.040009 | 39175600 |
| 2022-02-14 | 293.769989 | 296.760010 | 291.350006 | 295.000000 | 36359500 |
| 2022-02-15 | 300.010010 | 300.799988 | 297.019989 | 300.470001 | 27058300 |
| 2022-02-16 | 298.369995 | 300.869995 | 293.679993 | 299.500000 | 29982100 |
| 2022-02-17 | 296.359985 | 296.799988 | 290.000000 | 290.110011 | 32461600 |
| 2022-02-18 | 293.049988 | 293.859985 | 286.309998 | 287.309993 | 34264000 |
| 2022-02-22 | 285.000000 | 291.540009 | 284.500000 | 287.100001 | 41736100 |
| 2022-02-23 | 290.179993 | 291.700012 | 280.100006 | 279.649989 | 37811200 |
| 2022-02-24 | 272.510010 | 295.160004 | 271.519989 | 293.969996 | 56989700 |
| 2022-02-25 | 295.140015 | 297.630005 | 291.649994 | 296.689998 | 32546700 |
Even though we know that a $0.62 dividend payout occurred on Feb 16, you cannot see it in this data: that is because what you’re looking at are the prices adjusted for corporate actions.
Often, you will want to go back in time and see the unadjusted price: the price without any adjustments for corporate actions. How can we achieve this?
5.7.3. Storing price adjustments#
Let’s assume it’s early February: Microsoft has announced it will pay out a $0.62 dividend per share after trading closes on Feb 16, and you are tasked with preparing accordingly. The prices you have in your database at that point are, of course, the unadjusted prices.
There are a few approaches we can take here:
After trading closes on Feb 16, we update our entire history and adjust all prices/volumes;
For every row, we store a “price” and an “adjusted_price”: we modify our ingestion engine to be aware of this dividend, and activate it after trading closes on Feb 16;
We keep track of all corporate actions in a separate table, and, if the user desires, calculate the adjusted price on the fly.
Generally, we discourage the first option: not only is it fragile and time-sensitive, but you will not be able to easily recalculate the unadjusted, original price based on it. This is a problem for e.g. back-testing algorithms, and as any corporate action may have an influence on the price, you generally want to know if and when one happened.
The second option could work, but increases complexity: instead of just recording that an adjustment happened at a certain date, it now becomes logic we need to maintain in our ingestion process. This increases the chance of bugs, and really makes things more complicated than they should be.
As such, we recommend the third option: for each stock, we keep track of the prices in one table, and the adjustments in another table.
The next section describes a data model for this approach.
5.7.4. Types of corporate actions#
There are many types of corporate actions, but generally speaking, they can be classified as follows:
Is it a multiplicative corporate action (e.g. a stock split), or additive (e.g. a dividend)?
Does it apply only to price, or both price and volume?
With these classifications in mind, we can model all corporate actions using the following table structure:
CREATE TABLE msft_adjustments (
price_add DOUBLE,
price_mult DOUBLE,
volume_add DOUBLE,
volume_mult DOUBLE)
For each corporate action, we store a single value describing how it affects the price and/or the volume.
For example:
As a $0.62 dividend is additive and only affects price, we would only set price_add=0.62, and keep the rest of the values unset / null.
A 5:1 stock split is multiplicative, and affects both price and volume. As such, we would set price_mult=0.2 and volume_mult=5.
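To make this concrete, the two examples above can be expressed as rows of such a table. Below is a minimal sketch in Pandas; the dividend row matches the one we will insert in the next section, while the stock split row and its date are purely illustrative:
import numpy as np
import pandas as pd

# Two illustrative corporate actions, one row each; columns that do not apply
# are simply left unset (null).
example_adjustments = pd.DataFrame([
    # $0.62 dividend: additive, affects price only
    {'$timestamp': np.datetime64('2022-02-16', 'ns'), 'price_add': 0.62},
    # 5:1 stock split: multiplicative, affects price and volume (hypothetical date)
    {'$timestamp': np.datetime64('2022-03-01', 'ns'),
     'price_mult': 0.2, 'volume_mult': 5.0},
]).set_index('$timestamp')
example_adjustments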
5.7.5. Implementing the data model#
5.7.5.1. A single adjustment#
Let’s first represent our dividend as a table:
import numpy as np
adjustments = [{'$timestamp': np.datetime64('2022-02-16', 'ns'),
                'price_add': 0.62}]
adjustments = pd.DataFrame(adjustments).set_index('$timestamp').reindex()
adjustments
| $timestamp | price_add |
|---|---|
| 2022-02-16 | 0.62 |
We now insert both the OHLC data as well as the adjustments as two separate tables: microsoft.ohlc and microsoft.adjustments. For this, we’ll establish a connection with QuasarDB, and use our Pandas integration to directly write dataframes into Quasar:
import quasardb
import quasardb.pandas as qdbpd
conn = quasardb.Cluster('qdb://127.0.0.1:2836')
# Remove any tables left over from a previous run
try:
    conn.table('microsoft.ohlc').remove()
    conn.table('microsoft.adjustments').remove()
except:
    pass
# Write OHLC table to `microsoft.ohlc`, and automatically create the table
qdbpd.write_dataframe(ohlc, conn, conn.table('microsoft.ohlc'), create=True)
# Explicitly create the adjustments table to ensure we have the correct columns
cols = [quasardb.ColumnInfo(quasardb.ColumnType.Double, "price_add"),
        quasardb.ColumnInfo(quasardb.ColumnType.Double, "price_mult"),
        quasardb.ColumnInfo(quasardb.ColumnType.Double, "volume_add"),
        quasardb.ColumnInfo(quasardb.ColumnType.Double, "volume_mult")]
conn.table('microsoft.adjustments').create(cols)
# Write corporate adjustments table to `microsoft.adjustments`, do not
# automatically create the table
qdbpd.write_dataframe(adjustments, conn, conn.table('microsoft.adjustments'), create=False)
For good measure, let’s query both of these to verify the data is stored correctly. For this we will use the read_dataframe function, which reads an entire table as a dataframe:
qdbpd.read_dataframe(conn.table('microsoft.ohlc'))
| $timestamp | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2022-02-07 | 306.170013 | 307.839996 | 299.899994 | 300.950012 | 28533300 |
| 2022-02-08 | 301.250000 | 305.559998 | 299.950012 | 304.559998 | 32421200 |
| 2022-02-09 | 309.869995 | 311.929993 | 307.390015 | 311.209991 | 31284700 |
| 2022-02-10 | 304.040009 | 309.119995 | 300.700012 | 302.380005 | 45386200 |
| 2022-02-11 | 303.190002 | 304.290009 | 294.220001 | 295.040009 | 39175600 |
| 2022-02-14 | 293.769989 | 296.760010 | 291.350006 | 295.000000 | 36359500 |
| 2022-02-15 | 300.010010 | 300.799988 | 297.019989 | 300.470001 | 27058300 |
| 2022-02-16 | 298.369995 | 300.869995 | 293.679993 | 299.500000 | 29982100 |
| 2022-02-17 | 296.359985 | 296.799988 | 290.000000 | 290.110011 | 32461600 |
| 2022-02-18 | 293.049988 | 293.859985 | 286.309998 | 287.309993 | 34264000 |
| 2022-02-22 | 285.000000 | 291.540009 | 284.500000 | 287.100001 | 41736100 |
| 2022-02-23 | 290.179993 | 291.700012 | 280.100006 | 279.649989 | 37811200 |
| 2022-02-24 | 272.510010 | 295.160004 | 271.519989 | 293.969996 | 56989700 |
| 2022-02-25 | 295.140015 | 297.630005 | 291.649994 | 296.689998 | 32546700 |
Looks good. Now the corporate adjustments:
qdbpd.read_dataframe(conn.table('microsoft.adjustments'))
| $timestamp | price_add | price_mult | volume_add | volume_mult |
|---|---|---|---|---|
| 2022-02-16 | 0.62 | NaN | NaN | NaN |
Great! Now on to the good part: how can we combine these two?
For this, we will use an ASOF JOIN. Effectively, what we will do is:
For every row in the OHLC table, determine the most recent “adjustments” value;
Dynamically calculate the adjusted price based on the adjustments.
Let’s first join the data together:
q = """
SELECT
$timestamp,
"microsoft.ohlc".close AS price,
"microsoft.adjustments".price_add
FROM "microsoft.ohlc"
LEFT ASOF JOIN "microsoft.adjustments"
ORDER BY $timestamp DESC
"""
df = qdbpd.query(conn, q, index='$timestamp')
df
| $timestamp | price | microsoft.adjustments.price_add |
|---|---|---|
| 2022-02-25 | 296.689998 | 0.62 |
| 2022-02-24 | 293.969996 | 0.62 |
| 2022-02-23 | 279.649989 | 0.62 |
| 2022-02-22 | 287.100001 | 0.62 |
| 2022-02-18 | 287.309993 | 0.62 |
| 2022-02-17 | 290.110011 | 0.62 |
| 2022-02-16 | 299.500000 | 0.62 |
| 2022-02-15 | 300.470001 | NaN |
| 2022-02-14 | 295.000000 | NaN |
| 2022-02-11 | 295.040009 | NaN |
| 2022-02-10 | 302.380005 | NaN |
| 2022-02-09 | 311.209991 | NaN |
| 2022-02-08 | 304.559998 | NaN |
| 2022-02-07 | 300.950012 | NaN |
Great, we have the price adjustment aligned next to the close price. Next, we’ll use some arithmetic to dynamically calculate the adjusted price:
q = """
SELECT
$timestamp,
"microsoft.ohlc".close AS price,
("microsoft.ohlc".close + "microsoft.adjustments".price_add) AS adjusted_price
FROM "microsoft.ohlc"
LEFT ASOF JOIN "microsoft.adjustments"
ORDER BY $timestamp DESC
"""
df = qdbpd.query(conn, q, index='$timestamp')
df
| $timestamp | price | adjusted_price |
|---|---|---|
| 2022-02-25 | 296.689998 | 297.309998 |
| 2022-02-24 | 293.969996 | 294.589996 |
| 2022-02-23 | 279.649989 | 280.269989 |
| 2022-02-22 | 287.100001 | 287.720001 |
| 2022-02-18 | 287.309993 | 287.929993 |
| 2022-02-17 | 290.110011 | 290.730011 |
| 2022-02-16 | 299.500000 | 300.120000 |
| 2022-02-15 | 300.470001 | 300.470001 |
| 2022-02-14 | 295.000000 | 295.000000 |
| 2022-02-11 | 295.040009 | 295.040009 |
| 2022-02-10 | 302.380005 | 302.380005 |
| 2022-02-09 | 311.209991 | 311.209991 |
| 2022-02-08 | 304.559998 | 304.559998 |
| 2022-02-07 | 300.950012 | 300.950012 |
As you can see, ASOF JOIN is an elegant and effective tool to handle this.
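If you want to sanity-check this result on the client side, the same backward-looking join can be approximated in plain Pandas with merge_asof, using the ohlc and adjustments dataframes we built earlier. This is just a verification sketch, not part of the QuasarDB API:
# Approximate the ASOF JOIN locally: for each OHLC row, pick the most recent
# adjustment at or before its timestamp, then add it to the close price.
left = ohlc.copy()
left.index = pd.to_datetime(left.index)   # make sure both indices are datetimes

check = pd.merge_asof(left.sort_index(), adjustments.sort_index(),
                      left_index=True, right_index=True,
                      direction='backward')
check['adjusted_price'] = check['close'] + check['price_add'].fillna(0.0)
check[['close', 'price_add', 'adjusted_price']]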
5.7.5.2. Multiple adjustments#
Does this work with multiple corporate adjustments? Let’s assume, hypothetically, that a second adjustment occurs a few days later: what would the adjustment value be when doing an ASOF JOIN?
# Insert additional adjustment of $0.15 at 2022-02-18
q = """
INSERT INTO \"microsoft.adjustments\" (
$timestamp,
price_add)
VALUES (
2022-02-18,
0.15)
"""
qdbpd.query(conn, q)
q = """
SELECT
$timestamp,
"microsoft.ohlc".close AS price,
"microsoft.adjustments".price_add
FROM "microsoft.ohlc"
LEFT ASOF JOIN "microsoft.adjustments"
ORDER BY $timestamp DESC
"""
df = qdbpd.query(conn, q, index='$timestamp')
df
| $timestamp | price | microsoft.adjustments.price_add |
|---|---|---|
| 2022-02-25 | 296.689998 | 0.15 |
| 2022-02-24 | 293.969996 | 0.15 |
| 2022-02-23 | 279.649989 | 0.15 |
| 2022-02-22 | 287.100001 | 0.15 |
| 2022-02-18 | 287.309993 | 0.15 |
| 2022-02-17 | 290.110011 | 0.62 |
| 2022-02-16 | 299.500000 | 0.62 |
| 2022-02-15 | 300.470001 | NaN |
| 2022-02-14 | 295.000000 | NaN |
| 2022-02-11 | 295.040009 | NaN |
| 2022-02-10 | 302.380005 | NaN |
| 2022-02-09 | 311.209991 | NaN |
| 2022-02-08 | 304.559998 | NaN |
| 2022-02-07 | 300.950012 | NaN |
That’s not exactly what we’re looking for: it appears as if the join just picks the most recent value as of each point in time, but what we’re looking for are cumulative adjustments.
At this point, we have two options:
Instead of recording just the individual corporate actions, record the cumulative values: for example, rather than storing $0.15 in the example above, store $0.62 + $0.15 = $0.77. This will work, but it’s less elegant and increases the chance of error.
Dynamically accumulate the sum of all price adjustments.
As we prefer the latter option, we’re in luck: QuasarDB can dynamically accumulate the sums using the CUMSUM function.
Let’s see how it works by applying it on the adjustments table:
q = """
SELECT
$timestamp,
price_add,
CUMSUM(price_add)
FROM "microsoft.adjustments"
"""
df = qdbpd.query(conn, q, index='$timestamp')
df
| $timestamp | price_add | cumulative_sum(price_add) |
|---|---|---|
| 2022-02-16 | 0.62 | 0.62 |
| 2022-02-18 | 0.15 | 0.77 |
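As a quick client-side cross-check (a sketch using the read_dataframe function shown earlier), the same running total can be reproduced with Pandas:
# Re-read the adjustments table (the $0.15 row was inserted through SQL, so our
# local dataframe is out of date), then accumulate price_add client-side.
adj = qdbpd.read_dataframe(conn.table('microsoft.adjustments'))
adj['price_add'].cumsum()   # 0.62, then 0.62 + 0.15 = 0.77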
Exactly what we’re looking for! Now, let’s integrate this into our ASOF JOIN to get a fully dynamic accumulation of all price adjustments.
q = """
SELECT
$timestamp,
"microsoft.ohlc".close AS price,
"microsoft.ohlc".close + CUMSUM("microsoft.adjustments".price_add) AS adjusted_price
FROM "microsoft.ohlc"
LEFT ASOF JOIN "microsoft.adjustments"
ORDER BY $timestamp DESC
"""
df = qdbpd.query(conn, q, index='$timestamp')
df
| $timestamp | price | adjusted_price |
|---|---|---|
| 2022-02-25 | 296.689998 | 297.459998 |
| 2022-02-24 | 293.969996 | 294.739996 |
| 2022-02-23 | 279.649989 | 280.419989 |
| 2022-02-22 | 287.100001 | 287.870001 |
| 2022-02-18 | 287.309993 | 288.079993 |
| 2022-02-17 | 290.110011 | 290.730011 |
| 2022-02-16 | 299.500000 | 300.120000 |
| 2022-02-15 | 300.470001 | 300.470001 |
| 2022-02-14 | 295.000000 | 295.000000 |
| 2022-02-11 | 295.040009 | 295.040009 |
| 2022-02-10 | 302.380005 | 302.380005 |
| 2022-02-09 | 311.209991 | 311.209991 |
| 2022-02-08 | 304.559998 | 304.559998 |
| 2022-02-07 | 300.950012 | 300.950012 |
5.7.5.3. Multiplications#
In addition to simple price additions, we also have price multiplications. Now that we know how to do it, integrating these is straightforward by using the built-in CUMPROD function.
The example below simulates two stock splits, where the price of each share drops, and the volume increases.
# First "seed" the multiplication columns, so that we know for sure they will start
# with 1.00 (instead of 0.00)
q = """
INSERT INTO \"microsoft.adjustments\" (
$timestamp,
price_mult,
volume_mult)
VALUES (
2022-02-01,
1.0,
1.0)
"""
qdbpd.query(conn, q)
# 2x stock split at 2022-02-21
q = """
INSERT INTO \"microsoft.adjustments\" (
$timestamp,
price_mult,
volume_mult)
VALUES (
2022-02-21,
0.5,
2.0)
"""
qdbpd.query(conn, q)
# 4x stock split at 2022-02-23
q = """
INSERT INTO \"microsoft.adjustments\" (
$timestamp,
price_mult,
volume_mult)
VALUES (
2022-02-23,
0.25,
4.0)
"""
qdbpd.query(conn, q)
# For completeness, let's inspect all our adjustments
df = qdbpd.query(conn, "SELECT * FROM \"microsoft.adjustments\"")
df
| | $timestamp | $table | price_add | price_mult | volume_add | volume_mult |
|---|---|---|---|---|---|---|
| 0 | 2022-02-01 | microsoft.adjustments | NaN | 1.00 | NaN | 1.0 |
| 1 | 2022-02-16 | microsoft.adjustments | 0.62 | NaN | NaN | NaN |
| 2 | 2022-02-18 | microsoft.adjustments | 0.15 | NaN | NaN | NaN |
| 3 | 2022-02-21 | microsoft.adjustments | NaN | 0.50 | NaN | 2.0 |
| 4 | 2022-02-23 | microsoft.adjustments | NaN | 0.25 | NaN | 4.0 |
We can see the CUMPROD function at work by applying it over the price_mult column:
q = """
SELECT
$timestamp,
price_mult,
CUMPROD(price_mult)
FROM "microsoft.adjustments"
"""
df = qdbpd.query(conn, q, index='$timestamp')
df
| $timestamp | price_mult | cumulative_product(price_mult) |
|---|---|---|
| 2022-02-01 | 1.00 | 1.000 |
| 2022-02-16 | NaN | 1.000 |
| 2022-02-18 | NaN | 1.000 |
| 2022-02-21 | 0.50 | 0.500 |
| 2022-02-23 | 0.25 | 0.125 |
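The same running product can be reproduced on the client side as well. In this sketch, fillna(1.0) plays the same role as the seed row above: rows without a price_mult are treated as a neutral multiplier of 1.0:
# Re-read the adjustments table and accumulate the multipliers locally.
adj = qdbpd.read_dataframe(conn.table('microsoft.adjustments'))
adj['price_mult'].fillna(1.0).cumprod()   # 1.0, 1.0, 1.0, 0.5, 0.125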
Now, let’s integrate this into our ASOF JOIN:
q = """
SELECT
$timestamp,
"microsoft.ohlc".close AS price,
("microsoft.ohlc".close
* CUMPROD("microsoft.adjustments".price_mult))
+ CUMSUM("microsoft.adjustments".price_add) AS adjusted_price
FROM "microsoft.ohlc"
LEFT ASOF JOIN "microsoft.adjustments"
ORDER BY $timestamp DESC
"""
df = qdbpd.query(conn, q, index='$timestamp')
df
| $timestamp | price | adjusted_price |
|---|---|---|
| 2022-02-25 | 296.689998 | 37.856250 |
| 2022-02-24 | 293.969996 | 37.516250 |
| 2022-02-23 | 279.649989 | 35.726249 |
| 2022-02-22 | 287.100001 | 144.320001 |
| 2022-02-18 | 287.309993 | 288.079993 |
| 2022-02-17 | 290.110011 | 290.730011 |
| 2022-02-16 | 299.500000 | 300.120000 |
| 2022-02-15 | 300.470001 | 300.470001 |
| 2022-02-14 | 295.000000 | 295.000000 |
| 2022-02-11 | 295.040009 | 295.040009 |
| 2022-02-10 | 302.380005 | 302.380005 |
| 2022-02-09 | 311.209991 | 311.209991 |
| 2022-02-08 | 304.559998 | 304.559998 |
| 2022-02-07 | 300.950012 | 300.950012 |
As you can see, the adjustment is taken into account properly, and our price now reflects the stock splits.
5.7.5.4. Putting it all together#
To put it together, let’s apply the same logic to the volume column to calculate the adjusted volume.
q = """
SELECT
$timestamp,
"microsoft.ohlc".close AS price,
("microsoft.ohlc".close
* CUMPROD("microsoft.adjustments".price_mult))
+ CUMSUM("microsoft.adjustments".price_add) AS adjusted_price,
"microsoft.ohlc".volume AS volume,
("microsoft.ohlc".volume
* CUMPROD("microsoft.adjustments".volume_mult))
+ CUMSUM("microsoft.adjustments".volume_add) AS adjusted_volume
FROM "microsoft.ohlc"
LEFT ASOF JOIN "microsoft.adjustments"
ORDER BY $timestamp DESC
"""
df = qdbpd.query(conn, q, index='$timestamp')
df
| $timestamp | price | adjusted_price | volume | adjusted_volume |
|---|---|---|---|---|
| 2022-02-25 | 296.689998 | 37.856250 | 32546700 | 260373600.0 |
| 2022-02-24 | 293.969996 | 37.516250 | 56989700 | 455917600.0 |
| 2022-02-23 | 279.649989 | 35.726249 | 37811200 | 302489600.0 |
| 2022-02-22 | 287.100001 | 144.320001 | 41736100 | 83472200.0 |
| 2022-02-18 | 287.309993 | 288.079993 | 34264000 | 34264000.0 |
| 2022-02-17 | 290.110011 | 290.730011 | 32461600 | 32461600.0 |
| 2022-02-16 | 299.500000 | 300.120000 | 29982100 | 29982100.0 |
| 2022-02-15 | 300.470001 | 300.470001 | 27058300 | 27058300.0 |
| 2022-02-14 | 295.000000 | 295.000000 | 36359500 | 36359500.0 |
| 2022-02-11 | 295.040009 | 295.040009 | 39175600 | 39175600.0 |
| 2022-02-10 | 302.380005 | 302.380005 | 45386200 | 45386200.0 |
| 2022-02-09 | 311.209991 | 311.209991 | 31284700 | 31284700.0 |
| 2022-02-08 | 304.559998 | 304.559998 | 32421200 | 32421200.0 |
| 2022-02-07 | 300.950012 | 300.950012 | 28533300 | 28533300.0 |
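To round things off, we can visualize the raw and adjusted close prices from this final result. The sketch below uses plotly, which we mentioned in the introduction (install it with pip if it is not already available); it assumes the df returned by the query above, with its price and adjusted_price columns:
import plotly.express as px

# Plot the raw close price against the dynamically adjusted price; the stock
# splits show up as large drops in the adjusted series.
fig = px.line(df.reset_index(), x='$timestamp',
              y=['price', 'adjusted_price'],
              title='MSFT close: raw vs. adjusted for corporate actions')
fig.show()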
5.7.6. Conclusion#
In this tutorial you learned:
how to integrate corporate actions into your data model as a separate “adjustments” table;
how to use ASOF JOIN to join both the adjustments and OHLC tables together;
how to leverage the CUMSUM and CUMPROD functions to aggregate all adjustments over time.