5.7. How to handle corporate actions

    5.7. How to handle corporate actions

    In trading, a corporate action is an event initiated by a publicly traded company that could affect a stock’s price.

    According to investopedia:

    A corporate action is any activity that brings material change to an organization and impacts its stakeholders, including shareholders, both common and preferred, as well as bondholders. These events are generally approved by the company’s board of directors; shareholders may be permitted to vote on some events as well. Some corporate actions require shareholders to submit a response.

    Examples of corporate actions include stock splits, dividends and acquisitions.

    Accounting for corporate actions in your database is a common data modelling issue.

    In this guide, we will:

    • Explain the implications of corporate actions on stock trading data;

    • Propose a model on how to account for corporate actions;

    • Demonstrate queries with price adjusted for corporate actions.

    We will be using Python, Pandas, and plotly. You can achieve similar results with any programming language or tool that can query Quasar.

    5.7.1. Preparation

    5.7.1.1. Install and launch QuasarDB

    Please install & launch QuasarDB for your environment; the free community edition is sufficient.

    For more information and instructions on how to install QuasarDB, please refer to our installation guide.

    5.7.1.2. Prepare your Jupyter notebook

    Please follow the steps below to create and set up your Jupyter notebook:

    # Install requirements
    $ python3 -m pip install jupyter \
                             pandas \
                             numpy \
                             quasardb
    
    # Launch local notebook
    $ jupyter notebook ./qdb-howto.ipynb
    
    [I 17:16:02.496 NotebookApp] Serving notebooks from local directory: /home/user/qdb-howto/
    [I 17:16:02.496 NotebookApp] Jupyter Notebook 6.4.6 is running at:
    [I 17:16:02.496 NotebookApp] http://localhost:8888/?token=...
    [I 17:16:02.496 NotebookApp]  or http://127.0.0.1:8888/?token=...
    [I 17:16:02.496 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
    

    A new jupyter notebook should automatically open, otherwise please manually navigate and you can navigate your browser to http://localhost:8888/ with the environment prepared.

    You are now completed with the preparations.

    5.7.2. A stock’s price

    For this example, we’ll use Microsoft’s dividend of Feb 16, 2022 as a case study. For this, we have prepared an OHLC dataset of public MSFT stock data. Let’s grab this data:

    import pandas as pd
    import requests
    import io
    
    def grab_csv(fname):
       url = 'https://doc.quasar.ai/howto/corporate-actions/{}'.format(fname)
       data = requests.get(url, timeout=15).content
       df = pd.read_csv(io.StringIO(data.decode('utf-8')))
       df = df.rename(columns={'date': '$timestamp'})
       df = df.set_index('$timestamp')
       df = df.reindex()
       return df
    
    ohlc = grab_csv('msft.csv')
    ohlc
    
    open high low close volume
    $timestamp
    2022-02-07 306.170013 307.839996 299.899994 300.950012 28533300
    2022-02-08 301.250000 305.559998 299.950012 304.559998 32421200
    2022-02-09 309.869995 311.929993 307.390015 311.209991 31284700
    2022-02-10 304.040009 309.119995 300.700012 302.380005 45386200
    2022-02-11 303.190002 304.290009 294.220001 295.040009 39175600
    2022-02-14 293.769989 296.760010 291.350006 295.000000 36359500
    2022-02-15 300.010010 300.799988 297.019989 300.470001 27058300
    2022-02-16 298.369995 300.869995 293.679993 299.500000 29982100
    2022-02-17 296.359985 296.799988 290.000000 290.110011 32461600
    2022-02-18 293.049988 293.859985 286.309998 287.309993 34264000
    2022-02-22 285.000000 291.540009 284.500000 287.100001 41736100
    2022-02-23 290.179993 291.700012 280.100006 279.649989 37811200
    2022-02-24 272.510010 295.160004 271.519989 293.969996 56989700
    2022-02-25 295.140015 297.630005 291.649994 296.689998 32546700

    Even though we know that a $0.62 dividend payout occurred on Feb 16, you cannot see it in this data: that is because what you’re looking at are the prices adjusted for corporate actions.

    Often, you will want to go back in time and see the unadjusted price: the price without any adjustments for corporate actions. How can we achieve this?

    5.7.3. Storing price adjustments

    Let’s assume it’s early February, Microsoft has announced they will pay out $0.62 dividend per share after trading closes on Feb 16, and you are tasked with preparing accordingly. The prices you have in your database at that point are, of course, the unadjusted prices.

    There are a few approaches we can take here:

    1. After trading closes on Feb 16, we will update our entire history and adjust all prices/volumes.

    2. For every row, we store a “price” and an “adjusted_price”: we modify our ingestion engine to be aware of this stock split, and activate it after trading closes on Feb 16;

    3. We keep track of all corporate actions in a separate table, and if the user desires, calculate the adjusted price on the fly.

    Generally, we discourage the first option: not only is it fragile and time-sensitive, but you will not be able to easily recalculate the unadjusted, original price based on it. This is a problem for e.g. back-testing algorithms, and as any corporate action may have an influence on the price, you generally want to be aware if and when this happened.

    The second option could work, but increases complexity: instead of just recording that an adjustment happened at a certain date, it now becomes logic we need to maintain in our ingestion process. This increases the chance of bugs, and really makes things more complicated than they should be.

    As such, we recommend the third option: for each stock, we keep track of the prices in one table, and the adjustments in another table.

    The next section describes a data model for this approach.

    5.7.4. Types of corporate actions

    There are many types of corporate actions, but generally speaking, they can be classified as follows:

    • Is it a multiplicative corporate action (e.g. stock splits), or additive (e.g. dividend) ?

    • Does it apply only to price, or both price and volume?

    With these classifications in mind, we can model all corporate actions using the following table structure:

    CREATE TABLE msft_adjustments (
       price_add DOUBLE,
       price_mult DOUBLE,
       volume_add DOUBLE,
       volume_mult DOUBLE)
    

    For each corporate action, we have a single value on how it affects the price or volume.

    For example:

    • As a $0.62 dividend is additive and only affects price, we would only set price_add=0.62, and keep the rest of the values unset / null.

    • A 5:1 stock split is multiplicative, and affects both price and volume. As such, we would set price_mult=0.2 and volume_mult=5.

    5.7.5. Implementing the data model

    5.7.5.1. A single adjustment

    Let’s first represent our dividend as a table:

    import numpy as np
    
    adjustments = [{'$timestamp': np.datetime64('2022-02-16', 'ns'),
                    'price_add': 0.62}]
    adjustments = pd.DataFrame(adjustments).set_index('$timestamp').reindex()
    adjustments
    
    price_add
    $timestamp
    2022-02-16 0.62

    We now insert both the OHLC data as well as the adjustments as two separate tables: microsoft.ohlc and microsoft.adjustments. For this, we’ll establish a connection with QuasarDB, and use our Pandas integration to directly write dataframes into Quasar:

    import quasardb
    import quasardb.pandas as qdbpd
    
    conn = quasardb.Cluster('qdb://127.0.0.1:2836')
    try:
      conn.table('microsoft.ohlc').remove()
      conn.table('microsoft.adjustments').remove()
    except:
      pass
    
    # Write OHLC table to `microsoft.ohlc`, and automatically create the table
    qdbpd.write_dataframe(ohlc, conn, conn.table('microsoft.ohlc'), create=True)
    
    # Explicitly create the adjustments table to ensure we have the correct columns
    cols = [quasardb.ColumnInfo(quasardb.ColumnType.Double, "price_add"),
            quasardb.ColumnInfo(quasardb.ColumnType.Double, "price_mult"),
            quasardb.ColumnInfo(quasardb.ColumnType.Double, "volume_add"),
            quasardb.ColumnInfo(quasardb.ColumnType.Double, "volume_mult")]
    conn.table('microsoft.adjustments').create(cols)
    
    # Write corporate adjustments table to `microsoft.adjustments`, do not
    # automatically create the table
    qdbpd.write_dataframe(adjustments, conn, conn.table('microsoft.adjustments'), create=False)
    

    For good measure, let’s query both of these to verify the data is stored correctly. For this we will use the read_dataframe function, which reads an entire table as a dataframe:

    qdbpd.read_dataframe(conn.table('microsoft.ohlc'))
    
    open high low close volume
    2022-02-07 306.170013 307.839996 299.899994 300.950012 28533300
    2022-02-08 301.250000 305.559998 299.950012 304.559998 32421200
    2022-02-09 309.869995 311.929993 307.390015 311.209991 31284700
    2022-02-10 304.040009 309.119995 300.700012 302.380005 45386200
    2022-02-11 303.190002 304.290009 294.220001 295.040009 39175600
    2022-02-14 293.769989 296.760010 291.350006 295.000000 36359500
    2022-02-15 300.010010 300.799988 297.019989 300.470001 27058300
    2022-02-16 298.369995 300.869995 293.679993 299.500000 29982100
    2022-02-17 296.359985 296.799988 290.000000 290.110011 32461600
    2022-02-18 293.049988 293.859985 286.309998 287.309993 34264000
    2022-02-22 285.000000 291.540009 284.500000 287.100001 41736100
    2022-02-23 290.179993 291.700012 280.100006 279.649989 37811200
    2022-02-24 272.510010 295.160004 271.519989 293.969996 56989700
    2022-02-25 295.140015 297.630005 291.649994 296.689998 32546700

    Looks good, the corporate adjustments:

    qdbpd.read_dataframe(conn.table('microsoft.adjustments'))
    
    price_add price_mult volume_add volume_mult
    2022-02-16 0.62 NaN NaN NaN

    Great! Now on to the good parts, how can we combine these two?

    For this, we will use an ASOF JOIN. Effectively, what we will do:

    • For every row in the OHLC table, determine the most recent “adjustments” value;

    • Dynamically calculate the adjusted price based on the adjustments.

    Let’s first join the data together:

    q = """
    SELECT
       $timestamp,
       "microsoft.ohlc".close AS price,
       "microsoft.adjustments".price_add
    FROM "microsoft.ohlc"
    LEFT ASOF JOIN "microsoft.adjustments"
    ORDER BY $timestamp DESC
    """
    qdbpd.query(conn, q, index='$timestamp')
    
    price microsoft.adjustments.price_add
    2022-02-25 296.689998 0.62
    2022-02-24 293.969996 0.62
    2022-02-23 279.649989 0.62
    2022-02-22 287.100001 0.62
    2022-02-18 287.309993 0.62
    2022-02-17 290.110011 0.62
    2022-02-16 299.500000 0.62
    2022-02-15 300.470001 NaN
    2022-02-14 295.000000 NaN
    2022-02-11 295.040009 NaN
    2022-02-10 302.380005 NaN
    2022-02-09 311.209991 NaN
    2022-02-08 304.559998 NaN
    2022-02-07 300.950012 NaN

    Great, we have the price adjustment aligned next to the close price. Next, we’ll use some arithmetic to dynamically calculate the adjusted price:

    q = """
    SELECT
       $timestamp,
       "microsoft.ohlc".close AS price,
       ("microsoft.ohlc".close + "microsoft.adjustments".price_add) AS adjusted_price
    FROM "microsoft.ohlc"
    LEFT ASOF JOIN "microsoft.adjustments"
    ORDER BY $timestamp DESC
    """
    qdbpd.query(conn, q, index='$timestamp')
    
    price adjusted_price
    2022-02-25 296.689998 297.309998
    2022-02-24 293.969996 294.589996
    2022-02-23 279.649989 280.269989
    2022-02-22 287.100001 287.720001
    2022-02-18 287.309993 287.929993
    2022-02-17 290.110011 290.730011
    2022-02-16 299.500000 300.120000
    2022-02-15 300.470001 300.470001
    2022-02-14 295.000000 295.000000
    2022-02-11 295.040009 295.040009
    2022-02-10 302.380005 302.380005
    2022-02-09 311.209991 311.209991
    2022-02-08 304.559998 304.559998
    2022-02-07 300.950012 300.950012

    As you can see, ASOF JOIN is an elegant and effective tool to handle this.

    5.7.5.2. Multiple adjustments

    Does this work with multiple corporate adjustments? Let’s assume, hypothetically, what would happen if we had a second adjustment a few days later, what would the adjustment value be when doing an ASOF JOIN?

    # Insert additional adjustment of $0.15 at 2022-02-18
    q = """
    INSERT INTO \"microsoft.adjustments\" (
        $timestamp,
        price_add)
    VALUES (
        2022-02-18,
        0.15)
    """
    qdbpd.query(conn, q)
    
    q = """
    SELECT
       $timestamp,
       "microsoft.ohlc".close AS price,
       "microsoft.adjustments".price_add
    FROM "microsoft.ohlc"
    LEFT ASOF JOIN "microsoft.adjustments"
    ORDER BY $timestamp DESC
    """
    qdbpd.query(conn, q, index='$timestamp')
    
    price microsoft.adjustments.price_add
    2022-02-25 296.689998 0.15
    2022-02-24 293.969996 0.15
    2022-02-23 279.649989 0.15
    2022-02-22 287.100001 0.15
    2022-02-18 287.309993 0.15
    2022-02-17 290.110011 0.62
    2022-02-16 299.500000 0.62
    2022-02-15 300.470001 NaN
    2022-02-14 295.000000 NaN
    2022-02-11 295.040009 NaN
    2022-02-10 302.380005 NaN
    2022-02-09 311.209991 NaN
    2022-02-08 304.559998 NaN
    2022-02-07 300.950012 NaN

    That’s not exactly what we’re looking for: it appears as if the join just picks the last value at a certain point in time, but what we’re looking for are cumulative adjustments.

    At this point, we have two options:

    1. Instead of recording just the individual corporate actions, record the cumulative values instead: for example, rather than storing $0.15 in the example above, store $0.62 + $0.15 = $0.77.

    This will work, but it’s not as elegant and increases chance on error.

    1. Dynamically accumulate the sum of all price adjustments.

    As we would prefer the latter option, we’re in luck: QuasarDB has the ability to dynamically accumulate the sums using the CUMSUM function.

    Let’s see how it works by applying it on the adjustments table:

    q = """
    SELECT
       $timestamp,
       price_add,
       CUMSUM(price_add)
    FROM "microsoft.adjustments"
    """
    qdbpd.query(conn, q, index='$timestamp')
    
    price_add cumulative_sum(price_add)
    2022-02-16 0.62 0.62
    2022-02-18 0.15 0.77

    Exactly what we’re looking for! Now, let’s integrate this into our ASOF JOIN to get a fully dynamic acucmulation of all price adjustments.

    q = """
    SELECT
       $timestamp,
       "microsoft.ohlc".close AS price,
       "microsoft.ohlc".close + CUMSUM("microsoft.adjustments".price_add) AS adusted_price
    FROM "microsoft.ohlc"
    LEFT ASOF JOIN "microsoft.adjustments"
    ORDER BY $timestamp DESC
    """
    qdbpd.query(conn, q, index='$timestamp')
    
    price adusted_price
    2022-02-25 296.689998 297.459998
    2022-02-24 293.969996 294.739996
    2022-02-23 279.649989 280.419989
    2022-02-22 287.100001 287.870001
    2022-02-18 287.309993 288.079993
    2022-02-17 290.110011 290.730011
    2022-02-16 299.500000 300.120000
    2022-02-15 300.470001 300.470001
    2022-02-14 295.000000 295.000000
    2022-02-11 295.040009 295.040009
    2022-02-10 302.380005 302.380005
    2022-02-09 311.209991 311.209991
    2022-02-08 304.559998 304.559998
    2022-02-07 300.950012 300.950012

    5.7.5.3. Multiplications

    In addition to simple price additions, we also have price multiplications. Now that we know how to do it, integrating these is straightforward by using the built-in CUMPROD function.

    The example below simulates two stock splits, where the price of each share drops, and the volume increases.

    # First "seed" the multiplication columns, so that we know for sure they will start
    # with 1.00 (instead of 0.00)
    q = """
    INSERT INTO \"microsoft.adjustments\" (
        $timestamp,
        price_mult,
        volume_mult)
    VALUES (
        2022-02-01,
        1.0,
        1.0)
    """
    qdbpd.query(conn, q)
    
    # 2x stock split at 2022-02-21
    q = """
    INSERT INTO \"microsoft.adjustments\" (
        $timestamp,
        price_mult,
        volume_mult)
    VALUES (
        2022-02-21,
        0.5,
        2.0)
    """
    qdbpd.query(conn, q)
    
    # 4x stock split at 2022-02-23
    q = """
    INSERT INTO \"microsoft.adjustments\" (
        $timestamp,
        price_mult,
        volume_mult)
    VALUES (
        2022-02-23,
        0.25,
        4.0)
    """
    qdbpd.query(conn, q)
    
    # For completeness, let's inspect all our adjustments
    qdbpd.query(conn, "SELECT * FROM \"microsoft.adjustments\"")
    
    $timestamp $table price_add price_mult volume_add volume_mult
    0 2022-02-01 microsoft.adjustments NaN 1.00 NaN 1.0
    1 2022-02-16 microsoft.adjustments 0.62 NaN NaN NaN
    2 2022-02-18 microsoft.adjustments 0.15 NaN NaN NaN
    3 2022-02-21 microsoft.adjustments NaN 0.50 NaN 2.0
    4 2022-02-23 microsoft.adjustments NaN 0.25 NaN 4.0

    We can see the CUMPROD function at work by applying it over the price_mult column:

    q = """
    SELECT
       $timestamp,
       price_mult,
       CUMPROD(price_mult)
    FROM "microsoft.adjustments"
    """
    qdbpd.query(conn, q, index='$timestamp')
    
    price_mult cumulative_product(price_mult)
    2022-02-01 1.00 1.000
    2022-02-16 NaN 1.000
    2022-02-18 NaN 1.000
    2022-02-21 0.50 0.500
    2022-02-23 0.25 0.125

    Now, let’s integrate this into our ASOF JOIN:

    q = """
    SELECT
       $timestamp,
       "microsoft.ohlc".close AS price,
       ("microsoft.ohlc".close
         * CUMPROD("microsoft.adjustments".price_mult))
        + CUMSUM("microsoft.adjustments".price_add) AS adusted_price
    FROM "microsoft.ohlc"
    LEFT ASOF JOIN "microsoft.adjustments"
    ORDER BY $timestamp DESC
    """
    qdbpd.query(conn, q, index='$timestamp')
    
    price adusted_price
    2022-02-25 296.689998 37.856250
    2022-02-24 293.969996 37.516250
    2022-02-23 279.649989 35.726249
    2022-02-22 287.100001 144.320001
    2022-02-18 287.309993 288.079993
    2022-02-17 290.110011 290.730011
    2022-02-16 299.500000 300.120000
    2022-02-15 300.470001 300.470001
    2022-02-14 295.000000 295.000000
    2022-02-11 295.040009 295.040009
    2022-02-10 302.380005 302.380005
    2022-02-09 311.209991 311.209991
    2022-02-08 304.559998 304.559998
    2022-02-07 300.950012 300.950012

    As you can see, the adjustment is taken into account properly, and our price now reflects the stock splits.

    5.7.5.4. Putting it all together

    To put it together, let’s apply the same logic to the volume column to calculate the adjusted volume.

    q = """
    SELECT
       $timestamp,
    
       "microsoft.ohlc".close AS price,
       ("microsoft.ohlc".close
         * CUMPROD("microsoft.adjustments".price_mult))
        + CUMSUM("microsoft.adjustments".price_add) AS adusted_price,
    
       "microsoft.ohlc".volume AS volume,
       ("microsoft.ohlc".volume
         * CUMPROD("microsoft.adjustments".volume_mult))
        + CUMSUM("microsoft.adjustments".volume_add) AS adusted_volume
    
    FROM "microsoft.ohlc"
    LEFT ASOF JOIN "microsoft.adjustments"
    ORDER BY $timestamp DESC
    """
    qdbpd.query(conn, q, index='$timestamp')
    
    price adusted_price volume adusted_volume
    2022-02-25 296.689998 37.856250 32546700 260373600.0
    2022-02-24 293.969996 37.516250 56989700 455917600.0
    2022-02-23 279.649989 35.726249 37811200 302489600.0
    2022-02-22 287.100001 144.320001 41736100 83472200.0
    2022-02-18 287.309993 288.079993 34264000 34264000.0
    2022-02-17 290.110011 290.730011 32461600 32461600.0
    2022-02-16 299.500000 300.120000 29982100 29982100.0
    2022-02-15 300.470001 300.470001 27058300 27058300.0
    2022-02-14 295.000000 295.000000 36359500 36359500.0
    2022-02-11 295.040009 295.040009 39175600 39175600.0
    2022-02-10 302.380005 302.380005 45386200 45386200.0
    2022-02-09 311.209991 311.209991 31284700 31284700.0
    2022-02-08 304.559998 304.559998 32421200 32421200.0
    2022-02-07 300.950012 300.950012 28533300 28533300.0

    5.7.6. Conclusion

    In this tutorial you learned:

    • how to integrate corporate actions into your data model as a separate “adjustments” table;

    • use ASOF JOIN to join both the adjustments and OHLC tables together;

    • leverage the CUMSUM and CUMPROD functions to aggregate all adjustments over time.