Polars Order Book

October 5, 2024

Polars Order Book provides plugins for the Polars library that efficiently calculate summary information (price and quantity) for the top N levels of an order book.

Features

  • Top N Levels: Compute the price and quantity for the top N price levels of both bid and ask sides of the order book.
  • High Performance: Designed with performance in mind.
  • Multiple Input Formats: Supports various types of order book updates:
    • Price level updates: (side, price, new_quantity)
    • Order mutations: (side, price, quantity_change)
    • Order mutations with modifications: (side, price, quantity, prev_price, prev_quantity)
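
To make the three input formats concrete, the following is a minimal pure-Python sketch of how each kind of update could be applied to one side of a book stored as a price-to-quantity dict. It only illustrates the intended semantics and is not the plugin's implementation. The names `Book`, `apply_price_update`, `apply_mutation`, `apply_modify`, and `top_n` are hypothetical helpers introduced for this example, and the sketch assumes a modification stays on the same side of the book.

```python
from typing import Dict, List, Optional, Tuple

Book = Dict[int, int]  # one side of the book: price -> resting quantity


def apply_price_update(book: Book, price: int, new_qty: int) -> None:
    """Price level update: overwrite the level; a zero quantity removes it."""
    if new_qty == 0:
        book.pop(price, None)
    else:
        book[price] = new_qty


def apply_mutation(book: Book, price: int, qty_change: int) -> None:
    """Order mutation: apply a signed change; drop the level once it is empty."""
    remaining = book.get(price, 0) + qty_change
    if remaining == 0:
        book.pop(price, None)
    else:
        book[price] = remaining


def apply_modify(
    book: Book,
    price: int,
    qty: int,
    prev_price: Optional[int],
    prev_qty: Optional[int],
) -> None:
    """Mutation with modification: undo the previous (price, qty), then add the new one."""
    if prev_price is not None and prev_qty is not None:
        apply_mutation(book, prev_price, -prev_qty)
    apply_mutation(book, price, qty)


def top_n(book: Book, is_bid: bool, n: int) -> List[Tuple[Optional[int], Optional[int]]]:
    """Best n levels: highest prices first for bids, lowest first for asks."""
    prices = sorted(book, reverse=is_bid)[:n]
    levels = [(p, book[p]) for p in prices]
    # Pad with (None, None) when fewer than n levels exist.
    return levels + [(None, None)] * (n - len(levels))


# Replay four bid-side price level updates, ending with a removal of price 2.
bids: Book = {}
for price, qty in [(1, 100), (2, 200), (2, 250), (2, 0)]:
    apply_price_update(bids, price, qty)
best_bids = top_n(bids, is_bid=True, n=2)
print(best_bids)  # [(1, 100), (None, None)]
```

The plugin expressions report this kind of top-N snapshot after every input row, with missing levels shown as null.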

Usage

Here are examples of how to use the plugin:

Example 1: Price Level Updates

import polars as pl
from polars_order_book import top_n_levels_from_price_updates

df = pl.DataFrame(
    {
        "is_bid": [True, True, False, False, True, True],
        "price": [1, 2, 4, 5, 2, 2],
        "qty": [100, 200, 400, 500, 250, 0],
    }
)
expr = top_n_levels_from_price_updates(
    price=df["price"], qty=df["qty"], is_bid=df["is_bid"], n=2
)
result = df.with_columns(expr.alias("top_levels")).unnest("top_levels")
print(result)

# Output
shape: (6, 11)
┌────────┬───────┬─────┬─────────────┬─────────────┬───────────┬───────────┬─────────────┬─────────────┬───────────┬───────────┐
│ is_bid ┆ price ┆ qty ┆ bid_price_1 ┆ bid_price_2 ┆ bid_qty_1 ┆ bid_qty_2 ┆ ask_price_1 ┆ ask_price_2 ┆ ask_qty_1 ┆ ask_qty_2 │
│ ---    ┆ ---   ┆ --- ┆ ---         ┆ ---         ┆ ---       ┆ ---       ┆ ---         ┆ ---         ┆ ---       ┆ ---       │
│ bool   ┆ i64   ┆ i64 ┆ i64         ┆ i64         ┆ i64       ┆ i64       ┆ i64         ┆ i64         ┆ i64       ┆ i64       │
╞════════╪═══════╪═════╪═════════════╪═════════════╪═══════════╪═══════════╪═════════════╪═════════════╪═══════════╪═══════════╡
│ true   ┆ 1     ┆ 100 ┆ 1           ┆ null        ┆ 100       ┆ null      ┆ null        ┆ null        ┆ null      ┆ null      │
│ true   ┆ 2     ┆ 200 ┆ 2           ┆ 1           ┆ 200       ┆ 100       ┆ null        ┆ null        ┆ null      ┆ null      │
│ false  ┆ 4     ┆ 400 ┆ 2           ┆ 1           ┆ 200       ┆ 100       ┆ 4           ┆ null        ┆ 400       ┆ null      │
│ false  ┆ 5     ┆ 500 ┆ 2           ┆ 1           ┆ 200       ┆ 100       ┆ 4           ┆ 5           ┆ 400       ┆ 500       │
│ true   ┆ 2     ┆ 250 ┆ 2           ┆ 1           ┆ 250       ┆ 100       ┆ 4           ┆ 5           ┆ 400       ┆ 500       │
│ true   ┆ 2     ┆ 0   ┆ 1           ┆ null        ┆ 100       ┆ null      ┆ 4           ┆ 5           ┆ 400       ┆ 500       │
└────────┴───────┴─────┴─────────────┴─────────────┴───────────┴───────────┴─────────────┴─────────────┴───────────┴───────────┘

Example 2: Order Mutations

import polars as pl
from polars_order_book import top_n_levels_from_price_mutations

df = pl.DataFrame(
    {
        "is_bid": [True, True, False, False, True, True],
        "price": [1, 2, 4, 5, 2, 2],
        "qty": [100, 200, 400, 500, 50, -250],
    }
)
expr = top_n_levels_from_price_mutations(price="price", qty="qty", is_bid="is_bid", n=2)
result = df.with_columns(expr.alias("top_levels")).unnest("top_levels")
print(result)

# Output
shape: (6, 11)
┌────────┬───────┬──────┬─────────────┬─────────────┬───────────┬───────────┬─────────────┬─────────────┬───────────┬───────────┐
│ is_bid ┆ price ┆ qty  ┆ bid_price_1 ┆ bid_price_2 ┆ bid_qty_1 ┆ bid_qty_2 ┆ ask_price_1 ┆ ask_price_2 ┆ ask_qty_1 ┆ ask_qty_2 │
│ ---    ┆ ---   ┆ ---  ┆ ---         ┆ ---         ┆ ---       ┆ ---       ┆ ---         ┆ ---         ┆ ---       ┆ ---       │
│ bool   ┆ i64   ┆ i64  ┆ i64         ┆ i64         ┆ i64       ┆ i64       ┆ i64         ┆ i64         ┆ i64       ┆ i64       │
╞════════╪═══════╪══════╪═════════════╪═════════════╪═══════════╪═══════════╪═════════════╪═════════════╪═══════════╪═══════════╡
│ true   ┆ 1     ┆ 100  ┆ 1           ┆ null        ┆ 100       ┆ null      ┆ null        ┆ null        ┆ null      ┆ null      │
│ true   ┆ 2     ┆ 200  ┆ 2           ┆ 1           ┆ 200       ┆ 100       ┆ null        ┆ null        ┆ null      ┆ null      │
│ false  ┆ 4     ┆ 400  ┆ 2           ┆ 1           ┆ 200       ┆ 100       ┆ 4           ┆ null        ┆ 400       ┆ null      │
│ false  ┆ 5     ┆ 500  ┆ 2           ┆ 1           ┆ 200       ┆ 100       ┆ 4           ┆ 5           ┆ 400       ┆ 500       │
│ true   ┆ 2     ┆ 50   ┆ 2           ┆ 1           ┆ 250       ┆ 100       ┆ 4           ┆ 5           ┆ 400       ┆ 500       │
│ true   ┆ 2     ┆ -250 ┆ 1           ┆ null        ┆ 100       ┆ null      ┆ 4           ┆ 5           ┆ 400       ┆ 500       │
└────────┴───────┴──────┴─────────────┴─────────────┴───────────┴───────────┴─────────────┴─────────────┴───────────┴───────────┘

Example 3: Order Mutations with Modifications

import polars as pl
from polars_order_book import top_n_levels_from_price_mutations_with_modify

df = pl.DataFrame(
    {
        "is_bid": [True, False, True, False, True, False],
        "price": [1, 6, 2, 5, 3, 4],
        "qty": [100, 600, 200, 500, 300, 400],
        "prev_price": [None, None, 1, 6, 2, 5],
        "prev_qty": [None, None, 100, 600, 200, 500],
    }
)
expr = top_n_levels_from_price_mutations_with_modify(
    "price", "qty", "is_bid", "prev_price", "prev_qty", n=2
)
result = df.with_columns(expr.alias("top_levels")).unnest("top_levels")
print(result)

# Output
shape: (6, 13)
┌────────┬───────┬─────┬────────────┬──────────┬─────────────┬─────────────┬───────────┬───────────┬─────────────┬─────────────┬───────────┬───────────┐
│ is_bid ┆ price ┆ qty ┆ prev_price ┆ prev_qty ┆ bid_price_1 ┆ bid_price_2 ┆ bid_qty_1 ┆ bid_qty_2 ┆ ask_price_1 ┆ ask_price_2 ┆ ask_qty_1 ┆ ask_qty_2 │
│ ---    ┆ ---   ┆ --- ┆ ---        ┆ ---      ┆ ---         ┆ ---         ┆ ---       ┆ ---       ┆ ---         ┆ ---         ┆ ---       ┆ ---       │
│ bool   ┆ i64   ┆ i64 ┆ i64        ┆ i64      ┆ i64         ┆ i64         ┆ i64       ┆ i64       ┆ i64         ┆ i64         ┆ i64       ┆ i64       │
╞════════╪═══════╪═════╪════════════╪══════════╪═════════════╪═════════════╪═══════════╪═══════════╪═════════════╪═════════════╪═══════════╪═══════════╡
│ true   ┆ 1     ┆ 100 ┆ null       ┆ null     ┆ 1           ┆ null        ┆ 100       ┆ null      ┆ null        ┆ null        ┆ null      ┆ null      │
│ false  ┆ 6     ┆ 600 ┆ null       ┆ null     ┆ 1           ┆ null        ┆ 100       ┆ null      ┆ 6           ┆ null        ┆ 600       ┆ null      │
│ true   ┆ 2     ┆ 200 ┆ 1          ┆ 100      ┆ 2           ┆ null        ┆ 200       ┆ null      ┆ 6           ┆ null        ┆ 600       ┆ null      │
│ false  ┆ 5     ┆ 500 ┆ 6          ┆ 600      ┆ 2           ┆ null        ┆ 200       ┆ null      ┆ 5           ┆ null        ┆ 500       ┆ null      │
│ true   ┆ 3     ┆ 300 ┆ 2          ┆ 200      ┆ 3           ┆ null        ┆ 300       ┆ null      ┆ 5           ┆ null        ┆ 500       ┆ null      │
│ false  ┆ 4     ┆ 400 ┆ 5          ┆ 500      ┆ 3           ┆ null        ┆ 300       ┆ null      ┆ 4           ┆ null        ┆ 400       ┆ null      │
└────────┴───────┴─────┴────────────┴──────────┴─────────────┴─────────────┴───────────┴───────────┴─────────────┴─────────────┴───────────┴───────────┘

Practical Considerations and Tips

Converting Exchange Messages to Mutations

In practice, you may need to transform your order book data into one of the supported input formats. The following example demonstrates several common transformations:

  • Convert side column to an is_bid boolean
  • Convert float price column to integers for internal processing, and convert output prices back to float
  • Represent deletes and trades as negative quantity mutations

import polars as pl
from polars_order_book import top_n_levels_from_price_mutations

messages = pl.DataFrame(
    {
        "message_type": ["add", "add", "add", "add", "trade", "delete"],
        "side": ["bid", "bid", "ask", "ask", "bid", "bid"],
        "price": [0.01, 0.02, 0.04, 0.05, 0.02, 0.02],
        "qty": [100, 200, 400, 500, 50, 150],
    }
)
PRICE_FACTOR = 100
mutations = messages.lazy().select(
    is_bid=pl.col("side") == "bid",
    price=(pl.col("price") * PRICE_FACTOR).round().cast(pl.Int64),
    qty=pl.when(pl.col("message_type").is_in(["delete", "trade"]))
    .then(-pl.col("qty"))
    .otherwise(pl.col("qty")),
)
expr = top_n_levels_from_price_mutations(price="price", qty="qty", is_bid="is_bid", n=2)
top_levels = (
    mutations.with_columns(top_levels=expr)
    .select("top_levels")
    .unnest("top_levels")
    .with_columns(pl.selectors.matches(r"^(bid|ask)_price_\d+$") / PRICE_FACTOR)  # Cast prices back to floats
    .collect()
)
result = pl.concat([messages, top_levels], how="horizontal")
print(result)

# Output
shape: (6, 12)
┌──────────────┬──────┬───────┬─────┬─────────────┬─────────────┬───────────┬───────────┬─────────────┬─────────────┬───────────┬───────────┐
│ message_type ┆ side ┆ price ┆ qty ┆ bid_price_1 ┆ bid_price_2 ┆ bid_qty_1 ┆ bid_qty_2 ┆ ask_price_1 ┆ ask_price_2 ┆ ask_qty_1 ┆ ask_qty_2 │
│ ---          ┆ ---  ┆ ---   ┆ --- ┆ ---         ┆ ---         ┆ ---       ┆ ---       ┆ ---         ┆ ---         ┆ ---       ┆ ---       │
│ str          ┆ str  ┆ f64   ┆ i64 ┆ f64         ┆ f64         ┆ i64       ┆ i64       ┆ f64         ┆ f64         ┆ i64       ┆ i64       │
╞══════════════╪══════╪═══════╪═════╪═════════════╪═════════════╪═══════════╪═══════════╪═════════════╪═════════════╪═══════════╪═══════════╡
│ add          ┆ bid  ┆ 0.01  ┆ 100 ┆ 0.01        ┆ null        ┆ 100       ┆ null      ┆ null        ┆ null        ┆ null      ┆ null      │
│ add          ┆ bid  ┆ 0.02  ┆ 200 ┆ 0.02        ┆ 0.01        ┆ 200       ┆ 100       ┆ null        ┆ null        ┆ null      ┆ null      │
│ add          ┆ ask  ┆ 0.04  ┆ 400 ┆ 0.02        ┆ 0.01        ┆ 200       ┆ 100       ┆ 0.04        ┆ null        ┆ 400       ┆ null      │
│ add          ┆ ask  ┆ 0.05  ┆ 500 ┆ 0.02        ┆ 0.01        ┆ 200       ┆ 100       ┆ 0.04        ┆ 0.05        ┆ 400       ┆ 500       │
│ trade        ┆ bid  ┆ 0.02  ┆ 50  ┆ 0.02        ┆ 0.01        ┆ 150       ┆ 100       ┆ 0.04        ┆ 0.05        ┆ 400       ┆ 500       │
│ delete       ┆ bid  ┆ 0.02  ┆ 150 ┆ 0.01        ┆ null        ┆ 100       ┆ null      ┆ 0.04        ┆ 0.05        ┆ 400       ┆ 500       │
└──────────────┴──────┴───────┴─────┴─────────────┴─────────────┴───────────┴───────────┴─────────────┴─────────────┴───────────┴───────────┘
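
The `.round()` before the integer cast in the example above matters because scaled decimal prices are often not exact in binary floating point. The short sketch below shows the effect in plain Python. `to_ticks` is a hypothetical helper name used only for illustration, and the price 0.29 is an arbitrary value chosen because it exposes the problem.

```python
PRICE_FACTOR = 100  # same scaling factor as in the example above


def to_ticks(price: float, factor: int = PRICE_FACTOR) -> int:
    """Scale a decimal price to an integer, rounding to the nearest tick."""
    return round(price * factor)


raw = 0.29 * PRICE_FACTOR  # slightly below 29 in binary floating point
truncated = int(raw)       # 28: plain truncation lands on the wrong tick
rounded = to_ticks(0.29)   # 29: rounding recovers the intended tick

print(raw, truncated, rounded)
```

Without the rounding step, two messages for the same price could end up on different integer levels, so the book would never net them out.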

Potential Pitfalls

  1. Unsorted Data: Messages must be processed in the correct order. Always sort your data by timestamp or sequence number before applying order book calculations.

  2. Multiple Products: For datasets containing multiple products, apply the top_n_levels_* expression in a group-by context. For example:

    ...
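    result = (
        mutations.group_by("product_id")
        .agg(
            top_levels=top_n_levels_from_price_mutations(
                price="price", qty="qty", is_bid="is_bid", n=2
            )
        )
        # agg collects each group's rows into a list of structs;
        # explode back to one row per update before unnesting
        .explode("top_levels")
        .unnest("top_levels")
    )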
    
  3. Repeated Information: Ensure each mutation is applied only once. For instance, if you have both trade (one per passive order executed) and trade_summary (one per aggressive order) messages, discard the trade_summary to avoid double-counting.

  4. Order Book Resets: Your dataset may include periods where the order book is cleared without explicit delete messages for all orders. To handle this:

    • Add a reset_count column to your data, incrementing it each time the book is reset.
    • Apply the top_n_levels_* expression with a group-by on this column:
    ...
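    result = (
        mutations.group_by(["product_id", "reset_count"])
        .agg(
            top_levels=top_n_levels_from_price_mutations(
                price="price", qty="qty", is_bid="is_bid", n=2
            )
        )
        # agg collects each group's rows into a list of structs;
        # explode back to one row per update before unnesting
        .explode("top_levels")
        .unnest("top_levels")
    )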
    
