Like what you see? Follow Adrian on Twitter to be notified of new content.

Code changes. Data changes. Outputs change. Somewhere between the first analysis and an odd position in production, little mismatches creep in: a misstated value, off-by-one date ranges, rounding shifts, subtle drift in calculations, missing IDs. The most reliable way to catch them is to compare a new DataFrame to a previously validated one: a reconciliation, or rec, test.

recx is a lightweight library that makes these recs declarative, repeatable, and pleasant to read. It’s early and experimental, but I’ve been using it in my production trading pipeline for a long while. I’m open-sourcing it so others can use it and give feedback.

What is a rec test?

A rec test compares a baseline (known-good) DataFrame to a candidate (new run) and checks that they align on keys (same dates or IDs) and agree on values (within tolerances you care about).

For example, you fetch daily prices from a broker and build features for trading. On day 1 you get:

date        price
2024-01-01  100.0
2024-01-02  101.0
2024-01-03  103.0
2024-01-04  102.0

On day 2 you get:

date        price
2024-01-01  100.0
2024-01-02  101.0
2024-01-03  103.0
2024-01-04  101.5
2024-01-05  100.0

The price on 2024-01-04 has changed slightly. A rec test catches it.
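
As a rough illustration in plain pandas (not recx itself), the two days above can be compared by aligning on date, listing dates that appear in only one run, and then checking values on the dates both runs share. The variable names are illustrative only.

```python
import pandas as pd

# Day 1 is the baseline, day 2 is the candidate.
day1 = pd.DataFrame(
    {"price": [100.0, 101.0, 103.0, 102.0]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]),
)
day2 = pd.DataFrame(
    {"price": [100.0, 101.0, 103.0, 101.5, 100.0]},
    index=pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]
    ),
)

# Keys: dates present in one run but not the other.
missing = day1.index.difference(day2.index)  # in baseline, absent from candidate
extra = day2.index.difference(day1.index)    # new in candidate

# Values: compare prices only on the dates both runs share.
shared = day1.index.intersection(day2.index)
diff = (day2.loc[shared, "price"] - day1.loc[shared, "price"]).abs()
changed = diff[diff > 0]

print("missing:", list(missing))
print("extra:", list(extra))
print(changed)
```

Here nothing is missing, 2024-01-05 shows up as an extra date, and 2024-01-04 is the only shared date whose price differs (by 0.5). Whether an extra date counts as a failure is a project decision; for a feed that grows by one row a day it is usually expected.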

Rec failures don’t just happen in vendor feeds. Most mismatches show up in your own outputs: a refactor, a feature flag, a changed default. Things end up “almost” equal. A rec test defines what “almost” means for your project and highlights exactly where the baseline and candidate diverge.

Why do you need rec tests?

That tiny 0.5 drift on 2024-01-04 doesn’t stay tiny: it rolls into features, changes model inputs, nudges decisions, and leaks into P&L. Without a rec test, you can’t be sure live results follow the same process as your backtests.

Where the drift comes from:

Every time you run your pipeline, run a rec test between the new run (candidate) and the previous run (baseline).
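
One way this run-over-run loop could look, sketched in plain pandas rather than recx: the helper `rec_against_previous`, the pickle file used as the baseline store, and the rule of promoting the candidate to baseline only when the rec passes are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd


def rec_against_previous(
    candidate: pd.DataFrame,
    baseline_path: Path,
    column: str = "price",
    tol: float = 1e-9,
) -> bool:
    """Rec the candidate against the stored baseline; promote it on success."""
    if not baseline_path.exists():
        # First run: nothing to compare against, so store it as the baseline.
        candidate.to_pickle(baseline_path)
        return True

    baseline = pd.read_pickle(baseline_path)

    # Keys: every row of the previous run should still be present.
    missing = baseline.index.difference(candidate.index)

    # Values: history shared by both runs should agree within tolerance.
    shared = baseline.index.intersection(candidate.index)
    abs_error = (candidate.loc[shared, column] - baseline.loc[shared, column]).abs()

    passed = missing.empty and bool((abs_error <= tol).all())
    if passed:
        # Only a run that reconciles becomes the next baseline.
        candidate.to_pickle(baseline_path)
    return passed


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "baseline.pkl"
    run1 = pd.DataFrame({"price": [100.0, 101.0]}, index=["2024-01-01", "2024-01-02"])
    run2 = pd.DataFrame(
        {"price": [100.0, 101.0, 103.0]},
        index=["2024-01-01", "2024-01-02", "2024-01-03"],
    )
    run3 = pd.DataFrame(
        {"price": [100.0, 101.5, 103.0]},
        index=["2024-01-01", "2024-01-02", "2024-01-03"],
    )

    first = rec_against_previous(run1, path)   # no baseline yet: stored
    second = rec_against_previous(run2, path)  # history unchanged: passes
    third = rec_against_previous(run3, path)   # 2024-01-02 restated: fails

print(first, second, third)
```

Leaving the old baseline in place on failure means a bad run cannot quietly become the reference for the next one.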

Why rec tests help:

Meet recx

recx focuses on Pandas DataFrames and keeps the API small.

A simple example:

import pandas as pd
from recx import Rec, EqualCheck, AbsTolCheck

# Create a baseline DataFrame
baseline = pd.DataFrame({
    "price": [100.00, 200.00, 300.00],
    "status": ["active", "inactive", "active"]
})

# Candidate has a small price change in the
# last record. Statuses match.
candidate = pd.DataFrame({
    "price": [100.00, 200.00, 301.00],
    "status": ["active", "inactive", "active"]
})

# Declare the rec
rec = Rec({
    "price": AbsTolCheck(tol=0.01),
    "status": EqualCheck(),
})

result = rec.run(baseline, candidate)

# Prints a concise pass/fail report
result.summary()

The summary output:

───────────────────────────────────────────────────────────────────────
                    DataFrame Reconciliation Summary                   
───────────────────────────────────────────────────────────────────────
Baseline: rows=3 cols=2
Candidate: rows=3 cols=2

1 check(s) FAILED ❌
missing_indices_check ...................................... PASSED ꪜ
extra_indices_check ........................................ PASSED ꪜ
Column 'price' with AbsTolCheck(tol=0.01) ... [1/3 (33.33%)] FAILED ❌
Column 'status' with EqualCheck ............................ PASSED ꪜ

Failing rows:

Column 'price':
 │   Showing up to 10 rows
 │      baseline  candidate  abs_error
 │   2     300.0      301.0        1.0
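
To make the failing row concrete, here is a plain-pandas sketch of what an absolute tolerance comparison amounts to. It is not recx's implementation: the helper `abs_tol_failures` is hypothetical, and details such as whether the boundary is inclusive or how NaNs are treated are assumptions.

```python
import pandas as pd


def abs_tol_failures(
    baseline: pd.Series, candidate: pd.Series, tol: float
) -> pd.DataFrame:
    """Return the rows where |candidate - baseline| exceeds tol."""
    abs_error = (candidate - baseline).abs()
    report = pd.DataFrame(
        {"baseline": baseline, "candidate": candidate, "abs_error": abs_error}
    )
    # Assumption: a difference exactly equal to tol counts as a pass.
    return report[report["abs_error"] > tol]


baseline_price = pd.Series([100.00, 200.00, 300.00], name="price")
candidate_price = pd.Series([100.00, 200.00, 301.00], name="price")

failures = abs_tol_failures(baseline_price, candidate_price, tol=0.01)
print(failures)
```

With the example data, only the row at index 2 is returned, with an absolute error of 1.0, matching the failing row in the summary above.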

The Rec object maps columns to checks, runs them, and returns a RecResult that gives you a readable summary plus programmatic results.

Use recx when you are rebuilding datasets in your pipeline and want to ensure they stay consistent over time.

Getting started

Install with:

pip install recx

Then skim the “Getting Started” and “Usage” docs for patterns like regex selection, skipping columns, and writing custom checks. The repo README notes the project is early/experimental, so expect API polish over time.

Contributions and feedback are welcome!

Corrections

If you see any mistakes or room for improvement, please reach out to me on Twitter @DrAdrian.

Citation

Please cite this work as:

Letchford (2025), "DataFrame Rec Tests with Recx", OS Quant.

In BibTeX please use:

@article{Letchford2025,
    author = {Letchford, Adrian},
    title = {DataFrame Rec Tests with Recx},
    journal = {OS Quant},
    year = {2025},
    note = {https://osquant.com/papers/dataframe-rec-tests-with-recx/},
}