Code changes. Data changes. Outputs change. Somewhere between the first analysis and an odd position in production, little mismatches creep in: a misstated value, off-by-one date ranges, rounding shifts, subtle drift in calculations, missing IDs. The most reliable way to catch them is to compare a new DataFrame to a previously validated one–a reconciliation, or rec, test.

recx is a lightweight library that makes these recs declarative, repeatable, and pleasant to read. It’s early and experimental, but I’ve been using it in my production trading pipeline for a long while. I’m open-sourcing it so others can use it and give feedback.

What is a rec test?

A rec test compares a baseline (known-good) DataFrame to a candidate (new run) and checks that they align on keys (same dates or IDs) and agree on values (within tolerances you care about).

For example, you fetch daily prices from a broker and build features for trading. On day 1 you get:

date price
2024-01-01 100.0
2024-01-02 101.0
2024-01-03 103.0
2024-01-04 102.0

On day 2 you get:

date price
2024-01-01 100.0
2024-01-02 101.0
2024-01-03 103.0
2024-01-04 101.5
2024-01-05 100.0

The price on 2024-01-04 has changed slightly. A rec test catches it.

Rec failures don’t just happen in vendor feeds. Most mismatches show up in your own outputs: a refactor, a feature flag, a changed default. Things end up “almost” equal. A rec test defines what “almost” means for your project and highlights exactly where the baseline and candidate diverge.

Why do you need rec tests?

That tiny 0.5 drift on 2024-01-04 doesn’t stay tiny: it rolls into features, changes model inputs, nudges decisions, and leaks into P&L. Without a rec test, you can’t be sure live results follow the same process as your backtests.

Where the drift comes from:

The idea is that every time you run your pipeline, you do a rec test between the new run (candidate) and the previous run (baseline).

Why rec tests help:

Meet recx

recx focuses on Pandas DataFrames and keeps the API small:

A simple example:

import pandas as pd
from recx import Rec, EqualCheck, AbsTolCheck

# Create a baseline DataFrame
baseline = pd.DataFrame({
    "price": [100.00, 200.00, 300.00],
    "status": ["active", "inactive", "active"]
})

# Candidate has a small price change in the
# last record. Statuses match.
candidate = pd.DataFrame({
    "price": [100.00, 200.00, 301.00],
    "status": ["active", "inactive", "active"]
})

# Declare the rec
rec = Rec({
    "price": AbsTolCheck(tol=0.01),
    "status": EqualCheck(),
})

result = rec.run(baseline, candidate)

# Prints a concise pass/fail report
result.summary()

The summary output:

───────────────────────────────────────────────────────────────────────
                    DataFrame Reconciliation Summary                   
───────────────────────────────────────────────────────────────────────
Baseline: rows=3 cols=2
Candidate: rows=3 cols=2

1 check(s) FAILED ❌
missing_indices_check ...................................... PASSED ꪜ
extra_indices_check ........................................ PASSED ꪜ
Column 'price' with AbsTolCheck(tol=0.01) ... [1/3 (33.33%)] FAILED ❌
Column 'status' with EqualCheck ............................ PASSED ꪜ

Failing rows:

Column 'price':
 │   Showing up to 10 rows
 │      baseline  candidate  abs_error
 │   2     300.0      301.0        1.0

The Rec object maps columns to checks, runs them, and returns a RecResult that gives you a readable summary plus programmatic results:

Use recx when you are rebuilding datasets in your pipeline and want to ensure they stay consistent over time.

Getting started

Install with:

pip install recx

Then skim the “Getting Started” and “Usage” docs for patterns like regex selection, skipping columns, and writing custom checks. The repo README notes the project is early/experimental, so expect API polish over time.

Contributions and feedback are welcome!

Footnotes

Corrections

If you see any mistakes or room for improvement, please reach out to me on Twitter @DrAdrian.

Citation

Please cite this work as:

Letchford (2025), "DataFrame Rec Tests with Recx", OS Quant.

In BibTeX please use:

@article{Letchford2025,
    author = {Letchford, Adrian},
    title = {DataFrame Rec Tests with Recx},
    journal = {OS Quant},
    year = {2025},
    note = {https://osquant.com/papers/dataframe-rec-tests-with-recx/},
}