Cross-version sweeps¶
To benchmark across installed versions of a package — something pytest-benchmark
has no answer for — pytest_benchmem.sweep provisions one fresh uv venv per
version (with import isolation) and calls your run callback in each. The callback
just runs your pytest suite in that venv, writing a per-version JSON. Everything
package-specific is injected, so it isn't tied to any one library.
import subprocess
from pytest_benchmem.sweep import sweep
failed = sweep(
["1.2.0", "1.3.0", "git+https://github.com/me/pkg@main"],
run=lambda venv: subprocess.run(
[str(venv.python), "-m", "pytest", "benchmarks/",
"--benchmark-only", f"--benchmark-json={venv.version}.json"],
cwd=venv.cwd, env=venv.env,
),
install_spec=lambda v: f"mypkg=={v}",
copy_dir="benchmarks", # copied in so it imports from here, pkg from the venv
import_check="mypkg", # preflight: assert pkg resolved to the venv
)
sweep returns the list of version labels that failed to provision (empty on full
success).
The run callback and the Venv it receives¶
Your run(venv) callback is invoked once per version. The Venv it gets exposes
everything needed to launch the suite in isolation:
| Attribute | Type | What |
|---|---|---|
venv.version |
str |
the version label (use it to name the output JSON) |
venv.python |
Path \| None |
the venv's Python executable |
venv.cwd |
Path \| None |
the isolation dir — holds the copied copy_dir |
venv.env |
dict \| None |
environment with PYTHONPATH cleared |
venv.failed_at |
str \| None |
"venv", "install", "isolation", or None |
sweep / provision parameters¶
sweep(versions, run, **provision_kwargs) forwards its keyword arguments to
provision, which yields the venvs:
| Parameter | Default | What |
|---|---|---|
versions |
— | version labels to sweep (pip specs, tags, or git+… refs) |
install_spec |
lambda v: v |
maps a label to the pip spec for the package under test, e.g. f"mypkg=={v}" |
pins |
() |
extra specs installed alongside (uv pip install) |
copy_dir |
None |
a directory copied into each venv.cwd, so the suite imports from here and the package from the venv |
import_check |
None |
a module name asserted to resolve to the venv before running — a preflight against import leakage |
as_of |
None |
a YYYY-MM-DD date passed to uv's --exclude-newer, for reproducible resolution |
tmp_prefix |
"pytest-benchmem-" |
prefix for the temp venv dirs |
Viewing a sweep¶
For a table, hand the per-version JSONs straight to benchmem compare — it takes
two or more runs (oldest → newest) and renders one row per benchmark, one column
per version, with the lightest value tinted green and the heaviest red:
For a picture, point benchmem plot at the same files — with 3+ it defaults to the
sweep view, a log₂ fold-change heatmap vs the baseline:
Either way, pick --metric time or any memory metric. See Compare &
plot for the other views and how to render them inline.