Reference
Every flag, marker, fixture, CLI command, and public function. The benchmem CLI
options are rendered live from --help below (typer is the source of truth, so
they can't drift); everything --help can't express — the pytest flags, the marker,
the fixture, the blob schema, the Python API — is curated here. For the narrative
versions see Getting started, Metrics,
Dims, and Compare & plot.
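# Setup for the rendered --help cells below: force coloured output and put
# this interpreter's bin directory first on PATH so the benchmem CLI resolves.
import os
import sys
from pathlib import Path

os.environ["FORCE_COLOR"] = "1"
os.environ["PATH"] = f"{Path(sys.executable).parent}{os.pathsep}{os.environ['PATH']}"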
pytest command-line flags
The plugin adds these to any pytest run (alongside pytest-benchmark's own flags):
| Flag | Default | What |
|---|---|---|
| --benchmark-memory | off | record peak memory for every benchmark() call, no test changes. (The benchmark_memory fixture is always measured, with or without this flag.) |
| --benchmark-memory-compare[=REF] | off | compare this run's peak memory against a prior saved run (latest, or a pytest-benchmark storage ref like 0001); folds base + Δ peak columns into the combined table. |
| --benchmark-memory-compare-fail=FIELD:THRESHOLD | none | fail the session on a memory regression (repeatable). Implies --benchmark-memory-compare. Fields: peak, allocated, allocations. |
Memory rides --benchmark-only runs the same as timing. Timing regressions still use
pytest-benchmark's own --benchmark-compare / --benchmark-compare-fail; the
--benchmark-memory-compare* flags are the memory mirror.
The baseline the inline flags compare against comes from pytest-benchmark's
storage (under .benchmarks/) — save one first with --benchmark-save=NAME or
--benchmark-autosave, or the gate finds nothing and passes. See
Gate CI on regressions for the full flow.
The benchmem marker
import pytest

@pytest.mark.benchmem(repeats=3)
def test_build(benchmark_memory):
    ...
| Kwarg | Default | What |
|---|---|---|
| repeats | 1 | run N memray passes and keep the min. Peak memory is noisy (GC timing, lazy imports, page cache), so min-of-N is the cleanest floor. |
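The min-of-N idea can be illustrated without the plugin. The sketch below swaps in the standard library's tracemalloc as a stand-in for memray, so the numbers it reports are not what pytest-benchmem would record; peak_of and min_of_n are hypothetical helper names used only for illustration.

```python
import tracemalloc


def peak_of(action):
    """Return the peak traced bytes for one call of a zero-arg callable."""
    tracemalloc.start()
    try:
        action()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        # Stopping also clears the traces, so each pass starts from zero.
        tracemalloc.stop()
    return peak


def min_of_n(action, repeats=3):
    """Run `repeats` independent passes and keep the smallest peak."""
    peaks = [peak_of(action) for _ in range(repeats)]
    return min(peaks), peaks


# Hypothetical workload: allocate roughly 1 MB and drop it again.
floor, series = min_of_n(lambda: bytearray(1_000_000), repeats=3)
```

Noise from the interpreter can only add to a pass's peak, which is why the smallest value across passes is treated as the floor.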
The benchmark_memory fixture
Depends on pytest-benchmark's benchmark fixture; times via pytest-benchmark, then
measures peak in a separate untimed pass.
Call form — times then measures function(*args, **kwargs):
benchmark_memory(sorted, data)
Pedantic form — explicit control, like pytest-benchmark's pedantic plus a
memory pass:
benchmark_memory.pedantic(target, args=(), kwargs=None, setup=None,
rounds=1, warmup_rounds=0, iterations=1)
- setup: a callable run untracked before each measured call; if it returns (args, kwargs), those supply the call's arguments. Use it to rebuild fresh state each round, essential for side-effectful workloads.
- rounds, warmup_rounds, iterations: as in pytest-benchmark.
Attributes (available after a call):
| Attribute | What |
|---|---|
| extra_info | pytest-benchmark's per-benchmark dict. Set scalars here to attach analysis dims; the memory blob lands here under the benchmem key. |
| peak_bytes | peak memory (bytes) from the last call, or None before any call. |
| result | the full MemoryResult from the last call, or None. |
The extra_info.benchmem blob
Each measured benchmark stores this dict under extra_info["benchmem"] — three flat
per-repeat series, one entry per memray pass. Everything else (the headline peak =
min, the worst peak, the representative churn, any --stat) derives from these on read:
| Key | What |
|---|---|
| peak_bytes | per-repeat high-water mark of live bytes; the peak metric (headline = min) |
| allocations | per-repeat allocation count; the allocations metric |
| total_bytes | per-repeat total bytes allocated; the allocated metric (the churn that peak hides) |
{"peak_bytes": [800000, 805000], "allocations": [12, 12], "total_bytes": [800000, 805000]}
See Metrics for when to reach for each, and --stat for distributions.
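To show what deriving on read can look like, here is a minimal sketch that reduces the example blob above with plain Python. The summarise helper is hypothetical and not part of the package; it covers only min, max, mean and median, and leaves out stddev because the exact definition the tool uses is not stated here.

```python
import statistics

# The example blob from above: one entry per memray pass.
blob = {
    "peak_bytes": [800000, 805000],
    "allocations": [12, 12],
    "total_bytes": [800000, 805000],
}

# Reductions over a per-repeat series, keyed by stat name.
STATS = {
    "min": min,
    "max": max,
    "mean": statistics.mean,
    "median": statistics.median,
}


def summarise(blob, key="peak_bytes", stat="min"):
    """Reduce one per-repeat series to a single number."""
    return STATS[stat](blob[key])


headline_peak = summarise(blob)            # headline peak = min over repeats
worst_peak = summarise(blob, stat="max")   # the worst pass
```

Storing the raw series rather than a single number is what lets a different statistic be chosen later without re-running the benchmarks.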
CLI — benchmem
Installed with pytest-benchmem[plot]. The two subcommands and their options,
straight from --help:
!benchmem --help
Usage: benchmem [OPTIONS] COMMAND [ARGS]...

  pytest-benchmem — plot and compare benchmark runs.

Options:
  --install-completion  Install completion for the current shell.
  --show-completion     Show completion for the current shell, to copy it or
                        customize the installation.
  --help                Show this message and exit.

Commands:
  plot     Render an interactive plotly view from one or more pytest-benchmark
           runs.
  compare  Print a per-id comparison table across two or more runs (and
           optionally gate CI).
benchmem compare
A per-id delta table (b − a) with percent change; ids in only one run show —.
!benchmem compare --help
Usage: benchmem compare [OPTIONS] RUNS...

  Print a per-id comparison table across two or more runs (and optionally
  gate CI).

Arguments:
  * runs  RUNS...  Two or more pytest-benchmark runs, oldest → newest (a sweep
                   is N). [required]

Options:
  --metric [time|peak|allocated|allocations|memory]
                  Metric: time | peak | allocated | allocations | memory
                  (memory is an alias of peak; pair with --stat for a
                  distribution). [default: time]
  --stat TEXT     Distribution stat over each benchmark's per-repeat series
                  (min | max | mean | median | stddev) for
                  peak/allocated/allocations. Default: the headline value.
  --sort TEXT     Row order: name (id) | value (largest in the last run) |
                  change. [default: name]
  --csv PATH      Also write the raw (unscaled) comparison to this CSV file.
  --fail-on TEXT  Exit non-zero on a regression of the first run vs the last.
                  FIELD:THRESHOLD, repeatable — e.g. --fail-on peak:10%
                  --fail-on peak:5MiB.
  --help          Show this message and exit.
--metric is one of time, peak, allocated, allocations, or memory (an
alias for peak); pair it with --stat (min/max/mean/median/stddev) for a
distribution over the per-repeat series. --fail-on FIELD:THRESHOLD (repeatable) exits
non-zero past a threshold; FIELD is peak, allocated, allocations, or time,
and THRESHOLD is either a percent (peak:10%) or an absolute:
- bytes fields (peak, allocated): 5MiB (units B/KiB/MiB/GiB)
- allocations: a bare count, 5
- time: 1ms (units s/ms/us/µs/ns)
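The FIELD:THRESHOLD grammar above can be sketched as a small parser. This is an illustration only, not the CLI's own code: parse_fail_on and regressed are hypothetical names, the unit multipliers are the conventional binary and SI values, and both the strict "greater than" comparison and the requirement that bytes and time thresholds carry a unit are assumptions.

```python
import re

# Unit names come from the list above; the multipliers are the usual values.
BYTE_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}
TIME_UNITS = {"s": 1.0, "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9}
FIELD_UNITS = {
    "peak": BYTE_UNITS,
    "allocated": BYTE_UNITS,
    "allocations": None,  # a bare count, no unit
    "time": TIME_UNITS,
}


def parse_fail_on(spec):
    """Split 'FIELD:THRESHOLD' into (field, kind, value).

    kind is 'percent' or 'absolute'; absolute values are normalised to
    bytes, a count, or seconds depending on the field.
    """
    field, _, threshold = spec.partition(":")
    if field not in FIELD_UNITS:
        raise ValueError(f"unknown field: {field!r}")
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(.*)", threshold)
    if match is None:
        raise ValueError(f"bad threshold: {threshold!r}")
    number, unit = float(match.group(1)), match.group(2)
    if unit == "%":
        return field, "percent", number
    units = FIELD_UNITS[field]
    if units is None:
        if unit:
            raise ValueError("allocations takes a bare count")
        return field, "absolute", number
    if unit not in units:
        raise ValueError(f"unknown unit {unit!r} for {field}")
    return field, "absolute", number * units[unit]


def regressed(base, new, kind, value):
    """True when the change from base to new exceeds the threshold."""
    delta = new - base
    if kind == "percent":
        return base > 0 and 100.0 * delta / base > value
    return delta > value
```

Normalising absolute thresholds to base units up front keeps the comparison itself a single subtraction, whatever unit the threshold was written in.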
benchmem plot
Writes an interactive plotly view to standalone HTML. The view auto-selects by run
count (1 → scaling, 2 → scatter, 3+ → sweep); override with --view.
!benchmem plot --help
Usage: benchmem plot [OPTIONS] RUNS...

  Render an interactive plotly view from one or more pytest-benchmark runs.

Arguments:
  * runs  RUNS...  pytest-benchmark JSON file(s). [required]

Options:
  --metric [time|peak|allocated|allocations|memory]
                      Metric: time | peak | allocated | allocations | memory
                      (memory is an alias of peak; pair with --stat for a
                      distribution). [default: time]
  --view TEXT         compare | scatter | sweep | scaling (default: by count).
  --facet TEXT        Dim to facet by.
  --x TEXT            scaling: dim for the x-axis.
  --clip FLOAT        Clamp the colour scale.
  --label, -l TEXT    Series label per run, in order (repeat). Default: stem.
  --output, -o PATH   HTML out.
  --open / --no-open  [default: no-open]
  --help              Show this message and exit.
--facet and --label/-l (a series label per run, repeatable, defaulting to the
file stem) accept the same dims your tests carry.
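The run-count rule for picking a view, stated at the top of this section, amounts to a three-way branch. The function below is a hypothetical restatement of that rule for illustration, not the CLI's own code.

```python
def default_view(run_count):
    """Pick the view name from the number of runs passed on the command line."""
    if run_count < 1:
        raise ValueError("at least one run is required")
    if run_count == 1:
        return "scaling"
    if run_count == 2:
        return "scatter"
    return "sweep"
```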
Public Python API
Light to import — pytest_benchmem re-exports only the engine and the readers;
pytest_benchmem.plotting pulls plotly and pytest_benchmem.sweep shells to uv,
so import those submodules directly.
Engine — pytest_benchmem
measure_peak(action, repeats=1) -> int
measure_memory(action, repeats=1) -> MemoryResult
action is a zero-arg callable. measure_peak returns the bare peak in bytes;
measure_memory returns the full MemoryResult (peak_bytes, peak_bytes_max,
allocations, total_bytes, repeats).
Readers & loader — pytest_benchmem
from_pytest_benchmark(path, *, metric="min") -> (label, [Sample], unit)
memory_from_pytest_benchmark(path, *, field="peak_bytes") -> (label, [Sample], unit)
load_samples(path, *, metric="time", stat="min") -> (label, [Sample], unit)
load_long_df(runs, *, metric="time", stat="min") -> (DataFrame, unit)
discover_runs(root=".benchmarks") -> [Path]
human_bytes(n) -> str
- from_pytest_benchmark reads timing (seconds, from stats); memory_from_pytest_benchmark reads memory (bytes, from extra_info.benchmem).
- load_samples is the unified reader: metric is one of time/peak/allocated/allocations; stat (time only) is min/median/…
- load_long_df stacks runs into the tidy frame the plots pivot, with columns snapshot, id, value, plus one per dim.
- discover_runs() collects saved runs from .benchmarks/ (pytest-benchmark's storage dir, where --benchmark-save/--benchmark-autosave write), so you can hand the readers a directory instead of listing files.
- A Sample is (id, value, dims); dims is a mapping of dim name → str/int/float.
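The shape of a Sample and of the tidy frame can be sketched in plain Python. This is a stand-in, not the package's types: the Sample class below is a hypothetical NamedTuple with the three fields named above, and to_long_rows builds a list of dicts with the same columns that load_long_df is described as producing, without pandas.

```python
from typing import Mapping, NamedTuple, Union


class Sample(NamedTuple):
    """Hypothetical mirror of the (id, value, dims) triple."""
    id: str
    value: float
    dims: Mapping[str, Union[str, int, float]]


def to_long_rows(runs):
    """Flatten {snapshot label: [Sample, ...]} into one row per sample.

    Each row carries snapshot, id and value, plus one column per dim.
    """
    rows = []
    for snapshot, samples in runs.items():
        for sample in samples:
            rows.append(
                {"snapshot": snapshot, "id": sample.id, "value": sample.value, **sample.dims}
            )
    return rows


# Hypothetical data: the same benchmark id in two saved runs.
rows = to_long_rows(
    {
        "base": [Sample("test_sort[1000]", 800000, {"n": 1000})],
        "head": [Sample("test_sort[1000]", 805000, {"n": 1000})],
    }
)
```

One row per (snapshot, id) pair is the layout that makes pivoting by run, or faceting by a dim, a single group-by.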
Plotting — pytest_benchmem.plotting
Every plot_* returns (figure, n_ids):
plot_scaling(snapshots, *, metric="time", x=None, color=None, facet=None, log="auto", labels=None)
plot_scatter(snapshots, *, metric="time", facet=None, clip=None, labels=None)
plot_compare(snapshots, *, metric="time", sort="absolute", facet=None, clip=None, labels=None)
plot_sweep(snapshots, *, metric="time", clip=None, labels=None)
snapshots is a list of run JSON paths. labels names the series per run (defaults
to the file stems) — the API behind plot's -l/--label. plot_compare's sort is
"absolute" (native units) or "relative" (percent).
Sweeps — pytest_benchmem.sweep
sweep(versions, run, **provision_kwargs) -> [failed_version_label]
See Cross-version sweeps for the parameters and the Venv object.