segauge vs MONAI, Metrics Reloaded, seg-metrics, and surface-distance

If you are looking for a Python library to evaluate medical image segmentation, here is an honest comparison of segauge with the common alternatives.

Feature	segauge	MONAI	Metrics Reloaded	seg-metrics	DeepMind surface-distance
Dice / IoU	✅	✅	✅	✅	✅
HD95 / ASSD / NSD	✅	✅	✅	✅	✅
Surface-mesh distances (vs voxel grid)	✅	❌	❌	❌	❌
Confidence intervals	✅	❌	❌	❌	❌
Per-lesion detection F1	✅	partial	✅	❌	❌
Subgroup / fairness slicing	✅	❌	❌	❌	❌
DICOM-SEG / RTSTRUCT input	✅	partial	❌	❌	❌
One-command CLI + HTML report	✅	❌	❌	❌	❌
`pip install`	✅	✅	❌ (git only)	✅	✅

What segauge adds

The metric menu is similar across all five tools, and Metrics Reloaded sets the standard for which metrics to report. What segauge adds is the reporting layer, which none of the alternatives bundles in one package:

Confidence intervals on every metric, by default.
Per-lesion detection F1, so you can report lesion-level sensitivity.
Subgroup / fairness slicing, to find where a model fails (scanner, site, demographic).
Native DICOM-SEG / RTSTRUCT input, to evaluate what a clinical pipeline produced.
A single `segauge eval` command and a self-contained HTML report.
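
To make the first item concrete: a percentile bootstrap over per-case scores is one common way to attach a confidence interval to a mean metric. The sketch below uses only NumPy and is not segauge's API, nor necessarily the method segauge uses internally. The helper names `dice` and `bootstrap_ci` are hypothetical, and the per-case values are made up for illustration.

```python
import numpy as np


def dice(pred, ref):
    """Dice coefficient for two binary masks (1.0 by convention when both are empty)."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, ref).sum() / denom


def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-case scores.

    Cases are resampled with replacement; the CI bounds are the
    alpha/2 and 1 - alpha/2 quantiles of the resampled means.
    """
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    # One row of case indices per bootstrap resample.
    idx = rng.integers(0, len(scores), size=(n_resamples, len(scores)))
    means = scores[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return scores.mean(), lo, hi


# Illustrative per-case Dice values (made up, not benchmark results).
per_case_dice = [0.91, 0.88, 0.95, 0.79, 0.93, 0.85, 0.90, 0.87]
mean, lo, hi = bootstrap_ci(per_case_dice)
print(f"Dice = {mean:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

Resampling whole cases, rather than voxels, keeps the interval tied to the unit that actually varies between patients. With few cases the interval will be wide, which is itself worth reporting.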

On distance metrics

segauge computes distance metrics on a surface mesh at true voxel spacing, following the MeshMetrics method, rather than on the voxel grid. In our own benchmark (`benchmarks/mesh_vs_grid.py`), this measurably reduces HD95 error under anisotropic, thick-slice spacing; for mean distance (ASSD) on smooth shapes, the two approaches are comparable. We claim no more than the benchmark shows: distance metrics are implementation-sensitive on curved surfaces, and segauge is transparent about that.
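
For contrast, the voxel-grid approach can be sketched in a few lines of NumPy. This is an illustrative baseline under stated assumptions, not segauge's mesh implementation and not the code of any library in the table above. The helper names `surface_voxels` and `hd95_grid` are hypothetical, surface voxels are defined by face connectivity, and HD95 is taken as the larger of the two directed 95th percentiles (other conventions exist, which is one source of the implementation sensitivity mentioned above).

```python
import numpy as np


def surface_voxels(mask):
    """Indices of foreground voxels that have at least one background face-neighbour."""
    mask = np.asarray(mask, dtype=bool)
    # Pad with background so voxels on the image border count as surface.
    padded = np.pad(mask, 1, constant_values=False)
    interior = padded.copy()
    for axis in range(mask.ndim):
        interior &= np.roll(padded, 1, axis=axis) & np.roll(padded, -1, axis=axis)
    surface = padded & ~interior
    core = surface[tuple(slice(1, -1) for _ in range(mask.ndim))]
    return np.argwhere(core)


def hd95_grid(pred, ref, spacing):
    """Grid-based HD95 in physical units, using brute-force pairwise distances."""
    spacing = np.asarray(spacing, dtype=float)
    p = surface_voxels(pred) * spacing  # voxel indices -> physical coordinates
    r = surface_voxels(ref) * spacing
    if len(p) == 0 or len(r) == 0:
        raise ValueError("both masks need at least one foreground voxel")
    d = np.linalg.norm(p[:, None, :] - r[None, :, :], axis=-1)
    pred_to_ref = d.min(axis=1)
    ref_to_pred = d.min(axis=0)
    return max(np.percentile(pred_to_ref, 95), np.percentile(ref_to_pred, 95))


# A 4x4x4 cube and the same cube shifted by one voxel along axis 0.
ref = np.zeros((10, 10, 10), dtype=bool)
ref[3:7, 3:7, 3:7] = True
pred = np.zeros_like(ref)
pred[4:8, 3:7, 3:7] = True

print(hd95_grid(pred, ref, spacing=(1.0, 1.0, 1.0)))  # 1.0
print(hd95_grid(pred, ref, spacing=(5.0, 1.0, 1.0)))  # 5.0
```

The same one-voxel shift costs 1 unit at isotropic spacing and 5 units when the slices are 5 units thick, so ignoring spacing would misstate the error by that factor. The brute-force distance matrix is fine for small examples but scales quadratically with the number of surface voxels.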

When to use something else

If you only need Dice in a training loop, MONAI's metrics are fine and already in your stack. If you are choosing which metrics are appropriate for your task, read Metrics Reloaded first. segauge is for the step after: producing honest, reportable, DICOM-aware evaluation results.