Anonymization

AI/ML training corpus prep

Build a de-identified training corpus for AI/ML model development.

Problem

You are building a training dataset for an AI/ML model. The DICOM files must be de-identified and optionally converted to a non-DICOM format (e.g. NIfTI) for ingestion into your ML pipeline. Consistent anonymization across thousands of files is critical.

Steps

Open a DICOM file — click Open files… (⌘O) to load a representative sample.
Switch to Anonymization mode — click the Anon tab (⌘2).
Configure anonymization — select a profile that matches your IRB requirements. For ML datasets, the Limited Dataset profile is often appropriate (dates retained, direct identifiers removed).
Apply — click Apply & Export to generate a de-identified copy.
Convert to ML format — use the Export menu to convert the series to NIfTI-1 (.nii.gz). Verify the output preserves spatial geometry (affine matrix) and volume dimensions.
Batch processing — for large corpora, use the watch-folder or CLI automation features to process thousands of files consistently.
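
Consistency across a large corpus usually means that the same source identifier maps to the same replacement value in every file and every batch. How the application implements this is not described here. As a rough illustration of the idea only, a keyed hash (HMAC-SHA256) gives that property without storing a lookup table. The function name `pseudonym`, the key value, and the sample IDs below are all hypothetical.

```python
import hashlib
import hmac


def pseudonym(value: str, secret_key: bytes, length: int = 16) -> str:
    """Map an identifier to a stable pseudonym using HMAC-SHA256.

    The same (value, secret_key) pair always yields the same output, so a
    patient seen in many files or batches keeps one replacement ID.
    """
    if not secret_key:
        raise ValueError("secret_key must not be empty")
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    # Truncate the hex digest to a short, fixed-width replacement ID.
    return digest.hexdigest()[:length].upper()


if __name__ == "__main__":
    # Hypothetical key: in practice, keep it secret and outside the corpus.
    key = b"replace-with-a-project-secret"
    for original_id in ["MRN-0001", "MRN-0002", "MRN-0001"]:
        print(original_id, "->", pseudonym(original_id, key))
```

The first and third lines printed carry the same pseudonym, which is the property a batch run needs. Anyone holding the key can recompute the mapping from a known ID, so the key should be protected like the identifiers themselves.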

Expected Result

De-identified DICOM files with consistent anonymization across the corpus.
NIfTI-1 output preserves spatial orientation (the affine matrix is set, invertible, and consistent with the orientation, position, and spacing of the source series).
Volume dimensions (X, Y, Z) match the original DICOM series.
Pixel data integrity is maintained through the de-identification and conversion pipeline.
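
The geometry checks above can be scripted. The following is a minimal sketch, assuming the exported file is a single-file NIfTI-1 volume (.nii or .nii.gz) and that the exporter writes the affine into the sform fields. That second point is an assumption about the export, not something this page confirms. The sketch reads the fixed 348-byte header with the standard library only; a library such as nibabel would be the more common choice in an ML pipeline. The function names are hypothetical.

```python
import gzip
import struct


def read_nifti1_geometry(path):
    """Return (dims, affine_rows, sform_code) from a NIfTI-1 header."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as fh:
        header = fh.read(348)
    if len(header) < 348:
        raise ValueError("file too short to hold a NIfTI-1 header")
    # sizeof_hdr (offset 0) must be 348; use it to detect byte order.
    for endian in ("<", ">"):
        if struct.unpack_from(endian + "i", header, 0)[0] == 348:
            break
    else:
        raise ValueError("sizeof_hdr is not 348; not a NIfTI-1 header")
    if header[344:348] not in (b"n+1\x00", b"ni1\x00"):
        raise ValueError("unexpected NIfTI-1 magic string")
    dim = struct.unpack_from(endian + "8h", header, 40)  # dim[0] = number of axes
    sform_code = struct.unpack_from(endian + "h", header, 254)[0]
    # srow_x, srow_y, srow_z: the top three rows of the 4x4 affine.
    rows = [struct.unpack_from(endian + "4f", header, off) for off in (280, 296, 312)]
    return dim[1:1 + dim[0]], rows, sform_code


def det3(rows):
    """Determinant of the 3x3 rotation/scale part of the affine rows."""
    (a, b, c, _), (d, e, f, _), (g, h, i, _) = rows
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def check_geometry(path, expected_dims):
    """Return a list of problems; an empty list means the checks passed."""
    dims, rows, sform_code = read_nifti1_geometry(path)
    problems = []
    if tuple(dims[:3]) != tuple(expected_dims):
        problems.append(f"dimensions {tuple(dims[:3])} != expected {tuple(expected_dims)}")
    if sform_code == 0:
        problems.append("sform_code is 0; the sform affine is not set")
    elif abs(det3(rows)) < 1e-9:
        problems.append("affine is singular (zero determinant)")
    return problems
```

For a series with 512 columns, 512 rows, and 120 slices, a call might look like `check_geometry("series.nii.gz", (512, 512, 120))`, where the file name is a placeholder. A converter may reorder or flip axes on export, so a dimension mismatch is a prompt to inspect the file, not proof of data loss. This sketch does not compare the affine against the DICOM orientation tags, and it does not check voxel values.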