Anonymization
AI/ML training corpus prep
Build a de-identified training corpus for AI/ML model development.
Problem
You are building a training dataset for an AI/ML model. The DICOM files must be de-identified and optionally converted to a non-DICOM format (e.g. NIfTI) for ingestion into your ML pipeline. Consistent anonymization across thousands of files is critical.
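To make "consistent" concrete: every file from the same patient should receive the same replacement identifier, however many files or batches are processed. The sketch below shows one common way to get that property, a keyed hash (HMAC-SHA256) of the original ID. It is an illustration only, not necessarily how the application derives its identifiers. The function name `pseudonym`, the `SUBJ-` prefix, and the key handling are all assumptions.

```python
import hashlib
import hmac


def pseudonym(patient_id: str, secret_key: bytes, length: int = 16) -> str:
    """Map a patient identifier to a stable pseudonym (hypothetical helper).

    The same (patient_id, secret_key) pair always yields the same result,
    so files processed in different batches still group by patient.
    """
    # Normalise so " mrn0012345 " and "MRN0012345" refer to the same patient.
    normalised = patient_id.strip().upper().encode("utf-8")
    digest = hmac.new(secret_key, normalised, hashlib.sha256).hexdigest()
    return "SUBJ-" + digest[:length].upper()


if __name__ == "__main__":
    # Assumption: the key is stored outside the dataset and never shipped with it.
    key = b"replace-with-a-secret-kept-outside-the-dataset"
    print(pseudonym("MRN0012345", key))
    print(pseudonym(" mrn0012345 ", key))  # same pseudonym as the line above
```

Using a keyed hash rather than a plain hash means someone who knows a patient's original ID cannot recompute the pseudonym without the key.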
Steps
- Open a DICOM file — click Open files… (⌘O) to load a representative sample.
- Switch to Anonymization mode — click the Anon tab (⌘2).
- Configure anonymization — select a profile that matches your institutional IRB requirements. For ML datasets, the Limited Dataset profile is often appropriate (dates retained, direct identifiers removed).
- Apply — click Apply & Export to generate a de-identified copy.
- Convert to ML format — use the Export menu to convert the series to NIfTI-1 (.nii.gz). Verify the output preserves spatial geometry (affine matrix) and volume dimensions.
- Batch processing — for large corpora, use the watch-folder or CLI automation features to process thousands of files consistently.
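After a batch run, it can be worth spot-checking that identifying attributes really changed. The sketch below compares header values from an original file and its de-identified copy and reports any attribute that kept its original, non-empty value. The keyword shortlist and the helper names are assumptions to adapt to the profile you selected. `read_header` relies on the third-party pydicom package, which is not part of the application described here.

```python
from typing import Iterable, List, Mapping

# Assumed shortlist of DICOM attribute keywords to audit; extend as needed.
DIRECT_IDENTIFIERS = (
    "PatientName",
    "PatientID",
    "PatientAddress",
    "PatientTelephoneNumbers",
    "AccessionNumber",
    "ReferringPhysicianName",
    "InstitutionName",
)


def leaked_identifiers(
    original: Mapping[str, object],
    deidentified: Mapping[str, object],
    keywords: Iterable[str] = DIRECT_IDENTIFIERS,
) -> List[str]:
    """Return keywords whose non-empty original value survived unchanged."""
    leaked = []
    for keyword in keywords:
        before = str(original.get(keyword) or "").strip()
        after = str(deidentified.get(keyword) or "").strip()
        if before and after == before:
            leaked.append(keyword)
    return leaked


def read_header(path: str) -> dict:
    """Read the audited attributes from a DICOM file using pydicom."""
    import pydicom  # third-party; imported lazily so the audit logic runs without it

    ds = pydicom.dcmread(path, stop_before_pixels=True)
    header = {}
    for keyword in DIRECT_IDENTIFIERS:
        value = ds.get(keyword, "")
        header[keyword] = "" if value is None else str(value)
    return header
```

A typical use would be `leaked_identifiers(read_header(original_path), read_header(exported_path))` on a random sample of file pairs. An empty list means none of the audited attributes carried over verbatim. It does not prove the file is free of identifying information, for example text burned into the pixel data.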
Expected Result
- De-identified DICOM files with consistent anonymization across the corpus.
- NIfTI-1 output preserves spatial orientation (the affine matrix is non-zero and invertible).
- Volume dimensions (X, Y, Z) match the original DICOM series.
- Pixel data integrity is maintained through the de-identification and conversion pipeline.
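The geometry checks above can be scripted. The following is a minimal sketch that reads the fixed 348-byte NIfTI-1 header using only the Python standard library, then compares the volume dimensions with the expected ones and tests that the sform affine is invertible. The function names are hypothetical. The sketch inspects only the sform rows: if the exporter stores orientation in the qform quaternion instead, it reports that rather than decoding it. The expected shape is assumed to be (columns, rows, slices) of the source series, and a converter that reorients axes may legitimately produce a different order.

```python
import gzip
import struct


def read_nifti1_geometry(path):
    """Return (shape, sform_code, sform_rows) from a NIfTI-1 header."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as fh:
        header = fh.read(348)
    if len(header) < 348:
        raise ValueError("file is too short to hold a NIfTI-1 header")
    # sizeof_hdr (offset 0) must equal 348; use it to detect byte order.
    for endian in ("<", ">"):
        if struct.unpack_from(endian + "i", header, 0)[0] == 348:
            break
    else:
        raise ValueError("sizeof_hdr is not 348; not a NIfTI-1 header")
    dim = struct.unpack_from(endian + "8h", header, 40)  # dim[0] = number of axes
    shape = tuple(dim[1:1 + dim[0]])
    sform_code = struct.unpack_from(endian + "h", header, 254)[0]
    # srow_x, srow_y, srow_z: the first three rows of the 4x4 affine.
    rows = [struct.unpack_from(endian + "4f", header, off) for off in (280, 296, 312)]
    return shape, sform_code, rows


def det3(rows):
    """Determinant of the 3x3 rotation/scaling part of the affine."""
    (a, b, c, _), (d, e, f, _), (g, h, i, _) = rows
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def check_geometry(path, expected_shape):
    """Return a list of problems found; an empty list means the checks passed."""
    shape, sform_code, rows = read_nifti1_geometry(path)
    problems = []
    if tuple(shape[:3]) != tuple(expected_shape):
        problems.append(f"shape {shape[:3]} != expected {tuple(expected_shape)}")
    if sform_code == 0:
        problems.append("sform_code is 0; affine not checked (qform may be in use)")
    elif abs(det3(rows)) < 1e-9:
        problems.append("sform affine is singular")
    return problems
```

For example, `check_geometry("series.nii.gz", (512, 512, 120))` would return an empty list for a 512 x 512 x 120 volume with a usable sform. The file name and dimensions here are placeholders.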