Anonymization

AI/ML training corpus prep

Build a de-identified training corpus for AI/ML model development.

Problem

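You are building a training dataset for an AI/ML model. The DICOM files must be de-identified and optionally converted to a non-DICOM format (e.g. NIfTI) for ingestion into your ML pipeline. Consistent anonymization across thousands of files is critical: every file must pass through the same profile, and identifiers that link a patient's studies must be replaced the same way each time.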
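To make "consistent" concrete, here is a minimal sketch of keyed, deterministic pseudonymization: the same input identifier always yields the same replacement, so files from one patient stay linked after de-identification. This is an illustration only, not a description of how the application replaces identifiers. The function name `pseudonymize`, the `ANON-` prefix, and the key value are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes, length: int = 16) -> str:
    """Map an identifier to a stable pseudonym using HMAC-SHA256.

    Assumption: surrounding whitespace is padding and is not part of the ID.
    """
    normalized = patient_id.strip().encode("utf-8")
    digest = hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()
    # Truncate for readability; the hypothetical "ANON-" prefix marks the
    # value as a replacement rather than a real identifier.
    return "ANON-" + digest[:length].upper()


if __name__ == "__main__":
    key = b"replace-with-a-project-secret"  # placeholder, keep the real key private
    for raw in ("MRN-001", "MRN-001 ", "MRN-002"):
        print(repr(raw), "->", pseudonymize(raw, key))
```

Because the mapping depends on the key, using one key for the whole corpus keeps the pseudonyms consistent, while anyone without the key cannot recompute them from a list of candidate IDs.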

Steps

  1. Open a DICOM file — click Open files… (⌘O) to load a representative sample.
  2. Switch to Anonymization mode — click the Anon tab (⌘2).
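  3. Configure anonymization — select a profile that matches your institutional IRB requirements. For ML datasets, the Limited Dataset profile is often appropriate (dates retained, direct identifiers removed). Under HIPAA, a limited data set is still PHI and is shared under a data use agreement.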
  4. Apply — click Apply & Export to generate a de-identified copy.
  5. Convert to ML format — use the Export menu to convert the series to NIfTI-1 (.nii.gz). Verify the output preserves spatial geometry (affine matrix) and volume dimensions.
  6. Batch processing — for large corpora, use the watch-folder or CLI automation features to process thousands of files consistently.
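The verification in step 5 can be scripted. The sketch below reads the fixed 348-byte NIfTI-1 header with only the Python standard library and checks the volume dimensions and the sform affine. It assumes the exporter writes the sform fields (`sform_code` > 0), which is common but not guaranteed. The function names are illustrative, and the expected dimensions are assumed to be (Columns, Rows, number of slices) from the source series, an ordering that depends on the converter.

```python
import gzip
import struct


def read_nifti1_geometry(path):
    """Return ((x, y, z), 4x4 affine) from a NIfTI-1 file's header."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as handle:
        header = handle.read(348)
    if len(header) < 348:
        raise ValueError("file is too short to hold a NIfTI-1 header")
    # sizeof_hdr (offset 0) must equal 348; this also reveals the byte order.
    for endian in ("<", ">"):
        if struct.unpack_from(endian + "i", header, 0)[0] == 348:
            break
    else:
        raise ValueError("sizeof_hdr is not 348; not a NIfTI-1 header")
    if header[344:348] not in (b"n+1\x00", b"ni1\x00"):
        raise ValueError("missing NIfTI-1 magic string")
    dim = struct.unpack_from(endian + "8h", header, 40)
    sform_code = struct.unpack_from(endian + "h", header, 254)[0]
    if sform_code == 0:
        raise ValueError("sform_code is 0; this sketch reads only the sform")
    # srow_x, srow_y, srow_z hold the first three rows of the affine.
    rows = [list(struct.unpack_from(endian + "4f", header, offset))
            for offset in (280, 296, 312)]
    return tuple(dim[1:4]), rows + [[0.0, 0.0, 0.0, 1.0]]


def det3(affine):
    """Determinant of the upper-left 3x3 block of the affine."""
    (a, b, c), (d, e, f), (g, h, i) = (row[:3] for row in affine[:3])
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def check_geometry(path, expected_dims, tol=1e-6):
    """Return a list of problems; an empty list means the checks passed."""
    dims, affine = read_nifti1_geometry(path)
    problems = []
    if dims != tuple(expected_dims):
        problems.append(f"dims {dims} != expected {tuple(expected_dims)}")
    if abs(det3(affine)) <= tol:
        problems.append("affine is singular (zero voxel volume)")
    return problems
```

Calling `check_geometry("series.nii.gz", (512, 512, 120))` on each exported file, with the path and dimensions replaced by your own, gives a quick pass/fail signal for a batch run.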

Expected Result

  • De-identified DICOM files with consistent anonymization across the corpus.
  • NIfTI-1 output preserves spatial orientation (the affine matrix is non-singular and its voxel spacing matches the source series).
  • Volume dimensions (X, Y, Z) match the original DICOM series.
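  • Pixel data integrity is maintained through the de-identification and conversion pipeline: voxel values in the output match the source pixel data, allowing for any rescale slope and intercept applied during conversion.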
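To judge whether the exported affine is plausible, it helps to know what the source DICOM geometry implies. The sketch below builds the voxel-to-world matrix from ImageOrientationPatient, ImagePositionPatient, and PixelSpacing, following the DICOM image plane equation, and converts it from DICOM's LPS+ convention to NIfTI's RAS+ by negating the x and y rows. The function names are illustrative. Converters may additionally flip or reorder voxel axes, so an element-by-element match with the exported affine is not guaranteed; the voxel sizes are the safer quantity to compare.

```python
import math


def dicom_affine_ras(orientation, first_position, last_position,
                     pixel_spacing, n_slices):
    """Voxel-to-world affine (RAS+) implied by DICOM geometry attributes.

    orientation:    ImageOrientationPatient, 6 direction cosines
    first_position: ImagePositionPatient of the first slice
    last_position:  ImagePositionPatient of the last slice
    pixel_spacing:  PixelSpacing as [row spacing, column spacing]
    """
    if n_slices < 2:
        raise ValueError("need at least two slices to derive the slice step")
    row_cos, col_cos = orientation[:3], orientation[3:]
    row_spacing, col_spacing = pixel_spacing
    # Vector from one slice origin to the next.
    step = [(last - first) / (n_slices - 1)
            for first, last in zip(first_position, last_position)]
    # Voxel axis 0 is the column index (moves along a row), axis 1 is the
    # row index (moves down a column), axis 2 is the slice index.
    lps = [[row_cos[k] * col_spacing, col_cos[k] * row_spacing,
            step[k], first_position[k]] for k in range(3)]
    # LPS+ to RAS+: negate the x and y world coordinates.
    ras = [[-v for v in lps[0]], [-v for v in lps[1]], list(lps[2])]
    return ras + [[0.0, 0.0, 0.0, 1.0]]


def voxel_sizes(affine):
    """Length of each of the first three affine columns, in mm."""
    return tuple(math.sqrt(sum(affine[r][c] ** 2 for r in range(3)))
                 for c in range(3))


if __name__ == "__main__":
    # Made-up axial series: 10 slices, 2.5 mm apart.
    affine = dicom_affine_ras([1, 0, 0, 0, 1, 0], [-100.0, -120.0, 50.0],
                              [-100.0, -120.0, 72.5], [0.5, 0.75], 10)
    print(voxel_sizes(affine))
```

If the converter reorders axes, compare the sorted voxel sizes from this matrix with the sorted voxel sizes of the exported NIfTI affine.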