Computational Radiology Laboratory Harvard Medical School
www.crl.med.harvard.edu
Children’s Hospital Department of Radiology Boston Massachusetts
Evaluation of Image Segmentation
Simon K. Warfield, Ph.D.
Associate Professor of Radiology
Harvard Medical School
Segmentation
• Segmentation
– Identification of structure in images.
– Many different algorithms and a wide range of principles upon which they are based.
• Segmentation is used for:
– Quantitative image analysis
– Image guided therapy
– Visualization
Validation of Image Segmentation
• Spectrum of accuracy versus realism in reference standard.
• Digital phantoms.
– Ground truth known accurately.
– Not so realistic.
• Acquisitions and careful segmentation.
– Some uncertainty in ground truth.
– More realistic.
• Autopsy/histopathology.
– Addresses pathology directly; resolution.
• Clinical data ?
Validation of Image Segmentation
• Comparison to digital and physical phantoms:
– Excellent for testing the anatomy, noise and artifacts that are modeled.
– Typically lacks the range of normal or pathological variability encountered in practice.
MRI of brain
Comparison To Higher Resolution
[Figure panels: MRI, photograph, MRI]
Comparison to Autopsy Data
• Neonate gyrification index
– Ratio of the length of the cortical boundary to the length of a smooth contour enclosing the brain surface.
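The ratio above can be sketched for a single 2D slice as follows. This is a minimal illustration, not the method used in the study: it assumes the cortical boundary is available as a closed polygon, and it uses the convex hull as a stand-in for the smooth enclosing contour. The function names are hypothetical.

```python
import math

def perimeter(points):
    """Length of the closed polygon through points, a list of (x, y)."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def gyrification_index(cortical_contour):
    """Cortical boundary length divided by enclosing contour length.

    Assumption: the convex hull approximates the smooth enclosing contour.
    """
    return perimeter(cortical_contour) / perimeter(convex_hull(cortical_contour))

# A 4 x 4 square with one rectangular "sulcus" cut into its top edge.
folded = [(0, 0), (4, 0), (4, 4), (3, 4), (3, 2), (1, 2), (1, 4), (0, 4)]
print(gyrification_index(folded))  # boundary length 20 / hull length 16 = 1.25
```

A contour with no folds gives an index of 1, and deeper or more numerous folds raise it.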
Staging
[Figure panels: Stage 3, Stage 5]
• Stage 3: at 28 weeks gestational age (w GA)
– Shallow indentations of the inferior frontal and superior temporal gyrus.
– 1 infant at 30.6 w GA (normal range: 28.6 ± 0.5 w GA).
• Stage 4: at 30 w GA
– Two indentations divide the frontal lobe into three areas; superior temporal gyrus clearly detectable.
– 3 infants, 30.6 ± 0.4 w GA (normal range: 29.9 ± 0.3 w GA).
• Stage 5: at 32 w GA
– Frontal lobe clearly divided into three parts: superior, middle and inferior frontal gyrus.
– 4 infants, 32.1 ± 0.7 w GA (normal range: 31.6 ± 0.6 w GA).
• Stage 6: at 34 w GA
– Temporal lobe clearly divided into
Neonate GI: MRI Vs Autopsy
GI Increase Is Proportional to Change in Age.
GI Versus Qualitative Staging
Neonate Gyrification
Validation of Image Segmentation
• STAPLE (Simultaneous Truth and Performance Level Estimation):
– An algorithm for estimating performance and ground truth from a collection of independent segmentations.
– Warfield, Zou, Wells, IEEE TMI 2004.
– Warfield, Zou, Wells, PTRSA 2008.
– Commowick and Warfield, IEEE TMI 2010.
Validation of Image Segmentation
• Comparison to expert performance; to other algorithms.
• Why compare to experts?
– Experts are currently doing the segmentation tasks that we seek algorithms for:
• Surgical planning.
• Neuroscience research.
• Response to therapy assessment.
• What is the appropriate measure for such comparisons?
Measures of Expert Performance
• Repeated measures of volume
– Intra-class correlation coefficient
• Spatial overlap
– Jaccard: Area of intersection over union.
– Dice: increased weight of intersection.
– Vote counting: majority rule, etc.
• Boundary measures
– Hausdorff, 95% Hausdorff.
• Bland-Altman methodology:
– Requires a reference standard.
• Measures of correct classification rate:
– Sensitivity, specificity ( Pr(D=1|T=1), Pr(D=0|T=0) )
– Positive predictive value and negative predictive value
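Several of the measures listed above can be sketched as follows. This is a minimal illustration under stated assumptions: segmentations are flat lists of 0/1 voxel labels, a reference standard `truth` is available, both classes occur (so no denominator is zero), and boundaries are given as lists of points. The function names are hypothetical.

```python
import math

def confusion_counts(decision, truth):
    """Count true/false positives and negatives over paired 0/1 voxel labels."""
    tp = fp = fn = tn = 0
    for d, t in zip(decision, truth):
        if d and t:
            tp += 1
        elif d and not t:
            fp += 1
        elif not d and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def segmentation_measures(decision, truth):
    """Spatial overlap and correct classification rates against a reference."""
    tp, fp, fn, tn = confusion_counts(decision, truth)
    return {
        "jaccard": tp / (tp + fp + fn),          # intersection over union
        "dice": 2 * tp / (2 * tp + fp + fn),     # intersection counted twice
        "sensitivity": tp / (tp + fn),           # Pr(D=1|T=1)
        "specificity": tn / (tn + fp),           # Pr(D=0|T=0)
        "ppv": tp / (tp + fp),                   # Pr(T=1|D=1)
        "npv": tn / (tn + fn),                   # Pr(T=0|D=0)
    }

def hausdorff(boundary_a, boundary_b):
    """Symmetric Hausdorff distance between two sets of boundary points."""
    def directed(src, dst):
        return max(min(math.dist(a, b) for b in dst) for a in src)
    return max(directed(boundary_a, boundary_b), directed(boundary_b, boundary_a))

decision = [1, 1, 1, 0, 0, 0, 0, 0]
truth = [1, 1, 0, 1, 0, 0, 0, 0]
m = segmentation_measures(decision, truth)
print(m["jaccard"], m["specificity"])             # 0.5 0.8
print(hausdorff([(0, 0), (1, 0)], [(0, 0), (4, 0)]))  # 3.0
```

Dice is always at least as large as Jaccard for the same pair of segmentations, which is what "increased weight of intersection" means in practice.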
Measures of Expert Performance
• Our new approach:
• Simultaneous estimation of hidden “ground truth” and expert performance.
• Enables comparison between and to experts.
• Can be easily applied to clinical data exhibiting the range of normal and pathological variability.
How to judge segmentations of the peripheral zone?
[Figure panels: 1.5T MR of prostate; peripheral zone and segmentations]
Estimation Problem
• Complete data density: $f(D, T \mid p, q)$
• Binary ground truth $T_i$ for each voxel $i$.
• Expert $j$ makes segmentation decisions $D_{ij}$.
• Expert performance characterized by sensitivity $p$ and specificity $q$.
– We observe expert decisions $D$. If we knew ground truth $T$, we could construct maximum likelihood estimates for each expert’s sensitivity (true positive fraction) and specificity (true negative fraction).
Expectation-Maximization
• General procedure for estimation problems that would be simplified if some missing data were available.
• Key requirements are specification of:
– The complete data.
– Conditional probability density of the hidden data given the observed data.
• Observable data $D$
• Hidden data $T$, with probability density $f(T \mid D, \hat{\theta})$
• Complete data $(D, T)$
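The procedure can be sketched for the binary segmentation problem as follows. This is a simplified illustration, not the published STAPLE implementation: it assumes a single global prior for the foreground (taken here as the overall fraction of foreground decisions), a fixed number of iterations, and no guard against degenerate inputs. The function name, the weight vector `w`, and the initial values are hypothetical choices.

```python
def staple_binary(decisions, n_iter=50, p_init=0.9, q_init=0.9):
    """EM sketch: decisions[j][i] is expert j's 0/1 decision D_ij at voxel i.

    Returns (w, p, q): w[i] estimates Pr(T_i = 1 | D, p, q), and p[j], q[j]
    estimate the sensitivity and specificity of expert j.
    """
    n_experts = len(decisions)
    n_voxels = len(decisions[0])
    # Assumption: global prior Pr(T_i = 1) = overall foreground fraction.
    prior = sum(sum(d) for d in decisions) / (n_experts * n_voxels)
    p = [p_init] * n_experts
    q = [q_init] * n_experts
    w = [0.0] * n_voxels
    for _ in range(n_iter):
        # E-step: conditional probability of the hidden truth at each voxel.
        for i in range(n_voxels):
            a = prior          # joint weight for T_i = 1
            b = 1.0 - prior    # joint weight for T_i = 0
            for j in range(n_experts):
                if decisions[j][i]:
                    a *= p[j]
                    b *= 1.0 - q[j]
                else:
                    a *= 1.0 - p[j]
                    b *= q[j]
            w[i] = a / (a + b)
        # M-step: re-estimate each expert's performance from the soft truth.
        sum_w = sum(w)
        sum_not_w = n_voxels - sum_w
        for j in range(n_experts):
            p[j] = sum(w[i] for i in range(n_voxels) if decisions[j][i]) / sum_w
            q[j] = sum(1.0 - w[i] for i in range(n_voxels)
                       if not decisions[j][i]) / sum_not_w
    return w, p, q

# Three experts, ten voxels; the third expert misses voxel 3 and adds voxel 9.
experts = [
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 1],
]
w, p, q = staple_binary(experts)
estimated_truth = [1 if wi > 0.5 else 0 for wi in w]
```

In this toy case the two agreeing experts dominate the estimate of the hidden truth, and the third expert receives lower sensitivity and specificity estimates.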