Evaluation of Image Segmentation


Academic year: 2022


Computational Radiology Laboratory, Children’s Hospital, Department of Radiology

Harvard Medical School, Boston, Massachusetts

Evaluation of Image Segmentation

Simon K. Warfield, Ph.D.

Associate Professor of Radiology

Harvard Medical School



•  Segmentation

–  Identification of structure in images.

–  Many different algorithms and a wide range of principles upon which they are based.

•  Segmentation is used for:

–  Quantitative image analysis

–  Image guided therapy

–  Visualization


Validation of Image Segmentation

•  Spectrum of accuracy versus realism in reference standard.

•  Digital phantoms.

–  Ground truth known accurately.

–  Not so realistic.

•  Acquisitions and careful segmentation.

–  Some uncertainty in ground truth.

–  More realistic.

•  Autopsy/histopathology.

–  Addresses pathology directly; resolution.

•  Clinical data?


Validation of Image Segmentation

•  Comparison to digital and physical phantoms:

–  Excellent for testing the anatomy, noise and artifact which is modeled.

–  Typically lacks range of normal or pathological variability encountered in practice.

MRI of brain


Comparison To Higher Resolution

[Figure panels: MRI, photograph, MRI]


Comparison to Autopsy Data

•  Neonate gyrification index

–  Ratio of length of cortical boundary to length of smooth contour enclosing brain surface
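The ratio in this definition can be illustrated on a single 2D slice. The sketch below is not the method used in the study: it assumes the cortical boundary is given as a closed polygon of (x, y) points and uses the convex hull as a stand-in for the smooth enclosing contour. The function names `perimeter`, `convex_hull` and `gyrification_index` are hypothetical.

```python
import math

def perimeter(points):
    """Length of the closed polygon through the points, in order."""
    return sum(math.dist(points[i], points[(i + 1) % len(points)])
               for i in range(len(points)))

def convex_hull(points):
    """Convex hull by Andrew's monotone chain, counter-clockwise."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def gyrification_index(cortical_contour):
    """Boundary length divided by enclosing-contour length.

    Assumption: the convex hull approximates the smooth enclosing contour.
    """
    return perimeter(cortical_contour) / perimeter(convex_hull(cortical_contour))

# A square has no folds; a square with one notch ("sulcus") has a longer boundary.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
notched = [(0, 0), (4, 0), (4, 4), (3, 4), (3, 2), (1, 2), (1, 4), (0, 4)]
print(gyrification_index(square), gyrification_index(notched))
```

A contour with no folds gives an index of 1, and deeper or more numerous indentations raise it.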



[Figure panels: Stage 3, Stage 5]

Stage 3: at 28 weeks gestational age (GA)

shallow indentations of the inferior frontal and superior temporal gyri (1 infant at 30.6 weeks GA; normal range: 28.6 ± 0.5 weeks GA)

Stage 4: at 30 weeks GA

two indentations divide the frontal lobe into three areas; superior temporal gyrus clearly detectable (3 infants, 30.6 ± 0.4 weeks GA; normal range: 29.9 ± 0.3 weeks GA)

Stage 5: at 32 weeks GA

frontal lobe clearly divided into three parts: superior, middle and inferior frontal gyri (4 infants, 32.1 ± 0.7 weeks GA; normal range: 31.6 ± 0.6 weeks GA)

Stage 6: at 34 weeks GA

temporal lobe clearly divided into


Neonate GI: MRI Vs Autopsy


GI Increase Is Proportional to Change in Age.


GI Versus Qualitative Staging


Neonate Gyrification


Validation of Image Segmentation

•  STAPLE (Simultaneous Truth and Performance Level Estimation):

–  An algorithm for estimating performance and ground truth from a collection of independent segmentations.

–  Warfield, Zou, Wells, IEEE TMI 2004.

–  Warfield, Zou, Wells, PTRSA 2008.

–  Commowick and Warfield, IEEE TMI 2010.


Validation of Image Segmentation

•  Comparison to expert performance; to other algorithms.

•  Why compare to experts?

–  Experts are currently doing the segmentation tasks that we seek algorithms for:

•  Surgical planning.

•  Neuroscience research.

•  Response to therapy assessment.

•  What is the appropriate measure for such comparisons?


Measures of Expert Performance

•  Repeated measures of volume

–  Intra-class correlation coefficient

•  Spatial overlap

–  Jaccard: Area of intersection over union.

–  Dice: increased weight of intersection.

–  Vote counting: majority rule, etc.

•  Boundary measures

–  Hausdorff, 95% Hausdorff.

•  Bland-Altman methodology:

–  Requires a reference standard.

•  Measures of correct classification rate:

–  Sensitivity, specificity ( Pr(D=1|T=1), Pr(D=0|T=0) )

–  Positive predictive value and negative predictive value
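Several of the measures listed on this slide can be computed directly from two binary masks. The sketch below assumes NumPy arrays of equal shape in which both foreground and background occur, so no denominator is zero. The names `overlap_measures` and `hausdorff` are hypothetical, and the percentile form of the Hausdorff distance is only one of several conventions in use for the "95% Hausdorff" measure.

```python
import numpy as np

def overlap_measures(decision, truth):
    """Overlap and classification rates for a decision mask D and a reference mask T."""
    d = np.asarray(decision, dtype=bool)
    t = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(d & t)      # D=1, T=1
    tn = np.count_nonzero(~d & ~t)    # D=0, T=0
    fp = np.count_nonzero(d & ~t)     # D=1, T=0
    fn = np.count_nonzero(~d & t)     # D=0, T=1
    return {
        "jaccard": tp / (tp + fp + fn),        # intersection over union
        "dice": 2 * tp / (2 * tp + fp + fn),   # intersection counted twice
        "sensitivity": tp / (tp + fn),         # Pr(D=1 | T=1)
        "specificity": tn / (tn + fp),         # Pr(D=0 | T=0)
        "ppv": tp / (tp + fp),                 # Pr(T=1 | D=1)
        "npv": tn / (tn + fn),                 # Pr(T=0 | D=0)
    }

def hausdorff(points_a, points_b, percentile=100.0):
    """Symmetric Hausdorff distance between two sets of boundary points.

    percentile=100 gives the classical maximum; percentile=95 gives one
    common variant of the 95% Hausdorff distance (an assumption here).
    Uses a brute-force distance matrix, so it suits small point sets only.
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    a_to_b = dist.min(axis=1)   # nearest point of B for each point of A
    b_to_a = dist.min(axis=0)   # nearest point of A for each point of B
    return float(max(np.percentile(a_to_b, percentile),
                     np.percentile(b_to_a, percentile)))

truth = np.array([1, 1, 1, 0, 0, 0, 0, 0])
decision = np.array([1, 1, 0, 1, 0, 0, 0, 0])
measures = overlap_measures(decision, truth)
print(measures["jaccard"], measures["dice"])
```

The overlap measures treat all disagreements alike wherever they occur, whereas the Hausdorff distance responds only to the worst boundary deviation, which is why the two families are usually reported together.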


Measures of Expert Performance

•  Our new approach:

•  Simultaneous estimation of hidden "ground truth" and expert performance.

•  Enables comparison between and to experts.

•  Can be easily applied to clinical data exhibiting range of normal and pathological variability.


How to judge segmentations of the peripheral zone?

[Figure panels: 1.5T MR of prostate; peripheral zone and segmentations]


Estimation Problem

•  Complete data density: $f(D, T \mid p, q)$

•  Binary ground truth $T_i$ for each voxel $i$.

•  Expert $j$ makes segmentation decisions $D_{ij}$.

•  Expert performance characterized by sensitivity $p$ and specificity $q$.

–  We observe expert decisions D. If we knew ground truth T, we could construct maximum likelihood estimates for each expert’s sensitivity (true positive fraction)



•  General procedure for estimation problems that would be simplified if some missing data were available.

•  Key requirements are specification of:

–  The complete data.

–  Conditional probability density of the hidden data given the observed data.

•  Observable data D

•  Hidden data T, with probability density $f(T \mid D, \hat{\theta})$

•  Complete data (D, T)




•  Solve the incomplete-data log likelihood maximization problem

•  E-step: estimate the conditional expectation of the complete-data log likelihood function.

$$Q(\theta \mid \hat{\theta}) = E\left[\ln f(D, T \mid \theta) \,\middle|\, D, \hat{\theta}\right]$$

•  M-step: estimate parameter values

$$\arg\max_{\theta} Q(\theta \mid \hat{\theta})$$



•  Since we don’t know ground truth T, treat T as a random variable, and solve for the expert performance parameters that maximize:

$$Q(\theta \mid \hat{\theta}) = E\left[\ln f(D, T \mid \theta) \,\middle|\, D, \hat{\theta}\right]$$

•  Parameter values $\theta_j = [p_j \; q_j]^T$ that maximize the conditional expectation of the log-likelihood function are found by iterating two steps:

–  E-step: Estimate probability of hidden ground truth T given a previous estimate of the expert quality parameters, and take the expectation.

–  M-step: Estimate expert performance parameters by



•  Consider binary labels:

–  foreground.

–  background.

•  Spatial correlation of the unknown true segmentation can be modelled with a Markov Random Field.
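The E-step/M-step iteration for binary labels can be sketched as follows. This is a minimal illustration rather than the reference implementation. It ignores the Markov Random Field term, assumes experts decide independently given the truth, and uses a single global prior Pr(T_i = 1) that defaults to the mean foreground fraction. The initial values `p0` and `q0` and the stopping rule are further assumptions. The update formulas follow the binary EM updates of the STAPLE paper cited above (Warfield, Zou, Wells, IEEE TMI 2004), and the function name `staple_binary` is hypothetical.

```python
import numpy as np

def staple_binary(D, prior=None, p0=0.9, q0=0.9, max_iter=100, tol=1e-8):
    """EM estimate of hidden binary truth and expert performance.

    D: array of shape (n_voxels, n_experts) with 0/1 decisions D_ij.
    Returns (W, p, q): W[i] estimates Pr(T_i = 1 | D), p[j] and q[j] are
    the estimated sensitivity and specificity of expert j.
    """
    D = np.asarray(D, dtype=float)
    n_experts = D.shape[1]
    if prior is None:
        prior = D.mean()  # assumption: global prior from the observed foreground fraction
    p = np.full(n_experts, p0)
    q = np.full(n_experts, q0)
    for _ in range(max_iter):
        # E-step: posterior probability of foreground at each voxel,
        # given the current performance estimates.
        a = prior * np.prod(np.where(D == 1, p, 1 - p), axis=1)
        b = (1 - prior) * np.prod(np.where(D == 1, 1 - q, q), axis=1)
        W = a / (a + b)
        # M-step: performance parameters as weighted agreement with W.
        p_new = (W[:, None] * D).sum(axis=0) / W.sum()
        q_new = ((1 - W)[:, None] * (1 - D)).sum(axis=0) / (1 - W).sum()
        change = max(np.abs(p_new - p).max(), np.abs(q_new - q).max())
        p, q = p_new, q_new
        if change < tol:
            break
    return W, p, q

# Usage: ten simulated experts with sensitivity 0.95 and specificity 0.90.
rng = np.random.default_rng(0)
truth = rng.random(20000) < 0.3
u = rng.random((20000, 10))
decisions = np.where(truth[:, None], u < 0.95, u >= 0.90).astype(int)
W, p, q = staple_binary(decisions)
print(p.round(3), q.round(3))
```

Thresholding `W` at 0.5 gives a hard estimate of the true segmentation. The plain products in the E-step are adequate for a handful of experts but would need to be computed in log space for many raters.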


To Solve for Expert Parameters:


True Segmentation Estimate


Expert Performance Estimate

Now we seek an expression for the conditional expectation of the complete-data log likelihood function that we can maximize.


Expert Performance Estimate

Now, consider each expert separately:


Expert Performance Estimate

p (sensitivity, true positive fraction) : ratio of expert identified class 1 to total class 1 in the image.

q (specificity, true negative fraction) : ratio of expert


Extension to Several Tissue Labels

•  Complete data density: $f(D, T \mid \theta)$

•  True segmentation $T_i$ for each voxel $i$

–  May be binary

–  May be categorical

•  Expert $j$ makes segmentation decisions $D_{ij}$

•  Expert performance $\theta_{s's}$ characterizes probability of deciding label $s'$ when true label is $s$.


Probability Estimate of True Labels


Expert Performance Estimate

Now, consider each expert separately:


Parameter Estimation

Noting that

We can formulate the constrained optimization



Parameter Estimation


And noting that

We find that
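A minimal sketch of the multi-label estimator, in which each expert is described by a matrix of probabilities of deciding label s' when the true label is s. It assumes labels coded 0..n_labels-1 with at least two labels, every label occurring somewhere in the decisions, a global prior taken from label frequencies, and a diagonally dominant starting point. The updates are the categorical analogue of the binary E-step and M-step, and the function name `staple_multilabel` is hypothetical.

```python
import numpy as np

def staple_multilabel(D, n_labels, max_iter=100, tol=1e-8):
    """EM estimate of hidden categorical truth and expert confusion matrices.

    D: integer array of shape (n_voxels, n_experts), labels in 0..n_labels-1.
    Returns (W, theta): W[i, s] estimates Pr(T_i = s | D) and
    theta[j, s_prime, s] estimates Pr(D_ij = s_prime | T_i = s).
    """
    D = np.asarray(D)
    n_experts = D.shape[1]
    onehot = np.eye(n_labels)[D]          # (n_voxels, n_experts, n_labels)
    prior = onehot.mean(axis=(0, 1))      # assumption: prior from label frequencies
    diag = np.arange(n_labels)
    theta = np.full((n_experts, n_labels, n_labels), 0.1 / (n_labels - 1))
    theta[:, diag, diag] = 0.9            # assumption: experts start out mostly correct
    for _ in range(max_iter):
        # E-step: log Pr(T_i = s | D) up to a constant, then normalize per voxel.
        picked = theta[np.arange(n_experts), D, :]   # picked[i, j, s] = theta[j, D_ij, s]
        log_w = np.log(prior) + np.log(picked + 1e-300).sum(axis=1)
        log_w -= log_w.max(axis=1, keepdims=True)
        W = np.exp(log_w)
        W /= W.sum(axis=1, keepdims=True)
        # M-step: theta[j, s', s] = sum_{i: D_ij = s'} W[i, s] / sum_i W[i, s]
        theta_new = np.einsum("ijk,is->jks", onehot, W) / W.sum(axis=0)
        change = np.abs(theta_new - theta).max()
        theta = theta_new
        if change < tol:
            break
    return W, theta

# Usage: five simulated experts, three labels, each correct 90% of the time.
rng = np.random.default_rng(1)
n_voxels, n_experts, n_labels = 20000, 5, 3
truth = rng.integers(0, n_labels, n_voxels)
decisions = np.empty((n_voxels, n_experts), dtype=int)
for j in range(n_experts):
    wrong = (truth + rng.integers(1, n_labels, n_voxels)) % n_labels
    decisions[:, j] = np.where(rng.random(n_voxels) < 0.9, truth, wrong)
W, theta = staple_multilabel(decisions, n_labels)
print(theta[0].round(3))
```

For each expert and each true label, the estimated probabilities over decided labels sum to one, which is the constraint the M-step has to satisfy.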


Results: Synthetic Experts

•  Several experiments with known ground truth and known performance parameters.

•  Goal:

–  Determine if STAPLE accurately identifies known ground truth.

–  Determine if STAPLE accurately determines known expert performance parameters.

–  Understand sensitivity of STAPLE with respect to changes in prior hyper-parameters; requirements for number of observations to enable good estimation; convergence characteristics.


Synthetic Experts

10 observations of segmentation by expert with p=q=0.99

STAPLE p,q estimates:

mean p 0.990237

std. dev p 0.000616

mean q 0.990121

std. dev q 0.00071


Synthetic Experts

10 segmentations by experts with p=0.95, q=0.90

STAPLE p,q estimates:

mean p 0.950104

std. dev p 0.001201

mean q 0.900035

std. dev q 0.001685
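A synthetic expert with known performance can be simulated by corrupting a known ground truth voxel by voxel. The slides do not describe how their synthetic segmentations were produced, so the sketch below is one plausible construction: independent errors at every voxel, with foreground kept with probability p and background kept with probability q. The function names and the disc-shaped test image are hypothetical.

```python
import numpy as np

def simulate_expert(truth, p, q, rng):
    """Draw one synthetic segmentation from a known binary truth.

    Foreground voxels are labelled 1 with probability p (sensitivity);
    background voxels are labelled 0 with probability q (specificity).
    Assumption: errors are independent across voxels.
    """
    truth = np.asarray(truth, dtype=bool)
    u = rng.random(truth.shape)
    return np.where(truth, u < p, u >= q).astype(np.uint8)

def empirical_rates(decision, truth):
    """Observed sensitivity and specificity of a decision against the truth."""
    d = np.asarray(decision, dtype=bool)
    t = np.asarray(truth, dtype=bool)
    sensitivity = np.count_nonzero(d & t) / np.count_nonzero(t)
    specificity = np.count_nonzero(~d & ~t) / np.count_nonzero(~t)
    return sensitivity, specificity

# Known truth: a filled disc in a 400 x 400 image.
yy, xx = np.mgrid[:400, :400]
truth = (yy - 200) ** 2 + (xx - 200) ** 2 < 120 ** 2

rng = np.random.default_rng(0)
observations = [simulate_expert(truth, 0.95, 0.90, rng) for _ in range(10)]
rates = np.array([empirical_rates(obs, truth) for obs in observations])
print(rates.mean(axis=0), rates.std(axis=0))
```

Because the truth and the parameters are known by construction, estimates from any performance estimator can be checked against them directly.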


Expert and Student Segmentations

Test image Expert consensus Student 1


Phantom Segmentation

[Figure panels: Image, Expert, Students, Voting, STAPLE]





Prostate Peripheral Zone

         1      2      3      4      5

        .879   .991   .937   .918   .895

        .998   .994   .999   .999   .999

Dice    .913   .951   .967   .955   .944


A Binary MRF Model for Spatial Homogeneity.

Include a prior probability for the neighborhood configuration:


MAP Estimation With MRF Prior


Synthetic Experts

Only three segmentations by different quality experts.

STAPLE p,q estimates:

p1, q1: 0.9505, 0.9494

p2, q2: 0.9511, 0.8987

p3, q3: 0.9000, 0.8987

[Figure panels: p=0.95, q=0.95; p=0.95, q=0.90]


Cryoablation of Kidney Tumor

Segmentations before training session with radiologist:

Rater frequency. STAPLE with MRF.

After training session:

Based on the STAPLE assessment, we found the training session created a [...] increase in [...]


Newborn MRI Segmentation


Newborn MRI Segmentation

Summary of segmentation quality (posterior probability Pr(T=t|D=t)) for each tissue type for repeated manual



STAPLE Summary

•  Key advantages of STAPLE:

–  Estimates "true" segmentation.

–  Assesses expert performance.

•  Principled mechanism which enables:

–  Comparison of different experts.

–  Comparison of algorithm and experts.

•  Extensions for the future:

–  Can we learn image features that lead to different levels of expert performance?



Colleagues contributing to this work:

•  Neil Weisenfeld.

•  Andrea Mewes.

•  Petra Huppi.

•  Olivier Clatz.

•  William Wells.

•  Olivier Commowick.

•  Arne Hans.

•  Heidelise Als.

•  Lianne Woodward.

•  Frank Duffy.

•  Kelly Zou.

This study was supported by:


