Performance Evaluation
All improvements are statistically significant (p < 0.001)
MI and EM perform about the same on our training data
EM needs more training data
MI is more susceptible to noise, so it may not scale well
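
The slide does not say which significance test produced p < 0.001. As an illustration only, one common choice for comparing two systems scored on the same test items is a paired sign-flip permutation test. The sketch below assumes per-item scores are available for both systems; the function name, parameters, and data are hypothetical and not taken from the original evaluation.

```python
import random


def paired_permutation_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided paired sign-flip permutation test on the mean difference.

    scores_a, scores_b: per-item scores of two systems on the same items
    (hypothetical inputs; the original evaluation data is not shown).
    Returns an estimated p-value.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired test needs equal-length score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)

    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each paired difference is arbitrary.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / n) >= observed:
            hits += 1

    # Add-one smoothing keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)
```

With 10,000 resamples the smallest p-value this estimate can report is about 0.0001, so a threshold of 0.001 is resolvable; a tighter threshold would need more resamples.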