Beware the Overfit Trap in Data Analysis
It can be exciting when your data analysis suggests a surprising or 
counterintuitive prediction. But the result might be due to overfitting,
 which occurs when a statistical model describes random noise rather 
than the underlying relationship you need to capture. You can guard 
against this trap by keeping your analysis simple. Be on guard against 
spurious correlations, and look for relationships that measure important
 effects related to clear, logical hypotheses. Test for overfitting by 
randomly dividing the data into a training set, with which you’ll 
estimate the model, and a validation set, with which you’ll test the 
accuracy of the model’s predictions. An overfit model might predict the 
training set very well yet perform poorly on the validation set, and 
that gap is the warning flag. You might also consider 
alternative narratives: Is there another story you could tell with the 
same data? If so, you cannot be confident that the relationship you have
 uncovered is the right — or only — one.
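
The training/validation check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are synthetic (a straight line plus random noise), the helper names (split_train_validation, fit_line, fit_memorizer) are hypothetical, and the "memorizer" is a deliberately extreme stand-in for an overfit model, not a method anyone would use in practice.

```python
import random


def split_train_validation(rows, validation_fraction=0.3, seed=0):
    """Randomly divide rows into (training set, validation set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def fit_line(train):
    """Simple model: ordinary least squares fit of y = intercept + slope * x."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_memorizer(train):
    """Overfit model: return the y of the nearest training x, noise included."""
    return lambda x: min(train, key=lambda point: abs(point[0] - x))[1]


def mean_squared_error(model, rows):
    """Average squared prediction error of model over rows of (x, y)."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
rng = random.Random(42)
data = []
for _ in range(1000):
    x = rng.uniform(0, 10)
    data.append((x, 2 * x + 1 + rng.gauss(0, 3)))

train, validation = split_train_validation(data)

results = {}
for name, fit in [("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train)
    results[name] = {
        "train_mse": mean_squared_error(model, train),
        "validation_mse": mean_squared_error(model, validation),
    }
    print(name, results[name])
```

The memorizer reproduces every training point exactly, so its training error is zero, but it has learned the noise along with the trend and its validation error is noticeably worse. The straight line should score about the same on both sets, which is what you would hope to see from a model that captures the underlying relationship.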