Beware the Overfit Trap in Data Analysis
It can be exciting when your data analysis suggests a surprising or
counterintuitive prediction. But the result might be due to overfitting,
which occurs when a statistical model describes random noise rather
than the underlying relationship you need to capture. You can guard
against this trap by keeping your analysis simple. Be on guard against
spurious correlations, and look for relationships that measure important
effects related to clear, logical hypotheses. Test for overfitting by
randomly dividing the data into a training set, with which you’ll
estimate the model, and a validation set, with which you’ll test the
accuracy of the model’s predictions. An overfit model might be great at
making predictions within the training set but raise warning flags by
performing poorly in the validation set. You might also consider
alternative narratives: Is there another story you could tell with the
same data? If so, you cannot be confident that the relationship you have
uncovered is the right — or only — one.
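
The training and validation check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is synthetic (a straight line plus random noise), and the helper names `split_data`, `fit_line`, `fit_memorizer`, and `mean_squared_error` are hypothetical, not part of any library. The "memorizer" is a deliberately overfit model that simply repeats the outcome of the closest training example, so it reproduces the noise in the training set rather than the underlying relationship.

```python
import random

def split_data(rows, validation_fraction=0.3, seed=0):
    """Randomly divide rows into a training set and a validation set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

def fit_line(train):
    """Simple model: ordinary least-squares line y = intercept + slope * x."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_memorizer(train):
    """Deliberately overfit model: return the y of the nearest training x."""
    return lambda x: min(train, key=lambda row: abs(row[0] - x))[1]

def mean_squared_error(model, rows):
    """Average squared gap between predictions and observed outcomes."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)

# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    data.append((x, 2 * x + 1 + rng.gauss(0, 3)))

train, validation = split_data(data)

for name, fit in [("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train)
    print(name,
          "| train MSE:", round(mean_squared_error(model, train), 2),
          "| validation MSE:", round(mean_squared_error(model, validation), 2))
```

The memorizer's training error is zero by construction, which looks impressive until it is scored on the validation set, where its error is no longer zero. A large gap between training and validation error is the warning flag to look for. The simple line, by contrast, should show similar error on both sets.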