Overfitting - simplified
- Photo by Glenn Carstens-Peters on Unsplash
Imagine you are in desperate need of a new dress shirt. You visit your favourite tailor and they start measuring for the perfect fit. Every inch of your upper body is measured, and since they are incredibly fast with their production, you have your new shirt an hour later. It fits perfectly, like a glove. You are all set for the gala on Saturday.
You get the shirt out of your closet and it shines - perfect for the occasion. However, when you put it on after a hearty dinner, the front buttons are under pressure - immense pressure. When you sit down at the gala - PLOPP! BOING! - you lose two buttons, your shirt looks torn, and all the money you spent on a perfectly fitted product is wasted. Your shirt did fit incredibly well... once.
This situation occurred because the fit was too tight, too perfect. It was amazing for a single day but did not fit for general use. This is what happens when models in data science overfit: they fit the training data almost perfectly, noise included, but fail to generalize to data they have not seen.
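To see the shirt problem in numbers, here is a minimal sketch in Python using NumPy. Everything in it is an assumption made for illustration: the true relationship (y = 2x), the hand-picked measurement errors, and the names `loose`, `tight` and `mse`. The "loose" model is a straight line. The "tight" model is a degree-9 polynomial, which has enough freedom to pass through all ten training points.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical data: the true relationship is y = 2x, and the training
# measurements carry small hand-picked errors (assumed for illustration).
x_train = np.linspace(0.0, 1.0, 10)
noise = np.array([0.3, -0.4, 0.2, -0.3, 0.4, -0.2, 0.3, -0.4, 0.2, -0.3])
y_train = 2.0 * x_train + noise

# Unseen inputs from the same range, compared against the true relationship.
x_test = np.linspace(0.0, 1.0, 200)
y_test = 2.0 * x_test


def mse(model, x, y):
    """Mean squared error of a fitted polynomial on the points (x, y)."""
    return float(np.mean((model(x) - y) ** 2))


# A straight line is the loose fit; degree 9 can pass through all 10 points.
loose = Polynomial.fit(x_train, y_train, deg=1)
tight = Polynomial.fit(x_train, y_train, deg=9)

results = {
    "loose": (mse(loose, x_train, y_train), mse(loose, x_test, y_test)),
    "tight": (mse(tight, x_train, y_train), mse(tight, x_test, y_test)),
}
for name, (train_err, test_err) in results.items():
    print(f"{name}: train MSE = {train_err:.4f}, test MSE = {test_err:.4f}")
```

If the sketch behaves as intended, the tight model reports a training error close to zero but a clearly larger error on the unseen inputs, while the loose model is a little off everywhere and holds up much better on new data. That is the shirt that fits like a glove on the day of the measurement and pops its buttons at the gala.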
I hope this helped - more will come.