Day 14 - Overfitting and underfitting

 By Jerin Lalichan 


Overfitting and Underfitting

How well a model fits the training data points directly affects whether it will make accurate predictions on new data.

Overfitting

     In supervised learning, overfitting happens when a model captures the noise along with the underlying pattern in the data. It typically occurs when we train the model for too long on a noisy dataset. Such models have low bias and high variance. Highly flexible models, such as decision trees, are especially prone to overfitting.

  This occurs with highly complex models: the model matches almost every data point in the training set and performs very well on the training data, but it fails to generalize to the test set and cannot predict unseen outcomes accurately.
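
A minimal sketch of this behaviour, assuming scikit-learn is available: an unconstrained decision tree memorizes a small noisy training set almost perfectly but scores noticeably lower on held-out data. The dataset and parameters are illustrative only.

```python
# Hedged example: overfitting with a fully grown decision tree (illustrative values)
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small, noisy dataset (flip_y injects label noise)
X, y = make_classification(n_samples=200, n_features=20, flip_y=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fully grown tree: fits the training data almost perfectly (low bias, high variance)
tree = DecisionTreeClassifier(max_depth=None, random_state=42)
tree.fit(X_train, y_train)

print("Train accuracy:", tree.score(X_train, y_train))  # close to 1.0
print("Test accuracy: ", tree.score(X_test, y_test))    # noticeably lower
```

Limiting the tree depth (for example, `max_depth=3`) is one common way to trade some training accuracy for better generalization.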



Underfitting

    In supervised learning, underfitting happens when a model is unable to capture the underlying pattern of the data. Such models usually have high bias and low variance. It happens when there is too little data to build an accurate model, or when we try to fit a linear model to nonlinear data. Models that are too simple to capture complex patterns, such as linear and logistic regression, are prone to underfitting.

    Underfitting occurs when the model cannot map the input data to the target values. The model is not complex enough to fit the available data, so it performs poorly even on the training dataset.
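
A minimal sketch of underfitting, assuming scikit-learn and NumPy: a straight line fitted to clearly nonlinear (sine-shaped) data misses the pattern, so even the training score is poor. The synthetic data here is an assumption for illustration.

```python
# Hedged example: underfitting nonlinear data with a linear model (illustrative values)
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)  # nonlinear target with mild noise

linear = LinearRegression().fit(X, y)        # too simple for this data (high bias)
print("Training R^2:", linear.score(X, y))   # well below 1, even on the training data
```

Adding nonlinear features (for example, polynomial terms) or choosing a more flexible model would raise the training score here.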





    In the above diagram, the center of the target is a model that perfectly predicts the correct values. As we move away from the bulls-eye, our predictions get worse and worse. We can repeat the model-building process to get separate hits on the target.




   
  I am doing a challenge - #66DaysofData - in which I will be learning something new from the Data Science field for 66 days, and I will be posting daily topics on my LinkedIn, on my GitHub repository, and on my blog as well.


Stay Curious!  




