Day 13 - Bias and Variance

By Jerin Lalichan 


Bias and Variance in Machine Learning

    With larger datasets and a growing variety of algorithms, implementation techniques, and learning requirements, building and analyzing machine learning models has become increasingly complex, because all of these factors directly affect a model's performance and accuracy.

    The picture is further complicated by incorrect assumptions, outliers, and noise. That makes it important to understand two sources of prediction error: bias and variance. A proper understanding of both helps us build accurate models that neither underfit nor overfit.


Bias

    Bias is the tendency of a model to skew its predictions consistently in one direction. It is a systematic error within the model that comes from incorrect (usually overly simple) assumptions in the learning process. A model with high bias pays very little attention to the training data and oversimplifies the relationship it is trying to learn.

    In other words, bias is the difference between the model's average prediction (averaged over models trained on different training sets) and the ground truth. A model with high bias does not fit the data closely, while a low-bias model matches the training data closely.

Characteristics of a high-bias model:

  • Fails to capture the underlying trends in the data
  • Prone to underfitting
  • Oversimplified (overly general)
  • High error on both the training and the test data
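
    To make "average prediction versus ground truth" concrete, here is a minimal Monte Carlo sketch in Python with NumPy. It assumes a made-up ground truth, f(x) = sin(2πx) observed with Gaussian noise (std 0.3), trains polynomial models of degree 1 and degree 5 on 200 independently drawn training sets of 15 points each, averages their predictions, and measures the squared gap to the true function. The helper name average_prediction and all the numbers are hypothetical choices for illustration, not a standard API.

```python
import numpy as np

# Hypothetical toy problem: ground truth f(x) = sin(2*pi*x) on [0, 1],
# observed with Gaussian noise (std 0.3). All numbers are illustrative.
rng = np.random.default_rng(0)
x_grid = np.linspace(0, 1, 50)          # points where bias is measured
f_true = np.sin(2 * np.pi * x_grid)     # ground truth at those points


def average_prediction(degree, n_datasets=200, n_points=15, noise=0.3):
    """Fit a polynomial of `degree` on many independent training sets
    and return the mean prediction on x_grid."""
    preds = np.empty((n_datasets, x_grid.size))
    for i in range(n_datasets):
        x = rng.uniform(0, 1, n_points)
        y = np.sin(2 * np.pi * x) + rng.normal(0, noise, n_points)
        preds[i] = np.polyval(np.polyfit(x, y, degree), x_grid)
    return preds.mean(axis=0)


bias_sq_by_degree = {}
for degree in (1, 5):
    avg = average_prediction(degree)
    # squared bias: (average prediction - ground truth)^2, averaged over x_grid
    bias_sq_by_degree[degree] = np.mean((avg - f_true) ** 2)
    print(f"degree={degree}: squared bias = {bias_sq_by_degree[degree]:.4f}")
```

    A straight line (degree 1) cannot follow the curve no matter how many training sets it sees, so its squared bias stays large; the degree-5 polynomial is flexible enough that its average prediction lies close to the true function.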

Variance

    Variance measures how much a model's predictions change when it is trained on a different training set. In other words, it is the variability of the model's prediction: how sensitive the learned model is to the particular data it was trained on.
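
    Variance can be estimated numerically by training the model on many independently drawn training sets and measuring how much its prediction at each fixed point moves from one training set to the next. The sketch below uses NumPy and compares a straight line with a degree-9 polynomial; the sin(2πx) ground truth, the noise level, the set sizes, and the helper prediction_variance are all hypothetical assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical toy problem: ground truth sin(2*pi*x) on [0, 1], Gaussian
# noise with std 0.3, 15 points per training set. Numbers are illustrative.
rng = np.random.default_rng(1)
x_grid = np.linspace(0, 1, 50)  # points where prediction variance is measured


def prediction_variance(degree, n_datasets=200, n_points=15, noise=0.3):
    """Variance of the model's prediction across independent training sets,
    averaged over x_grid."""
    preds = np.empty((n_datasets, x_grid.size))
    for i in range(n_datasets):
        x = rng.uniform(0, 1, n_points)
        y = np.sin(2 * np.pi * x) + rng.normal(0, noise, n_points)
        preds[i] = np.polyval(np.polyfit(x, y, degree), x_grid)
    # variance over the training sets (axis 0), then mean over the grid points
    return preds.var(axis=0).mean()


variance_by_degree = {d: prediction_variance(d) for d in (1, 9)}
for d, v in variance_by_degree.items():
    print(f"degree={d}: prediction variance = {v:.4f}")
```

    The degree-9 polynomial has enough freedom to bend towards the noise in each particular training set, so its predictions scatter far more from one training set to the next than those of the straight line.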

    A model with high variance pays too much attention to the training data (including its noise) and does not generalize to data it has not seen before. As a result, such models perform well on the training data but have much higher error rates on test data, which is the typical sign of overfitting.
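
    The effect on training and test error can be sketched by fitting polynomials of increasing degree to a single small training set and scoring them on a large held-out test set. This is a minimal NumPy sketch on hypothetical data (y = sin(2πx) plus Gaussian noise with std 0.3, 15 training points, 1,000 test points); the helpers sample and mse are illustrative names only.

```python
import numpy as np

# Hypothetical toy data: y = sin(2*pi*x) + Gaussian noise (std 0.3).
rng = np.random.default_rng(2)


def sample(n):
    """Draw n noisy (x, y) pairs from the toy ground truth."""
    x = rng.uniform(0, 1, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)


def mse(coefs, x, y):
    """Mean squared error of the polynomial `coefs` on (x, y)."""
    return np.mean((np.polyval(coefs, x) - y) ** 2)


x_train, y_train = sample(15)    # small training set
x_test, y_test = sample(1000)    # large held-out test set

train_mse, test_mse = {}, {}
for degree in (1, 3, 9):
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse[degree] = mse(coefs, x_train, y_train)
    test_mse[degree] = mse(coefs, x_test, y_test)
    print(f"degree={degree}: train MSE = {train_mse[degree]:.3f}, "
          f"test MSE = {test_mse[degree]:.3f}")
```

    Because the three polynomial models are nested, the training MSE can only shrink as the degree grows, so the degree-9 fit always looks best on the training data. Its test MSE, however, is typically well above its training MSE, which is the high-variance behavior described above.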

    I am doing a challenge, #66DaysofData, in which I will learn something new from the field of Data Science for 66 days and post a daily topic on my LinkedIn, on my GitHub repository, and on my blog as well.


Stay Curious!  