Day 15 - Cross Validation

By Jerin Lalichan 


Cross Validation

    Cross-validation is a technique for assessing how well a statistical analysis generalizes to an independent data set. It evaluates machine learning models by training several models on different subsets of the available input data and testing each on the complementary subset. This makes overfitting easy to detect.

The different types of cross-validation techniques are:

    1. K-Fold Cross Validation

    2. Leave P-out Cross Validation

    3. Leave One-out Cross Validation

    4. Repeated Random Sub-sampling Method

    5. Holdout Method

Among these, K-Fold cross-validation is the most commonly used.
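
    If scikit-learn is used (an assumption on my part; the post does not name a library), each of these techniques maps onto a ready-made splitter. A minimal sketch:

        # Hedged sketch: assumes scikit-learn. Each splitter below corresponds
        # to one of the techniques listed above.
        from sklearn.model_selection import (
            KFold,             # 1. K-Fold Cross Validation
            LeavePOut,         # 2. Leave P-out Cross Validation
            LeaveOneOut,       # 3. Leave One-out Cross Validation
            ShuffleSplit,      # 4. Repeated Random Sub-sampling Method
            train_test_split,  # 5. Holdout Method (a single split)
        )

        kfold = KFold(n_splits=5, shuffle=True, random_state=42)
        lpo = LeavePOut(p=2)
        loo = LeaveOneOut()
        shuffle_split = ShuffleSplit(n_splits=10, test_size=0.2, random_state=42)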


Why do we need cross-validation?

    We usually split the dataset into training and testing sets. But the resulting accuracy and other metrics depend heavily on how the split is done: the shuffling, which part of the data ends up in the training set, and so on.
Hence, a single split does not represent the model's ability to generalize to unseen data. This leads to the need for cross-validation.
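
    A small illustration of this, assuming scikit-learn and its built-in Iris dataset (my choices, not the post's): the same model, scored on differently shuffled splits, reports noticeably different accuracies.

        # Hedged sketch: assumes scikit-learn; the dataset and model are
        # arbitrary choices for illustration.
        from sklearn.datasets import load_iris
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        X, y = load_iris(return_X_y=True)

        # The reported accuracy changes with how the single split is shuffled.
        for seed in (0, 1, 2, 3, 4):
            X_train, X_test, y_train, y_test = train_test_split(
                X, y, test_size=0.3, random_state=seed
            )
            model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
            print(f"random_state={seed}: test accuracy = {model.score(X_test, y_test):.3f}")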

K-Fold Cross Validation

    The first step is to separate the test dataset for the final evaluation. Cross-validation is to be performed on the training dataset only.     

(Figure: 5-fold cross-validation)



    The complete training data set is first divided into k equal parts. One part is set aside as the hold-out (validation) set, and the remaining k-1 parts are used to train the model. The hold-out set is then used to test the trained model. This procedure is repeated k times, with a different part serving as the hold-out set each time, so every data point is used for validation exactly once. A sketch of this procedure is shown below.
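
    A minimal sketch of this procedure, assuming scikit-learn (the Iris dataset and logistic regression are arbitrary choices): the test set is held out first, and 5-fold cross-validation runs on the training data only.

        # Hedged sketch: assumes scikit-learn; dataset and model are placeholders.
        import numpy as np
        from sklearn.datasets import load_iris
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import KFold, train_test_split

        X, y = load_iris(return_X_y=True)

        # Step 1: set aside the test set for the final evaluation.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42
        )

        # Step 2: split the training data into k parts and rotate the hold-out fold.
        kfold = KFold(n_splits=5, shuffle=True, random_state=42)
        fold_scores = []
        for fold, (train_idx, val_idx) in enumerate(kfold.split(X_train), start=1):
            model = LogisticRegression(max_iter=1000)
            model.fit(X_train[train_idx], y_train[train_idx])        # train on k-1 parts
            score = model.score(X_train[val_idx], y_train[val_idx])  # score on the hold-out part
            fold_scores.append(score)
            print(f"Fold {fold}: validation accuracy = {score:.3f}")

        print(f"Mean CV accuracy: {np.mean(fold_scores):.3f}")

        # Step 3: the untouched test set is used only once, for the final evaluation.
        final_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        print(f"Test accuracy: {final_model.score(X_test, y_test):.3f}")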

   The value of K is typically 3 or 5. Larger values such as 10 or 15 can also be used, but they require more computation and take longer to run.
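
    For what it's worth, scikit-learn's cross_val_score helper (again an assumption; the post does not mention it) reduces the choice of K to a single parameter, and each increase in K means that many more model fits:

        # Hedged sketch: assumes scikit-learn; cv=k fits the model k times.
        from sklearn.datasets import load_iris
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        X, y = load_iris(return_X_y=True)
        model = LogisticRegression(max_iter=1000)

        for k in (3, 5, 10):
            scores = cross_val_score(model, X, y, cv=k)  # k separate fits
            print(f"K={k}: mean accuracy = {scores.mean():.3f} over {k} folds")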





   
  I am doing a challenge, #66DaysofData, in which I will be learning something new from the Data Science field for 66 days, and I will be posting the daily topics on my LinkedIn, on my GitHub repository, and on my blog as well.


Stay Curious!  




