Day 15 - Cross Validation
Cross-validation is a technique for assessing how well a statistical analysis generalizes to an independent data set. It evaluates a machine learning model by training several models on different subsets of the available input data and testing each on the complementary subset. It also makes overfitting much easier to detect. Common cross-validation strategies include:
1. K-Fold Cross Validation
2. Leave P-out Cross Validation
3. Leave One-out Cross Validation
4. Repeated Random Sub-sampling Method
5. Holdout Method
Among these, K-Fold cross-validation is the most commonly used.
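As a hedged sketch, the strategies listed above each correspond to a splitter class in scikit-learn's `model_selection` module (assuming scikit-learn is installed; the toy array here is purely illustrative):

```python
import numpy as np
from sklearn.model_selection import KFold, LeavePOut, LeaveOneOut, ShuffleSplit

X = np.arange(6).reshape(6, 1)  # tiny toy dataset with 6 samples

kfold = KFold(n_splits=3)                          # 1. K-Fold
lpo = LeavePOut(p=2)                               # 2. Leave P-out (P = 2)
loo = LeaveOneOut()                                # 3. Leave One-out
shuffle = ShuffleSplit(n_splits=5, test_size=0.33,  # 4. Repeated random
                       random_state=0)              #    sub-sampling

# Each splitter yields (train_indices, test_indices) pairs via .split(X).
print(kfold.get_n_splits(X))  # → 3 (one split per fold)
print(loo.get_n_splits(X))    # → 6 (one split per sample)
print(lpo.get_n_splits(X))    # → 15 (C(6, 2) ways to leave 2 samples out)
```

Note how the number of splits grows: leave-P-out enumerates every possible held-out subset, which is why it is rarely practical on large datasets.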
Why do we need cross-validation?
We usually split the dataset into a training set and a testing set. But the resulting accuracy and other metrics are highly sensitive to how that split is made: how the data are shuffled, which portion ends up in the training set, and so on.
A single split therefore does not reliably measure the model's ability to generalize. This is what motivates cross-validation.
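The sensitivity to the split can be demonstrated directly. The following sketch (the iris dataset and k-NN classifier are illustrative choices, not from the original text) repeats a single train/test split with different shuffles and shows the score changing each time:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

scores = []
for seed in range(5):
    # A different random_state means a different shuffle of the data.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    model = KNeighborsClassifier().fit(X_tr, y_tr)
    scores.append(model.score(X_te, y_te))

# The accuracy varies from split to split, so any one split alone
# is a noisy estimate of how well the model generalizes.
print(scores)
```

Cross-validation averages over many such splits, which is exactly how it removes this dependence on a single lucky or unlucky shuffle.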
K-Fold Cross Validation
The first step is to set aside the test dataset for the final evaluation; cross-validation is performed on the training dataset only. The training data are then split into K equally sized folds, and the model is trained K times: each run uses K−1 folds for training and the remaining fold for validation, and the K validation scores are averaged.
The value of K typically equals 3 or 5. Larger values like 10 or 15 can be used, but they require proportionally more computation and take longer to run.
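The steps above can be sketched with scikit-learn's `cross_val_score` (the iris dataset and logistic regression model are assumptions for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# Step 1: set the test set aside for the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Step 2: 5-fold cross-validation on the training data only.
model = LogisticRegression(max_iter=1000)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print(cv_scores.mean())  # average validation accuracy over the 5 folds

# Step 3: refit on the full training set and evaluate once on the test set.
final_score = model.fit(X_train, y_train).score(X_test, y_test)
print(final_score)
```

The cross-validation mean guides model and hyperparameter choices, while the held-out test score is reported only once at the end.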