Posts

Showing posts from September, 2022

Day 25 - Seaborn - Violinplot

By Jerin Lalichan

Violinplot

A violin plot is similar to a box plot, except that it provides a richer, more advanced visualization: it uses kernel density estimation to give a better description of the data distribution.

A violin plot plays a similar role to a box-and-whisker plot. It shows the distribution of quantitative data across several levels of one (or more) categorical variables so that those distributions can be compared. Unlike a box plot, in which all of the plot components correspond to actual data points, the violin plot features a kernel density estimate of the underlying distribution.

This can be an effective and attractive way to show multiple distributions of data at once, but keep in mind that the estimation procedure is influenced by the sample size, and violins for relatively small samples might look misleadingly smooth.
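A minimal sketch, assuming seaborn's bundled tips dataset (chosen here purely for illustration):

```python
# A minimal sketch: one violin per category, using seaborn's bundled "tips" dataset.
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")
sns.violinplot(data=tips, x="day", y="total_bill")  # KDE-based distribution per day
plt.show()
```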

Day 24 - Seaborn - Boxplot

By Jerin Lalichan

Boxplot

A box plot is a visual representation of groups of numerical data through their quartiles. It is also commonly used to detect outliers in a data set.

A box plot (or box-and-whisker plot) shows the distribution of quantitative data in a way that facilitates comparisons between variables or across levels of a categorical variable. The box shows the quartiles of the dataset, while the whiskers extend to show the rest of the distribution, except for points that are determined to be "outliers" using a method that is a function of the interquartile range.
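A minimal sketch, again assuming the bundled tips dataset:

```python
# A minimal sketch: boxes show quartiles, points beyond the whiskers are flagged as outliers.
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")
sns.boxplot(data=tips, x="day", y="total_bill")
plt.show()
```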

Day 23 - Seaborn - Countplot

By Jerin Lalichan

Countplot

A countplot counts the categories in a column and returns the number of occurrences of each. It is one of the simplest plots provided by the seaborn library.
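A minimal sketch, assuming the bundled tips dataset:

```python
# A minimal sketch: bar height equals the number of rows in each category.
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")
sns.countplot(data=tips, x="day")
plt.show()
```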

Day 22 - Seaborn - Line Plot, Barplot

By Jerin Lalichan

Line Plot

The line plot is one of the most basic plots in the seaborn library. It is mainly used to visualize data in the form of a time series, i.e. in a continuous manner.

Barplot

A barplot is used to aggregate categorical data according to some estimator, which by default is the mean. It can also be understood as a visualization of a group-by operation. To use this plot, we choose a categorical column for the x-axis and a numerical column for the y-axis, and the plot shows the mean (by default) per category.
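A minimal sketch of both plots, assuming seaborn's bundled flights and tips datasets:

```python
# A minimal sketch: a line plot of a time series and a barplot aggregating by the mean.
import seaborn as sns
import matplotlib.pyplot as plt

flights = sns.load_dataset("flights")
tips = sns.load_dataset("tips")

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
sns.lineplot(data=flights, x="year", y="passengers", ax=ax1)  # continuous trend over time
sns.barplot(data=tips, x="day", y="total_bill", ax=ax2)       # mean total_bill per day
plt.show()
```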

Day 21 - Plotting in Seaborn

By Jerin Lalichan

Different categories of plots in Seaborn

Plots are mainly used to show how different variables relate to one another. These variables can be wholly numerical or categorical, such as a group, class, or division. Seaborn divides its plots into the categories below.

Relational plots: used to understand the relation between two variables.
Categorical plots: deal with categorical variables and how they can be visualized.
Distribution plots: used for examining univariate and bivariate distributions.
Regression plots: primarily intended to add a visual guide that helps to emphasize patterns in a dataset during exploratory data analysis.
Matrix plots: an array of scatterplots.
Multi-plot grids: a useful approach for drawing multiple instances of the same plot on different subsets of the dataset.

Some basic plots using seaborn

Dist plot: a seaborn dist plot is used to visualize the univariate distribution of a set of observations (a sketch is shown below).
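A minimal sketch of a distribution plot, assuming the bundled tips dataset and the newer displot interface (the replacement for the deprecated distplot):

```python
# A minimal sketch: histogram plus kernel density estimate for one numeric column.
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")
sns.displot(tips["total_bill"], kde=True)
plt.show()
```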

Day 20 - Advanced Ensemble Techniques - Bagging

By Jerin Lalichan

Advanced Ensemble Techniques: Bagging

The idea behind bagging is to combine the results of multiple models to get a generalized result. However, models created from the same set of data give the same result, so to introduce variety among the models we use a technique called bootstrapping.

Bootstrapping is a sampling technique in which we create subsets of observations from the original dataset, with replacement. The size of each subset is the same as the size of the original set.

The bagging (or bootstrap aggregating) technique uses these subsets (bags) to get a fair idea of the distribution of the complete set. The size of the subsets created for bagging may also be smaller than the original set. Multiple subsets are created from the original dataset by selecting observations with replacement. A base model (weak model) is created on each of these subsets. The models run in parallel and are independent of each other. The final prediction is determined by combining the predictions from all the models.
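A minimal sketch using scikit-learn's BaggingClassifier on synthetic data (the base estimator defaults to a decision tree):

```python
# A minimal bagging sketch: many trees trained on bootstrap samples, predictions combined.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)
bagging = BaggingClassifier(n_estimators=50, random_state=0)  # decision-tree base models
print(cross_val_score(bagging, X, y, cv=5).mean())
```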

Day 19 - Advanced Ensemble techniques - Blending

By Jerin Lalichan

Advanced Ensemble Techniques: Blending

Blending follows the same approach as stacking but uses only a validation set drawn from the train set to make predictions. In other words, unlike stacking, the predictions are made on the holdout set only. The holdout set and the predictions on it are used to build a model which is then run on the test set.

Blending is a technique derived from stacked generalization. The only difference is that in blending, the k-fold cross-validation technique is not used to generate the training data for the meta-model.

Blending implements a "one-holdout set": a small portion of the training data (the validation set) is used to make predictions, and these predictions are "stacked" to form the training data of the meta-model. Predictions are also made on the test data to form the meta-model's test data.

In short, blending is a similar approach to stacking. The train set is split into training and validation sets. We train the base models on the training set, make predictions on the validation and test sets, and then train a meta-model on the validation predictions to produce the final predictions on the test set.
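A minimal blending sketch (synthetic data; the base and meta-models chosen here are illustrative assumptions):

```python
# A minimal blending sketch: base models predict on a holdout set, a meta-model learns from those predictions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

base_models = [DecisionTreeClassifier(random_state=0), KNeighborsClassifier()]
val_preds, test_preds = [], []
for model in base_models:
    model.fit(X_tr, y_tr)                                  # train base models on the training split
    val_preds.append(model.predict_proba(X_val)[:, 1])     # predictions on the holdout (validation) set
    test_preds.append(model.predict_proba(X_test)[:, 1])   # predictions on the test set

# The meta-model learns from the holdout predictions ("blending")
meta_X_val = np.column_stack(val_preds)
meta_X_test = np.column_stack(test_preds)
meta_model = LogisticRegression().fit(meta_X_val, y_val)
print("blended accuracy:", accuracy_score(y_test, meta_model.predict(meta_X_test)))
```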

Day 18 - Advanced Ensemble techniques - Stacking

By Jerin Lalichan

Advanced Ensemble Techniques

Here are the advanced methods among ensemble techniques:

1. Stacking

This technique uses predictions from different models (e.g. decision tree, SVM, KNN, etc.) to form a new model, which is then used for making predictions on the test set. The idea is that each model can learn a different part of the problem, but not the whole problem. So, you build multiple different learners and use them to produce intermediate predictions, one prediction for each learned model. Then you add a new model which learns the same target from those intermediate predictions.

Steps (a sketch follows the list):
We split the training data into K folds, as in cross-validation.
A base model is fitted on K-1 parts and predictions are made for the Kth part.
We do this for each part of the training data.
The base model is then fitted on the whole train data set to calculate its performance on the test set.
We repeat the last three steps for the other base models.
Predictions from the train set are then used as features for a second-level model (the meta-model), which makes the final predictions on the test set.
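A minimal sketch using scikit-learn's StackingClassifier, which handles the K-fold procedure internally (synthetic data; the base and final estimators are illustrative assumptions):

```python
# A minimal stacking sketch: three base learners, a logistic-regression meta-model.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_learners = [
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("svm", SVC(random_state=0)),
    ("knn", KNeighborsClassifier()),
]
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(),
                           cv=5)                    # K-fold predictions feed the meta-model
stack.fit(X_train, y_train)
print("test accuracy:", stack.score(X_test, y_test))
```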

Day 17 - Ensemble Techniques in ML - Averaging, Weighted average

By Jerin Lalichan

Ensemble Techniques in ML

Ensemble methods are machine learning techniques that combine several base models in order to produce one optimal predictive model.

Ensemble methods involve building multiple models and combining them to achieve better outcomes. To put it another way, they integrate the conclusions drawn from various models to enhance overall performance. Generally speaking, ensemble methods produce more accurate results than a single model would.

For example, consider the case in which you need to decide whether or not to go to a particular movie. A diverse group of people is likely to make better decisions than an individual, so it is better to check online reviews, which aggregate the opinions of hundreds of people from different backgrounds, than to ask only a few of your friends. The same is true for a diverse set of models compared with a single model; this diversification is what ensemble learning techniques provide.
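Given the post's topic (simple and weighted averaging of predictions), a minimal sketch might look like the following, using synthetic data and illustrative weights:

```python
# A minimal averaging sketch: average (or weight-average) predicted probabilities from several models.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = [DecisionTreeClassifier(random_state=0),
          KNeighborsClassifier(),
          LogisticRegression(max_iter=1000)]
probs = [m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in models]

avg_pred = np.mean(probs, axis=0)                                    # simple averaging
weighted_pred = np.average(probs, axis=0, weights=[0.5, 0.3, 0.2])   # weighted averaging
print((avg_pred > 0.5).astype(int)[:10])
```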

Day 16 - Ensemble Techniques in ML - Max Voting

By Jerin Lalichan

Ensemble Techniques in ML

Ensemble methods are machine learning techniques that combine several base models in order to produce one optimal predictive model.

Ensemble methods involve building multiple models and combining them to achieve better outcomes. To put it another way, they integrate the conclusions drawn from various models to enhance overall performance. Generally speaking, ensemble methods produce more accurate results than a single model would.

For example, consider the case in which you need to decide whether or not to go to a particular movie. A diverse group of people is likely to make better decisions than an individual, so it is better to check online reviews, which aggregate the opinions of hundreds of people from different backgrounds, than to ask only a few of your friends. The same is true for a diverse set of models compared with a single model; this diversification is what ensemble learning techniques provide.
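Given the post's topic (max voting), a minimal sketch using scikit-learn's VotingClassifier with hard voting (synthetic data; the model choices are illustrative):

```python
# A minimal max-voting sketch: each model votes, the majority class wins.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

voting = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("knn", KNeighborsClassifier()),
                ("tree", DecisionTreeClassifier(random_state=0))],
    voting="hard")                      # majority vote on predicted class labels
voting.fit(X_train, y_train)
print("test accuracy:", voting.score(X_test, y_test))
```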

Day 15 - Cross Validation

By Jerin Lalichan

Cross Validation

Cross-validation is a technique for assessing how a statistical analysis generalizes to an independent data set. It evaluates machine learning models by training several models on subsets of the available input data and evaluating them on the complementary subsets. Overfitting can be detected easily with this technique.

The different types of cross-validation techniques are:
1. K-Fold Cross Validation
2. Leave-P-out Cross Validation
3. Leave-One-out Cross Validation
4. Repeated Random Sub-sampling Method
5. Holdout Method

Among these, K-Fold cross-validation is the most commonly used.

Why do we need cross-validation?

We usually split the dataset into training and testing datasets. But the resulting accuracy and metrics are highly biased by factors such as how the split is done, the shuffling, and which part of the data is used for training. Hence, a single split does not represent the model's true ability to generalize.
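A minimal sketch of K-fold cross-validation with scikit-learn (the iris dataset and a logistic-regression model are chosen here for illustration):

```python
# A minimal K-fold cross-validation sketch: 5 folds, average score across folds.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(scores, "mean:", scores.mean())
```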

Day 14 - Overfitting and underfitting

By Jerin Lalichan

Overfitting and Underfitting

The degree to which a model fits the data points directly determines whether it will give accurate predictions or not.

Overfitting

In supervised learning, overfitting happens when our model captures the noise along with the underlying pattern in the data. It happens when we train our model for too long on noisy datasets. These models have low bias and high variance; very complex models, such as deep decision trees, are prone to overfitting. A highly complex model will match almost all of the data points in the training dataset and perform well on it, but it will not be able to generalize to the test data set and predict outcomes accurately.

Underfitting

In supervised learning, underfitting happens when a model is unable to capture the underlying pattern of the data. These models usually have high bias and low variance. It happens when the model is too simple for the data, or when there is too little data to learn from.
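A minimal sketch contrasting underfitting and overfitting with polynomial regression on synthetic noisy data (the degrees are chosen purely for illustration):

```python
# A minimal sketch: a too-simple model underfits, a too-flexible model overfits.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(scale=0.3, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 20):                      # too simple, reasonable, too flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree,
          mean_squared_error(y_train, model.predict(X_train)),   # train error
          mean_squared_error(y_test, model.predict(X_test)))     # test error
```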

Day 13 - Bias and Variance

By Jerin Lalichan

Bias and Variance in Machine Learning

With larger datasets and a variety of algorithms, implementation techniques, and learning requirements, it has become very complex to create and analyze machine learning models, since all those factors directly affect the model's performance and accuracy. This is further skewed by incorrect assumptions, outliers, and noise. So it is very important to understand the prediction errors, bias and variance. Gaining proper knowledge about these helps us build accurate models with better performance, without overfitting or underfitting.

Bias

Bias is the phenomenon that skews the output of a model in favor of or against an idea. It can be considered a systematic error within the model, caused by incorrect assumptions in the ML process. A model with high bias pays very little attention to the training data and oversimplifies the problem. In other words, bias is the error between the average model prediction and the ground truth.

Day 12 - How to find the best K value in K-Means Algorithm - Elbow Curve

By Jerin Lalichan

How to do it?

Clustering algorithms like K-Means need the user to input the number of clusters to be formed, so we need to find the optimum number of clusters to generate. A commonly used method is the elbow curve.

Elbow Curve / Knee Curve

K-means works by minimizing the within-cluster sum of squares (WCSS). In this method, we vary the value of K from 1 to 10. For each value of K, the WCSS is calculated; WCSS is the sum of squared distances between each point and its corresponding cluster centroid.

We start with K=1, where the highest value of WCSS is observed. As K increases, WCSS decreases. On the resulting graph, WCSS shows a rapid change up to a certain point (for example K=5), after which the line becomes nearly parallel to the X axis. This point is called the elbow point and is taken as the optimum value of K.
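A minimal elbow-curve sketch with scikit-learn (synthetic blobs with five clusters, used here as an assumption):

```python
# A minimal elbow-curve sketch: WCSS (inertia) for K = 1..10, look for the bend.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=5, random_state=0)
wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)        # inertia_ is the within-cluster sum of squares

plt.plot(range(1, 11), wcss, marker="o")
plt.xlabel("K")
plt.ylabel("WCSS")
plt.show()
```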

Day 11 - K-Means Clustering Algorithm

By Jerin Lalichan

Clustering

Clustering is a technique used to group together objects with similar characteristics. In their simplest form, clusters are sets of data points that share similar attributes, and clustering algorithms are the methods that group these data points into different clusters based on their similarities. K-Means is an iterative, unsupervised machine learning algorithm.

The purpose of clustering and classification algorithms is to make sense of, and extract value from, large sets of structured and unstructured data. If you're working with huge volumes of unstructured data, it makes sense to try to partition the data into some sort of logical groupings before attempting to analyze it.

K-Means Clustering

One of the popular algorithms used for clustering is the k-means algorithm. K-means uses distance as a measure of similarity. Let's say we have a data set with two columns; we need to cluster this data set based on the similarity between observations.
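A minimal K-means sketch for a two-column data set (synthetic blobs are used here as an assumption):

```python
# A minimal K-means sketch: group 2-D points into 3 clusters by distance to centroids.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, n_features=2, random_state=42)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)          # cluster label for every data point
print(kmeans.cluster_centers_)          # final centroid coordinates
```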

Day 10 - Overfitting in Predictive Models

By Jerin Lalichan

Overfitting

Overfitting is a modeling error that occurs when the model aligns very closely with the data points that were used to train it. As a result, the model fails to perform well on other data points; it becomes useful for the training data set only. This happens when the model is trained for too long or when the model is too complex, and it can occur in both regression and classification.

Low bias and high variance are indicators of overfitting. To detect it, a part of the training dataset is set aside as a test set; if the model overfits, the training accuracy will be higher than the testing accuracy.

How to avoid overfitting

Below are a number of techniques that you can use to prevent overfitting:
Early stopping
Train with more data
Data augmentation
Feature selection
Regularization
Ensemble methods
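A minimal sketch of detecting overfitting through the train/test accuracy gap (synthetic noisy data and an unconstrained decision tree, both chosen for illustration):

```python
# A minimal sketch: a near-perfect train score with a much lower test score signals overfitting.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_informative=5, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)   # no depth limit
print("train accuracy:", tree.score(X_train, y_train))   # typically close to 1.0
print("test accuracy:", tree.score(X_test, y_test))      # noticeably lower, indicating overfitting
```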

Day 9 - Hypothesis, Hypothesis Testing

By Jerin Lalichan

Hypothesis

A hypothesis is an educated guess about something that you can test through observation or experimentation. Examples of hypotheses are: a new medicine you think might work, sleeping for 8 hours might improve memory power, drinking coffee before bed may cause delayed sleep, etc. It can be anything that can be tested.

Hypothesis Statement

This is how we represent a hypothesis. It contains an 'if' condition and a 'then' statement. For example: If I (give exams at noon instead of 7) then (student test scores will improve).

Hypothesis Testing

This is a way of testing the outcome of a survey or experiment to see whether the results are reliable and meaningful. Basically, we are finding the odds that our results happened by chance; if they did happen by chance, they are useless or of little use, since they are not replicable.

Steps: figure out the null hypothesis, state the null hypothesis, choose what kind of test to perform, and then either support or reject the null hypothesis.
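A minimal sketch of a hypothesis test: a one-sample t-test on hypothetical sleep-hours data, testing the null hypothesis that the true mean is 8 hours (the numbers are made up for illustration):

```python
# A minimal hypothesis-testing sketch: one-sample t-test against a hypothesized mean of 8.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=7.6, scale=0.8, size=40)    # hypothetical measured sleep hours

t_stat, p_value = stats.ttest_1samp(sample, popmean=8.0)
print("t =", t_stat, "p =", p_value)
if p_value < 0.05:          # conventional significance level
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```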

Day 8 - Bayes Theorem

By Jerin Lalichan

Bayes Theorem

Bayes' theorem is an important theorem in statistics and probability. It gives the probability of occurrence of an event conditioned on another event that has already occurred. In simple words, it determines the conditional probability of event A, given that event B has already occurred.

Difference between conditional probability and Bayes' theorem

Conditional probability: the probability of an event A given the occurrence of another event B.

Bayes' theorem: derived using the definition of conditional probability; its formula involves two conditional probabilities.
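A minimal numeric sketch of Bayes' theorem, P(A|B) = P(B|A) * P(A) / P(B), using hypothetical numbers for a diagnostic-test scenario:

```python
# A minimal Bayes' theorem sketch with made-up numbers for a diagnostic test.
p_disease = 0.01                 # P(A): prior probability of having the disease
p_pos_given_disease = 0.95       # P(B|A): test sensitivity
p_pos_given_healthy = 0.05       # false-positive rate

# Total probability of a positive test, P(B)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Posterior probability of disease given a positive test, P(A|B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))   # approx. 0.161
```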

Day 7 - Central Limit theorem

By Jerin Lalichan

Central Limit Theorem

The central limit theorem states that the distribution of the means of samples taken, with replacement, from a large population approaches a normal distribution, even if the population itself is not normally distributed. As we increase the number of samples, the graph of the sample means approaches a normal distribution; a sample size of 30 or higher is commonly used as the threshold for the theorem to hold in practice.

One of the most important features of the theorem is that the mean of the sample means approaches the mean of the entire population: if we calculate the mean of multiple samples of the population, add them up, and find their average, the result will be an estimate of the population mean. Something similar applies to the standard deviation: averaging the standard deviations of many samples gives an estimate of the standard deviation of the entire population.
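A minimal simulation sketch: sample means drawn from a skewed (exponential) population look approximately normal once the sample size reaches about 30 (the population parameters here are arbitrary assumptions):

```python
# A minimal CLT simulation: means of many samples from a skewed population are roughly normal.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)   # skewed, non-normal population

sample_means = [rng.choice(population, size=30, replace=True).mean()
                for _ in range(5_000)]

plt.hist(sample_means, bins=50)
plt.title("Distribution of sample means (n=30)")
plt.show()
print(np.mean(sample_means), population.mean())   # the two means are close to each other
```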

Day 6 - K-Nearest Neighbor (KNN) Algorithm

By Jerin Lalichan

KNN Algorithm

K-nearest neighbors is a supervised machine learning algorithm which can be used for both regression and classification problems. In this algorithm:
The distance between the test data point and all the training data points is calculated.
The nearest K points are selected.
In the case of regression, the average of those points is taken as the predicted value.
In the case of classification, the probability of each class is calculated and the output is the class with the highest probability.

Commonly used distance metrics are Euclidean distance and Manhattan distance.

Is KNN a lazy learner?

Generally, algorithms can be of two types: lazy learners and eager learners. Eager learners build a generalization from the training data set before receiving test data, in order to predict the output for test data. A lazy learner such as KNN does not build a generalization, i.e. no model is created from the training data; instead, it waits until the test data arrives to do the computation. So basically, it memorizes the training data and defers the real work to prediction time.
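A minimal KNN sketch with scikit-learn (the iris dataset and K=5 are illustrative choices; the default Minkowski metric with p=2 is Euclidean distance):

```python
# A minimal KNN sketch: classify by majority vote among the 5 nearest training points.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)   # Euclidean distance by default (Minkowski, p=2)
knn.fit(X_train, y_train)                   # "training" just stores the data (lazy learner)
print("test accuracy:", knn.score(X_test, y_test))
```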