Posts

Day 25 - Seaborn - Violinplot

By Jerin Lalichan

Violinplot

A violin plot is similar to a boxplot, but it adds a kernel density estimate (KDE) to give a fuller description of the data distribution.

A violin plot plays a similar role to a box-and-whisker plot. It shows the distribution of quantitative data across several levels of one (or more) categorical variables so that those distributions can be compared. Unlike a box plot, in which all of the plot components correspond to actual data points, the violin plot features a kernel density estimation of the underlying distribution.

This can be an effective and attractive way to show multiple distributions of data at once, but keep in mind that the estimation procedure is influenced by the sample size, and violins for relatively small samples might look misleadingly smooth.
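
To make the kernel density estimation idea concrete, here is a minimal sketch using only the Python standard library. It is not seaborn's implementation: the Gaussian kernel, the fixed `bandwidth` of 0.5, and the five sample values are all assumptions chosen for illustration.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump centred on each observation.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical small sample: only five observations (an assumption).
samples = [1.0, 1.5, 2.0, 4.0, 4.5]
density = gaussian_kde(samples, bandwidth=0.5)

# Evaluate the curve on a grid from -2.0 to 8.0; a violin mirrors this
# curve around a central axis.
step = 0.1
grid = [i * step for i in range(-20, 81)]
curve = [density(x) for x in grid]

# Riemann-sum approximation of the area under the curve (should be close to 1).
area = sum(curve) * step
```

Note that `density(3.0)` is positive even though no observation lies near 3.0. This is the smoothing effect the warning about small samples refers to.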

Day 24 - Seaborn - Boxplot

By Jerin Lalichan

Boxplot

A box plot is a visual representation of groups of numerical data through their quartiles. It is also used to detect outliers in a data set.

A box plot (or box-and-whisker plot) shows the distribution of quantitative data in a way that facilitates comparisons between variables or across levels of a categorical variable. The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution, except for points that are determined to be “outliers” using a method that is a function of the interquartile range.
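
The numbers behind a box plot can be sketched with the standard library. This is an illustration under stated assumptions, not the library's code: the helper `box_stats` is hypothetical, the data are made up, and the multiplier `whis=1.5` is the commonly used convention for the interquartile-range rule.

```python
import statistics


def box_stats(values, whis=1.5):
    """Compute the quartiles, whisker ends, and outliers a box plot would draw."""
    # "inclusive" treats the data as containing its own min and max,
    # which gives linear interpolation between observations.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1

    # Points beyond these fences are flagged as outliers.
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr

    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme points that are not outliers.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical data with one obviously extreme value.
data = [2, 3, 4, 5, 6, 7, 8, 9, 30]
stats = box_stats(data)
```

With this data the box spans 4 to 8, the whiskers reach 2 and 9, and the value 30 is reported as an outlier.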

Day 23 - Seaborn - Countplot

By Jerin Lalichan

Countplot

A countplot counts the observations in each category and displays those counts as bars. It is one of the simplest plots provided by the seaborn library.
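
The counting step can be sketched in plain Python. The `species` column below is hypothetical, and the text bars only stand in for the bars a countplot would draw.

```python
from collections import Counter

# Hypothetical categorical column (an assumption for illustration).
species = ["cat", "dog", "cat", "bird", "dog", "cat"]

# Count how many times each category occurs.
counts = Counter(species)

# Print one text "bar" per category, as a stand-in for the plotted bars.
for category, count in counts.items():
    print(f"{category:5s} {'#' * count} ({count})")
```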

Day 22 - Seaborn - Line Plot, Barplot

By Jerin Lalichan

Line Plot

The line plot is one of the most basic plots in the seaborn library. It is mainly used to visualize data that varies continuously, such as a time series.

Barplot

A barplot aggregates numerical data within each category using some estimator, which by default is the mean. It can also be understood as a visualization of a group-by operation. To use this plot, we choose a categorical column for the x-axis and a numerical column for the y-axis, and the plot shows the mean of the numerical column for each category.
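
The group-by-and-average step behind a barplot can be sketched as follows. The `rows` data and the names `day` and `amount` are hypothetical, chosen only to show one categorical and one numerical column.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows of (categorical value, numerical value).
rows = [
    ("Thu", 10.0),
    ("Thu", 14.0),
    ("Fri", 20.0),
    ("Fri", 30.0),
    ("Fri", 40.0),
]

# Group the numerical values by category.
groups = defaultdict(list)
for day, amount in rows:
    groups[day].append(amount)

# Each bar's height is the mean of its group.
bar_heights = {day: mean(amounts) for day, amounts in groups.items()}
```

Here the "Thu" bar would have height 12.0 and the "Fri" bar 30.0.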

Day 21 - Plotting in Seaborn

By Jerin Lalichan

Different categories of plot in Seaborn

Plots are mainly used to show how different variables relate to one another. These variables can be either wholly numerical or categorical, such as a group, class, or division. Seaborn groups its plots into the following categories.

Relational plots: These are used to understand the relation between two variables.
Categorical plots: These deal with categorical variables and how they can be visualized.
Distribution plots: These are used for examining univariate and bivariate distributions.
Regression plots: The regression plots in seaborn are primarily intended to add a visual guide that helps to emphasize patterns in a dataset during exploratory data analyses.
Matrix plots: These display data as a color-encoded matrix, such as a heatmap or cluster map.
Multi-plot grids: These draw multiple instances of the same plot on different subsets of the dataset.

Some ...

Day 20 - Advanced Ensemble Techniques - Bagging

By Jerin Lalichan

Advanced Ensemble Techniques

Bagging

The idea behind bagging is to combine the results of multiple models to get a generalized result. However, models built on the same set of data will give the same result. To get around this, we use a technique called bootstrapping, which is one of the few methods that solve this problem.

Bootstrapping is a sampling technique in which we create subsets of observations from the original dataset, with replacement. The size of the subsets is the same as the size of the original set.

The bagging (or Bootstrap Aggregating) technique uses these subsets (bags) to get a fair idea of the distribution (complete set). The size of subsets created for bagging may be less than the original set. Multiple subsets are created from the original dataset, selecting observations with replacement. A base model (weak model) is created on each of thes...
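
Bootstrapping and aggregation can be sketched with the standard library. This is an illustration under stated assumptions: the helper `bootstrap_samples` is hypothetical, the data are made up, and a simple mean stands in for the base model fitted on each bag.

```python
import random
from statistics import mean


def bootstrap_samples(data, n_bags, rng):
    """Draw `n_bags` samples from `data` with replacement, each of len(data)."""
    return [rng.choices(data, k=len(data)) for _ in range(n_bags)]


# Seeded generator so the sketch is repeatable.
rng = random.Random(0)

# Hypothetical original dataset.
data = [3.0, 5.0, 7.0, 9.0, 11.0]

# Each bag may repeat some observations and omit others.
bags = bootstrap_samples(data, n_bags=100, rng=rng)

# Stand-in for a base model: estimate the mean on each bag (an assumption).
bag_estimates = [mean(bag) for bag in bags]

# Aggregate the per-bag results into a single bagged estimate.
bagged_estimate = mean(bag_estimates)
```

Because sampling is done with replacement, the bags differ from one another even though they are all drawn from the same dataset, which is what lets the per-bag results vary.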

Day 19 - Advanced Ensemble techniques - Blending

By Jerin Lalichan

Advanced Ensemble Techniques

Blending

Blending follows the same approach as stacking but uses only a validation (holdout) set carved out of the train set to make predictions. In other words, unlike stacking, the predictions are made on the holdout set only. The holdout set and the predictions are used to build a model which is run on the test set.

Blending is a technique derived from stacked generalization. The only difference is that in blending, k-fold cross-validation is not used to generate the training data of the meta-model.

Blending uses a single holdout set, that is, a small portion of the training data (validation), to make predictions which will be “stacked” to form the training data of the meta-model. Predictions are also made on the test data to form the meta-model test data. Blending is a similar approach to stacking. The train set is split into trainin...
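
The flow described above can be sketched end to end with toy models. Everything here is an assumption made for illustration: the data follow y = 2x + 1, the base models are a mean predictor and a least-squares line, and the meta-model is a single blending weight chosen on the holdout set.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Base model 1: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Base model 2: one-variable least-squares line."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return lambda x: my + slope * (x - mx)


def mse(preds, ys):
    return mean((p - y) ** 2 for p, y in zip(preds, ys))


def blend(row, w):
    """Meta-model: weighted average of the two base predictions."""
    return w * row[0] + (1 - w) * row[1]


# Hypothetical data, split into train, holdout, and test portions.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
train_x, train_y = xs[:6], ys[:6]
hold_x, hold_y = xs[6:8], ys[6:8]
test_x, test_y = xs[8:], ys[8:]

# Base models are fitted on the training portion only.
base_models = [fit(train_x, train_y) for fit in (fit_mean, fit_line)]

# Base predictions on the holdout set become the meta-model's training data;
# base predictions on the test set become the meta-model's test data.
hold_preds = [[m(x) for m in base_models] for x in hold_x]
test_preds = [[m(x) for m in base_models] for x in test_x]

# Fit the meta-model: pick the weight with the lowest holdout error.
candidates = [i / 10 for i in range(11)]
best_w = min(
    candidates,
    key=lambda w: mse([blend(row, w) for row in hold_preds], hold_y),
)

# Final predictions come from the meta-model applied to the test predictions.
final_preds = [blend(row, best_w) for row in test_preds]
```

No cross-validation is involved: the meta-model sees only the single holdout split, which is the difference from stacking noted above.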