The Datamatics

Posts

Day 24 - Seaborn - Boxplot

- September 23, 2022

By Jerin Lalichan

Boxplot

A box plot (or box-and-whisker plot) is a visual representation that depicts groups of numerical data through their quartiles. It shows the distribution of quantitative data in a way that makes it easy to compare variables, or to compare levels of a categorical variable. It is also used to detect outliers in a dataset. The box shows the quartiles of the dataset. The whiskers extend to show the rest of the distribution, except for points classified as “outliers” by a rule based on the interquartile range (IQR). In seaborn's default setting, these are points lying more than 1.5 × IQR below the first quartile or above the third quartile.
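The statistics behind a single box can be sketched in plain Python. This is not seaborn's own code. The function name `box_stats` is hypothetical, and the `whis=1.5` factor mirrors the default of seaborn's `whis` parameter. Quartile conventions differ between tools. This sketch uses linear interpolation, which Python's `statistics.quantiles` calls `"inclusive"`.

```python
import statistics

def box_stats(values, whis=1.5):
    """Compute box-plot statistics: quartiles, whisker ends and outliers."""
    data = sorted(values)
    # Quartiles with linear interpolation between order statistics
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Points beyond these fences are drawn individually as outliers
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        # Whiskers stop at the most extreme points still inside the fences
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]  # hypothetical data with one extreme value
print(box_stats(sample))  # q1=3.25, median=5.5, q3=7.75, outliers=[40]
```

In this sample the upper fence is 7.75 + 1.5 × 4.5 = 14.5. The value 40 is therefore flagged as an outlier, and the upper whisker stops at 9.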

Day 23 - Seaborn - Countplot

- September 23, 2022

By Jerin Lalichan

Countplot

A countplot counts the occurrences of each category and shows those counts as bars. It can be thought of as a histogram over a categorical, rather than quantitative, variable. It is one of the simplest plots provided by the seaborn library.
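The quantity a countplot draws can be sketched as follows. This is a minimal illustration, not seaborn's implementation. The helper names and the `days` column are hypothetical. In seaborn itself, the plot would be drawn with something like `sns.countplot(data=df, x="day")`.

```python
from collections import Counter

def count_categories(values):
    """Return {category: occurrences}: the bar heights a countplot draws."""
    # Counter is a dict subclass, so keys keep their order of first appearance
    return dict(Counter(values))

def text_countplot(values):
    """Render the counts as a crude horizontal text bar chart."""
    counts = count_categories(values)
    width = max(len(str(k)) for k in counts)
    return "\n".join(f"{str(k):<{width}} | {'#' * n} {n}" for k, n in counts.items())

days = ["Sat", "Sun", "Sat", "Thur", "Fri", "Sat", "Sun"]  # hypothetical column
print(text_countplot(days))
# Sat  | ### 3
# Sun  | ## 2
# Thur | # 1
# Fri  | # 1
```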

Day 22 - Seaborn - Line Plot, Barplot

- September 23, 2022

By Jerin Lalichan

Line Plot

The line plot is one of the most basic plots in the seaborn library. It is mainly used to visualize data that varies continuously, such as a time series.

Barplot

A barplot aggregates a numerical variable within each category using an estimator, which is the mean by default. It can be understood as a visualization of a group-by operation. To use it, we choose a categorical column for the x-axis and a numerical column for the y-axis. The plot then draws one bar per category, whose height is the mean of the y-values in that category.
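The group-by step a barplot performs can be sketched in plain Python. The function `bar_heights` and the `day`/`tip` columns are hypothetical. The `estimator` argument mirrors the idea of seaborn's `estimator` parameter of `sns.barplot`. Seaborn also draws error bars around each height, which this sketch leaves out.

```python
import statistics
from collections import defaultdict

def bar_heights(categories, values, estimator=statistics.mean):
    """Group y-values by their x-category and reduce each group (mean by default)."""
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)
    # One bar per category; height = estimator applied to that category's values
    return {cat: estimator(vals) for cat, vals in groups.items()}

day = ["Thur", "Fri", "Thur", "Sat", "Fri", "Sat"]  # hypothetical categorical column
tip = [2.0, 3.0, 4.0, 5.0, 1.0, 3.0]                # hypothetical numerical column
print(bar_heights(day, tip))                 # {'Thur': 3.0, 'Fri': 2.0, 'Sat': 4.0}
print(bar_heights(day, tip, estimator=max))  # {'Thur': 4.0, 'Fri': 3.0, 'Sat': 5.0}
```

Swapping the estimator changes only the reduction applied to each group, not the grouping itself.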

Day 21 - Plotting in Seaborn

- September 19, 2022

By Jerin Lalichan

Different categories of plot in Seaborn

Plots are mainly used to show how different variables relate to one another. These variables can be either wholly numerical or categorical, such as a group, class, or division. Seaborn divides its plots into the following categories:

- Relational plots: used to understand the relationship between two variables.
- Categorical plots: deal with categorical variables and how they can be visualized.
- Distribution plots: used for examining univariate and bivariate distributions.
- Regression plots: primarily intended to add a visual guide that helps emphasize patterns in a dataset during exploratory data analysis.
- Matrix plots: display data arranged as a matrix, for example as a color-encoded heatmap.
- Multi-plot grids: draw multiple instances of the same plot on different subsets of the dataset.

Some ...

Day 20 - Advanced Ensemble Techniques - Bagging

- September 17, 2022

By Jerin Lalichan

Advanced Ensemble Techniques

Bagging

The idea behind bagging is to combine the results of multiple models to get a generalized result. However, models trained on exactly the same data produce the same result, so combining them gains nothing. Bootstrapping is one common way to solve this problem. Bootstrapping is a sampling technique in which we create subsets of observations from the original dataset, with replacement. A standard bootstrap sample has the same size as the original set. The bagging (or Bootstrap Aggregating) technique uses these subsets (bags) to get a fair idea of the distribution of the complete set. In bagging, the bags may also be smaller than the original set. Multiple subsets are created from the original dataset, selecting observations with replacement. A base model (weak model) is created on each of thes...
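Bootstrapping and aggregation can be sketched with the standard library alone. This is a regression example under stated assumptions. The weak learner here is a one-split regression "stump", chosen only for illustration; the post does not name a base model. Predictions are aggregated by averaging. All function names and the toy data are hypothetical.

```python
import random
import statistics

def bootstrap_sample(data, size=None, rng=random):
    """Draw `size` observations (default: len(data)) with replacement."""
    size = len(data) if size is None else size
    return [rng.choice(data) for _ in range(size)]

def fit_stump(points):
    """Illustrative weak learner: a one-split regression stump on (x, y) pairs."""
    xs = sorted({x for x, _ in points})
    overall = statistics.fmean(y for _, y in points)
    best = (float("inf"), xs[0], overall, overall)  # (sse, threshold, left mean, right mean)
    for t in xs[:-1]:  # try a split after each distinct x except the largest
        left = [y for x, y in points if x <= t]
        right = [y for x, y in points if x > t]
        lm, rm = statistics.fmean(left), statistics.fmean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def bagging_fit(points, n_models=25, seed=0):
    """Fit one base model on each bootstrap bag."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(points, rng=rng)) for _ in range(n_models)]

def bagging_predict(models, x):
    """Aggregate the base models by averaging their predictions."""
    return statistics.fmean(m(x) for m in models)

data = [(x, 0.0 if x < 5 else 10.0) for x in range(10)]  # hypothetical toy data
models = bagging_fit(data)
print(bagging_predict(models, 0), bagging_predict(models, 9))  # close to 0 and 10
```

Each bag contains duplicates and misses some observations, so the stumps differ slightly. Averaging them smooths out those differences. For classification, a majority vote would replace the mean.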

Day 19 - Advanced Ensemble techniques - Blending

- September 17, 2022

By Jerin Lalichan

Advanced Ensemble techniques

Blending

Blending is a technique derived from Stacked Generalization (stacking) and follows the same approach. However, it uses only a validation (holdout) set taken from the train set to make predictions. In other words, unlike stacking, the base models' predictions are made on the holdout set only. The holdout set and these predictions are used to build a meta-model, which is then run on the test set. The key difference from stacking is that blending does not use k-fold cross-validation to generate the meta-model's training data. Instead, it implements a single “holdout set”: a small portion of the training data (validation) is used to make predictions, which are “stacked” to form the meta-model's training data. Predictions are also made on the test data to form the meta-model's test data. The train set is split into trainin...
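The holdout-based workflow can be sketched in plain Python. This is a simplified illustration with several assumptions. There are two hypothetical base models: a least-squares line and a 1-nearest-neighbour regressor. The meta-model is deliberately reduced to a single blending weight `w`, fitted by least squares on the holdout predictions. A typical blender would instead fit, for example, a linear or logistic regression over all base-model predictions. All names and data are hypothetical.

```python
import random
import statistics

def fit_line(points):
    """Base model 1: ordinary least-squares line y = a*x + b."""
    mx = statistics.fmean(x for x, _ in points)
    my = statistics.fmean(y for _, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx if sxx else 0.0
    b = my - a * mx
    return lambda x: a * x + b

def fit_nearest(points):
    """Base model 2: 1-nearest-neighbour regression."""
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]

def blend_fit(train, holdout_frac=0.3, seed=0):
    """Fit base models on part of `train`, then fit the blender on the holdout."""
    rng = random.Random(seed)
    shuffled = train[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_frac))
    fit_part, holdout = shuffled[:cut], shuffled[cut:]
    # 1. Base models see only the training part
    m1, m2 = fit_line(fit_part), fit_nearest(fit_part)
    # 2. Their predictions on the holdout set become the meta-model's training data
    rows = [(m1(x), m2(x), y) for x, y in holdout]
    # 3. Simplified meta-model: weight w minimising the squared error of w*p1 + (1-w)*p2
    num = sum((y - p2) * (p1 - p2) for p1, p2, y in rows)
    den = sum((p1 - p2) ** 2 for p1, p2, _ in rows)
    w = min(1.0, max(0.0, num / den)) if den else 0.5
    # 4. At test time, base predictions are combined by the fitted meta-model
    return (lambda x: w * m1(x) + (1 - w) * m2(x)), w

train = [(x, 2.0 * x + 1.0) for x in range(0, 40, 2)]  # hypothetical noiseless data
predict, w = blend_fit(train)
print(round(w, 3), round(predict(5), 3))
```

The data here are exactly linear, so the blender should put almost all of its weight on the line model. This illustrates how the holdout predictions decide how much each base model is trusted.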
