Showing posts with the label Data Science

K Means Clustering Mathematics & Elbow Method to find optimal value of K | Data Science | Machine Learning | Explanation |

K-means clustering (KMC) is an unsupervised machine learning algorithm, which means there is no supervision (no labelled examples) to help the model learn. In KMC the whole process of grouping the data is done by the model itself: it recognises the features of the data points and puts the points with similar features together in groups called clusters. Basically, it builds multiple clusters of related data by identifying the features in the dataset. (Image credit: giphy.com) Each cluster contains similar data points. For example, suppose you have some apples, oranges and bananas to sort, and you feed them to KMC. KMC will group the fruits that look alike: bananas are long and yellow, an orange is spherical and its colour is orang...
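The excerpt stops before the mathematics, so here is only a minimal sketch of the idea named in the title: run K-means for several values of K, record the within-cluster sum of squares (WCSS), and look for the "elbow" where it stops dropping sharply. The sample points, the `kmeans` helper and its evenly spaced starting centroids are assumptions made for illustration, not the post's own code.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm. Returns (centroids, wcss)."""
    # Assumption for reproducibility: start from evenly spaced data points
    # instead of the usual random initialisation.
    step = len(points) / k
    centroids = [points[int(i * step)] for i in range(k)]

    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )

    # WCSS: squared distance of every point to its closest centroid.
    wcss = sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)
    return centroids, wcss


# Hypothetical 2-D data with three visible groups.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
    (9.0, 1.0), (9.1, 1.2), (8.9, 0.8),
]

wcss_by_k = {k: kmeans(points, k)[1] for k in range(1, 6)}
for k, wcss in wcss_by_k.items():
    print(k, round(wcss, 3))
```

On this toy data the WCSS falls steeply up to K = 3 and only marginally afterwards, which is the bend the elbow method looks for.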

Concept of Random Forest | Mathematics | Machine Learning | ML Algorithm | Data Science

Photo by David Kovalenko on Unsplash. A single decision tree often does not reach the same level of accuracy as other prediction algorithms, which is why the random forest came into the limelight: it uses trees as building blocks to form a more powerful algorithm. In a random forest, each tree is grown with an element of randomness (its root node and later splits are chosen from random subsets of the data and features), and the model is made of more than one decision tree. Hence the name Random Forest. Ensemble Technique: Sometimes we use more than one model together to increase the efficiency of the model and the accuracy of its predictions. This is called an ensemble technique. It has two main types, Bagging and Boosting. Bagging is also known as bootstrap aggregation. In bagging, each base model is trained on a different sample of data drawn from the main dataset. After all the models are trained, a test dataset is fed to all the trained models...
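As a rough illustration of bagging, the sketch below draws bootstrap samples (sampling with replacement), trains one toy base model per sample, and combines their predictions by majority vote. The one-dimensional data, the `train_stump` base model and the choice of 25 models are all assumptions for the example; a real random forest would grow full decision trees instead.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement from the main dataset."""
    return [rng.choice(rows) for _ in rows]


def train_stump(sample):
    """Toy base model (assumption): split halfway between the class means.

    Assumes class 1 tends to have the larger x values.
    """
    xs0 = [x for x, y in sample if y == 0]
    xs1 = [x for x, y in sample if y == 1]
    if not xs0 or not xs1:
        # The sample happened to contain a single class: always predict it.
        only = 1 if xs1 else 0
        return lambda x: only
    threshold = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    return lambda x: 1 if x > threshold else 0


def majority_vote(predictions):
    """Aggregate the base models' answers into one prediction."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical dataset of (feature, label) pairs.
data = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0),
        (6.0, 1), (6.5, 1), (7.0, 1), (7.5, 1)]

rng = random.Random(0)  # fixed seed so the run is repeatable
models = [train_stump(bootstrap_sample(data, rng)) for _ in range(25)]


def forest_predict(x):
    return majority_vote([model(x) for model in models])


print(forest_predict(1.2), forest_predict(7.2))
```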

Concept of Decision Tree Classification | Machine Learning | Data Science | Mathematics

Decision Tree Algorithm for Classification. The decision tree is one of the most popular and widely used algorithms in machine learning. It is a supervised learning algorithm that can be used for both classification and regression. Photo by Fabrice Villard on Unsplash. Let's first see how it works. (Figure: a simple decision tree example.) Now that we are familiar with the decision tree, let's go deeper. Impurity: It is a measure of how impure our data is, that is, how far it is from being homogeneous. Image source: Research Gate. There are several measures of impurity, of which we will learn these two: 1. Entropy: Entropy is the randomness in your dataset, which reduces predictability. It is directly proportional to the non-homogeneity in your dataset. It measures the impurity of a split. Use: We analyse the entropy on every node in the decision tr...
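To make the entropy measure concrete, here is a small sketch using the standard Shannon entropy, H = -Σ p·log2(p), over the class proportions at a node. The label lists are made-up examples, and the `entropy` helper is illustrative rather than taken from the post.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the class labels at one node."""
    total = len(labels)
    proportions = [count / total for count in Counter(labels).values()]
    return sum(-p * math.log2(p) for p in proportions)


pure_node = ["yes", "yes", "yes", "yes"]    # fully homogeneous
mixed_node = ["yes", "yes", "no", "no"]     # evenly mixed, two classes
skewed_node = ["yes", "yes", "yes", "no"]   # mostly one class

print(entropy(pure_node))    # 0.0
print(entropy(mixed_node))   # 1.0
print(entropy(skewed_node))  # somewhere between 0 and 1
```

A pure node has zero entropy, and an evenly mixed two-class node has the maximum of 1 bit, which is why a split that lowers entropy is considered a good one.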

Linear regression to Logistic regression a conversion | Maths Intuition | Data Science | Machine Learning | Mathematics

Photo by Chris Lawton on Unsplash. Logistic regression is a machine learning algorithm used for: 1. Binary classification 2. Multiclass classification. Yes, it is a classification algorithm, not a regression algorithm as the name suggests. Basically, it is used to classify between Yes/No, 0/1 or True/False. It uses the sigmoid function to predict the result. Problem with linear regression: In linear regression, we find the value of a dependent variable with the help of independent variables. It works with continuous values: it draws a regression line through the graph so that the distance between the line and the points is as small as possible. But when it comes to binary classification, it is not possible to draw a regression line through the data points that can give you the best results a...
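The sigmoid function mentioned above squashes any real number into the range (0, 1), so the output of a linear model can be read as a probability and then thresholded into a class. The sketch below assumes a single feature with a made-up weight and bias; the 0.5 threshold is the usual default, not something the excerpt specifies.

```python
import math


def sigmoid(z):
    """Map any real z to (0, 1); written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(x, weight, bias, threshold=0.5):
    """Linear score -> sigmoid probability -> 0/1 class."""
    probability = sigmoid(weight * x + bias)
    return 1 if probability >= threshold else 0


# Hypothetical parameters: the decision boundary sits at x = 2.
weight, bias = 2.0, -4.0

print(sigmoid(0))                 # 0.5
print(predict(3.0, weight, bias)) # 1
print(predict(1.0, weight, bias)) # 0
```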

Plotly & Cufflinks | A Data Visualisation Library with Modern Features | Python | Data Science | Data Visualisation

Plotly: Plotly gives you lots of interactive and dynamic data visualisation and UI tools for data science, machine learning and engineering. It has some great features, which we will discuss in this post. So, be ready for the tutorial. Installation guide: to install Plotly, run this in your terminal: pip install plotly or conda install -c plotly plotly. Cufflinks: To use Plotly directly with pandas we have to connect the two, and the cufflinks library lets us do that, because Plotly is built on top of d3.js. Installation guide: to install cufflinks, run this in your terminal: pip install cufflinks or conda install -c conda-forge cufflinks-py. Let's begin... Import all the libraries needed. import pandas as pd import numpy as np import cufflinks as cf from plotly.offline import download_plotlyjs,init_notebook_...

Concept of Support Vector Regression(SVR) | SVM | Mathematics | Machine Learning

SVM stands for Support Vector Machine; it is a well-known classification and regression algorithm. Today we will talk mostly about support vector regression. So, let's begin... Look at this graph... Suppose you have to classify the elements; how will you do it? You will do something like this... Now, suppose you have this graph... Here no simple straight line can separate the classes, so we add one more axis, the z-axis. From above, it will look like this... What we did in the previous graphs is what SVM actually does. Support Vector Machine is a supervised machine learning algorithm used for classification and regression problems. Terminologies: There are a few terms to learn before going further... Hyperplane: This is the line we drew earlier to separate the data classes in SVM. And in support vector regression ...
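The "add a z-axis" step can be sketched as follows. The excerpt does not say which mapping is used, so this example assumes z = x² + y² (squared distance from the origin), a common choice when one class surrounds the other; the points and the separating height are invented for illustration.

```python
def lift(point):
    """Add a third coordinate z = x^2 + y^2 (an assumed mapping)."""
    x, y = point
    return (x, y, x * x + y * y)


# Hypothetical data: one class near the origin, the other around it.
# No straight line in the x-y plane separates these two groups.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.4)]
outer = [(2.0, 0.0), (0.0, -2.0), (-1.5, 1.5)]

# After lifting, a flat plane at this height separates the classes.
plane_z = 1.0


def side(point):
    """Return 1 if the lifted point lies above the plane, else 0."""
    return 1 if lift(point)[2] > plane_z else 0


print([side(p) for p in inner])  # all below the plane
print([side(p) for p in outer])  # all above the plane
```

Seen from above, the plane's intersection with the lifted surface is a circle, which is the curved boundary the graphs in the post describe.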

Simple and Multiple Linear regression in just 10 lines of code | Python | SciKit Lib | Machine Learning | Sklearn

10-20 years ago, machine learning, data science and artificial intelligence were not topics that came up in our daily conversation. But today we have lots of data collected, so they are coming into the limelight and have already taken up a lot of space in the IT industry. When we have lots of data, we can use it to predict what may happen in the future. We can even use this data to let the machine learn how a human works, so that the machine can take part in humans' daily routine. With the help of machine learning, we can do such things and transform the way humans live, with the machine working for humans. In this post, we will learn a machine learning algorithm that is already used for prediction by top IT companies and businesses. Linear regression is the algorithm which major business uses...

A Quick Guide to Data pre-processing for Machine Learning | Python | IMPUTATION | STANDARDISATION | Data Analysis | Data Science

Data Pre-processing | Imputation | Standardisation | Rescaling | Python. Before feeding your data to a machine learning algorithm, you have to prepare it. The quality of your data has to be good: it should not contain null values or out-of-range values, because the quality of your data is directly related to the quality of your model's training. Photo by Mika Baumeister on Unsplash. 1. Imputation: Imputation simply means replacing missing values with substitute values; this process helps you fill in the missing values in your table. Many algorithms can't deal with null values and might give you errors or a badly trained model. Let's have a look at the data below, which is about the salary of several domains... THE DATASET This data has some null values, which cannot be handled by the machine learning ...
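The excerpt does not show which imputation strategy the post goes on to use, so the sketch below assumes the simplest common one: replace each missing value with the mean of the observed values in that column. The salary figures and the `impute_mean` helper are made up for illustration.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the non-missing entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical salary column with two missing entries.
salaries = [50000, None, 60000, 70000, None]

print(impute_mean(salaries))
# [50000, 60000, 60000, 70000, 60000]
```

Mean imputation keeps the column average unchanged but shrinks its spread, so the median or a model-based fill may suit skewed data better.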