Something you don't know about the Support Vector Machine (SVM) is here | Full Mathematical Intuition | Machine Learning | Data Science


Photo by Liam Tucker on Unsplash


Key Terms:

Hyperplane:

A hyperplane is a flat subspace of dimension (n−1) inside an n-dimensional space. In a 2D plane, the hyperplane is 1D, i.e. a line. In 3D space, the hyperplane is 2D, i.e. a flat plane. Higher dimensions can't be visualised, but they can still be written down on paper. Let's have a look at how it works.

The equation for a hyperplane in two dimensions (2D) is as follows:
β0 + β1X1 + β2X2 = 0        ...    (i)
This is as simple as the equation of a line, right? That's because this hyperplane actually is a line. But in the case of higher dimensions, we have a generalised equation for the n-dimensional hyperplane...
β0 + β1X1 + β2X2 + ... + βnXn = 0    ... (ii)
Equations (i) and (ii) are equal to 0 only if the point X lies exactly on the hyperplane. But if...
β0 + β1X1 + β2X2 + ... + βnXn > 0    ... (iii)
then, in this case, the point X does not lie on the hyperplane; instead, it lies on one side of it. And in the following case, the point lies on the other side of the hyperplane...
β0 + β1X1 + β2X2 + ... + βnXn < 0    ... (iv)
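As a quick illustrative sketch (my own toy example with made-up coefficients, not part of the original derivation), here is how equations (i)–(iv) decide which side of a hyperplane a point lies on:

```python
import numpy as np

# Hypothetical 2D hyperplane: beta0 + beta1*X1 + beta2*X2 = 0
beta0 = -1.0
beta = np.array([2.0, 3.0])  # (beta1, beta2)

def side_of_hyperplane(x):
    value = beta0 + np.dot(beta, x)   # evaluate equation (ii)
    if value > 0:
        return "one side of the hyperplane (iii)"
    elif value < 0:
        return "the other side of the hyperplane (iv)"
    return "on the hyperplane (i)/(ii)"

print(side_of_hyperplane(np.array([1.0, 1.0])))    # -1 + 2 + 3 = 4 > 0
print(side_of_hyperplane(np.array([-1.0, 1.0])))   # -1 - 2 + 3 = 0
```

The sign of β0 + β·x is exactly what a linear classifier uses to assign a class.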
 

Maximal Margin Classifier VS Support Vector Classifier VS Support Vector Machine


The Maximal Margin Classifier (MMC) constructs a maximal margin hyperplane when our data is perfectly separable: the separating hyperplane that is farthest from the training observations. We can then classify each data point based on which side of the hyperplane it falls.

The solid line is the maximal margin hyperplane, the dashed lines mark the margin, and the points lying on the dashed lines are the support vectors.



But there is an issue with the MMC; see the figure below...
In the left figure, the maximal margin hyperplane classifies the two classes successfully, but in the right figure there is one extra data point, due to which our hyperplane shifts drastically.

The maximal margin hyperplane is highly sensitive: it can be affected dramatically by a single point. In the figure above, adding just one point shifts the hyperplane, and as a result it leaves only a tiny margin around some points. This is not acceptable, because the distance between an observation and the hyperplane measures our confidence that the observation is classified correctly. This sensitivity of the hyperplane can also overfit the training data.

So here comes the concept of the Support Vector Classifier. It can be worthwhile for our algorithm to misclassify or ignore a few observations in order to get a stable hyperplane, and the Support Vector Classifier does exactly that for us.

By using a Support Vector Classifier (SVC), we can control how many points we are willing to allow on the incorrect side of the hyperplane, so that in the end we get a stable hyperplane with its margin lines.
It is also called the soft margin classifier, because the margin is soft enough to ignore some observations.
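As a minimal sketch of this trade-off (my own example, not from the original post): in scikit-learn, the C parameter of sklearn.svm.SVC controls how strongly margin violations are penalised. Note that scikit-learn's C is the inverse of the "budget" formulation, so a smaller C here means a softer margin:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two noisy, roughly separable blobs (toy data for illustration only)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # A softer margin (small C) typically keeps more support vectors
    print(f"C={C}: {clf.n_support_.sum()} support vectors")
```

A small C tolerates more observations inside or on the wrong side of the margin, which usually yields a more stable hyperplane.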

But there are also some other issues with the Support Vector Classifier (SVC): when it comes to non-linear class boundaries, the SVC doesn't work properly. Let's have a look at the graph below...

Here we applied the Support Vector Classifier, but we got a linear classification hyperplane, whereas a non-linear boundary is needed to classify this data. The Support Vector Machine (SVM) allows us to determine non-linear boundaries in n-dimensional space with the help of a kernel.


By using a Support Vector Machine (SVM), we can use a kernel to produce a non-linear decision boundary, which works even when the data points are not linearly separable.
Let's have a look at the graph below...

This is the outcome of the SVM after using kernels: a polynomial kernel (left) and a radial kernel (right).
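For the curious, here is a small sketch (my own, using scikit-learn) of fitting the same classifier with different kernels on data that no straight line can separate; the kernel argument is the only thing that changes:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# A ring-shaped dataset: one class surrounds the other,
# so no linear hyperplane can separate them
X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=0)

for kernel, params in [("linear", {}),
                       ("poly", {"degree": 3}),
                       ("rbf", {"gamma": 1.0})]:
    clf = SVC(kernel=kernel, **params).fit(X, y)
    print(f"{kernel}: training accuracy = {clf.score(X, y):.2f}")
```

The linear kernel should struggle here, while the polynomial and radial (RBF) kernels can wrap a non-linear boundary around the inner class.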

The inner product of two observations xi and xi' (each with p features) is given by:

⟨xi, xi'⟩ = xi1 xi'1 + xi2 xi'2 + ... + xip xi'p

The linear Support Vector Classifier can be expressed as:

f(x) = β0 + α1⟨x, x1⟩ + α2⟨x, x2⟩ + ... + αn⟨x, xn⟩

where there are n parameters αi, one per training observation.
So the generalisation of the inner product for the Support Vector Classifier is written as:

K(xi, xi')
Finally, the generalised expression for the Support Vector Classifier is:

f(x) = β0 + Σi∈S αi K(x, xi)

where S is the set of support vectors (αi turns out to be non-zero only for the support vectors).
Here K represents the kernel. A kernel is a function that quantifies the similarity of two observations. In the case of the SVC, the kernel is linear, but one can use another kernel for better results. This is the polynomial kernel:

K(xi, xi') = (1 + xi1 xi'1 + xi2 xi'2 + ... + xip xi'p)^d
Here d is a positive integer, the degree of the polynomial; with d > 1 the kernel essentially fits the SVC in a higher-dimensional space (d = 1 gives back the linear SVC). This is what is called the Support Vector Machine (SVM), because

When the support vector classifier is combined with a non-linear kernel, it is called a Support Vector Machine (SVM).
Now, with the polynomial kernel, the function is represented as:

f(x) = β0 + Σi∈S αi (1 + x1 xi1 + x2 xi2 + ... + xp xip)^d

Apart from that, it can also be represented with the non-linear radial kernel:

f(x) = β0 + Σi∈S αi exp(−γ ((x1 − xi1)² + (x2 − xi2)² + ... + (xp − xip)²))

where γ is a positive constant.
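To make the formulas above concrete, here is a small sketch of the polynomial and radial kernels and the kernelised decision function f(x). The coefficients αi and β0 below are made-up placeholders, purely for illustration (in practice they come out of the SVM optimisation):

```python
import numpy as np

def polynomial_kernel(x, z, d=3):
    # K(x, z) = (1 + <x, z>)^d
    return (1.0 + np.dot(x, z)) ** d

def radial_kernel(x, z, gamma=1.0):
    # K(x, z) = exp(-gamma * ||x - z||^2)
    return np.exp(-gamma * np.sum((x - z) ** 2))

def decision_function(x, support_vectors, alphas, beta0, kernel):
    # f(x) = beta0 + sum over support vectors of alpha_i * K(x, x_i)
    return beta0 + sum(a * kernel(x, sv)
                       for a, sv in zip(alphas, support_vectors))

# Hypothetical learned parameters, for illustration only
support_vectors = [np.array([1.0, 2.0]), np.array([-1.0, 0.5])]
alphas = [0.7, -0.4]   # signed coefficients (class labels absorbed into alpha_i)
beta0 = 0.1

x_new = np.array([0.5, 1.0])
score = decision_function(x_new, support_vectors, alphas, beta0, radial_kernel)
print("f(x) =", score, "-> class", "+1" if score > 0 else "-1")
```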

So, that is all about the mathematics and concepts behind the Support Vector Machine.

If you love my work, you can connect with me on LinkedIn and Github.


