Simple and Multiple Linear Regression in just 10 lines of code | Python | SciKit Lib | Machine Learning | Sklearn

Ten to twenty years ago, machine learning, data science and artificial intelligence were not topics that came up in our daily talk. But today, with so much data being collected, they are in the limelight and already take up a lot of space in the IT industry.

So, when we have lots of data, we can use it to predict what is likely to happen in the future. We can also use the data to let a machine learn how a human works, so that the machine can take part in people's daily routines.

With the help of machine learning, we can do such things and transform the way people live, with machines working for humans.

In this post, we will learn a machine learning algorithm that top IT companies and businesses already use for prediction.

Linear regression is an algorithm that businesses use for forecasting, for example future stock prices, the price of bitcoin a year from now, or the future sales of a company.

So, let's begin with the learning...

Regression:

It is a statistical process by which we can determine the relationship between a dependent variable and one or more independent variables. We do this by fitting a regression line to the data.

Why Regression?

Technically, we need regression to show the relationship between the variables. It is also useful when we want to forecast a result.

Linear Regression:

There are several types of regression, such as Linear Regression, Logistic Regression, Stepwise Regression and ElasticNet Regression. Today we will cover LINEAR REGRESSION.

Linear Regression is an algorithm that models the relationship between the dependent and independent variables as linear, by means of a straight regression line. It is one of the most used algorithms in the business sector and is very simple to implement.
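To make the idea of a regression line concrete, here is a minimal sketch of how the slope and intercept of a straight line can be estimated with the usual least-squares formulas. The toy numbers are invented for illustration and are not taken from the datasets used later in this post.

```python
import numpy as np

# Hypothetical toy data, invented for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])  # lies exactly on y = 2*x + 1

# Least-squares estimates for the line y = intercept + slope * x
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

print(slope, intercept)
```

Because the toy points lie exactly on a line, the sketch recovers that line's slope and intercept. With real, noisy data the same formulas give the best-fitting line in the least-squares sense.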

Types of Linear Regression:

1. Simple Linear Regression

In this type of linear regression, we work with one independent and one dependent variable. For example, how much of a crop is left depends on the number of locusts that attack it. So, the crop is the dependent variable and the locusts are the independent variable.

So, let's begin with the implementation:

import pandas as pd  
import numpy as np  
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

Line 1: Importing pandas for making the data frame.

Line 2: Importing NumPy for working with arrays.

Line 3: Importing Matplotlib's pyplot, the data visualisation library, for plotting graphs.

Line 4: From the scikit-learn library, importing the class LinearRegression.

df= pd.read_csv('C://Users//Vicky//Downloads//MyLinearData.txt')
print(df)

Line 5: Making a data frame by reading the MyLinearData file, which holds comma-separated values.

Line 6: Having a look at the data that we have.

plt.ylabel('SALARY')
plt.xlabel('AGE')
plt.scatter(df.Age,df.Salary,color='red')
plt.show()

Line 7: Naming Y_Label = "SALARY"

Line 8: Naming X_Label = "AGE"

Line 9: Plotting a scatter plot of Age vs. Salary.

Line 10: Displaying the plot.

regmod = LinearRegression()
regmod.fit(df[['Age']],df.Salary)

Line 11: Making an object "regmod" from the class LinearRegression().

Line 12: Fitting the object regmod to the data frame. We have used "df[['Age']]" because scikit-learn expects the features as a 2D array (rows by columns), not as a scalar or 1D array.
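The difference between the 1D and the 2D form is easiest to see by printing the shapes. The sketch below uses a small made-up data frame (df_demo is a hypothetical stand-in, not the MyLinearData file) with the same column names.

```python
import pandas as pd

# Hypothetical stand-in for the data frame; the values are invented.
df_demo = pd.DataFrame({"Age": [25, 32, 41, 50],
                        "Salary": [30000, 42000, 61000, 80000]})

print(df_demo.Age.shape)        # a 1D Series: (4,)
print(df_demo[["Age"]].shape)   # a 2D DataFrame: (4, 1)

# Another way to get a 2D array from a single column
age_2d = df_demo["Age"].to_numpy().reshape(-1, 1)
print(age_2d.shape)             # (4, 1)
```

The double brackets select a list of columns, so the result stays a table with one column, which is the layout fit() expects for the features.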

Now, our Linear regression Model is ready to predict.

plt.ylabel('SALARY')
plt.xlabel('AGE')
plt.scatter(df.Age,df.Salary,color='red')
plt.plot(df.Age,regmod.predict(df[['Age']]))
plt.show()

Line 16: Drawing the regression line over the scatter plot, using the model's predicted Salary for every Age in the data.

predicted_salary=regmod.predict(np.array([47]).reshape(1, 1))
print(predicted_salary)

Line 17: We are predicting the "Salary" for "Age = 47". Again, we have turned the scalar value 47 into a 2D array with the help of NumPy.
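When a model is fitted on a data frame, recent scikit-learn versions may show a warning about missing feature names if predict() is then given a plain NumPy array. One possible way around it is to pass a one-row data frame with the same column name. The sketch below uses invented data (df_demo and model_demo are hypothetical names), so the number it prints is not the salary from this post.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data, invented for illustration: Salary = 2000 * Age exactly.
df_demo = pd.DataFrame({"Age": [20, 30, 40, 50],
                        "Salary": [40000, 60000, 80000, 100000]})

model_demo = LinearRegression()
model_demo.fit(df_demo[["Age"]], df_demo["Salary"])

# One row, one column named "Age": the same layout that was used for fitting
new_point = pd.DataFrame({"Age": [47]})
predicted = model_demo.predict(new_point)
print(predicted)
```

The result is still an array with one prediction per input row, so predicted[0] gives the single value.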

Output: [85104.84608351]

r_sq = regmod.score(df[['Age']],df.Salary)
print(r_sq)

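Line 18: In this line, we find the R^2 value (the coefficient of determination). It is a goodness-of-fit measure, which tells us how well the regression line explains the data. The best possible value is 1. A value of 0 means the model does no better than always predicting the average Salary, and the score can even be negative for a model that does worse than that. A value above 0.5 is often treated as good as a rule of thumb, but what counts as good depends on the problem.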

Output: 0.7520801994669682
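To see what the R^2 value measures, it can also be computed by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below does this on invented numbers (X_demo, y_demo and model_demo are hypothetical names) and compares the result with score().

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data, invented for illustration only.
X_demo = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_demo = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

model_demo = LinearRegression().fit(X_demo, y_demo)
y_pred = model_demo.predict(X_demo)

ss_res = np.sum((y_demo - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)   # total sum of squares
r_sq_manual = 1 - ss_res / ss_tot

# Both numbers should agree
print(r_sq_manual, model_demo.score(X_demo, y_demo))
```

Note that here, as in the post, the score is computed on the same data the model was fitted on, so it says how well the line fits that data rather than how well it predicts unseen data.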

2. Multiple Linear Regression

In Multiple Linear Regression, we use more than one independent variable and only one dependent variable. It can be used wherever one factor depends on multiple independent factors. For example, the height of a child can depend on the height of the father, the height of the mother, nutrition and environmental factors.

Let's look at the implementation...

df2= pd.read_csv('C://Users//Vicky//Downloads//Refactored_Py_DS_ML_Bootcamp-master//11-Linear-Regression//HousingData.csv')
print(df2.head())

Line 1: We have read our dataset, which consists of more than one independent column (called features) and only one dependent column, which is "Price". In the end, we will predict the price of a house from its features.

Line 2: Printing the first five rows of data.

print(df2.columns)

Line 3: Printing all column names.

Output: Index(['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms', 'Avg. Area Number of Bedrooms', 'Area Population', 'Price', 'Address'], dtype='object')

from sklearn.model_selection import train_test_split

Line 4: *Important* We have already imported the main libraries, but this time we also need train_test_split, because we are going to hold back part of the data as test data to check our trained model. So, pay attention to this step, because it is very important.

X=df2[['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms', 'Area Population']]
y=df2['Price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)

Line 5: The independent variables we will use are stored in X. (The "Address" text column and "Avg. Area Number of Bedrooms" are left out.)

Line 6: Our dependent variable, "Price", which is our target variable, is stored in y.

Line 7: We have used train_test_split() to split the data into training data and test data. The training data is used to train the model, and the test data is used later to determine the accuracy of the trained model. test_size represents the proportion of the dataset to include in the test split (here 0.4, i.e. 40%). random_state controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.
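The effect of test_size and random_state can be checked on a tiny made-up array. This is only a sketch: X_demo and y_demo are invented stand-ins for the housing data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples with 2 features each, invented for illustration.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=101)
print(X_tr.shape, X_te.shape)   # 6 training rows and 4 test rows

# Calling it again with the same random_state gives the same split
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=101)
print(np.array_equal(X_te, X_te2))
```

With test_size=0.4, 40% of the rows go to the test set, and fixing random_state makes the shuffle repeatable from run to run.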

regmod2 = LinearRegression()
regmod2.fit(X_train,y_train)

Line 8: Making an instance (object) of the class LinearRegression().

Line 9: Fitting the model to the training data: the independent variables in X_train and the dependent variable in y_train.
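After fitting, the model holds one coefficient per feature plus an intercept, which is what makes it a "multiple" linear regression. The sketch below illustrates this on data generated from a known formula (the data and the names X_demo, y_demo and model_demo are invented for illustration, not the housing data).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data generated from y = 3*x1 + 2*x2 + 5, for illustration only.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(50, 2))
y_demo = 3 * X_demo[:, 0] + 2 * X_demo[:, 1] + 5

model_demo = LinearRegression().fit(X_demo, y_demo)

print(model_demo.coef_)        # one coefficient per feature, close to [3, 2]
print(model_demo.intercept_)   # close to 5
```

On the housing model, regmod2.coef_ and regmod2.intercept_ could be inspected in the same way to see how much each feature contributes to the predicted price.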

predicted_Price=regmod2.predict(X_test)
print(predicted_Price)

Line 10: Predicting the prices for our X_test data and printing all the predictions.

Output: [1259416.93245187 819912.40235391 1745007.23656468 . . . 1188931.9609227
  869356.79699607  696362.65724594]

I will update this post with the accuracy of our model, so stay tuned.
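In the meantime, here is a rough sketch of how the accuracy on the test data could be measured, using r2_score and mean_absolute_error from sklearn.metrics. It runs on randomly generated data (all names and numbers here are invented for illustration), so the values it prints say nothing about the housing model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical noisy data with 3 features, invented for illustration.
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 10, size=(200, 3))
y_demo = X_demo @ np.array([4.0, -2.0, 1.5]) + 10 + rng.normal(0, 1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=101)

model_demo = LinearRegression().fit(X_tr, y_tr)
y_pred = model_demo.predict(X_te)

test_r2 = r2_score(y_te, y_pred)               # R^2 on unseen data
test_mae = mean_absolute_error(y_te, y_pred)   # average absolute error
print(test_r2, test_mae)
```

For the model in this post, the same two calls with y_test and predicted_Price would give the test-set R^2 and the average error in price units.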

So, congratulations 🧨 you are done with Linear Regression! If you have any doubts, let me know in the comments.

If you love my work, you can connect with me on LinkedIn and GitHub.


| INNOVITRONICS |