Decision Tree Regression Full Concepts | Machine Learning | Data Science
Decision Tree Regression with Example | Machine Learning Algorithm | Data Science
Figure: regression tree for the player salary data (sal = mean log salary).
We are predicting a player's salary from the number of years he has played and the number of hits he made in his whole career. The figure above shows the regression tree fitted to this data. It consists of a series of splitting rules, starting at the top of the tree and ending at the level just above the leaves.
In this regression tree, the first split is on experience in years. Players with more than 4.5 years of experience are then split again on the number of hits they made. After these two splits, the players fall into three leaf nodes, which correspond to three regions of the predictor space: R1, R2 and R3.
From the decision tree, we can see that salary depends first on years of experience. Only once a player has more than 4.5 years of experience does the number of hits matter and affect the predicted salary.
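The two splitting rules described above can be read as nested conditions. The following is a minimal sketch only: the 4.5-year cutpoint comes from the text, while `hits_cut`, the three leaf values, and the assumption that R2 holds the experienced players with fewer hits are hypothetical and would have to be read off the fitted tree.

```python
def predict_from_tree(years, hits, hits_cut, leaf_r1, leaf_r2, leaf_r3):
    """Follow the two splitting rules and return the value of the leaf reached.

    hits_cut and the leaf_* values are hypothetical inputs; only the
    4.5-year cutpoint is taken from the text.
    """
    if years < 4.5:
        return leaf_r1   # R1: less experienced players, hits are ignored
    if hits < hits_cut:
        return leaf_r2   # R2 (assumed): experienced players with fewer hits
    return leaf_r3       # R3 (assumed): experienced players with more hits
```

Note that the number of hits is never consulted for a player in R1, which is exactly what the tree expresses.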
Prediction through the regression tree
It contains two steps:
1. We divide the predictor space, that is, the set of possible values of the predictors, into distinct, non-overlapping regions.
The predictors: X1, X2, X3, ..., Xp
J distinct, non-overlapping regions: R1, R2, R3, ..., RJ
2. For every observation that falls into a region Rj, we make the same prediction: the mean of the response values of the training observations in Rj.
How does it work?
Suppose we have three regions R1, R2 and R3, and the mean response of the training observations is 10 in R1, 20 in R2 and 30 in R3. To predict the response for a new observation, we find the region it falls into: if it belongs to R1 we predict 10, if it belongs to R2 we predict 20, and if it belongs to R3 we predict 30.
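The prediction step can be sketched in a few lines of Python. The training values below are hypothetical and were chosen only so that the region means come out to 10, 20 and 30; the region labels are assumed to be known already.

```python
from statistics import mean

# Hypothetical training data: (region label, response value).
training = [("R1", 8), ("R1", 12), ("R2", 18), ("R2", 22), ("R3", 25), ("R3", 35)]

def region_means(observations):
    """Group the responses by region and return the mean response per region."""
    grouped = {}
    for region, y in observations:
        grouped.setdefault(region, []).append(y)
    return {region: mean(ys) for region, ys in grouped.items()}

means = region_means(training)

def predict(region):
    """Every observation that falls into a region gets that region's mean."""
    return means[region]
```

Every observation in the same region receives the same prediction, regardless of where in the region it lies.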
In the figure above the three regions (R1, R2 and R3) are rectangular, although in principle the regions could have any shape. The goal is to find the regions R1, R2, R3, ..., RJ that minimize the RSS (Residual Sum of Squares), which is given by:
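$$ \mathrm{RSS} = \sum_{j=1}^{J} \sum_{i \in R_j} \left( y_i - \hat{y}_{R_j} \right)^2 $$
where \hat{y}_{R_j} is the mean response of the training observations in region Rj.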
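As a rough illustration, the RSS of a given partition can be computed by summing the squared deviations from the mean within each region. The function name `rss` and the list-of-lists input format are assumptions made for this sketch.

```python
def rss(regions):
    """Residual sum of squares for a partition.

    regions: a list with one list of response values per region.
    """
    total = 0.0
    for ys in regions:
        if not ys:
            continue  # an empty region contributes nothing
        region_mean = sum(ys) / len(ys)
        total += sum((y - region_mean) ** 2 for y in ys)
    return total
```

For example, with the hypothetical regions `[[8, 12], [18, 22], [25, 35]]` the three regions contribute 8, 8 and 50, so `rss` returns 66.0.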
It is computationally infeasible to consider every possible partition of the feature space into J regions. That is why we take a top-down, greedy approach known as recursive binary splitting. It is top-down because it starts at the top of the tree, where all observations belong to a single region, and then successively splits the predictor space.
It is greedy because at every step it looks for the best split at that step only, rather than picking a split that would lead to a better tree in some future step. It works only on the present state.
To perform recursive binary splitting, we first select a predictor Xj and a cutpoint s. Here s is a threshold: observations whose value of Xj is below s fall into one region, and observations whose value of Xj is at or above s fall into the other.
Cutpoint : s
Region 1 : { X|Xj < s }
Region 2 : { X|Xj ≥ s }
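A single binary split can be sketched as follows. The helper name `split_rows` and the tuple-per-observation layout are assumptions for illustration, and the sample rows are made up.

```python
def split_rows(rows, j, s):
    """Split observations on predictor index j at cutpoint s.

    Returns (region1, region2), where region1 holds rows with rows[j] < s
    and region2 holds rows with rows[j] >= s.
    """
    region1 = [x for x in rows if x[j] < s]
    region2 = [x for x in rows if x[j] >= s]
    return region1, region2

# Hypothetical (years, hits) observations, split on years (j = 0) at s = 4.5.
rows = [(1, 50), (3, 120), (5, 90), (10, 150)]
region1, region2 = split_rows(rows, 0, 4.5)
```

Every observation lands in exactly one of the two regions, so the regions are non-overlapping by construction.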
The algorithm picks the split that leads to the greatest possible reduction in RSS. To do so, we consider every predictor (X1, ..., Xp) and every possible cutpoint s for each predictor, then select the predictor Xj and cutpoint s for which the resulting RSS is lowest. The same process is then repeated within each of the resulting regions. Throughout, the goal is to minimise the RSS.
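One greedy step of this search might look like the sketch below. It is not a full tree builder: it finds a single best split, and a complete implementation would repeat it inside each resulting region. Taking midpoints between consecutive distinct values as candidate cutpoints is an assumption of this sketch, and the data in the usage example is hypothetical.

```python
def rss_of(ys):
    """Sum of squared deviations of ys from their mean (0.0 if empty)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(X, y):
    """Return (j, s, rss) for the single split with the lowest total RSS.

    X: list of feature tuples, y: list of responses.
    Returns None if no predictor has two distinct values.
    """
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate cutpoints: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            s = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] < s]
            right = [yi for row, yi in zip(X, y) if row[j] >= s]
            total = rss_of(left) + rss_of(right)
            if best is None or total < best[2]:
                best = (j, s, total)
    return best

# Hypothetical (years, hits) data where the response depends only on years.
X = [(1, 50), (2, 150), (3, 60), (6, 40), (7, 160), (8, 55)]
y = [5.0, 5.0, 5.0, 7.0, 7.0, 7.0]
j, s, total = best_split(X, y)
```

Because the search only compares splits available at the current step, it is greedy in the sense described above: it never looks ahead to see which split would give the best tree later on.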
So, this is all about the concept of Decision Tree regression.