11.08.2021
Decision Tree Regression
To understand it more deeply we would need to learn about information entropy for a mathematical insight, but the core idea of this algorithm is that each split adds information to our system, which lets us make better predictions. With the default option (no splits at all), the model would take the average across all of the points, so whatever new point we query, the prediction is always the overall average of the data we had previously. Once the algorithm splits the diagram up into terminal leaves, each leaf predicts the average of only the points inside it, so we can predict the incoming element more accurately.
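A minimal sketch of this idea on made-up toy data (the numbers here are purely illustrative, not from the course dataset): with no split the best constant prediction is the global mean, while a tree with one split predicts each leaf's own mean.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data: two well-separated clusters of points.
X = np.array([[1], [2], [3], [10], [11], [12]])
y = np.array([5.0, 6.0, 7.0, 50.0, 51.0, 52.0])

# Without any split, the only available prediction is the overall average:
print(y.mean())  # 28.5 for every query point

# One split separates the clusters; each terminal leaf predicts its own mean.
tree = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, y)
print(tree.predict([[2], [11]]))  # [ 6. 51.]
```

The second prediction is far closer to the true values than the global average, which is exactly the information the split added.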
- Training the Decision Tree Regression model on the whole dataset
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor

# X (2-D feature matrix) and y (target vector) are assumed to be loaded already
regressor = DecisionTreeRegressor(random_state = 0)
regressor.fit(X, y)
- Predicting a new result
regressor.predict([[6.5]])  # input must be 2-D, hence the double brackets
- Visualising the Decision Tree Regression results (higher resolution)
X_grid = np.arange(min(X), max(X), 0.01)  # dense grid to show the piecewise-constant steps
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X, y, color = 'red')
plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')
plt.title('Truth or Bluff (Decision Tree Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
* Suggestions:
1. If you have a big dataset, just apply the missing-data handling step anyway: if there are no missing values it leaves the dataset unchanged, so you don't have to carefully check whether any data is missing.
2. Check whether there is any categorical data; apply one-hot encoding if there is no order relationship in your categorical variable or the label.
3. Check whether you have to apply feature scaling (decision trees themselves do not require it).
4. If you have several features, you won't be able to plot this figure in 2 dimensions.
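Suggestions 1–3 can be sketched as one preprocessing pipeline. This is a hedged sketch on hypothetical toy data (the column layout and values below are made up for illustration, not the course dataset):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: one categorical column, two numeric columns with gaps.
X = np.array([
    ['Paris',  44.0,   72000.0],
    ['London', 27.0,   np.nan],   # missing value: imputer fills it
    ['Paris',  np.nan, 54000.0],  # if nothing is missing, it is a no-op
], dtype=object)

numeric = Pipeline([
    ('impute', SimpleImputer(strategy='mean')),  # suggestion 1: missing data
    ('scale', StandardScaler()),                 # suggestion 3: feature scaling
])
preprocess = ColumnTransformer([
    ('cat', OneHotEncoder(), [0]),               # suggestion 2: no order relationship
    ('num', numeric, [1, 2]),
])
X_ready = preprocess.fit_transform(X)  # 3 rows, 2 one-hot + 2 numeric columns
```

Running the imputer on a complete dataset simply returns it unchanged, which is why suggestion 1 is safe to apply blindly.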