- Pruning is a method to prevent overfitting in decision trees. With one type of pruning, one first builds a decision tree allowing overfitting to occur, then uses a pruning data set to assess whether a more accurate tree would result if each subtree were replaced by a leaf. This method is called:
a. forward-pruning.
b. backwards-pruning.
c. pre-pruning.
d. post-pruning.
Correct answer: d
Analysis:
Replacing each subtree with a leaf after the tree has been fully grown is the defining step of post-pruning: the tree is built first (allowing overfitting), and a separate pruning set is then used to decide which subtrees to collapse.
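To make the procedure concrete, here is a minimal sketch of reduced-error post-pruning (one common form of post-pruning) on a tree stored as nested dicts. The representation and helper names (`classify`, `prune`, the `attribute`/`children`/`label` keys) are assumptions for illustration, not something given in the quiz:

```python
def classify(tree, example):
    """Follow the example's attribute values down the tree to a leaf label."""
    while isinstance(tree, dict):
        tree = tree["children"][example[tree["attribute"]]]
    return tree

def prune(tree, pruning_set):
    """Reduced-error post-pruning: working bottom-up, replace a subtree with a
    leaf whenever the leaf is at least as accurate on the pruning set."""
    if not isinstance(tree, dict):  # already a leaf
        return tree
    attr = tree["attribute"]
    # First prune each child on the pruning examples routed to it.
    for value, child in tree["children"].items():
        subset = [x for x in pruning_set if x[attr] == value]
        tree["children"][value] = prune(child, subset)
    if not pruning_set:  # no evidence either way: keep the subtree
        return tree
    # Candidate leaf: the majority label of the pruning examples at this node.
    labels = [x["label"] for x in pruning_set]
    majority = max(set(labels), key=labels.count)
    leaf_correct = labels.count(majority)
    subtree_correct = sum(classify(tree, x) == x["label"] for x in pruning_set)
    return majority if leaf_correct >= subtree_correct else tree
```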
- A company decided to use decision tree induction to predict which one of two items customers were more likely to buy based on their demographics. The following tree was obtained using some training data:
Using the fitted tree, predict the item sold to the following customers:
male, rural, old ()
male, rural, young ()
female, urban, young ()
Correct answer: item2; item2; item2
Analysis:
To classify each customer, start at the root and follow the branches matching the customer's attribute values until a leaf is reached; the leaf gives the predicted item.
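As a usage example, the `classify` helper from the pruning sketch above can replay this traversal. The tree below is purely hypothetical (the quiz's actual fitted tree is in a figure not reproduced here); it merely illustrates the mechanics while happening to reproduce the three answers:

```python
# Hypothetical stand-in for the fitted tree (the real one is in the quiz figure).
tree = {"attribute": "area",
        "children": {"rural": "item2",
                     "urban": {"attribute": "age",
                               "children": {"young": "item2",
                                            "old":   "item1"}}}}

customers = [
    {"gender": "male",   "area": "rural", "age": "old"},
    {"gender": "male",   "area": "rural", "age": "young"},
    {"gender": "female", "area": "urban", "age": "young"},
]
for c in customers:
    print(classify(tree, c))  # -> item2, item2, item2 for this hypothetical tree
```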
- A botanist is trying to build an algorithm to predict whether a given mushroom is edible, given some of its attributes. She decides to do so by applying decision tree induction to some data that she collected:
The botanist needs to choose the attribute to be used for the initial split. Help her by computing the information gain corresponding to each attribute (please report at least 3 significant digits):
Information gain for Colour ()
Information gain for Pattern ()
Information gain for Cap ()
Correct answer: 0.0161; 0; 0.0147
Analysis:
First, we calculate the entropy of the whole set. There are 14 examples, 7 edible (yes) and 7 not edible (no), so
$$H(S) = -\tfrac{7}{14}\log_2\tfrac{7}{14} - \tfrac{7}{14}\log_2\tfrac{7}{14} = 1$$
Second, we split on Colour. There are 9 brown examples and 5 red examples. Among the brown examples there are 5 yes and 4 no, so their entropy is
$$H(S_{\text{brown}}) = -\tfrac{5}{9}\log_2\tfrac{5}{9} - \tfrac{4}{9}\log_2\tfrac{4}{9} \approx 0.9911$$
Among the red examples there are 2 yes and 3 no, so their entropy is
$$H(S_{\text{red}}) = -\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5} \approx 0.9710$$
Weighting by the 9 brown and 5 red examples, the expected entropy after the split is
$$\tfrac{9}{14}\times 0.9911 + \tfrac{5}{14}\times 0.9710 \approx 0.9839$$
so the information gain is
$$IG(\text{Colour}) = 1 - 0.9839 \approx 0.0161$$
Third, we calculate the entropy for Pattern. There are 6 solid examples and 8 spotted examples, and each subset contains equal numbers of yes and no (3 vs. 3 for solid, 4 vs. 4 for spotted), so
$$H(S_{\text{solid}}) = H(S_{\text{spotted}}) = 1$$
The expected entropy after the split is
$$\tfrac{6}{14}\times 1 + \tfrac{8}{14}\times 1 = 1$$
so the information gain is
$$IG(\text{Pattern}) = 1 - 1 = 0$$
Finally, we calculate the entropy for Cap. There are 7 flat examples (3 yes, 4 no) and 7 convex examples (4 yes, 3 no), so
$$H(S_{\text{flat}}) = H(S_{\text{convex}}) = -\tfrac{3}{7}\log_2\tfrac{3}{7} - \tfrac{4}{7}\log_2\tfrac{4}{7} \approx 0.9852$$
The expected entropy after the split is
$$\tfrac{7}{14}\times 0.9852 + \tfrac{7}{14}\times 0.9852 \approx 0.9852$$
so the information gain is
$$IG(\text{Cap}) = 1 - 0.9852 \approx 0.0147$$
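As a cross-check, here is a small Python sketch that recomputes the three information gains from the class counts used above (the counts come from the analysis; the original data table is a figure not reproduced here):

```python
from math import log2

def entropy(pos, neg):
    """Binary entropy of a subset with `pos` yes and `neg` no examples."""
    total = pos + neg
    return -sum(c / total * log2(c / total) for c in (pos, neg) if c)

def information_gain(parent, subsets):
    """IG = H(parent) - expected entropy of the child subsets.
    `parent` and each subset are (yes, no) count pairs."""
    n = sum(parent)
    remainder = sum((p + q) / n * entropy(p, q) for p, q in subsets)
    return entropy(*parent) - remainder

whole_set = (7, 7)  # 7 edible, 7 not edible
print(information_gain(whole_set, [(5, 4), (2, 3)]))  # Colour (brown, red)      -> ~0.0161
print(information_gain(whole_set, [(3, 3), (4, 4)]))  # Pattern (solid, spotted) -> 0.0
print(information_gain(whole_set, [(3, 4), (4, 3)]))  # Cap (flat, convex)       -> ~0.0148
```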
- Pruning is based on the assumption that peripheral splits are more likely than those closer to the root to be the result of overfitting. Which observation(s) justifies this assumption?
a. Splits at deeper nodes are supported by weaker statistical evidence since they are based on smaller samples.
b. All other answers.
c. As the tree gets deeper, the major relationships will have already been discovered.
Correct answer: b
Analysis:
Both observations support the assumption: as the tree gets deeper, each split is based on fewer examples and therefore on weaker statistical evidence, and the major relationships in the data have already been captured by the splits near the root. Splits deep in the tree are thus more likely to reflect noise, i.e. overfitting.
- A company produces gadgets of different shapes, colours and materials. They are interested in finding out whether the attributes of the different gadgets can be used to predict their sales. The data collected is reported in the following table:
Two employees decide to apply decision tree induction to these data to learn a rule that predicts the sales of each gadget. One of them decides to use the ID3 algorithm (the one that we have seen during the lecture) while the other one decides to try exhaustively all possible trees and choose the best one. To their surprise, the two trees look different:
Which one is the tree obtained with ID3? ()
What is the Information Gain of the attribute Material if used as the first split? ()
It seems that in this case ID3 has built a suboptimal tree. Why did this happen? ()
Correct answer: A; 0; see the analysis below
Analysis:
Because ID3 is a greedy algorithm that chooses a single attribute at a time, it cannot exploit interactions between attributes. In this case, although Material and Shape taken together classify the data perfectly, they are discarded on the grounds that each one, taken independently, is useless: their information gain is zero (because each splits the data into two subsets with the same proportion of high vs. low Sales as the original set of examples), while that of Colour is slightly greater than zero. Therefore ID3 (and most decision tree algorithms) chooses Colour as the first split. The choice made by ID3 corresponds to tree A.
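The effect is easy to reproduce. The sketch below uses a hypothetical six-row dataset (the quiz's actual table is in a figure not reproduced here) built so that Material and Shape individually have zero information gain but determine Sales perfectly together, while Colour has a small positive gain; ID3's greedy criterion would therefore pick Colour first:

```python
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    h = 0.0
    for v in set(labels):
        p = labels.count(v) / len(labels)
        h -= p * log2(p)
    return h

def information_gain(rows, attr):
    """IG of splitting `rows` on `attr`, with 'Sales' as the class."""
    labels = [r["Sales"] for r in rows]
    gain = entropy(labels)
    for value in set(r[attr] for r in rows):
        subset = [r["Sales"] for r in rows if r[attr] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

# Hypothetical data mimicking the quiz's situation: Sales is high exactly
# when Material and Shape "agree" (metal+round or plastic+square).
rows = [
    {"Material": "metal",   "Shape": "round",  "Colour": "red",  "Sales": "high"},
    {"Material": "metal",   "Shape": "square", "Colour": "red",  "Sales": "low"},
    {"Material": "plastic", "Shape": "round",  "Colour": "blue", "Sales": "low"},
    {"Material": "plastic", "Shape": "square", "Colour": "blue", "Sales": "high"},
    {"Material": "metal",   "Shape": "round",  "Colour": "red",  "Sales": "high"},
    {"Material": "plastic", "Shape": "square", "Colour": "red",  "Sales": "high"},
]
for attr in ("Material", "Shape", "Colour"):
    print(attr, round(information_gain(rows, attr), 4))
# Material 0.0, Shape 0.0, Colour ~0.0441: the greedy first split goes to Colour,
# even though Material and Shape together classify Sales perfectly.
```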