Chapter 1.5: Decision Theory
PRML, Oxford University Deep Learning Course, Machine Learning, Pattern Recognition
Christopher M. Bishop, PRML, Chapter 1 Introduction
1. The three theories PRML builds on:
- Probability theory: provides us with a consistent mathematical framework for quantifying and manipulating uncertainty.
- Decision theory: allows us to make optimal decisions in situations involving uncertainty such as those encountered in pattern recognition.
- Information theory: quantifies how much information is carried by data and probability distributions (developed later in the book, in Section 1.6).
Inference step & Decision step
- The joint probability distribution p(x,t) provides a complete summary of the uncertainty associated with these variables. Determination of p(x,t) from a set of training data is an example of inference and is typically a very difficult problem whose solution forms the subject of much of this book.
- In a practical application, however, we must often make a specific prediction for the value of t, or more generally take a specific action based on our understanding of the values t is likely to take, and this aspect is the subject of decision theory.
2. An example
Problem Description:
Consider, for example, a medical diagnosis problem in which we have taken an X-ray image of a patient, and we wish to determine whether the patient has cancer or not.
- Representation: choose t to be a binary variable such that t = 0 corresponds to class C1 and t = 1 corresponds to class C2.
- Inference Step: The general inference problem then involves determining the joint distribution p(x, Ck), or equivalently p(x, t), which gives us the most complete probabilistic description of the situation.
- Decision Step: In the end we must decide either to give treatment to the patient or not, and we would like this choice to be optimal in some appropriate sense. This is the decision step, and it is the subject of decision theory to tell us how to make optimal decisions given the appropriate probabilities.
How to predict?
Using Bayes’ theorem, these probabilities can be expressed in the form
$$
\underbrace{p(C_k \mid x)}_{\text{Posterior}}
= \frac{\text{Likelihood} \cdot \text{Prior}}{\text{Evidence}}
= \frac{p(x \mid C_k)\, p(C_k)}{p(x)}
= \frac{p(x \mid C_k)\, p(C_k)}{\sum_{j=1}^{2} p(x \mid C_j)\, p(C_j)}
= \frac{p(x \mid C_k)\, p(C_k)}{p(x \mid C_1)\, p(C_1) + p(x \mid C_2)\, p(C_2)}
$$
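A minimal numeric sketch of this formula for the two-class cancer example. All likelihood and prior values below are illustrative assumptions, not figures from the text:

```python
# Bayes' theorem for two classes: p(C_k | x) = p(x | C_k) p(C_k) / p(x).

def posterior(likelihoods, priors):
    """Return p(C_k | x) for each class, given p(x | C_k) and p(C_k)."""
    # Evidence p(x) = sum_j p(x | C_j) p(C_j), the normalizing constant.
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    return [l * p / evidence for l, p in zip(likelihoods, priors)]

# Assumed values: C1 = cancer (rare prior), C2 = no cancer.
# p(x | C1) = 0.9, p(x | C2) = 0.2, p(C1) = 0.01, p(C2) = 0.99.
post = posterior([0.9, 0.2], [0.01, 0.99])
print(post)  # posteriors are normalized to sum to 1
```

Note that even with a high likelihood p(x | C1), the small prior p(C1) keeps the cancer posterior modest, which is exactly why the decision step below matters.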
If our aim is to minimize the chance of assigning x to the wrong class Ck, k = 1, 2, then intuitively we would choose the class having the higher posterior probability. We now show that this intuition is correct, and we also discuss more general criteria for making decisions. Our objectives vary among the following:
- Minimizing the misclassification rate;
- Minimizing the expected loss.

Supplement: criteria for making decisions [Ref-1]
1) Minimizing the misclassification rate.
2) Minimizing the expected loss: the consequences of the two kinds of error may differ. For example, diagnosing cancer as "no cancer" is far more serious than diagnosing "no cancer" as cancer; likewise, classifying a legitimate email as spam is more costly than letting a spam email through. In such cases, avoiding the first kind of error matters more than avoiding the second, so we introduce a loss function to quantify the cost of each kind of error.
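This asymmetric-cost idea can be sketched as a minimum-expected-loss decision rule. The loss matrix entries below are illustrative assumptions chosen to make missing a cancer far more costly than a false alarm:

```python
# Loss matrix: loss[k][j] = cost of deciding class j when the truth is C_k.
# Rows/columns: index 0 = cancer, index 1 = normal (assumed convention).
loss = [[0, 1000],   # missing a cancer is assumed very costly
        [1,    0]]   # a false alarm costs comparatively little

def decide(posteriors, loss):
    """Pick the class j minimizing the expected loss sum_k loss[k][j] p(C_k | x)."""
    expected = [sum(loss[k][j] * posteriors[k] for k in range(len(posteriors)))
                for j in range(len(loss[0]))]
    return min(range(len(expected)), key=expected.__getitem__)

# Even with only a 5% cancer posterior, the asymmetric loss favours "cancer":
print(decide([0.05, 0.95], loss))  # -> 0
```

With a symmetric 0-1 loss, this rule reduces to picking the class with the highest posterior, i.e. the misclassification-rate criterion in 1) above.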
Let the set A = {