Week 3-1: Logistic Regression
Question 1
Suppose that you have trained a logistic regression classifier, and it outputs on a new example x a prediction $h_\theta(x) = 0.4$. This means (check all that apply):
- Our estimate for $P(y=0|x;\theta)$ is 0.4.
- Our estimate for $P(y=1|x;\theta)$ is 0.6.
- Our estimate for $P(y=0|x;\theta)$ is 0.6.
- Our estimate for $P(y=1|x;\theta)$ is 0.4.
* Answer: 3 4 *
Explanation: $h_\theta(x)$ gives the probability that the output is 1, so 0.4 is the probability that y = 1, and $P(y=0|x;\theta) = 1 - 0.4 = 0.6$.
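As a sanity check, here is a minimal NumPy sketch of this interpretation; the θ and x values are made up so that $h_\theta(x) \approx 0.4$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-0.4, 0.0])   # hypothetical parameters, chosen so h ≈ 0.4
x = np.array([1.0, 0.0])        # example x with intercept term x0 = 1
h = sigmoid(theta @ x)

print(f"P(y=1|x;theta) = {h:.2f}")      # 0.40
print(f"P(y=0|x;theta) = {1 - h:.2f}")  # 0.60: the two probabilities sum to 1
```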
Question 2
Suppose you have the following training set, and fit a logistic regression classifier
$h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$
Which of the following are true? Check all that apply.
- Adding polynomial features (e.g., instead using $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_1 x_2 + \theta_5 x_2^2)$) could increase how well we can fit the training data.
- At the optimal value of θ (e.g., found by fminunc), we will have $J(\theta) \ge 0$.
- Adding polynomial features (e.g., instead using $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_1 x_2 + \theta_5 x_2^2)$) would increase $J(\theta)$ because we are now summing over more terms.
- If we train gradient descent for enough iterations, for some examples $x^{(i)}$ in the training set it is possible to obtain $h_\theta(x^{(i)}) > 1$.
* Answer: 1 2 *
* With one feature the fit is a straight line; with two, a curve; with more features, an increasingly wiggly curve. *
* As features are added the model fits the training set more and more closely, i.e., the cost function gets smaller and smaller. *
* Option 1: adding features lets us fit the training data better. Correct. *
* Option 2: at the optimal θ, $J(\theta)$ might be 0, but in general it is greater than 0; either way $J(\theta) \ge 0$. Correct. *
* Option 3: the exact opposite of option 1; richer features can only lower (or keep) the training cost, not raise it. Incorrect. *
* Option 4: since $0 < h_\theta(x^{(i)}) < 1$, the output lies strictly between 0 and 1 and can never exceed 1. Incorrect. *
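Why option 2 holds in general: each term of the logistic cost is a negative log of a number in (0, 1), hence non-negative. A minimal sketch, assuming the standard unregularized cost from the lectures (the data and θ values below are invented for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) )
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Made-up training set: rows are examples, columns are [x0=1, x1, x2].
X = np.array([[1.0, 0.5, 1.2],
              [1.0, 2.3, 0.1],
              [1.0, 1.1, 2.0]])
y = np.array([0.0, 1.0, 1.0])

# Since 0 < h < 1, both log terms are <= 0, so any theta gives J >= 0.
for theta in (np.zeros(3), np.array([0.1, -0.2, 0.3])):
    print(cost(theta, X, y))
```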
Question 3
For logistic regression, the gradient is given by
$\frac{\partial}{\partial\theta_j} J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$
Which of these is a correct gradient descent update for logistic regression with a learning rate of α? Check all that apply.
- $\theta := \theta - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(\theta^T x - y^{(i)}\right)x^{(i)}$
- $\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(\frac{1}{1+e^{-\theta^T x^{(i)}}} - y^{(i)}\right)x_j^{(i)}$ (simultaneously update for all j).
- $\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)}$ (simultaneously update for all j).
- $\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$ (simultaneously update for all j).
* Answer: 2 4 *
* Option 1: $\theta^T x$ is not passed through the sigmoid, so this is the linear regression update rather than the logistic one. Incorrect. *
* Option 2: $\frac{1}{1+e^{-\theta^T x^{(i)}}}$ is exactly $h_\theta(x^{(i)})$, so this matches the gradient above. Correct. *
* Option 3: the difference from option 4 is $x^{(i)}$ versus $x_j^{(i)}$; the update for $\theta_j$ must use the j-th feature value $x_j^{(i)}$ (work through the derivation if this is unclear). Incorrect. *
* Option 4: Correct. *
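The correct updates (options 2 and 4) are usually written in vectorized form, which makes "simultaneously update for all j" automatic. A minimal sketch; α, the iteration count, and the dataset are placeholders:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = sigmoid(X @ theta)          # h_theta(x^(i)) for every example i
        grad = X.T @ (h - y) / m        # (1/m) * sum_i (h - y^(i)) x_j^(i)
        theta = theta - alpha * grad    # one simultaneous update of all theta_j
    return theta

# Tiny made-up dataset: columns are [x0=1, x1]; label is 1 when x1 >= 3.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(gradient_descent(X, y))
```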
Question 4
Which of the following statements are true? Check all that apply.
- For logistic regression, sometimes gradient descent will converge to a local minimum (and fail to find the global minimum). This is the reason we prefer more advanced optimization algorithms such as fminunc (conjugate gradient/BFGS/L-BFGS/etc).
- The sigmoid function $g(z) = \frac{1}{1+e^{-z}}$ is never greater than one (> 1).
- The cost function $J(\theta)$ for logistic regression trained with $m \ge 1$ examples is always greater than or equal to zero.
- Linear regression always works well for classification if you classify by using a threshold on the prediction made by linear regression.
* Answer: 2 3 *
* Option 1: gradient descent does find the global minimum here, because the logistic regression cost function is convex. The advantage of the more advanced algorithms is that there is "no need to pick α" and they usually converge faster, not that they avoid local minima. Incorrect. *
* Option 2: the range of the sigmoid function is (0, 1). Correct. *
* Option 3: the cost function is always greater than or equal to 0. Correct. *
* Option 4: thresholding a linear regression prediction can work poorly for classification; a single outlier, for example, shifts the fitted line and moves examples across the threshold, so "always works well" is false. Incorrect. *
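A quick numeric illustration of option 2's bound (the probe values are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Mathematically 0 < g(z) < 1 for every finite z; note that float64
# rounds g(z) to exactly 1.0 once z is very large (roughly z > 37).
for z in [-30.0, -5.0, 0.0, 5.0, 30.0]:
    g = sigmoid(z)
    print(f"g({z:+5.1f}) = {g:.16f}   inside (0,1): {0.0 < g < 1.0}")
```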
Question 5
Suppose you train a logistic classifier
$h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$
Suppose $\theta_0 = 6$, $\theta_1 = -1$, $\theta_2 = 0$. Which of the following figures represents the decision boundary found by your classifier?
* Answer: the answer figures failed to load after several refreshes, leaving only the stem, and a guess happened to be right. *
* The boundary can still be derived: predict y = 1 when $\theta_0 + \theta_1 x_1 + \theta_2 x_2 = 6 - x_1 \ge 0$, i.e., when $x_1 \le 6$, so the decision boundary is the vertical line $x_1 = 6$ with y = 1 predicted on its left; the sketch below confirms this. *
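A minimal sketch checking that boundary (the probe points and the x2 value are arbitrary, since $\theta_2 = 0$ makes x2 irrelevant):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([6.0, -1.0, 0.0])   # theta_0 = 6, theta_1 = -1, theta_2 = 0

# The prediction flips exactly at x1 = 6; x2 has no effect.
for x1 in [0.0, 5.9, 6.0, 6.1, 10.0]:
    h = sigmoid(theta @ np.array([1.0, x1, 3.0]))
    print(f"x1 = {x1:4.1f} -> h = {h:.3f} -> predict y = {int(h >= 0.5)}")
```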