1 Suppose that you have trained a logistic regression classifier, and it outputs on a new example $x$ a prediction $h_\theta(x) = 0.4$. This means (check all that apply):
Our estimate for $P(y=0 \mid x;\theta)$ is 0.4.
Our estimate for $P(y=0 \mid x;\theta)$ is 0.6.
$h_\theta(x)$ predicts the probability that $y = 1$, so $P(y=0 \mid x;\theta) = 1 - 0.4 = 0.6$; choose A and D.
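The complement relationship can be checked in a couple of lines (a minimal sketch; the value 0.4 comes from the question):

```python
# h_theta(x) is interpreted as the estimate of P(y = 1 | x; theta),
# so the estimate of P(y = 0 | x; theta) is its complement.
h = 0.4                 # the classifier's output on the new example
p_y1 = h                # estimate of P(y = 1 | x; theta)
p_y0 = 1.0 - h          # estimate of P(y = 0 | x; theta) = 0.6
print(p_y1, p_y0)
```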
2
Suppose you have the following training set, and fit a logistic regression classifier $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$.
Which of the following are true? Check all that apply.
Adding polynomial features (e.g., instead using $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_1 x_2 + \theta_5 x_2^2)$) could increase how well we can fit the training data.
True: adding polynomial features lets the hypothesis fit the training set better.
At the optimal value of $\theta$ (e.g., found by fminunc), we will have $J(\theta) \geq 0$.
True: even at the optimal $\theta$ we have $J(\theta) \geq 0$, since every term of the cost is non-negative.
Adding polynomial features (e.g., instead using $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_1 x_2 + \theta_5 x_2^2)$) would increase $J(\theta)$ because we are now summing over more terms.
False: the sum in $J(\theta)$ runs over the $m$ training examples, not over the features, so adding features does not add terms to it. We can control $J(\theta)$ through the choice of $\theta$; for instance, setting $\theta_3 = \theta_4 = \theta_5 = 0$ recovers the original hypothesis, so the optimal cost cannot increase.
If we train gradient descent for enough iterations, for some examples $x^{(i)}$ in the training set it is possible to obtain $h_\theta(x^{(i)}) > 1$.
False: the sigmoid output $h_\theta(x)$ always lies strictly between 0 and 1.
Choose A and B.
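Why $J(\theta) \geq 0$ follows directly from the cross-entropy cost: since $g(z) \in (0, 1)$, both $-\log h$ and $-\log(1-h)$ are positive, so every summand is non-negative. A minimal sketch, with toy data invented for illustration:

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z}); output is in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def cost(theta, X, y):
    """Logistic regression cost
    J(theta) = -(1/m) * sum_i [ y_i*log(h_i) + (1 - y_i)*log(1 - h_i) ].
    Each summand is >= 0 because h_i lies strictly in (0, 1),
    so J(theta) >= 0 for any theta."""
    m = len(y)
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * x for t, x in zip(theta, x_i)))
        total += -(y_i * math.log(h) + (1 - y_i) * math.log(1 - h))
    return total / m

# Toy training set: each row is [1, x1, x2] (bias term included).
X = [[1.0, 2.0, 1.0], [1.0, -1.0, 0.5], [1.0, 0.0, -2.0]]
y = [1, 0, 0]
print(cost([0.1, -0.2, 0.3], X, y))   # non-negative for any theta
```

With $\theta = 0$ every prediction is 0.5, so the cost is exactly $\log 2 \approx 0.693$, the familiar "coin-flip" baseline.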
3
- For logistic regression, the gradient is given by $\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m}\sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$. Which of these is a correct gradient descent update for logistic regression with a learning rate of $\alpha$? Check all that apply.
- $\theta := \theta - \alpha \frac{1}{m}\sum_{i=1}^{m} \left(\theta^T x - y^{(i)}\right) x^{(i)}$
- $\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m} \left(\frac{1}{1 + e^{-\theta^T x^{(i)}}} - y^{(i)}\right) x_j^{(i)}$ (simultaneously update for all $j$)
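The second update is the correct one, since $\frac{1}{1 + e^{-\theta^T x^{(i)}}} = h_\theta(x^{(i)})$; the first wrongly substitutes the linear-regression hypothesis $\theta^T x$ for $h_\theta(x)$. One simultaneous update can be sketched as follows (variable names are my own):

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(theta, X, y, alpha):
    """One gradient descent step for logistic regression:
    theta_j := theta_j - alpha * (1/m) * sum_i (h(x_i) - y_i) * x_ij,
    computed for all j simultaneously (grad is built from the old theta
    before any component is updated)."""
    m = len(y)
    n = len(theta)
    grad = [0.0] * n
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * x for t, x in zip(theta, x_i)))
        for j in range(n):
            grad[j] += (h - y_i) * x_i[j]
    return [theta[j] - alpha * grad[j] / m for j in range(n)]

# Tiny example: two points, features [1, x1], starting from theta = 0.
theta = gradient_step([0.0, 0.0], [[1.0, 1.0], [1.0, -1.0]], [1, 0], 1.0)
print(theta)   # [0.0, 0.5]
```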
4 Which of the following statements are true? Check all that apply.
The one-vs-all technique allows you to use logistic regression for problems in which each $y^{(i)}$ comes from a fixed, discrete set of values.
For logistic regression, sometimes gradient descent will converge to a local minimum (and fail to find the global minimum). This is the reason we prefer more advanced optimization algorithms such as fminunc (conjugate gradient/BFGS/L-BFGS/etc). False: the logistic regression cost is convex, so gradient descent cannot get stuck in a local minimum.
Since we train one classifier when there are two classes, we train two classifiers when there are three classes (and we do one-vs-all classification). False: one-vs-all trains one classifier per class, so three classes require three classifiers; the two-class case is special in that a single classifier suffices.
The cost function $J(\theta)$ for logistic regression trained with $m \geq 1$ examples is always greater than or equal to zero.
Choose A and D.
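One-vs-all in code: for each class $k$ we relabel the training set as "$k$ vs. not $k$", train a binary classifier on it, and at prediction time pick the class whose classifier is most confident. A minimal sketch (the classifiers below are stand-in functions, not trained models):

```python
def one_vs_all_labels(y, k):
    """Relabel a multiclass training set for the class-k binary problem:
    1 where y == k, 0 elsewhere."""
    return [1 if y_i == k else 0 for y_i in y]

def predict_one_vs_all(classifiers, x):
    """classifiers maps class k -> h_k; return the class whose
    classifier reports the highest estimated P(y = k | x)."""
    return max(classifiers, key=lambda k: classifiers[k](x))

# Three classes -> three binary classifiers (stand-ins here).
classifiers = {0: lambda x: 0.2, 1: lambda x: 0.7, 2: lambda x: 0.1}
print(one_vs_all_labels([0, 1, 2, 1], 1))      # [0, 1, 0, 1]
print(predict_one_vs_all(classifiers, None))   # 1
```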