Which of the following are true? Check all that apply.
J(θ) will be a convex function, so gradient descent should converge to the global minimum.
convex function is a quandratic function.
Adding polynomial features (e.g., instead using hθ(x)=g(θ0+θ1x1+θ2x2+θ3x21+θ4x1x2+θ5x22) ) could increase how well we can fit the training data.
The positive and negative examples cannot be separated using a straight line. So, gradient descent will fail to converge.
[EX] the positive and negative examples cannot be separeted using a straight line, but when using the polynomial models , the gradient descent will still effective to converge
Because the positive and negative examples cannot be separated using a straight line, linear regression will perform as well as logistic regression on this data.
[EX ]linear regression often do not work well in classification problems.
2. Which of the following statements are true? Check all that apply.
The cost function J(θ) for logistic regression trained with m≥1 examples is always greater than or equal to zero.
For logistic regression, sometimes gradient descent will converge to a local minimum (and fail to find the global minimum). This is the reason we prefer more advanced optimization algorithms such as fminunc (conjugate gradient/BFGS/L-BFGS/etc).
[] not for this reason, those three ads faster than gradient descent and you don't need to manully pick alpha.
The one-vs-all technique allows you to use logistic regression for problems in which each y(i) comes from a fixed, discrete set of values.
Since we train one classifier when there are two classes, we train two classifiers when there are three classes (and we do one-vs-all classification).
[]we train one classifier for each class
3. You are training a classification model with logistic regression. Which of the following statements are true? Check all that apply.
Adding a new feature to the model always results in equal or better performance on examples not in the training set.
Introducing regularization to the model always results in equal or better performance on the training set.
【解析】Adding
more features might result in a model that overfits the training set, and thus can lead to worse performs for examples which are not in the training set.
Adding many new features to the model makes it more likely to overfit the training set.
Introducing regularization to the model always results in equal or better performance on examples not in the training set.
【解析】If
we introduce too much regularization, we can underfit the training set and have worse performance on the training set.