Logistic Regression
Suppose you have the following training set, and fit a logistic regression classifier hθ(x)=g(θ0+θ1x1+θ2x2).
Which of the following are true? Check all that apply.
- Adding polynomial features (e.g., instead using hθ(x)=g(θ0+θ1x1+θ2x2+θ3x1²+θ4x1x2+θ5x2²)) could increase how well we can fit the training data. T
Adding features can only enlarge the hypothesis space, so the model can fit the training set at least as well as before (see the polynomial-features sketch after this list).
- At the optimal value of θ (e.g., found by fminunc), we will have J(θ)≥0. T
The cost function J(θ) is always non-negative for logistic regression (it is written out after this list).
- The positive and negative examples cannot be separated using a straight line. So, gradient descent will fail to converge. F
While it is true that the examples cannot be separated by a straight line, gradient descent will still converge to the global minimum of J(θ); some examples will simply remain misclassified at the optimum (see the gradient-descent sketch after this list).
- J(θ) will be a convex function, so gradient descent should converge to the global minimum. T
The cost function for logistic regression is convex, so gradient descent will always converge to the global minimum.
- The cost function J(θ) for logistic regression trained with m≥1 examples is always greater than or equal to zero. T
- For logistic regression, sometimes gradient descent will converge to a local minimum (and fail to find the global minimum). This is the reason we prefer more advanced optimization algorithms such as fminunc (conjugate gradient/BFGS/L-BFGS/etc.). F
The cost function for logistic regression is convex, so gradient descent will always converge to the global minimum. We still might use a more advanced optimization algorithm, since such algorithms can be faster and don't require you to select a learning rate.
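To illustrate the first point, here is a minimal Python sketch. The circular synthetic data and the scikit-learn usage are my own assumptions for illustration (this is not the training set from the question); it adds the same x1², x1x2, x2² terms as the expanded hypothesis:

```python
# Minimal sketch: synthetic, non-linearly-separable data
# (assumed for illustration; not the quiz's training set).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0]**2 + X[:, 1]**2 > 1).astype(int)  # circular decision boundary

# Linear features only: h(x) = g(th0 + th1*x1 + th2*x2)
linear = LogisticRegression(max_iter=1000).fit(X, y)
print("linear features, training accuracy:", linear.score(X, y))

# Add x1^2, x1*x2, x2^2, matching the expanded hypothesis above
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
poly = LogisticRegression(max_iter=1000).fit(X_poly, y)
print("degree-2 features, training accuracy:", poly.score(X_poly, y))
```

On data like this, a linear decision boundary cannot do much better than predicting the majority class, while the quadratic features let the classifier fit the training set almost perfectly.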
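The statements that J(θ)≥0 follow directly from the form of the (unregularized) logistic regression cost function used in the course:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[\,y^{(i)}\log h_\theta\big(x^{(i)}\big) + \big(1-y^{(i)}\big)\log\big(1-h_\theta(x^{(i)})\big)\right]$$

Because hθ(x) is a sigmoid output strictly between 0 and 1, both logarithms are negative, so the bracketed term is ≤ 0 for each example and the leading minus sign makes every summand ≥ 0. Hence J(θ) ≥ 0 for every θ and every m≥1, in particular at the optimum found by fminunc.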
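Finally, for the claim about non-separable data, here is a small numpy sketch (the overlapping synthetic data and step size are assumptions for illustration): the cost decreases to a positive floor and some points stay misclassified, but gradient descent converges rather than failing.

```python
# Minimal sketch: batch gradient descent for logistic regression on
# overlapping (not linearly separable) synthetic data.
import numpy as np

rng = np.random.default_rng(1)
m = 200
X = np.c_[np.ones(m), rng.normal(size=(m, 2))]               # intercept + 2 features
y = (X[:, 1] + 0.5 * rng.normal(size=m) > 0).astype(float)  # noisy, overlapping labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta):
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

theta = np.zeros(3)
alpha = 0.5
print("initial cost:", cost(theta))            # log(2), about 0.693
for _ in range(5000):
    grad = X.T @ (sigmoid(X @ theta) - y) / m  # gradient of J(theta)
    theta -= alpha * grad

print("final cost:", cost(theta))              # converged, but still > 0
print("training accuracy:", np.mean((sigmoid(X @ theta) > 0.5) == y))  # below 1.0
```

Because J(θ) is convex, the iterates settle at the global minimum; the minimum is simply not zero, and the misclassified examples are the ones paying that residual cost.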