The cost function $J(\theta)$ used for logistic regression, the cross-entropy (log) cost, is guaranteed to be convex, so gradient descent converges to the global minimum.
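For reference, $J(\theta)$ here is the standard cross-entropy cost:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]$$

Each summand is convex in $\theta$, so the whole cost is convex; the squared-error cost, by contrast, would not be convex with the sigmoid hypothesis.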
Adding polynomial features, e.g., using $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_1 x_2 + \theta_5 x_2^2)$ instead, could increase how well we can fit the training data.
Adding new features can only improve the fit on the training set: since setting $\theta_3 = \theta_4 = \theta_5 = 0$ makes the hypothesis the same as the original one, gradient descent will use those features (by making the corresponding $\theta_j$ non-zero) only if doing so improves the training set fit.
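As a sketch of what such a feature mapping looks like in NumPy (the function name `map_quadratic_features` is hypothetical, introduced here only for illustration):

```python
import numpy as np

def map_quadratic_features(X):
    """Map two raw features [x1, x2] to the quadratic terms
    [1, x1, x2, x1^2, x1*x2, x2^2] used by the hypothesis above.
    X is an (m, 2) array; a column of ones for theta_0 is prepended."""
    x1, x2 = X[:, 0], X[:, 1]
    ones = np.ones(X.shape[0])
    return np.column_stack([ones, x1, x2, x1**2, x1 * x2, x2**2])

# Example: three training examples with two raw features each.
X = np.array([[1.0, 2.0],
              [0.5, -1.0],
              [3.0, 0.0]])
X_poly = map_quadratic_features(X)  # shape (3, 6), matching theta_0..theta_5
```

If gradient descent leaves the parameters on the three new columns at zero, the fit is unchanged; any non-zero values are chosen only because they lower the training cost.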
For logistic regression, the gradient is given by $\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$. The corresponding vectorized gradient descent update, with learning rate $\alpha$, is

$$\theta := \theta - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)}.$$

Note that regularized logistic regression and regularized linear regression both have convex cost functions, and thus gradient descent will still converge to the global minimum.
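As a minimal NumPy sketch of this vectorized update (the array shapes and function names here are illustrative assumptions, not part of the quiz):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent_step(theta, X, y, alpha):
    """One vectorized update: theta := theta - (alpha/m) * X^T (h - y).

    X: (m, n+1) design matrix (first column all ones),
    y: (m,) labels in {0, 1}, theta: (n+1,) parameters.
    """
    m = X.shape[0]
    h = sigmoid(X @ theta)        # h_theta(x^(i)) for all m examples at once
    grad = (X.T @ (h - y)) / m    # full gradient vector, shape (n+1,)
    return theta - alpha * grad
```

The single matrix product `X.T @ (h - y)` computes every sum $\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$ at once, which is why the vectorized form is preferred over a loop over $j$.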
2. If a neural network is overfitting the data, one solution would be to increase the regularization parameter $\lambda$. A larger value of $\lambda$ will shrink the magnitude of the parameters $\Theta$, thereby reducing the chance of overfitting the data.
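Concretely, in the regularized neural network cost the term that $\lambda$ scales is the sum of squares of all non-bias weights; for a network with $L$ layers and $s_l$ units in layer $l$:

$$\frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{j,i}^{(l)}\right)^2$$

Increasing $\lambda$ makes large weights more expensive, so minimizing $J(\Theta)$ pushes them toward zero and the resulting hypothesis becomes smoother.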