Q1. What Linear Regression training algorithm can you use if you have a training set with millions of features?
A1: Use Stochastic Gradient Descent or Mini-Batch Gradient Descent, since both scale well to a huge number of features. Avoid the Normal Equation (and the SVD approach), because their computational cost grows quickly with the number of features.
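A minimal sketch of the SGD option using Scikit-Learn's `SGDRegressor` (the dataset here is synthetic and much smaller than "millions of features", just to show the API):

```python
# Sketch: training a linear model with Stochastic Gradient Descent,
# which handles each instance one at a time and scales to wide data.
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(42)
X = rng.standard_normal((1000, 500))        # stand-in for a much wider matrix
true_w = rng.standard_normal(500)
y = X @ true_w + 0.1 * rng.standard_normal(1000)

sgd_reg = SGDRegressor(max_iter=1000, tol=1e-3, random_state=42)
sgd_reg.fit(X, y)
print(sgd_reg.coef_.shape)                  # one weight per feature: (500,)
```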
Q2. Suppose the features in your training set have very different scales. What algorithms might suffer from this, and how? What can you do about it?
A2: Gradient Descent algorithms will suffer: with very different scales the cost function is elongated, so convergence takes much longer (and regularized models may converge to a suboptimal solution, since features with large values are penalized relatively less). The Normal Equation works fine without scaling. The fix is to scale the features before training, for example with Scikit-Learn's StandardScaler class.
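A small sketch of the fix, wrapping `StandardScaler` in a `Pipeline` so scaling is learned from the training data (the two-feature dataset below is illustrative):

```python
# Sketch: two features on wildly different scales, standardized before
# the SGD-based model sees them.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
X = np.c_[rng.uniform(0, 1, 200), rng.uniform(0, 1e6, 200)]  # scales differ by ~1e6
y = X[:, 0] + 1e-6 * X[:, 1] + 0.01 * rng.standard_normal(200)

model = make_pipeline(StandardScaler(), SGDRegressor(random_state=0))
model.fit(X, y)
scaled = model.named_steps["standardscaler"].transform(X)
print(scaled.std(axis=0))                   # both columns now have unit variance
```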
Q3. Can Gradient Descent get stuck in a local minimum when training a Logistic Regression model?
A3: No. Logistic Regression's cost function is convex, so Gradient Descent cannot get stuck in a local minimum.
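A quick numeric illustration of that convexity claim: along any 1-D slice of the weights, the log loss has nonnegative curvature, so there is no local minimum other than the global one (the single training example below is made up):

```python
# Numeric sketch: discrete second differences of the log loss along a
# weight slice are all nonnegative, i.e. the slice is convex.
import numpy as np

x, y = 2.0, 1.0                         # one example, label in {-1, +1}
w = np.linspace(-5, 5, 201)             # slice of candidate weight values
loss = np.log1p(np.exp(-y * w * x))     # log loss at each weight
second_diff = np.diff(loss, 2)          # discrete curvature
print(bool((second_diff >= -1e-12).all()))  # True: convex along the slice
```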
Q4. Do all Gradient Descent algorithms lead to the same model provided you let them run long enough?
A4: Theoretically, if the optimization problem is convex, all Gradient Descent algorithms will approach the global optimum and end up producing fairly similar models. But unless you gradually reduce the learning rate, Stochastic GD and Mini-Batch GD will never truly converge: they keep jumping back and forth around the global optimum without reaching it, even if you let them run for a very long time. So they will end up producing slightly different models.
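A sketch of the "gradually reduce the learning rate" idea: hand-rolled SGD on a convex (linear regression) problem with a simple `t0 / (t + t1)` learning schedule, so the steps shrink and the walk settles near the optimum. The schedule constants and the dataset are illustrative:

```python
# Sketch: plain SGD with a decaying learning schedule on y = 4 + 3x + noise.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 1))
y = 4 + 3 * X[:, 0] + 0.1 * rng.standard_normal(100)
Xb = np.c_[np.ones(100), X]             # add bias column

t0, t1 = 5, 50                          # illustrative schedule constants
theta = rng.standard_normal(2)
for epoch in range(50):
    for i in range(100):
        j = rng.integers(100)
        xi, yi = Xb[j], y[j]
        grad = 2 * xi * (xi @ theta - yi)
        eta = t0 / (epoch * 100 + i + t1)   # learning rate decays over time
        theta -= eta * grad
print(theta)                            # close to [4, 3]
```

Without the decay (a constant large `eta`), `theta` would keep bouncing around [4, 3] instead of settling.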
Q5. Suppose you use Batch Gradient Descent and you plot the validation error at every epoch. If you notice that the validation error consistently goes up, what is likely going on? How can you fix this?
A5: Either the learning rate is too high and the model is diverging, or you are overfitting the training set. Check the training error: if it keeps going down, you are overfitting, so add regularization or switch to a simpler model; if the training error is also going up, the learning rate is too high, so reduce it.
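A sketch of the diagnostic: record both training and validation error at every epoch (using `warm_start=True` so each `fit` call continues training for one more epoch). On this well-behaved synthetic problem both curves go down; in the diverging case both would rise, and in the overfitting case only the validation curve would:

```python
# Sketch: per-epoch training vs. validation error to tell divergence
# (both rise) from overfitting (only validation rises).
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 5))
y = X @ np.array([1.0, 2.0, 0.0, 0.0, -1.0]) + 0.1 * rng.standard_normal(200)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=2)

model = SGDRegressor(max_iter=1, tol=None, warm_start=True,
                     learning_rate="constant", eta0=0.01, random_state=2)
train_err, val_err = [], []
for epoch in range(20):
    model.fit(X_tr, y_tr)              # warm_start: resumes from last state
    train_err.append(mean_squared_error(y_tr, model.predict(X_tr)))
    val_err.append(mean_squared_error(y_val, model.predict(X_val)))
print(train_err[-1] < train_err[0], val_err[-1] < val_err[0])
```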
Q6. Is it a good idea to stop Mini-Batch Gradient Descent immediately when the validation error goes up?
A6: No, that would be too early: because of its randomness, Mini-Batch GD is not guaranteed to make progress every epoch, so the validation error can tick up and then keep improving. Instead, save the model at regular intervals; if it has not improved for a long time, stop training and roll back to the best saved model.
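A sketch of that "patience"-style early stopping: keep a snapshot of the best model seen so far and stop only after the validation error has failed to improve for `patience` consecutive epochs (the variable names and patience value are illustrative choices):

```python
# Sketch: early stopping with a rollback to the best saved model, rather
# than stopping at the first uptick in validation error.
import copy
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.standard_normal((300, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(300)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=3)

model = SGDRegressor(max_iter=1, tol=None, warm_start=True,
                     learning_rate="constant", eta0=0.005, random_state=3)
best_err, best_model = float("inf"), None
patience, bad_epochs = 10, 0
for epoch in range(500):
    model.fit(X_tr, y_tr)              # one more epoch each call
    err = mean_squared_error(y_val, model.predict(X_val))
    if err < best_err:
        best_err, best_model, bad_epochs = err, copy.deepcopy(model), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break                      # use best_model, not the last model
print(best_err < 1.0)
```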