[Machine Learning] [Andrew Ng] - Quiz (Week 10)

1. Suppose you are training a logistic regression classifier using stochastic gradient descent. You find that the cost (say, cost(θ, (x^(i), y^(i))), averaged over the last 500 examples), plotted as a function of the number of iterations, is slowly increasing over time. Which of the following changes are likely to help?
A. This is not an issue, as we expect this to occur with stochastic gradient descent.
B. Try using a larger learning rate α.
C. Try averaging the cost over a larger number of examples (say 1000 examples instead of 500) in the plot.
D. Try using a smaller learning rate α.
Answer: D.
An averaged cost that keeps rising indicates the learning rate is too large; shrinking α stops the divergence.
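To make the scenario concrete, here is a minimal NumPy sketch (the toy data, the two α values, and the 500-example window are illustrative assumptions, not part of the quiz) that runs stochastic gradient descent for logistic regression and records the cost averaged over the last 500 examples. A reasonable α gives a decreasing curve; an α that is far too large keeps the averaged cost high or growing, which is why D is the fix:

```python
import numpy as np

def sgd_logistic(X, y, alpha, n_iters, avg_window=500):
    """SGD for logistic regression, recording cost(theta, (x_i, y_i))
    averaged over the last `avg_window` examples, as in the question."""
    rng = np.random.default_rng(0)
    m, n = X.shape
    theta = np.zeros(n)
    costs, averaged = [], []
    for t in range(n_iters):
        i = rng.integers(m)                          # pick one example at random
        h = 1.0 / (1.0 + np.exp(-np.clip(X[i] @ theta, -30, 30)))
        costs.append(-y[i] * np.log(h + 1e-12)
                     - (1 - y[i]) * np.log(1 - h + 1e-12))
        theta -= alpha * (h - y[i]) * X[i]           # single-example update
        if (t + 1) % avg_window == 0:                # one point on the plot
            averaged.append(float(np.mean(costs[-avg_window:])))
    return theta, averaged

# Toy noisy data (illustrative only): intercept column plus one feature.
rng = np.random.default_rng(1)
X = np.c_[np.ones(2000), rng.normal(size=2000)]
y = (X[:, 1] + 0.3 * rng.normal(size=2000) > 0).astype(float)

_, ok_curve = sgd_logistic(X, y, alpha=0.05, n_iters=5000)   # reasonable alpha
_, bad_curve = sgd_logistic(X, y, alpha=50.0, n_iters=5000)  # alpha far too large
```

Plotting `ok_curve` versus `bad_curve` reproduces the quiz's diagnostic picture: the averaged cost is what you monitor, and the learning rate is what you tune when it misbehaves.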

2. Which of the following statements about stochastic gradient descent are true? Check all that apply.
A. Before running stochastic gradient descent, you should randomly shuffle (reorder) the training set.
B. In order to make sure stochastic gradient descent is converging, we typically compute Jtrain(θ) after each iteration (and plot it) in order to make sure that the cost function is generally decreasing.
C. One of the advantages of stochastic gradient descent is that it uses parallelization and thus runs much faster than batch gradient descent.
D. If you have a huge training set, then stochastic gradient descent may be much faster than batch gradient descent.
Answer: A, D.
A is correct.
B is wrong: with stochastic gradient descent the cost does not shrink on every single iteration, only in overall trend; moreover, computing Jtrain(θ) over the whole training set after every iteration would be far too expensive, which is why we instead plot cost(θ, (x^(i), y^(i))) averaged over the last few hundred examples.
C is wrong: stochastic gradient descent cannot be parallelized, because every update starts from the parameters produced by the previous one. Its speed comes from each iteration touching only a single example, whereas batch gradient descent sweeps the entire training set for every single update.
D is correct; this is precisely why the algorithm exists.

3. Which of the following statements about online learning are true? Check all that apply.
A. Online learning algorithms are usually best suited to problems where we have a continuous/non-stop stream of data that we want to learn from.
B. Online learning algorithms are most appropriate when we have a fixed training set of size m that we want to train on.
C. One of the advantages of online learning is that if the function we're modeling changes over time (such as if we are modeling the probability of users clicking on different URLs, and user tastes/preferences are changing over time), the online learning algorithm will automatically adapt to these changes.
D. When using online learning, you must save every new training example you get, as you will need to reuse past examples to retrain the model even after you get new training examples in the future.
Answer: A, C.
A is correct.
B is wrong: with a fixed training set of size m, a standard batch (or mini-batch) algorithm is the better fit and converges faster and more accurately.
C is correct.
D is wrong: an online learning algorithm uses each example once for an update and then discards it.
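The use-once-and-discard update behind C and D can be sketched as follows; the drifting click-probability stream is a hypothetical stand-in for user preferences changing over time:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def online_update(theta, x, y, alpha=0.1):
    """One online-learning step: update theta on example (x, y), then discard it."""
    h = sigmoid(x @ theta)
    return theta - alpha * (h - y) * x

# Simulated non-stop stream whose true click probability flips halfway
# through, standing in for changing user tastes (hypothetical setup).
rng = np.random.default_rng(0)
theta = np.zeros(2)
for t in range(20000):
    true_w = 2.0 if t < 10000 else -2.0          # preferences change over time
    x = np.array([1.0, rng.normal()])
    y = float(rng.random() < sigmoid(true_w * x[1]))
    theta = online_update(theta, x, y)           # example used once, never stored
    if t == 9999:
        theta_before_drift = theta.copy()        # snapshot just before the drift
```

After the drift, the learned weight changes sign to track the new behavior, with no stored history at all, which is exactly the adaptivity claimed in C and the reason D is false.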

4. Assuming that you have a very large training set, which of the following algorithms do you think can be parallelized using map-reduce and splitting the training set across different machines? Check all that apply.
A. Logistic regression trained using stochastic gradient descent.
B. A neural network trained using batch gradient descent.
C. Linear regression trained using batch gradient descent.
D. An online learning setting, where you repeatedly get a single example (x, y), and want to learn from that example before moving on.
Answer: B, C.
Map-reduce applies when the bulk of the work is a sum over training examples whose terms can be computed independently, which is exactly the form of the batch gradient in B and C. Stochastic gradient descent does not qualify: each update starts from the parameters produced by the previous example. Online learning is sequential for the same reason.

5. Which of the following statements about map-reduce are true? Check all that apply.
A. Running map-reduce over N computers requires that we split the training set into N^2 pieces.
B. In order to parallelize a learning algorithm using map-reduce, the first step is to figure out how to express the main work done by the algorithm as computing sums of functions of training examples.
C. When using map-reduce with gradient descent, we usually use a single machine that accumulates the gradients from each of the map-reduce machines, in order to compute the parameter update for that iteration.
D. If you have just 1 computer, but your computer has multiple CPUs or multiple cores, then map-reduce might be a viable way to parallelize your learning algorithm.
Answer: B, C, D.
A is wrong: running map-reduce over N computers splits the training set into N pieces, one per machine. Splitting into N^2 pieces would leave more pieces than machines; the speedup is bounded by the N machines available, not by how finely you cut the data.
B is correct.
C is correct.
D is correct.
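The sum-of-partial-sums structure from B and C can be simulated on a single machine; the N "machines" below are just array slices (an illustrative assumption), but the map and reduce roles are the same as in the distributed setting:

```python
import numpy as np

def partial_gradient(theta, X_chunk, y_chunk):
    """'Map' step: one machine computes the gradient sum over its own
    slice of the training set (batch linear regression here)."""
    return X_chunk.T @ (X_chunk @ theta - y_chunk)

# Toy data; N = 4 simulated "machines" each get one quarter of the set.
rng = np.random.default_rng(0)
X = np.c_[np.ones(400), rng.normal(size=400)]
y = 3.0 + 2.0 * X[:, 1]                       # exact linear target
theta = np.zeros(2)
m, alpha, N = len(X), 0.1, 4

for _ in range(200):
    pieces = zip(np.array_split(X, N), np.array_split(y, N))
    # "Reduce" step: a central machine adds the N partial sums, then
    # performs the ordinary batch-gradient-descent parameter update.
    grad = sum(partial_gradient(theta, Xc, yc) for Xc, yc in pieces)
    theta -= alpha * grad / m
```

Because the partial gradients add up to exactly the full-batch gradient, the parallel version computes the same parameter updates as single-machine batch gradient descent, only faster; this additive structure is also why D holds for multi-core machines.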
