Andrew Ng Machine Learning Course 2: Advanced Learning Algorithms
Week 1
Why use a neural network
The network demonstrated the ability of neural networks to handle complex decisions by dividing the decisions between multiple units.
Expression for neuron activation
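In the course's notation, neuron $j$ in layer $l$ computes (a sketch of the general form):

$$a_j^{[l]} = g\!\left(\vec{w}_j^{[l]} \cdot \vec{a}^{[l-1]} + b_j^{[l]}\right)$$

where $\vec{a}^{[l-1]}$ is the previous layer's activation vector and $g$ is the activation function (e.g., sigmoid).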
TensorFlow
Building a neural network to judge whether a batch of coffee is well roasted (coffee roasting example)
Increase the training set size to reduce the number of training epochs
Why is applying the sigmoid activation in the final layer not considered best practice?
How to determine the number of parameters in each layer
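Rule of thumb: a dense layer with $n$ inputs and $m$ units has $m \times n$ weights plus $m$ biases. A worked example using the course's digit-recognition network (400 inputs into a 25-unit layer):

$$\#\text{parameters} = m \cdot n + m = 25 \cdot 400 + 25 = 10{,}025$$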
batch and epoch
Units in the first layer
The shaded regions in the figure show that each unit is responsible for a different "bad roast" region
the second layer
Vectorizing the computation across multiple neurons
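A minimal NumPy sketch of a vectorized dense layer (the names `dense`, `A_in`, `W`, `b` are mine, following the lab's convention of one weight column per neuron):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def dense(A_in, W, b):
    """Vectorized dense layer: every unit is computed in one matrix multiply.
    A_in: (m, n) activations from the previous layer
    W:    (n, units) weight matrix, one column per neuron
    b:    (1, units) bias row vector, one entry per neuron
    """
    Z = np.matmul(A_in, W) + b   # shape (m, units)
    return sigmoid(Z)
```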
Tensorflow and Keras
Tensorflow is a machine learning package developed by Google. In 2019, Google integrated Keras into Tensorflow and released Tensorflow 2.0. Keras is a framework developed independently by François Chollet that creates a simple, layer-centric interface to Tensorflow. This course will be using the Keras interface.
Element-wise operations
Week 2
Train a neural network in TensorFlow
- epoch: number of steps of the learning algorithm (e.g., gradient descent); one epoch is one full pass over the training set
code snippet: a short fragment of code
binary cross entropy
- "binary" re-emphasizes that this is a binary classification problem: handwritten digit recognition (0 vs. 1)
Specify the loss function
back propagation
- TensorFlow runs backpropagation when you call model.fit(X, y, epochs=100)
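Putting the three steps together, a sketch of the binary-classification training code from the course (X and y are assumed to be NumPy arrays already loaded; the layer sizes are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import BinaryCrossentropy

# 1. define the model
model = Sequential([
    Dense(units=25, activation='sigmoid'),
    Dense(units=15, activation='sigmoid'),
    Dense(units=1,  activation='sigmoid'),
])

# 2. specify the loss (binary cross entropy for a 0/1 label)
model.compile(loss=BinaryCrossentropy())

# 3. fit: TensorFlow runs backpropagation and gradient descent internally
model.fit(X, y, epochs=100)
```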
ReLU vs sigmoid
- Using the "awareness" feature as an example to motivate a new activation function: ReLU (rectified linear unit)
Three activation functions
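For reference, the three activation functions discussed:

$$\text{linear: } g(z) = z \qquad \text{sigmoid: } g(z) = \frac{1}{1 + e^{-z}} \qquad \text{ReLU: } g(z) = \max(0, z)$$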
How to choose the activation function for the output layer
How to choose the activation function for hidden layers
multiclass classification
Logistic Regression vs Softmax Regression
Comparing the cost functions of logistic regression and softmax regression
legible: easy to read
more numerically accurate implementation of softmax
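A sketch of the more numerically accurate pattern from the course: give the output layer a linear activation and fold the softmax into the loss with from_logits=True (layer sizes and X, y are placeholders):

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import SparseCategoricalCrossentropy

model = Sequential([
    Dense(25, activation='relu'),
    Dense(15, activation='relu'),
    Dense(10, activation='linear'),   # output raw logits, not probabilities
])

# from_logits=True lets TensorFlow combine softmax and cross entropy,
# avoiding large intermediate exponentials (better numerical accuracy)
model.compile(loss=SparseCategoricalCrossentropy(from_logits=True))
model.fit(X, y, epochs=100)

# at prediction time, apply softmax to the logits to recover probabilities
probabilities = tf.nn.softmax(model(X))
```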
classification with multiple outputs
two approaches to multi-label classification
SparseCategoricalCrossentropy vs. CategoricalCrossentropy
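Both losses compute the same multiclass cross entropy; the difference is the label format (a short sketch):

```python
import tensorflow as tf

# SparseCategoricalCrossentropy: labels are integer class ids, e.g. y = 3
sparse_loss = tf.keras.losses.SparseCategoricalCrossentropy()

# CategoricalCrossentropy: labels are one-hot vectors,
# e.g. y = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0] for a 10-class problem
onehot_loss = tf.keras.losses.CategoricalCrossentropy()
```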
Differences between softmax, sigmoid, and ReLU
- Recognized that unlike ReLU and Sigmoid, the softmax spans multiple outputs.
Adam algorithm
convolutional layer
What is a computation graph
gradient descent in a computation graph
why backprop is an efficient way to compute derivatives
The cost decreases as the number of epochs increases
Week 3
what is a diagnostic
training set and test set
training set & cross validation set & test set
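A minimal sketch of the 60/20/20 split using scikit-learn (X and y are assumed already loaded; train_test_split is applied twice):

```python
from sklearn.model_selection import train_test_split

# 60% training set, then split the remaining 40% evenly into cv and test
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.40, random_state=1)
X_cv, X_test, y_cv, y_test = train_test_split(X_temp, y_temp, test_size=0.50, random_state=1)
```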
(skipped)
variance and bias
The relationship between J_cv and J_train reveals bias and variance
The relationship between lambda (the regularization parameter) and variance/bias
learning curve
- With high bias, increasing the training set size does not improve performance, i.e., the error does not go down
- With high variance, increasing the training set size can improve performance
debugging a learning algorithm
more features: gives the model too much flexibility, letting it fit very complicated models
neural network and bias variance
neural network regularization
iterative loop of ML development
error analysis
data augmentation
data-centric approach
transfer learning
full cycle of the machine learning project
deployment
skewed dataset
precision and recall
F1 score
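For a skewed dataset, in terms of the confusion-matrix counts:

$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}, \qquad F_1 = 2\,\frac{P \cdot R}{P + R}$$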
Week 4
Entropy as a measure of impurity
Somewhat similar to the loss function of logistic regression
the reduction of entropy is called information gain
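For a node with fraction $p_1$ of positive examples, entropy and information gain are:

$$H(p_1) = -p_1 \log_2(p_1) - (1 - p_1)\log_2(1 - p_1)$$

$$\text{information gain} = H(p_1^{\text{root}}) - \left(w^{\text{left}} H(p_1^{\text{left}}) + w^{\text{right}} H(p_1^{\text{right}})\right)$$

where $w^{\text{left}}$ and $w^{\text{right}}$ are the fractions of examples sent to each branch.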
The steps of building a decision tree
recursive splitting
How to handle a feature that can take on multiple values
Handle it with one-hot encoding; the resulting binary features can also be fed as input to a neural network, linear regression, or logistic regression
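A minimal pandas sketch (the `ear_shape` feature and its values are just illustrative):

```python
import pandas as pd

# a categorical feature with three possible values
df = pd.DataFrame({"ear_shape": ["pointy", "floppy", "oval", "pointy"]})

# one-hot encoding: one 0/1 column per category value
one_hot = pd.get_dummies(df, columns=["ear_shape"])
print(one_hot)  # columns: ear_shape_floppy, ear_shape_oval, ear_shape_pointy
```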
Splitting on a continuous variable
regression tree
Similar to an ordinary decision tree, except that information gain is replaced by reduction in variance
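So the split is chosen to maximize the reduction in the (weighted) variance of the target values:

$$\text{reduction} = \operatorname{var}(\text{root}) - \left(w^{\text{left}} \operatorname{var}(\text{left}) + w^{\text{right}} \operatorname{var}(\text{right})\right)$$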
decision tree ensemble
A single decision tree is quite sensitive to the data, so to make the algorithm less sensitive and more robust, we build a tree ensemble
sampling with replacement
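A quick NumPy sketch of drawing one bootstrap sample of size n (indices may repeat):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10  # hypothetical training set size

# sampling with replacement: some indices appear multiple times,
# others are left out of this particular bootstrap sample
idx = rng.choice(n, size=n, replace=True)
print(idx)
```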
random forest algorithm
XGBoost (Extreme Gradient Boosting)
- it uses a classic idea, deliberate practice: later trees focus more on the examples that earlier trees got wrong
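A sketch of the library usage shown in the course (xgboost package; X_train, y_train, X_test are placeholders):

```python
from xgboost import XGBClassifier

model = XGBClassifier()          # use XGBRegressor() for regression tasks
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```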