12.4 Homework #4: Deep Learning

This was an assignment for my Advanced Data Science and Architecture course, and it also happens to be part of my CNN face recognition project. Since I was lazy and wrote it in CSDN's markdown editor, I am simply pasting it here directly. The original code is not included.

Option B: Use Deep Learning for analysis of your project data.

Part A - Deep Learning model (40 points)
  • For this project, we applied a Convolutional Neural Network (CNN) to make the machine recognize W's face and distinguish it from other people's.
  • The data contains 11,500 face pictures of W, captured and processed using the Python libraries Dlib and OpenCV (see the sketch after this list). The noise pictures mainly come from UMass Amherst and the University of Science and Technology of China.
  • The method is a Convolutional Neural Network implemented with TensorFlow-gpu, which is developed by Google.
  • The best accuracy so far exceeds 96%.
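
Since the original code is not posted, here is a minimal sketch of what the Dlib/OpenCV capture step could look like; the 64x64 crop size and the helper name are assumptions for illustration, not the original implementation.

```python
# Hypothetical sketch of the face capture/cropping step (not the original code).
import cv2
import dlib

detector = dlib.get_frontal_face_detector()  # HOG-based frontal face detector

def crop_faces(image_path, size=64):
    """Detect faces in one image and yield size x size crops (size is assumed)."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    for rect in detector(gray, 1):  # upsample once to catch smaller faces
        x1, y1 = max(rect.left(), 0), max(rect.top(), 0)
        x2, y2 = min(rect.right(), img.shape[1]), min(rect.bottom(), img.shape[0])
        yield cv2.resize(img[y1:y2, x1:x2], (size, size))
```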

For the adjustments below, unless otherwise noted, we use a baseline model with the following settings: ReLU as the activation function, softmax_cross_entropy as the loss function, 20 epochs, Adam as the gradient estimation optimizer, and random_normal to initialize the parameters. The baseline architecture is three pairs of convolution (3x3 filters, stride [1,1,1,1]) and pooling layers (max pooling, 2x2 window, stride [1,2,2,1]), followed by two fully connected layers, the last of which outputs the two-way classification (W's face or not). A sketch of this baseline follows.
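
Since the code is not posted, the following is a minimal TF1-style sketch of the baseline just described; the 64x64x3 input size, the channel counts (32/64/64), and the 512-unit fully connected layer are assumptions for illustration.

```python
import tensorflow as tf  # TensorFlow 1.x style, matching the API names used in this post

def conv_pool(x, out_channels):
    """One conv (3x3, stride [1,1,1,1]) + max-pool (2x2, stride [1,2,2,1]) pair."""
    in_channels = x.get_shape().as_list()[-1]
    W = tf.Variable(tf.random_normal([3, 3, in_channels, out_channels], stddev=0.01))
    h = tf.nn.relu(tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME'))
    return tf.nn.max_pool(h, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

x = tf.placeholder(tf.float32, [None, 64, 64, 3])  # input size is an assumption
y = tf.placeholder(tf.float32, [None, 2])          # one-hot: W's face or not

h = conv_pool(x, 32)
h = conv_pool(h, 64)
h = conv_pool(h, 64)                               # 64x64 -> 8x8 after three poolings
flat = tf.reshape(h, [-1, 8 * 8 * 64])
W1 = tf.Variable(tf.random_normal([8 * 8 * 64, 512], stddev=0.01))
fc = tf.nn.relu(tf.matmul(flat, W1))               # first fully connected layer
W2 = tf.Variable(tf.random_normal([512, 2], stddev=0.01))
logits = tf.matmul(fc, W2)                         # second FC layer: yes/no output

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
train_op = tf.train.AdamOptimizer(0.01).minimize(loss)
```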

Part B - Activation function (10 points)

This part shows how the activation function affects the accuracy and the training time (time to plateau). It contains an accuracy table and a plot, shown below.

| Activation Function | Accuracy (%) |
| --- | --- |
| ReLU | 96.20 |
| ELU | 95.05 |
| TanH | 52.00 |
| Sigmoid | 51.60 |
| Softplus | 51.40 |

And the plots.

*Figure: accuracy for each activation function*

Accuracy: From the plot and table above, ReLU gives the highest accuracy, 96.20%, though ELU comes close. TanH, Sigmoid, and Softplus are not suitable for our task, since their accuracy is close to that of the naive rule (50%).
Plateauing time: ReLU plateaus very fast.
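
For reference, this is roughly how the activation could be swapped per run; the `act` hook is a hypothetical refactoring of the baseline's conv/pool pair, not the original code.

```python
import tensorflow as tf

# The five activations compared in the table above.
ACTIVATIONS = {
    'ReLU': tf.nn.relu,
    'ELU': tf.nn.elu,
    'TanH': tf.nn.tanh,
    'Sigmoid': tf.nn.sigmoid,
    'Softplus': tf.nn.softplus,
}

def conv_pool(x, W, act=tf.nn.relu):
    """Same conv/pool pair as the baseline, with a pluggable activation."""
    h = act(tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME'))
    return tf.nn.max_pool(h, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
```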

Part C - Cost function (10 points)

For this part we change the loss function as follows.

| Loss Function | Accuracy (%) |
| --- | --- |
| softmax_cross_entropy | 95.35 |
| cosine_distance | 48.00 |
| hinge | 93.65 |
| sigmoid_cross_entropy | 94.55 |
| mean_squared_error | 90.95 |

*Figure: accuracy for each loss function*
Accuracy:
From the table and plot above, we can conclude that:
- Except for cosine distance, the loss functions all yield high accuracy, among which softmax_cross_entropy and sigmoid_cross_entropy yield the highest accuracy under the given conditions.

Plateauing time:
From the plots above, except for cosine distance, the networks with the remaining loss functions all show a plateauing trend, and sigmoid_cross_entropy takes relatively little time to plateau.
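
A sketch of the loss-function swap, using the tf.losses names from the table; feeding softmax(logits) as the predictions for the distance-style losses is an assumption on my part.

```python
import tensorflow as tf

def make_loss(name, y, logits):
    """Map a table entry to a TF1 loss; y is one-hot, logits from the last FC layer."""
    probs = tf.nn.softmax(logits)  # assumed for the prediction-based losses
    if name == 'softmax_cross_entropy':
        return tf.losses.softmax_cross_entropy(onehot_labels=y, logits=logits)
    if name == 'sigmoid_cross_entropy':
        return tf.losses.sigmoid_cross_entropy(multi_class_labels=y, logits=logits)
    if name == 'hinge':
        return tf.losses.hinge_loss(labels=y, logits=logits)
    if name == 'mean_squared_error':
        return tf.losses.mean_squared_error(labels=y, predictions=probs)
    if name == 'cosine_distance':
        # `axis` in TF >= 1.5; older 1.x versions use the `dim` argument instead.
        return tf.losses.cosine_distance(labels=y, predictions=probs, axis=1)
    raise ValueError(name)
```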

Part D - Epochs (10 points)

For this part we have two main checkpoints. Here we pick epoch counts of 1, 3, 5, 10, 20, 50, and 100 (batch size 200) and generate the accuracy table and plot below.

| Number of Epochs | Accuracy (%) |
| --- | --- |
| 1 | 85.00 |
| 3 | 93.00 |
| 5 | 91.80 |
| 10 | 92.80 |
| 20 | 93.00 |
| 50 | 96.76 |
| 100 | 96.12 |

*Figure: accuracy vs. number of epochs (batch size 200)*

Accuracy:
From the table and plot above, we can conclude that:
- More epochs bring higher accuracy.
- Early in training, each epoch brings a large improvement in accuracy, while later in training each epoch brings a relatively smaller improvement.

Plateauing time:
Using the exact per-epoch data, we apply early stopping (see the sketch below). If we stop when the accuracy improvement from one epoch is less than 0.1%, this network plateaus at epoch #31.
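
A sketch of that stopping rule; `train_one_epoch` and `evaluate` are hypothetical placeholders for the training and validation passes.

```python
def train_with_early_stopping(train_one_epoch, evaluate, max_epochs=100, min_delta=0.001):
    """Stop once one extra epoch improves validation accuracy by less than 0.1%."""
    prev_acc = 0.0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()      # one pass over the training data (batch size 200)
        acc = evaluate()       # validation accuracy in [0, 1]
        if acc - prev_acc < min_delta:
            print('plateaued at epoch #%d with accuracy %.4f' % (epoch, acc))
            return acc
        prev_acc = acc
    return prev_acc
```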

Part E - Gradient estimation (10 points)

For this part, we tried different methods as the gradient estimation optimizer: Adam, Momentum, RMSProp, Adagrad, Adadelta, and GradientDescent. The accuracy table and plots are shown below.

| Optimizer | Accuracy (%) |
| --- | --- |
| Adadelta | 59.00 |
| Adam | 94.35 |
| GradientDescent | 56.00 |
| Adagrad | 59.00 |
| RMSProp | 89.45 |
| Momentum | 49.00 |

*Figure: accuracy for each gradient estimation method*

Accuracy:
From the results above, Adam (Adaptive Moment Estimation) and RMSProp give the highest accuracy, 94.35% and 89.45% respectively, at a learning rate of 0.01, while Momentum, Adagrad, GradientDescent, and Adadelta lead to low accuracy.
Plateauing time:
Adam plateaus fastest and RMSProp second fastest, while the Momentum, Adagrad, GradientDescent, and Adadelta plots are horizontal lines.
*Figure: Adam with different learning rates*

Also, even for Adam, if the learning rate is a bit high, say 0.08, the network will not plateau within the given number of epochs (20). In general, a smaller learning rate gives better accuracy; for example, a learning rate of 0.005 yields very high accuracy (98.35%).
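
A sketch of how the optimizers could be swapped in; the Momentum coefficient of 0.9 is an assumed value, and the default learning rate of 0.01 follows the text.

```python
import tensorflow as tf

def make_optimizer(name, lr=0.01):
    """Map a table entry to a TF1 optimizer at the given learning rate."""
    optimizers = {
        'Adam': tf.train.AdamOptimizer(lr),
        'RMSProp': tf.train.RMSPropOptimizer(lr),
        'Momentum': tf.train.MomentumOptimizer(lr, momentum=0.9),  # 0.9 assumed
        'GradientDescent': tf.train.GradientDescentOptimizer(lr),
        'Adagrad': tf.train.AdagradOptimizer(lr),
        'Adadelta': tf.train.AdadeltaOptimizer(lr),
    }
    return optimizers[name]

train_op = make_optimizer('Adam', lr=0.005).minimize(loss)  # loss from the baseline sketch
```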
Part F - Network Architecture (10 points)

We have three pairs of convolution and pooling layers. Here we keep the filter stride at [1,1,1,1] and the pooling stride at [1,2,2,1]. To change the architecture of this network, we vary the kernel/filter sizes (1st plot) and the number of channels in the fully connected layers (2nd plot).

| Filter Sizes (50 epochs) | Accuracy (%) |
| --- | --- |
| architecture_777 | 86.10 |
| architecture_555 | 90.96 |
| architecture_357 | 95.28 |
| architecture_355 | 92.54 |
| architecture_333 | 97.72 |
| architecture_222 | 95.18 |

*Figure: accuracy with different filter sizes*

| Fully Connected Channels | Accuracy (%) |
| --- | --- |
| 256 | 90.30 |
| 512 | 96.80 |
| 1024 | 96.55 |
| 2048 | 94.30 |

*Figure: accuracy with different fully connected channels*
Accuracy:
For this part, filter sizes of 3x3 for every layer, or 3x3, 5x5, and 7x7 for the successive layers, give the best accuracy (see the sketch below). With 3x3 filters, fully connected layers of 512 and 1024 channels give the highest accuracy.
Plateauing time:
The picture is similar: 3x3 filters for every layer, or 3x3, 5x5, and 7x7 for the successive layers, plateau fastest, and with 3x3 filters, the 512-channel fully connected layer takes the least time to plateau.
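
A sketch of how per-layer filter sizes could be varied, e.g. architecture_357 meaning 3x3, 5x5, and 7x7 for the three convolution layers; the channel counts are assumptions.

```python
import tensorflow as tf

def build_convs(x, filter_sizes=(3, 3, 3), channels=(32, 64, 64)):
    """Three conv/pool pairs with configurable kernel sizes (e.g. (3, 5, 7))."""
    h = x
    for k, c in zip(filter_sizes, channels):
        in_c = h.get_shape().as_list()[-1]
        W = tf.Variable(tf.random_normal([k, k, in_c, c], stddev=0.01))
        h = tf.nn.relu(tf.nn.conv2d(h, W, strides=[1, 1, 1, 1], padding='SAME'))
        h = tf.nn.max_pool(h, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    return h

h = build_convs(x, filter_sizes=(3, 5, 7))  # "architecture_357"; x from the baseline sketch
```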

Part G - Network initialization (10 points)

For this part we have two main checkpoints. Here we pick initialization methods of zeros, random_uniform, random_gamma, and random_normal (Gaussian, with different standard deviations). Below are the accuracy numbers and plateauing plots.

| Initialization | Accuracy (%) |
| --- | --- |
| random_uniform | 54.00 |
| zeros | 49.00 |
| random_gamma | 57.40 |
| random_normal (stddev 0.01) | 95.70 |
| random_normal (stddev 0.015) | 85.80 |
| random_normal (stddev 0.008) | 92.85 |

*Figure: accuracy with different initialization methods*
Accuracy:
From the results above, random_normal (Gaussian) initialization gives the highest accuracy, over 90% within 20 epochs, while the remaining methods lead to relatively low accuracy within 20 epochs.
Plateauing time:
Among these methods, Gaussian is the only one that plateaus within the given number of epochs; the others might plateau if given more epochs. We also find that a standard deviation around 0.01 works very well for this network.
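
A sketch of the initializers compared above; since the original code is not posted, the uniform bounds and the gamma shape parameter are assumptions.

```python
import tensorflow as tf

def make_weights(shape, method='random_normal', stddev=0.01):
    """Create a weight Variable with the chosen initialization method."""
    if method == 'zeros':
        return tf.Variable(tf.zeros(shape))
    if method == 'random_uniform':
        return tf.Variable(tf.random_uniform(shape, -0.01, 0.01))  # bounds assumed
    if method == 'random_gamma':
        return tf.Variable(tf.random_gamma(shape, alpha=1.0))      # alpha assumed
    # random_normal with tunable stddev (0.008 / 0.01 / 0.015 in the table above)
    return tf.Variable(tf.random_normal(shape, stddev=stddev))
```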
