>- **🍨 This post is a learning-log entry for the [🔗365天深度学习训练营](https://mp.weixin.qq.com/s/rbOOmire8OocQ90QM78DRA)**
>- **🍖 Original author: [K同学啊 | tutoring & custom projects](https://mtyjkh.blog.csdn.net/)**
My environment:
- OS: Ubuntu 22.04
- Language: Python 3.9.18
- Editor: VS Code + Jupyter Notebook
- Deep learning framework: TensorFlow 2.15.0
- GPU: NVIDIA GeForce RTX 2080
This week I picked up an NVIDIA RTX 2080, and installing the OS and setting up the environment ate a lot of time.
The remaining T-series lessons don't look like anything special, so I'll just work through the whole T series in one go.
T7: Coffee Bean Classification
🍺 Requirements:
- Build the VGG-16 network yourself (done)
- Call the official VGG-16 implementation (done)
🍻 Stretch goals (optional):
- Reach 100% validation accuracy
- Draw the VGG-16 architecture diagram in PPT (a skill you'll need when publishing papers)
🔎 Exploration (fairly hard):
- Slim down the model without hurting accuracy
○ VGG-16's Total params is currently 134,276,932
The code itself is simple,
but validation accuracy sits at 0.98–0.99 and never reaches 1.0:
Epoch 43/100
30/30 [==============================] - ETA: 0s - loss: 1.3314e-05 - accuracy: 1.0000
Epoch 43: val_accuracy did not improve from 0.99167
30/30 [==============================] - 8s 275ms/step - loss: 1.3314e-05 - accuracy: 1.0000 - val_loss: 0.1212 - val_accuracy: 0.9875
Epoch 44/100
30/30 [==============================] - ETA: 0s - loss: 1.0177e-05 - accuracy: 1.0000
Epoch 44: val_accuracy did not improve from 0.99167
30/30 [==============================] - 8s 268ms/step - loss: 1.0177e-05 - accuracy: 1.0000 - val_loss: 0.1221 - val_accuracy: 0.9875
Epoch 45/100
30/30 [==============================] - ETA: 0s - loss: 7.2904e-06 - accuracy: 1.0000
Epoch 45: val_accuracy did not improve from 0.99167
30/30 [==============================] - 8s 272ms/step - loss: 7.2904e-06 - accuracy: 1.0000 - val_loss: 0.1292 - val_accuracy: 0.9875
Epoch 45: early stopping
I'm not sure how to push it further.
I've never done model compression; I only know it can be approached with pruning, channel shuffling, knowledge distillation, and similar techniques.
I'll come back to it later.
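Before slimming anything, it helps to see where those 134,276,932 parameters actually live. A quick back-of-the-envelope count (pure Python, no framework needed; layer sizes are the standard VGG-16 configuration with a 4-class head) shows the three fully connected layers hold roughly 89% of the total, which is why most VGG slimming starts by replacing them, e.g. with global average pooling:

```python
# Parameter count of VGG-16 with a 4-class head, computed by hand.
# Conv layers: 3x3 kernels, params = 3*3*c_in*c_out + c_out (bias).
# FC layers:   params = n_in*n_out + n_out.

def conv_params(c_in, c_out, k=3):
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

# Channel plan of the 13 conv layers in VGG-16
channels = [(3, 64), (64, 64),
            (64, 128), (128, 128),
            (128, 256), (256, 256), (256, 256),
            (256, 512), (512, 512), (512, 512),
            (512, 512), (512, 512), (512, 512)]
conv_total = sum(conv_params(a, b) for a, b in channels)

# After the last pool: 7*7*512 = 25088 features, then two 4096-wide
# FC layers and a 4-class output layer.
fc_total = (dense_params(7 * 7 * 512, 4096)
            + dense_params(4096, 4096)
            + dense_params(4096, 4))

total = conv_total + fc_total
print(conv_total, fc_total, total)   # 14714688 119562244 134276932
print(round(fc_total / total, 3))    # 0.89: the FC layers dominate
```

The grand total matches the 134,276,932 reported by `model.summary()`, so even an aggressive conv-layer pruning could never remove more than about 11% of the parameters on its own.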
T8: Cat vs. Dog Classification
🍺 Requirements:
- Understand model.train_on_batch() and put it to use (done)
- Understand tqdm and use it to show a progress bar (done)
🍻 Stretch goal (optional):
- The code in this lesson contains a serious BUG; find it and explain it in writing
🔎 Exploration (fairly hard):
- Modify the code to fix the BUG
This task uses a new way of training the model:
model.train_on_batch(image, label)
Unlike the model.fit() used previously, each call performs a single gradient update on one batch.
The upside is that you can inspect things and make whatever adjustments you like between steps.
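The resulting loop looks roughly like this. This is a structural sketch only: `DummyModel` stands in for a compiled Keras model (its `train_on_batch` returns `[loss, accuracy]`, as Keras does when compiled with `metrics=['accuracy']`), and `dataset` stands in for your `tf.data` batches — swap in the real objects in practice:

```python
# Manual training loop around model.train_on_batch with a tqdm bar.
try:
    from tqdm import tqdm
except ImportError:          # fall back to a plain iterator
    def tqdm(it, **kwargs):
        return it

class DummyModel:
    """Stand-in for a compiled Keras model."""
    def train_on_batch(self, image, label):
        return [0.5, 0.9]    # pretend [loss, accuracy]

model = DummyModel()
dataset = [([0], [0])] * 8   # stand-in for (image, label) batches

train_loss, train_accuracy = [], []
for epoch in range(2):
    for image, label in tqdm(dataset, desc=f"Epoch {epoch + 1}"):
        # one gradient update on one batch
        loss, acc = model.train_on_batch(image, label)
        train_loss.append(loss)
        train_accuracy.append(acc)

print(len(train_loss))       # 16 updates = 2 epochs x 8 batches
```

Because every update runs through your own Python code, you can change the learning rate, log extra metrics, or early-stop on any condition between batches — the flexibility model.fit() hides behind callbacks.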
I never found the BUG.
T9: Cat vs. Dog Classification 2
Requirements:
- Find and fix the problem in week 8's program (this lesson gives the answer)
🍻 Stretch goals (optional):
- Try adding data augmentation to improve accuracy
- Which techniques can be used for data augmentation? (answered next week)
🔎 Exploration (fairly hard):
- The code in this lesson is quite redundant; trim it down
I hit two errors.
The first was a folder name: the code says "365-9" but the directory is actually "365-7".
The second:
"name": "ResourceExhaustedError",
"message": "Graph execution error:
Detected at node Adam/StatefulPartitionedCall_26 defined at (most recent call last):
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/runpy.py\", line 197, in _run_module_as_main
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/runpy.py\", line 87, in _run_code
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/ipykernel_launcher.py\", line 17, in <module>
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/traitlets/config/application.py\", line 992, in launch_instance
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/ipykernel/kernelapp.py\", line 701, in start
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/tornado/platform/asyncio.py\", line 195, in start
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/asyncio/base_events.py\", line 601, in run_forever
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/asyncio/base_events.py\", line 1905, in _run_once
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/asyncio/events.py\", line 80, in _run
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/ipykernel/kernelbase.py\", line 534, in dispatch_queue
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/ipykernel/kernelbase.py\", line 523, in process_one
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/ipykernel/kernelbase.py\", line 429, in dispatch_shell
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/ipykernel/kernelbase.py\", line 767, in execute_request
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/ipykernel/ipkernel.py\", line 429, in do_execute
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/ipykernel/zmqshell.py\", line 549, in run_cell
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/IPython/core/interactiveshell.py\", line 3048, in run_cell
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/IPython/core/interactiveshell.py\", line 3103, in _run_cell
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/IPython/core/async_helpers.py\", line 129, in _pseudo_sync_runner
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/IPython/core/interactiveshell.py\", line 3308, in run_cell_async
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/IPython/core/interactiveshell.py\", line 3490, in run_ast_nodes
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/IPython/core/interactiveshell.py\", line 3550, in run_code
File \"/tmp/ipykernel_148167/135579695.py\", line 37, in <module>
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/keras/src/engine/training.py\", line 2787, in train_on_batch
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/keras/src/engine/training.py\", line 1401, in train_function
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/keras/src/engine/training.py\", line 1384, in step_function
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/keras/src/engine/training.py\", line 1373, in run_step
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/keras/src/engine/training.py\", line 1154, in train_step
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/keras/src/optimizers/optimizer.py\", line 544, in minimize
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/keras/src/optimizers/optimizer.py\", line 1223, in apply_gradients
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/keras/src/optimizers/optimizer.py\", line 652, in apply_gradients
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/keras/src/optimizers/optimizer.py\", line 1253, in _internal_apply_gradients
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/keras/src/optimizers/optimizer.py\", line 1345, in _distributed_apply_gradients_fn
File \"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/keras/src/optimizers/optimizer.py\", line 1340, in apply_grad_to_update_var
Out of memory while trying to allocate 822083716 bytes.
BufferAssignment OOM Debugging.
BufferAssignment stats:
parameter allocation: 1.53GiB
constant allocation: 8B
maybe_live_out allocation: 1.15GiB
preallocated temp allocation: 784.00MiB
preallocated temp fragmentation: 124B (0.00%)
total allocation: 2.30GiB
Peak buffers:
\tBuffer 1:
\t\tSize: 392.00MiB
\t\tXLA Label: fusion
\t\tShape: f32[25088,4096]
\t\t==========================
\tBuffer 2:
\t\tSize: 392.00MiB
\t\tXLA Label: fusion
\t\tShape: f32[25088,4096]
\t\t==========================
\tBuffer 3:
\t\tSize: 392.00MiB
\t\tOperator: op_name=\"XLA_Args\"
\t\tEntry Parameter Subshape: f32[25088,4096]
\t\t==========================
\tBuffer 4:
\t\tSize: 392.00MiB
\t\tOperator: op_name=\"XLA_Args\"
\t\tEntry Parameter Subshape: f32[25088,4096]
\t\t==========================
\tBuffer 5:
\t\tSize: 392.00MiB
\t\tOperator: op_name=\"XLA_Args\"
\t\tEntry Parameter Subshape: f32[25088,4096]
\t\t==========================
\tBuffer 6:
\t\tSize: 392.00MiB
\t\tOperator: op_name=\"XLA_Args\"
\t\tEntry Parameter Subshape: f32[25088,4096]
\t\t==========================
\tBuffer 7:
\t\tSize: 24B
\t\tOperator: op_type=\"AssignSubVariableOp\" op_name=\"AssignSubVariableOp\" source_file=\"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/tensorflow/python/framework/ops.py\" source_line=1160
\t\tXLA Label: fusion
\t\tShape: (f32[25088,4096], f32[25088,4096], f32[25088,4096])
\t\t==========================
\tBuffer 8:
\t\tSize: 8B
\t\tOperator: op_name=\"XLA_Args\"
\t\tEntry Parameter Subshape: s64[]
\t\t==========================
\tBuffer 9:
\t\tSize: 4B
\t\tOperator: op_type=\"Pow\" op_name=\"Pow_1\" source_file=\"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/tensorflow/python/framework/ops.py\" source_line=1160 deduplicated_name=\"fusion.4\"
\t\tXLA Label: fusion
\t\tShape: f32[]
\t\t==========================
\tBuffer 10:
\t\tSize: 4B
\t\tOperator: op_type=\"Pow\" op_name=\"Pow\" source_file=\"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/tensorflow/python/framework/ops.py\" source_line=1160 deduplicated_name=\"fusion.4\"
\t\tXLA Label: fusion
\t\tShape: f32[]
\t\t==========================
\tBuffer 11:
\t\tSize: 4B
\t\tOperator: op_type=\"Pow\" op_name=\"Pow_1\" source_file=\"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/tensorflow/python/framework/ops.py\" source_line=1160
\t\tXLA Label: constant
\t\tShape: f32[]
\t\t==========================
\tBuffer 12:
\t\tSize: 4B
\t\tOperator: op_name=\"XLA_Args\"
\t\tEntry Parameter Subshape: f32[]
\t\t==========================
\tBuffer 13:
\t\tSize: 4B
\t\tOperator: op_type=\"Pow\" op_name=\"Pow\" source_file=\"/home/wjh/anaconda3/envs/tensorflow/lib/python3.9/site-packages/tensorflow/python/framework/ops.py\" source_line=1160
\t\tXLA Label: constant
\t\tShape: f32[]
\t\t==========================
\t [[{{node Adam/StatefulPartitionedCall_26}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[Op:__inference_train_function_3565]",
"stack": "---------------------------------------------------------------------------
ResourceExhaustedError Traceback (most recent call last)
Cell In[12], line 37
30 \"\"\"
31 训练模型,简单理解train_on_batch就是:它是比model.fit()更高级的一个用法
32
33 想详细了解 train_on_batch 的同学,
34 可以看看我的这篇文章:https://www.yuque.com/mingtian-fkmxf/hv4lcq/ztt4gy
35 \"\"\"
36 # 这里生成的是每一个batch的acc与loss
---> 37 history = model.train_on_batch(image,label)
39 train_loss.append(history[0])
40 train_accuracy.append(history[1])
File ~/anaconda3/envs/tensorflow/lib/python3.9/site-packages/keras/src/engine/training.py:2787, in Model.train_on_batch(self, x, y, sample_weight, class_weight, reset_metrics, return_dict)
2783 iterator = data_adapter.single_batch_iterator(
2784 self.distribute_strategy, x, y, sample_weight, class_weight
2785 )
2786 self.train_function = self.make_train_function()
-> 2787 logs = self.train_function(iterator)
2789 logs = tf_utils.sync_to_numpy_or_python_type(logs)
2790 if return_dict:
File ~/anaconda3/envs/tensorflow/lib/python3.9/site-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs)
151 except Exception as e:
152 filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153 raise e.with_traceback(filtered_tb) from None
154 finally:
155 del filtered_tb
File ~/anaconda3/envs/tensorflow/lib/python3.9/site-packages/tensorflow/python/eager/execute.py:53, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
51 try:
52 ctx.ensure_initialized()
---> 53 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
54 inputs, attrs, num_outputs)
55 except core._NotOkStatusException as e:
56 if name is not None:
ResourceExhaustedError: Graph execution error: (traceback and OOM report identical to the message above)"
}
The cause was running out of GPU memory; dropping the batch size to 16 fixed it.
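The numbers in the log line up with VGG-16's first FC layer. A quick sanity check (pure Python, exact arithmetic) shows why Adam in particular is so hungry here — it keeps two extra slot variables (`m` and `v`) shaped like each weight tensor:

```python
# The f32[25088,4096] buffers in the OOM report are VGG-16's first
# FC layer weight matrix, in float32 (4 bytes per element).
fc1_bytes = 25088 * 4096 * 4
print(fc1_bytes)              # 411041792
print(fc1_bytes / 2**20)      # 392.0 MiB, matching "Size: 392.00MiB"

# Adam stores weights + m + v, i.e. three full copies per tensor:
print(3 * fc1_bytes / 2**20)  # 1176.0 MiB for this one layer
```

The failed allocation of 822,083,716 bytes is almost exactly two such 392 MiB buffers, so on an 8 GB RTX 2080 the optimizer state plus activations at batch size 32 simply doesn't fit, while batch size 16 leaves enough headroom.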
T10: Data Augmentation
Requirements:
- Learn to use data augmentation in code to improve accuracy
- Explore more augmentation techniques and write them down
These are fairly basic augmentation techniques; nothing much to say about them.
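For the record, the geometric ones boil down to simple index manipulation. Here is a minimal framework-free illustration of a horizontal flip and a 90° rotation on a raw H×W pixel grid (in a real Keras pipeline you would instead use preprocessing layers such as `tf.keras.layers.RandomFlip` and `RandomRotation`):

```python
# Two basic geometric augmentations on a nested-list "image".

def hflip(img):
    """Horizontal flip: reverse each row."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate 90 degrees clockwise: reverse rows, then transpose."""
    return [list(row) for row in zip(*img[::-1])]

img = [[1, 2],
       [3, 4]]
print(hflip(img))   # [[2, 1], [4, 3]]
print(rot90(img))   # [[3, 1], [4, 2]]
```

Color-space augmentations (brightness, contrast, hue jitter) work the same way conceptually: a deterministic transform plus a random parameter, applied only to the training split.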
T11: Optimizer Comparison Experiment
This time I used a pre-trained model originally built for face recognition.
It doesn't seem to behave much differently from the hand-built VGG-16, though; validation accuracy still hovers around 0.6.