[Pitfall log] A beginner's environment setup for yolov5 and face_recognition

madderscientist

Last modified 2023-09-29 11:14:07

First published 2022-09-16 21:48:04

Article link: https://blog.csdn.net/madderscientist/article/details/126898094

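[Note: this post was created in 2022 and last updated in 2023. The 2022 part is mostly about environment setup; the 2023 part focuses on actually using yolo.]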

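I entered the RoboCup campus competition mostly to test the waters and try out AI recognition. The competition offered several implementation options, and I picked one that required building the networks myself. Setting up the environment turned out to be a bumpy ride, so I'm writing it down here.
The competition had two parts: face recognition and object recognition. Face recognition used face_recognition, and object recognition used yolov5. Partly because I'm fussy about keeping my own computer clean and partly because of the project requirements, I rented an Ubuntu server to tinker on, where I could install whatever I liked.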

Server setup:

apt-get update
apt install git
Installed anaconda following a guide I found via Baidu

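Object recognition

https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data
That's the project page. I followed the README and hit one problem: pytorch was installed, but it simply wouldn't import.
Before renting the server, I had planned to set up the environment on an online coding site (Replit). After a lot of effort torch was finally installed, but every import failed with "no module". I reinstalled it 4 times and even filled up the disk...
Every fix I found via Baidu required conda, and the online site had neither conda nor sudo, so I bought a server and installed anaconda. (To be honest I don't like anaconda much: so many useless packages eating my disk space. To me Python is for handy everyday tools, and big projects go in other languages. But with my own server I could mess around as much as I wanted.)
And then... it just worked...
Along the way I had to pick up some conda basics.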

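Face recognition:

https://github.com/ageitgey/face_recognition
↑ This is the face_recognition project.
First pip install cmake and boost, then, as the README says, just pip install face_recognition.
This took forever: dlib kept failing to install, and everything I found on Baidu blamed the Python version. After two days of fiddling it somehow ended up working with Python 3.10.
Here is what I did:
First, conda create a Python 3.10 virtual environment.
pip install face_recognition: everything else went smoothly, only dlib failed.
Since dlib kept failing, I first suspected the Python version (I was on 3.10). So I created another environment and tried Python 3.6, but it still failed:
C++: fatal error: Killed signal terminated program cc1plus (I had seen this error before, but hadn't paid attention to it or thought in this direction)
That's when I realized the machine was running out of RAM and the compiler was being killed. Following https://blog.csdn.net/weixin_44796670/article/details/121234446, I set aside disk space as extra memory, i.e. a swap file. (A side note: the server has 2 GB of RAM, and running this build before had even dropped my remote connection, with or without nohup; I had no idea RAM could be extended like this.) This time the build went through; here is an excerpt of the log: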

[ 14%] Building CXX object dlib_build/CMakeFiles/dlib.dir/logger/logger_config_file.cpp.o
[ 15%] Building CXX object dlib_build/CMakeFiles/dlib.dir/misc_api/misc_api_kernel_1.cpp.o
make[2]: *** wait: No child processes.  Stop.
make[2]: *** Waiting for unfinished jobs....
make[2]: *** wait: No child processes.  Stop.
make[1]: *** [CMakeFiles/Makefile2:144: dlib_build/CMakeFiles/dlib.dir/all] Error 2
make: *** [Makefile:84: all] Hangup
SIGHUP
CMake Error: Generator: execution of make failed. Make command was: /usr/bin/make -j1
... # many more errors omitted; despite the errors, the build kept going
ERROR: Failed building wheel for dlib
Running setup.py clean for dlib
Failed to build dlib
Installing collected packages: dlib, Click, face-recognition
Running setup.py install for dlib: started
Running setup.py install for dlib: still running...
Running setup.py install for dlib: finished with status 'done'
DEPRECATION: dlib was installed using the legacy 'setup.py install' method, because a wheel could not be built for it. A possible replacement is to fix the wheel build issue reported above. You can find discussion regarding this at https://github.com/pypa/pip/issues/8368.
Successfully installed Click-8.0.4 dlib-19.24.0 face-recognition-1.3.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
# and then it succeeded

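While it was running (roughly when the first error in the log above appeared), I installed boost as suggested online. I don't know whether that had anything to do with the success (maybe I had simply forgotten to install it all along and that's why it kept failing? But I never saw an error pointing that way... and the official docs don't mention it either, argh)
Then it turned out that... the Python 3.10 environment could import face_recognition too... and still could after I deleted the 3.6 environment... Pretty baffling.

One more thing: the first time I used the server I hadn't installed anaconda, and my repeated dlib install attempts wasted a lot of space, so I reset the server. After that, ssh refused to connect, because the reset gave the server a new host key that no longer matched the old entry in known_hosts. Only after running ssh-keygen -R [server IP] (which deletes that old entry) could I ssh in again.

On Linux, several people can be logged in as root at the same time!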

2022/10/1 update

We made it through the RoboCup preliminary round! Here is how I trained my own model with yolo.

mytrain.yaml

train: /root/beshar/yolov5/trainSetting/train.txt
val: /root/beshar/yolov5/trainSetting/val.txt

#number of classes
nc: 26

#class names
names: ['shampoo', 'coffee', 'laundry_detergent', 'chocolate', 'orion_friends', 'cola', 'water_glass', 'folder', 'AD_calcium_milk', 'slippers', 'fruit_knife', 'dish_soap', 'dish soap', 'prawn_crackers', 'book', 'biscuits', 'paper_napkin', 'soda', 'toilet_water', 'potato_chips', 'water', 'melon_seeds', 'pen', 'hammer', 'toothpaste', 'fan']

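This is the training data config file. train points to the training set and val to the validation set; each is a txt file with one image path per line. names is the list of class labels, and nc is the length of names. The contents of the txt files are described below.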
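Since nc has to equal the length of names, it can be less error-prone to generate mytrain.yaml from the class list instead of editing both by hand. A minimal stdlib sketch; make_data_yaml is my own hypothetical helper, and it relies on the fact that a JSON list of strings is also a valid YAML flow sequence:

```python
import json


def make_data_yaml(train_txt: str, val_txt: str, names: list[str]) -> str:
    """Return the text of a YOLOv5 data config whose nc is derived from names."""
    if len(set(names)) != len(names):
        raise ValueError("class names must be unique")
    # json.dumps gives a double-quoted list, which YAML also accepts
    return (
        f"train: {train_txt}\n"
        f"val: {val_txt}\n"
        "\n"
        "# number of classes\n"
        f"nc: {len(names)}\n"
        "\n"
        "# class names\n"
        f"names: {json.dumps(names)}\n"
    )
```

Usage: write the result of make_data_yaml('/root/beshar/yolov5/trainSetting/train.txt', '/root/beshar/yolov5/trainSetting/val.txt', names) to mytrain.yaml. The uniqueness check only catches exact duplicates; near-duplicates such as 'dish_soap' and 'dish soap' still pass.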

yolov5s.yaml

# YOLOv5 🚀 by Ultralytics, GPL-3.0 license

# Parameters
nc: 26  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]

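This is the model architecture config (it describes the network layers; the weights themselves are the .pt file). You can copy it straight from the models folder of the yolov5 project (yolov5/models/yolov5s.yaml). Remember to change nc to the same value as in mytrain.yaml.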
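Changing nc by hand is easy to forget when switching between model sizes. A small stdlib sketch that rewrites the top-level nc: line of a copied model yaml (the stock files ship with nc: 80 for COCO); set_nc and copy_model_cfg are my own hypothetical names. As far as I can tell, YOLOv5 also overrides the cfg's nc with the data yaml's nc and logs a message when they differ, but keeping the two consistent avoids surprises:

```python
import re


def set_nc(yaml_text: str, nc: int) -> str:
    """Replace the value on the top-level 'nc:' line, keeping any trailing comment."""
    new_text, n = re.subn(r"^nc:\s*\d+", f"nc: {nc}", yaml_text, count=1, flags=re.MULTILINE)
    if n != 1:
        raise ValueError("no top-level 'nc:' line found")
    return new_text


def copy_model_cfg(src: str, dst: str, nc: int) -> None:
    """Copy e.g. yolov5/models/yolov5s.yaml to dst with nc changed."""
    with open(src, encoding="utf-8") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(set_nc(text, nc))
```

Usage: copy_model_cfg('yolov5/models/yolov5s.yaml', 'trainSetting/yolov5s.yaml', 26). A regex edit is used instead of a YAML library so that the comments in the file survive.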

File layout

trainSetting
 ├---images
 |    ├---1.jpg
 |    ├---2.jpg
 |    ├---……
 ├---labels
 |    ├---1.txt
 |    ├---2.txt
 |    ├---……
 ├---train.txt
 ├---val.txt
 ├---mytrain.yaml
 ├---yolov5s.yaml

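The "images" and "labels" folders hold all the images and all the label files respectively, and it's best not to rename them. Organizing files this way matters because none of the config files says where the labels are: YOLOv5 derives each label's path from its image's path. If you rename the folders or put the labels elsewhere, they most likely won't be found. Also note that each label file must have the same base name as its image. The most important part is the contents of train.txt and val.txt: each line is an image path, for example: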

In train.txt:
/root/beshar/yolov5/trainSetting/images/12597.jpg
/root/beshar/yolov5/trainSetting/images/7840.jpg
/root/beshar/yolov5/trainSetting/images/11908.jpg
/root/beshar/yolov5/trainSetting/images/7975.jpg
/root/beshar/yolov5/trainSetting/images/10270.jpg
/root/beshar/yolov5/trainSetting/images/3818.jpg
/root/beshar/yolov5/trainSetting/images/9384.jpg
/root/beshar/yolov5/trainSetting/images/6641.jpg
/root/beshar/yolov5/trainSetting/images/4746.jpg
/root/beshar/yolov5/trainSetting/images/11053.jpg
…… # many more images omitted

In val.txt:
/root/beshar/yolov5/trainSetting/images/3041.jpg
/root/beshar/yolov5/trainSetting/images/8569.jpg
/root/beshar/yolov5/trainSetting/images/6977.jpg
/root/beshar/yolov5/trainSetting/images/4770.jpg
/root/beshar/yolov5/trainSetting/images/5245.jpg
/root/beshar/yolov5/trainSetting/images/8169.jpg
…… # remaining image paths omitted
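As for how the labels get found: as far as I can tell from the YOLOv5 source (the img2label_paths helper in utils/dataloaders.py in recent versions), each image path has its last /images/ component swapped for /labels/ and its extension replaced by .txt. A stdlib re-implementation sketch of that rule, handy for checking that every listed image has a label before a long training run; label_path_for and missing_labels are my own names:

```python
import os


def label_path_for(img_path: str) -> str:
    """Swap the last '/images/' for '/labels/' and the extension for '.txt'."""
    sa, sb = "/images/", "/labels/"
    swapped = sb.join(img_path.rsplit(sa, 1))
    return swapped.rsplit(".", 1)[0] + ".txt"


def missing_labels(list_file: str) -> list[str]:
    """Return image paths listed in train.txt/val.txt whose label file does not exist."""
    missing = []
    with open(list_file, encoding="utf-8") as f:
        for line in f:
            img = line.strip()
            if img and not os.path.isfile(label_path_for(img)):
                missing.append(img)
    return missing
```

Usage: print(missing_labels('/root/beshar/yolov5/trainSetting/train.txt')); an empty list means every image has a matching label file. The sketch uses "/" separators, which matches the Linux paths in this post.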

I split the images into training and validation sets at roughly 9:1. Generating the two files only takes a short Python script:

# Generate the training and validation lists (about 9:1)
import os
import random

img_dir = '/root/beshar/yolov5/trainSetting/images'
with open('/root/beshar/yolov5/trainSetting/train.txt', 'w') as train, \
        open('/root/beshar/yolov5/trainSetting/val.txt', 'w') as val:
    for name in os.listdir(img_dir):
        file = os.path.join(img_dir, name)
        # each image independently goes to val with probability 0.1
        if random.random() < 0.1:
            val.write(file + '\n')
        else:
            train.write(file + '\n')
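Because every image is assigned independently at random, the loop above gives only approximately 9:1, and a different split on every run. If an exact ratio and a reproducible split are wanted, shuffling once with a fixed seed is an alternative; a sketch where split_dataset, write_lists and the seed value are my own choices:

```python
import os
import random


def split_dataset(paths: list[str], val_ratio: float = 0.1, seed: int = 0) -> tuple[list[str], list[str]]:
    """Shuffle a copy of paths with a fixed seed; the first val_ratio share becomes validation."""
    shuffled = sorted(paths)  # sort first so the result doesn't depend on os.listdir order
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_ratio)
    return shuffled[n_val:], shuffled[:n_val]


def write_lists(img_dir: str, out_dir: str, val_ratio: float = 0.1) -> None:
    """Write train.txt and val.txt (one absolute image path per line) into out_dir."""
    paths = [os.path.join(img_dir, name) for name in os.listdir(img_dir)]
    train, val = split_dataset(paths, val_ratio)
    for name, subset in (("train.txt", train), ("val.txt", val)):
        with open(os.path.join(out_dir, name), "w", encoding="utf-8") as f:
            f.writelines(p + "\n" for p in subset)
```

Usage: write_lists('/root/beshar/yolov5/trainSetting/images', '/root/beshar/yolov5/trainSetting'). Keeping the seed fixed means retraining later validates on the same images, so results from different runs stay comparable.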

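Start training

python /root/beshar/yolov5/train.py --img 640 --batch 16 --epochs 300 --data {path to mytrain.yaml} --cfg {path to yolov5s.yaml} --weights {path to the .pt file}
(--batch: how many images are fed to the network at a time; --epochs: how many passes over the training set)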

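About the last argument, the ".pt file": for the first training run, you can download a .pt from the yolov5 GitHub project. Each training run produces new .pt files, usually at yolov5/runs/train/exp/weights/best.pt, which is the result of that run; to keep training on top of it, pass the path of that best.pt. The arguments I actually used in one run were: [Note: this part is wrong. See the 2023/9/24 update]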

sysctl vm.swappiness=8
python /root/beshar/yolov5/train.py --img 640 --batch 1 --epochs 2 --data /root/beshar/yolov5/trainSetting/mytrain.yaml --cfg /root/beshar/yolov5/trainSetting/yolov5s.yaml --weights /root/beshar/yolov5/runs/train/exp7/weights/best.pt

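(The first line, and the batch of 1, are there because the server has only 2 GB of RAM. In my tests, batch was the only parameter that directly determined how much memory was used.)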

During training you can put nohup in front of the command and & at the end to run it in the background on the server.

Using the model

import os
# besharImg is the folder holding the images to detect
root = '/root/beshar/besharImg'
for name in os.listdir(root):
    os.system(f"python /root/beshar/yolov5/detect.py --source {os.path.join(root, name)} --weights /root/beshar/yolov5/runs/train/exp2/weights/best.pt --project /root/beshar/output --conf-thres 0.5 --exist-ok")

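A few of the parameters:

--source: path of the image to detect

--weights: which model to use

--project: custom output location. On my server the default is /root/beshar/yolov5/runs/detect

--conf-thres: minimum confidence; boxes below this value are not drawn on the image

--exist-ok: a flag that takes no value. With it, every result goes into the already existing output folder (by default each run creates a new subfolder: exp, exp2, exp3, ...)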
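detect.py also accepts a folder for --source, so one call can process every image in it instead of reloading the model once per image. A sketch that builds the command as an argument list and runs it with subprocess rather than an f-string passed to os.system, so paths with spaces don't break; build_detect_cmd and run_detect are my own hypothetical helpers:

```python
import subprocess
import sys


def build_detect_cmd(detect_py: str, source: str, weights: str, project: str,
                     conf_thres: float = 0.5, extra: tuple[str, ...] = ()) -> list[str]:
    """Assemble a YOLOv5 detect.py command line as an argument list."""
    return [
        sys.executable, detect_py,  # same interpreter (and conda env) as this script
        "--source", source,
        "--weights", weights,
        "--project", project,
        "--conf-thres", str(conf_thres),
        "--exist-ok",
        *extra,
    ]


def run_detect(cmd: list[str]) -> None:
    """Run detect.py and raise CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)
```

Usage: run_detect(build_detect_cmd('/root/beshar/yolov5/detect.py', '/root/beshar/besharImg', '/root/beshar/yolov5/runs/train/exp2/weights/best.pt', '/root/beshar/output')). Using sys.executable instead of a bare "python" avoids accidentally picking up a different Python that lacks torch.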

Using face_recognition

import face_recognition
from PIL import Image, ImageDraw
import numpy as np
import os

knownbase = '/root/beshar/faces/known'   # one photo per known person, file name = person's name
unknowbase = '/root/beshar/input'
known_face_encodings = []
known_face_names = []

for fname in os.listdir(knownbase):
    image = face_recognition.load_image_file(os.path.join(knownbase, fname))
    encodings = face_recognition.face_encodings(image)
    if not encodings:  # no face found in this photo: skip it instead of crashing on [0]
        continue
    known_face_encodings.append(encodings[0])
    known_face_names.append(os.path.splitext(fname)[0])

unknowlist = os.listdir(unknowbase)
print(unknowlist)
for i, fname in enumerate(unknowlist):
    unknown_image = face_recognition.load_image_file(os.path.join(unknowbase, fname))
    face_locations = face_recognition.face_locations(unknown_image)
    face_encodings = face_recognition.face_encodings(unknown_image, face_locations)
    pil_image = Image.fromarray(unknown_image)
    draw = ImageDraw.Draw(pil_image)
    for (top, right, bottom, left), face_encoding in zip(face_locations, face_encodings):
        name = "Unknown"  # reset for every face, so one match doesn't carry over to the next face
        if known_face_encodings:
            matches = face_recognition.compare_faces(known_face_encodings, face_encoding)
            face_distances = face_recognition.face_distance(known_face_encodings, face_encoding)
            best_match_index = np.argmin(face_distances)
            if matches[best_match_index]:
                name = known_face_names[best_match_index]
        draw.rectangle(((left, top), (right, bottom)), outline=(0, 0, 255))
        try:
            _, _, text_width, text_height = draw.textbbox((0, 0), name)  # Pillow >= 9.2
        except (AttributeError, ValueError):
            text_width, text_height = draw.textsize(name)  # older Pillow; textsize was removed in Pillow 10
        draw.rectangle(((left, bottom - text_height - 10), (right, bottom)), fill=(0, 0, 255), outline=(0, 0, 255))
        draw.text((left + 6, bottom - text_height - 5), name, fill=(255, 255, 255, 255))
    del draw
    pil_image.save(f"/root/beshar/besharImg/{fname}")
    print(i)
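What compare_faces and face_distance decide, as far as I know from the face_recognition source: face_distance is the Euclidean distance between encodings, and compare_faces counts a match when that distance is at most tolerance (default 0.6). A pure-Python sketch of the same decision the loop above makes; best_match is my own name, and the 3-d vectors in the test are made up for illustration (real encodings are 128-d):

```python
import math


def best_match(known: list[list[float]], names: list[str], candidate: list[float],
               tolerance: float = 0.6) -> str:
    """Name of the nearest known encoding, or "Unknown" if even that one is farther than tolerance."""
    if not known:
        return "Unknown"
    distances = [math.dist(k, candidate) for k in known]  # Euclidean, like face_distance
    i = min(range(len(distances)), key=distances.__getitem__)  # like np.argmin
    return names[i] if distances[i] <= tolerance else "Unknown"
```

Lowering tolerance makes matching stricter (fewer false matches, more "Unknown"); raising it does the opposite.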

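Why not explain face_recognition in detail? Because it's nowhere near as good as Baidu's face recognition API. For object recognition I also tried Baidu's EasyDL, but it let me down during the competition: it boxed only one object per image, and with the wrong class at that. What I actually used in the competition was Baidu's face recognition API plus yolov5.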

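2023/9/24 update

I competed again. With last year's groundwork it went smoothly this time. When writing the content above I glossed over a lot, which caused me some small trouble this time. It's better to write for a future self who remembers nothing, so these notes are more complete. Thanks to the pattern recognition labs last semester I understand AI training much better now, and that's how I realized several things in my earlier yolo training were wrong.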

Using WSL

This time I didn't train on a server but in WSL2 on my own computer. First, some basic WSL usage (summarized in 2022):

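I. About WSL (run these in PowerShell as administrator):
1. List existing distributions:
wsl -l -v
2. Delete a distribution:
wsl --unregister {name}
where {name} is from the first column of the output of step 1
3. Install a distribution:
wsl --import {custom name} {custom install folder} {path to the .tar file}
For example:
wsl --import Ubuntu-20.04 D:\Ubuntu20.04 D:\Ubuntu20.04\Ubuntu.tar
------------
So, to reinstall Ubuntu (e.g. after the RoboCup competition is over), first unregister, then import, and you get a brand-new Ubuntu. Don't delete that .tar.
To install a new version:
1. wsl --list --online
2. Find the version you want and note its exact name
3. wsl --install -d {name}
4. When Ubuntu asks for a username, you can just close the window. The following steps move the distribution off drive C, in PowerShell
5. wsl -l -v to find the name of the distribution to move
6. wsl --export {name} {archive path} (e.g. D:\WSL\Ubuntu.tar; the exported system ends up in that .tar)
7. Delete the distribution and import it again, as above. This time the one you delete is the copy on drive C, and you choose the location when importing.
Why use Windows 11 to install gazebo: WSL2 on Windows 11 comes with WSLg (Windows Subsystem for Linux GUI), which can display the graphical windows of Linux programs.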

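II. About Ubuntu configuration (run inside Ubuntu):
I kept getting connection errors all afternoon because the package source was the official Ubuntu one, which is hard to reach from China. I recommend switching mirrors first, as follows:
Find /etc/apt/sources.list, make a backup copy just in case, then replace its contents with a mirror reachable from China, such as the Tsinghua mirror:
https://mirror.tuna.tsinghua.edu.cn/help/ubuntu/
Note that each Ubuntu release needs a different source list; they differ in the release codename:
Ubuntu version   Codename
ubuntu22.04      jammy
ubuntu20.04      focal
ubuntu18.04      bionic
ubuntu16.04      xenial
ubuntu14.04      trusty
So when using the Tsinghua mirror, be sure to select the right release, then replace the original contents of sources.list with what it shows.
Replacing the contents isn't the end: run sudo apt update, and after it finishes, sudo apt upgrade.
apt update fetches the latest package lists from the sources defined in /etc/apt/sources.list. In other words, apt update doesn't update any software; it's like "check for updates" on Windows and only fetches the status of the packages.
apt upgrade is what actually downloads and installs the newer versions. From China, downloading from the Tsinghua mirror is more reliable than from the Ubuntu servers.
So every time you switch mirrors, run update and then upgrade again.
apt can be thought of as an app manager: it can download, update, and remove applications and packages. Only software included in the repositories can be installed this way; other software can be cloned from GitHub.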

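III. About gazebo (run inside Ubuntu)
Open: type gazebo and run it
Close: Ctrl+C, or close the window
If typing gazebo doesn't bring up a window, do the following:
Run gazebo --verbose
If it reports errors, run killall gzserver and killall gzclient, then run gazebo again and it should open.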

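One more WSL note: how do you move a WSL installed on an external drive to another computer, when the drive only holds the .vhdx? The answer is to create a file with a .reg extension and run it on the other computer. There's no need to change the ID in the key path; changing it as other posts suggest actually broke it for me. Do remember to change BasePath to the folder that contains the .vhdx.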

Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Lxss\{b9a51057-c168-4798-a89e-483c97278f3c}]
"State"=dword:00000001
"DistributionName"="Ubuntu20.04"
"Version"=dword:00000002
"BasePath"="\\\\?\\E:\\Ubuntu20.04"
"Flags"=dword:0000000f
"DefaultUid"=dword:00000000

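Getting the backslashes in BasePath right by hand is fiddly: the stored value carries the \\?\ long-path prefix, and inside a .reg string every backslash is then doubled. A sketch that builds the same text for a given folder; make_lxss_reg is my own helper, the GUID and the other values are copied from the file above (DefaultUid 0 means root), and the file encoding in the usage note is an assumption:

```python
def make_lxss_reg(guid: str, name: str, base_path: str) -> str:
    """Return .reg text registering an existing WSL2 distro folder (holding the .vhdx) for the current user."""
    stored = "\\\\?\\" + base_path          # value as the registry should hold it: \\?\E:\...
    escaped = stored.replace("\\", "\\\\")  # .reg string syntax escapes each backslash as \\
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[HKEY_CURRENT_USER\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Lxss\\{{{guid}}}]",
        '"State"=dword:00000001',
        f'"DistributionName"="{name}"',
        '"Version"=dword:00000002',
        f'"BasePath"="{escaped}"',
        '"Flags"=dword:0000000f',
        '"DefaultUid"=dword:00000000',
    ]
    return "\r\n".join(lines) + "\r\n"
```

Usage: write make_lxss_reg("b9a51057-c168-4798-a89e-483c97278f3c", "Ubuntu20.04", r"E:\Ubuntu20.04") with open("wsl.reg", "w", encoding="utf-16", newline=""). UTF-16 is what regedit itself uses when exporting; I haven't checked whether other encodings are accepted. newline="" keeps the \r\n line endings from being doubled on Windows.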
Dataset processing

Inspect the data to find every class present in the training set

import xml.etree.ElementTree as ET
import os

classes = []  # every class name found in the annotations
fromFile = '/home/beshar/robocup/Annotations'

def convert_annotation(image_id):
    with open(f'{fromFile}/{image_id}.xml', encoding='UTF-8') as in_file:
        root = ET.parse(in_file).getroot()
    for obj in root.iter('object'):
        cls = obj.find('name').text
        if cls not in classes:
            classes.append(cls)

# list of xml annotation files
img_xmls = os.listdir(fromFile)
for img_xml in img_xmls:
    label_name = os.path.splitext(img_xml)[0]
    convert_annotation(label_name)
print(classes)

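Comparing these classes with the ones required by the competition gave:

Already present and matching: ['folder','pen','paper napkin','fan','toilet water','water','Nescafe','water glass','Oreo','Leshi potato chips','Xiaoxiaosu','Prawn Crackers','Snickers']

Missing: ['Fanta','Sprite','Instant noodles']

So I collected extra images for those and labeled them.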
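Besides which classes exist, how many boxes each class has helps decide where more data is needed, since a class can be present but with only a handful of examples. A stdlib sketch using collections.Counter; count_classes and count_classes_in_dir are my own names:

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def count_classes(xml_docs) -> Counter:
    """Count <object><name> entries across VOC-style annotation documents (str or bytes)."""
    counts = Counter()
    for doc in xml_docs:
        root = ET.fromstring(doc)
        for obj in root.iter("object"):
            counts[obj.findtext("name", "").strip()] += 1
    return counts


def count_classes_in_dir(xml_dir: str) -> Counter:
    """Same count for every .xml file in a folder, read as bytes so the XML declaration decides the encoding."""
    return count_classes(p.read_bytes() for p in Path(xml_dir).glob("*.xml"))
```

Usage: for name, n in count_classes_in_dir('/home/beshar/robocup/Annotations').most_common(): print(name, n).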

Converting XML annotations to TXT

import xml.etree.ElementTree as ET
import os

classes = ['folder','pen','paper napkin','fan','toilet water','water','Nescafe','water glass','Oreo','Leshi potato chips','Xiaoxiaosu','Prawn Crackers','Snickers','Fanta','Sprite','Instant noodles']

def convert(size, box):
    # VOC box (xmin, xmax, ymin, ymax) in pixels -> YOLO (x_center, y_center, w, h) normalized by image size
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)

fromFile = '/home/beshar/robocup/Annotations'
toFile = '/home/beshar/robocup/labels'
def convert_annotation(image_id):
    with open(f'{fromFile}/{image_id}.xml', encoding='UTF-8') as in_file:
        root = ET.parse(in_file).getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)
    with open(f'{toFile}/{image_id}.txt', 'w') as out_file:  # the YOLO-format txt label
        for obj in root.iter('object'):
            cls = obj.find('name').text
            if cls not in classes:
                continue  # skip classes that aren't required
            cls_id = classes.index(cls)
            xmlbox = obj.find('bndbox')
            b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
                 float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
            bb = convert((w, h), b)
            out_file.write(str(cls_id) + " " + " ".join(str(a) for a in bb) + '\n')

img_xmls = os.listdir(fromFile)
print(len(img_xmls))
for img_xml in img_xmls:
    label_name = os.path.splitext(img_xml)[0]
    convert_annotation(label_name)
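A quick way to check the generated labels is to invert the mapping and compare against the XML. A sketch; yolo_to_voc is my own helper, and convert is repeated from the script above so the block runs on its own:

```python
def convert(size, box):
    """VOC (xmin, xmax, ymin, ymax) in pixels -> YOLO (x_center, y_center, w, h) normalized to [0, 1]."""
    dw, dh = 1.0 / size[0], 1.0 / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    return (x * dw, y * dh, (box[1] - box[0]) * dw, (box[3] - box[2]) * dh)


def yolo_to_voc(size, yolo_box):
    """Inverse of convert: YOLO normalized box -> VOC (xmin, xmax, ymin, ymax) in pixels."""
    img_w, img_h = size
    xc, yc, w, h = yolo_box
    return ((xc - w / 2) * img_w, (xc + w / 2) * img_w,
            (yc - h / 2) * img_h, (yc + h / 2) * img_h)
```

Because the values are normalized by the image size, shrinking the images later (e.g. halving width and height) leaves the label files valid unchanged.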

File structure

Same as in 2022.

Training

Same as in 2022, just with different arguments.

python /home/beshar/beshar/yolov5/train.py --img 640 --batch 16 --epochs 300 --data /home/beshar/robocup/trainSetting/mytrain.yaml --cfg /home/beshar/robocup/trainSetting/yolov5l.yaml --weights /home/beshar/robocup/trainSetting/yolov5l.pt

To filter out warnings, add these lines:

import warnings
warnings.filterwarnings('ignore')

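Training on the GPU (the default) threw an error: with batch=2 it crashed after a few iterations.

I eventually concluded the GPU had run out of memory. Last year I trained on the server's CPU with 2 GB of RAM, where batch could only be 1, and my GPU also has only 2 GB of memory. Sure enough, batch 1 worked here.

From my pattern recognition labs, my impression is that a small batch helps convergence but overfits more easily, so I reckoned a small batch suits the early phase and a large batch the later phase: the early phase on the GPU (which only fits batch 1) and the later phase on the CPU (which, with 32 GB of RAM, fits much larger batches). The first night I was in a hurry to get to bed, so I added "--device cpu" to the command and ran batch 12 for 8 hours.

In my tests, 32 GB of RAM suited batch size 22, leaving at least 506 MB free. During training the system warned that memory had run out (training itself was unaffected), so I switched to batch 20, but it still warned about 3 times per epoch.

Continuing training the way my 2022 notes described made the results worse and worse, and only then did I realize that continuing training isn't that simple.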

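The correct way to continue training

It turns out you can't just continue from last.pt as if it were a fresh run, because a lot of training state changes along the way; for example, gradient descent with momentum depends on the history of previous iterations. The correct procedure is:

First change start_epoch in yolov5/utils/torch_utils.py (set it to the previous run's epoch count: e.g. if the previous run used epochs 10, set it to 10, because that run ended at epoch 9 and the next one is epoch 10)
Set resume to True in yolov5/train.py
Change epochs in yolov5/runs/train/exp/opt.yaml (to the target total, e.g. 20)

Then on the command line, make epochs match the value set in opt.yaml:

python /home/beshar/beshar/yolov5/train.py --img 640 --batch 20 --epochs 10 --weights {ptpath} --device cpu

No other arguments are needed to continue training (remember that epochs has to be changed, and must be larger than before). At the previous competition I continued training the wrong way, but made up for it with many epochs. This time, after spotting the problem, I ran only 10 epochs in total, at about 11 h per epoch. (Later I realized that --img 640 downscales the images (the originals are 1280), so I halved the width and height of the images in advance, and an epoch then took only 8 h.)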

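I split it into three sessions: first 5 epochs, then 2, then 3.

Training results:

After the first 5 epochs, the fifth looked worse, and I assumed overfitting had just set in, so I slacked off after that. Later I found the results weren't good enough and trained more; at first the extra training made results collapse, which is what sent me looking for the correct way to continue training, giving the last 5 epochs. Those 5 weren't consecutive, so the curves are bumpier. The first epoch after each restart was always worse, but things then improved gradually.

I found this out too late, so for the preliminary round I had to use the 10-epoch result. It did well on the test set, but on last year's competition images it detected only one object out of three... yet we still passed the preliminary round. I hurried to train another 10 epochs, and results were still poor. How could that be, when the training metrics looked so good? Luckily I worked out the right way to run detection and we placed well in the second round. See below.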

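Detection

Last time we lost some points because the drawn labels covered up other objects. This time I used the following command:

python /home/beshar/beshar/yolov5/detect.py --source {file_path} --weights {YOLOmodel} --project /home/beshar/robocup/output --conf-thres 0.51 --exist-ok --device cpu --save-txt --line-thickness 1

Note the --line-thickness 1 and --save-txt at the end: the former reduces occlusion by drawing thinner boxes and smaller labels, and the latter records the classes of any labels that are still covered up.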
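With --save-txt, YOLOv5 writes one .txt per image (into a labels folder under the run's output folder, here {project}/exp/labels) with one line per box: the class index, then center x, center y, width and height normalized to the image size, plus the confidence if --save-conf is also given. A sketch that turns such a file back into pixel boxes with class names; parse_yolo_txt is my own hypothetical helper, and the image size has to be read separately (e.g. with PIL):

```python
def parse_yolo_txt(text: str, img_w: int, img_h: int, names: list[str]) -> list[dict]:
    """Parse lines 'cls xc yc w h [conf]' (normalized) into pixel boxes (x1, y1, x2, y2)."""
    dets = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        cls, xc, yc, w, h = int(parts[0]), *map(float, parts[1:5])
        det = {
            "name": names[cls],
            "box": ((xc - w / 2) * img_w, (yc - h / 2) * img_h,
                    (xc + w / 2) * img_w, (yc + h / 2) * img_h),
        }
        if len(parts) > 5:  # only present with --save-conf
            det["conf"] = float(parts[5])
        dets.append(det)
    return dets
```

Here names would be the same list as in the data yaml, since the class index refers to it.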

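For face recognition I used Baidu again. This time I didn't split the work into two .py files but merged everything into one. Previously I boxed the faces first and then the objects; this time I detect objects first, then run face recognition on the original image, but draw the face results onto the object-detection output. The code is at the end.

As for why the training results were great but the real-world results were poor, my guess is that the objects in the real photos are small, while training used large images (even though --img 640 downscaled them). So I decided to upscale the images at detection time, and it really helped. To find the best size, I used the following code: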

import os

def list_files_in_folder(folder_path):
    # names of the regular files (not subfolders) directly inside folder_path
    return [name for name in os.listdir(folder_path)
            if os.path.isfile(os.path.join(folder_path, name))]

# root is the folder holding the images to detect
root = '/home/beshar/robocup/input'
model = '/home/beshar/robocup/pts/best_20.pt'
for name in list_files_in_folder(root):
    os.system(f"python /home/beshar/beshar/yolov5/detect.py --source {os.path.join(root, name)} --weights {model} --project /home/beshar/robocup/output --conf-thres 0.05 --iou-thres 0.2 --exist-ok --device cpu --save-txt --line-thickness 1 --img-size 1600")

from PIL import Image

# open the five result images (all assumed to be the same size as the first)
paths = [f"/home/beshar/robocup/output/exp/Image{k}.jpg" for k in range(1, 6)]
images = [Image.open(p) for p in paths]
width, height = images[0].size

# new canvas: as wide as one image, as tall as all five stacked
result = Image.new("RGB", (width, height * len(images)))

# paste the images one below the other
for k, img in enumerate(images):
    result.paste(img, (0, height * k))

# save the combined image
result.save("/home/beshar/robocup/output/exp/pt.jpg")

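After each detection run I stitched the results into one image and compared the runs; 1600 came out best. The confidence threshold here was 0.05, so that low-confidence boxes show up too and I could find a threshold that filters out the false ones. Since the second round used photos taken by a ROS robot, adding --augment (augmented inference) turned out even better. That gave the following detection command: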

python /home/beshar/beshar/yolov5/detect.py --source {os.path.join(root,list[i])} --weights {model} --project /home/beshar/robocup/output --conf-thres 0.3 --iou-thres 0.2 --exist-ok --device cpu --save-txt --line-thickness 1 --img-size 1600 --augment

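But one person kept being detected as Nescafe. Since in these images the faces are at the top and the objects at the bottom, I decided to run yolo only on the lower part of the image. The final code, with sensitive information removed, is as follows: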

import sys
from PIL import Image, ImageEnhance, ImageOps
import os

root = '/root/prog/input'
to = '/root/prog/output'
weights = '/root/prog/best_2023.pt'

# ===== Image processing ===== #
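def Brighten(input,output,depth):
    # scale brightness by (depth - 1): factor 1.0 keeps the image, > 1 brightens, < 1 darkens
    img = Image.open(input)
    brg = ImageEnhance.Brightness(img)
    factor = depth-1
    enhance_image = brg.enhance(factor)
    enhance_image.save(output)
    
def lashen(inf, outf, w, h):
    # lashen = "stretch": resize to exactly w x h, ignoring the aspect ratio
    im = Image.open(inf)
    image=im.resize((w,h))
    image.save(outf)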

# crop away the white margins
def crop_margin(img_fileobj,output_bf, padding=(25, 5, 25, 25)):
    image = Image.open(img_fileobj).convert('RGB')
    # getbbox returns the box around non-zero (non-black) pixels, so invert and binarize first: dark pixels -> 255, everything else -> 0
    gray_image = image.convert('L')
    threshold  = 100 
    table  =  []
    for  i  in  range( 256 ):
        if  i  <  threshold:
            table.append(255)
        else :
            table.append(0)
    #  convert to binary image by the table 
    gray_image = gray_image.point(table)

    bbox = gray_image.getbbox()
    left = bbox[0] - padding[0]
    top = bbox[1] - padding[1]
    right = bbox[2] + padding[2]
    bottom = bbox[3] + padding[3]
    cropped_image = image.crop([left, top, right, bottom])
    cropped_image.save(output_bf)

# ===== Detection ===== #
from PIL import Image, ImageFont, ImageDraw
import datetime
import json
import base64
import requests
from aip import AipFace
import os

fontAddr = '/root/prog/api/font/SimHei.ttf'
YOLOmodel = '/root/prog/best_2023.pt'
#################### Face recognition ####################
access_token = "00.a00v000a00ddc000b0000aa0000000db.0000000.0000000000.000000-00000000"    # replace with your own token

# encode an image file as a base64 string
def changeToBase64(file_path):
    with open(file_path, 'rb') as f:
        img = base64.b64encode(f.read()).decode('utf-8')
        return img

# Call Baidu's face detect API; used here mainly to get each face's gender
def baidu_face(file_path):
    request_url = "https://aip.baidubce.com/rest/2.0/face/v3/detect"
    params = {"image": changeToBase64(file_path), "image_type": "BASE64",
              "face_field": "gender", "face_type": "LIVE", "max_face_num": 10}

    request_url = request_url + "?access_token=" + access_token
    headers = {'content-Type': 'application/json'}
    # send the body as JSON, matching the Content-Type header
    response = requests.post(request_url, data=json.dumps(params), headers=headers)
    res = response.json()   # not named `json`, which would shadow the json module
    if res['error_code'] != 0:
        print("face detection failed")
        print(res)
        return False
    return res

def detect(file_path, out_file, imgname, onlyYolo=False):
    # run yolo; the result is saved to out_file
    # crop off the top 40% (height / 2.5) first, then detect on the lower part only
    temp_img = Image.open(file_path)
    temp_box = (0,int(temp_img.height/2.5),temp_img.width, temp_img.height)
    tempcrop_img = temp_img.crop(temp_box)
    tempcrop_img.save(f'/root/prog/{imgname}')
    os.system(f"python /root/yolov5/detect.py --source /root/prog/{imgname} --weights {YOLOmodel} --project {to} --conf-thres 0.30 --iou-thres 0.2 --exist-ok --device cpu --save-txt --line-thickness 3 --img-size 1600 --augment")
    tempcrop_img = Image.open(out_file)
    temp_img.paste(tempcrop_img, temp_box)
    temp_img.save(out_file)
    os.remove(f'/root/prog/{imgname}')
    if onlyYolo:
        return
    # face recognition
    print(f"start face recognition {file_path}")
    APP_ID = '00000000'
    API_KEY = 'B0gaLVulvGM5QAQYgwtbYQWY'
    SECRET_KEY = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ123456'
    client = AipFace(APP_ID, API_KEY, SECRET_KEY)

    imageType = "BASE64"
    groupIdList = "group_name_here"   # placeholder: the face group name set up in the Baidu console

    image = changeToBase64(file_path=file_path)
    options = {}
    options["max_face_num"] = 5
    options["match_threshold"] = 60
    options["max_user_num"] = 5
    search = client.multiSearch(image, imageType, groupIdList, options)   # get the names of the faces
    print(search)

    img = Image.open(out_file)      # draw on the yolo result, but detect faces on the original image
    if search['error_msg'] == 'SUCCESS':
        face_list = search['result']['face_list']
        result = baidu_face(file_path)      # get each face's gender
        draw = ImageDraw.Draw(img)
        fontStyle = ImageFont.truetype(fontAddr, 35, encoding="utf-8")
        for i, face in enumerate(face_list):
            loc = face['location']
            x1, y1 = int(loc['left']), int(loc['top'])
            x2, y2 = x1 + int(loc['width']), y1 + int(loc['height'])
            print(x1, y1, x2, y2)
            if face['user_list'] and face['user_list'][0]['score'] > 60:
                draw.rectangle(((x1, y1), (x2, y2)), fill=None, outline='red', width=5)
                draw.text((x1-20, y1-20), face['user_list'][0]['user_id'], fill="white", font=fontStyle)
                # assumes the detect API lists faces in the same order as multiSearch
                if result and i < len(result['result']['face_list']):
                    draw.text((x2-20, y2-20), result['result']['face_list'][i]['gender']['type'], fill="white", font=fontStyle)

    img.save(out_file)

# stamp the team name and the current time on the image
def Printtimeandname(k):
    font_style = ImageFont.truetype(fontAddr, 22, encoding="utf-8")
    current_time = datetime.datetime.now()
    img = Image.open(k)
    draw = ImageDraw.Draw(img)
    draw.text((1,1),'Team: undefined 9', fill=(167, 45, 176), font=font_style)
    draw.text((1,25), str(current_time), fill=(167, 45, 176), font=font_style)
    img.save(k)
    


# ===== Entry point ===== #
imgname = sys.argv[1]
print(imgname)
finalImg = ''
originImg = ''

# Different preprocessing for each ROS capture position (parameters found by trial and error)
if imgname == '8':
    imgname = 'Image1.jpg'
    finalImg = f'{root}/{imgname}'
    originImg = f'{root}/8.jpg'
    Brighten(originImg,finalImg,3.7)
    lashen(finalImg,finalImg,2300,1080)

elif imgname == 'over':
    imgname = 'Image2.jpg'
    finalImg = f'{root}/{imgname}'
    originImg = f'{root}/over.jpg'
    Brighten(originImg,finalImg,3.5)

elif imgname == '5':
    imgname = 'Image3.jpg'
    finalImg = f'{root}/{imgname}'
    originImg = f'{root}/5.jpg'
    Brighten(originImg,finalImg,2.8)

elif imgname == '3':
    imgname = 'Image4.jpg'
    finalImg = f'{root}/{imgname}'
    originImg = f'{root}/3.jpg'
    Brighten(originImg,finalImg,2.3)

elif imgname == '11':
    imgname = 'Image5.jpg'
    finalImg = f'{root}/{imgname}'
    originImg = f'{root}/11.jpg'
    Brighten(originImg,finalImg,2.4)


if finalImg != '':
    crop_margin(finalImg, finalImg)
    os.remove(originImg)
    pic_out = os.path.join(to+'/exp', imgname)
    detect(finalImg, pic_out, imgname, False)     # pass True when testing (YOLO only, skips face recognition)
    Printtimeandname(pic_out)
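Because yolo only sees the cropped lower part in detect() above, any coordinates it writes with --save-txt are relative to the crop, not the full photo (the drawing is unaffected, since the annotated crop is pasted back at the same offset). If those boxes are needed in full-image coordinates, e.g. to compare them with the face boxes from Baidu, shifting by the crop offset is enough. A sketch, assuming the same int(height / 2.5) crop; the function names are mine:

```python
def crop_offset(img_h: int, divisor: float = 2.5) -> int:
    """Rows cut from the top by detect(): int(height / 2.5), i.e. the top 40%."""
    return int(img_h / divisor)


def box_to_original(box_xyxy, img_h: int, divisor: float = 2.5):
    """Shift a pixel box (x1, y1, x2, y2) found in the cropped lower part back to full-image coordinates."""
    dy = crop_offset(img_h, divisor)
    x1, y1, x2, y2 = box_xyxy
    # the crop keeps the full width starting at x = 0, so only y moves
    return (x1, y1 + dy, x2, y2 + dy)
```

If the box comes from a --save-txt file, convert its normalized values to pixels using the crop's size first (width unchanged, height = img_h - crop_offset(img_h)), then shift.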

The competition also used ROS, but that environment was set up in 2022 and I've forgotten how...