[Pose Estimation] How to train Lightweight OpenPose on your own data

Latest recommended article published on 2024-10-11 07:29:57

莫克_Cheney

Category: Deep Learning. Tags: deep learning, artificial intelligence

Article link: https://blog.csdn.net/herocheney/article/details/125311938

# Train on custom dataset

It will require some :clock1:, but here is the guide from :zero: to :muscle:. In the end you will feel :shipit: :neckbeard: :godmode:, I guarantee that!

## Preface

### What are `BODY_PARTS_KPT_IDS` and `BODY_PARTS_PAF_IDS`?

Both lists are used to group keypoints into person instances. The network predicts two tensors: the first contains keypoint heatmaps, which localize all candidate keypoints of each type (neck, left shoulder, right shoulder, left elbow, etc.), and the second encodes connections between keypoints of predefined types.

From the heatmaps we can extract the coordinates of all keypoints that the network was able to find. These keypoints then need to be grouped into persons. This is easy if an image can contain only one person: we already know the keypoint coordinates and their types, so all found keypoints belong to that person. The situation becomes harder if multiple persons may be present in an image. What can we do in this case? Suppose the network finds two right shoulder keypoints and only one neck. One neck keypoint is fine: we can possibly extract the pose of one person. But there are two right shoulder keypoints, and we know that a single pose contains at most one right shoulder. Which one should we choose? It is tricky, but let the network help us.

To group keypoints into person instances, the network learns to predict connections between the keypoints of each person, like the bones of a skeleton. Once we know which keypoints are connected to each other, the full pose can be read by starting from the first keypoint and checking whether it is connected with other keypoints. Once connections between the first keypoint and its neighbouring keypoints are established, we continue to assemble keypoints into the pose by exploring the neighbouring keypoints and the keypoints connected to them, and so on. The pairs of keypoint indices between which the network should predict a connection are exactly what is defined in the [`BODY_PARTS_KPT_IDS`](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L5-L6) list. Let's check the pose scheme image:

As you can see, pair `[1, 5]` corresponds to the connection between the keypoints with indices `1` and `5`, which are the neck and the left shoulder. Pair `[14, 16]` corresponds to the right eye and right ear keypoints. These pairs are defined (by you) before training, because the network needs to know which keypoint connections it should learn. The [`BODY_PARTS_PAF_IDS`](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L7-L8) list defines the indices of the network output channels that encode the connection for the corresponding keypoint pair. PAF stands for part affinity field, a term from the [original paper](https://arxiv.org/pdf/1611.08050.pdf), which describes the connection between a pair of keypoints.
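As an illustration, the two lists for a small object could look like the sketch below. The keypoint names, the 5-keypoint skeleton, the sequential channel layout and the `describe` helper are all assumptions made for this example; only the list names `BODY_PARTS_KPT_IDS` and `BODY_PARTS_PAF_IDS` come from the guide.

```python
# Hypothetical 5-keypoint skeleton (illustration only, not the person layout
# used by the repository).
KPT_NAMES = ["root", "left_tip", "right_tip", "top", "bottom"]

# Each pair [a, b] is one connection the network has to learn.
BODY_PARTS_KPT_IDS = [[0, 1], [0, 2], [0, 3], [0, 4]]

# For pair number i, the x and y components of its connection vector are
# assumed to live in output channels 2*i and 2*i + 1 (a sequential layout
# chosen for this sketch).
BODY_PARTS_PAF_IDS = [[2 * i, 2 * i + 1] for i in range(len(BODY_PARTS_KPT_IDS))]


def describe(pair_index):
    """Return a readable description of one keypoint pair and its channels."""
    a, b = BODY_PARTS_KPT_IDS[pair_index]
    ch_x, ch_y = BODY_PARTS_PAF_IDS[pair_index]
    return f"{KPT_NAMES[a]} -> {KPT_NAMES[b]}: PAF channels ({ch_x}, {ch_y})"


for i in range(len(BODY_PARTS_KPT_IDS)):
    print(describe(i))
```

The two lists must always have the same length, since entry `i` of one describes entry `i` of the other.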

### How to choose pairs of keypoints to connect?

One may select an all-to-all connection scheme, which gives `(number of keypoints) * (number of keypoints - 1)` keypoint pairs (the connection of a keypoint with itself is skipped, since it is useless for grouping into instances). If the number of keypoints is 18 and the all-to-all scheme is used, the network needs to learn `18 * 17 = 306` connections between keypoints. A large number of connections makes the network more complex and slower, but gives more accurate grouping, because each keypoint is connected to every other keypoint of the same person. For example, if the network fails to detect the connection between the right elbow and the right shoulder, we may still group the right elbow into the pose by checking its connection with the neck or with other keypoints.

The actual number of keypoint pairs is a trade-off between network inference speed and accuracy. In this work there are 19 keypoint pairs. However, there is a **best practice:** it makes sense to define a special root keypoint, which is connected with the rest of the keypoints for better accuracy (as discussed above). Usually the most robust keypoint, one that is rarely occluded and easy to detect, is a good candidate for the root keypoint. The root keypoint serves as the first keypoint from which grouping starts. For persons it is usually the neck or the pelvis (or both, or even more; it is a trade-off).
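To make the trade-off concrete, the following sketch counts the pairs produced by the all-to-all scheme and by a scheme that only connects one root keypoint to all others. The function names are hypothetical, and using index `1` as the root is only an example.

```python
from itertools import permutations


def all_to_all_pairs(num_keypoints):
    """Every ordered pair [a, b] with a != b, as counted in the text."""
    return [list(pair) for pair in permutations(range(num_keypoints), 2)]


def root_pairs(num_keypoints, root):
    """Connect a single root keypoint to every other keypoint."""
    return [[root, k] for k in range(num_keypoints) if k != root]


print(len(all_to_all_pairs(18)))    # 18 * 17 pairs
print(len(root_pairs(18, root=1)))  # 17 pairs
```

A practical skeleton usually sits between these two extremes: the root-based pairs plus a few extra "bone" connections.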

### How are connections between keypoint pairs implemented at the network level?

The connection between a pair of keypoints is represented as a unit vector between these keypoints. For a keypoint `a` with coordinates `(x_a, y_a)` and a keypoint `b` with coordinates `(x_b, y_b)`, the unit vector `c_ba` is computed as `(x_b - x_a, y_b - y_a)`, normalized by its length. All pixels between the two keypoints of the pair contain this vector. For each keypoint pair the network predicts two separate output channels: one for the `x` component and one for the `y` component of the connection vector. So for 19 keypoint pairs the network predicts `19 * 2 = 38` channels for connections between keypoints. At inference time, we do an exhaustive search over all found keypoints of the two types in a pair and compare the vector formed by each candidate pair of keypoints with the one learned by the network. If the vectors match, these keypoints are connected. The indices of the network output channels for the `x` and `y` components of the connection vector of each keypoint pair are stored in the [`BODY_PARTS_PAF_IDS`](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L7-L8) list.
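The computation above can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: `paf_at` is a hypothetical callable that returns the predicted `(x, y)` connection vector at an image location, and averaging the dot product over sampled points is one simple way to measure how well two vectors "match".

```python
import math


def unit_vector(a, b):
    """Unit vector pointing from keypoint a to keypoint b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return (0.0, 0.0)
    return (dx / length, dy / length)


def connection_score(a, b, paf_at, num_samples=10):
    """Average dot product between the direction a -> b and the predicted
    connection field, sampled at evenly spaced points on the segment."""
    ux, uy = unit_vector(a, b)
    total = 0.0
    for i in range(num_samples):
        t = i / (num_samples - 1)
        x = a[0] + t * (b[0] - a[0])
        y = a[1] + t * (b[1] - a[1])
        px, py = paf_at(x, y)
        total += px * ux + py * uy
    return total / num_samples


# Toy field that points to the right everywhere (assumption for the demo).
def toy_field(x, y):
    return (1.0, 0.0)


print(connection_score((0, 0), (10, 0), toy_field))  # aligned with the field
print(connection_score((0, 0), (0, 10), toy_field))  # perpendicular to it
```

A score close to 1 means the candidate pair follows the learned connection; a score near 0 or below means it does not, so the pair would be rejected.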

### How are person keypoints grouped into instances?

As discussed above, the network outputs two tensors: keypoints and connections between predefined keypoint pairs. We start from the first such pair in the [`BODY_PARTS_KPT_IDS`](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L5-L6) list, `[1, 2]`, which is the neck and the right shoulder. Lines [63-92](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L63-L92) handle corner cases, when keypoints of one or both types from the pair are missing. In these cases all existing pose instances (`pose_entries`) are checked to see whether they contain the current keypoint. So if the network does not find any right shoulder keypoint, every found neck keypoint is checked to see whether it already belongs to an existing pose; if not, a new pose instance with this keypoint is created.

Lines [94-141](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L94-L141) verify which of the found keypoints (of the two types from the pair) are connected, by doing an exhaustive search between them and checking whether the learned connection vector corresponds to the vector between the keypoint locations.

Lines [159-193](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L159-L193) assign connected keypoints to one of the existing pose instances. If it is the first keypoint pair ([`part_id == 0`](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L159)), a new pose instance is created, containing both keypoints. Otherwise the current keypoint pair is assigned to the pose instance that already [contains the first of these keypoints](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L159). For example, if the current pair is right shoulder and right elbow, the right elbow is assigned to the pose instance that already has a right shoulder with these particular coordinates (assigned at the previous step with the neck and right shoulder pair). If no pose instance contains the first keypoint from the pair, a new pose instance [is created](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L159). All keypoint pairs are processed this way, one by one. As you can see, if the order of keypoint pairs (and the order of keypoints within a pair) in the [`BODY_PARTS_KPT_IDS`](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L5-L6) list is random, multiple pose instances will be created from disjoint keypoint pairs instead of one instance with all keypoints. That is why the order of keypoint pairs matters, and why a root keypoint is useful for connecting keypoints more robustly.
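The assignment step, and the reason the pair order matters, can be illustrated with a heavily simplified sketch. It is not the repository's code: it assumes the connections have already been verified, ignores the corner cases and scores, and the function name `group_connections` is hypothetical.

```python
def group_connections(kpt_pairs, connections, num_keypoints):
    """Greedily assemble poses from verified connections.

    kpt_pairs:   list of [type_a, type_b] keypoint type indices.
    connections: connections[i] is a list of (id_a, id_b) tuples holding the
                 ids of detected keypoints connected for pair i.
    Returns a list of poses; each pose stores one detected keypoint id per
    keypoint type, or -1 if that type is missing.
    """
    poses = []
    for (type_a, type_b), found in zip(kpt_pairs, connections):
        for id_a, id_b in found:
            for pose in poses:
                # Attach to the pose that already owns the first keypoint.
                if pose[type_a] == id_a:
                    pose[type_b] = id_b
                    break
            else:
                # No pose contains the first keypoint: start a new pose.
                pose = [-1] * num_keypoints
                pose[type_a] = id_a
                pose[type_b] = id_b
                poses.append(pose)
    return poses


# Chain 0 -> 1 -> 2 processed in a sensible order: one complete pose.
ordered = group_connections([[0, 1], [1, 2]], [[(10, 20)], [(20, 30)]], 3)
# Same connections, pairs processed in the opposite order: the pose is split.
shuffled = group_connections([[1, 2], [0, 1]], [[(20, 30)], [(10, 20)]], 3)
print(ordered)
print(shuffled)
```

In the second call the pair `[0, 1]` arrives when no pose contains keypoint `10` yet, so a second, incomplete pose is started even though all three keypoints belong to the same object.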

We have talked here about person poses; however, the same considerations apply to other object types.

## Dataset format

The easiest way is to use annotations in [COCO](http://cocodataset.org/#format-data) format. If you need to label a dataset, consider the [coco-annotator](https://github.com/jsbroks/coco-annotator) tool (there are possibly alternatives, but I am not aware of them). If the dataset is already annotated, just convert it to COCO [format](http://cocodataset.org/#format-data).

Now convert the dataset from COCO format into the [internal](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch#training) format:

```
python scripts/prepare_train_labels.py --labels custom_dataset_annotation.json
```
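For orientation, here is a minimal sketch of what a COCO-style keypoint annotation file contains. The category name, keypoint names, coordinates and the `make_annotation` helper are made up for illustration, and which of these fields the conversion script actually reads has not been verified here.

```python
import json


def make_annotation(ann_id, image_id, points, bbox):
    """Build one COCO-style keypoint annotation.

    points: list of (x, y, v) triplets, where v follows the COCO convention:
            0 = not labeled, 1 = labeled but not visible, 2 = labeled and visible.
    bbox:   [x, y, width, height].
    """
    flat = [value for point in points for value in point]
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": 1,
        "keypoints": flat,
        "num_keypoints": sum(1 for _, _, v in points if v > 0),
        "bbox": bbox,
        "area": bbox[2] * bbox[3],  # approximated by the box area in this sketch
        "iscrowd": 0,
    }


points = [(100, 120, 2), (80, 160, 2), (0, 0, 0), (100, 60, 1), (100, 200, 2)]
dataset = {
    "images": [{"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480}],
    "annotations": [make_annotation(1, 1, points, [60, 50, 90, 170])],
    "categories": [{
        "id": 1,
        "name": "custom_object",  # hypothetical category
        "keypoints": ["root", "left_tip", "right_tip", "top", "bottom"],
        "skeleton": [[1, 2], [1, 3], [1, 4], [1, 5]],  # 1-based indices
    }],
}

# Serialize to a JSON string; writing it to a file is left to the reader.
text = json.dumps(dataset, indent=2)
print(len(dataset["annotations"][0]["keypoints"]))
```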

## Modifications of the training code

1. The original COCO keypoint order is converted to an internal one. This is not necessary for training on new data, so [`_convert`](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/datasets/transformations.py#L36) can be safely removed.

2. Modify the keypoint indices to properly [swap](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/datasets/transformations.py#L252) the left and right sides of the object.

3. Set your own [`BODY_PARTS_KPT_IDS`](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/datasets/coco.py#L13) to define the keypoint pairs for grouping.

4. For the [network object](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/train.py#L26), set the number of keypoint output channels `num_heatmaps` to the number of keypoints to detect + 1 for the background, and set `num_pafs` to the number of output channels for connections between keypoints. For example, if the new object has 5 keypoints and 4 keypoint pairs are defined for grouping, the network object is created as:

```
net = PoseEstimationWithMobileNet(num_refinement_stages, num_heatmaps=6, num_pafs=8)
```

`num_pafs` is 8 because each connection is encoded as 2 output channels, for the `x` and `y` components of the vector between the keypoints of a pair.

5. For proper network inference and validation, set the new keypoint index pairs in [`BODY_PARTS_KPT_IDS`](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L5-L6) and the corresponding indices of the network output channels for connections between the keypoints of each pair in [`BODY_PARTS_PAF_IDS`](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L7-L8).

6. To run a standalone validation, modify the [network object creation](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/val.py#L174) according to the new number of learned keypoints and connections between them.
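The numbers used in the steps above can be derived from a single skeleton definition, which helps keep the training and inference settings consistent. The sketch below assumes a hypothetical 5-keypoint object whose left and right keypoints are named with `left_` and `right_` prefixes; the helper names and the `image_width - 1 - x` mirroring convention are choices made for this example, not taken from the repository.

```python
KPT_NAMES = ["root", "left_tip", "right_tip", "top", "bottom"]  # hypothetical
KPT_PAIRS = [[0, 1], [0, 2], [0, 3], [0, 4]]                    # hypothetical


def channel_counts(kpt_names, kpt_pairs):
    """Return (num_heatmaps, num_pafs) for the network constructor."""
    num_heatmaps = len(kpt_names) + 1  # one extra channel for the background
    num_pafs = 2 * len(kpt_pairs)      # x and y channel per keypoint pair
    return num_heatmaps, num_pafs


def swap_indices(kpt_names):
    """Index permutation that exchanges left_* and right_* keypoints."""
    index = {name: i for i, name in enumerate(kpt_names)}
    order = []
    for name in kpt_names:
        if name.startswith("left_"):
            partner = "right_" + name[len("left_"):]
        elif name.startswith("right_"):
            partner = "left_" + name[len("right_"):]
        else:
            partner = name
        order.append(index[partner])
    return order


def flip_keypoints(keypoints, image_width, kpt_names):
    """Mirror keypoints horizontally and swap the left and right slots."""
    mirrored = [(image_width - 1 - x, y) for x, y in keypoints]
    return [mirrored[i] for i in swap_indices(kpt_names)]


print(channel_counts(KPT_NAMES, KPT_PAIRS))
print(swap_indices(KPT_NAMES))
```

Without the index swap, a horizontally flipped image would carry a "left" label on what now looks like the right side of the object, which gives the network contradictory training targets.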

## Congratulations

My congratulations, you are now a pose estimation master :sunglasses:! May the force be with you! :accept: