[Pose Estimation] How to train Lightweight OpenPose on your own data

# Train on custom dataset


It will require some :clock1:, but here is the guide from :zero: to :muscle:. In the end you will feel :shipit: :neckbeard: :godmode:, I guarantee that!


## Preface


### What are `BODY_PARTS_KPT_IDS` and `BODY_PARTS_PAF_IDS`?


Both lists are related to grouping keypoints into person instances. The network predicts two tensors: the first contains keypoint heatmaps, which localize all possible keypoints of each type (neck, left shoulder, right shoulder, left elbow, etc.), and the second contains connections between keypoints of predefined types.

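To make these two outputs concrete, here is a minimal sketch of the tensor shapes involved for the 18-keypoint configuration used in this work. The feature-map resolution below is illustrative only; the actual value depends on the input size and the network stride.

```python
import torch

num_keypoints = 18
num_pairs = 19
h, w = 46, 46  # illustrative feature-map resolution

# one heatmap per keypoint type + 1 background channel
heatmaps = torch.zeros(1, num_keypoints + 1, h, w)
# two PAF channels (x and y vector components) per keypoint pair
pafs = torch.zeros(1, num_pairs * 2, h, w)
```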

From the heatmaps we can extract the coordinates of all keypoints which the network was able to find. Now these keypoints need to be grouped into persons. It is very easy to do if only one person can be inside an image: because we already know the keypoints' coordinates and types, all found keypoints belong to the desired person. The situation becomes harder if multiple persons may be present inside an image. What can we do in this case? For example, the network finds two right shoulder keypoints and only one neck. One neck keypoint is fine: we can possibly extract the pose of one person from it. But there are two right shoulder keypoints, and we know that a single pose contains at most one right shoulder. Which one should we choose? It is tricky, so let the network help us.


To group keypoints into person instances, the network learns to predict connections between the keypoints of each person, like the bones of a skeleton. So once we know which keypoints are connected to each other, the full pose can be read: starting from the first keypoint, we check whether it is connected with other keypoints. Once the connections between the first keypoint and its neighbouring keypoints are established, we continue to assemble keypoints into the pose by exploring the neighbouring keypoints and the keypoints they are connected with, and so on. The pairs of keypoint indices between which the network should predict a connection are exactly what is defined in the [`BODY_PARTS_KPT_IDS`](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L5-L6) list. Let's check the pose scheme image:


<p align="center">
  <img src="data/shake_it_off.jpg" />
</p>

You see, pair `[1, 5]` corresponds to the connection between the keypoints with indices `1` and `5`, which are the neck and the left shoulder. Pair `[14, 16]` corresponds to the right eye and right ear keypoints. These pairs are defined (by you) before the training, because the network needs to know which keypoint connections it should learn. The [`BODY_PARTS_PAF_IDS`](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L7-L8) list defines the indices of the network output channels which encode the connection between the corresponding keypoints pair. PAF stands for part affinity field, a term from the [original paper](https://arxiv.org/pdf/1611.08050.pdf) which describes the connection between a keypoints pair.

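For illustration, here is what these two lists could look like for a hypothetical 5-keypoint object. The skeleton and the consecutive channel layout below are assumptions for the sketch; the actual values for persons live in `modules/keypoints.py`, where the PAF channel order follows the original OpenPose layout rather than a simple consecutive one.

```python
# Hypothetical 5-keypoint object: 0 head, 1 neck, 2 tail, 3 left paw, 4 right paw.
# Each pair tells the network which connection it should learn:
BODY_PARTS_KPT_IDS = [[1, 0], [1, 2], [1, 3], [1, 4]]

# Assuming PAF channels are laid out consecutively (pair i -> channels 2*i, 2*i+1):
BODY_PARTS_PAF_IDS = [[2 * i, 2 * i + 1] for i in range(len(BODY_PARTS_KPT_IDS))]
# -> [[0, 1], [2, 3], [4, 5], [6, 7]]
```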

### How to choose pairs of keypoints to connect?


One may select an all-to-all connection scheme, thus having `(number of keypoints) * (number of keypoints - 1)` keypoint pairs (a keypoint's connection with itself is skipped, as it is useless for grouping into instances). If the number of keypoints is 18 and an all-to-all connection scheme is used, then the network needs to learn `18 * 17 = 306` connections between keypoints. A large number of connections makes the network more complex and slower, but gives more accurate grouping: because each keypoint is connected to every other keypoint of this person, if the network fails to detect the connection between right elbow and right shoulder, for example, we may still group the right elbow into the pose by checking its connection with the neck or with other keypoints.


The actual number of keypoint pairs is a trade-off between network inference speed and accuracy. In this work there are 19 keypoint pairs. However, there is a **best practice:** it makes sense to define a special root keypoint, which is connected with the rest of the keypoints for better accuracy (as discussed above). Usually the most robust keypoint, one that is rarely occluded and easy to detect, is a good candidate for the root keypoint. The root keypoint serves as the first keypoint to start grouping from. For persons it is usually the neck or the pelvis (or both, or even more; it is a trade-off).

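A minimal sketch of such a star-shaped scheme, assuming the 18-keypoint person layout with the neck at index `1`:

```python
# A star-shaped grouping scheme around a root keypoint (here: index 1, the neck).
# Every other keypoint is connected directly to the root, so one missed
# intermediate connection cannot break the whole grouping chain.
NUM_KEYPOINTS = 18
ROOT_KPT_ID = 1  # neck

star_pairs = [[ROOT_KPT_ID, k] for k in range(NUM_KEYPOINTS) if k != ROOT_KPT_ID]
# 17 pairs; in practice they are combined with a sparse skeleton,
# which is exactly the speed/accuracy trade-off discussed above
```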

### How are connections between keypoint pairs implemented at the network level?


A connection between a keypoints pair is represented as a unit vector between these keypoints. So for a given keypoint `a` with coordinates (x<sub>a</sub>, y<sub>a</sub>) and keypoint `b` with coordinates (x<sub>b</sub>, y<sub>b</sub>), such a unit vector c<sub>ba</sub> is computed as (x<sub>b</sub>-x<sub>a</sub>, y<sub>b</sub>-y<sub>a</sub>), then normalized by its length. All pixels between the keypoints of the pair contain this vector. As its output, the network predicts two separate channels per keypoints pair: one for the `x` component and one for the `y` component of the connection vector. So for 19 keypoint pairs the network will predict `19 * 2 = 38` channels for connections between keypoints. At inference time, we do an exhaustive search between all keypoints of the specific types from a keypoints pair, and compare the vector formed by these keypoints with the one learned by the network. If the vectors match, then these keypoints are connected. The indices of the network output channels for the `x` and `y` components of the connection vector for the corresponding keypoints pair are stored in the [`BODY_PARTS_PAF_IDS`](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L7-L8) list.

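Here is a simplified sketch of that matching step, assuming `paf_x`/`paf_y` are the two predicted channels for one keypoints pair and the candidate coordinates are within the map bounds. The repo's actual implementation is more involved (it also counts how many sampled points are well aligned); this only shows the core idea.

```python
import numpy as np

def connection_score(kpt_a, kpt_b, paf_x, paf_y, num_samples=10):
    """Score a candidate connection between two keypoints against learned PAFs.

    kpt_a, kpt_b: (x, y) coordinates; paf_x, paf_y: 2D arrays with the x and y
    components of the predicted connection vector for this keypoints pair.
    """
    vec = np.array(kpt_b, dtype=np.float32) - np.array(kpt_a, dtype=np.float32)
    norm = np.linalg.norm(vec)
    if norm < 1e-6:
        return 0.0
    unit = vec / norm  # the ground-truth encoding: a unit vector from a to b

    # Sample the predicted field at points along the segment a -> b and
    # take the dot product with the candidate unit vector.
    xs = np.linspace(kpt_a[0], kpt_b[0], num_samples).round().astype(int)
    ys = np.linspace(kpt_a[1], kpt_b[1], num_samples).round().astype(int)
    scores = paf_x[ys, xs] * unit[0] + paf_y[ys, xs] * unit[1]
    return float(scores.mean())  # high mean -> vectors match -> keypoints connected
```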
 

### How are person keypoints grouped into instances?


As we discussed above, the network outputs two tensors: keypoints and connections between predefined keypoint pairs. We will start from the first such pair, `[1, 2]` from the [`BODY_PARTS_KPT_IDS`](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L5-L6) list, which is neck and right shoulder. Lines [63-92](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L63-L92) handle the corner cases when one or both keypoint types from the pair are missed. In these cases all existing pose instances (`pose_entries`) are checked for whether they contain the current keypoint. So if the network does not find any right shoulder keypoint, all found neck keypoints will be checked for whether they already belong to existing poses; if not, a new pose instance with this keypoint is created.


Lines [94-141](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L94-L141) verify which of the found keypoints (of the particular types from the pair) are connected, by doing an exhaustive search between them and checking whether the learned connection vector corresponds to the vector between the keypoints' locations.

Lines [159-193](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L159-L193) assign connected keypoints to one of the existing pose instances. If it is the first keypoint pair ([`part_id == 0`](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L159)), a new pose instance is created, containing both keypoints. Otherwise the current keypoints pair will be assigned to the pose instance which already [contains the first of these keypoints](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L159), e.g. if the current pair of keypoints is right shoulder and right elbow, then the right elbow will be assigned to the pose instance which already has a right shoulder with the particular coordinates (assigned at the previous step with the neck and right shoulder pair). If no pose instance is found which contains the first keypoint from the pair, then a new pose instance [is created](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L159). And one by one, all keypoint pairs are processed. As you can see, if the order of keypoint pairs (and the order of keypoints within a pair) in the [`BODY_PARTS_KPT_IDS`](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L5-L6) list is random, then multiple pose instances built from disjoint keypoint pairs will be created instead of one instance with all keypoints. That is why the order of keypoint pairs matters, and why the root keypoint is useful to connect keypoints more robustly.
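Below is a heavily simplified sketch of that grouping loop. `find_connections` is a hypothetical helper standing in for the exhaustive PAF matching sketched earlier; the real code in `modules/keypoints.py` additionally tracks pose scores and handles the missing-keypoint corner cases described above.

```python
def group_keypoints(all_keypoints_by_type, pafs, body_parts_kpt_ids):
    pose_entries = []  # each entry maps keypoint type -> keypoint id (or -1)
    num_kpt_types = len(all_keypoints_by_type)

    for part_id, (kpt_a_id, kpt_b_id) in enumerate(body_parts_kpt_ids):
        # find_connections: hypothetical exhaustive PAF-based matching
        connections = find_connections(
            all_keypoints_by_type[kpt_a_id],
            all_keypoints_by_type[kpt_b_id],
            pafs, part_id)
        for a, b in connections:
            for pose in pose_entries:
                if pose[kpt_a_id] == a:      # first keypoint already in a pose:
                    pose[kpt_b_id] = b       # attach the second one to it
                    break
            else:                            # no pose contains keypoint a:
                pose = [-1] * num_kpt_types  # start a new pose instance
                pose[kpt_a_id], pose[kpt_b_id] = a, b
                pose_entries.append(pose)
    return pose_entries
```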

We have talked here about person poses; however, the same considerations may be applied to other object types.

## Dataset format

The easiest way is to use annotation in [COCO](http://cocodataset.org/#format-data) format. So if you need to label a dataset, consider the [coco-annotator](https://github.com/jsbroks/coco-annotator) tool (possibly there are alternatives, but I am not aware of them). If there is an already annotated dataset, just convert it to the COCO [format](http://cocodataset.org/#format-data).
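For orientation, a minimal COCO-style annotation for a hypothetical 5-keypoint object could look like the sketch below. The field names are the standard COCO ones; the object, file name, and coordinates are made up.

```python
# Minimal COCO-style keypoints annotation for a hypothetical 5-keypoint object.
# Keypoint triplets are (x, y, v): v=0 not labeled, v=1 labeled but occluded, v=2 visible.
custom_dataset_annotation = {
    "images": [{"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480}],
    "categories": [{
        "id": 1,
        "name": "my_object",
        "keypoints": ["head", "neck", "tail", "left_paw", "right_paw"],
        "skeleton": [[2, 1], [2, 3], [2, 4], [2, 5]],  # 1-based keypoint indices
    }],
    "annotations": [{
        "id": 1, "image_id": 1, "category_id": 1,
        "num_keypoints": 5,
        "keypoints": [320, 100, 2, 320, 160, 2, 320, 300, 2, 280, 240, 1, 360, 240, 2],
        "bbox": [260, 80, 120, 260],  # x, y, width, height
        "iscrowd": 0, "area": 31200,
    }],
}
```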

Now convert dataset from COCO format into [internal](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch#training) format:

```
python scripts/prepare_train_labels.py --labels custom_dataset_annotation.json
```

## Modifications of the training code

1. The original COCO keypoint order is converted to the internal one. This is not necessary for training on new data, so [`_convert`](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/datasets/transformations.py#L36) can be safely removed.

2. Modify the keypoint indices to properly [swap](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/datasets/transformations.py#L252) the left and right sides of the object (see the sketch after this list).

3. Set your own [`BODY_PARTS_KPT_IDS`](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/datasets/coco.py#L13) to define the keypoint pairs for grouping.

4. Set the number of output channels for keypoints, `num_heatmaps`, to the number of keypoints to detect + 1 for the background, and the number of channels for connections between keypoints, `num_pafs`, for the [network object](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/train.py#L26). For example, if the new object has 5 keypoints and 4 keypoint pairs are defined for grouping, then the network object is created as:

```
net = PoseEstimationWithMobileNet(num_refinement_stages, num_heatmaps=6, num_pafs=8)
```

`num_pafs` is 8 because each connection is encoded as 2 output channels, for the `x` and `y` components of the vector between the keypoints of the pair.
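Both numbers follow directly from your keypoint definitions, so a quick sanity check could look like:

```python
num_keypoints = 5
keypoint_pairs = [[1, 0], [1, 2], [1, 3], [1, 4]]  # the 4 grouping pairs

num_heatmaps = num_keypoints + 1    # +1 for the background channel -> 6
num_pafs = 2 * len(keypoint_pairs)  # x and y channel per pair -> 8
```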

5. For proper network inference and validation, set the new keypoint index pairs in [`BODY_PARTS_KPT_IDS`](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L5-L6) and the corresponding indices of the network output channels for connections between the keypoints in pairs in [`BODY_PARTS_PAF_IDS`](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/modules/keypoints.py#L7-L8).

6. To run a standalone validation, modify the [network object creation](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/master/val.py#L174) according to the new number of learned keypoints and connections between them.
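For step 2, here is a minimal sketch of what the left/right swap amounts to, using the hypothetical 5-keypoint object from above. The indices and helper are illustrative, not the repo's actual code.

```python
# Hypothetical left/right swap for the toy 5-keypoint object: when an image is
# horizontally flipped, left_paw (3) and right_paw (4) must exchange indices.
FLIP_PAIRS = [[3, 4]]

def swap_left_right(keypoints):
    """keypoints: list of (x, y, visibility) triplets, one per keypoint index."""
    for left, right in FLIP_PAIRS:
        keypoints[left], keypoints[right] = keypoints[right], keypoints[left]
    return keypoints
```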

## Congratulations

My congratulations, now you are pose estimation master :sunglasses:! May the force be with you! :accept:


 

### Answer 1

To add a SENet attention module to the lightweight OpenPose human pose estimation network, you can follow these steps:

1. First, import the required libraries:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
```

2. Define the SENet attention module:

```python
class SEModule(nn.Module):
    def __init__(self, channels, reduction):
        super(SEModule, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False)
        self.relu = nn.ReLU(inplace=True)
        self.fc2 = nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        module_input = x
        x = self.avg_pool(x)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.sigmoid(x)
        return module_input * x
```

Here `channels` is the number of input channels and `reduction` is the reduction ratio.

3. Add the SENet attention modules to the lightweight OpenPose network:

```python
class PoseEstimationWithSENet(nn.Module):
    def __init__(self, num_keypoints):
        super(PoseEstimationWithSENet, self).__init__()
        # define the network structure
        # ...
        # add SENet attention modules
        self.se_block1 = SEModule(64, 16)
        self.se_block2 = SEModule(128, 16)
        self.se_block3 = SEModule(256, 16)
        self.se_block4 = SEModule(512, 16)
        # define the output layers
        # ...

    def forward(self, x):
        # forward pass
        # ...
        # apply the SENet attention modules to the intermediate features
        x1 = self.se_block1(x1)
        x2 = self.se_block2(x2)
        x3 = self.se_block3(x3)
        x4 = self.se_block4(x4)
        # produce the results
        # ...
        return out
```

Here `num_keypoints` is the number of keypoints to predict.

With the SENet attention modules added, the network can estimate human poses more accurately.

### Answer 2

Adding a SENet attention module to the lightweight OpenPose pose estimation network can be done as follows:

1. Import the required libraries and modules, including torch, torchvision, torch.nn, etc.
2. Define an SEBlock module that implements the SENet attention mechanism, which consists of a Squeeze operation and an Excitation operation:
   - Squeeze: global pooling over the input feature map, turning each channel's feature map into a single value.
   - Excitation: fully connected layers map each channel's feature to a weight vector that represents the importance of each channel.
   - Finally, the input feature map is multiplied by the weight vector to obtain a re-weighted feature map.
3. Find suitable insertion points in the original OpenPose network. Generally, an SEBlock module can be added after each convolutional layer (Conv2d).
4. Adjust the insertion points according to the concrete network structure, e.g. by subclassing nn.Module and redefining the forward function.
5. In the forward function, apply an SEBlock after each chosen convolutional layer and pass its result as input to the next layer.
6. During training, update the parameters and backpropagate as needed.

Example code:

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Define the SEBlock module
class SEBlock(nn.Module):
    def __init__(self, in_channels, reduction_ratio=16):
        super(SEBlock, self).__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)
        self.excitation = nn.Sequential(
            nn.Linear(in_channels, in_channels // reduction_ratio),
            nn.ReLU(inplace=True),
            nn.Linear(in_channels // reduction_ratio, in_channels),
            nn.Sigmoid()
        )

    def forward(self, x):
        batch_size, channels, _, _ = x.size()
        squeeze = self.squeeze(x).view(batch_size, channels)
        excitation = self.excitation(squeeze).view(batch_size, channels, 1, 1)
        weighted_x = x * excitation.expand_as(x)
        return weighted_x

# Add SEBlock modules to the OpenPose network
class OpenPoseWithSENet(nn.Module):
    def __init__(self):
        super(OpenPoseWithSENet, self).__init__()
        self.backbone = models.resnet18(pretrained=True)
        self.se_block1 = SEBlock(64)
        self.se_block2 = SEBlock(128)
        self.se_block3 = SEBlock(256)
        self.se_block4 = SEBlock(512)

    def forward(self, x):
        x = self.backbone.conv1(x)
        x = self.backbone.bn1(x)
        x = self.backbone.relu(x)
        x = self.backbone.maxpool(x)

        x = self.backbone.layer1(x)
        x = self.se_block1(x)
        x = self.backbone.layer2(x)
        x = self.se_block2(x)
        x = self.backbone.layer3(x)
        x = self.se_block3(x)
        x = self.backbone.layer4(x)
        x = self.se_block4(x)
        return x

# Initialize the network and train
model = OpenPoseWithSENet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Training and testing
num_epochs = 10  # example value
for epoch in range(num_epochs):
    # training code
    # testing code
    pass
```

The above is a brief description and code sketch for adding a SENet attention module to the lightweight OpenPose pose estimation network. The exact implementation may vary with the network structure and requirements, and should be adjusted accordingly.

### Answer 3

In the lightweight OpenPose pose estimation network, we can improve performance by adding SENet attention modules. A SENet attention module adaptively learns a weight for each channel, so the network can focus more on the important feature information.

First, after each convolutional layer of the network, we can add a SENet attention module. The module consists of a global average pooling layer and two fully connected layers. Global average pooling compresses the feature map into a feature vector; the two fully connected layers then produce a weight for each channel. Finally, the weights are applied to the original feature map via multiplication.

Pseudocode implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SENet(nn.Module):
    def __init__(self, in_channels, reduction_ratio=16):
        super(SENet, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Linear(in_channels, in_channels // reduction_ratio)
        self.fc2 = nn.Linear(in_channels // reduction_ratio, in_channels)

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = F.relu(self.fc1(y))
        y = torch.sigmoid(self.fc2(y)).view(b, c, 1, 1)
        return x * y

class LightweightOpenPose(nn.Module):
    def __init__(self):
        super(LightweightOpenPose, self).__init__()
        # define your network architecture here
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
        self.senet1 = SENet(64)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1)
        self.senet2 = SENet(128)
        # add more layers and SENet modules here

    def forward(self, x):
        x = F.relu(self.senet1(self.conv1(x)))
        x = F.relu(self.senet2(self.conv2(x)))
        # add more forward operations here
        return x
```

In this example we only added two SENet modules. You can add SENet modules after more convolutional layers as needed, and tune the hyperparameters of the attention module, such as reduction_ratio, for your specific task. This code is only an example, not an optimal implementation tailored to a specific application.
