HRNet-Facial-Landmark-Detection: Training on Your Own Dataset


This is the official code of High-Resolution Representations for Facial Landmark Detection. We extend the high-resolution representation (HRNet) [1] by aggregating the (upsampled) representations from all the parallel convolutions, leading to stronger representations. The output representations are fed into a classifier. We evaluate our methods on four datasets: COFW, AFLW, WFLW and 300W.

GitHub: https://github.com/HRNet/HRNet-Facial-Landmark-Detection


Train 🚀

  1. Prepare your training dataset (any data format is fine, as long as it is easy to read); the 300W dataset can be used as a reference.

  2. Create a new config file under the experiments directory, e.g. experiments/own/own_hrnet_w18.yaml

    1. In DATASET, update the ROOT, TRAINSET and TESTSET paths, and set DATASET to your dataset name.
    2. In MODEL, set NUM_JOINTS to the number of landmarks in your own training set; a quick sanity check of the finished yaml is sketched after the config snippet below.
      DATASET:
        DATASET: OWN
        ROOT: '../data/own/images'
        TRAINSET: '../data/own/train.json'
        TESTSET: '../data/own/val.json'
        FLIP: true
        SCALE_FACTOR: 0.25
        ROT_FACTOR: 30
      MODEL:
        NAME: 'hrnet'
        NUM_JOINTS: 37  # set this to the number of landmarks in your own dataset
        INIT_WEIGHTS: true
        PRETRAINED: 'hrnetv2_pretrained/hrnetv2_w18_imagenet_pretrained.pth'
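
    Before training, you can optionally check that the new yaml parses and matches your annotation. This is only a minimal sketch; it assumes PyYAML is installed and that the file lives at the path used above.

      import yaml

      # load the custom experiment config and spot-check the fields edited above
      with open('experiments/own/own_hrnet_w18.yaml') as f:
          cfg = yaml.safe_load(f)

      print(cfg['DATASET']['DATASET'])   # expected: OWN
      print(cfg['MODEL']['NUM_JOINTS'])  # expected: 37, i.e. your landmark count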
      
  3. Create own.py under lib/datasets to read data in your own format: copy the contents of face300w.py, then rename the class and adapt the __getitem__ method.

    1. Formulas for center and scale:
      scale = max(w, h) / 200
      center_w = (x1 + x2) / 2
      center_h = (y1 + y2) / 2
      
    2. Read the annotations according to your own format. I store mine as JSON; a sample annotation entry is sketched after the code below.
      def calCenterScale(self, bbox):
          # bbox is [x1, y1, x2, y2]; scale follows the repo convention of max(w, h) / 200
          w = bbox[2] - bbox[0]
          h = bbox[3] - bbox[1]
          center_w = (bbox[0] + bbox[2]) / 2.0
          center_h = (bbox[1] + bbox[3]) / 2.0
          scale = round((max(w, h) / 200.0), 2)
          return center_w, center_h, scale

      def __getitem__(self, idx):
          # read one record from the JSON annotation list
          image_path = os.path.join(self.data_root,
                                    self.landmarks_frame[idx]["image_path"])
          bbox = self.landmarks_frame[idx]['bbox']
          center_w, center_h, scale = self.calCenterScale(bbox)
          center = torch.Tensor([center_w, center_h])
          pts = np.array(self.landmarks_frame[idx]["keypoints"])
          pts = pts.astype('float').reshape(-1, 2)
          ...
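
    For reference, a minimal sketch of the annotation layout the loader above expects: a list of records with "image_path" (relative to the image root), "bbox" as [x1, y1, x2, y2] and "keypoints" as flattened x, y values. The file name and numbers are illustrative only.

      import json

      samples = [
          {
              "image_path": "0001.jpg",
              "bbox": [120.0, 80.0, 360.0, 340.0],
              # flattened x, y values; 37 (x, y) pairs in total for this example
              "keypoints": [150.2, 210.7, 158.9, 215.3],
          },
      ]

      # path taken from the yaml above; adjust as needed
      with open('../data/own/train.json', 'w') as f:
          json.dump(samples, f)

      # the dataset class can then load it once in __init__, e.g.
      # self.landmarks_frame = json.load(open(json_path))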
      
    3. Modify lib/datasets/__init__.py to register the dataset name you set in the yaml; a short sketch of how it is consumed at training time follows the snippet below.
      from .aflw import AFLW
      from .cofw import COFW
      from .face300w import Face300W
      from .wflw import WFLW
      from .own import Own

      __all__ = ['AFLW', 'COFW', 'Face300W', 'WFLW', 'Own', 'get_dataset']

      def get_dataset(config):
          if config.DATASET.DATASET == 'AFLW':
              return AFLW
          elif config.DATASET.DATASET == 'COFW':
              return COFW
          elif config.DATASET.DATASET == '300W':
              return Face300W
          elif config.DATASET.DATASET == 'WFLW':
              return WFLW
          elif config.DATASET.DATASET == 'OWN':
              return Own
          else:
              raise NotImplementedError()
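
    For context, this is roughly how the registered class is picked up at training time. It is only a sketch: the constructor arguments are assumed to mirror face300w.py, and the batch size is a placeholder.

      import torch.utils.data
      from lib.config import config
      from lib.datasets import get_dataset

      dataset_cls = get_dataset(config)              # resolves to Own when DATASET == 'OWN'
      train_dataset = dataset_cls(config, is_train=True)
      train_loader = torch.utils.data.DataLoader(train_dataset,
                                                 batch_size=16,   # placeholder value
                                                 shuffle=True)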
      
  4. Modify the compute_nme function in lib/core/evaluation.py: add a branch for your own landmark count and use the indices of the two outer eye corners for the inter-ocular distance. A small numeric sanity check of the new branch is sketched after the function.

    def compute_nme(preds, meta):
        targets = meta['pts']
        preds = preds.numpy()
        target = targets.cpu().numpy()
    
        N = preds.shape[0]
        L = preds.shape[1]
        rmse = np.zeros(N)
    
        for i in range(N):
            pts_pred, pts_gt = preds[i,], target[i,]
            if L == 19:  # aflw
                interocular = meta['box_size'][i]
            elif L == 29:  # cofw
                interocular = np.linalg.norm(pts_gt[8,] - pts_gt[9,])
            elif L == 68:  # 300w
                # interocular
                interocular = np.linalg.norm(pts_gt[36,] - pts_gt[45,])
            elif L == 98:
                interocular = np.linalg.norm(pts_gt[60,] - pts_gt[72,])
            elif L == 37:  # own dataset: indices 0 and 15 are the outer eye corners here
                interocular = np.linalg.norm(pts_gt[0,] - pts_gt[15,])
            else:
                raise ValueError('Number of landmarks is wrong')
            rmse[i] = np.sum(np.linalg.norm(pts_pred - pts_gt, axis=1)) / (interocular * L)
    
        return rmse
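
    As a quick, self-contained check of the new 37-point branch: if every predicted point is off by exactly 1 px, the NME should come out as 1 / interocular. The indices 0 and 15 below stand in for whatever outer eye-corner indices your annotation uses.

      import numpy as np

      pts_gt = np.random.rand(37, 2) * 100
      pts_pred = pts_gt + np.array([1.0, 0.0])            # shift every point by 1 px in x
      interocular = np.linalg.norm(pts_gt[0] - pts_gt[15])
      nme = np.sum(np.linalg.norm(pts_pred - pts_gt, axis=1)) / (interocular * 37)
      print(nme, 1.0 / interocular)                       # the two values should match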
    
  5. Modify the fliplr_joints function in lib/utils/transforms.py (no change is needed if FLIP is false).
    Note: adjust the left/right flip pairs to match your own landmark indices. If your annotation is 0-based, the -1 offset is not needed, similar to the WFLW dataset; see the sketch below.
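
    A standalone sketch of the flip logic for a custom 0-based annotation. The pair list and the function name are hypothetical; the repo's fliplr_joints does essentially the same with its per-dataset table of matched left/right indices, swapping each pair after mirroring the x coordinates.

      import numpy as np

      # placeholder left/right pairs for a 37-point, 0-based annotation (no -1 offset)
      OWN_MATCHED_PARTS = ([0, 15], [1, 14], [2, 13])

      def fliplr_own(pts, width):
          """Mirror (N, 2) landmarks horizontally and swap the matched left/right pairs."""
          pts = pts.copy()
          pts[:, 0] = width - pts[:, 0]
          for a, b in OWN_MATCHED_PARTS:
              pts[[a, b]] = pts[[b, a]]
          return pts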

  6. Point train.py at your own yaml (e.g. by changing its default config path to experiments/own/own_hrnet_w18.yaml) and start training.

    Epoch: [0][0/916]	Time 18.342s (18.342s)	Speed 0.9 samples/s	Data 14.961s (14.961s)	Loss 0.00214 (0.00214)	
    Epoch: [0][50/916]	Time 0.542s (0.880s)	Speed 29.5 samples/s	Data 0.000s (0.294s)	Loss 0.00076 (0.00085)	
    Epoch: [0][100/916]	Time 0.537s (0.708s)	Speed 29.8 samples/s	Data 0.000s (0.148s)	Loss 0.00074 (0.00080)	
    Epoch: [0][150/916]	Time 0.530s (0.650s)	Speed 30.2 samples/s	Data 0.000s (0.099s)	Loss 0.00075 (0.00079)	
    Epoch: [0][200/916]	Time 0.531s (0.621s)	Speed 30.1 samples/s	Data 0.001s (0.075s)	Loss 0.00074 (0.00077)	
    Epoch: [0][250/916]	Time 0.532s (0.603s)	Speed 30.1 samples/s	Data 0.000s (0.060s)	Loss 0.00072 (0.00077)	
    Epoch: [0][300/916]	Time 0.525s (0.592s)	Speed 30.5 samples/s	Data 0.000s (0.050s)	Loss 0.00073 (0.00076)	
    Epoch: [0][350/916]	Time 0.541s (0.583s)	Speed 29.6 samples/s	Data 0.000s (0.043s)	Loss 0.00071 (0.00075)	
    Epoch: [0][400/916]	Time 0.536s (0.577s)	Speed 29.9 samples/s	Data 0.000s (0.038s)	Loss 0.00067 (0.00074)	
    Epoch: [0][450/916]	Time 0.534s (0.572s)	Speed 30.0 samples/s	Data 0.000s (0.034s)	Loss 0.00057 (0.00073)	
    Epoch: [0][500/916]	Time 0.534s (0.568s)	Speed 30.0 samples/s	Data 0.000s (0.030s)	Loss 0.00056 (0.00072)	
    Epoch: [0][550/916]	Time 0.528s (0.565s)	Speed 30.3 samples/s	Data 0.000s (0.027s)	Loss 0.00055 (0.00071)	
    Epoch: [0][600/916]	Time 0.533s (0.562s)	Speed 30.0 samples/s	Data 0.001s (0.025s)	Loss 0.00053 (0.00069)	
    Epoch: [0][650/916]	Time 0.528s (0.560s)	Speed 30.3 samples/s	Data 0.000s (0.023s)	Loss 0.00051 (0.00068)	
    Epoch: [0][700/916]	Time 0.535s (0.558s)	Speed 29.9 samples/s	Data 0.000s (0.022s)	Loss 0.00050 (0.00067)	
    Epoch: [0][750/916]	Time 0.537s (0.556s)	Speed 29.8 samples/s	Data 0.000s (0.020s)	Loss 0.00053 (0.00066)	
    Epoch: [0][800/916]	Time 0.532s (0.555s)	Speed 30.1 samples/s	Data 0.000s (0.019s)	Loss 0.00047 (0.00065)	
    Epoch: [0][850/916]	Time 0.531s (0.554s)	Speed 30.1 samples/s	Data 0.000s (0.018s)	Loss 0.00051 (0.00064)	
    Epoch: [0][900/916]	Time 0.526s (0.552s)	Speed 30.4 samples/s	Data 0.000s (0.017s)	Loss 0.00054 (0.00063)	
    Train Epoch 0 time:0.5524 loss:0.0006 nme:0.3472
    best: True
    Test Epoch 0 time:0.3146 loss:0.0005 nme:0.1605 [008]:0.8482 [010]:0.5162
    => saving checkpoint to output\OWN\own_hrnet_w18
    

Validation ✌️

  1. For validation you can adapt the script from this issue, which is quite convenient for testing: https://github.com/HRNet/HRNet-Facial-Landmark-Detection/issues/21
    # ------------------------------------------------------------------------------
    # Created by Gaofeng(lfxx1994@gmail.com)
    # ------------------------------------------------------------------------------
    
    import os
    import argparse
    
    import torch
    import torch.nn as nn
    import torch.backends.cudnn as cudnn
    import sys
    import cv2
    
    sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))
    import lib.models as models
    from lib.config import config, update_config
    from PIL import Image
    import numpy as np
    from lib.utils.transforms import crop
    from lib.core.evaluation import decode_preds
    
    
    def parse_args():
        parser = argparse.ArgumentParser(description='Train Face Alignment')
    
        parser.add_argument('--cfg',
                            default='experiments/300w/face_alignment_300w_hrnet_w18.yaml',
                            help='experiment configuration filename', type=str)
        parser.add_argument('--model-file', help='model parameters',
                            default='HR18-300W.pth', type=str)
        parser.add_argument('--imagepath', help='Path of the image to be detected', default='111.jpg',
                            type=str)
        parser.add_argument('--face', nargs='+', type=float, default=[911, 1281, 1254, 1731],
                            help='The coordinate [x1,y1,x2,y2] of a face')
        args = parser.parse_args()
        update_config(config, args)
        return args
    
    
    def prepare_input(image, bbox, image_size):
        """
        :param image: path to the image to be detected
        :param bbox: bbox of the target face, [x1, y1, x2, y2]
        :param image_size: network input size, from the config file
        :return: (input tensor, center, scale)
        """
        scale = max(bbox[2] - bbox[0], bbox[3] - bbox[1]) / 200
        center_w = (bbox[0] + bbox[2]) / 2
        center_h = (bbox[1] + bbox[3]) / 2
        center = torch.Tensor([center_w, center_h])
        scale *= 1.25
        img = np.array(Image.open(image).convert('RGB'), dtype=np.float32)
        mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
        std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
        img = crop(img, center, scale, image_size, rot=0)
        img = img.astype(np.float32)
        img = (img / 255.0 - mean) / std
        img = img.transpose([2, 0, 1])
        img = torch.Tensor(img)
        img = img.unsqueeze(0)
        return img, center, scale
    
    
    def main():
        args = parse_args()
        cudnn.benchmark = config.CUDNN.BENCHMARK
        cudnn.deterministic = config.CUDNN.DETERMINISTIC
        cudnn.enabled = config.CUDNN.ENABLED
    
        config.defrost()
        config.MODEL.INIT_WEIGHTS = False
        config.freeze()
        model = models.get_face_alignment_net(config)
        if isinstance(config.GPUS, (list, tuple)):  # GPUS may be a single id or a list/tuple
            gpus = list(config.GPUS)
        else:
            gpus = [config.GPUS]
        model = nn.DataParallel(model, device_ids=gpus).cuda()
    
        # load model
        state_dict = torch.load(args.model_file)
        model.load_state_dict(state_dict)
        model.eval()
        inp, center, scale = prepare_input(args.imagepath, args.face, config.MODEL.IMAGE_SIZE)
        output = model(inp)
        score_map = output.data.cpu()
        preds = decode_preds(score_map, center, scale, [64, 64])
        preds = preds.numpy()
        cv2.namedWindow('test', 0)
        img_once = cv2.imread(args.imagepath)
        for i in preds[0, :, :]:
            cv2.circle(img_once, tuple(list(int(p) for p in i.tolist())), 2, (255, 255, 0), 1)
        cv2.imshow('test', img_once)
        if cv2.waitKey(0) == 27:
            cv2.destroyAllWindows()
    
    
    if __name__ == '__main__':
        main()
    

END 🔚

  1. If you are interested, feel free to use this as a reference. Questions and corrections are welcome.

