Hands-on test of tf-pose-estimation for pose estimation

南洲.

Article link: https://blog.csdn.net/zhou4411781/article/details/100272929

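While researching action detection, I tested tf-pose-estimation. It is fairly lightweight, can run real-time detection on a CPU, and is easy to get running.
I configured and tested it on Ubuntu 16.04; my notes follow.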
The GitHub repository of tf-pose-estimation is: https://github.com/ildoonet/tf-pose-estimation

tf-pose-estimation dependencies:

python3
tensorflow 1.4.1+
opencv3, protobuf, python3-tk
slidingwindow
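Before installing anything, it can help to see which of these dependencies are already importable. The script below is a minimal sketch, not part of the repository: the package-to-module mapping (opencv3 to cv2, protobuf to google.protobuf, python3-tk to tkinter) is my assumption, and version_at_least and missing_modules are hypothetical helpers.

```python
"""Check which tf-pose-estimation dependencies are importable (a sketch)."""
import importlib.util
import re

# package name -> Python module name (assumed mapping)
REQUIRED_MODULES = {
    "tensorflow": "tensorflow",
    "opencv3": "cv2",
    "protobuf": "google.protobuf",
    "python3-tk": "tkinter",
    "slidingwindow": "slidingwindow",
}


def version_at_least(version, minimum):
    """Compare dotted versions numerically, e.g. '1.14.0' >= '1.4.1'."""
    def to_tuple(v):
        parts = []
        for piece in v.split("."):
            m = re.match(r"\d+", piece)  # keep only the leading digits ('0rc1' -> 0)
            parts.append(int(m.group()) if m else 0)
        return tuple(parts)

    a, b = to_tuple(version), to_tuple(minimum)
    n = max(len(a), len(b))
    # pad with zeros so that '1.4' compares equal to '1.4.0'
    return a + (0,) * (n - len(a)) >= b + (0,) * (n - len(b))


def missing_modules(modules):
    """Return the module names that cannot be found on this interpreter."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # ImportError: a parent package such as 'google' is absent
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES.values())
    print("missing:", missing or "none")
    if "tensorflow" not in missing:
        import tensorflow as tf
        ok = version_at_least(tf.__version__, "1.4.1")
        print("tensorflow", tf.__version__, "OK" if ok else "too old (need 1.4.1+)")
```

Run it with python3 so that it inspects the same interpreter that pip3 installs into.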

Installation:

Clone the code and install the third-party dependencies:

git clone https://www.github.com/ildoonet/tf-pose-estimation
cd tf-pose-estimation
pip3 install -r requirements.txt

Build the C++ post-processing library (pafprocess):

cd tf_pose/pafprocess
swig -python -c++ pafprocess.i && python3 setup.py build_ext --inplace
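A quick way to confirm that the SWIG build worked is to look for its outputs. This is a minimal sketch, assuming swig writes a pafprocess.py wrapper and build_ext --inplace places a compiled _pafprocess*.so (.pyd on Windows) next to it; pafprocess_built is a hypothetical helper, not part of tf_pose.

```python
"""Check that the pafprocess SWIG extension was built in place (a sketch)."""
import glob
import os


def pafprocess_built(pafprocess_dir):
    """Return True if the SWIG wrapper and a compiled extension both exist."""
    wrapper = os.path.join(pafprocess_dir, "pafprocess.py")
    # assumed naming: _pafprocess.cpython-XY-....so on Linux, .pyd on Windows
    compiled = (glob.glob(os.path.join(pafprocess_dir, "_pafprocess*.so"))
                + glob.glob(os.path.join(pafprocess_dir, "_pafprocess*.pyd")))
    return os.path.isfile(wrapper) and len(compiled) > 0


if __name__ == "__main__":
    # run from the repository root
    print("pafprocess built:", pafprocess_built(os.path.join("tf_pose", "pafprocess")))
```

If it prints False, the usual suspects are a missing swig binary or missing Python development headers for the compiler step.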

Model download:

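Before running, you need the model (graph) files; these graph files can also be deployed on mobile devices or other platforms. According to the project README, the CMU graph is too large for git, so it is fetched with the download script in its model folder. Run the commands from the repository root (the build step leaves you in tf_pose/pafprocess).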

cmu (trained in 656x368)
mobilenet_thin (trained in 432x368)
mobilenet_v2_large (trained in 432x368)
mobilenet_v2_small (trained in 432x368)
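Each model is best run at the resolution it was trained at, passed to run.py as --resize. The table below just restates the list above; parse_resolution and run_args are hypothetical helpers for scripting the commands, not tf_pose's own model_wh().

```python
"""Map each model to its training resolution and build a run.py command (a sketch)."""

# training resolutions as listed in the tf-pose-estimation README
TRAINED_RESOLUTION = {
    "cmu": "656x368",
    "mobilenet_thin": "432x368",
    "mobilenet_v2_large": "432x368",
    "mobilenet_v2_small": "432x368",
}


def parse_resolution(text):
    """'432x368' -> (432, 368); raises ValueError on malformed input."""
    parts = text.lower().split("x")
    if len(parts) != 2:
        raise ValueError("expected WIDTHxHEIGHT, got %r" % text)
    width, height = (int(p) for p in parts)
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return width, height


def run_args(model, image):
    """Argument list for run.py using the model's training resolution."""
    return ["python3", "run.py", "--model=" + model,
            "--resize=" + TRAINED_RESOLUTION[model], "--image=" + image]
```

For example, run_args("cmu", "./images/apink3.jpg") produces the cmu command with --resize=656x368, and the list can be passed straight to subprocess.run.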

cd models/graph/cmu
bash download.sh
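After download.sh finishes, a small check shows which graphs are in place. This sketch assumes each graph is stored as models/graph/<model>/graph_opt.pb, which is the file name tf_pose's get_graph_path() is assumed to resolve to; check your checkout. graph_file and available_models are hypothetical helpers.

```python
"""List the model graphs present under models/graph/ (a sketch)."""
import os

MODELS = ("cmu", "mobilenet_thin", "mobilenet_v2_large", "mobilenet_v2_small")


def graph_file(repo_root, model):
    """Expected .pb path for a model (assumed layout)."""
    return os.path.join(repo_root, "models", "graph", model, "graph_opt.pb")


def available_models(repo_root):
    """Models whose graph file exists and is non-empty (catches aborted downloads)."""
    found = []
    for model in MODELS:
        path = graph_file(repo_root, model)
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            found.append(model)
    return found


if __name__ == "__main__":
    print("available models:", available_models("."))
```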

Testing:

Single-image test

python3 run.py --model=mobilenet_thin --resize=432x368 --image=./images/apink3.jpg
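For action detection, the useful output is the keypoints rather than the drawn image. This is a minimal sketch, assuming each object returned by TfPoseEstimator.inference() has a body_parts dict that maps a part index to an object with normalized x, y (0 to 1) and a score, which is how draw_humans() reads them. keypoints_to_pixels is a hypothetical, duck-typed helper, so it can be tried with stand-in objects instead of tf_pose.

```python
"""Convert tf_pose detections to pixel keypoints (a duck-typed sketch)."""
from types import SimpleNamespace


def keypoints_to_pixels(humans, image_w, image_h, min_score=0.0):
    """Return one {part_index: (x_px, y_px, score)} dict per detected person."""
    people = []
    for human in humans:
        points = {}
        for idx, part in human.body_parts.items():
            if part.score < min_score:
                continue  # drop low-confidence joints
            # scale normalized coordinates to pixels, rounding to nearest
            x = int(part.x * image_w + 0.5)
            y = int(part.y * image_h + 0.5)
            points[idx] = (x, y, part.score)
        people.append(points)
    return people


if __name__ == "__main__":
    # stand-in for one detected person with two keypoints
    fake = SimpleNamespace(body_parts={
        0: SimpleNamespace(x=0.5, y=0.25, score=0.9),
        1: SimpleNamespace(x=0.5, y=0.40, score=0.2),
    })
    print(keypoints_to_pixels([fake], 432, 368, min_score=0.5))
```

With real detections, pass image.shape[1] and image.shape[0] of the frame as image_w and image_h.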


Webcam test

python3 run_webcam.py --model=mobilenet_thin --resize=432x368 --camera=0
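To judge whether CPU inference is really real-time, a per-frame FPS value of 1 / (now - previous frame time) jumps around a lot. A minimal sketch of a smoothed alternative: the hypothetical FpsMeter class below averages frame intervals with an exponential moving average; it is not part of tf_pose.

```python
"""Smoothed FPS counter for webcam/video loops (a sketch)."""
import time


class FpsMeter:
    def __init__(self, alpha=0.1):
        self.alpha = alpha          # weight of the newest frame interval
        self.last = None            # timestamp of the previous frame
        self.avg_interval = None    # smoothed seconds per frame

    def tick(self, now=None):
        """Record a frame; return smoothed FPS (0.0 until two frames are seen)."""
        now = time.time() if now is None else now
        if self.last is not None:
            interval = now - self.last
            if self.avg_interval is None:
                self.avg_interval = interval
            else:
                # exponential moving average of the frame interval
                self.avg_interval += self.alpha * (interval - self.avg_interval)
        self.last = now
        if not self.avg_interval:
            return 0.0  # no interval yet (or a zero interval): avoid dividing by zero
        return 1.0 / self.avg_interval
```

In a capture loop, create meter = FpsMeter() once, then call fps = meter.tick() after each processed frame and draw "FPS: %.1f" % fps with cv2.putText. Averaging the interval rather than the FPS values keeps one slow frame from dominating the estimate.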

Video test

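The author gives no command for video, but a small change to run_video.py is enough. I downloaded a clip from Douyin and tested it.

Since some readers did not know how to change the video input and output saving, here is my modified run_video.py: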

import argparse
import logging
import time

import cv2
import numpy as np

from tf_pose.estimator import TfPoseEstimator
from tf_pose.networks import get_graph_path, model_wh

logger = logging.getLogger('TfPoseEstimator-Video')
logger.setLevel(logging.DEBUG)
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
formatter = logging.Formatter('[%(asctime)s] [%(name)s] [%(levelname)s] %(message)s')
ch.setFormatter(formatter)
logger.addHandler(ch)


def str2bool(v):
    # argparse's type=bool treats any non-empty string, even "False", as True
    return str(v).lower() in ('1', 'true', 'yes', 'y')


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='tf-pose-estimation Video')
    parser.add_argument('--video', type=str, default='./images/crowd.mp4')  # change to your own input video
    parser.add_argument('--output', type=str, default='save.avi')  # change to your own save path, e.g. /home/yasin/save.avi
    parser.add_argument('--resolution', type=str, default='432x368', help='network input resolution. default=432x368')
    parser.add_argument('--model', type=str, default='mobilenet_thin', help='cmu / mobilenet_thin / mobilenet_v2_large / mobilenet_v2_small')
    parser.add_argument('--resize-out-ratio', type=float, default=4.0, help='if provided, resize heatmaps before they are post-processed. default=4.0')
    parser.add_argument('--show-process', type=str2bool, default=False, help='for debug purpose, if enabled, speed for inference is dropped.')
    parser.add_argument('--showBG', type=str2bool, default=True, help='False to show skeleton only.')
    args = parser.parse_args()

    logger.debug('initialization %s : %s' % (args.model, get_graph_path(args.model)))
    w, h = model_wh(args.resolution)
    e = TfPoseEstimator(get_graph_path(args.model), target_size=(w, h))

    cap = cv2.VideoCapture(args.video)
    if not cap.isOpened():
        raise SystemExit('Error opening video stream or file: %s' % args.video)

    # take size and frame rate from the input instead of reading (and dropping) the first frame
    video_size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    fps_in = cap.get(cv2.CAP_PROP_FPS) or 10  # fall back to 10 fps if the container reports 0
    fourcc = cv2.VideoWriter_fourcc('M', 'P', '4', '2')
    outVideo = cv2.VideoWriter(args.output, fourcc, fps_in, video_size)

    fps_time = time.time()
    while cap.isOpened():
        ret_val, image = cap.read()
        if not ret_val:
            break

        logger.debug('image process+')
        humans = e.inference(image, resize_to_default=(w > 0 and h > 0), upsample_size=args.resize_out_ratio)
        if not args.showBG:
            image = np.zeros_like(image)  # black uint8 background: skeleton only

        logger.debug('postprocess+')
        image = TfPoseEstimator.draw_humans(image, humans, imgcopy=False)

        logger.debug('show+')
        cv2.putText(image, "FPS: %f" % (1.0 / (time.time() - fps_time)), (10, 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
        cv2.imshow('tf-pose-estimation result', image)
        fps_time = time.time()
        outVideo.write(image)  # frame size must equal video_size, so do not resize before writing

        if cv2.waitKey(1) == 27:  # Esc quits
            break

    logger.debug('finished+')
    cap.release()
    outVideo.release()
    cv2.destroyAllWindows()