[MediaPipe Embedded in Practice 02] Building a static library with libmediapipe and reading through the example code

Jiangnan_Cai

Last modified: 2024-02-06 16:37:07

Category: mediapipe. Tags: algorithms, python, c++, artificial intelligence

First published: 2023-09-27 11:27:59

Article link: https://blog.csdn.net/Jiangnan_Cai/article/details/133295335

Contents

Changelog:
example.cpp
- main
Summary

Changelog:

2023-09-27: article published
2024-02-06: updated the facemesh C++ code

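First, here is the libmediapipe code repository: libmediapipe

It only has a dozen or so stars, but it builds MediaPipe into a static library for each supported platform with a single command. So far I have built the Linux x86_64 static library and run the example it ships with. The example runs, so it looks promising.

Its directory contains build shell scripts for several platforms; just run the one that matches yours. The recommended OpenCV version is 4.7.0, but 4.1.2 worked fine for me. Because the script drives a Bazel build internally, you need to adjust the OpenCV and FFmpeg paths for your own environment, as in the first article. If the build fails, open the shell script and edit the corresponding bazel build command directly. In my case I added the Python path because numpy was missing, added the -c opt option because the TensorFlow compilation stalled, and made a few changes because my Bazel version did not match exactly.

In short, if you have already built MediaPipe with Bazel successfully, you can work through whatever comes up here and export the static library.

After a successful build, an output folder appears: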

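libmediapipe-main/output
  |—————— data: the TFLite models it needs
  |—————— libmediapipe_version_platform
  			|—————— include: header files
  			|—————— lib: library files

libmediapipe-main/example contains an example, example.cpp, which you can compile as a test.

I don't really know C++, so converting my Python demo into C++ was a real headache. Below I read through the code and learn C++ along the way, aiming to implement what I need.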

example.cpp

#include <mediapipe.h>
#include <opencv2/opencv.hpp>
#include <iostream>

#define CHECK_MP_RESULT(result) \
    if (!result) \
    { \
        const char* error = mp_get_last_error(); \
        std::cerr << "[MediaPipe] " << error; \
        mp_free_error(error); \
        std::exit(1); \
    }

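I understand the include section at the top. Next comes a macro definition. Macros come in two kinds, macro constants (object-like macros) and macro functions (function-like macros); the one above is a macro function. A macro definition must fit on one logical line, so a trailing "\" is used to continue it across several lines.

These also come up often as interview questions:

What is the difference between a macro function and a function?
Why use macros?
What is the difference between a macro constant and const?
Should C++ code avoid macros, and how?
For type aliases, what is the difference between #define and typedef?

The macro here simply checks a result: if the result is falsy, it prints the last MediaPipe error, frees the error string, and exits.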
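
The macro above expands to a bare if statement. A common way to make such a macro behave like a single statement is the do { ... } while (0) wrapper. The sketch below is only an illustration of that idiom: CHECK_OR_RETURN and use_handle are hypothetical names, not part of libmediapipe. Note also that the example file calls CHECK_MP_RESULT without a trailing semicolon, so switching it to this form would mean adding the semicolons.

```cpp
#include <iostream>

// Hypothetical statement-like macro (not part of libmediapipe).
// do { ... } while (0) turns the expansion into one statement that needs a
// trailing ';' and composes safely with if/else. The argument is wrapped in
// parentheses to avoid operator-precedence surprises, and #result prints the
// failed expression as text.
#define CHECK_OR_RETURN(result, code)                         \
    do {                                                      \
        if (!(result)) {                                      \
            std::cerr << "[check] failed: " #result "\n";     \
            return (code);                                    \
        }                                                     \
    } while (0)

// Returns 0 on success, 1 for a null handle, 2 for a negative value.
int use_handle(const int* handle) {
    if (handle != nullptr)
        CHECK_OR_RETURN(*handle >= 0, 2);
    else
        return 1;  // this else binds to the outer if, as intended
    return 0;
}
```

With a bare-if macro, the else in use_handle would either attach to the macro's hidden if or fail to compile, depending on whether the call site has a semicolon.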

const mp_hand_landmark CONNECTIONS[][2] = {
        {mp_hand_landmark_wrist,             mp_hand_landmark_thumb_cmc},
        {mp_hand_landmark_thumb_cmc,         mp_hand_landmark_thumb_mcp},
        {mp_hand_landmark_thumb_mcp,         mp_hand_landmark_thumb_ip},
        {mp_hand_landmark_thumb_ip,          mp_hand_landmark_thumb_tip},
        {mp_hand_landmark_wrist,             mp_hand_landmark_index_finger_mcp},
        {mp_hand_landmark_index_finger_mcp,  mp_hand_landmark_index_finger_pip},
        {mp_hand_landmark_index_finger_pip,  mp_hand_landmark_index_finger_dip},
        {mp_hand_landmark_index_finger_dip,  mp_hand_landmark_index_finger_tip},
        {mp_hand_landmark_index_finger_mcp,  mp_hand_landmark_middle_finger_mcp},
        {mp_hand_landmark_middle_finger_mcp, mp_hand_landmark_middle_finger_pip},
        {mp_hand_landmark_middle_finger_pip, mp_hand_landmark_middle_finger_dip},
        {mp_hand_landmark_middle_finger_dip, mp_hand_landmark_middle_finger_tip},
        {mp_hand_landmark_middle_finger_mcp, mp_hand_landmark_ring_finger_mcp},
        {mp_hand_landmark_ring_finger_mcp,   mp_hand_landmark_ring_finger_pip},
        {mp_hand_landmark_ring_finger_pip,   mp_hand_landmark_ring_finger_dip},
        {mp_hand_landmark_ring_finger_dip,   mp_hand_landmark_ring_finger_tip},
        {mp_hand_landmark_ring_finger_mcp,   mp_hand_landmark_pinky_mcp},
        {mp_hand_landmark_wrist,             mp_hand_landmark_pinky_mcp},
        {mp_hand_landmark_pinky_mcp,         mp_hand_landmark_pinky_pip},
        {mp_hand_landmark_pinky_pip,         mp_hand_landmark_pinky_dip},
        {mp_hand_landmark_pinky_dip,         mp_hand_landmark_pinky_tip}
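}; // connect the landmarks to each other

This defines a two-dimensional array CONNECTIONS that describes how the hand landmarks are connected. mp_hand_landmark is an enum defined in the mediapipe.h header. The header excerpt below uses the typedef keyword to give the enum type the alias mp_hand_landmark, which is why the code above uses mp_hand_landmark as the element type of the array.

Defining a two-dimensional array

Note that when a two-dimensional array is defined this way, the number of columns must be given explicitly; only the number of rows can be deduced from the initializer.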

typedef enum {
    mp_hand_landmark_wrist = 0,
    mp_hand_landmark_thumb_cmc = 1,
    mp_hand_landmark_thumb_mcp = 2,
    mp_hand_landmark_thumb_ip = 3,
    mp_hand_landmark_thumb_tip = 4,
    mp_hand_landmark_index_finger_mcp = 5,
    mp_hand_landmark_index_finger_pip = 6,
    mp_hand_landmark_index_finger_dip = 7,
    mp_hand_landmark_index_finger_tip = 8,
    mp_hand_landmark_middle_finger_mcp = 9,
    mp_hand_landmark_middle_finger_pip = 10,
    mp_hand_landmark_middle_finger_dip = 11,
    mp_hand_landmark_middle_finger_tip = 12,
    mp_hand_landmark_ring_finger_mcp = 13,
    mp_hand_landmark_ring_finger_pip = 14,
    mp_hand_landmark_ring_finger_dip = 15,
    mp_hand_landmark_ring_finger_tip = 16,
    mp_hand_landmark_pinky_mcp = 17,
    mp_hand_landmark_pinky_pip = 18,
    mp_hand_landmark_pinky_dip = 19,
    mp_hand_landmark_pinky_tip = 20
} mp_hand_landmark;

So to bring over the facemesh connections, I only need to define a const int CONNECTIONS[][2].

main

Let's skip the draw_landmarks and draw_rects functions, since what they do is fairly simple, and read the main function first.
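
A minimal sketch of that idea, using only the standard library. The three index pairs are placeholders rather than the real FaceMesh connection table, and NormLandmark, PixelPoint, to_pixel, and connection_segments are names I made up for illustration. It assumes landmarks arrive as normalized coordinates in [0, 1] that have to be scaled to the frame size before drawing.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Placeholder edges; NOT the real FaceMesh connection table.
const int CONNECTIONS[][2] = {
    {0, 1},
    {1, 2},
    {2, 0},
};

// The row count is deduced from the initializer; only the column count (2)
// had to be written out.
const std::size_t NUM_CONNECTIONS = sizeof(CONNECTIONS) / sizeof(CONNECTIONS[0]);

// Hypothetical landmark with coordinates normalized to [0, 1].
struct NormLandmark { float x; float y; };

struct PixelPoint { int x; int y; };

// Scale a normalized landmark to pixel coordinates of a width x height frame.
PixelPoint to_pixel(const NormLandmark& lm, int width, int height) {
    return { static_cast<int>(lm.x * width), static_cast<int>(lm.y * height) };
}

// Collect the pixel-space line segments that a drawing routine would render.
// Pairs that reference a missing landmark index are skipped.
std::vector<std::pair<PixelPoint, PixelPoint>> connection_segments(
        const std::vector<NormLandmark>& landmarks, int width, int height) {
    std::vector<std::pair<PixelPoint, PixelPoint>> segments;
    for (const auto& c : CONNECTIONS) {
        const std::size_t a = static_cast<std::size_t>(c[0]);
        const std::size_t b = static_cast<std::size_t>(c[1]);
        if (a >= landmarks.size() || b >= landmarks.size()) continue;
        segments.emplace_back(to_pixel(landmarks[a], width, height),
                              to_pixel(landmarks[b], width, height));
    }
    return segments;
}
```

Each returned pair could then be handed to a line-drawing call, which is roughly what draw_landmarks does for the hand connections.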

int main(int argc, char **argv) {
    // Check resource directory argument.
    if (argc < 2) {
        std::cerr << "Missing resource directory argument" << std::endl;
        return 1;
    }

    // Open camera.
    cv::VideoCapture capture(0);
    if (!capture.isOpened()) {
        std::cerr << "Failed to open camera" << std::endl;
        return 1;
    }

    // Set MediaPipe resource directory.
    mp_set_resource_dir(argv[1]);

    // Load the binary graph and specify the input stream name. Loading and configuring the graph defines the processing pipeline.
    std::string path = std::string(argv[1]) + "/mediapipe/modules/hand_landmark/hand_landmark_tracking_cpu.binarypb";  // path to the binary graph
    mp_instance_builder *builder = mp_create_instance_builder(path.c_str(), "image");  // the library exposes a C API and C has no std::string, so c_str() passes a C-compatible const char*; this creates the instance builder

    // Configure the graph with node options and side packets. A packet is a unit of input data; a side packet is a constant-like configuration input.
    mp_add_option_float(builder, "palmdetectioncpu__TensorsToDetectionsCalculator", "min_score_thresh", 0.6);  // node option: sets a parameter on a specific calculator
    mp_add_option_double(builder, "handlandmarkcpu__ThresholdingCalculator", "threshold", 0.2);  // node option
    mp_add_side_packet(builder, "num_hands", mp_create_packet_int(2)); // side packet: number of hands
    mp_add_side_packet(builder, "model_complexity", mp_create_packet_int(1)); // side packet: model complexity 1
    mp_add_side_packet(builder, "use_prev_landmarks", mp_create_packet_bool(true)); // side packet: use the previous landmarks for inference

    // Create an instance from the instance builder.
    mp_instance *instance = mp_create_instance(builder);
    CHECK_MP_RESULT(instance)

    // Create poller for the hand landmarks.
    mp_poller *landmarks_poller = mp_create_poller(instance, "multi_hand_landmarks");
    CHECK_MP_RESULT(landmarks_poller)

    // Create a poller for the hand rectangles.
    mp_poller *rects_poller = mp_create_poller(instance, "hand_rects");
    CHECK_MP_RESULT(rects_poller)

    // Start the graph.
    CHECK_MP_RESULT(mp_start(instance))

    cv::Mat frame;
    while (true) {
        // Acquire the next video frame.
        if (!capture.read(frame)) {
            break;
        }

        // Store the frame data in an image structure.
        mp_image image;
        image.data = frame.data;
        image.width = frame.cols;
        image.height = frame.rows;
        image.format = mp_image_format_srgb; // pixel format of the frame data

        // Wrap the image in a packet and process it.
        CHECK_MP_RESULT(mp_process(instance, mp_create_packet_image(image)))

        // Wait until the image has been processed.
        CHECK_MP_RESULT(mp_wait_until_idle(instance))

        // Draw the hand landmarks onto the frame.
        if (mp_get_queue_size(landmarks_poller) > 0) {
            mp_packet *packet = mp_poll_packet(landmarks_poller);
            mp_multi_face_landmark_list *landmarks = mp_get_norm_multi_face_landmarks(packet);
            draw_landmarks(frame, landmarks);
            mp_destroy_multi_face_landmarks(landmarks);
            mp_destroy_packet(packet);
        }

        // Draw the hand rectangles onto the frame.
        if (mp_get_queue_size(rects_poller) > 0) {
            mp_packet *packet = mp_poll_packet(rects_poller);
            mp_rect_list *rects = mp_get_norm_rects(packet);
            draw_rects(frame, rects);
            mp_destroy_rects(rects);
            mp_destroy_packet(packet);
        }

        // Display the frame in a window.
        cv::imshow("MediaPipe", frame);
        if (cv::waitKey(1) == 27) {
            break;
        }
    }

    // Clean up resources.
    cv::destroyAllWindows();
    mp_destroy_poller(rects_poller);
    mp_destroy_poller(landmarks_poller);
    mp_destroy_instance(instance);

    return 0;
}

Resource directory setup + camera initialization

This part needs little explanation.

 // Check resource directory argument.
    if (argc < 2) {
        std::cerr << "Missing resource directory argument" << std::endl;
        return 1;
    }

    // Open camera.
    cv::VideoCapture capture(0);
    if (!capture.isOpened()) {
        std::cerr << "Failed to open camera" << std::endl;
        return 1;
    }

    // Set MediaPipe resource directory.
    mp_set_resource_dir(argv[1]);

Computational graph configuration

// Load the binary graph and specify the input stream name. 
    std::string path = std::string(argv[1]) + "/mediapipe/modules/hand_landmark/hand_landmark_tracking_cpu.binarypb";  // path to the binary graph
    mp_instance_builder *builder = mp_create_instance_builder(path.c_str(), "image");  // the library exposes a C API and C has no std::string, so c_str() passes a C-compatible const char*; this creates the instance builder

    // Configure the graph with node options and side packets. 
    mp_add_option_float(builder, "palmdetectioncpu__TensorsToDetectionsCalculator", "min_score_thresh", 0.6);  
    mp_add_option_double(builder, "handlandmarkcpu__ThresholdingCalculator", "threshold", 0.2);  
    mp_add_side_packet(builder, "num_hands", mp_create_packet_int(2)); // side packet: number of hands
    mp_add_side_packet(builder, "model_complexity", mp_create_packet_int(1)); // side packet: model complexity 1
    mp_add_side_packet(builder, "use_prev_landmarks", mp_create_packet_bool(true)); // side packet: use the previous landmarks for inference

    // Create an instance from the instance builder.
    mp_instance *instance = mp_create_instance(builder);
    CHECK_MP_RESULT(instance)

    // Create poller for the hand landmarks.
    mp_poller *landmarks_poller = mp_create_poller(instance, "multi_hand_landmarks");
    CHECK_MP_RESULT(landmarks_poller)

    // Create a poller for the hand rectangles.
    mp_poller *rects_poller = mp_create_poller(instance, "hand_rects");
    CHECK_MP_RESULT(rects_poller)

    // Start the graph.
    CHECK_MP_RESULT(mp_start(instance))

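Coming to it for the first time, I found this part hard to follow. In hindsight, though, once you get past this code you are about 95% of the way there. To read it, you first need to know that MediaPipe is built on a computational graph framework. This was the first time I had heard the term, although I had run into the idea many times before: most deep learning frameworks use one, and so do some task-scheduling frameworks.

In MediaPipe, we first need to get clear on the following concepts: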

packet
side packet
stream
graph
poller
option
node

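If you don't need the finer details, here is how I think of it. A complete project, such as hand detection, is a graph. The graph is made up of many connected nodes; a node is essentially an ordinary function that the framework abstracts as a node. A node's inputs and outputs are called streams, hence input stream and output stream. A stream consists of one or more packets, and a packet can be thought of as a function argument, such as an image or a video frame. A side packet is also an input, but a constant-like one that carries configuration parameters such as threshold, min_scores, or num. An option is much like a side packet, also a constant, but it belongs to a specific node. A poller is how the graph's output is read, similar to the return value of the whole function.

With this basic picture, the code above is much easier to follow. The whole flow breaks down into the following steps:

Builder: load the graph and specify the input stream; here the hand landmark tracking graph is loaded and the input stream is image.
Configure constants: the number of hands to detect num_hands, the model complexity model_complexity, whether to predict from the previous landmarks use_prev_landmarks, plus thresholds and minimum scores for some intermediate nodes.
Build the instance from the builder.
Create pollers to receive the graph output: the hand landmarks multi_hand_landmarks and the hand bounding rectangles hand_rects.

At this point you may ask: how do you know what the graph takes as input, which parameters can be set, and what the output variables are?

All of this can be found in the /mediapipe/mediapipe/modules/*/*.pbtxt files. Search for keywords such as input_stream, output_stream, and input_side_packet to see how the graph should be configured.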
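
To make these terms concrete, here is a toy model of my own. None of the types below are MediaPipe or libmediapipe APIs, and a real graph is far more general (multiple streams, parallel nodes, timestamp synchronization). The sketch only assumes a linear chain of nodes and shows where a packet, a side packet, an option, and a poller each sit.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Toy model only: not the MediaPipe API.

// A packet carries one piece of data plus the timestamp it belongs to.
struct Packet {
    long timestamp;
    int value;
};

// A node transforms an input packet into an output packet.
// Its option is a per-node constant fixed when the graph is built.
struct Node {
    std::string name;
    int option;
    std::function<Packet(const Packet&, int option, int side_packet)> process;
};

// A graph chains nodes; the side packet is a constant visible to every node.
class ToyGraph {
public:
    explicit ToyGraph(int side_packet) : side_packet_(side_packet) {}

    void add_node(Node node) { nodes_.push_back(std::move(node)); }

    // Push one packet into the input stream and run it through every node.
    void process(Packet packet) {
        for (const Node& node : nodes_) {
            packet = node.process(packet, node.option, side_packet_);
        }
        output_.push(packet);  // the output stream feeds the poller queue
    }

    // Poller side: how many results are waiting, and fetch the oldest one.
    std::size_t queue_size() const { return output_.size(); }
    Packet poll() {
        Packet p = output_.front();
        output_.pop();
        return p;
    }

private:
    int side_packet_;
    std::vector<Node> nodes_;
    std::queue<Packet> output_;
};

// Builds a two-node graph: scale by the side packet, then add a node option.
ToyGraph make_example_graph() {
    ToyGraph graph(2);  // side packet = 2
    graph.add_node({"scale", 0, [](const Packet& p, int, int side) {
        return Packet{p.timestamp, p.value * side};
    }});
    graph.add_node({"offset", 10, [](const Packet& p, int option, int) {
        return Packet{p.timestamp, p.value + option};
    }});
    return graph;
}
```

The calling pattern mirrors the example: process a packet, check queue_size, then poll the result, much like mp_process, mp_get_queue_size, and mp_poll_packet in the code above.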
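
That keyword search can be sketched in a few lines of standard C++. This is a rough line-based scan, not a protobuf text-format parser: it assumes that graph-level declarations start at column 0 and that declarations inside node { ... } blocks are indented, which is how I have seen these files laid out. GraphInterface, top_level_value, and scan_graph_interface are hypothetical helper names.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Graph-level interface collected from the text of a .pbtxt graph definition.
struct GraphInterface {
    std::vector<std::string> input_streams;
    std::vector<std::string> output_streams;
    std::vector<std::string> input_side_packets;
};

// If `line` starts at column 0 with `key:`, returns the first double-quoted
// value on that line; otherwise returns an empty string.
std::string top_level_value(const std::string& line, const std::string& key) {
    const std::string prefix = key + ":";
    if (line.compare(0, prefix.size(), prefix) != 0) return "";
    const std::size_t open = line.find('"', prefix.size());
    if (open == std::string::npos) return "";
    const std::size_t close = line.find('"', open + 1);
    if (close == std::string::npos) return "";
    return line.substr(open + 1, close - open - 1);
}

// Scans the text line by line and records the top-level declarations.
// Indented lines (declarations inside nodes) are ignored on purpose.
GraphInterface scan_graph_interface(const std::string& pbtxt_text) {
    GraphInterface result;
    std::istringstream lines(pbtxt_text);
    std::string line;
    while (std::getline(lines, line)) {
        std::string value = top_level_value(line, "input_stream");
        if (!value.empty()) { result.input_streams.push_back(value); continue; }
        value = top_level_value(line, "output_stream");
        if (!value.empty()) { result.output_streams.push_back(value); continue; }
        value = top_level_value(line, "input_side_packet");
        if (!value.empty()) { result.input_side_packets.push_back(value); }
    }
    return result;
}
```

To use it, read a .pbtxt file into a std::string (for example with std::ifstream) and pass the text in. The values usually have a TAG:name shape, and the name part is what gets passed to calls such as mp_create_poller or mp_add_side_packet.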

Visualizing the results

 cv::Mat frame;
    while (true) {
        // Acquire the next video frame.
        if (!capture.read(frame)) {
            break;
        }

        // Store the frame data in an image structure.
        mp_image image;
        image.data = frame.data;
        image.width = frame.cols;
        image.height = frame.rows;
        image.format = mp_image_format_srgb; // pixel format of the frame data

        // Wrap the image in a packet and process it.
        CHECK_MP_RESULT(mp_process(instance, mp_create_packet_image(image)))

        // Wait until the image has been processed.
        CHECK_MP_RESULT(mp_wait_until_idle(instance))

        // Draw the hand landmarks onto the frame.
        if (mp_get_queue_size(landmarks_poller) > 0) {
            mp_packet *packet = mp_poll_packet(landmarks_poller);
            mp_multi_face_landmark_list *landmarks = mp_get_norm_multi_face_landmarks(packet);
            draw_landmarks(frame, landmarks);
            mp_destroy_multi_face_landmarks(landmarks);
            mp_destroy_packet(packet);
        }

        // Draw the hand rectangles onto the frame.
        if (mp_get_queue_size(rects_poller) > 0) {
            mp_packet *packet = mp_poll_packet(rects_poller);
            mp_rect_list *rects = mp_get_norm_rects(packet);
            draw_rects(frame, rects);
            mp_destroy_rects(rects);
            mp_destroy_packet(packet);
        }

        // Display the frame in a window.
        cv::imshow("MediaPipe", frame);
        if (cv::waitKey(1) == 27) {
            break;
        }
    }

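This is ordinary video processing: each frame is fed into the graph, then draw_landmarks and draw_rects draw the results onto the frame for display. It also involves polling a packet from each poller and extracting the coordinates inside it, but that is all easy to follow.

Releasing resources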

    // Clean up resources.
    cv::destroyAllWindows();
    mp_destroy_poller(rects_poller);
    mp_destroy_poller(landmarks_poller);
    mp_destroy_instance(instance);
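
Every create or poll call in the example needs a matching destroy call, which is easy to forget on an early return. One C++ way to tie the two together is std::unique_ptr with a custom deleter. The sketch below uses stand-in types and functions (fake_packet, fake_create_packet, fake_destroy_packet) that only mimic the shape of a create/destroy pair; they are not the libmediapipe functions, and I have not verified this wrapper against the real API.

```cpp
#include <memory>

// Hypothetical stand-ins for an opaque C handle and its create/destroy pair.
struct fake_packet { int payload; };

// Counts handles that were created but not yet destroyed.
static int g_live_packets = 0;

fake_packet* fake_create_packet(int payload) {
    ++g_live_packets;
    return new fake_packet{payload};
}

void fake_destroy_packet(fake_packet* packet) {
    --g_live_packets;
    delete packet;
}

// Deleter type so that std::unique_ptr calls the C-style destroy function.
struct PacketDeleter {
    void operator()(fake_packet* packet) const { fake_destroy_packet(packet); }
};
using PacketPtr = std::unique_ptr<fake_packet, PacketDeleter>;

// Reads the payload; the packet is destroyed on every exit path.
int read_payload(int payload) {
    PacketPtr packet(fake_create_packet(payload));
    if (packet->payload < 0) {
        return -1;  // early return: the deleter still runs
    }
    return packet->payload * 2;
}
```

The same pattern would apply to any handle that has a single destroy function, with one deleter type per handle type.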

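Summary

After reading through example.cpp and making a few small changes, I got my own face landmarks running. One problem remains: several detection-related graphs output data in the mediapipe::Detection format, and libmediapipe does not seem to provide an interface for converting it into rects or similar, so those outputs cannot be used directly. You would probably have to add that to the header yourself and rebuild. Since I only need face landmarks, I left it alone.

With this project I also finished porting facemesh from a Python demo to C++ code.

Code: facemesh_example
If this helped you, please give it a star ⭐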