Segment Anything Model (SAM): C++ Model Inference and Deployment

知来者逆

Last modified: 2024-09-26 14:34:24

Category: Computer Vision. Tags: C++, deep learning, artificial intelligence, SAM, computer vision

First published: 2024-08-02 22:47:08

Article link: https://blog.csdn.net/matt45m/article/details/140881488

Overview

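SAM (Segment Anything Model) is a promptable image segmentation model that performs well on complex and varied images, often without any task-specific training. It was a significant step forward for image segmentation in computer vision. Given a prompt such as a point or a box, SAM returns a mask for the indicated object, and by prompting it with a grid of points it can segment everything in an image. The resulting masks can feed downstream tasks such as object detection, instance segmentation and panoptic segmentation, so the model is useful in applications ranging from medical image analysis to autonomous driving.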

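One distinctive property of SAM is that its masks are class-agnostic: it outlines objects without assigning category names. Instance segmentation identifies and delineates each object instance in an image, whereas semantic segmentation assigns a class label to every pixel. Panoptic segmentation combines the two to give a more complete understanding of the image, so producing panoptic output from SAM masks means pairing them with a separate classifier that supplies the labels.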

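Another key property of SAM is its flexibility. The model can be fine-tuned for specific use cases and domains, which makes it highly adaptable. Its design also splits the work sensibly: the heavy image encoder runs once per image, after which the lightweight prompt encoder and mask decoder can answer each prompt quickly. This suits applications that need fast, accurate segmentation, such as security monitoring, industrial automation and robotics, provided the hardware can afford the one-off encoding cost.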

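How SAM works: model architecture

SAM (Segment Anything Model) is a deep learning model for image segmentation. It is built mostly from Transformer components and is split into an image encoder, a prompt encoder and a mask decoder. A high-level overview:

Backbone (image encoder): SAM uses a pre-trained Vision Transformer (ViT) as its backbone. The backbone extracts features from the input image. It is by far the most expensive part of the model, but it only has to run once per image.
Image embedding: the encoder output is a single dense feature map (the image embedding) rather than a multi-scale feature pyramid. Every prompt for the same image reuses this embedding.
Prompt encoder: points and boxes are encoded as positional encodings combined with learned embeddings for each prompt type, and an optional low-resolution mask prompt is embedded with convolutions.
Mask decoder: a lightweight Transformer-based decoder combines the image embedding with the prompt embeddings, upsamples the result and predicts segmentation masks, which are then resized to the resolution of the input image. Because a single prompt can be ambiguous, the decoder returns several candidate masks together with a predicted IoU score for each.
Training: the ViT backbone starts from self-supervised (MAE) pre-training on unlabeled images, and the full model is then trained with supervision on the large SA-1B mask dataset.
Class-agnostic output: SAM outlines objects but does not name them. Instance segmentation identifies and delineates each object instance, semantic segmentation labels every pixel with a class, and panoptic segmentation combines the two. Building any of these on top of SAM requires a separate component that supplies the class labels.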
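
To make the sizes involved more concrete, the sketch below works out the tensor shapes that appear when SAM encodes one image. The constants (a 1024-pixel input, 16-pixel patches, 256 embedding channels, and a mask prompt four times the side of the embedding grid) come from the published SAM configuration, and it is an assumption that the ncnn export used later in this post keeps them. `SamShapes` and `sam_shapes` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: tensor sizes implied by the public SAM configuration.
struct SamShapes {
    int embed_grid;           // side length of the image-embedding grid
    int embed_channels;       // channels per embedding cell
    int mask_input_side;      // side length of the optional low-res mask prompt
    std::size_t embed_bytes;  // size of one float32 image embedding
};

inline SamShapes sam_shapes(int input_side = 1024, int patch = 16, int channels = 256)
{
    assert(patch > 0 && input_side % patch == 0);
    SamShapes s;
    s.embed_grid = input_side / patch;     // 1024 / 16 = 64 cells per side
    s.embed_channels = channels;
    s.mask_input_side = 4 * s.embed_grid;  // assumed to be 4x the grid side
    s.embed_bytes = static_cast<std::size_t>(channels) * s.embed_grid
                    * s.embed_grid * sizeof(float);
    return s;
}
```

With the defaults this describes a 64 x 64 x 256 embedding, and the 256 x 256 mask prompt is the same size as the `mask_input` tensor that `embed_masks` allocates in the C++ code later in this post. The embedding is computed once and reused for every prompt, which is why the encoder cost is paid only once per image.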

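Potential use cases for SAM

SAM (Segment Anything Model) is a highly general image segmentation model that can be applied to a wide range of problems. Five potential use cases:

Autonomous vehicles: SAM can segment the objects around a vehicle, such as other vehicles, pedestrians and road signs. This information can help the vehicle make informed navigation and safety decisions.
Medical imaging: SAM can segment structures and tissue in medical images, such as tumors, blood vessels and organs. This information can assist doctors with diagnosis and treatment planning.
Object detection: SAM can segment the objects in an image as part of an object detection pipeline, which is useful in security monitoring, industrial automation and robotics.
Agriculture: SAM can help monitor crop health and growth. By segmenting different regions of a field or crop, it can highlight areas that need attention, such as pest damage or nutrient deficiency.
Construction site monitoring: SAM can help track progress on a construction site by segmenting its components, such as buildings, equipment and materials. This information can be used to check that the project is on schedule.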

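C++ inference

ncnn

ncnn is a high-performance neural network inference library designed for mobile and embedded devices, developed and open-sourced by Tencent's YouTu Lab. A brief overview:

Goal: ncnn aims to provide fast, lightweight deployment of deep learning models, with particular attention to performance on resource-constrained devices.
Performance optimization: ncnn uses hardware acceleration such as ARM NEON on the CPU and Vulkan on the GPU to get good performance on each platform.
Cross-platform: it runs on Android, iOS, Linux, Windows and other operating systems.
Model support: models from other frameworks can be converted to the ncnn format, for example Caffe and ONNX models with the bundled converters and PyTorch models with pnnx, which makes it easy to integrate models from different sources.
Lightweight design: the library binary is small, which suits mobile and embedded devices and reduces storage and memory use.
Flexibility: the input and output interfaces are flexible and easy to integrate with existing applications or systems.
Ease of use: ncnn has a concise API, so loading a model and running inference takes only a few calls.
Hardware compatibility: it is optimized for different hardware platforms, chiefly CPUs and Vulkan-capable GPUs, to make good use of the available compute.
Community support: as an open-source project, ncnn has an active community that keeps adding features and optimizations.
Application scenarios: it suits latency-sensitive workloads such as video stream processing, image recognition and speech recognition.

The design philosophy of ncnn is "small and beautiful": it focuses on inference rather than training, and pays particular attention to performance and efficiency on mobile and embedded devices. This makes ncnn a good choice for deploying deep learning models on edge devices.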

C++ inference code

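// Implementation of sam::PipeLine, which is declared in pipeline.h.
#include "pipeline.h"
#include <iostream>
#include <memory>
#include <string>
#include <vector>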
namespace sam{
PipeLine::~PipeLine()
{

}
int PipeLine::Init(const std::string& image_encoder_param, 
    const std::string& image_encoder_bin, const std::string& mask_decoder_param,
    const std::string& mask_decoder_bin)
{
    sam_ = std::make_shared<SegmentAnything>();
    int ret = sam_->Load(image_encoder_param,image_encoder_bin,mask_decoder_param,mask_decoder_bin);
    return ret;
}

int PipeLine::ImageEmbedding(const cv::Mat& bgr, pipeline_result_t& pipeline_result)
{
    std::cout << "start image encoder..." << std::endl;
    // Propagate the encoder status instead of always reporting success.
    int ret = sam_->ImageEncoder(bgr, pipeline_result.image_embeddings, pipeline_result.image_info);
    std::cout << "finish image encoder..." << std::endl;

    return ret;
}

int PipeLine::AutoPredict(const cv::Mat& bgr, pipeline_result_t& pipeline_result, int n_per_side)
{
    pipeline_result.prompt_info.prompt_type = PromptType::Point;

    //generate grid points
    std::vector<float> points_xy_vec;
    get_grid_points(points_xy_vec, n_per_side);

    std::vector<sam_result_t> proposals;
    for(int i = 0; i < n_per_side; ++i) {
        std::vector<sam_result_t> objects;
        for(int j = 0; j < n_per_side; ++j) {
            pipeline_result.prompt_info.points.clear();
            pipeline_result.prompt_info.labels.clear();
            pipeline_result.prompt_info.points.push_back(points_xy_vec[i * n_per_side * 2 + 2 * j] * pipeline_result.image_info.img_w);
            pipeline_result.prompt_info.points.push_back(points_xy_vec[i * n_per_side * 2 + 2 * j + 1] * pipeline_result.image_info.img_h);
            
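            // Every prompt ends with a padding point at (0, 0).
            pipeline_result.prompt_info.points.push_back(0);
            pipeline_result.prompt_info.points.push_back(0);

            // One label per point: 1 = foreground point, -1 = padding point.
            pipeline_result.prompt_info.labels.push_back(1);
            pipeline_result.prompt_info.labels.push_back(-1);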

            sam_->MaskDecoder(pipeline_result.image_embeddings, pipeline_result.image_info, pipeline_result.prompt_info, objects);
        }
        proposals.insert(proposals.end(), objects.begin(), objects.end());
        // Report finished rows, so that the last message reads n/n.
        std::cout << "processing: " << (i + 1) << "/" << n_per_side << std::endl;
    }

    std::vector<int> picked;
    sam_->NMS(bgr, proposals, picked);
    int num_picked = picked.size();
    
    for(int j = 0; j < num_picked; ++j){
        pipeline_result.sam_result.push_back(proposals[picked[j]]);
    }
    
    return 0;
}


int PipeLine::Predict(const cv::Mat& bgr, pipeline_result_t& pipeline_result)
{
    sam_->MaskDecoder(pipeline_result.image_embeddings, pipeline_result.image_info, pipeline_result.prompt_info, pipeline_result.sam_result);
    return 0;
}


void PipeLine::Draw(const cv::Mat& bgr, const pipeline_result_t& pipeline_result)
{
    static const unsigned char colors[81][3] = {
            {56,  0,   255},
            {226, 255, 0},
            {0,   94,  255},
            {0,   37,  255},
            {0,   255, 94},
            {255, 226, 0},
            {0,   18,  255},
            {255, 151, 0},
            {170, 0,   255},
            {0,   255, 56},
            {255, 0,   75},
            {0,   75,  255},
            {0,   255, 169},
            {255, 0,   207},
            {75,  255, 0},
            {207, 0,   255},
            {37,  0,   255},
            {0,   207, 255},
            {94,  0,   255},
            {0,   255, 113},
            {255, 18,  0},
            {255, 0,   56},
            {18,  0,   255},
            {0,   255, 226},
            {170, 255, 0},
            {255, 0,   245},
            {151, 255, 0},
            {132, 255, 0},
            {75,  0,   255},
            {151, 0,   255},
            {0,   151, 255},
            {132, 0,   255},
            {0,   255, 245},
            {255, 132, 0},
            {226, 0,   255},
            {255, 37,  0},
            {207, 255, 0},
            {0,   255, 207},
            {94,  255, 0},
            {0,   226, 255},
            {56,  255, 0},
            {255, 94,  0},
            {255, 113, 0},
            {0,   132, 255},
            {255, 0,   132},
            {255, 170, 0},
            {255, 0,   188},
            {113, 255, 0},
            {245, 0,   255},
            {113, 0,   255},
            {255, 188, 0},
            {0,   113, 255},
            {255, 0,   0},
            {0,   56,  255},
            {255, 0,   113},
            {0,   255, 188},
            {255, 0,   94},
            {255, 0,   18},
            {18,  255, 0},
            {0,   255, 132},
            {0,   188, 255},
            {0,   245, 255},
            {0,   169, 255},
            {37,  255, 0},
            {255, 0,   151},
            {188, 0,   255},
            {0,   255, 37},
            {0,   255, 0},
            {255, 0,   170},
            {255, 0,   37},
            {255, 75,  0},
            {0,   0,   255},
            {255, 207, 0},
            {255, 0,   226},
            {255, 245, 0},
            {188, 255, 0},
            {0,   255, 18},
            {0,   255, 75},
            {0,   255, 151},
            {255, 56,  0},
            {245, 255, 0}
    };

    cv::Mat img = bgr.clone();

    const size_t num_colors = sizeof(colors) / sizeof(colors[0]);

    for(size_t n = 0; n < pipeline_result.sam_result.size(); ++n)
    {
        // Wrap around the palette so that more than 81 masks cannot index past its end.
        const unsigned char* color = colors[n % num_colors];
        for (int y = 0; y < img.rows; ++y) {
            uchar* image_ptr = img.ptr(y);
            const uchar* mask_ptr = pipeline_result.sam_result[n].mask.ptr<uchar>(y);
            for (int x = 0; x < img.cols; ++x) {
                if (mask_ptr[x] > 0)
                {
                    // Blend the mask color into the pixel at 50% opacity.
                    image_ptr[0] = cv::saturate_cast<uchar>(image_ptr[0] * 0.5 + color[0] * 0.5);
                    image_ptr[1] = cv::saturate_cast<uchar>(image_ptr[1] * 0.5 + color[1] * 0.5);
                    image_ptr[2] = cv::saturate_cast<uchar>(image_ptr[2] * 0.5 + color[2] * 0.5);
                }
                image_ptr += 3;
            }
        }

        //cv::rectangle(img, pipeline_result.sam_result[n].box, cv::Scalar(0,255,0), 2, 8,0);

        switch(pipeline_result.prompt_info.prompt_type)
        {
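            case PromptType::Point:
                for(size_t i = 0; i < pipeline_result.prompt_info.points.size() / 2; ++i)
                {
                    // Skip the (0, 0) padding point, which carries label -1.
                    if (i < pipeline_result.prompt_info.labels.size() && pipeline_result.prompt_info.labels[i] == -1)
                        continue;
                    cv::circle(img, cv::Point(pipeline_result.prompt_info.points[2 * i], pipeline_result.prompt_info.points[2 * i + 1]), 5, cv::Scalar(255,255,0),2,8);
                }
                break;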
            case PromptType::Box:
                cv::rectangle(img, cv::Rect(cv::Point(pipeline_result.prompt_info.points[0], pipeline_result.prompt_info.points[1]), cv::Point(pipeline_result.prompt_info.points[2], pipeline_result.prompt_info.points[3])), cv::Scalar(255,255,0),2,8);
                break;
            default:
                break;
        }
    }

    cv::imshow("dst", img);
    //cv::imshow("mask", pipeline_result.sam_result.mask);
    cv::imwrite("dst.jpg",img);
    cv::waitKey();
}

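void PipeLine::get_grid_points(std::vector<float>& points_xy_vec, int n_per_side)
{
    points_xy_vec.clear();
    if (n_per_side <= 0)
        return;

    // Place the points at the centers of an n x n grid of cells,
    // in normalized [0, 1] coordinates.
    float offset = 1.f / (2 * n_per_side);
    
    float start = offset;
    float end = 1 - offset;
    // With one point per side there is no spacing to compute; avoid dividing by zero.
    float step = n_per_side > 1 ? (end - start) / (n_per_side - 1) : 0.f;

    std::vector<float> points_one_side;
    for (int i = 0; i < n_per_side; ++i) {
        points_one_side.push_back(start + i * step);
    }

    // Store the points row by row as x0, y0, x1, y1, ...
    points_xy_vec.resize(n_per_side * n_per_side * 2);
    for (int i = 0; i < n_per_side; ++i) {
        for (int j = 0; j < n_per_side; ++j) {
            points_xy_vec[i * n_per_side * 2 + 2 * j + 0] = points_one_side[j];
            points_xy_vec[i * n_per_side * 2 + 2 * j + 1] = points_one_side[i];
        }
    }
}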

}
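
The grid used by `AutoPredict` is easier to see in isolation. The sketch below is a standalone equivalent of `get_grid_points`, not the author's code: since the spacing works out to `1 / n_per_side`, each coordinate is simply the center of a grid cell, `(index + 0.5) / n_per_side`. The name `make_grid_points` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns n*n normalized (x, y) pairs laid out row by row: x0, y0, x1, y1, ...
// Each coordinate is the center of one cell of an n x n grid over [0, 1].
inline std::vector<float> make_grid_points(int n_per_side)
{
    assert(n_per_side > 0);
    std::vector<float> pts;
    pts.reserve(static_cast<std::size_t>(n_per_side) * n_per_side * 2);
    for (int row = 0; row < n_per_side; ++row) {
        for (int col = 0; col < n_per_side; ++col) {
            pts.push_back((col + 0.5f) / n_per_side);  // x
            pts.push_back((row + 0.5f) / n_per_side);  // y
        }
    }
    return pts;
}
```

For `n_per_side = 4` the coordinates along each axis are 0.125, 0.375, 0.625 and 0.875. `AutoPredict` multiplies them by the image width and height to get pixel positions, then runs the mask decoder once per point, so the decoder is called `n_per_side * n_per_side` times per image.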

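// Implementation of sam::SegmentAnything, which is declared in segment_anything.h.
#include "segment_anything.h"
#include <algorithm>
#include <functional>
#include <utility>
#include <vector>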

namespace sam
{
SegmentAnything::~SegmentAnything()
{
    image_encoder_net_.clear();
    mask_decoder_net_.clear();
}

static inline float intersection_area(const sam_result_t& a, const sam_result_t& b)
{
    cv::Rect_<float> inter = a.box & b.box;
    return inter.area();
}

static void qsort_descent_inplace(std::vector<sam_result_t>& faceobjects, int left, int right)
{
    int i = left;
    int j = right;
    float p = faceobjects[(left + right) / 2].iou_pred;

    while (i <= j)
    {
        while (faceobjects[i].iou_pred > p)
            i++;

        while (faceobjects[j].iou_pred < p)
            j--;

        if (i <= j)
        {
            // swap
            std::swap(faceobjects[i], faceobjects[j]);

            i++;
            j--;
        }
    }

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            if (left < j) qsort_descent_inplace(faceobjects, left, j);
        }
        #pragma omp section
        {
            if (i < right) qsort_descent_inplace(faceobjects, i, right);
        }
    }
}

static void qsort_descent_inplace(std::vector<sam_result_t>& faceobjects)
{
    if (faceobjects.empty())
        return;

    qsort_descent_inplace(faceobjects, 0, faceobjects.size() - 1);
}

// Greedy NMS on bounding boxes. faceobjects must already be sorted by descending
// iou_pred; the indices of the proposals that survive are written to picked.
static void nms_sorted_bboxes(const cv::Mat& bgr,const std::vector<sam_result_t>& faceobjects, std::vector<int>& picked, float nms_threshold)
{
    (void)bgr; // not needed for NMS; kept so that the signature stays unchanged
    picked.clear();

    const int n = faceobjects.size();

    std::vector<float> areas(n);
    for (int i = 0; i < n; i++)
    {
        areas[i] = faceobjects[i].box.area();
    }

    for (int i = 0; i < n; i++)
    {
        const sam_result_t& a = faceobjects[i];

        int keep = 1;
        for (int j = 0; j < (int)picked.size(); j++)
        {
            const sam_result_t& b = faceobjects[picked[j]];

            // intersection over union
            float inter_area = intersection_area(a, b);
            float union_area = areas[i] + areas[picked[j]] - inter_area;
            // Suppress a when it overlaps an already kept box too much.
            if (union_area > 0.f && inter_area / union_area > nms_threshold){
                keep = 0;
                break;
            }
        }

        if (keep)
            picked.push_back(i);
    }
}
int SegmentAnything::NMS(const cv::Mat& bgr, std::vector<sam_result_t>& proposals, std::vector<int>& picked, float nms_threshold)
{
    qsort_descent_inplace(proposals);
    nms_sorted_bboxes(bgr, proposals, picked, nms_threshold);
    
    return 0;
}

int SegmentAnything::Load(const std::string& image_encoder_param, const std::string& image_encoder_bin, const std::string& mask_decoder_param, const std::string& mask_decoder_bin)
{
    int ret = 0;
    ret = image_encoder_net_.load_param(image_encoder_param.c_str());
    if (ret < 0)
        return -1;
    ret = image_encoder_net_.load_model(image_encoder_bin.c_str());
    if (ret < 0)
        return -1;
    ret = mask_decoder_net_.load_param(mask_decoder_param.c_str());
    if (ret < 0)
        return -1;
    ret = mask_decoder_net_.load_model(mask_decoder_bin.c_str());
    if (ret < 0)
        return -1;

    return 0;
}
int SegmentAnything::ImageEncoder(const cv::Mat& bgr, ncnn::Mat& image_embeddings, image_info_t& image_info)
{
    if (bgr.empty() || bgr.type() != CV_8UC3)
        return -1;
    // from_pixels_resize reads the pixels as one contiguous buffer.
    cv::Mat src = bgr.isContinuous() ? bgr : bgr.clone();

    const int target_size = 1024;
    int img_w = src.cols;
    int img_h = src.rows;

    // Scale the longer side to target_size and keep the aspect ratio.
    int w = img_w;
    int h = img_h;
    float scale = 1.f;
    if (w > h)
    {
        scale = (float)target_size / w;
        w = target_size;
        h = (int)(h * scale);
    }
    else
    {
        scale = (float)target_size / h;
        h = target_size;
        w = (int)(w * scale);
    }

    ncnn::Mat in = ncnn::Mat::from_pixels_resize(src.data, ncnn::Mat::PIXEL_BGR2RGB, img_w, img_h, w, h);

    // Zero-pad on the right and bottom to a target_size x target_size input.
    int wpad = target_size - w;
    int hpad = target_size - h;
    ncnn::Mat in_pad;
    ncnn::copy_make_border(in, in_pad, 0, hpad, 0, wpad, ncnn::BORDER_CONSTANT, 0.f);

    in_pad.substract_mean_normalize(means_, norms_);

    ncnn::Extractor image_encoder_ex = image_encoder_net_.create_extractor();

    if (image_encoder_ex.input("image", in_pad) != 0)
        return -1;
    if (image_encoder_ex.extract("image_embeddings", image_embeddings) != 0)
        return -1;

    // pad_h and pad_w hold the resized size before padding;
    // MaskDecoder uses them to crop the padding off the predicted mask.
    image_info.img_h = img_h;
    image_info.img_w = img_w;
    image_info.pad_h = h;
    image_info.pad_w = w;
    image_info.scale = scale;

    return 0;
}

int SegmentAnything::embed_masks(const prompt_info_t& prompt_info, ncnn::Mat& mask_input, ncnn::Mat& has_mask)
{
    mask_input = ncnn::Mat(256, 256, 1);
    mask_input.fill(0.f);
    has_mask = ncnn::Mat(1);
    has_mask.fill(0.f);

    return 0;
}
int SegmentAnything::transform_coords(const image_info_t& image_info, ncnn::Mat& point_coords)
{
    for(int h = 0; h < point_coords.h; ++h){
        float* ptr = point_coords.row(h);
        ptr[0] *= image_info.scale;
        ptr[1] *= image_info.scale;
    }

    return 0;
}
int SegmentAnything::embed_points(const prompt_info_t& prompt_info, std::vector<ncnn::Mat>& point_labels, ncnn::Mat& point_coords)
{
    // points is a flat (x, y) list with one label per pair. The last pair is
    // expected to be the padding point, so its label is never read here.
    const int num_points = (int)(prompt_info.points.size() / 2);
    if (num_points < 1 || (int)prompt_info.labels.size() < num_points)
        return -1;

    point_coords = ncnn::Mat(num_points * 2, (void*)prompt_info.points.data()).reshape(2, num_points).clone();

    // point_labels1: 1 for every real point, 0 for the padding point.
    ncnn::Mat point_labels1 = ncnn::Mat(256, num_points);
    point_labels1.row_range(0, num_points - 1).fill(1.f);
    point_labels1.row_range(num_points - 1, 1).fill(0.f);

    // Builds a 256 x num_points indicator: row i is all ones when labels[i] == label,
    // otherwise zeros. The last row (the padding point) is set to pad_value.
    auto make_indicator = [&](int label, float pad_value) -> ncnn::Mat {
        ncnn::Mat m(256, num_points);
        for (int i = 0; i < num_points - 1; ++i)
            m.row_range(i, 1).fill(prompt_info.labels[i] == label ? 1.f : 0.f);
        m.row_range(num_points - 1, 1).fill(pad_value);
        return m;
    };

    point_labels.clear();
    point_labels.push_back(point_labels1);
    point_labels.push_back(make_indicator(-1, 1.f)); // point_labels2: padding
    point_labels.push_back(make_indicator(0, 0.f));  // point_labels3: label 0
    point_labels.push_back(make_indicator(1, 0.f));  // point_labels4: label 1
    point_labels.push_back(make_indicator(2, 0.f));  // point_labels5: label 2
    point_labels.push_back(make_indicator(3, 0.f));  // point_labels6: label 3

    return 0;
}
int SegmentAnything::MaskDecoder(const ncnn::Mat& image_embeddings, image_info_t& image_info, 
    const prompt_info_t& prompt_info, std::vector<sam_result_t>& sam_results, float pred_iou_thresh, float stability_score_thresh)
{
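    std::vector<ncnn::Mat> point_labels;
    ncnn::Mat point_coords;
    // Stop early if the prompt could not be encoded into the six label inputs.
    if (embed_points(prompt_info, point_labels, point_coords) != 0 || point_labels.size() < 6)
        return -1;

    // Map the prompt from original image pixels to the resized model input.
    transform_coords(image_info, point_coords);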

    ncnn::Mat mask_input, has_mask;
    embed_masks(prompt_info, mask_input, has_mask);

    ncnn::Extractor mask_decoder_ex = mask_decoder_net_.create_extractor();
    mask_decoder_ex.input("mask_input", mask_input);
    mask_decoder_ex.input("point_coords", point_coords);
    mask_decoder_ex.input("point_labels1", point_labels[0]);
    mask_decoder_ex.input("point_labels2", point_labels[1]);
    mask_decoder_ex.input("point_labels3", point_labels[2]);
    mask_decoder_ex.input("point_labels4", point_labels[3]);
    mask_decoder_ex.input("point_labels5", point_labels[4]);
    mask_decoder_ex.input("point_labels6", point_labels[5]);
    mask_decoder_ex.input("image_embeddings", image_embeddings);
    mask_decoder_ex.input("has_mask_input", has_mask);

    ncnn::Mat scores;
    mask_decoder_ex.extract("scores", scores);

    ncnn::Mat masks;
    mask_decoder_ex.extract("masks", masks);

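    //postprocess
    // Rank the candidate masks by predicted IoU. Index 0 is skipped, presumably
    // because the first output is the single-mask candidate of the exported model.
    std::vector<std::pair<float, int>> scores_vec;
    for (int i = 1; i < scores.w; ++i) {
        scores_vec.push_back(std::pair<float, int>(scores[i], i));
    }
    // Without candidates there is nothing to rank, and scores_vec[0] would be invalid.
    if (scores_vec.empty())
        return -1;

    std::sort(scores_vec.begin(), scores_vec.end(), std::greater<std::pair<float, int>>());

    if (scores_vec[0].first > pred_iou_thresh) {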
        sam_result_t sam_result;
        ncnn::Mat mask = masks.channel(scores_vec[0].second);
        cv::Mat cv_mask_32f = cv::Mat::zeros(cv::Size(mask.w, mask.h), CV_32F);
        std::copy((float*)mask.data, (float*)mask.data + mask.w * mask.h, (float*)cv_mask_32f.data);
        
        cv::Mat single_mask_32f;
        // Crop away the padded area, then resize the mask back to the original image size.
        cv::resize(cv_mask_32f(cv::Rect(0, 0, image_info.pad_w, image_info.pad_h)), single_mask_32f, cv::Size(image_info.img_w,image_info.img_h), 0, 0, cv::INTER_LINEAR);

        float stable_score = calculate_stability_score(single_mask_32f);
        if (stable_score < stability_score_thresh)
            return -1;

        single_mask_32f = single_mask_32f > 0;
        single_mask_32f.convertTo(sam_result.mask, CV_8UC1, 1, 0);
        
        if (postprocess_mask(sam_result.mask, sam_result.box) < 0)
            return -1;

        sam_results.push_back(sam_result);
    }
    else {
        return -1;
    }

    return 0;
}
int SegmentAnything::postprocess_mask(cv::Mat& mask, cv::Rect& box)
{
    std::vector<std::vector<cv::Point>> contours;
    std::vector<cv::Vec4i> hierarchy;
    cv::findContours(mask.clone(), contours, hierarchy, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    if(contours.size() == 0)
        return -1;

    if (contours.size() > 1) {
        // Find the area of the largest contour.
        float max_area = 0;
        std::vector<float> areas;
        for (size_t i = 0; i < contours.size(); ++i) {
            float area = (float)cv::contourArea(contours[i]);
            if (area > max_area)
                max_area = area;
            areas.push_back(area);
        }

        // Erase fragments smaller than 30% of the largest contour and
        // grow the box over every contour that is kept.
        box = cv::Rect();
        for (size_t i = 0; i < areas.size(); ++i) {
            if(areas[i] < max_area * 0.3){
                cv::drawContours(mask, contours, (int)i, cv::Scalar(0), -1);
            }
            else{
                box = box | cv::boundingRect(contours[i]);
            }
        }
    }
    else {
        box = cv::boundingRect(contours[0]);
    }
    return 0;
}
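float SegmentAnything::calculate_stability_score(cv::Mat& mask, float mask_threshold, float stable_score_offset)
{
    // Ratio of the pixel counts at a stricter and at a looser threshold:
    // a value near 1 means the mask barely changes when the threshold moves.
    float intersections = (float)cv::countNonZero(mask > (mask_threshold + stable_score_offset));
    float unions = (float)cv::countNonZero(mask > (mask_threshold - stable_score_offset));

    // An empty mask has no stable region; avoid dividing by zero.
    if (unions <= 0.f)
        return 0.f;

    return intersections / unions;
}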
}
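
The `NMS` step above removes duplicate proposals produced by neighbouring grid points. The following is a minimal standalone sketch of the same greedy idea on plain boxes, without OpenCV or ncnn. `Box`, `box_iou` and `greedy_nms` are hypothetical names, and sorting by index is used here instead of the in-place quicksort, so this illustrates the technique rather than reproducing the code above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Box {
    float x, y, w, h;  // top-left corner and size
    float score;       // predicted quality, higher is better
};

// Intersection over union of two axis-aligned boxes.
inline float box_iou(const Box& a, const Box& b)
{
    const float x1 = std::max(a.x, b.x);
    const float y1 = std::max(a.y, b.y);
    const float x2 = std::min(a.x + a.w, b.x + b.w);
    const float y2 = std::min(a.y + a.h, b.y + b.h);
    const float inter = std::max(0.f, x2 - x1) * std::max(0.f, y2 - y1);
    const float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.f ? inter / uni : 0.f;
}

// Returns the indices of the boxes kept by greedy NMS, best score first.
inline std::vector<int> greedy_nms(const std::vector<Box>& boxes, float iou_threshold)
{
    std::vector<int> order(boxes.size());
    for (std::size_t i = 0; i < order.size(); ++i)
        order[i] = static_cast<int>(i);
    std::sort(order.begin(), order.end(),
              [&](int l, int r) { return boxes[l].score > boxes[r].score; });

    std::vector<int> kept;
    for (int idx : order) {
        bool keep = true;
        for (int k : kept) {
            // Drop a box that overlaps an already kept, better-scoring box.
            if (box_iou(boxes[idx], boxes[k]) > iou_threshold) {
                keep = false;
                break;
            }
        }
        if (keep)
            kept.push_back(idx);
    }
    return kept;
}
```

Two 10 x 10 boxes offset by one pixel have an IoU of 81 / 119, about 0.68, so with a threshold of 0.5 only the higher-scoring one survives, while a box elsewhere in the image is unaffected. Note that this compares bounding boxes only, as the code above does; two different masks with similar boxes would also be merged.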

Calling the model

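#include "pipeline.h"
#include <iostream>
#include <memory>

int main()
{
    // 1: automatic mask generation, 2: point prompt, 3: box prompt
    int type = 1;
    cv::Mat bgr = cv::imread("2.jpg");
    if (bgr.empty())
    {
        std::cerr << "failed to read image 2.jpg" << std::endl;
        return -1;
    }

    std::shared_ptr<sam::PipeLine> pipe(new sam::PipeLine());

    if (pipe->Init("models/encoder-matmul.param","models/encoder-matmul.bin", 
        "models/decoder.param", "models/decoder.bin") != 0)
    {
        std::cerr << "failed to load models" << std::endl;
        return -1;
    }

    // Encode the image once; every prompt below reuses the embedding.
    pipeline_result_t pipe_result;
    pipe->ImageEmbedding(bgr, pipe_result);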
    switch (type)
    {
    case 1://automatic mask
        pipe_result.sam_result.clear();
        pipe_result.prompt_info.points.clear();
        pipe_result.prompt_info.labels.clear();
        pipe->AutoPredict(bgr, pipe_result);
        pipe->Draw(bgr, pipe_result);
        break;
    case 2://prompt input: points
        pipe_result.prompt_info.prompt_type = PromptType::Point;
        pipe_result.prompt_info.points.push_back(497);
        pipe_result.prompt_info.points.push_back(220);
        pipe_result.prompt_info.points.push_back(455);
        pipe_result.prompt_info.points.push_back(294);
        pipe_result.prompt_info.points.push_back(0);
        pipe_result.prompt_info.points.push_back(0);

        pipe_result.prompt_info.labels.push_back(1);
        pipe_result.prompt_info.labels.push_back(1);
        pipe_result.prompt_info.labels.push_back(-1);

        pipe->Predict(bgr, pipe_result);

        pipe->Draw(bgr, pipe_result);
        break;


    case 3://prompt input: box
        pipe_result.prompt_info.prompt_type = PromptType::Box;
        pipe_result.prompt_info.points.push_back(344);
        pipe_result.prompt_info.points.push_back(144);
        pipe_result.prompt_info.points.push_back(607);
        pipe_result.prompt_info.points.push_back(582);
        pipe_result.prompt_info.points.push_back(0);
        pipe_result.prompt_info.points.push_back(0);

        pipe_result.prompt_info.labels.push_back(2);
        pipe_result.prompt_info.labels.push_back(3);
        pipe_result.prompt_info.labels.push_back(-1);

        pipe->Predict(bgr, pipe_result);
        pipe->Draw(bgr, pipe_result);
        break;
    default:
        break;
    }
  
    return 0;
}
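
The prompts built in `main()` follow a fixed layout: a flat list of `x, y` pairs in original image pixels, one label per pair, and a final padding point at `(0, 0)` with label `-1`. The sketch below packs point and box prompts in that layout and applies the same scaling as `transform_coords`. `Prompt`, `make_point_prompt`, `make_box_prompt`, `model_scale` and `to_model_coords` are hypothetical helpers, and the meaning of the labels (1 for a foreground point, 2 and 3 for the two box corners) is inferred from the values used in `main()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the prompt fields used in this post.
struct Prompt {
    std::vector<float> points;  // flat x0, y0, x1, y1, ... in original image pixels
    std::vector<int> labels;    // one label per (x, y) pair
};

// Appends the (0, 0) padding point with label -1 that ends every prompt in main().
inline void append_padding(Prompt& p)
{
    p.points.push_back(0.f);
    p.points.push_back(0.f);
    p.labels.push_back(-1);
}

// xy holds x0, y0, x1, y1, ... for foreground clicks (label 1).
inline Prompt make_point_prompt(const std::vector<float>& xy)
{
    assert(xy.size() % 2 == 0);
    Prompt p;
    p.points = xy;
    p.labels.assign(xy.size() / 2, 1);
    append_padding(p);
    return p;
}

// A box is passed as two points: its two opposite corners with labels 2 and 3.
inline Prompt make_box_prompt(float x1, float y1, float x2, float y2)
{
    Prompt p;
    p.points = {x1, y1, x2, y2};
    p.labels = {2, 3};
    append_padding(p);
    return p;
}

// Scale factor applied when the longer image side is resized to target pixels.
inline float model_scale(int img_w, int img_h, int target = 1024)
{
    assert(img_w > 0 && img_h > 0);
    const int longest = img_w > img_h ? img_w : img_h;
    return static_cast<float>(target) / longest;
}

// Returns the prompt coordinates in the pixel space of the resized model input.
inline std::vector<float> to_model_coords(const Prompt& p, float scale)
{
    std::vector<float> out(p.points);
    for (float& v : out)
        v *= scale;
    return out;
}
```

For a 2048 x 1024 image the scale is 0.5, so the box `(344, 144, 607, 582)` from case 3 becomes `(172, 72, 303.5, 291)` in model coordinates, and the padding point stays at `(0, 0)`. Keeping prompts in original image pixels and scaling them just before decoding means the same click coordinates can be used for both drawing and inference.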