TensorRT: Profiling the Execution Time of Each Layer in an Engine

License: CC 4.0 BY-SA

Original link: https://blog.csdn.net/hjxu2016/article/details/109258566

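This post shows how to implement a Profiler class that records and analyzes the running time of each layer in a deep learning model. The Profiler class derives from the IProfiler interface and holds a vector of Record entries, each storing a layer name and its running time. The reportLayerTime method adds each layer's reported time to the matching record, so times accumulate across repeated inference runs. The printLayerTimes method then prints the average running time of every layer, which gives a direct view of where the model spends its time. This is useful for understanding and optimizing the computational efficiency of a deep learning model.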

Reposted from: https://blog.csdn.net/hjxu2016/article/details/109258566

The core code is as follows:

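#include <algorithm>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>
#include "NvInfer.h"

using namespace nvinfer1;

// Number of timed inference runs. Assumed to be 1000 here, as mentioned in the
// comment below; it must match the number of runs actually executed.
static const int TIMING_ITERATIONS = 1000;

// Profiler class, derived from IProfiler
struct Profiler : public IProfiler
{
    typedef std::pair<std::string, float> Record;
    std::vector<Record> mProfile;
    // Store the running time of each layer in the vector.
    // TensorRT 8 and later declare this method noexcept, so the override is noexcept as well.
    void reportLayerTime(const char* layerName, float ms) noexcept override
    {
        // find_if returns an iterator to the first record whose r.first equals layerName
        auto record = std::find_if(mProfile.begin(), mProfile.end(), [&](const Record& r){ return r.first == layerName; });
        // A layer seen for the first time is appended to the vector
        if (record == mProfile.end())
            mProfile.push_back(std::make_pair(layerName, ms));
        // A layer already in the vector has its time accumulated: inference runs
        // 1000 times, so every layer is reported repeatedly
        else
            record->second += ms;
    }
    // Print the running time of each layer, divided by the total number of iterations
    void printLayerTimes()
    {
        float totalTime = 0;
        for (size_t i = 0; i < mProfile.size(); i++)
        {
            printf("%-40.40s %4.3fms\n", mProfile[i].first.c_str(), mProfile[i].second / TIMING_ITERATIONS);
            totalTime += mProfile[i].second;
        }
        printf("Time over all layers: %4.3fms\n", totalTime / TIMING_ITERATIONS);
    }
} gProfiler;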
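
The accumulate-then-average logic can be tried out without TensorRT or a GPU. The sketch below is only an illustration under stated assumptions: `ProfilerInterfaceStandIn` is a hypothetical stand-in for `nvinfer1::IProfiler`, and `DemoProfiler`, `averageMs`, `sortedBySlowest`, `simulateRuns`, the layer names and the timings are all made up for this example and are not part of the original post. It repeats the same find-or-append logic and adds a helper that orders layers from slowest to fastest, which is often the first thing to look at when optimizing.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for nvinfer1::IProfiler, so this sketch builds without TensorRT.
struct ProfilerInterfaceStandIn
{
    virtual void reportLayerTime(const char* layerName, float ms) noexcept = 0;
    virtual ~ProfilerInterfaceStandIn() = default;
};

struct DemoProfiler : public ProfilerInterfaceStandIn
{
    typedef std::pair<std::string, float> Record;
    std::vector<Record> mProfile;

    // Same accumulation logic as Profiler::reportLayerTime in the listing above.
    void reportLayerTime(const char* layerName, float ms) noexcept override
    {
        auto record = std::find_if(mProfile.begin(), mProfile.end(),
                                   [&](const Record& r) { return r.first == layerName; });
        if (record == mProfile.end())
            mProfile.push_back(std::make_pair(std::string(layerName), ms));
        else
            record->second += ms;
    }

    // Average time of one layer over `iterations` runs, or -1 if the layer was never reported.
    float averageMs(const std::string& layerName, int iterations) const
    {
        assert(iterations > 0);
        for (const Record& r : mProfile)
            if (r.first == layerName)
                return r.second / iterations;
        return -1.0f;
    }

    // Copy of the records ordered from the largest to the smallest accumulated time.
    std::vector<Record> sortedBySlowest() const
    {
        std::vector<Record> sorted = mProfile;
        std::sort(sorted.begin(), sorted.end(),
                  [](const Record& a, const Record& b) { return a.second > b.second; });
        return sorted;
    }
};

// Simulates `iterations` inference runs with made-up layer names and timings.
DemoProfiler simulateRuns(int iterations)
{
    DemoProfiler profiler;
    for (int i = 0; i < iterations; ++i)
    {
        profiler.reportLayerTime("conv1", 0.5f);
        profiler.reportLayerTime("relu1", 0.25f);
        profiler.reportLayerTime("fc1", 1.0f);
    }
    return profiler;
}
```

Calling `simulateRuns(4)` yields three records, one per layer, each holding the sum of four reports. With TensorRT itself, the real `gProfiler` would typically be registered on the execution context through `IExecutionContext::setProfiler(&gProfiler)` before the timed inference loop, with `gProfiler.printLayerTimes()` called afterwards; the original post does not show that part, so treat it as an assumption about the surrounding code.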
————————————————
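Copyright notice: This is an original article by CSDN blogger "hjxu2016", released under the CC 4.0 BY-SA license. When reposting, please include the link to the original source and this notice.
Original link: https://blog.csdn.net/hjxu2016/article/details/109258566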