打工日志11.25

最新推荐文章于 2024-12-31 20:00:00 发布

qq_22549311

最新推荐文章于 2024-12-31 20:00:00 发布

阅读量110

点赞数

分类专栏：打工日志

本文链接：https://blog.csdn.net/qq_22549311/article/details/110121575

版权

打工日志专栏收录该内容

3 篇文章

订阅专栏

在API优化过程中发现，接口初次调用耗时较长，原因是GPU首次计算时间较长，可能是GPU初始化过程导致。通过预执行GPU操作进行初始化，可以解决这个问题。虽然对多数应用影响不大，但对于线上接口服务，首次调用时延尤其重要，建议在服务启动时预先初始化GPU。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

今天在进行API优化排查的时候发现接口在第一次进行调用的时候的运行耗时明显高于之后的调用。

后来经过排查，其实是因为GPU的第一次调用计算会明显高于之后的计算时间。我觉得可能是因为GPU要初始化的原因？

下面做一个简单的小实验，像这样子的，循环计算多次

import torch
import time

if __name__ == '__main__':
    device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")
    for i in range(0, 100):
        st = time.time()
        a1 = torch.rand(9999, 9999).to(device)
        b1 = torch.rand(9999, 9999).to(device)
        torch.mul(a1, b1)
        et = time.time()
        print('In computation {}, the computing operation costs {}ms'.format(str(i), (et-st)))
        time.sleep(2)

输出如下：

In computation 0, the computing operation costs 4.841561555862427ms
In computation 1, the computing operation costs 1.8976020812988281ms
In computation 2, the computing operation costs 1.9691121578216553ms
In computation 3, the computing operation costs 1.8744938373565674ms
In computation 4, the computing operation costs 1.8520243167877197ms
...

但是只要在前面随机进行一下gpu的操作，让GPU初始化完成，时间就正常了。。

import torch
import time
import logging

if __name__ == '__main__':
    device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")
    before_laodd = torch.rand(9999, 9999).to(device)
    for i in range(0, 100):
        st = time.time()
        a1 = torch.rand(9999, 9999).to(device)
        b1 = torch.rand(9999, 9999).to(device)
        torch.mul(a1, b1)
        et = time.time()
        print('In computation {}, the computing operation costs {}ms'.format(str(i), (et-st)))
        time.sleep(2)

output:

In computation 0, the computing operation costs 1.9176990985870361ms
In computation 1, the computing operation costs 1.9287288188934326ms
In computation 2, the computing operation costs 1.8840551376342773ms
...

这个看似好像对实际应用没啥影响，但是因为我们是做线上的接口服务，大部分用户在启动服务之后，可能就只调用一次，那么首次的时延就显得非常重要。
做接口的时候，启动服务时要顺便初始化一下GPU呀