Nvidia Deepstream in Extreme Detail: 1. Deepstream Python Official Example 1: deepstream_test_1
This post explains in detail how to assemble a complete, simple machine-vision pipeline out of DeepStream elements: h264parse, nvv4l2decoder, nvstreammux, nvinfer, nvvideoconvert, nvdsosd, nvegltransform, and nveglglessink. Once this Python file runs successfully, a video annotated with the detection results appears on screen. This chapter reads through deepstream_test_1.py in detail, which may feel a bit tedious, but after this example the other examples become much simpler.
If you find this useful, please bookmark and upvote.
1. How to Run
First, the question everyone cares about most: how to get this .py program running. If you have not yet installed Deepstream 6.0 and Deepstream Python 1.1.0, please see the first post in this series: Nvidia Deepstream in Extreme Detail: Installing Deepstream 6.0 and Deepstream Python 1.1.0.
The first example to run after a successful installation is this program. The terminal commands are:
cd /opt/nvidia/deepstream/deepstream-6.0/sources/apps/deepstream_python_apps
python3 deepstream_test_1.py /opt/nvidia/deepstream/deepstream-6.0/samples/streams/sample_qHD.h264
Let's take a look at the terminal output:
Creating Pipeline
Creating Source
Creating H264Parser
Creating Decoder
Creating EGLSink
Playing file /opt/nvidia/deepstream/deepstream-6.0/samples/streams/sample_qHD.h264
Adding elements to Pipeline
Linking elements in the Pipeline
Starting pipeline
Using winsys: x11
Opening in BLOCKING MODE
0:00:00.969568613 14913 0x2705b290 WARN nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1161> [UID = 1]: Warning, OpenCV has been deprecated. Using NMS for clustering instead of cv::groupRectangles with topK = 20 and NMS Threshold = 0.5
ERROR: Deserialize engine failed because file path: /opt/nvidia/deepstream/deepstream-6.0/models open error
0:00:04.349072641 14913 0x2705b290 WARN nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1889> [UID = 1]: deserialize engine from file :/opt/nvidia/deepstream/deepstream-6.0/models failed
0:00:04.357956347 14913 0x2705b290 WARN nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1996> [UID = 1]: deserialize backend context from engine from file :/opt/nvidia/deepstream/deepstream-6.0/models failed, try rebuild
0:00:04.358024141 14913 0x2705b290 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1914> [UID = 1]: Trying to create engine from model files
WARNING: [TRT]: Detected invalid timing cache, setup a local cache instead
0:00:46.474632523 14913 0x2705b290 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1947> [UID = 1]: serialize cuda engine to file: /opt/nvidia/deepstream/deepstream-6.0/samples/models/Primary_Detector/resnet10.caffemodel_b1_gpu0_int8.engine successfully
INFO: [Implicit Engine Info]: layers num: 3
0 INPUT kFLOAT input_1 3x368x640
1 OUTPUT kFLOAT conv2d_bbox 16x23x40
2 OUTPUT kFLOAT conv2d_cov/Sigmoid 4x23x40
0:00:46.557861268 14913 0x2705b290 INFO nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary-inference> [UID 1]: Load new model:dstest1_pgie_config.txt sucessfully
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
Frame Number=0 Number of Objects=6 Vehicle_count=5 Person_count=1
Let's ignore the details for now and look only at the first few lines:
Creating Pipeline
Creating Source
Creating H264Parser
Creating Decoder
Creating EGLSink
Playing file /opt/nvidia/deepstream/deepstream-6.0/samples/streams/sample_qHD.h264
Adding elements to Pipeline
Linking elements in the Pipeline
Starting pipeline
This is a GStreamer pipeline (Deepstream is built on top of GStreamer).
And the last line,
Frame Number=0 Number of Objects=6 Vehicle_count=5 Person_count=1
is where the script starts printing and counting the people and vehicles it detects.
2. Visualizing the Pipeline
In the post on saving the pipeline graph from Deepstream Python (part of the Nvidia Deepstream details series), we explained in detail how to generate a pipeline diagram. The figure below shows the complete pipeline of this example.
3. Reading the Code Section by Section
3.1 Initializing the Pipeline and Creating the Video Source
# Standard GStreamer initialization
GObject.threads_init()
Gst.init(None)
# Create gstreamer elements
# Create Pipeline element that will form a connection of other elements
print("Creating Pipeline \n ")
pipeline = Gst.Pipeline()
if not pipeline:
    sys.stderr.write(" Unable to create Pipeline \n")
# Source element for reading from the file
print("Creating Source \n ")
source = Gst.ElementFactory.make("filesrc", "file-source")
if not source:
    sys.stderr.write(" Unable to create Source \n")
source.set_property('location', args[1])
On understanding the pipeline: see this link. A pipeline is essentially a container; you put other objects into it, and the multimedia stream then flows through it.
On Gst.ElementFactory.make: see this link. The official definition is: Create a new element of the type defined by the given element factory. In this example we simply request a filesrc. In the pipeline diagram above you can clearly see the path of the video source. (It looks very simple for now, but as the examples grow more complex this diagram becomes larger and more interesting.)
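Incidentally, if you want to see from Python which properties an element exposes (the same information `gst-inspect-1.0` prints), PyGObject can enumerate them. This is a minimal sketch of my own, not part of the official example; it assumes a working GStreamer/DeepStream installation:

```python
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)

# Create a throw-away instance just to introspect its properties.
decoder = Gst.ElementFactory.make("nvv4l2decoder", None)
if decoder:
    for prop in decoder.list_properties():
        # prop.name / prop.blurb mirror what `gst-inspect-1.0 nvv4l2decoder` prints
        print(f"{prop.name}: {prop.blurb}")
```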
3.2 Decoding
# Since the data format in the input file is elementary h264 stream,
# we need a h264parser
print("Creating H264Parser \n")
h264parser = Gst.ElementFactory.make("h264parse", "h264-parser")
if not h264parser:
    sys.stderr.write(" Unable to create h264 parser \n")
# Use nvdec_h264 for hardware accelerated decode on GPU
print("Creating Decoder \n")
decoder = Gst.ElementFactory.make("nvv4l2decoder", "nvv4l2-decoder")
if not decoder:
    sys.stderr.write(" Unable to create Nvv4l2 Decoder \n")
As we can see, the input video is an elementary .h264 stream, so we first need to pass it through an h264parse element. I honestly could not find a formal definition of this element; we can simply use it as-is. Besides h264parse there is also h265parse, depending on the format of the input video.
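If you wanted the same script to accept both H.264 and H.265 elementary streams, one simple approach is to pick the parser from the file extension (nvv4l2decoder handles both codecs). The helper below is my own illustration, not part of deepstream_test_1.py:

```python
import os

def make_parser(path):
    """Return an h264parse or h265parse element based on the file extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext in (".h265", ".hevc", ".265"):
        return Gst.ElementFactory.make("h265parse", "stream-parser")
    # Default to H.264, which is what this example uses.
    return Gst.ElementFactory.make("h264parse", "stream-parser")
```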
Next, the stream goes through the nvv4l2decoder element. For an explanation of this element and examples, see this document.
Digging into nvv4l2decoder is actually quite interesting.
First, looking at the pipeline diagram above, we can see that nvv4l2decoder exposes a few properties, such as drop-frame-interval and num-extra-surfaces. On this website we found nvv4l2decoder's inheritance chain:
GObject
 +----GInitiallyUnowned
       +----GstObject
             +----GstElement
                   +----GstVideoDecoder
                         +----GstNvV4l2VideoDec
                               +----nvv4l2decoder
and the related descriptions:
drop-frame-interval : Interval to drop the frames,ex: value of 5 means every 5th frame will be given by decoder, rest all dropped
flags: readable, writable, changeable only in NULL or READY state
Unsigned Integer. Range: 0 - 30 Default: 0
num-extra-surfaces : Additional number of surfaces in addition to min decode surfaces given by the v4l2 driver
flags: readable, writable, changeable only in NULL or READY state
Unsigned Integer. Range: 0 - 24 Default: 0
We found the source code on this site; the relevant part is:
g_object_class_install_property (gobject_class, PROP_DROP_FRAME_INTERVAL,
g_param_spec_uint ("drop-frame-interval",
"Drop frames interval",
"Interval to drop the frames,ex: value of 5 means every 5th frame will be given by decoder, rest all dropped",
0,
30, 30,
G_PARAM_READWRITE | G_PARAM_STATIC_STRINGS | GST_PARAM_MUTABLE_READY));
g_object_class_install_property (gobject_class, PROP_NUM_EXTRA_SURFACES,
g_param_spec_uint ("num-extra-surfaces",
"Number of extra surfaces",
"Additional number of surfaces in addition to min decode surfaces given by the v4l2 driver",
0,
24, 24,
G_PARAM_READWRITE | G_PARAM_STATIC_STRINGS | GST_PARAM_MUTABLE_READY));
Interested readers can look up more details on their own. In the official Nvidia Deepstream documentation there is a similar plugin, Gst-nvvideo4linux2 (link). It appears to add another layer on top of nvv4l2decoder, but it does not seem to be widely used.
If you need to set a parameter such as drop-frame-interval, a single line decoder.set_property('drop-frame-interval', 15) is enough (the 15 is just a value of our choosing). Do not underestimate this parameter: by adjusting it we can change the fps of the whole output video.
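Putting that together, a minimal sketch of tuning the decoder; both property names appear in the gst-inspect output quoted above, and the values are arbitrary examples of my own:

```python
# Keep only every 15th decoded frame (roughly input fps / 15 at the output).
decoder.set_property('drop-frame-interval', 15)
# Allocate a few extra decode surfaces beyond the driver minimum.
decoder.set_property('num-extra-surfaces', 5)
```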
3.3 Stream Mux
# Create nvstreammux instance to form batches from one or more sources.
streammux = Gst.ElementFactory.make("nvstreammux", "Stream-muxer")
if not streammux:
    sys.stderr.write(" Unable to create NvStreamMux \n")
streammux.set_property('width', 1920)
streammux.set_property('height', 1080)
streammux.set_property('batch-size', 1)
streammux.set_property('batched-push-timeout', int(1000000/30))
A detailed explanation of nvstreammux can be found at this link.
In this example the role of nvstreammux is not yet very obvious, because there is only one input. But if we had, say, 8 camera inputs, we would clearly see nvstreammux collect one frame from each of the 8 streams, pack them into batches, and send each batch to the next element. The official definition of nvstreammux is: The Gst-nvstreammux plugin forms a batch of frames from multiple input sources. The figure below shows its role very clearly:
In the official documentation there are currently two versions of nvstreammux: Gst-nvstreammux and Gst-nvstreammux New.
**Let's first look at the Gst-nvstreammux plugin.** Quoting the official documentation:
The muxer forms a batched buffer of batch-size frames. (batch-size is specified using the gst object property.) If the muxer’s output format and input format are the same, the muxer forwards the frames from that source as a part of the muxer’s output batched buffer.
In other words, nvstreammux takes buffers as input and produces batched buffers as output. In general the batch size matches the number of input video streams, and it can be set with streammux.set_property('batch-size', 1).
The frames are returned to the source when muxer gets back its output buffer. If the resolution is not the same, the muxer scales frames from the input into the batched buffer and then returns the input buffers to the upstream component. The muxer pushes the batch downstream when the batch is filled, or the batch formation timeout batched-pushed-timeout is reached. The timeout starts running when the first buffer for a new batch is collected.
One important point: the buffers coming out of nvstreammux go straight into nvinfer, i.e. they become the input of the CNN model. So by adjusting the buffer resolution we can trade off the accuracy and the real-time performance of the CNN model; the relevant calls are streammux.set_property('width', 1920) and streammux.set_property('height', 1080). Another important parameter is batched-push-timeout. The official explanation is: Timeout in microseconds to wait after the first buffer is available to push the batch even if a complete batch is not formed. For choosing this value, see this website, i.e. 1000000(us)/FPS (1000000 us = 1 s). In other words, to stay real-time we must be able to push each batch buffer out at least once every 1/FPS seconds; conversely, if a batch takes longer than that, the stream mux stage is already introducing latency.
The muxer uses a round-robin algorithm to collect frames from the sources. It tries to collect an average of (batch-size/num-source) frames per batch from each source (if all sources are live and their frame rates are all the same). The number varies for each source, though, depending on the sources’ frame rates. The muxer outputs a single resolution (i.e. all frames in the batch have the same resolution).
From the paragraph above we can see that it is best if all input streams have the same FPS, otherwise things get awkward, because the Gst-nvstreammux plugin cannot be customized per input stream; that changes with Gst-nvstreammux New. Finally, the live-source parameter: if the input streams come from live cameras, we can set it to 1: streammux.set_property('live-source', 1).
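To make the batching behaviour concrete, here is a rough sketch of what attaching several live camera sources to nvstreammux could look like. The `decoders` list and the source count are hypothetical; only the property and pad names come from the documentation and from this example:

```python
NUM_SOURCES = 8   # hypothetical: eight camera inputs

streammux.set_property('width', 1920)
streammux.set_property('height', 1080)
streammux.set_property('batch-size', NUM_SOURCES)                   # one frame per source per batch
streammux.set_property('batched-push-timeout', int(1000000 / 30))   # 1 s / FPS
streammux.set_property('live-source', 1)                            # inputs are live cameras

# `decoders` is assumed to hold one decoder (or uridecodebin) per camera.
for i, dec in enumerate(decoders):
    sinkpad = streammux.get_request_pad(f"sink_{i}")   # request pads sink_0, sink_1, ...
    srcpad = dec.get_static_pad("src")
    srcpad.link(sinkpad)
```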
**Now let's look at the Gst-nvstreammux New plugin.** Quoting the official documentation:
The muxer’s default batching uses a round-robin algorithm to collect frames from the sources. It tries to collect an average of ( batch-size/num-source ) frames per batch from each source (if all sources are live and their frame rates are all the same). The number varies for each source, though, depending on the sources’ frame rates. The muxer attaches an NvDsBatchMeta metadata structure to the output batched buffer. This meta contains information about the frames copied into the batch (e.g. source ID of the frame, original resolutions of the input frames, original buffer PTS of the input frames). The source connected to the Sink_N pad will have pad_index N in NvDsBatchMeta. The muxer supports addition and deletion of sources at run time. When the muxer receives a buffer from a new source, it sends a GST_NVEVENT_PAD_ADDED event. When a muxer sink pad is removed, the muxer sends a GST_NVEVENT_PAD_DELETED event. Both events contain the source ID of the source being added or removed (see sources/includes/gst-nvevent.h). Downstream elements can reconfigure when they receive these events. Additionally, the muxer also sends a GST_NVEVENT_STREAM_EOS to indicate EOS from the source.
Here we see support for some new requirements, for example freely adding or removing individual video streams while the system is running, or handling input streams with different FPS. We will not go into the exact usage here; a dedicated example may follow in the future, and we can discuss it then.
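Just as a rough idea of what "addition and deletion of sources at run time" looks like on the Python side, here is a heavily simplified sketch of removing one source; the official runtime_source_add_delete sample handles more details (flushing, state changes, error handling) than shown here:

```python
def remove_source(streammux, source_bin, source_id):
    """Detach one source from the muxer at run time (simplified sketch)."""
    # Stop the source bin first so it no longer pushes buffers.
    source_bin.set_state(Gst.State.NULL)

    # Look up the pad that was requested earlier as sink_<source_id> and release it.
    sinkpad = streammux.get_static_pad(f"sink_{source_id}")
    if sinkpad:
        streammux.release_request_pad(sinkpad)
```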
3.4 NvInfer
# Use nvinfer to run inferencing on decoder's output,
# behaviour of inferencing is set through config file
pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
if not pgie:
    sys.stderr.write(" Unable to create pgie \n")
pgie.set_property('config-file-path', "dstest1_pgie_config.txt")
For the whole pipeline this plugin is crucial. I was puzzled myself about how the CNN models actually get used through this plugin. First, let's go back to the main Python program and the documentation to see how nvinfer is used.
Quoting the documentation:
The plugin accepts batched NV12/RGBA buffers from upstream. The NvDsBatchMeta structure must already be attached to the Gst Buffers. The low-level library (libnvds_infer) operates on any of INT8 RGB, BGR, or GRAY data with dimension of Network Height and Network Width. The Gst-nvinfer plugin performs transforms (format conversion and scaling), on the input frame based on network requirements, and passes the transformed data to the low-level library. The low-level library preprocesses the transformed frames (performs normalization and mean subtraction) and produces final float RGB/BGR/GRAY planar data which is passed to the TensorRT engine for inferencing. The output type generated by the low-level library depends on the network type.
In other words, the input and output of nvinfer have the same form, NV12/RGBA buffers (we can also see this in the pipeline diagram). Internally, nvinfer converts the incoming buffer data into the input the network expects, and applies a preprocessing step beforehand: y = net-scale-factor * (x - mean).
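To make that formula concrete: in this example's config file net-scale-factor is 0.0039215697906911373 (about 1/255) and no mean/offset file is configured, so the preprocessing simply rescales pixel values from [0, 255] to [0, 1]. A quick numeric check:

```python
import numpy as np

net_scale_factor = 0.0039215697906911373   # value from dstest1_pgie_config.txt, ~1/255
mean = 0.0                                 # no mean file / offsets in this example

x = np.array([0, 128, 255], dtype=np.float32)   # example pixel values
y = net_scale_factor * (x - mean)               # y = net-scale-factor * (x - mean)
print(y)                                        # -> [0.   0.502 1.  ] (approximately)
```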
nvinfer has three modes:
Primary mode: Operates on full frames
Secondary mode: Operates on objects added in the meta by upstream components
Preprocessed Tensor Input mode: Operates on tensors attached by upstream components
In other words, Deepstream supports running multiple models in series (not in parallel). For example, with 8 input video streams, if the goal is to detect the vehicles and then the license plates in all 8 streams, Deepstream can do that: configure the vehicle detector as the primary inference and the license-plate detector as a secondary inference. However, if 3 of the 8 streams need vehicle and plate detection while the other 5 need cat and dog detection, Deepstream cannot do that, unless a single model can detect vehicles as well as cats and dogs (such models certainly exist; you get the idea).
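As an illustration of that serial arrangement, a secondary nvinfer is simply another element linked after the primary one; its config file would typically contain process-mode=2 and operate-on-gie-id=1 (matching the primary model's gie-unique-id). The secondary element and its config file below are hypothetical and only meant to show the shape of the pipeline:

```python
# Primary detector (vehicles etc.), exactly as in this example.
pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
pgie.set_property('config-file-path', "dstest1_pgie_config.txt")

# Hypothetical secondary model that runs on the objects found by the primary one.
sgie = Gst.ElementFactory.make("nvinfer", "secondary-inference")
sgie.set_property('config-file-path', "my_sgie_config.txt")   # hypothetical config file

pipeline.add(pgie)
pipeline.add(sgie)

streammux.link(pgie)
pgie.link(sgie)          # models run in series: primary first, then secondary
sgie.link(nvvidconv)
```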
The Preprocessed Tensor Input mode is interesting. Some applications only need to run detection inside an ROI rather than feeding the whole frame to the model; that is when this mode is used. The official explanation is: When operating in preprocessed tensor input mode, the pre-processing inside Gst-nvinfer is completely skipped. The plugin looks for GstNvDsPreProcessBatchMeta attached to the input buffer and passes the tensor as is to TensorRT inference function without any modifications. This mode currently supports processing on full-frame and ROI. The GstNvDsPreProcessBatchMeta is attached by the Gst-nvdspreprocess plugin.
**On loading the model:** here we focus on how the Caffe model in this example is used. The resnet10.caffemodel model was trained by Nvidia itself; it is very small and can only detect 4 classes of objects.
nvinfer accepts the following inputs:
- Gst Buffer
- NvDsBatchMeta (attaching NvDsFrameMeta)
- Caffe Model and Caffe Prototxt
- ONNX
- UFF file
- TAO Encoded Model and Key
- Offline: Supports engine files generated by TAO Toolkit SDK Model converters
- Layers: Supports all layers supported by TensorRT
In the source code, the only line configuring nvinfer is pgie.set_property('config-file-path', "dstest1_pgie_config.txt"), so let's take a look at that txt file:
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-file=/opt/nvidia/deepstream/deepstream-6.0/samples/models/Primary_Detector/resnet10.caffemodel
proto-file=/opt/nvidia/deepstream/deepstream-6.0/samples/models/Primary_Detector/resnet10.prototxt
model-engine-file=/opt/nvidia/deepstream/deepstream-6.0/models/Primary_Detector/resnet10.caffemodel_b1_gpu0_int8.engine
labelfile-path=/opt/nvidia/deepstream/deepstream-6.0/samples/models/Primary_Detector/labels.txt
int8-calib-file=/opt/nvidia/deepstream/deepstream-6.0/samples/models/Primary_Detector/cal_trt.bin
force-implicit-batch-dim=1
batch-size=1
network-mode=1
num-detected-classes=4
interval=0
gie-unique-id=1
output-blob-names=conv2d_bbox;conv2d_cov/Sigmoid
#scaling-filter=0
#scaling-compute-hw=0
[class-attrs-all]
pre-cluster-threshold=0.2
eps=0.2
group-threshold=1
The config file also contains two kinds of groups: [property] and [class-attrs-...]. The [property] group configures the general behavior of the plugin. It is the only mandatory group. The [class-attrs-all] group configures detection parameters for all classes. The [class-attrs-<class-id>] group configures detection parameters for a class specified by <class-id>. For example, the [class-attrs-23] group configures detection parameters for class ID 23.
A few parameters deserve special attention:
- batch-size: Number of frames or objects to be inferred together in a batch. This value should match the streammux batch size set earlier.
- interval: Number of consecutive batches to be skipped for inference. If we want to run the CNN detector on every single frame, set this to 0. In practice we usually run the CNN only every few frames and use an object tracker for the frames in between, which gives better real-time performance and less jitter in the detected boxes. For example, interval=3 means three consecutive batches are skipped between inference runs.
- num-detected-classes: Number of classes detected by the network, i.e. how many object classes the model can detect.
- filter-out-class-ids: Filter out detected objects belonging to specified class-ids. Some classes may be irrelevant for our application and can simply be discarded; e.g. filter-out-class-ids=1;2 filters out classes 1 and 2.
- gie-unique-id: Unique ID to be assigned to the GIE to enable the application and other elements to identify detected bounding boxes and labels. I am not entirely sure about this one, but the idea should be that when several models are used, each is given its own ID so that its results can be extracted separately.
Of course, besides the parameters above the official documentation lists many more; interested readers can go straight to the docs. In the future we will also try out and explain in detail how to run models in other formats.
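As a side note, according to the Gst-nvinfer documentation several of these keys also exist as GObject properties on the element and, when set in code, override the values in the config file. A small example (values are arbitrary):

```python
pgie.set_property('config-file-path', "dstest1_pgie_config.txt")
# These properties override the corresponding keys in the config file.
pgie.set_property('batch-size', 1)   # should match the streammux batch-size
pgie.set_property('interval', 3)     # skip 3 consecutive batches between inference runs
```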
There is one more important part of NvInfer that we have not covered yet: NvDsBatchMeta. We will come back to it shortly.
3.5 Output and Display
# Use convertor to convert from NV12 to RGBA as required by nvosd
nvvidconv = Gst.ElementFactory.make("nvvideoconvert", "convertor")
if not nvvidconv:
    sys.stderr.write(" Unable to create nvvidconv \n")
# Create OSD to draw on the converted RGBA buffer
nvosd = Gst.ElementFactory.make("nvdsosd", "onscreendisplay")
if not nvosd:
    sys.stderr.write(" Unable to create nvosd \n")
# Finally render the osd output
if is_aarch64():
    transform = Gst.ElementFactory.make("nvegltransform", "nvegl-transform")

print("Creating EGLSink \n")
sink = Gst.ElementFactory.make("nveglglessink", "nvvideo-renderer")
if not sink:
    sys.stderr.write(" Unable to create egl sink \n")
There is not much to say about nvvideoconvert: it takes a batched Gst Buffer with NvDsBatchMeta as input and outputs a Gst Buffer with NvDsBatchMeta, converting the video format (here from NV12 to RGBA, as required by nvdsosd).
The job of nvdsosd is to draw user-defined content onto the video/image: This plugin draws bounding boxes, text, and region of interest (RoI) polygons. (Polygons are presented as a set of lines.)
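For completeness: the terminal line Frame Number=0 Number of Objects=6 ... comes from a probe that deepstream_test_1.py attaches to the sink pad of nvdsosd. The probe walks the NvDsBatchMeta attached to each buffer; below is a condensed sketch of it (the full version, including per-class counters and on-screen text, is in the official script, and the metadata structures themselves are covered in the separate NvDsBatchMeta post):

```python
import pyds

def osd_sink_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK

    # Retrieve the batch metadata attached to the buffer by upstream elements.
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        print(f"Frame Number={frame_meta.frame_num} "
              f"Number of Objects={frame_meta.num_obj_meta}")
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK

# Attach the probe to the sink pad of nvdsosd.
osdsinkpad = nvosd.get_static_pad("sink")
osdsinkpad.add_probe(Gst.PadProbeType.BUFFER, osd_sink_pad_buffer_probe, 0)
```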
Once all these plugins have been created, we need to add them to the pipeline:
pipeline.add(source)
pipeline.add(h264parser)
pipeline.add(decoder)
pipeline.add(streammux)
pipeline.add(pgie)
pipeline.add(nvvidconv)
pipeline.add(nvosd)
pipeline.add(sink)
if is_aarch64():
    pipeline.add(transform)
Finally, link these elements together:
print("Linking elements in the Pipeline \n")
source.link(h264parser)
h264parser.link(decoder)
sinkpad = streammux.get_request_pad("sink_0")
if not sinkpad:
    sys.stderr.write(" Unable to get the sink pad of streammux \n")
srcpad = decoder.get_static_pad("src")
if not srcpad:
    sys.stderr.write(" Unable to get source pad of decoder \n")
srcpad.link(sinkpad)
streammux.link(pgie)
pgie.link(nvvidconv)
nvvidconv.link(nvosd)
if is_aarch64():
    nvosd.link(transform)
    transform.link(sink)
else:
    nvosd.link(sink)
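The walkthrough stops at linking, but for completeness the remaining part of deepstream_test_1.py attaches the probe shown above, creates a main loop, watches the bus for EOS/error messages, and sets the pipeline to PLAYING. Roughly (condensed from the official script; bus_call comes from the common helpers shipped with the Python sample apps):

```python
# Create an event loop and watch the bus for EOS / error messages.
loop = GObject.MainLoop()
bus = pipeline.get_bus()
bus.add_signal_watch()
bus.connect("message", bus_call, loop)   # bus_call: helper from the sample apps' common module

# Start playing and run until EOS or an error occurs.
print("Starting pipeline \n")
pipeline.set_state(Gst.State.PLAYING)
try:
    loop.run()
except Exception:
    pass

# Clean up.
pipeline.set_state(Gst.State.NULL)
```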
I wrote a dedicated post about NvDsBatchMeta and the related meta-data; see this link.