OpenVINO 2021r2 C++: Super-Resolution Reconstruction with FSRCNN

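I recently upgraded OpenVINO to the latest release. (This is the one thing I really dislike about OpenVINO: every upgrade changes a few interfaces. The API does stay backward compatible for a few releases, but keeping up is exhausting. OpenCV and FFmpeg are the same; is this how every open-source project does it?) While I was at it, I wanted to see how well the latest OpenVINO supports image super-resolution models.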

 

Let's start with FSRCNN. It is a classic image super-resolution model: the computation is light, inference is fast, and the results are good.

 

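The implementation at https://www.github.com/Saafke/FSRCNN_Tensorflow shows that the FSRCNN model processes only the Y channel of the image. Y is first divided by 255.0 to get floats in [0,1] and then upscaled 2x by the model; the inference output is multiplied by 255.0 and clipped to [0,255] to form the output Y channel. The Cb and Cr channels are simply upscaled 2x with bicubic interpolation, and finally the channels are merged and converted back into a BGR image.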

    def upscale(self, path):
        """
        Upscales an image via model.
        """
        img = cv2.imread(path, 3)
        # Convert BGR to YCrCb
        img_ycc = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
        img_y = img_ycc[:,:,0]
        # Convert the Y channel to floats in [0,1]
        floatimg = img_y.astype(np.float32) / 255.0
        LR_input_ = floatimg.reshape(1, floatimg.shape[0], floatimg.shape[1], 1)

        with tf.Session(config=self.config) as sess:
            print("\nUpscale image by a factor of {}:\n".format(self.scale))
            
            # load and run
            ckpt_name = self.ckpt_path + "fsrcnn_ckpt" + ".meta"
            saver = tf.train.import_meta_graph(ckpt_name)
            saver.restore(sess, tf.train.latest_checkpoint(self.ckpt_path))
            graph_def = sess.graph
            LR_tensor = graph_def.get_tensor_by_name("IteratorGetNext:0")
            HR_tensor = graph_def.get_tensor_by_name("NHWC_output:0")
            # Run inference
            output = sess.run(HR_tensor, feed_dict={LR_tensor: LR_input_})

            # post-process
            Y = output[0]
            # Multiply the output Y channel by 255.0 and clip to [0,255]
            Y = (Y * 255.0).clip(min=0, max=255)
            Y = (Y).astype(np.uint8)
            # Upscale Cb and Cr with bicubic interpolation
            # Merge with Chrominance channels Cr/Cb
            Cr = np.expand_dims(cv2.resize(img_ycc[:,:,1], None, fx=self.scale, fy=self.scale, interpolation=cv2.INTER_CUBIC), axis=2)
            Cb = np.expand_dims(cv2.resize(img_ycc[:,:,2], None, fx=self.scale, fy=self.scale, interpolation=cv2.INTER_CUBIC), axis=2)
            # Convert YCrCb back to BGR
            HR_image = (cv2.cvtColor(np.concatenate((Y, Cr, Cb), axis=2), cv2.COLOR_YCrCb2BGR))

            bicubic_image = cv2.resize(img, None, fx=self.scale, fy=self.scale, interpolation=cv2.INTER_CUBIC)

            cv2.imshow('Original image', img)
            cv2.imshow('HR image', HR_image)
            cv2.imshow('Bicubic HR image', bicubic_image)
            cv2.waitKey(0)
        sess.close()
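
For readers who want to port this preprocessing to C++, the Y-channel step above can be sketched without OpenCV. This is only a minimal sketch: the function names are hypothetical, the weights are the BT.601 luma weights that OpenCV documents for the Y plane of COLOR_BGR2YCrCb (and for COLOR_BGR2GRAY), and it skips the rounding to 8 bits that cv2.cvtColor performs before the division by 255.0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Luma (Y) of one BGR pixel using BT.601 weights.
// Assumption: these match the weights OpenCV documents for BGR -> Y.
inline float luma_from_bgr(std::uint8_t b, std::uint8_t g, std::uint8_t r) {
    return 0.114f * b + 0.587f * g + 0.299f * r;
}

// Hypothetical helper: turn an interleaved 8-bit BGR buffer into a
// float Y plane scaled to [0, 1], which is what the model consumes.
std::vector<float> bgr_to_normalized_y(const std::vector<std::uint8_t>& bgr) {
    assert(bgr.size() % 3 == 0);  // three bytes per pixel
    std::vector<float> y(bgr.size() / 3);
    for (std::size_t i = 0; i < y.size(); ++i) {
        y[i] = luma_from_bgr(bgr[3 * i], bgr[3 * i + 1], bgr[3 * i + 2]) / 255.0f;
    }
    return y;
}
```

Because grayscale conversion and the Y plane of YCrCb share the same weights, feeding a grayscale image to the model is equivalent to feeding it the Y channel, up to rounding.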

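For an OpenVINO implementation, as long as the Model Optimizer converts the model correctly, the inference part of any super-resolution model is rarely a problem. What needs attention is the preprocessing of the data fed to the model: does it expect floats in [0,1] or in [-1,1], and does the input need a mean/shift computation applied? This preprocessing can be passed as parameters when the MO generates the IR model, so that the IE performs it. The other part is converting the floating-point output back into RGB/YUV pixels in [0,255], which you have to implement by hand.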
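
To make the two input conventions concrete, here is a small sketch of the per-sample arithmetic, assuming the usual (x - mean) / scale form of mean/scale preprocessing. The helper name `normalize` is hypothetical; it is an illustration, not part of the OpenVINO API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: apply (x - mean) / scale to every 8-bit sample.
// mean = 0,     scale = 255   maps [0, 255] to [0, 1]
// mean = 127.5, scale = 127.5 maps [0, 255] to [-1, 1]
std::vector<float> normalize(const std::vector<std::uint8_t>& pixels,
                             float mean, float scale) {
    assert(scale != 0.0f);  // a zero scale would divide by zero
    std::vector<float> out(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        out[i] = (static_cast<float>(pixels[i]) - mean) / scale;
    }
    return out;
}
```

When these values are baked into the IR at conversion time, the application can hand raw 8-bit pixels to the network and skip this loop entirely.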

 

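Now for the MO conversion. I wanted a larger input resolution, so I set the input size to 640x480, which makes the output image 1280x960. The option --scale_values=[255.0] tells the IE to divide every input value by 255.0 during inference.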

C:\temp_20151027\FSRCNN_Tensorflow-master\models>python "c:\Program Files (x86)\IntelSWTools\openvino_2021\deployment_tools\model_optimizer\mo_tf.py" --scale_values=[255.0] --input_shape=[1,480,640,1] --input_model=FSRCNN_x2.pb --data_type FP16 --output=NHWC_output

 

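Next comes the C++ implementation. It borrows the inference code from my previous post, "OpenVINO 2020r3: Trying the GPU Remote Blob API", and only replaces the final handling of the output blob with code that converts it to pixels.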

 

/*
loadjpg converts the color image to a grayscale image
static void loadjpg(const char * jpgname, int width, int height)
{
	//loadimage(&jpg, jpgname);//
	cv::Mat jpg_2x;
	jpg = cv::imread(jpgname);
	cout << "load image: " << jpgname << " resize: w=" << width << " h=" << height << endl;
	//resize to width*height

	std::cout << "convert img to Gray" << std::endl;
	cv::cvtColor(jpg, jpg, cv::COLOR_BGR2GRAY);  //COLOR_BGR2YCrCb or COLOR_BGR2YUV

	cv::resize(jpg, jpg, cv::Size(width, height), 0, 0, cv::INTER_CUBIC);
	cv::resize(jpg, jpg_2x, cv::Size(width * 2, height * 2), 0, 0, cv::INTER_CUBIC);
	cv::imshow("bic_2x", jpg_2x);
	cv::imwrite("palace_gray_bic_2x.png", jpg_2x);
}
*/


		string FLAGS_d = "GPU"; //"CPU"; choose whether to run inference on the CPU or the GPU
		string FLAGS_m = "C:\\work\\opencl_2020\\cmake_fsrcnn_ov2021\\src\\FSRCNN_x2_FP16.xml";
		string FLAGS_i = "C:\\work\\opencl_2020\\cmake_fsrcnn_ov2021\\src\\palace.jpg";
		int FLAGS_nt = 10;

		cout << "starting" << endl;
		const Version *IEversion;
		IEversion = GetInferenceEngineVersion();
		cout << "InferenceEngine: API version " << IEversion->apiVersion.major << "." << IEversion->apiVersion.minor << endl;
		cout << "InferenceEngine: Build : " << IEversion->buildNumber << endl << endl;

		// --------------------------- 1. Load inference engine -------------------------------------
		cout << "Creating Inference Engine" << endl;

		Core ie;
		// -----------------------------------------------------------------------------------------------------

				// --------------------------- 2. Read IR Generated by ModelOptimizer (.xml and .bin files) ------------
		cout << "Loading network files" << endl;

		/** Read network model **/
		CNNNetwork network = ie.ReadNetwork(FLAGS_m);
		cout << "network layer count: " << network.layerCount() << endl;
		// -----------------------------------------------------------------------------------------------------

				// --------------------------- 3. Configure input & output ---------------------------------------------

			// --------------------------- Prepare input blobs -----------------------------------------------------
		cout << "Preparing input blobs" << endl;

		/** Taking information about all topology inputs **/
		InputsDataMap inputInfo(network.getInputsInfo());
		if (inputInfo.size() != 1) throw std::logic_error("Sample supports topologies with 1 input only");

		auto inputInfoItem = *inputInfo.begin();

		/** Specifying the precision and layout of input data provided by the user.
		 * This should be called before load of the network to the device **/
		inputInfoItem.second->setPrecision(Precision::U8);
		inputInfoItem.second->setLayout(Layout::NCHW);

		//cout << FLAGS_i << endl;
		// loadjpg converts the RGB image to grayscale, which keeps things simple
		loadjpg(FLAGS_i.c_str(), inputInfoItem.second->getTensorDesc().getDims()[3],
			inputInfoItem.second->getTensorDesc().getDims()[2]);

		if (jpg.data == NULL)
		{
			cout << "Valid input images were not found!" << endl;
			return 1; // nothing to infer on, so stop instead of reading a null buffer later
		}

		/** Setting batch size to 1 **/
		network.setBatchSize(1);
		size_t batchSize = network.getBatchSize();
		cout << "Batch size is " << std::to_string(batchSize) << endl;


		// --------------------------- 4. Loading model to the device ------------------------------------------
		cout << "Loading model to the device: " << FLAGS_d << endl;
		ExecutableNetwork executable_network = ie.LoadNetwork(network, FLAGS_d);
		// -----------------------------------------------------------------------------------------------------

		// --------------------------- 5. Create infer request -------------------------------------------------
		cout << "Create infer request" << endl;
		InferRequest inferRequest_regular = executable_network.CreateInferRequest();
		// -----------------------------------------------------------------------------------------------------

		// --------------------------- 6. Prepare input --------------------------------------------------------
		for (auto & item : inputInfo) {
			Blob::Ptr inputBlob = inferRequest_regular.GetBlob(item.first);
			SizeVector dims = inputBlob->getTensorDesc().getDims();
			/** Fill input tensor with images. First b channel, then g and r channels **/
			size_t num_channels = dims[1];
			std::cout << "num_channels = " << num_channels << std::endl;
			size_t image_size = dims[3] * dims[2];

			MemoryBlob::Ptr minput = as<MemoryBlob>(inputBlob);
			if (!minput) {
				cout << "We expect MemoryBlob from inferRequest_regular, but by fact we were not able to cast inputBlob to MemoryBlob" << endl;
				return 1;
			}
			// locked memory holder should be alive all time while access to its buffer happens
			auto minputHolder = minput->wmap();

			auto data = minputHolder.as<PrecisionTrait<Precision::U8>::value_type *>();
			unsigned char* pixels = (unsigned char*)(jpg.data);

			cout << "image_size = " << image_size << endl;
			/** Iterate over all pixel in image (b,g,r) **/
			// Copy the Mat data into inputBlob
			for (size_t pid = 0; pid < image_size; pid++) {
				/** Iterate over all channels **/
				for (size_t ch = 0; ch < num_channels; ++ch) {
					/**          [images stride + channels stride + pixel id ] all in bytes            **/
					data[ch * image_size + pid] = pixels[pid*num_channels + ch];
				}
			}
		}

		milliseconds start_ms = duration_cast<milliseconds>(
			system_clock::now().time_since_epoch()
			);
		// --------------------------- 7. Do inference ---------------------------------------------------------
#if 0
		//for async inference
		size_t numIterations = 10;
		size_t curIteration = 0;
		std::condition_variable condVar;

		inferRequest_regular.SetCompletionCallback(
			[&] {
			curIteration++;
			cout << "Completed " << curIteration << " async request execution" << endl;
			if (curIteration < numIterations) {
				/* here a user can read output containing inference results and put new input
				   to repeat async request again */
				inferRequest_regular.StartAsync();
			}
			else {
				/* continue sample execution after last Asynchronous inference request execution */
				condVar.notify_one();
			}
		});

		/* Start async request for the first time */
		cout << "Start inference (" << numIterations << " asynchronous executions)" << endl;
		inferRequest_regular.StartAsync();

		/* Wait all repetitions of the async request */
		std::mutex mutex;
		std::unique_lock<std::mutex> lock(mutex);
		condVar.wait(lock, [&] { return curIteration == numIterations; });
#else
		/* Start sync request */
		cout << "Start inference " << endl;
		inferRequest_regular.Infer();
#endif
		milliseconds end_ms = duration_cast<milliseconds>(
			system_clock::now().time_since_epoch()
			);
		std::cout << "total cost time: " << (end_ms - start_ms).count() << " ms" << std::endl;
		float total_time = (end_ms - start_ms).count() / 1000.0;
		std::cout << "FPS: " << (float)1.0 / total_time << std::endl;

		// -----------------------------------------------------------------------------------------------------

		// --------------------------- 8. Process output -------------------------------------------------------
		cout << "Processing output blobs" << endl;
		OutputsDataMap outputInfo(network.getOutputsInfo());

		if (outputInfo.size() != 1) throw std::logic_error("Sample supports topologies with 1 output only");
		cout << "output blob name: " << outputInfo.begin()->first << endl;
		MemoryBlob::CPtr moutput = as<MemoryBlob> (inferRequest_regular.GetBlob(outputInfo.begin()->first));

		// Check the cast before moutput is dereferenced
		if (!moutput) {
			throw std::logic_error("We expect output to be inherited from MemoryBlob, "
				"but by fact we were not able to cast it to MemoryBlob");
		}

		/** Validating -nt value **/
		const size_t resultsCnt = moutput->size() / batchSize;
		if (FLAGS_nt < 1 || static_cast<size_t>(FLAGS_nt) > resultsCnt) {
			cout << "-nt " << FLAGS_nt << " is not available for this network (-nt should be less than " \
				<< resultsCnt + 1 << " and more than 0)\n            will be used maximal value : " << resultsCnt << endl;
			FLAGS_nt = static_cast<int>(resultsCnt);
		}

		// locked memory holder should be alive all time while access to its buffer happens
		auto lmoHolder = moutput->rmap();
		const auto output_data = lmoHolder.as<const PrecisionTrait<Precision::FP32>::value_type *>();

		size_t num_images = moutput->getTensorDesc().getDims()[0];
		size_t num_channels = moutput->getTensorDesc().getDims()[1];
		size_t H = moutput->getTensorDesc().getDims()[2];
		size_t W = moutput->getTensorDesc().getDims()[3];
		size_t nPixels = W * H;


		// Process the output blob: convert the output floats to pixels
		std::cout << "Output size [N,C,H,W]: " << num_images << ", " << num_channels << ", " << H << ", " << W << std::endl;

		{
			std::vector<float> data_img(nPixels * num_channels);

			if (num_channels == 1)
			{
				cv::Mat Img(H, W, CV_8U);
				unsigned char *image_ptr = Img.data;

				for (size_t n = 0; n < num_images; n++) {
					for (size_t i = 0; i < nPixels; i++) {
						data_img[i ] = static_cast<float>(output_data[i + n * nPixels ])*255.0;

						//std::cout << "i:" << i << "  data:" << data_img[i] << std::endl;

						if (data_img[i  ] < 0) data_img[i  ] = 0;
						if (data_img[i  ] > 255) data_img[i  ] = 255;
						image_ptr[i] = data_img[i];

					}
				}

				imshow("FSRCNN_2x", Img);
				cv::imwrite("palace_FSRCNN_gray_2x.png", Img);
				std::cout << "Output Image created" << std::endl;

			}
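
The two data-layout steps in the listing above, filling the input blob and turning the output floats into pixels, can be isolated into small functions that are easy to test without OpenVINO or OpenCV. The sketch below uses hypothetical names (`hwc_to_chw`, `to_pixel`) and plain `std::vector` buffers in place of `cv::Mat` and the blob memory; like the listing, it truncates rather than rounds when converting to 8 bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Interleaved HWC (the order of cv::Mat data) -> planar CHW (the order of
// an NCHW blob with batch size 1). Same index math as the input loop above.
std::vector<std::uint8_t> hwc_to_chw(const std::vector<std::uint8_t>& hwc,
                                     std::size_t height, std::size_t width,
                                     std::size_t channels) {
    assert(hwc.size() == height * width * channels);
    const std::size_t plane = height * width;
    std::vector<std::uint8_t> chw(hwc.size());
    for (std::size_t pid = 0; pid < plane; ++pid) {
        for (std::size_t ch = 0; ch < channels; ++ch) {
            chw[ch * plane + pid] = hwc[pid * channels + ch];
        }
    }
    return chw;
}

// Network output (nominally in [0, 1]) -> 8-bit pixel:
// scale by 255, clamp to [0, 255], then truncate.
inline std::uint8_t to_pixel(float value) {
    const float scaled = std::min(255.0f, std::max(0.0f, value * 255.0f));
    return static_cast<std::uint8_t>(scaled);
}
```

For the single-channel grayscale input used here, `hwc_to_chw` is an identity copy; it only reorders data once the input has more than one channel.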

The final output:

Original image (test image from the Internet)

Bicubic 2x upscaling result

FSRCNN 2x result

 

Time taken by the final inferRequest_regular.Infer() call, on my 8665U CPU (4 cores, 8 threads) and its Gen9 24EU integrated GPU:

  • CPU: 68ms (14.71FPS)
  • GPU: 48ms (20.83FPS)
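
The FPS values are simply 1000 divided by the latency in milliseconds. The numbers above come from timing a single call; a steadier figure could be obtained by averaging several runs after a warm-up. The following is a sketch of that idea with hypothetical helper names, using `std::chrono::steady_clock` because it is not affected by system clock adjustments.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Frames per second implied by a per-frame latency in milliseconds.
inline double fps_from_latency_ms(double latency_ms) {
    assert(latency_ms > 0.0);
    return 1000.0 / latency_ms;
}

// Hypothetical helper: average wall-clock latency (ms) of `work` over
// `iterations` timed runs, after one untimed warm-up run.
double average_latency_ms(const std::function<void()>& work, int iterations) {
    assert(iterations > 0);
    work();  // warm-up: the first run often includes one-time setup cost
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const auto end = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = end - start;
    return elapsed.count() / iterations;
}
```

In the program above, passing `[&] { inferRequest_regular.Infer(); }` as `work` would time the same synchronous call that produced these numbers.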

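So the integrated GPU of an 8th-gen CPU reaches roughly 20 fps. On the Gen12 96EU GPU of the current mainstream 11th-gen Tiger Lake platform, I expect a 3x speedup should be no problem, at which point FSRCNN could drive a real-time player that restores old films with AI.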

 

Finally, here is the source code, for reference only:

https://gitee.com/tisandman/fsrcnn_ov2021

 
