PaddleOCR Text Detection on Ultra-High-Resolution Images: A Code Tutorial

LEILEI18A

Last modified: 2024-04-12 09:00:48

License: CC 4.0 BY-SA

Categories: Python, Deep Learning. Tags: paddle, paddleocr, ppocr, ultra-high-resolution text detection

First published: 2024-04-08 16:54:23

Article link: https://blog.csdn.net/LEILEI18A/article/details/137514657

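This article shows how to run text detection on ultra-high-resolution images in PaddleOCR. By modifying the code in `predict_det.py`, it implements a sliding-window, multi-scale detection strategy. No separate CUDA installation is required, and the approach applies to segmentation-based text detectors such as DB.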

1. Background

2. Deploying PaddleOCR (on Windows 10)

3. Approach and Code

1. Background

This is the issue I opened: https://github.com/PaddlePaddle/PaddleOCR/issues/11888

Many common questions are answered in the FAQ: https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_ch/FAQ.md

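For ultra-high-resolution images, resizing the whole image directly is no longer appropriate. Instead, the image has to be scanned with sliding windows, possibly with windows of several sizes. Object detectors such as YOLOv5 and YOLOv8 sometimes use the same technique, but there it requires several rounds of NMS. With a segmentation-based text detector such as DB, OCR avoids most of that extra work.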

This only applies to segmentation-based text detectors such as DB!
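
The window layout described above can be sketched in a few lines. This is a minimal illustration, not code from PaddleOCR: the helper name `window_starts` and the sample sizes are my own assumptions. It computes the start offsets along one axis and adds a final window flush with the far edge so that no pixels are left uncovered.

```python
import numpy as np

def window_starts(length, window, stride):
    """Start offsets of sliding windows along one axis (hypothetical helper)."""
    if window > length:
        raise ValueError("window must not exceed the image side length")
    # Number of windows that fit at regular stride steps
    n = (length - window) // stride + 1
    starts = np.arange(0, n * stride, stride)
    # Add a last window flush with the far edge; np.unique removes it again
    # when the regular grid already ends exactly there.
    return np.unique(np.append(starts, length - window))

print(window_starts(2000, 640, 480))
```

Calling the helper once for the width and once for the height gives the full grid of window positions. Overlap between neighbouring windows is controlled by choosing a stride smaller than the window size.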

2. Deploying PaddleOCR (on Windows 10)

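The key step in deploying PaddleOCR on Windows 10 is setting up a conda virtual environment. You do not need to install CUDA system-wide; installing the latest NVIDIA driver is enough, because the driver is backward compatible. Then follow the conda instructions on the official paddle-gpu site to install the matching versions of cudatoolkit and cudnn.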
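
Once the environment is set up, a quick check from Python can confirm that the installed build actually supports the GPU. The sketch below is only a suggestion: the function name `check_paddle_install` is hypothetical, and it assumes the `paddle` package is the one installed in the active conda environment.

```python
import importlib.util

def check_paddle_install():
    """Report whether paddle is importable and whether it was built with CUDA."""
    if importlib.util.find_spec("paddle") is None:
        # paddle is not installed in the active environment
        return {"installed": False, "cuda": False}
    import paddle
    return {"installed": True, "cuda": bool(paddle.device.is_compiled_with_cuda())}

if __name__ == "__main__":
    print(check_paddle_install())
```

If the check reports a CUDA build, running `paddle.utils.run_check()` afterwards performs a fuller self-test of the installation.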

3. Approach and Code

Straight to the code; it is simple.

This code is added directly to the `__call__` method of `TextDetector` in PaddleOCR\tools\infer\predict_det.py, and adds multi-scale, segmentation-based text detection.

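Note: if you simply comment out the two lines `'limit_side_len': args.det_limit_side_len,` and `'limit_type': args.det_limit_type` in predict_det.py, ultra-high-resolution images with sides of more than 10,000 pixels can also be predicted directly (provided the GPU has at least 8 GB of memory). The results are about as good as those of the sliding window; each approach misses some text that the other detects.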
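
To make the effect of those two parameters concrete, here is a simplified sketch of side-length limiting as I understand the detection preprocessing to apply it. It is not the library's code: the function name `limited_shape` is hypothetical, and the default limit of 960 with type `'max'` is an assumption about the default settings.

```python
def limited_shape(h, w, limit_side_len=960, limit_type="max"):
    """Approximate the (h, w) an input is resized to under a side-length limit."""
    if limit_type == "max":
        # Shrink only when the longer side exceeds the limit
        ratio = limit_side_len / max(h, w) if max(h, w) > limit_side_len else 1.0
    elif limit_type == "min":
        # Enlarge only when the shorter side is below the limit
        ratio = limit_side_len / min(h, w) if min(h, w) < limit_side_len else 1.0
    else:
        raise ValueError("limit_type must be 'max' or 'min'")
    # Both sides are snapped to a multiple of 32
    new_h = max(int(round(h * ratio / 32) * 32), 32)
    new_w = max(int(round(w * ratio / 32) * 32), 32)
    return new_h, new_w

print(limited_shape(10000, 8000))  # a very large image under the 'max' limit
print(limited_shape(650, 650))     # a window that is not a multiple of 32
```

Under these assumptions a 10000 x 8000 image is shrunk by roughly a factor of ten, which is why small text disappears. The second call shows that a window whose side is not a multiple of 32 is also resized, so choosing window sizes that are multiples of 32 keeps the output the same size as the window.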

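        ori_im = img.copy()
        image_height, image_width = img.shape[:2]
        # window_size and stride are lists of equal length (one entry per scale);
        # they are not defined in this snippet and must be set before this point
        # Prerequisite: a two-stage detect-then-recognize pipeline whose detector is segmentation-based, e.g. DB or DB++
        if (image_width // max(window_size) >= 2) and (image_height // max(window_size) >= 2) and self.det_algorithm in ['DB', 'PSE', 'DB++']:
            preds_all = np.zeros([1, 1, image_height, image_width], dtype=np.float32)  # accumulated probability map for the full image
            assert len(window_size) == len(stride), "window_size and stride must be lists of the same length"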
            st = time.time()
            preds = {}
            for i in range(len(window_size)):
                window_size_i = window_size[i]
                stride_i = stride[i]
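                # Number of regularly spaced windows along the height and the width
                num_windows_height = (image_height - window_size_i) // stride_i + 1
                num_windows_width = (image_width - window_size_i) // stride_i + 1
                # Inherent limitation: a window larger than det_limit_side_len is resized by the
                # preprocessing, so its prediction no longer matches the slice it is written back to
                if window_size_i > self.args.det_limit_side_len:
                    raise ValueError("window_size exceeds det_limit_side_len, so the prediction cannot be written back; this cannot be changed for now, please lower window_size!")
                # Append a last window flush with the right/bottom edge; np.unique drops it if it is already on the grid
                windows_x, windows_y = np.meshgrid(
                    np.unique(np.append(np.arange(0, num_windows_width * stride_i, stride_i), image_width - window_size_i)),
                    np.unique(np.append(np.arange(0, num_windows_height * stride_i, stride_i), image_height - window_size_i)),
                )  # windows_x: start columns (width axis), windows_y: start rows (height axis)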
                # Slide the window over the ultra-high-resolution image
                h,w = windows_x.shape[:2]
                for y in range(h):
                    for x in range(w):
                        start_h, start_w = windows_y[y,x], windows_x[y,x]
                        print(f"Processing window at y={start_h}, x={start_w}...")
                        img_ = img[start_h:start_h+window_size_i, start_w:start_w+window_size_i]
                        data = {'image': img_}
                        if self.args.benchmark:
                            self.autolog.times.start()
                        data = transform(data, self.preprocess_op)
                        img_, shape_list = data
                        if img_ is None:
                            return None, 0
                        img_ = np.expand_dims(img_, axis=0)
                        # Report the full image size so that post-processing maps boxes onto the whole image
                        shape_list[0] = image_height
                        shape_list[1] = image_width
                        shape_list = np.expand_dims(shape_list, axis=0)
                        img_ = img_.copy()
                        if self.args.benchmark:
                            self.autolog.times.stamp()
                        if self.use_onnx:
                            input_dict = {}
                            input_dict[self.input_tensor.name] = img_
                            outputs = self.predictor.run(self.output_tensors, input_dict)
                        else:
                            self.input_tensor.copy_from_cpu(img_)
                            self.predictor.run()
                            outputs = []
                            for output_tensor in self.output_tensors:
                                output = output_tensor.copy_to_cpu()
                                outputs.append(output)
                            if self.args.benchmark:
                                self.autolog.times.stamp()
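                        if self.det_algorithm in ['DB', 'PSE', 'DB++']:
                            # Merge overlapping windows by keeping the per-pixel maximum probability.
                            # This assumes outputs[0] is a single-channel map of the same size as the window, as DB produces.
                            end_h, end_w = start_h + window_size_i, start_w + window_size_i
                            preds_all[:, :, start_h:end_h, start_w:end_w] = np.maximum(
                                preds_all[:, :, start_h:end_h, start_w:end_w], outputs[0])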
                        else:
                            raise NotImplementedError
            preds['maps'] = preds_all

            post_result = self.postprocess_op(preds, shape_list)
            dt_boxes = post_result[0]['points']
            if self.args.det_box_type == 'poly':
                dt_boxes = self.filter_tag_det_res_only_clip(dt_boxes, ori_im.shape)
            else:
                dt_boxes = self.filter_tag_det_res(dt_boxes, ori_im.shape)
            if self.args.benchmark:
                self.autolog.times.end(stamp=True)
            et = time.time()
            return dt_boxes, et - st
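
The merging step is the part that replaces NMS, so it may help to see it in isolation. The following is a self-contained sketch that runs without PaddleOCR: `merge_windows`, `starts`, and the stand-in predictor `fake_predict` are all hypothetical names, and the predictor simply treats pixel intensity as a "text probability" in place of the real DB model.

```python
import numpy as np

def starts(length, window, stride):
    """Window start offsets along one axis, including a final edge-aligned window."""
    n = (length - window) // stride + 1
    return np.unique(np.append(np.arange(0, n * stride, stride), length - window))

def merge_windows(image, predict, window, stride):
    """Run `predict` on every window and merge results by per-pixel maximum."""
    h, w = image.shape[:2]
    merged = np.zeros((h, w), dtype=np.float32)
    for y in starts(h, window, stride):
        for x in starts(w, window, stride):
            prob = predict(image[y:y + window, x:x + window])
            region = merged[y:y + window, x:x + window]  # view into merged
            np.maximum(region, prob, out=region)         # update in place
    return merged

def fake_predict(tile):
    # Stand-in for the detector (assumption): intensity scaled to [0, 1]
    return tile.astype(np.float32) / 255.0

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(200, 300), dtype=np.uint8)
merged = merge_windows(image, fake_predict, window=64, stride=48)
print(merged.shape)
```

Because every pixel is covered by at least one window, the merged map has no gaps, and where windows overlap the larger probability wins. Box extraction then runs once on the merged map, which is why no per-window NMS is needed.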