Getting CLIP4STR running to generate pre-labels for character recognition

猫猫与橙子

Article link: https://blog.csdn.net/qq_22764813/article/details/135858030

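Summary (auto-generated by CSDN): This article describes how to download the CLIP4STR model from GitHub and configure it, covering environment setup, virtual environment creation, and library installation, and how to resolve the outdated GPU driver problem encountered during use. The model recognizes short text well, but because of its inference size limit it is not suited to long text lines, so it was not used for label quality checking.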

Project link: https://github.com/VamosC/CLIP4STR

Download the project from the link above, then download the model files clip4str_base16x16_d70bde1f2d.ckpt and ViT-B-16.pt.

First, set up the environment following the project's README.md:

Requires `Python >= 3.8` and `PyTorch >= 1.12`.
The following commands are tested on a Linux machine with CUDA Driver Version `525.105.17` and CUDA Version `11.3`.
```
conda create --name clip4str python==3.8
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 -c pytorch
pip install -r requirements.txt
```

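Concrete steps:

1. Create the virtual environment at a specified location (use --prefix rather than --name here, because conda environment names cannot contain "/")

conda create --prefix /home/fxp/fxp/envs/CLIP4STR python==3.8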

2. Activate the virtual environment

source activate /home/fxp/fxp/envs/CLIP4STR

3. Install the required libraries

1)

conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 -c pytorch

2)

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
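After installation, it can help to confirm that the environment meets the README's stated minimums (`Python >= 3.8`, `PyTorch >= 1.12`) before running anything. The following is a minimal sketch, not part of the CLIP4STR project; the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical.

```python
import re
import sys


def parse_version(version: str) -> tuple:
    """Turn a version string such as '1.12.0+cu113' into (1, 12, 0)."""
    # Drop local or build suffixes such as '+cu113'.
    core = re.split(r"[+\-]", version, maxsplit=1)[0]
    parts = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad to three components so '1.12' compares equal to '1.12.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(version: str, minimum: str) -> bool:
    """Return True if `version` is at least `minimum`."""
    return parse_version(version) >= parse_version(minimum)


def check_environment() -> dict:
    """Report whether Python and PyTorch satisfy the README minimums."""
    report = {"python_ok": sys.version_info[:2] >= (3, 8)}
    try:
        import torch  # imported lazily so the script also runs without it
    except ImportError:
        report["torch_ok"] = False
        report["cuda_available"] = False
        return report
    report["torch_ok"] = meets_minimum(torch.__version__, "1.12")
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    print(check_environment())
```

If `cuda_available` comes back as `False` even though a GPU is installed, the driver may be too old for the installed PyTorch build.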

Next, modify the path:

CLIP_PATH = '/PUT/YOUR/PATH/HERE/pretrained/clip'

This is at line 22 of CLIP4STR-main/strhub/models/vl_str/system.py.
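A wrong path here usually only shows up when the model is loaded, so a quick check beforehand can save a run. The sketch below assumes that ViT-B-16.pt is expected directly inside the `CLIP_PATH` directory; that layout is an assumption, and `missing_weights` is a hypothetical helper, not a function from the project.

```python
import os


def missing_weights(clip_path, filenames=("ViT-B-16.pt",)):
    """Return the expected weight files that are absent from clip_path."""
    return [
        name
        for name in filenames
        if not os.path.isfile(os.path.join(clip_path, name))
    ]


if __name__ == "__main__":
    # Placeholder value from system.py; replace it with your own directory.
    CLIP_PATH = "/PUT/YOUR/PATH/HERE/pretrained/clip"
    missing = missing_weights(CLIP_PATH)
    print("missing:", missing if missing else "none")
```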

Run the test following README.md:

bash scripts/read.sh 0 clip4str_base16x16_d70bde1f2d.ckpt misc/test_images

I configured the parameters in PyCharm as follows:

/home/fxp/4tdisk/code/certificate_reader/CLIP4STR-main/weights/clip4str_base16x16_d70bde1f2d.ckpt
--images_path
/home/fxp/4tdisk/code/certificate_reader/CLIP4STR-main/misc/test_images

The test outputs correct character recognition results.

Problem encountered during use:

RuntimeError: The NVIDIA driver on your system is too old (found version 10010).  
Please update your GPU driver by downloading and installing a new version from the 
URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: 
https://pytorch.org to install a PyTorch version that has been compiled with your 
version of the CUDA driver.

Solution: install a newer GPU driver. For driver installation, refer to:

Reinstalling CUDA and cuDNN on Ubuntu and mounting a hard disk to home (CSDN blog)
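The number in the error message (`found version 10010`) is the highest CUDA version the installed driver supports, encoded as 1000 × major + 10 × minor. So 10010 means CUDA 10.1, which is older than the CUDA 11.3 the README was tested with. A minimal sketch of the decoding, with hypothetical helper names:

```python
def decode_cuda_version(encoded: int) -> tuple:
    """Decode a CUDA version integer (1000 * major + 10 * minor)."""
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return major, minor


def driver_supports(encoded_driver: int, required: tuple) -> bool:
    """Return True if the driver's CUDA version is at least `required`."""
    return decode_cuda_version(encoded_driver) >= required


if __name__ == "__main__":
    major, minor = decode_cuda_version(10010)
    print(f"driver supports up to CUDA {major}.{minor}")
    print("enough for CUDA 11.3:", driver_supports(10010, (11, 3)))
```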

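Follow-up:

I turned to this large model in order to check manually annotated characters. However, the model's inference size is 224*224. Recognition on short text lines is acceptable, but on very long text lines it performs worse than PaddleOCR's large server model, so I did not use this model for label quality checking.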
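The effect of the fixed input size can be estimated with simple arithmetic. The sketch below assumes the text-line crop is simply resized to 224×224 without padding or tiling, which may not match the model's actual preprocessing. The helper names and the `max_aspect_ratio` threshold are hypothetical and would need tuning on real data.

```python
def pixels_per_char(num_chars: int, target_width: int = 224) -> float:
    """Average horizontal pixels available per character after resizing."""
    if num_chars <= 0:
        raise ValueError("num_chars must be positive")
    return target_width / num_chars


def horizontal_squeeze(width: int, height: int) -> float:
    """Horizontal scale relative to vertical scale for a square resize.

    Values below 1.0 mean glyphs become narrower relative to their height.
    """
    return height / width


def prefer_clip4str(width: int, height: int, max_aspect_ratio: float = 4.0) -> bool:
    """Route only reasonably short crops to the 224x224 model.

    The threshold is an illustrative assumption, not a measured value.
    """
    return width / height <= max_aspect_ratio


if __name__ == "__main__":
    # A 640x32 line crop containing about 40 characters.
    print(pixels_per_char(40))
    print(horizontal_squeeze(640, 32))
    print(prefer_clip4str(640, 32))
```

Under this assumption, a 40-character line leaves only a few pixels of width per character, which is consistent with the weaker results observed on long lines.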