I Broke My Son's "牛听听" (Niu Tingting)

搜狐技术产品小编2023

Published 2022-07-21 07:30:13

Tags: opencv, computer vision, artificial intelligence, java, image recognition

Article link: https://blog.csdn.net/SOHU_TECH/article/details/126397473

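The story of a programmer who broke his kid's toy

I broke my son's "牛听听", and when you break something, you pay for it.

I am a middle-aged programmer with a six-year-old son. Apart from writing code, most of my daily life is a battle of wits with him. As I remember it, when I was little I was the son who got into trouble and my dad was the one who dealt with me. But by the time I finally made it to being the dad, the times had changed: parent-child relationships are now equal, and quite often the grown-up has to climb down and accommodate the child. So in the standoff between me and my son, the tragedy repeated itself. Once again I am the one who keeps getting into trouble, and my son deals with me on my dad's behalf.

And so, on this particular day, I got into trouble again.

I broke my son's "牛听听".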

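What is "牛听听"?

A greasy middle-aged programmer choosing between the cat (Tmall) and the dog (JD.com) for shopping naturally picks the dog. Not that it matters. With practiced ease I opened JD.com (京东), searched for "牛听听", and went to the product page. Two pictures sit in a prominent spot at the top of the page.

The little girl in the first picture has the gadget called "牛听听" sitting right next to her, so much so that she can study and laugh happily at the same time, as if she were about to deliver that famous advertising line, "Whatever you don't know, just point at it."

The second picture shows plainly how "牛听听" is used: it scans the book placed in front of it and "reads" the content aloud.

Yes, this is yet another "technology frees the parents' hands" gadget. For the freed parents, it cuts down much of the time spent reading picture books to a child who cannot read yet. That spare time will surely turn into a lot more phone time for me, not that this matters either.

What matters is that I broke my son's "牛听听".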

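How did it break?

After I charged it once with my phone charger, it never powered on again. I took it for granted that since both ports were Type-C and the plug fit, it would charge. But just like that it broke, thoroughly.

Clinging to a sliver of hope, I got through to a human agent at "牛听听" customer service. She told me that although both are Type-C, the output current of a phone fast charger is too high and had burned out the device. On top of that, this counts as user-inflicted damage and my unit was already out of warranty, so there was nothing she could do for me free of charge.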

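Buy another one?

So I had to take the loss. I opened the product page on the shopping site again, ready to lead by example and live up to the lesson "if you break it, you pay for it". But "牛听听" shocked me once more.

So this thing is actually this expensive.

The ads always say Dad can skip one pack of cigarettes and Mom can skip one cosmetics purchase. This is a lot more than one pack of cigarettes...

Just as I braced myself to place the order, my damned engineer's soul finally woke up.

Hold on!

Is it complicated?

Doesn't it just recognize the text in a book and read it out loud? Why does that sound so familiar?

I previously developed the mobile AI framework for Sohu News (搜狐新闻), and its OCR text recognition module is a ready-made wheel I can use directly.

As for text-to-speech, that is my old trade too: our team also developed the "listen to news" feature in Sohu News.

So with everything in place, I set out to write a "phone edition" of "牛听听" myself.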

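The good stuff

First, a recap of how the Sohu dual-engine AI framework is implemented:

Article on the Sohu dual-engine AI framework: https://blog.csdn.net/SOHU_TECH/article/details/112975827

Once this SDK is integrated, a few lines of code are all it takes to go from a Bitmap to a String: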

AIHelperFactory.getInstance(context).init(AI_TOOL_OCR);

...

AIHelperFactory.getInstance(context).getOcr().recognize(bitmap);

…

AIHelperFactory.getInstance(context).release();

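Model loading and selection in between are completely transparent to the developer. Very handy, as you would expect from a wheel I built myself.

Integration took five minutes. I printed Li Bai's poem "静夜思" (Quiet Night Thought), casually snapped a photo of it, and fed it to the demo program as a quick test:

Hmm, on a closer look something is off. It was recognized as: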

夜思陲

李白

麻前明月光

疑似地上霜

举头望明月

低夫照おり

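It crashed and burned...

Where is the problem?

I compared against mature commercial document-scanning apps, looked at their promotional images, and noticed something:

Their images are all straight-on frontal shots. Even casually taken photos get detected automatically and warped into shape:

So it seems these preprocessing steps are needed before OCR:

Automatically detect quadrilaterals in the image
Decide whether a quadrilateral has the characteristics of a sheet of paper
Extract the quadrilateral's content and warp it
Sharpen the image so the text is crisp

The technical design of the phone-edition "牛听听" thus went from OCR + TTS to "paper detection preprocessing" + OCR + TTS:

OCR and TTS are handled by the ready-made wheels. The problem reduces to the diagram below: automatically extracting the image on the right from the image on the left.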

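Extracting the paper from a photo with OpenCV

For image-to-image processing, the obvious choice this time was OpenCV.

OpenCV's powerful utility class Imgproc provides a whole range of image processing methods. Read each method's documentation carefully and you can usually find a way to build the feature you want.

This time I used four of its methods and extracted the paper with little effort:

Step 1: findContours

Extract the contours in the image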

Finds contours in a binary image.

The function retrieves contours from the binary image using the algorithm [Suzuki85]. The contours are a useful tool for shape analysis and object detection and recognition.

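As the documentation says, this method outputs the contours found in the image as Mat objects. Applying it to the photo above gives roughly the following:

Step 2: approxPolyDP

Extract polygons from the contours using the Douglas-Peucker algorithm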

Approximates a polygonal curve(s) with the specified precision.

The function cv::approxPolyDP approximates a curve or a polygon with another curve/polygon with less vertices so that the distance between them is less or equal to the specified precision.

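This algorithm neatly reduces the contours extracted in the previous step to polygons by sampling their points.

"The Douglas-Peucker algorithm (also known as the Ramer-Douglas-Peucker algorithm, the iterative end-point fit algorithm, or the split-and-merge algorithm) approximates a curve as a series of points while reducing the number of points. Its advantage is that it is invariant under translation and rotation: given a curve and a threshold, the sampling result is fixed."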
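To make the recursion concrete, here is a minimal pure-Java sketch of the Douglas-Peucker idea for an open polyline. It is an illustration only, not OpenCV's implementation: `DouglasPeuckerSketch`, `simplify`, and `distanceToLine` are hypothetical names, points are plain `double[]{x, y}` pairs, and closed contours (which `approxPolyDP` handles through its `closed` flag) get no special treatment here.

```java
import java.util.ArrayList;
import java.util.List;

class DouglasPeuckerSketch {
    /** Perpendicular distance from p to the line through a and b. */
    static double distanceToLine(double[] p, double[] a, double[] b) {
        double dx = b[0] - a[0];
        double dy = b[1] - a[1];
        double len = Math.hypot(dx, dy);
        if (len == 0) {
            // a and b coincide: fall back to the point-to-point distance.
            return Math.hypot(p[0] - a[0], p[1] - a[1]);
        }
        return Math.abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / len;
    }

    /** Keeps the end points, then recursively keeps points farther than epsilon. */
    static List<double[]> simplify(List<double[]> pts, double epsilon) {
        if (pts.size() < 3) {
            return new ArrayList<>(pts);
        }
        int last = pts.size() - 1;
        int farthest = -1;
        double maxDist = 0;
        for (int i = 1; i < last; i++) {
            double d = distanceToLine(pts.get(i), pts.get(0), pts.get(last));
            if (d > maxDist) {
                maxDist = d;
                farthest = i;
            }
        }
        List<double[]> result = new ArrayList<>();
        if (maxDist > epsilon) {
            // Split at the farthest point and simplify both halves.
            List<double[]> left = simplify(pts.subList(0, farthest + 1), epsilon);
            List<double[]> right = simplify(pts.subList(farthest, pts.size()), epsilon);
            result.addAll(left.subList(0, left.size() - 1)); // split point is in 'right'
            result.addAll(right);
        } else {
            // Everything in between is close enough to the chord: drop it.
            result.add(pts.get(0));
            result.add(pts.get(last));
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> path = new ArrayList<>();
        path.add(new double[] {0, 0});
        path.add(new double[] {5, 0.05});
        path.add(new double[] {10, 0});
        path.add(new double[] {10.05, 5});
        path.add(new double[] {10, 10});
        System.out.println(simplify(path, 0.5).size() + " points kept");
    }
}
```

With a tolerance of 0.5, the slightly wobbly L-shaped path collapses to its two end points plus the corner, which is the same effect that turns a noisy paper outline into a four-vertex polygon.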

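The approximation tolerance I chose is 2% of the contour's perimeter. Keeping only the quadrilaterals among the resulting approximate polygons leaves just two, the red one and the green one:

Step 3: contourArea

Make sure the quadrilateral is the main quadrilateral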

Calculates a contour area.

The function computes a contour area. Similarly to moments, the area is computed using the Green formula. Thus, the returned area and the number of non-zero pixels, if you draw the contour using drawContours or fillPoly, can be different.

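Compute the quadrilateral's area: only when the area reaches a certain size is the quadrilateral big enough to show the text inside it.

In this test case both quadrilaterals are large enough, so this step filters nothing out.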
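For a polygon, the Green formula mentioned in the documentation reduces to the shoelace formula. The pure-Java sketch below shows that calculation together with an area filter. `ContourAreaSketch`, `signedArea`, and `isLargeEnough` are hypothetical names rather than OpenCV API, and the threshold of 1000 in the usage simply mirrors the constant that appears in `find_squares` later in this article.

```java
class ContourAreaSketch {
    /**
     * Shoelace formula. The sign depends on the vertex order, which is why
     * callers usually take the absolute value.
     */
    static double signedArea(double[][] pts) {
        double sum = 0;
        for (int i = 0; i < pts.length; i++) {
            double[] a = pts[i];
            double[] b = pts[(i + 1) % pts.length]; // wrap around to close the polygon
            sum += a[0] * b[1] - b[0] * a[1];
        }
        return sum / 2.0;
    }

    /** True when the polygon covers more than minArea square pixels. */
    static boolean isLargeEnough(double[][] polygon, double minArea) {
        return Math.abs(signedArea(polygon)) > minArea;
    }

    public static void main(String[] args) {
        double[][] page = {{0, 0}, {40, 0}, {40, 30}, {0, 30}};
        double[][] speck = {{0, 0}, {10, 0}, {10, 10}, {0, 10}};
        System.out.println(isLargeEnough(page, 1000));  // 40 x 30 rectangle
        System.out.println(isLargeEnough(speck, 1000)); // 10 x 10 rectangle
    }
}
```

Because the detection runs on a downscaled copy of the photo, a fixed pixel threshold like this is relative to that smaller image, not to the full-resolution source.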

Step 4: isContourConvex

Make sure the quadrilateral is shaped like a sheet of paper (a convex quadrilateral)

Tests a contour convexity.

The function tests whether the input contour is convex or not. The contour must be simple, that is, without self-intersections. Otherwise, the function output is undefined.
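One common way to test convexity, shown here as a pure-Java sketch, is to walk around the polygon and check that the cross product of consecutive edges never changes sign. This is an assumption about the general technique, not a description of OpenCV's internals; `ConvexitySketch` and `isConvex` are hypothetical names, and like the OpenCV function the sketch assumes a simple polygon without self-intersections.

```java
class ConvexitySketch {
    /** True when every turn along the polygon goes the same way. */
    static boolean isConvex(double[][] pts) {
        int n = pts.length;
        if (n < 3) {
            return false;
        }
        int sign = 0;
        for (int i = 0; i < n; i++) {
            double[] a = pts[i];
            double[] b = pts[(i + 1) % n];
            double[] c = pts[(i + 2) % n];
            // z component of (b - a) x (c - b): positive for one turn
            // direction, negative for the other.
            double cross = (b[0] - a[0]) * (c[1] - b[1])
                    - (b[1] - a[1]) * (c[0] - b[0]);
            if (cross != 0) {
                int s = cross > 0 ? 1 : -1;
                if (sign == 0) {
                    sign = s;
                } else if (s != sign) {
                    return false; // turn direction flipped: concave corner
                }
            }
        }
        return sign != 0; // all points collinear is not a usable polygon
    }

    public static void main(String[] args) {
        double[][] sheet = {{0, 0}, {4, 0}, {4, 4}, {0, 4}};
        double[][] dart = {{0, 0}, {4, 0}, {1, 1}, {0, 4}};
        System.out.println(isConvex(sheet));
        System.out.println(isConvex(dart));
    }
}
```

A flat sheet of paper photographed from any angle projects to a convex quadrilateral, so a four-vertex shape with a dent, like the dart in the usage above, can be rejected.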

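After filtering out the red concave quadrilateral, a single result remains:

Done!

Let's review the code for the core steps:

The getPaperBitmapWithDefaultRect method extracts the image of the paper from a photo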

public static Bitmap getPaperBitmapWithDefaultRect(Context context,  
        Uri srcUri, RectF defaultRect) {  
  
    // 1. Resize the srcBitmap to a smaller recognizeMat for performance  
    // optimization.  
    Bitmap recognizeBitmap = ImageUtils.getBitmapWithoutOrientation(  
            context, srcUri, PAPER_RECOGNIZE_WIDTH);  
    Mat recognizeMat = new Mat(recognizeBitmap.getHeight(),  
            recognizeBitmap.getWidth(), CvType.CV_8UC3);  
    try {
        Utils.bitmapToMat(recognizeBitmap, recognizeMat);
    } catch (org.opencv.core.CvException | IllegalArgumentException e) {
        // Conversion failed, so there is nothing to recognize.
        return null;
    }
  
    if (recognizeMat.empty()) {  
        return null;  
    }  
  
    // 2. Find the paper edge in the recognizeMat
    MatOfPoint recognizeCorners = find_largest_square(find_squares(recognizeMat));  
  
    // 3. Get paper edge in the srcMat from paper edge in the recognizeMat  
    BitmapFactory.Options opts = new BitmapFactory.Options();
    opts.inJustDecodeBounds = true;
    // Decode only the image bounds; try-with-resources closes the stream,
    // which the original version leaked.
    try (InputStream input = context.getContentResolver().openInputStream(srcUri)) {
        BitmapFactory.decodeStream(input, null, opts);
    } catch (IOException e) {
        // FileNotFoundException is a subclass of IOException.
        e.printStackTrace();
        return null;
    }
  
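    String filePath = ImageUtils.getImagePathFromUri(context, srcUri);
    Mat srcMat = Imgcodecs.imread(filePath);
    if (srcMat.empty()) {
        // imread returns an empty Mat when the file cannot be read or decoded.
        return null;
    }
    // A landscape source is treated as a rotated portrait page.
    boolean needRotate = srcMat.width() > srcMat.height();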
    Point[] recognizePoints;  
    if (recognizeCorners == null) {  
        recognizePoints = new Point[4];  
        int defaultWidth = needRotate ? srcMat.height()  
                : srcMat.width();  
        int defaultHeight = needRotate ? srcMat.width()  
                : srcMat.height();  
        if (needRotate) {  
            recognizePoints[0] = new Point(defaultHeight * defaultRect.top,  
                    defaultWidth * (1 - defaultRect.right));  
            recognizePoints[1] = new Point(defaultHeight  
                    * defaultRect.bottom, defaultWidth  
                    * (1 - defaultRect.right));  
            recognizePoints[2] = new Point(defaultHeight  
                    * defaultRect.bottom, defaultWidth  
                    * (1 - defaultRect.left));  
            recognizePoints[3] = new Point(defaultHeight * defaultRect.top,  
                    defaultWidth * (1 - defaultRect.left));  
        } else {  
            recognizePoints[0] = new Point(defaultWidth * defaultRect.left,  
                    defaultHeight * defaultRect.top);  
            recognizePoints[1] = new Point(  
                    defaultWidth * defaultRect.right, defaultHeight  
                            * defaultRect.top);  
            recognizePoints[2] = new Point(  
                    defaultWidth * defaultRect.right, defaultHeight  
                            * defaultRect.bottom);  
            recognizePoints[3] = new Point(defaultWidth * defaultRect.left,  
                    defaultHeight * defaultRect.bottom);  
        }  
    } else {  
        // Ratio between the downscaled recognition image and the full-size source.
        float scale = (float) recognizeBitmap.getWidth() / opts.outWidth;
        recognizePoints = recognizeCorners.toArray();  
        for (Point pt : recognizePoints) {  
            pt.x /= scale;  
            pt.y /= scale;  
        }  
    }  
  
    MatOfPoint srcCorners = new MatOfPoint(recognizePoints);  
    if (needRotate) {  
        srcCorners = sortRotateCorners(srcCorners);  
    } else {  
        srcCorners = sortCorners(srcCorners);  
    }  
  
    // 4. Get the transfer mat from the paper edge  
    MatOfPoint2f quad_pts = new MatOfPoint2f();  
    int padding = PAPER_PADDING;  
    Mat quad = Mat.zeros(PAPER_SIZE_HEIGHT, PAPER_SIZE_WIDTH,  
            CvType.CV_8UC3);  
    Size size = quad.size();  
    quad_pts.push_back(new MatOfPoint2f(new Point(-padding, -padding)));  
    quad_pts.push_back(new MatOfPoint2f(new Point(size.width + padding,  
            -padding)));  
    quad_pts.push_back(new MatOfPoint2f(new Point(size.width + padding,  
            size.height + padding)));  
    quad_pts.push_back(new MatOfPoint2f(new Point(-padding, size.height  
            + padding)));  
    srcCorners.convertTo(srcCorners, CvType.CV_32F);  
    Mat transmtx = Imgproc.getPerspectiveTransform(srcCorners, quad_pts);  
  
    // 5. Transfer the paper in srcMat  
    Imgproc.warpPerspective(srcMat, quad, transmtx, quad.size());  
  
    // 6. get paper bitmap  
    quad = getGrayContrastMat(quad);  
    Bitmap dstBitmap = Bitmap.createBitmap(quad.width(), quad.height(),  
            Bitmap.Config.ARGB_8888);  
    Utils.matToBitmap(quad, dstBitmap);  
    if (DEBUG) {  
        Debug.stopMethodTracing();  
    }  
    return dstBitmap;  
}
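Step 4 of the method hands the four sorted corners to `Imgproc.getPerspectiveTransform`. For readers curious about what such a call has to compute, the sketch below solves for a 3x3 projective mapping between four point pairs in plain Java using Gauss-Jordan elimination. It is an illustration under stated assumptions, not OpenCV's code: `PerspectiveTransformSketch`, `solveHomography`, and `map` are hypothetical names, the bottom-right matrix entry is fixed to 1, and the corner values in the usage are made up.

```java
class PerspectiveTransformSketch {
    /**
     * Returns h[0..8], the row-major 3x3 matrix that maps each src[i] to dst[i].
     * Each point pair contributes two linear equations in the eight unknowns.
     */
    static double[] solveHomography(double[][] src, double[][] dst) {
        double[][] a = new double[8][];
        for (int i = 0; i < 4; i++) {
            double x = src[i][0], y = src[i][1];
            double u = dst[i][0], v = dst[i][1];
            a[2 * i] = new double[] {x, y, 1, 0, 0, 0, -u * x, -u * y, u};
            a[2 * i + 1] = new double[] {0, 0, 0, x, y, 1, -v * x, -v * y, v};
        }
        // Gauss-Jordan elimination with partial pivoting on the 8x9 system.
        for (int col = 0; col < 8; col++) {
            int pivot = col;
            for (int r = col + 1; r < 8; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            if (Math.abs(a[pivot][col]) < 1e-12) {
                throw new IllegalArgumentException("degenerate corner layout");
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            for (int r = 0; r < 8; r++) {
                if (r == col) {
                    continue;
                }
                double f = a[r][col] / a[col][col];
                for (int c = col; c < 9; c++) {
                    a[r][c] -= f * a[col][c];
                }
            }
        }
        double[] h = new double[9];
        for (int i = 0; i < 8; i++) {
            h[i] = a[i][8] / a[i][i];
        }
        h[8] = 1; // assumption: the matrix is normalized so this entry is 1
        return h;
    }

    /** Applies the mapping to one point, including the perspective divide. */
    static double[] map(double[] h, double x, double y) {
        double w = h[6] * x + h[7] * y + h[8];
        return new double[] {
            (h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w
        };
    }

    public static void main(String[] args) {
        // Hypothetical skewed paper corners and an upright 100 x 150 target.
        double[][] paper = {{10, 20}, {110, 30}, {120, 140}, {5, 130}};
        double[][] page = {{0, 0}, {100, 0}, {100, 150}, {0, 150}};
        double[] h = solveHomography(paper, page);
        double[] p = map(h, 120, 140);
        System.out.printf("%.3f, %.3f%n", p[0], p[1]);
    }
}
```

`warpPerspective` then uses a matrix like this to decide, for every pixel of the upright output, where in the skewed photo to sample from.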

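The core method inside it is find_squares

The image Bitmap has already been converted to an OpenCV Mat. find_squares blurs it, treats each color channel as a single-channel grayscale image, and starts looking for quadrilaterals:

To find them it uses the four methods described above, which yields the quadrilateral of the paper: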

public static List<MatOfPoint> find_squares(Mat image) {  
    List<MatOfPoint> contours = new ArrayList<MatOfPoint>();  
    List<MatOfPoint> squares = new ArrayList<MatOfPoint>();  
  
    Mat blurred = new Mat(image.height(), image.width(), CvType.CV_8UC3);  
    Imgproc.GaussianBlur(image, blurred, new Size(11, 11), 0);  
    ArrayList<Mat> grayList = new ArrayList<Mat>();  
    Core.split(blurred, grayList);  
    Mat gray0 = new Mat(blurred.size(), CvType.CV_8U);  
    Mat gray = new Mat(image.height(), image.width(), CvType.CV_8U);  
    for (int a = 0; a < grayList.size(); a++) {  
        gray0 = grayList.get(a);  
        int threshold_level = 2;  
        for (int level = 0; level < threshold_level; level++) {  
            Imgproc.Canny(gray0, gray, 10 * (level + 1), 10 * (level + 1));  
            Imgproc.dilate(gray, gray, new Mat(), new Point(-1, -1), 1);  
            Mat hierarchy = new Mat();
            // Start from an empty list so contours from earlier passes
            // are not processed again.
            contours.clear();
            Imgproc.findContours(gray, contours, hierarchy, Imgproc.RETR_LIST, Imgproc.CHAIN_APPROX_SIMPLE);
            MatOfPoint2f approx = new MatOfPoint2f();  
            for (MatOfPoint contoursPoint : contours) {  
                // get each contours  
                // convert contours to point2f  
                // close a contours start point and end point  
                MatOfPoint2f contourPoint2f = new MatOfPoint2f();  
                contoursPoint.convertTo(contourPoint2f, CvType.CV_32F);  
                Imgproc.approxPolyDP(contourPoint2f, approx,  
                        Imgproc.arcLength(contourPoint2f, true) * 0.02,  
                        true);  
                if (approx.total() == 4) {  
                    // convert the closed path to a MatOfPoint  
                    Point[] approxArray = approx.toArray();  
                    MatOfPoint approxPoint = new MatOfPoint(approxArray);  
                    if (Math.abs(Imgproc.contourArea(approx)) > 1000  
                            && Imgproc.isContourConvex(approxPoint)) {  
                        double maxCosine = 0;  
                        for (int j = 2; j < 5; j++) {  
                            double cosine = Math.abs(angle(  
                                    approxArray[j % 4], approxArray[j - 2],  
                                    approxArray[j - 1]));  
                            maxCosine = Math.max(maxCosine, cosine);  
                        }  
                        if (maxCosine < 0.3) {  
                            squares.add(approxPoint);  
                        }  
                    }  
                }  
            }  
        }  
    }  
    return squares;  
}
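find_squares calls an `angle` helper that the article does not show. A plausible version, modeled on the helper of the same name in OpenCV's squares sample, returns the cosine of the angle at `pt0` between the vectors pointing to `pt1` and `pt2`. The sketch below is a guess at its shape in plain Java, using `double[]{x, y}` pairs instead of OpenCV `Point` objects; `CornerAngleSketch` and `maxCornerCosine` are hypothetical names.

```java
class CornerAngleSketch {
    /** Cosine of the angle at pt0 formed by the vectors pt0->pt1 and pt0->pt2. */
    static double angle(double[] pt1, double[] pt2, double[] pt0) {
        double dx1 = pt1[0] - pt0[0];
        double dy1 = pt1[1] - pt0[1];
        double dx2 = pt2[0] - pt0[0];
        double dy2 = pt2[1] - pt0[1];
        // The small constant avoids a division by zero for coincident points.
        return (dx1 * dx2 + dy1 * dy2)
                / Math.sqrt((dx1 * dx1 + dy1 * dy1) * (dx2 * dx2 + dy2 * dy2) + 1e-10);
    }

    /** Largest |cosine| over the corners checked by the loop in find_squares. */
    static double maxCornerCosine(double[][] quad) {
        double maxCosine = 0;
        for (int j = 2; j < 5; j++) {
            double cosine = Math.abs(angle(quad[j % 4], quad[j - 2], quad[j - 1]));
            maxCosine = Math.max(maxCosine, cosine);
        }
        return maxCosine;
    }

    public static void main(String[] args) {
        double[][] square = {{0, 0}, {10, 0}, {10, 10}, {0, 10}};
        double[][] slanted = {{0, 0}, {10, 0}, {20, 10}, {10, 10}};
        System.out.println(maxCornerCosine(square) < 0.3);  // right angles pass
        System.out.println(maxCornerCosine(slanted) < 0.3); // 45-degree corners fail
    }
}
```

A cosine below 0.3 corresponds to corners within roughly 17 degrees of a right angle, so the check keeps shapes that look like a page seen nearly head-on and drops strongly slanted parallelograms.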

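Finally, the resulting quadrilateral region is warped and sharpened.

For sharpening we once again call on the famous "girl in the hat"; every programmer has probably seen her in one piece of documentation or another:

As sharpening matrices go, none is more famous than this one:

We turn the brightness up a little by changing the center value from 5 to 5.5; in my tests this raises the success rate of text extraction: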

private static Mat getGrayContrastMat(Mat srcMat) {  
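    // srcMat is a 3-channel BGR image; cvtColor allocates the single-channel
    // grayscale output itself, so no pre-sized buffer is needed.
    Mat highcontrastMat = new Mat();

    Imgproc.cvtColor(srcMat, highcontrastMat, Imgproc.COLOR_BGR2GRAY);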
    Imgproc.GaussianBlur(highcontrastMat, highcontrastMat, new Size(3, 3),  
            0);  
  
    Mat kernel = new Mat(3, 3, CvType.CV_32F, new Scalar(0));  
    kernel.put(0, 1, -1.0);  
    kernel.put(1, 0, -1.0);  
    kernel.put(2, 1, -1.0);  
    kernel.put(1, 2, -1.0);  
    kernel.put(1, 1, 5.5);  
    Imgproc.filter2D(highcontrastMat, highcontrastMat,  
            highcontrastMat.depth(), kernel);  
  
    return highcontrastMat;  
}
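To see why moving the center from 5 to 5.5 brightens the result, note that the kernel entries then sum to 1.5 instead of 1, so a flat region comes out 1.5 times as bright while edges are still exaggerated. The plain-Java sketch below applies the same 3x3 kernel to a small grayscale array. It is an illustration, not what `filter2D` does internally: `SharpenKernelSketch` and `apply` are hypothetical names, and the border is handled by repeating edge pixels for simplicity, whereas OpenCV's default border mode is a reflection.

```java
class SharpenKernelSketch {
    // The classic sharpening kernel with the center raised from 5 to 5.5.
    static final double[][] KERNEL = {
        { 0.0, -1.0,  0.0},
        {-1.0,  5.5, -1.0},
        { 0.0, -1.0,  0.0}
    };

    /** Applies KERNEL to an 8-bit grayscale image and clamps to 0..255. */
    static int[][] apply(int[][] gray) {
        int h = gray.length;
        int w = gray[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                double acc = 0;
                for (int ky = -1; ky <= 1; ky++) {
                    for (int kx = -1; kx <= 1; kx++) {
                        // Repeat the nearest edge pixel outside the image.
                        int sy = Math.min(h - 1, Math.max(0, y + ky));
                        int sx = Math.min(w - 1, Math.max(0, x + kx));
                        acc += KERNEL[ky + 1][kx + 1] * gray[sy][sx];
                    }
                }
                out[y][x] = (int) Math.max(0, Math.min(255, Math.round(acc)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] flat = {{100, 100, 100}, {100, 100, 100}, {100, 100, 100}};
        int[][] dot = {{100, 100, 100}, {100, 200, 100}, {100, 100, 100}};
        System.out.println(apply(flat)[1][1]); // flat area: 5.5*100 - 4*100
        System.out.println(apply(dot)[1][1]);  // bright dot: clamped at 255
        System.out.println(apply(dot)[0][1]);  // neighbor of the dot gets darker
    }
}
```

The bright dot is pushed up and its neighbors are pushed down, which is the local contrast boost that helps dark text stand out from light paper.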

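Results

With the changes in, let's look at the demo video:

It works well, with 100% accuracy! To raise the difficulty a bit, how about recognizing images of handwriting?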