Building a Web Page for a tesseract-ocr Text Recognition System [PHP]


Here is the site I built:

OCR Text Recognition System


After installing tesseract-ocr, I found a way to call the OCR interface from PHP. There is a project on GitHub that does this:

https://github.com/thiagoalessio/tesseract-ocr-for-php

It is a PHP wrapper around the command-line OCR interface.
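To illustrate what wrapping the command-line interface means, here is a minimal sketch that assembles a tesseract command in PHP. The function name buildTesseractCommand is hypothetical and is not the library's actual code. It assumes a Tesseract 3.x command line where "stdout" is accepted as the output base and the page segmentation flag is spelled -psm.

```php
<?php
// Hypothetical helper (not part of the library): assemble a tesseract
// command line for an image, optional languages and an optional PSM.
function buildTesseractCommand($image, array $langs = array(), $psm = null)
{
    // Assumption: "stdout" as the output base makes tesseract print the text.
    $command = 'tesseract ' . escapeshellarg($image) . ' stdout';

    if (!empty($langs)) {
        // Tesseract accepts several languages joined with "+".
        $command .= ' -l ' . escapeshellarg(implode('+', $langs));
    }
    if ($psm !== null) {
        // Assumption: Tesseract 3.x spelling; newer releases use "--psm".
        $command .= ' -psm ' . (int) $psm;
    }
    return $command;
}

$command = buildTesseractCommand('text.png', array('eng', 'deu'), 6);
echo $command, PHP_EOL;

// Running it would look like this (requires tesseract on the $PATH):
// $text = shell_exec($command);
```

Passing every user-controlled value through escapeshellarg() keeps file names with spaces or shell metacharacters from breaking the command.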

I used Composer to download the source and then wrote a simple OCR website.

Run the following command to download it:

composer require thiagoalessio/tesseract_ocr


The project's README, reproduced below, shows how each function is used:

Tesseract OCR for PHP

A wrapper to work with Tesseract OCR inside PHP.


Installation

First of all, make sure you have Tesseract OCR installed. (v3.03 or greater)

As a composer dependency

{
    "require": {
        "thiagoalessio/tesseract_ocr": "1.0.0-RC"
    }
}

Usage

Basic usage

Given the following image (text.png):

The quick brown fox jumps over the lazy dog

And the following code:

<?php
echo (new TesseractOCR('text.png'))
    ->run();

The output would be:

The quick brown fox
jumps over the lazy
dog.

Other languages

Given the following image (german.png):

grüßen - Google Translate said it means "to greet" in German

And the following code:

<?php
echo (new TesseractOCR('german.png'))
    ->run();

The output would be:

griifien

Which is not good, but defining a language:

<?php
echo (new TesseractOCR('german.png'))
    ->lang('deu')
    ->run();

Will produce:

grüßen

Multiple languages

Given the following image (multi-languages.png):

The phrase "I eat apple sushi", with mixed English, Japanese and Portuguese

And the following code ....

<?php
echo (new TesseractOCR('multi-languages.png'))
    ->lang('eng', 'jpn', 'por')
    ->run();

The output would be:

I eat 寿司 de maçã

Inducing recognition

Given the following image (8055.png):

Number 8055

And the following code ....

<?php
echo (new TesseractOCR('8055.png'))
    ->whitelist(range('A', 'Z'))
    ->run();

The output would be:

BOSS

API

->executable('/path/to/tesseract')

Define a custom location for the tesseract executable, if for any reason it is not present in the $PATH.

->tessdataDir('/path')

Specify a custom location for the tessdata directory.

->userWords('/path/to/user-words.txt')

Specify the location of user words file.

This is a plain text file containing a list of words that you want tesseract to treat as normal dictionary words.

Useful when dealing with contents that contain technical terminology, jargon, etc.

Example of a user words file:

$ cat /path/to/user-words.txt
foo
bar
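A user words file like the one above can also be generated from PHP before running the recognition. The following is only a sketch: writeUserWords is a hypothetical helper, and the temporary file location is an assumption.

```php
<?php
// Hypothetical helper: write one word per line to a temporary file
// and return the path of that file.
function writeUserWords(array $words)
{
    $path = tempnam(sys_get_temp_dir(), 'words');
    file_put_contents($path, implode("\n", $words) . "\n");
    return $path;
}

$userWordsPath = writeUserWords(array('foo', 'bar'));
echo file_get_contents($userWordsPath);

// The path could then be handed to the wrapper, for example:
// echo (new TesseractOCR('text.png'))->userWords($userWordsPath)->run();
```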

->userPatterns('/path/to/user-patterns.txt')

Specify the location of user patterns file.

If the contents you are dealing with have known patterns, this option can greatly improve tesseract's recognition accuracy.

Example of a user patterns file:

$ cat /path/to/user-patterns.txt
1-\d\d\d-GOOG-441
www.\n\\\*.com

->lang('lang1', 'lang2', 'lang3')

Define one or more languages to be used during the recognition. A complete list of available languages can be found at https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages

Tip from @daijiale: Use the combination ->lang('chi_sim', 'chi_tra') for proper recognition of Chinese.

->psm(6)

Specify the Page Segmentation Mode, which instructs tesseract how to interpret the given image.

Possible psm values are:

 0 = Orientation and script detection (OSD) only.
 1 = Automatic page segmentation with OSD.
 2 = Automatic page segmentation, but no OSD, or OCR.
 3 = Fully automatic page segmentation, but no OSD. (Default)
 4 = Assume a single column of text of variable sizes.
 5 = Assume a single uniform block of vertically aligned text.
 6 = Assume a single uniform block of text.
 7 = Treat the image as a single text line.
 8 = Treat the image as a single word.
 9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.

->config('configvar', 'value')

Tesseract offers incredible control to the user through its 660 configuration vars.

You can see the complete list by running the following command:

$ tesseract --print-parameters
Tesseract parameters:
... long list with all parameters ...

->whitelist(range('a', 'z'), range(0, 9), '-_@')

This is a shortcut for ->config('tessedit_char_whitelist', 'abcdef....').
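To make the shortcut concrete, the sketch below shows one way a mix of range() arrays and plain strings could be flattened into the single string that tessedit_char_whitelist expects. flattenWhitelist is a hypothetical helper written for illustration, not the library's own implementation.

```php
<?php
// Hypothetical helper: flatten any mix of arrays and strings into one string.
function flattenWhitelist()
{
    $chars = '';
    foreach (func_get_args() as $arg) {
        // Arrays (for example from range()) are joined; strings are appended as-is.
        $chars .= is_array($arg) ? implode('', $arg) : $arg;
    }
    return $chars;
}

$whitelist = flattenWhitelist(range('a', 'z'), range(0, 9), '-_@');
echo $whitelist, PHP_EOL;
// prints abcdefghijklmnopqrstuvwxyz0123456789-_@
```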

Where to get help

  • #tesseract-ocr-for-php on freenode IRC

License

Apache License 2.0.


The next step is to write the PHP web page.


I used Bootstrap, plus a form for the file upload.

Here is the code:

<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<title>OCR Text Recognition System</title>
<!-- Bootstrap core CSS -->
<link rel="stylesheet" href="http://cdn.bootcss.com/bootstrap/3.3.0/css/bootstrap.min.css">
<!-- Optional Bootstrap theme (usually not needed) -->
<link rel="stylesheet" href="http://cdn.bootcss.com/bootstrap/3.3.0/css/bootstrap-theme.min.css">
<!-- jQuery; must be loaded before bootstrap.min.js -->
<script src="http://cdn.bootcss.com/jquery/1.11.1/jquery.min.js"></script>

<!-- Bootstrap core JavaScript -->
<script src="http://cdn.bootcss.com/bootstrap/3.3.0/js/bootstrap.min.js"></script>


<style type="text/css">
  .form{
    position:absolute;
    left:600px;
    top:100px 

  }
  .image{
    position:absolute;
    left:10px;
    top:60px 
  }
  .retext{
    position:absolute;
    top:370px 
  }

  .body{
    background-image: url("./img/background.jpg");
  }
  .text{
    text-align: center;       
  }
</style>
</head>

<body class = "body">
<?php // $file_path = './img/text.png'; ?>
<nav class="navbar navbar-inverse" role="navigation">
  <div class="container-fluid">
    <!-- Brand and toggle get grouped for better mobile display -->
    <div class="navbar-header">
      <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1">
        <span class="sr-only">Toggle navigation</span>
        <span class="icon-bar"></span>
        <span class="icon-bar"></span>
        <span class="icon-bar"></span>
      </button>
      <a class="navbar-brand" href="index.php">OCR Text Recognition System</a>
    </div>

    <!-- Collect the nav links, forms, and other content for toggling -->
    <div class="collapse navbar-collapse" id="bs-example-navbar-collapse-1">
      <!--<ul class="nav navbar-nav">
        <li class="active"><a href="#">系统介绍</a></li>
        <li class="dropdown">
          <a href="#" class="dropdown-toggle" data-toggle="dropdown">Dropdown <span class="caret"></span></a>
          <ul class="dropdown-menu" role="menu">
            <li><a href="#">Action</a></li>
            <li><a href="#">Another action</a></li>
            <li><a href="#">Something else here</a></li>
            <li class="divider"></li>
            <li><a href="#">Separated link</a></li>
            <li class="divider"></li>
            <li><a href="#">One more separated link</a></li>
          </ul>
        </li>
      </ul>
-->
      <!--<form class="navbar-form navbar-left" role="search">
        <div class="form-group">
          <input type="text" class="form-control" placeholder="Search">
        </div>
        <button type="submit" class="btn btn-default">Submit</button>
      </form>
      -->
      <ul class="nav navbar-nav navbar-right">
        <li><a href="#">About the System</a></li>
        <li><a href="#">About the Lab</a></li>
        <li class="dropdown">
          <a href="#" class="dropdown-toggle" data-toggle="dropdown">References <span class="caret"></span></a>
          <ul class="dropdown-menu" role="menu">
            <li><a href="https://github.com/thiagoalessio/tesseract-ocr-for-php">tesseract-ocr-for-php</a></li>
            <li><a href="http://getbootstrap.com/">Bootstrap</a></li>
            <li><a href="http://www.w3school.com.cn/">W3School</a></li>
            <li class="divider"></li>
            <li><a href="https://github.com/tesseract-ocr/tesseract">tesseract-ocr</a></li>
          </ul>
        </li>
      </ul>
    </div><!-- /.navbar-collapse -->
  </div><!-- /.container-fluid -->
</nav>





<div class = "form">
  <form  action="" method="post" enctype="multipart/form-data">
    <label for="file">Upload image:</label>
      <input type="file" name="file" id="file" /> 
      <br />
      <p>Select the language(s) (multiple choices allowed):</p> 
      <div class="checkbox">
        <label>
         <input type="checkbox" value="English" name ="mrbook[]" >
         English
        </label>
      </div>
      <div class="checkbox">
       <label>
        <input type="checkbox" value="中文" name ="mrbook[]">
        中文
       </label>
     </div>

     <div class="checkbox">
     <label>
       <input type="checkbox" value="Deutsch" name ="mrbook[]">
       Deutsch
     </label>
     </div>
     <div class="checkbox">
       <label>
       <input type="checkbox" value="한국의" name ="mrbook[]">
         한국의
       </label>
     </div>
      <input type="submit" name="submit" class="btn btn-success" value="Submit" />
</form>
</div>


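<?php
    // Load the Composer autoloader so the TesseractOCR class is available.
    require '../vendor/autoload.php';

    // Map the checkbox values of the form to Tesseract language codes.
    $lang_codes = array(
        'English' => 'eng',
        '中文'    => 'chi_sim',
        'Deutsch' => 'deu',
        '한국의'  => 'kor',
    );

    $file_path = '';
    $string = '';

    if (!empty($_FILES['file']['name'])) {
        // basename() strips directory components from the client-supplied name.
        $file_path = sprintf('./upload/%s', basename($_FILES['file']['name']));
        if (!move_uploaded_file($_FILES['file']['tmp_name'], $file_path)) {
            echo $_FILES['file']['error'];
            $file_path = '';
        } else {
            // Collect the codes of all selected languages.
            $selected = isset($_POST['mrbook']) ? (array) $_POST['mrbook'] : array();
            $langs = array();
            foreach ($selected as $name) {
                if (isset($lang_codes[$name])) {
                    $langs[] = $lang_codes[$name];
                }
            }

            // Run the OCR only after a successful upload.
            $ocr = new \TesseractOCR($file_path);
            if (!empty($langs)) {
                // lang() takes one argument per language, e.g. lang('eng', 'chi_sim').
                call_user_func_array(array($ocr, 'lang'), $langs);
            }
            $string = $ocr->run();
        }
    }
?>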
<div class = "image">
   <p>
    <img src="<?php echo htmlspecialchars(!empty($file_path) ? $file_path : './img/180.jpg'); ?>" width="440" height="300" />
   </p>
</div>

<div class = "retext" style ="width :100%">
<p>Recognition result:</p>
<textarea class="form-control" rows="10"><?php echo htmlspecialchars($string); ?></textarea>
</div>



<!--
<p>
<img src="./img/text.png" width="128" height="128" />
</p>
-->
<?php
/*require '../vendor/autoload.php';
$ocr = new \TesseractOCR('./img/text.png');
$string = $ocr->run();
echo $string;
 */
?>


</body>
</html>
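
The page above builds the upload path from a file name supplied by the client. As a hedged sketch of how that name could be checked before the file is moved, the hypothetical helper below (safeUploadPath and its extension list are my own assumptions, not part of the original page) strips directory components and accepts only common image extensions.

```php
<?php
// Hypothetical helper: turn a client-supplied file name into a path inside
// $uploadDir, or return null when the extension is not an allowed image type.
function safeUploadPath($clientName, $uploadDir = './upload')
{
    // Assumed list of image extensions; adjust to what tesseract should read.
    $allowed = array('png', 'jpg', 'jpeg', 'gif', 'bmp', 'tif', 'tiff');

    // basename() drops directory components such as "../../".
    $name = basename($clientName);
    $extension = strtolower(pathinfo($name, PATHINFO_EXTENSION));

    if ($name === '' || !in_array($extension, $allowed, true)) {
        return null;
    }
    return rtrim($uploadDir, '/') . '/' . $name;
}

var_dump(safeUploadPath('../../etc/scan.PNG')); // the string ./upload/scan.PNG
var_dump(safeUploadPath('shell.php'));          // NULL
```

Rejecting non-image extensions matters here because the upload directory sits under the web root, so an uploaded .php file could otherwise be requested and executed.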


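I will keep optimizing this later. The code is simple, so I won't go into more detail.
It is not written very well, so please point out any shortcomings ^.^!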

