Tess-two - Text Recognition with Tess-two (Tess-two Overview, Text Recognition with Tess-two, Additional Notes)

I. Tess-two Overview

  1. Tess-two is an Android wrapper library for the Tesseract OCR engine; it performs text recognition offline, on the device

  2. Tess-two GitHub repository: https://github.com/rmtheis/tess-two


II. Text Recognition with Tess-two

1. Demo
(1) Dependencies
  • Module-level build.gradle
implementation 'com.rmtheis:tess-two:9.1.0'
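(2) Tessdata
  1. Download the language packs you need from the Tessdata repository https://github.com/tesseract-ocr/tessdata. Tess-two 9.x wraps Tesseract 3.05.x, whose data files are published under that repository's 3.04.00 tag; files from the main branch target Tesseract 4 and may fail to load

  2. For example, eng.traineddata is for English and chi_sim.traineddata is for Simplified Chinese

  3. Place the downloaded .traineddata files in the project's src/main/assets directory

(3) Manifest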
  • AndroidManifest.xml
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
(4) Test
  • MainActivity.java
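import android.Manifest;
import android.content.pm.PackageManager;
import android.content.res.AssetManager;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.os.Bundle;
import android.util.Log;

import androidx.activity.EdgeToEdge;
import androidx.activity.result.contract.ActivityResultContracts;
import androidx.appcompat.app.AppCompatActivity;
import androidx.core.graphics.Insets;
import androidx.core.view.ViewCompat;
import androidx.core.view.WindowInsetsCompat;

import com.googlecode.tesseract.android.TessBaseAPI;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;

public class MainActivity extends AppCompatActivity {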

    public static final String TAG = MainActivity.class.getSimpleName();

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        EdgeToEdge.enable(this);
        setContentView(R.layout.activity_main);
        ViewCompat.setOnApplyWindowInsetsListener(findViewById(R.id.main), (v, insets) -> {
            Insets systemBars = insets.getInsets(WindowInsetsCompat.Type.systemBars());
            v.setPadding(systemBars.left, systemBars.top, systemBars.right, systemBars.bottom);
            return insets;
        });

        if (checkSelfPermission(Manifest.permission.READ_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED
                || checkSelfPermission(Manifest.permission.WRITE_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED) {
            registerForActivityResult(
                    new ActivityResultContracts.RequestMultiplePermissions(),
                    o -> {
                        for (Map.Entry<String, Boolean> entry : o.entrySet()) {
                            Log.i(TAG, entry.getKey() + " : " + entry.getValue());
                        }

                        boolean allGranted = true;
                        for (Map.Entry<String, Boolean> entry : o.entrySet()) {
                            if (!entry.getValue()) {
                                allGranted = false;
                                break;
                            }
                        }
                        if (allGranted) {
                            test();
                        } else {
                            Log.i(TAG, "Not all permissions were granted");
                        }
                    }
            ).launch(new String[]{
                    Manifest.permission.READ_EXTERNAL_STORAGE,
                    Manifest.permission.WRITE_EXTERNAL_STORAGE
            });
        } else {
            test();
        }
    }

    private void test() {
        copyTessDataToStorage("chi_sim.traineddata", "eng.traineddata");

        TessBaseAPI tessBaseAPI = new TessBaseAPI();

        String tesseractDirPath = getExternalFilesDir(null) + "/tesseract/";

        boolean initResult = tessBaseAPI.init(tesseractDirPath, "chi_sim+eng");
        if (!initResult) {
            Log.i(TAG, "Failed to initialize Tesseract");
            return;
        }

        Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.test_img);

        tessBaseAPI.setImage(bitmap);

        String result = tessBaseAPI.getUTF8Text();

        Log.i(TAG, "result: " + result);
    }

    public void copyTessDataToStorage(String... tessDataFiles) {
        String tessDataDirPath = getExternalFilesDir(null) + "/tesseract/tessdata/";
        File tessDataDir = new File(tessDataDirPath);
        if (!tessDataDir.exists()) {
            tessDataDir.mkdirs();
        }

        AssetManager assetManager = getAssets();

        for (String fileName : tessDataFiles) {
            File outFile = new File(tessDataDirPath + fileName);
            if (outFile.exists()) continue;
            try (InputStream in = assetManager.open(fileName);
                 OutputStream out = new FileOutputStream(outFile)) {
                byte[] buffer = new byte[1024];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
# Output

result: 张 三
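2. Walkthrough
(1) Requesting permissions
  1. Call checkSelfPermission to check whether the permissions are already granted; if they are, run the test code

  2. If they are not, request them through the Activity Result API

  3. When the request completes, check whether every permission was granted; if so, run the test code

// Check whether the permissions are already granted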
if (checkSelfPermission(Manifest.permission.READ_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED
        || checkSelfPermission(Manifest.permission.WRITE_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED) {

    // Not granted yet, so request the permissions
    registerForActivityResult(
            new ActivityResultContracts.RequestMultiplePermissions(),
            o -> {
                for (Map.Entry<String, Boolean> entry : o.entrySet()) {
                    Log.i(TAG, entry.getKey() + " : " + entry.getValue());
                }

                boolean allGranted = true;
                for (Map.Entry<String, Boolean> entry : o.entrySet()) {
                    if (!entry.getValue()) {
                        allGranted = false;
                        break;
                    }
                }

                // Check whether every permission was granted
                if (allGranted) {

                    // All granted, so run the test code
                    test();
                } else {
                    Log.i(TAG, "Not all permissions were granted");
                }
            }
    ).launch(new String[]{
            Manifest.permission.READ_EXTERNAL_STORAGE,
            Manifest.permission.WRITE_EXTERNAL_STORAGE
    });
} else {

    // Already granted, so run the test code
    test();
}
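The callback above walks the result map twice, once to log and once to decide. As a minimal sketch, the decision can be pulled into a small plain-Java helper; the class and method names here (PermissionResults, allGranted) are hypothetical and not part of any Android API. Unlike the loop above, this version also treats an empty or null result map as "not granted", which is an assumption about the desired behavior rather than something the original code does.

```java
import java.util.Map;

// Hypothetical helper: decides whether a permission result map means "everything granted".
final class PermissionResults {
    private PermissionResults() {}

    // Returns true only when the map is non-empty and every value is TRUE.
    // Boolean.TRUE.equals(...) also counts a null value as "not granted".
    static boolean allGranted(Map<String, Boolean> results) {
        if (results == null || results.isEmpty()) {
            return false;
        }
        for (Boolean granted : results.values()) {
            if (!Boolean.TRUE.equals(granted)) {
                return false;
            }
        }
        return true;
    }
}
```

Inside the callback this would read `if (PermissionResults.allGranted(o)) { test(); }`, leaving the logging loop as it is.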
(2) Copying Tessdata
  • Copy the .traineddata files from src/main/assets to files/tesseract/tessdata/ under the app-specific external storage directory
public void copyTessDataToStorage(String... tessDataFiles) {

    // Create the target directory
    String tessDataDirPath = getExternalFilesDir(null) + "/tesseract/tessdata/";
    File tessDataDir = new File(tessDataDirPath);
    if (!tessDataDir.exists()) {
        tessDataDir.mkdirs();
    }

    AssetManager assetManager = getAssets();

    for (String fileName : tessDataFiles) {
        File outFile = new File(tessDataDirPath + fileName);
        if (outFile.exists()) continue; // Skip files that already exist
        try (InputStream in = assetManager.open(fileName);
                OutputStream out = new FileOutputStream(outFile)) {
            byte[] buffer = new byte[1024];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
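One thing to watch in the copy loop above: if a copy is interrupted, a truncated file is left behind, and the exists() check will skip it on every later run. A hedged sketch of one way around this is to write to a temporary file and rename it only after the copy succeeds. The class name TessDataCopier and the ".tmp" suffix are made up for illustration, and the sketch assumes java.nio.file is available (on Android that means API 26 or later).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: copies a stream to a target file without leaving partial files behind.
final class TessDataCopier {
    private TessDataCopier() {}

    // Copies `in` to `target` unless `target` already exists.
    // Returns true if a copy was made, false if the file was already there.
    static boolean copyIfAbsent(InputStream in, Path target) throws IOException {
        if (Files.exists(target)) {
            return false;
        }
        Path absolute = target.toAbsolutePath();
        Files.createDirectories(absolute.getParent());
        // Write to a sibling temp file first, then rename it into place.
        Path tmp = absolute.resolveSibling(absolute.getFileName() + ".tmp");
        try {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            Files.move(tmp, absolute, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; cleans up the temp file after a failure.
            Files.deleteIfExists(tmp);
        }
        return true;
    }
}
```

In the Activity, the stream would come from `assetManager.open(fileName)` and the target from the tessdata directory path built above.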
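(3) Initialization and recognition
  • Call init to initialize Tesseract
  1. The first argument is the parent directory that contains the tessdata directory (the subdirectory must be named tessdata). The data files live in files/tesseract/tessdata/, so the value here is files/tesseract/

  2. The second argument is the language code. Join multiple codes with a plus sign (+); chi_sim+eng recognizes Simplified Chinese and English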

TessBaseAPI tessBaseAPI = new TessBaseAPI();

String tesseractDirPath = getExternalFilesDir(null) + "/tesseract/";

boolean initResult = tessBaseAPI.init(tesseractDirPath, "chi_sim+eng");
if (!initResult) {
    Log.i(TAG, "Failed to initialize Tesseract");
    return;
}
  • Call setImage to supply the image to recognize, then call getUTF8Text to get the recognition result
Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.test_img);

tessBaseAPI.setImage(bitmap);

String result = tessBaseAPI.getUTF8Text();

Log.i(TAG, "result: " + result);
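Because init only reports success or failure, it can help to check beforehand that every language in the spec has a matching data file. The following is a minimal plain-Java sketch of the two conventions described above (the tessdata subdirectory and the plus-joined language codes); TessLanguages, join, and missing are hypothetical names and not part of tess-two.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper around the init conventions: "<dataPath>/tessdata/<code>.traineddata".
final class TessLanguages {
    private TessLanguages() {}

    // Builds a language spec such as "chi_sim+eng" from individual codes.
    static String join(String... codes) {
        return String.join("+", codes);
    }

    // Returns the names of the .traineddata files that the spec needs
    // but that are not present under <dataPath>/tessdata/.
    static List<String> missing(File dataPath, String languageSpec) {
        List<String> missing = new ArrayList<>();
        File tessdata = new File(dataPath, "tessdata");
        for (String code : languageSpec.split("\\+")) {
            if (code.isEmpty()) {
                continue;
            }
            File file = new File(tessdata, code + ".traineddata");
            if (!file.isFile()) {
                missing.add(file.getName());
            }
        }
        return missing;
    }
}
```

Calling `TessLanguages.missing(new File(tesseractDirPath), "chi_sim+eng")` before init and logging any returned names gives a more specific message than a bare initialization failure.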

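III. Additional Notes

1. When the Bitmap fails to load
  • Here the Bitmap is decoded from a resource ID that does not exist, so decodeResource returns null and setImage throws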
Bitmap bitmap = BitmapFactory.decodeResource(getResources(), 1001);

Log.i(TAG, "bitmap: " + bitmap);

tessBaseAPI.setImage(bitmap);

String result = tessBaseAPI.getUTF8Text();

Log.i(TAG, "result: " + result);
# Output

bitmap: null
...
FATAL EXCEPTION: main
Process: com.my.ocr_tesseract, PID: 25149
java.lang.RuntimeException: Unable to start activity ComponentInfo{com.my.ocr_tesseract/com.my.ocr_tesseract.MainActivity}: java.lang.RuntimeException: Failed to read bitmap
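The crash above comes from handing a null image to the engine, so the usual remedy is a null check before setImage. Bitmap and TessBaseAPI are Android-only, so the sketch below uses a generic plain-Java stand-in for the same guard: the image type is a type parameter and the engine is passed in as a function. SafeOcr and recognize are hypothetical names for illustration only.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical stand-in for "check the image before handing it to the OCR engine".
final class SafeOcr {
    private SafeOcr() {}

    // Returns an empty Optional when the image is null, without calling the engine;
    // otherwise returns whatever text the engine produced (empty if it returned null).
    static <I> Optional<String> recognize(I image, Function<I, String> engine) {
        if (image == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(engine.apply(image));
    }
}
```

In the Activity the equivalent is simply `if (bitmap == null) { Log.i(TAG, "bitmap is null"); return; }` placed before the setImage call.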
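2. Recognizing cursive handwriting
  • Tess-two has limited ability to recognize cursive handwriting; ML Kit Digital Ink Recognition is recommended for that case
# Output

result: 
# Output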

result: 锄
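3. Using app-specific internal storage
  • The app-specific internal storage directory (getFilesDir()) can be used instead, and it needs no permission request. Note that on Android 4.4 (API 19) and later, the app-specific external directory returned by getExternalFilesDir also needs no storage permission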
private void test() {
    copyTessDataToStorage("chi_sim.traineddata", "eng.traineddata");

    TessBaseAPI tessBaseAPI = new TessBaseAPI();

    String tesseractDirPath = getFilesDir() + "/tesseract/";

    boolean initResult = tessBaseAPI.init(tesseractDirPath, "chi_sim+eng");
    if (!initResult) {
        Log.i(TAG, "Failed to initialize Tesseract");
        return;
    }

    Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.test_img);

    Log.i(TAG, "bitmap: " + bitmap);
    
    tessBaseAPI.setImage(bitmap);

    String result = tessBaseAPI.getUTF8Text();

    Log.i(TAG, "result: " + result);
}

public void copyTessDataToStorage(String... tessDataFiles) {
    String tessDataDirPath = getFilesDir() + "/tesseract/tessdata/";
    File tessDataDir = new File(tessDataDirPath);
    if (!tessDataDir.exists()) {
        tessDataDir.mkdirs();
    }

    AssetManager assetManager = getAssets();

    for (String fileName : tessDataFiles) {
        File outFile = new File(tessDataDirPath + fileName);
        if (outFile.exists()) continue;
        try (InputStream in = assetManager.open(fileName);
                OutputStream out = new FileOutputStream(outFile)) {
            byte[] buffer = new byte[1024];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}