I. Tess-two Overview
- Tess-two is a wrapper library around the Tesseract OCR engine for Android, used for offline text recognition
- Tess-two GitHub repository: https://github.com/rmtheis/tess-two
II. Text Recognition with Tess-two
1. Demo
(1) Dependencies
- Module-level build.gradle
implementation 'com.rmtheis:tess-two:9.1.0'
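(2) Tessdata
- Download the language packs you need from the Tessdata repository: https://github.com/tesseract-ocr/tessdata
- For example, eng.traineddata for English and chi_sim.traineddata for Simplified Chinese
- Place the downloaded .traineddata files in the project's src/main/assets directory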
(3) Manifest
- AndroidManifest.xml
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
(4) Test
- MainActivity.java
import android.Manifest;
import android.content.pm.PackageManager;
import android.content.res.AssetManager;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.os.Bundle;
import android.util.Log;

import androidx.activity.EdgeToEdge;
import androidx.activity.result.contract.ActivityResultContracts;
import androidx.appcompat.app.AppCompatActivity;
import androidx.core.graphics.Insets;
import androidx.core.view.ViewCompat;
import androidx.core.view.WindowInsetsCompat;

import com.googlecode.tesseract.android.TessBaseAPI;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;

public class MainActivity extends AppCompatActivity {

    public static final String TAG = MainActivity.class.getSimpleName();

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        EdgeToEdge.enable(this);
        setContentView(R.layout.activity_main);
        ViewCompat.setOnApplyWindowInsetsListener(findViewById(R.id.main), (v, insets) -> {
            Insets systemBars = insets.getInsets(WindowInsetsCompat.Type.systemBars());
            v.setPadding(systemBars.left, systemBars.top, systemBars.right, systemBars.bottom);
            return insets;
        });

        // Check whether both storage permissions have already been granted
        if (checkSelfPermission(Manifest.permission.READ_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED
                || checkSelfPermission(Manifest.permission.WRITE_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED) {
            // Not granted yet: request them through the Activity Result API
            registerForActivityResult(
                    new ActivityResultContracts.RequestMultiplePermissions(),
                    o -> {
                        // Log the result for each permission
                        for (Map.Entry<String, Boolean> entry : o.entrySet()) {
                            Log.i(TAG, entry.getKey() + " : " + entry.getValue());
                        }
                        boolean allGranted = true;
                        for (Map.Entry<String, Boolean> entry : o.entrySet()) {
                            if (!entry.getValue()) {
                                allGranted = false;
                                break;
                            }
                        }
                        if (allGranted) {
                            test();
                        } else {
                            Log.i(TAG, "Not all permissions were granted");
                        }
                    }
            ).launch(new String[]{
                    Manifest.permission.READ_EXTERNAL_STORAGE,
                    Manifest.permission.WRITE_EXTERNAL_STORAGE
            });
        } else {
            test();
        }
    }

    private void test() {
        copyTessDataToStorage("chi_sim.traineddata", "eng.traineddata");

        TessBaseAPI tessBaseAPI = new TessBaseAPI();
        // Parent directory of the tessdata directory
        String tesseractDirPath = getExternalFilesDir(null) + "/tesseract/";
        boolean initResult = tessBaseAPI.init(tesseractDirPath, "chi_sim+eng");
        if (!initResult) {
            Log.i(TAG, "Failed to initialize Tesseract");
            return;
        }

        Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.test_img);
        tessBaseAPI.setImage(bitmap);
        String result = tessBaseAPI.getUTF8Text();
        Log.i(TAG, "result: " + result);
    }

    public void copyTessDataToStorage(String... tessDataFiles) {
        // Create the target directory if it does not exist yet
        String tessDataDirPath = getExternalFilesDir(null) + "/tesseract/tessdata/";
        File tessDataDir = new File(tessDataDirPath);
        if (!tessDataDir.exists()) {
            tessDataDir.mkdirs();
        }

        AssetManager assetManager = getAssets();
        for (String fileName : tessDataFiles) {
            File outFile = new File(tessDataDirPath + fileName);
            if (outFile.exists()) continue; // Skip files that were already copied

            try (InputStream in = assetManager.open(fileName);
                 OutputStream out = new FileOutputStream(outFile)) {
                byte[] buffer = new byte[1024];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
            } catch (IOException e) {
                Log.e(TAG, "Failed to copy " + fileName, e);
            }
        }
    }
}

# Output
result: 张 三
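2. Walkthrough
(1) Requesting permissions
- Check whether the permissions have already been granted with checkSelfPermission; if they have, run the test code
- If they have not, request them through the Activity Result API
- When the request completes, check whether every permission was granted; if so, run the test code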
// Check whether the permissions have already been granted
if (checkSelfPermission(Manifest.permission.READ_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED
        || checkSelfPermission(Manifest.permission.WRITE_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED) {
    // Not granted yet: request the permissions
    registerForActivityResult(
            new ActivityResultContracts.RequestMultiplePermissions(),
            o -> {
                // Log the result for each permission
                for (Map.Entry<String, Boolean> entry : o.entrySet()) {
                    Log.i(TAG, entry.getKey() + " : " + entry.getValue());
                }
                boolean allGranted = true;
                for (Map.Entry<String, Boolean> entry : o.entrySet()) {
                    if (!entry.getValue()) {
                        allGranted = false;
                        break;
                    }
                }
                // Check whether every permission was granted
                if (allGranted) {
                    // All granted: run the test code
                    test();
                } else {
                    Log.i(TAG, "Not all permissions were granted");
                }
            }
    ).launch(new String[]{
            Manifest.permission.READ_EXTERNAL_STORAGE,
            Manifest.permission.WRITE_EXTERNAL_STORAGE
    });
} else {
    // Already granted: run the test code
    test();
}
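The "were all permissions granted" check in the callback can be pulled out into a small helper that is testable without an Activity. The following is only a sketch: the class name PermissionResults and the method allGranted are hypothetical, and unlike the loop above it treats an empty result map as "not granted", which is an assumption about how an interrupted request should be handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tess-two or AndroidX.
final class PermissionResults {
    private PermissionResults() {}

    /** Returns true only if the map is non-empty and every entry is TRUE. */
    static boolean allGranted(Map<String, Boolean> results) {
        // Assumption: an empty or missing result map counts as "not granted".
        if (results == null || results.isEmpty()) return false;
        for (Boolean granted : results.values()) {
            if (!Boolean.TRUE.equals(granted)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        results.put("android.permission.READ_EXTERNAL_STORAGE", true);
        results.put("android.permission.WRITE_EXTERNAL_STORAGE", false);
        System.out.println(allGranted(results));
    }
}
```

In the callback, the two loops could then shrink to a single call such as PermissionResults.allGranted(o).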
(2) Copying Tessdata
- Copy the .traineddata files from src/main/assets to the files/tesseract/tessdata/ directory under the app-specific external storage directory
public void copyTessDataToStorage(String... tessDataFiles) {
    // Create the target directory
    String tessDataDirPath = getExternalFilesDir(null) + "/tesseract/tessdata/";
    File tessDataDir = new File(tessDataDirPath);
    if (!tessDataDir.exists()) {
        tessDataDir.mkdirs();
    }

    AssetManager assetManager = getAssets();
    for (String fileName : tessDataFiles) {
        File outFile = new File(tessDataDirPath + fileName);
        if (outFile.exists()) continue; // Skip the file if it already exists

        try (InputStream in = assetManager.open(fileName);
             OutputStream out = new FileOutputStream(outFile)) {
            byte[] buffer = new byte[1024];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            Log.e(TAG, "Failed to copy " + fileName, e);
        }
    }
}
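One thing to watch with the "skip if the file exists" rule: if a copy is interrupted, a truncated file may be left behind and then skipped on the next run. A possible variation, sketched here in plain Java, writes to a temporary sibling file first and renames it once the copy has finished. The class name TrainedDataCopier, the method copyIfAbsent, and the ".part" suffix are all hypothetical; the sketch also relies on java.nio.file, which on Android is assumed to be available only from API level 26.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Tess-two.
final class TrainedDataCopier {
    private TrainedDataCopier() {}

    /**
     * Copies the stream to target unless target already exists.
     * Returns true if a copy was made, false if it was skipped.
     */
    static boolean copyIfAbsent(InputStream in, Path target) throws IOException {
        if (Files.exists(target)) return false;
        Path parent = target.getParent();
        if (parent != null) Files.createDirectories(parent);
        // Write to a sibling temp file first so an interrupted copy never
        // leaves a truncated file under the final name.
        Path tmp = target.resolveSibling(target.getFileName() + ".part");
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("tesseract");
        Path target = dir.resolve("tessdata").resolve("eng.traineddata");
        byte[] fakeData = {1, 2, 3}; // stand-in for a real .traineddata asset
        System.out.println(copyIfAbsent(new ByteArrayInputStream(fakeData), target));
        System.out.println(copyIfAbsent(new ByteArrayInputStream(fakeData), target));
    }
}
```

In the Activity, the InputStream would come from assetManager.open(fileName) and the target from the tessdata directory path converted to a Path.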
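(3) Initialization and recognition
- Call init to initialize Tesseract
- The first argument is the parent directory that contains the tessdata directory; since the data lives in files/tesseract/tessdata/, the argument here is files/tesseract/
- The second argument is the language code; multiple languages can be joined with a plus sign (+), so chi_sim+eng means recognizing both Chinese and English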
TessBaseAPI tessBaseAPI = new TessBaseAPI();
String tesseractDirPath = getExternalFilesDir(null) + "/tesseract/";
boolean initResult = tessBaseAPI.init(tesseractDirPath, "chi_sim+eng");
if (!initResult) {
    Log.i(TAG, "Failed to initialize Tesseract");
    return;
}
- Call setImage to supply the image, then call getUTF8Text to run recognition and get the result
Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.test_img);
tessBaseAPI.setImage(bitmap);
String result = tessBaseAPI.getUTF8Text();
Log.i(TAG, "result: " + result);
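Because the two arguments of init are easy to get wrong (the parent directory rather than tessdata itself, and plus-separated language codes), it can help to verify the files before initializing. The sketch below is plain Java with hypothetical names (TessDataCheck, missingLanguages); it assumes, as in this article, that each language code maps to a file named <code>.traineddata inside the tessdata directory.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tess-two.
final class TessDataCheck {
    private TessDataCheck() {}

    /**
     * Splits a language string such as "chi_sim+eng" and returns the codes
     * whose .traineddata file is missing under dataPath/tessdata.
     */
    static List<String> missingLanguages(File dataPath, String languages) {
        File tessDataDir = new File(dataPath, "tessdata");
        List<String> missing = new ArrayList<>();
        for (String code : languages.split("\\+")) {
            if (code.isEmpty()) continue;
            if (!new File(tessDataDir, code + ".traineddata").isFile()) {
                missing.add(code);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Stand-in for getExternalFilesDir(null) + "/tesseract/"
        File dataPath = new File(System.getProperty("java.io.tmpdir"), "tesseract-demo");
        System.out.println("missing: " + missingLanguages(dataPath, "chi_sim+eng"));
    }
}
```

If the returned list is not empty, the app could log the missing codes and skip the call to init instead of relying only on its boolean result.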
III. Additional Notes
1. When the Bitmap cannot be obtained
- Here the Bitmap is loaded from a resource ID that does not exist
Bitmap bitmap = BitmapFactory.decodeResource(getResources(), 1001);
Log.i(TAG, "bitmap: " + bitmap);
tessBaseAPI.setImage(bitmap);
String result = tessBaseAPI.getUTF8Text();
Log.i(TAG, "result: " + result);
# Output
bitmap: null
...
FATAL EXCEPTION: main
Process: com.my.ocr_tesseract, PID: 25149
java.lang.RuntimeException: Unable to start activity ComponentInfo{com.my.ocr_tesseract/com.my.ocr_tesseract.MainActivity}: java.lang.RuntimeException: Failed to read bitmap
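Since the crash above comes from passing a null Bitmap to setImage, a null check before recognition avoids it. The following sketch shows the guard in plain Java so it can run outside Android: the class SafeOcr and the method recognizeOrDefault are hypothetical, the type parameter T stands in for Bitmap, and the recognizer function stands in for the setImage plus getUTF8Text pair.

```java
import java.util.function.Function;

// Hypothetical helper, not part of Tess-two.
final class SafeOcr {
    private SafeOcr() {}

    /**
     * Runs the recognizer only when the image was decoded successfully;
     * otherwise returns the fallback instead of letting recognition crash.
     */
    static <T> String recognizeOrDefault(T image, Function<T, String> recognizer, String fallback) {
        if (image == null) {
            return fallback;
        }
        return recognizer.apply(image);
    }

    public static void main(String[] args) {
        // Fake recognizer used only for illustration
        Function<String, String> fakeRecognizer = img -> "text from " + img;
        System.out.println(recognizeOrDefault("test_img", fakeRecognizer, ""));
        System.out.println(recognizeOrDefault(null, fakeRecognizer, "(no image)"));
    }
}
```

In the Activity, the same idea reduces to checking bitmap == null right after BitmapFactory.decodeResource and logging and returning early in that case.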
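2. Recognizing cursive handwriting
- Tess-two has limited ability to recognize cursive handwriting; ML Kit Digital Ink Recognition is recommended for that case

# Output
result:

# Output
result: 锄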
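3. Using app-specific internal storage
- You can also use the app-specific internal storage directory (getFilesDir()), which requires no permission request. On Android 4.4 (API level 19) and later, the app-specific external directory returned by getExternalFilesDir() can likewise be accessed without the storage permissions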
private void test() {
    copyTessDataToStorage("chi_sim.traineddata", "eng.traineddata");

    TessBaseAPI tessBaseAPI = new TessBaseAPI();
    // Internal storage: parent directory of the tessdata directory
    String tesseractDirPath = getFilesDir() + "/tesseract/";
    boolean initResult = tessBaseAPI.init(tesseractDirPath, "chi_sim+eng");
    if (!initResult) {
        Log.i(TAG, "Failed to initialize Tesseract");
        return;
    }

    Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.test_img);
    Log.i(TAG, "bitmap: " + bitmap);
    tessBaseAPI.setImage(bitmap);
    String result = tessBaseAPI.getUTF8Text();
    Log.i(TAG, "result: " + result);
}

public void copyTessDataToStorage(String... tessDataFiles) {
    // Create the target directory in internal storage
    String tessDataDirPath = getFilesDir() + "/tesseract/tessdata/";
    File tessDataDir = new File(tessDataDirPath);
    if (!tessDataDir.exists()) {
        tessDataDir.mkdirs();
    }

    AssetManager assetManager = getAssets();
    for (String fileName : tessDataFiles) {
        File outFile = new File(tessDataDirPath + fileName);
        if (outFile.exists()) continue; // Skip the file if it already exists

        try (InputStream in = assetManager.open(fileName);
             OutputStream out = new FileOutputStream(outFile)) {
            byte[] buffer = new byte[1024];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            Log.e(TAG, "Failed to copy " + fileName, e);
        }
    }
}