A HoloLens 2-Based Character Recognition and Translation System

La Go

Last modified: 2023-06-12 11:30:10

Tags: python c# hololens mixed reality

First published: 2023-06-12 11:23:20

Article link: https://blog.csdn.net/LaGod/article/details/131155869

Contents

Preface
I. Frontend
II. Backend
- 1. Setting up the server
- 2. Character recognition and translation
III. Packaging the application
IV. Results

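Preface

I completed my undergraduate graduation project, built on HoloLens 2, in April 2023. While working on it I found that relevant material was scarce, so I am sharing my exploration here for study and discussion.
The project uses HoloLens 2, Unity3D, MRTK, C#, and Python; you will need a basic familiarity with this device and these technologies.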

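I. Frontend

[Figure: frontend framework diagram]
To implement the system's functionality, you need to build the models correctly in Unity3D and write the C# scripts. This part describes how the frontend was implemented.

This part drew on the experience of many predecessors, to whom I am very grateful. The two articles that helped me most are:

Taking a photo with the RGB sensor from a HoloLens 2 script and uploading it to the backend
HoloLens 2 development beginner tutorial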

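1. Modeling in Unity3D

The asset packages in MRTK can be used to pick a model you like and configure it quickly.

1.1 Trigger model

The yellow cheese model labeled "Translate" is the trigger model: when the user clicks it, the system camera is invoked.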

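1.2 Main camera

The white camera icon marks the position of the user's eyes in the virtual space, and the thin white frame marks the field of view. Moving the camera changes what is initially visible in the virtual space.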

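1.3 Result display box

The text box containing "Translate Result Screener" is the result display box. The Inspector on the right shows the parameters of the text box and of the text inside it. Because a script will be bound to it later, the text box can update in real time once a translation result is received.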

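2. C# scripts

In Unity3D, C# scripts can be written for a model so that it performs specific functions.

2.1 Invoking the system camera (MyPhotoCapture)

To obtain the image to be translated, the HoloLens 2 system camera has to be invoked. The code is as follows: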

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.Windows.WebCam;
using System.Linq;
using UnityEngine.Networking;


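// Optional test harness: triggers a single capture at startup.
// It is not needed when StartCapture is bound to an Event on the trigger model.
public class test : MonoBehaviour
{
    void Start()
    {
        // A MonoBehaviour must not be created with `new`, and calling
        // StartCapture from FixedUpdate would fire on every physics tick,
        // so look up the component attached to this object once instead.
        MyPhotoCapture myPhotoCapture = GetComponent<MyPhotoCapture>();
        if (myPhotoCapture != null)
        {
            myPhotoCapture.StartCapture();
        }
    }
}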


public class MyPhotoCapture : MonoBehaviour
{
    PhotoCapture photoCaptureObject = null;

    // True from the moment a capture is requested until photo mode has stopped.
    internal bool captureIsActive;

    // Bound to the trigger model's Event; clicks are ignored while a capture is in progress.
    public void StartCapture()
    {
        if (captureIsActive)
        {
            return;
        }
        captureIsActive = true;
        PhotoCapture.CreateAsync(false, OnPhotoCaptureCreated);
    }

    void OnPhotoCaptureCreated(PhotoCapture captureObject)
    {
        if (captureObject == null)
        {
            Debug.LogError("Unable to create PhotoCapture object!");
            captureIsActive = false;
            return;
        }
        photoCaptureObject = captureObject;

        // Use the highest resolution the camera supports.
        Resolution cameraResolution = PhotoCapture.SupportedResolutions.OrderByDescending((res) => res.width * res.height).First();

        var cameraParams = new CameraParameters()
        {
            hologramOpacity = 0f, // do not render holograms into the photo
            cameraResolutionWidth = cameraResolution.width,
            cameraResolutionHeight = cameraResolution.height,
            pixelFormat = CapturePixelFormat.JPEG
        };

        captureObject.StartPhotoModeAsync(cameraParams, OnPhotoModeStarted);
    }

    private void OnPhotoModeStarted(PhotoCapture.PhotoCaptureResult result)
    {
        if (result.success)
        {
            photoCaptureObject.TakePhotoAsync((photoCaptureResult, frame) =>
            {
                if (photoCaptureResult.success)
                {
                    Debug.Log("Photo capture done.");

                    // Copy the JPEG bytes out of the frame and send them to the backend.
                    var buffer = new List<byte>();
                    frame.CopyRawImageDataIntoBuffer(buffer);
                    StartCoroutine(CustomVisionAnalyser.Instance.AnalyseLastImageCaptured(buffer.ToArray()));
                }
                photoCaptureObject.StopPhotoModeAsync(OnStoppedPhotoMode);
            });
        }
        else
        {
            Debug.LogError("Unable to start photo mode!");
            // Release the camera and reset the flag so that a later click can try again.
            OnStoppedPhotoMode(result);
        }
    }

    void OnStoppedPhotoMode(PhotoCapture.PhotoCaptureResult result)
    {
        photoCaptureObject.Dispose();
        photoCaptureObject = null;

        captureIsActive = false;
    }
}

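Note that this script must be attached to the trigger model from 1.1, and an Event must be configured to call one of its functions. In my program, for example, StartCapture is called when the trigger model is manipulated (On Manipulation Started).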

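2.2 Data communication with the backend (CustomVisionAnalyser)

To send the captured image to the backend, a script is needed to handle the data communication. The code is as follows: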

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.Windows.WebCam;
using System.Linq;
using UnityEngine.Networking;
using UnityEngine.UI;
using TMPro;

public class CustomVisionAnalyser : MonoBehaviour
{
    // Static reference so that MyPhotoCapture can reach this component.
    public static CustomVisionAnalyser Instance;
    // Text read by ResultScreener; replaced by the server reply after each capture.
    public string response = "Please Click Cheese";
    // Address of the backend server; change it to match your own server.
    private string predictionEndpoint = "http://192.168.54.236:8000/file";

    private string translateresults;
    public string TranslateResults
    {
        get
        {
            return translateresults;
        }
        set
        {
            translateresults = value;
        }
    }

    private void Awake()
    {
        Instance = this;
    }

    // Uploads the captured image as multipart form data and stores the server reply.
    public IEnumerator AnalyseLastImageCaptured(byte[] imageBytes)
    {
        WWWForm webForm = new WWWForm();
        webForm.AddBinaryData("file", imageBytes, "photo.jpg");

        using (UnityWebRequest unityWebRequest = UnityWebRequest.Post(predictionEndpoint, webForm))
        {
            // The download handler buffers the translation result returned by the backend
            unityWebRequest.downloadHandler = new DownloadHandlerBuffer();

            yield return unityWebRequest.SendWebRequest();

            // Newer Unity versions mark these two flags obsolete in favour of unityWebRequest.result.
            if (unityWebRequest.isHttpError || unityWebRequest.isNetworkError)
            {
                Debug.Log(unityWebRequest.error);
            }
            else
            {
                response = unityWebRequest.downloadHandler.text;
                Debug.Log(response);
            }
        }
    }
}

Note: in private string predictionEndpoint = "http://192.168.54.236:8000/file"; the part 192.168.54.236:8000 is the server address. Change it to match your actual server address; this is explained further in the backend part.
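Before deploying to the headset, it can help to exercise the endpoint from a PC. The following is a minimal sketch, not part of the original project, that sends the same kind of request as the Unity script above: a multipart form with a binary field named "file" and the file name "photo.jpg". The helper names build_multipart_body and post_image, and the sample file name, are made up for illustration; only the Python standard library is used.

```python
import urllib.request
import uuid


def build_multipart_body(field_name, filename, data, boundary=None):
    """Encode one binary file as a multipart/form-data body.

    Returns (body_bytes, content_type_header).
    """
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_image(url, image_path):
    """POST an image file to the backend and return the response body as text."""
    with open(image_path, "rb") as f:
        body, content_type = build_multipart_body("file", "photo.jpg", f.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


# Example (needs a running server; the address and file name are placeholders):
# print(post_image("http://192.168.54.236:8000/file", "sample.jpg"))
```

If this call succeeds from the PC but the headset request fails, the problem is more likely the network path or the address in predictionEndpoint than the server code.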

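Likewise, this script has to be bound to the trigger model from 1.1 so that the image captured by the system camera can be sent to the backend. Since this script does not need to run a function in response to a specific action, select No Function in the Event after binding it.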

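This script also has to be bound to the result display box from 1.3 (through its Result field) so that the display box can receive the translation result.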

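2.3 Displaying the result (ResultScreener)

Once the translation result has been received from the backend, it has to be shown in the text box in real time. The code is as follows: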

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using TMPro;
using UnityEngine.Windows.WebCam;
using System.Linq;
using UnityEngine.Networking;
using UnityEngine.UI;

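public class ResultScreener : MonoBehaviour
{
    public TextMeshProUGUI Text;
    // Assigned in the Inspector: the CustomVisionAnalyser that holds the latest response.
    public CustomVisionAnalyser Result;
    string translateresult;

    void Start()
    {
        // Use the TextMeshProUGUI component on the same object as this script.
        Text = transform.GetComponent<TextMeshProUGUI>();
    }

    void Update()
    {
        // Poll the analyser every frame and show its latest response.
        translateresult = Result.response;
        Text.text = translateresult;
    }
}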

This script, naturally, has to be bound to the result display box from 1.3.

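II. Backend

1. Setting up the server

To let HoloLens 2 communicate with the backend, I built a simple server with FastAPI. It accepts POST requests, saves each photo taken by HoloLens 2 into a folder on the PC's disk, and sends the translation result back to HoloLens 2.

If you want to build the server with FastAPI, you can follow a simple FastAPI getting-started guide or consult the official FastAPI documentation to learn the basics and set up the environment.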

import json
import uvicorn
from fastapi import FastAPI
from fastapi import File
from TransAPI import translate_api  # the translation function from the next section

app = FastAPI()

# Receives the photo uploaded by HoloLens 2 in the form field "file".
# File uploads in FastAPI require the python-multipart package.
@app.post("/file")
async def file_upload(file: bytes = File(..., max_length=2097152)):
    # Save the photo to disk, overwriting the previous capture.
    path = "D:\\Program\\tmp\\CapturePhoto.jpg"
    with open(path, "wb") as f:
        f.write(file)
    res = translate_api(path)

    try:
        translated_strs = json.loads(res)['data']['content']
        # Return only the first translated entry.
        print(translated_strs[0]['dst'])
        return translated_strs[0]['dst']

    except Exception as e:
        return {"msg": "Internal Error", "error": str(e)}

if __name__ == '__main__':
    # 'server:app' assumes this file is saved as server.py.
    uvicorn.run('server:app', host='0.0.0.0', port=8000, reload=True, workers=1)
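One detail worth knowing about the handler above: FastAPI serializes return values as JSON by default, so a returned Python string reaches the client wrapped in double quotes, and the error case arrives as a JSON object. The sketch below imitates that encoding with the standard json module only (it does not start a server) and shows how a client could recover the plain text. The function names are made up for illustration.

```python
import json


def encode_like_json_response(value):
    """Mimic how a returned Python value is serialized into a JSON body."""
    return json.dumps(value, ensure_ascii=False)


def decode_body(body):
    """Recover the plain text from a JSON-encoded response body."""
    decoded = json.loads(body)
    # On failure the handler returns a dict; surface its message instead.
    if isinstance(decoded, dict):
        return decoded.get("msg", "")
    return decoded


body = encode_like_json_response("你好")
print(body)               # the body as sent, including the quotes
print(decode_body(body))  # the text without the quotes
```

If the quotes are unwanted on the headset, one option is to strip them on the client; another is to return a PlainTextResponse from fastapi.responses instead of a bare string.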

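2. Character recognition and translation

To add to the workload of the graduation project, I wrote and trained a KNN-based character recognition model and then passed its output to the Baidu Translate API for translation. Because the Baidu Translate API also accepts an image directly and gives good results, I will not go into my character recognition work here; the code below calls the Baidu Translate API. An image may contain more than one region to translate, so the API returns coordinates together with each translation. For simplicity, only the translation at the foremost coordinates (the one nearest the top-left corner) is used.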
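The project simply takes the first entry of the returned list. If the order of the entries cannot be relied on, the top-left rule just described can be made explicit. The sketch below assumes that each entry carries a rect field formatted as "left top width height"; that format is an assumption here and should be checked against the actual API response. The sample entries are invented for illustration.

```python
def top_left_entry(entries):
    """Return the entry whose box is highest, breaking ties by leftmost."""
    def key(entry):
        # Assumed format: "left top width height" (verify against real responses).
        left, top, _width, _height = (int(v) for v in entry["rect"].split())
        return (top, left)
    return min(entries, key=key)


# Invented sample data, shaped like the 'content' list used by the server.
sample = [
    {"src": "Exit", "dst": "出口", "rect": "300 120 80 30"},
    {"src": "Hello", "dst": "你好", "rect": "40 20 120 30"},
    {"src": "World", "dst": "世界", "rect": "200 20 120 30"},
]
print(top_left_entry(sample)["dst"])
```

Sorting by top first and left second treats "top-left" as reading order; a different rule, such as distance from the image origin, would only need a different key function.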

# -*- coding: utf-8 -*-
import requests
import random
import json
import os
import sys
from hashlib import md5

def translate_api(file_name):
    def get_md5(string, encoding='utf-8'):
        return md5(string.encode(encoding)).hexdigest()

    def get_file_md5(file_name):
        with open(file_name, 'rb') as f:
            data = f.read()
            return md5(data).hexdigest()

    endpoint = 'http://api.fanyi.baidu.com'
    path = '/api/trans/sdk/picture'
    url = endpoint + path

    from_lang = 'en'
    to_lang = 'zh'

    # Set your own appid/appkey (placeholders; do not publish real credentials).
    app_id = 'YOUR_APP_ID'
    app_key = 'YOUR_APP_KEY'

    # cuid & mac
    cuid = 'APICUID'
    mac = 'mac'

    # sign = MD5(appid + MD5(image) + salt + cuid + mac + appkey)
    salt = random.randint(32768, 65536)
    sign = get_md5(app_id + get_file_md5(file_name) + str(salt) + cuid + mac + app_key)

    # Build request
    payload = {'from': from_lang, 'to': to_lang, 'appid': app_id, 'salt': salt, 'sign': sign, 'cuid': cuid, 'mac': mac}

    # Send request; the file handle is closed once the upload has finished
    with open(file_name, 'rb') as f:
        image = {'image': (os.path.basename(file_name), f, "multipart/form-data")}
        response = requests.post(url, params=payload, files=image)
    result = response.json()

    # Return the response as a JSON string (the server parses it with json.loads)
    res = json.dumps(result, indent=4, ensure_ascii=False)

    return res
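The signing step in the function above can be isolated and checked without any network access. The sketch below reproduces the same concatenation, appid + MD5(image) + salt + cuid + mac + appkey, as a standalone function. The name make_sign and the demo values are made up for illustration and are not real credentials.

```python
from hashlib import md5


def make_sign(app_id, image_bytes, salt, cuid, mac, app_key):
    """Build the request signature the same way translate_api does."""
    image_md5 = md5(image_bytes).hexdigest()
    raw = app_id + image_md5 + str(salt) + cuid + mac + app_key
    return md5(raw.encode("utf-8")).hexdigest()


# Demo values only; a real call uses your own appid/appkey and the real image bytes.
sign = make_sign("demo_app_id", b"fake image bytes", 40000, "APICUID", "mac", "demo_app_key")
print(len(sign))  # an MD5 hex digest is always 32 characters long
```

Because the image digest is part of the signature, the signature has to be recomputed for every photo, even when the salt and credentials stay the same.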