OCR in WPF using the WinRT OCR API

cunhan4654

Original article: https://www.codeproject.com/Articles/5276805/OCR-in-WPF-using-the-WinRT-OCR-API

Introduction

Optical Character Recognition (OCR) is one of the Windows Runtime features that is currently accessible to WPF and WinForms applications. This is made possible thanks to the Windows 10 WinRT API Pack which provides quick and easy access to a number of WinRT APIs, including the OCR API. This article will take a look at how to go about using the OCR API in a WPF application.

Background

The sample project for this article is a .NET Core WPF application which contains a button for launching an open file dialog, used to select an image; a combo box for selecting the language to use for text extraction; a button for executing text extraction; and a button for copying the extracted text onto the clipboard. The language codes listed in the combo box represent languages installed on a device.

Sample application

WinRT OCR

The WinRT OCR API is a highly optimized optical character recognition system that currently supports 26 languages and works without requiring an internet connection. The API can extract text from a wide variety of images, ranging from scanned documents to photos of text in natural scenes.

Natural scene image text extraction

To use the API in a WPF application, you have to reference the Microsoft.Windows.SDK.Contracts NuGet package and install the languages you intend to use for text extraction. If you attempt an extraction using a language that isn't installed on the device or isn't supported by the API, the extraction will fail.

Some languages installed on a machine
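
Before attempting an extraction, it can help to ask the OCR engine which languages it can actually recognize on the current machine. The following is a minimal sketch rather than part of the sample project: the class name OcrLanguageCheck and the "en-US" tag are illustrative assumptions, while OcrEngine.AvailableRecognizerLanguages and OcrEngine.IsLanguageSupported() come from the Windows.Media.Ocr namespace.

```csharp
using System;
using Windows.Globalization;
using Windows.Media.Ocr;

// Hypothetical helper, not part of the article's sample project.
public static class OcrLanguageCheck
{
    public static void Main()
    {
        // Languages for which an OCR recognizer is installed on this device.
        foreach (Language language in OcrEngine.AvailableRecognizerLanguages)
            Console.WriteLine($"{language.LanguageTag}: {language.DisplayName}");

        // "en-US" is only an example tag; substitute the language you need.
        var candidate = new Language("en-US");
        Console.WriteLine(OcrEngine.IsLanguageSupported(candidate)
            ? $"{candidate.LanguageTag} can be used for OCR."
            : $"{candidate.LanguageTag} is not available for OCR.");
    }
}
```

A check like this could be used to populate the language combo box with only those languages the engine is able to process.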

Extracting Text

With the Microsoft.Windows.SDK.Contracts package installed, using the OCR API is a very simple affair. In the sample project, the ExtractText() method in the OcrService class calls the RecognizeAsync() method of the API's OcrEngine class to extract text from a specified image using a specific language code.

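public async Task<string> ExtractText(string image, string languageCode)
{
    ...

    // Reject language codes that are not among the languages installed on the device.
    if (!GlobalizationPreferences.Languages.Contains(languageCode))
        throw new ArgumentOutOfRangeException(
            nameof(languageCode), $"{languageCode} is not installed.");

    StringBuilder text = new StringBuilder();

    await using (var fileStream = File.OpenRead(image))
    {
        var bmpDecoder =
            await BitmapDecoder.CreateAsync(fileStream.AsRandomAccessStream());
        var softwareBmp = await bmpDecoder.GetSoftwareBitmapAsync();

        // TryCreateFromLanguage returns null when the engine cannot handle the language.
        var ocrEngine = OcrEngine.TryCreateFromLanguage(new Language(languageCode));
        if (ocrEngine == null)
            throw new InvalidOperationException($"OCR is not available for {languageCode}.");

        var ocrResult = await ocrEngine.RecognizeAsync(softwareBmp);

        // Append line by line so the output follows the layout of the image.
        foreach (var line in ocrResult.Lines) text.AppendLine(line.Text);
    }

    return text.ToString();
}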

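In ExtractText, an ArgumentOutOfRangeException is thrown if the specified language code doesn't represent any language installed on a device. Note that OcrEngine.TryCreateFromLanguage() returns null, rather than throwing, when the engine cannot recognize the requested language, so its result is worth checking before use. To get the resultant text to closely match the layout of the text in the image, I'm getting each line of extracted text and adding it to a StringBuilder before returning the overall text.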
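
If line-level text is not enough to reproduce the layout, each recognized line also exposes its words together with their bounding rectangles. The sketch below is an illustration under assumptions, not the sample project's code: the OcrWordDump class and PrintWordsAsync method are hypothetical names, and it assumes a .NET Core 3.x project referencing Microsoft.Windows.SDK.Contracts, as in the article.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Windows.Globalization;
using Windows.Graphics.Imaging;
using Windows.Media.Ocr;

// Hypothetical helper, not part of the article's sample project.
public static class OcrWordDump
{
    public static async Task PrintWordsAsync(string imagePath, string languageCode)
    {
        await using var fileStream = File.OpenRead(imagePath);
        var decoder = await BitmapDecoder.CreateAsync(fileStream.AsRandomAccessStream());
        using var bitmap = await decoder.GetSoftwareBitmapAsync();

        // Null means the engine has no recognizer for the requested language.
        var engine = OcrEngine.TryCreateFromLanguage(new Language(languageCode));
        if (engine == null)
            throw new InvalidOperationException($"OCR is not available for {languageCode}.");

        OcrResult result = await engine.RecognizeAsync(bitmap);

        // Each line holds its words; each word carries its position in the image.
        foreach (OcrLine line in result.Lines)
        {
            foreach (OcrWord word in line.Words)
            {
                var rect = word.BoundingRect;
                Console.WriteLine(
                    $"{word.Text} at ({rect.X:F0}, {rect.Y:F0}), {rect.Width:F0}x{rect.Height:F0}");
            }
        }
    }
}
```

The word rectangles are expressed in the pixel coordinates of the bitmap passed to RecognizeAsync(), which makes them suitable for drawing highlights over the source image.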

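Text can also be extracted from an image by using the first of the user's profile languages that the OCR engine supports. This is done by calling the OcrEngine's TryCreateFromUserProfileLanguages() method, which returns null if none of those languages can be used for OCR.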

public async Task<string> ExtractText(string image)
{
    ...

    StringBuilder text = new StringBuilder();

    await using (var fileStream = File.OpenRead(image))
    {
        var bmpDecoder =
            await BitmapDecoder.CreateAsync(fileStream.AsRandomAccessStream());
        var softwareBmp = await bmpDecoder.GetSoftwareBitmapAsync();

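        // Returns null when none of the user's profile languages can be used for OCR.
        var ocrEngine = OcrEngine.TryCreateFromUserProfileLanguages();
        if (ocrEngine == null)
            throw new InvalidOperationException("No user profile language supports OCR.");

        var ocrResult = await ocrEngine.RecognizeAsync(softwareBmp);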

        foreach (var line in ocrResult.Lines) text.AppendLine(line.Text);
    }

    return text.ToString();
}

Using the code above, if a device's first preferred language is simplified Chinese, and the text in the image is also simplified Chinese, then the text extraction will be done successfully. The sample project's MainWindowViewModel uses the ExtractText() method that requires a language code to be passed as a parameter.

public class MainWindowViewModel : ViewModelBase
{
    private readonly IDialogService dialogService;
    private readonly IOcrService ocrService;

    public MainWindowViewModel(IDialogService dialogSvc, IOcrService ocrSvc)
    {
        dialogService = dialogSvc;
        ocrService = ocrSvc;
    }

    // Language codes of installed languages.
    public List<string> InstalledLanguages => GlobalizationPreferences.Languages.ToList();

    private string _imageLanguageCode;
    public string ImageLanguageCode
    {
        get => _imageLanguageCode;
        set
        {
            _imageLanguageCode = value;
            OnPropertyChanged();
        }
    }

    private string _selectedImage;
    public string SelectedImage
    {
        get => _selectedImage;
        set
        {
            _selectedImage = value;
            OnPropertyChanged();
        }
    }

    private string _extractedText;
    public string ExtractedText
    {
        get => _extractedText;
        set
        {
            _extractedText = value;
            OnPropertyChanged();
        }
    }

    #region Select Image Command

    private RelayCommand _selectImageCommand;
    public RelayCommand SelectImageCommand =>
        _selectImageCommand ??= new RelayCommand(_ => SelectImage());

    private void SelectImage()
    {
        string image = dialogService.OpenFile("Select Image",
            "Image (*.jpg; *.jpeg; *.png; *.bmp)|*.jpg; *.jpeg; *.png; *.bmp");

        if (string.IsNullOrWhiteSpace(image)) return;

        SelectedImage = image;
        ExtractedText = string.Empty;
    }

    #endregion

    #region Extract Text Command

    private RelayCommandAsync _extractTextCommand;
    public RelayCommandAsync ExtractTextCommand =>
        _extractTextCommand ??= new RelayCommandAsync(ExtractText, _ => CanExtractText());

    private async Task ExtractText()
    {
        ExtractedText = await ocrService.ExtractText(SelectedImage, ImageLanguageCode);
    }

    private bool CanExtractText() => !string.IsNullOrWhiteSpace(ImageLanguageCode) &&
                                     !string.IsNullOrWhiteSpace(SelectedImage);

    #endregion

    #region Copy Text to Clipboard Command

    private RelayCommand _copyTextToClipboardCommand;
    public RelayCommand CopyTextToClipboardCommand => _copyTextToClipboardCommand ??=
        new RelayCommand(_ => CopyTextToClipboard(), _ => CanCopyTextToClipboard());

    private void CopyTextToClipboard() => Clipboard.SetData(DataFormats.Text, _extractedText);

    private bool CanCopyTextToClipboard() => !string.IsNullOrWhiteSpace(_extractedText);

    #endregion
}
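
The view model above relies on a RelayCommandAsync class that the article does not show. The following is one possible minimal implementation, offered as an assumption about its shape rather than the sample project's actual code; it matches the constructor call used above (a Func<Task> plus an optional predicate) and blocks re-entry while the task is running.

```csharp
using System;
using System.Threading.Tasks;
using System.Windows.Input;

// Hypothetical implementation; the sample project's own class may differ.
public class RelayCommandAsync : ICommand
{
    private readonly Func<Task> _execute;
    private readonly Predicate<object> _canExecute;
    private bool _isExecuting;

    public RelayCommandAsync(Func<Task> execute, Predicate<object> canExecute = null)
    {
        _execute = execute ?? throw new ArgumentNullException(nameof(execute));
        _canExecute = canExecute;
    }

    // Let WPF re-query CanExecute whenever it re-evaluates its commands.
    public event EventHandler CanExecuteChanged
    {
        add => CommandManager.RequerySuggested += value;
        remove => CommandManager.RequerySuggested -= value;
    }

    // Disabled while a previous invocation is still running.
    public bool CanExecute(object parameter) =>
        !_isExecuting && (_canExecute?.Invoke(parameter) ?? true);

    // async void is acceptable here because ICommand.Execute must return void.
    public async void Execute(object parameter)
    {
        _isExecuting = true;
        try
        {
            await _execute();
        }
        finally
        {
            _isExecuting = false;
            CommandManager.InvalidateRequerySuggested();
        }
    }
}
```

Because Execute is async void, exceptions thrown by ExtractText() would surface on the UI thread's synchronization context, so a production version would probably catch and report them.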

Conclusion

As you can see, using the WinRT OCR API is quite a simple affair, and it works quite well in most cases. In comparison to Tesseract, I think it's a far better option, especially if you intend to extract text from natural scene images.
