Developments and Main Challenges of Optical Music Recognition Technology

Recently I got my very first paper accepted to the International Conference on Technologies for Music Notation and Representation (TENOR) 2020. The journey of getting published was very insightful and it will serve as my own guide to publishing in the future.

The paper summarises prior work and takes a position on advancing the field of my research topic, Optical Music Recognition (OMR). You can read more about OMR in my previous article. I have heard both the pros and the cons of publishing a position paper at the beginning of an academic journey. Writing this paper often made me doubt myself, but that doubt always resulted in learning more.

Back to the actual content of the paper: I try to summarise the four main stages of the OMR pipeline and the variety of published work in each stage. Furthermore, I try to capture the paradigm shift in OMR methods, from conventional computer vision systems to end-to-end deep learning networks.

Figure: Overall traditional OMR pipeline [13]

Traditionally, the OMR pipeline has four stages: image preprocessing, musical object detection, musical symbol reconstruction and, finally, encoding the musical knowledge into a machine-readable file. The image preprocessing stage mainly applies enhancement, de-skewing, blurring, noise removal and binarisation [1, 2, 3, 4, 5]. Binarisation is the process of converting an image to binary, that is, to only black and white pixels. Initially, such steps were performed with traditional techniques, such as choosing a binarisation threshold based on the global histogram of the image. More recently, binarisation has also been done with selectional auto-encoders [6, 7], which learn an end-to-end transformation from the input image to its binarised version.
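To make the "global histogram" idea concrete, here is a minimal sketch of one well-known threshold selection rule, Otsu's method. The paper does not say which rule the cited works use, so treat Otsu as my own choice for illustration. The function names and the convention that 1 marks ink are assumptions too.

```python
def otsu_threshold(pixels):
    """Pick a global threshold from the grey-level histogram (Otsu's method).

    pixels: iterable of ints in 0..255. Values <= the returned threshold
    are treated as dark (ink), the rest as background.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of grey values of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, no split to evaluate
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        # Between-class variance (up to a constant factor).
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarise(image, threshold=None):
    """image: list of rows of grey values. Returns rows of 1 (ink) / 0 (background)."""
    if threshold is None:
        threshold = otsu_threshold(p for row in image for p in row)
    return [[1 if p <= threshold else 0 for p in row] for row in image]
```

A single global threshold like this tends to struggle with uneven lighting or stained paper, which is one reason learned binarisation became attractive.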

Moving on to musical object detection, this stage has three substages: staff processing, musical symbol processing and, finally, classification. In staff processing, staff lines are first detected and then, depending on the study, removed. More recently, Pacha et al. showed, using object detection techniques, that removing staff lines does not guarantee better performance [8].
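As a rough illustration of what staff processing can look like in a conventional system, the sketch below finds staff lines with a horizontal projection (rows that are mostly ink) and then erases them only where no symbol crosses the line. This is a textbook-style heuristic that I am assuming for the example, not the method of any cited work. The function names and the `min_fill` parameter are hypothetical.

```python
def detect_staff_rows(binary, min_fill=0.6):
    """Find candidate staff lines in a binary image (1 = ink, 0 = background).

    A row counts as part of a staff line when at least `min_fill` of its
    pixels are ink. Returns (top, bottom) row ranges, both inclusive.
    """
    lines = []
    start = None
    for y, row in enumerate(binary):
        is_line = sum(row) >= min_fill * len(row)
        if is_line and start is None:
            start = y
        elif not is_line and start is not None:
            lines.append((start, y - 1))
            start = None
    if start is not None:
        lines.append((start, len(binary) - 1))
    return lines


def remove_staff_lines(binary, lines):
    """Erase staff-line pixels, keeping columns where a symbol crosses the line.

    A column is kept if there is ink directly above or directly below the line,
    which is a crude sign that a stem or notehead passes through it.
    """
    out = [row[:] for row in binary]
    height = len(binary)
    for top, bottom in lines:
        for x in range(len(binary[0])):
            above = binary[top - 1][x] if top > 0 else 0
            below = binary[bottom + 1][x] if bottom + 1 < height else 0
            if not above and not below:
                for y in range(top, bottom + 1):
                    out[y][x] = 0
    return out
```

Heuristics like this break down on skewed, curved or handwritten staves, which helps explain why some recent systems skip staff removal and detect objects directly.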

The musical object detection stage has benefited greatly from the state of the art in computer vision, and from general object detection in particular. Models such as Fast R-CNN, Faster R-CNN and Single Shot Detectors (SSD) have been used to detect musical objects. The models are pre-trained and later fine-tuned on MUSCIMA++ [9], a dataset of handwritten sheet music. This work establishes a baseline for using deep learning for object detection in sheet music.

One of the most complicated stages is reconstructing the structural and semantic relationships between the musical symbols. This step was usually done using musical knowledge, rules and heuristics [10, 12]. Recently, this stage has also been approached with deep learning methods and end-to-end learning [11]. However, a major problem here is finding representations that can capture both structural and semantic relationships in music. Music has a very complex structure, in which symbols have spatial relationships and long-term dependencies. These relationships give the music its structure, and their semantic meaning is the music itself. As such, finding a representation that embeds all this information is very challenging.
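To give a feel for the rule-based side of this stage, here is a small sketch of one such rule: reading a pitch from the vertical position of a notehead relative to the five staff lines. It assumes a treble clef, natural notes only, and image coordinates where y grows downward. The function name and its inputs are hypothetical.

```python
NOTE_NAMES = "CDEFGAB"


def pitch_from_position(notehead_y, staff_line_ys):
    """Map a notehead's y-coordinate to a pitch name such as 'E4'.

    staff_line_ys: y-coordinates of the five staff lines (any order).
    Assumes a treble clef, so the bottom line is E4, and ignores key
    signatures, accidentals and clef changes.
    """
    lines = sorted(staff_line_ys)  # top line first, bottom line last
    # Four spaces between five lines, and each space spans two diatonic steps.
    half_space = (lines[-1] - lines[0]) / 8
    steps = round((lines[-1] - notehead_y) / half_space)
    # Diatonic index counts note names upwards from C0; E4 is 4 * 7 + 2.
    diatonic = 4 * 7 + 2 + steps
    octave, degree = divmod(diatonic, 7)
    return f"{NOTE_NAMES[degree]}{octave}"
```

For example, with staff lines at y = 10, 20, 30, 40, 50, a notehead at y = 30 sits on the middle line and is read as B4. Everything the function ignores (the clef in force, the key signature, an accidental earlier in the bar) is exactly the kind of long-term dependency that makes this stage hard.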

Ultimately, the goal is to encode all retrieved relationships into a machine-readable file. A variety of such formats exists. Some formats encode instrument, pitch, velocity and onsets, but those can only facilitate replayability. Other formats encode more information, which facilitates not only replayability but also an approximation of how the symbols looked on the sheet.
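The difference between the two kinds of format can be sketched with two small data structures. The fields of the playback event come from the list above. The MIDI-style note numbering (C4 = 60) and every layout field on the second structure are assumptions made for illustration, not the schema of any particular format.

```python
from dataclasses import dataclass

# Semitone offsets of the natural notes within an octave.
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def pitch_to_midi(pitch):
    """Convert a natural pitch name such as 'C4' to a MIDI-style note number.

    Uses the common convention that C4 is note 60. Accidentals are not handled.
    """
    name, octave = pitch[0], int(pitch[1:])
    return 12 * (octave + 1) + SEMITONES[name]


@dataclass
class PlaybackEvent:
    """Enough information to replay a note, nothing about how it was written."""
    instrument: str
    pitch: int       # MIDI-style note number
    velocity: int    # loudness, 0..127 in MIDI-style encodings
    onset: float     # start time in beats


@dataclass
class NotatedNote:
    """A playback event plus hypothetical layout details from the sheet."""
    event: PlaybackEvent
    staff_step: int  # diatonic steps above the bottom staff line
    stem_up: bool
    duration_symbol: str  # e.g. "quarter"


note = NotatedNote(
    event=PlaybackEvent("piano", pitch_to_midi("E4"), 80, 0.0),
    staff_step=0,
    stem_up=True,
    duration_symbol="quarter",
)
```

From a `PlaybackEvent` alone you cannot tell whether a note was written as an E or an F flat, or which way its stem pointed. The second structure keeps the kind of information needed to approximate the original page.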

To conclude, the main challenges in OMR today are the lack of a larger labelled dataset, the detection of music objects and staff lines, the reconstruction of semantic meaning, and the lack of standardisation in evaluation metrics and output representation [13].

Read more here: https://arxiv.org/abs/2006.07885
