ABBYY NeoML: How We Made an Open-Source Machine Learning Library and Why We Need It


The framework provides software developers with powerful deep learning and traditional machine learning algorithms for creating applications that fuel digital transformation.

ABBYY, a Digital Intelligence company, launched NeoML, an open-source library for building, training, and deploying machine learning models. Available now on GitHub, NeoML supports both deep learning and traditional machine learning algorithms. The cross-platform framework is optimized for applications that run in cloud environments and on desktop and mobile devices. According to the benchmarks shown below, NeoML runs pre-trained image processing models 15–20% faster than a popular open-source library on any device. The combination of higher inference speed and platform independence makes the library ideal for mobile solutions that require both a seamless customer experience and on-device data processing.

NeoML is a cross-platform C++ library that supports the complete development cycle of ML models. Its main focus is the simple and efficient deployment of ready-made models on various platforms, even models created with other frameworks.

“The launch of NeoML reflects our commitment to contribute to industry-wide AI innovation,” said Ivan Yamshchikov, AI Evangelist at ABBYY. “ABBYY has a proven track record of technological innovation with over 400 patents and patent applications. Sharing our framework allows developers to leverage its inference speed, cross-platform capabilities, and especially its potential on mobile devices, while their feedback and contribution will grow and improve the library. We are thrilled to promote advancements in AI and support machine learning being applied to increasingly high-value and impactful use cases.”

You may ask: why do we need another machine learning library?

Below I will answer this question, tell you how we created our library at ABBYY, what difficulties we encountered, and what happened in the end.

Where ABBYY started with Machine Learning

Machine learning and the development of artificial intelligence have long been part of ABBYY's Digital Intelligence technology. Over time, it became clear that our work with ML needed to be unified. We began to think about how to improve our machine learning factory in the cleanest, simplest, and most efficient manner. Almost all of the code in the company is written in C++, which means we needed a C/C++ solution. However, no single C++ framework satisfied all of our needs. There were, of course, separate libraries implementing various pieces of functionality: for example, Liblinear, XGBoost, Scikit-learn, Libsvm, Caffe, TensorFlow, and so on. We began to analyze their capabilities.

Most of these libraries were suitable for research purposes but not for production. Their code required substantial revision: logging, error handling, memory management. On top of that came a lot of extra functionality, different build systems, and additional dependencies, and not all of them had a C++ interface. The libraries evolved and changed rapidly, not always predictably; their performance and stability raised questions, and no one promised support. We had no choice but to start our own development, so we decided to create our own library, collect everything we needed in it, and decide its future path ourselves.

Photo by Markus Spiske on Unsplash

Classic algorithms

The open-source libraries Liblinear, Libsvm, Scikit-learn, and XGBoost already existed and were a good help. We started out using them, but after analyzing their capabilities we implemented similar ideas ourselves, tailored to what we needed and with several optimizations. For example, we work only with samples that fit in memory, only on the CPU, and without low-level optimizations: the speed of the classical algorithms was not a bottleneck in our problems, so we made no serious effort to optimize them, yet we still managed to exceed the speed of the analogs above.

Unification yielded good results: training became faster and quality higher. The speed of development also increased: a programmer no longer needs to reinvent the wheel, and can simply use the default settings to immediately get a result that previously would have required at least a couple of days of experiments.

Thus the library gained methods for solving classification, regression, and clustering problems.

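As a taste of the API, here is a minimal sketch of training a gradient boosting classifier with NeoML. It follows the patterns of the library's public C++ headers, but treat the exact class names and signatures (CGradientBoost, IProblem, Train) as assumptions to check against the current release.

```cpp
#include <NeoML/NeoML.h>
using namespace NeoML;

// Sketch only: `problem` is assumed to implement NeoML::IProblem,
// the interface that describes a training set (vectors, classes, labels).
CPtr<IModel> TrainBoostedClassifier( const IProblem& problem )
{
	CGradientBoost::CParams params;
	params.IterationsCount = 100; // number of boosting rounds
	params.LearningRate = 0.1f;   // shrinkage applied to each round

	CGradientBoost boosting( params );
	return boosting.Train( problem ); // returns a trained classification model
}
```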

The implementation of the classical algorithms was, in my opinion, not a very difficult task; the situation with neural networks was more interesting.

Neural networks

Classical algorithms are essentially a set of independent methods that use common primitives, but implementing neural networks is much more difficult: beyond the mathematical and algorithmic problems, it raises non-obvious architectural and low-level optimization problems.

Having looked at the Caffe and TensorFlow libraries that existed at the time, we decided that the ideas of Caffe were closer to our vision. That is why our data is represented by blobs, not tensors. We wanted to operate with higher-level concepts, modify the network during training, be able to fine-tune it while in use, and run the calculations on the GPU transparently for the user.

In NeoML, a network is a directed graph whose vertices are layers and whose edges are data transfers from the outputs of some layers to the inputs of others. A layer is an element that performs some operation; an operation can be anything from changing the shape of the input data or calculating a simple mathematical function to a convolution or an LSTM. Layers can be added to and removed from the network at any time. All data in the network (inputs, outputs, and data transmitted between layers) is represented as blobs, where a blob is a contiguous stretch of memory. The library does not work with blob memory directly but through a special platform-independent interface, which makes the algorithmic part independent of the device on which the calculations actually run. For example, by implementing this interface with CUDA, you can run the calculations on a GPU.

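To make these concepts concrete, here is a minimal sketch of assembling and running a small network through NeoML's C++ API. The layer and blob names follow the library's public headers, but treat the exact signatures as assumptions to verify against the current release.

```cpp
#include <NeoML/NeoML.h>
using namespace NeoML;

int main()
{
	IMathEngine& mathEngine = GetDefaultCpuMathEngine(); // CPU computations
	CRandom random( 42 );
	CDnn dnn( random, mathEngine ); // the network: a directed graph of layers

	// Source layer: feeds input blobs into the graph.
	CPtr<CSourceLayer> source = new CSourceLayer( mathEngine );
	source->SetName( "in" );
	dnn.AddLayer( *source );

	// Fully connected layer with 10 outputs, connected to the source.
	CPtr<CFullyConnectedLayer> fc = new CFullyConnectedLayer( mathEngine );
	fc->SetName( "fc" );
	fc->SetNumberOfElements( 10 );
	fc->Connect( *source );
	dnn.AddLayer( *fc );

	// Sink layer: collects the result blob.
	CPtr<CSinkLayer> sink = new CSinkLayer( mathEngine );
	sink->SetName( "out" );
	sink->Connect( *fc );
	dnn.AddLayer( *sink );

	// One input blob: a batch of one 64-channel feature vector.
	CPtr<CDnnBlob> input = CDnnBlob::CreateDataBlob( mathEngine, CT_Float, 1, 1, 64 );
	input->Fill( 0.5f );
	source->SetBlob( input );

	dnn.RunOnce(); // forward pass only, no learning
	CPtr<CDnnBlob> result = sink->GetBlob();
	return 0;
}
```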

As for network architectures, we started with convolutional networks: various convolution layers, poolings, fully connected layers, activations, and loss functions, with simple gradient descent as the optimizer.

A little later we added support for the recurrent layers LSTM and GRU, advanced optimizers, even more activations and loss functions, CTC, CRF, and so on.

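Training follows the same graph model: attach a loss layer, pick a solver, and call the learning step. The sketch below extends the previous example, with the same caveat that the exact NeoML signatures are assumptions; the labels source here is a hypothetical setup for illustration.

```cpp
// A second source layer feeding the expected labels.
CPtr<CSourceLayer> labels = new CSourceLayer( mathEngine );
labels->SetName( "labels" );
dnn.AddLayer( *labels );

// Loss layer: compares the network output against the labels.
CPtr<CCrossEntropyLossLayer> loss = new CCrossEntropyLossLayer( mathEngine );
loss->SetName( "loss" );
loss->Connect( 0, *fc );     // input 0: the network output
loss->Connect( 1, *labels ); // input 1: the expected labels
dnn.AddLayer( *loss );

// Solver: how the weights get updated. Plain gradient descent here.
CPtr<CDnnSimpleGradientSolver> solver = new CDnnSimpleGradientSolver( mathEngine );
solver->SetLearningRate( 0.01f );
dnn.SetSolver( solver.Ptr() );

for( int epoch = 0; epoch < 100; ++epoch ) {
	// Set fresh data blobs on the source layers here, then:
	dnn.RunAndLearnOnce(); // forward pass + backpropagation + weight update
}
```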

Currently, the library has about 100 different types of layers, which allows us to implement almost all modern network architectures.

We keep expanding the functionality as new architectures prove their effectiveness on our tasks.

Then the struggle for efficiency began.

Photo by Christian Wiediger on Unsplash

CPU calculations

A neural network often means a huge amount of computation, and without low-level optimizations you will get nowhere. First of all, we optimized for x86 processors on Windows: this is our main platform, and the one where we wanted to perform as well as possible.

Most operations in neural networks come down, one way or another, to BLAS (Basic Linear Algebra Subprograms), and the best BLAS for x86 is, of course, Intel MKL. We started using it; the remaining operations had to be implemented ourselves using SIMD. We used only SSE instructions; we experimented with AVX/AVX2, but they did not give much gain on our operations, so we dropped them to reduce the cost of support. When Intel released MKL-DNN, we were delighted: finally, we would not have to write all this ourselves! Unfortunately, comparisons showed that our own implementations ran about 20% faster, so for now that idea has been shelved.

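To give a flavor of what such hand-written SIMD code looks like, here is a small self-contained example (illustrative, not taken from NeoML) of a ReLU pass vectorized with SSE intrinsics, processing four floats per instruction with a scalar loop for the tail:

```cpp
#include <xmmintrin.h> // SSE intrinsics

// Applies ReLU in place: data[i] = max(data[i], 0).
void ReluSse( float* data, int size )
{
	const __m128 zero = _mm_setzero_ps();
	int i = 0;
	for( ; i + 4 <= size; i += 4 ) {
		__m128 v = _mm_loadu_ps( data + i ); // load 4 floats (unaligned)
		v = _mm_max_ps( v, zero );           // elementwise max with 0
		_mm_storeu_ps( data + i, v );        // store 4 floats back
	}
	for( ; i < size; ++i ) {                 // scalar tail
		data[i] = data[i] > 0.f ? data[i] : 0.f;
	}
}
```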

At the moment, NeoML works pretty well on x86, but there is still much room for optimization, which we plan to do in future releases.

Photo by ThisisEngineering RAEng on Unsplash

GPU Computing

For us, GPU computing is mainly about training. Training most often takes place within the company or in our cloud, where we can choose the hardware we run on, and that simplifies life: for example, we do not need an SSE fallback in case a client's machine lacks AVX. So we decided to implement the GPU computing engine using CUDA and to run the calculations on the Nvidia graphics cards that support it. We made this decision, among other things, because of the available specialized libraries: cuDNN, cuBLAS, cuSparse, and so on. Later we abandoned cuDNN in favor of our own implementations because of constant errors and inefficient operation, but the rest of these libraries have proven themselves very well, and we did not manage to outperform them with kernels of our own.

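Because everything goes through the math engine interface, moving a network to the GPU is mostly a matter of creating a different engine and handing it to the CDnn. A hedged sketch, assuming NeoML's CreateGpuMathEngine and CreateCpuMathEngine factory functions (their exact parameters should be checked against the current MathEngine.h):

```cpp
#include <NeoML/NeoML.h>
using namespace NeoML;

// Try to create a CUDA math engine; fall back to the CPU engine.
// The factory signatures here are assumptions, not verified API.
IMathEngine* CreateBestEngine()
{
	IMathEngine* gpu = CreateGpuMathEngine( /*memoryLimit=*/0 );
	if( gpu != nullptr ) {
		return gpu; // every layer created with this engine runs on the GPU
	}
	return CreateCpuMathEngine( /*threadCount=*/1, /*memoryLimit=*/0 );
}
```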

The effect of introducing the GPU was immediately noticeable: the training of many networks accelerated by an order of magnitude. Thanks to this, development went faster and the quality of the models improved.

Having obtained decent results on our main platform and established effective training, we turned to bringing the library to other platforms.

Photo by NASA on Unsplash

Cross-platform

The main development at ABBYY happens on Windows, and the training and testing servers also run Windows, so the first versions of the library worked only on that OS. However, ABBYY's products also run on other platforms, and soon we began porting our library to Linux and macOS. The port was easy enough: the only dependency we needed, Intel MKL, had versions for these operating systems, and CUDA-based training was not required there. The only difficulties were the differences between the Microsoft Visual Studio compiler and GCC and Clang, but they did not take much time.

Now we actively use the Linux version of the library for comparative measurements against competitors, whose Windows support often leaves much to be desired. In addition, there are tasks that involve training networks in the cloud. Therefore, in upcoming releases we will ship a version of NeoML that supports CUDA on Linux.

Photo by Marvin Meyer on Unsplash

Mobile platforms

ABBYY develops and sells SDKs for image processing and text recognition, including ones that run on phones. So when neural networks appeared in these SDKs, the question arose of how to run them efficiently on mobile platforms. At that point we once again considered using a third-party solution. Having assessed what it would take to integrate TensorFlow Lite on Android and Core ML on iOS, we concluded that working with several frameworks at once would be unreasonably expensive, and that it was better to refine our own, even if it was less efficient.

So we started work on a "computing engine" for ARM. Replacing SSE with NEON and MKL with Eigen, we had the first version of the library running on ARM CPUs within a couple of weeks. The resulting solution turned out to suit us completely in terms of efficiency, and even surpassed its analogs in speed. Both TF Lite and Core ML have made great strides since then, of course, but we have also made a number of significant optimizations, most of which were shared with the x86 version and did not cost much. Some optimizations, however, were ARM-specific. The most significant of these is our own matrix multiplication, which exceeded the speed of the Eigen library by about 20%, after which we stopped using Eigen.

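Porting hand-written SIMD code from SSE to NEON is mostly mechanical. As an illustration (again, not code from NeoML), here is the same ReLU pass from the earlier example written with NEON intrinsics:

```cpp
#include <arm_neon.h> // NEON intrinsics

// Applies ReLU in place: data[i] = max(data[i], 0).
void ReluNeon( float* data, int size )
{
	const float32x4_t zero = vdupq_n_f32( 0.f );
	int i = 0;
	for( ; i + 4 <= size; i += 4 ) {
		float32x4_t v = vld1q_f32( data + i ); // load 4 floats
		v = vmaxq_f32( v, zero );              // elementwise max with 0
		vst1q_f32( data + i, v );              // store 4 floats back
	}
	for( ; i < size; ++i ) {                   // scalar tail
		data[i] = data[i] > 0.f ? data[i] : 0.f;
	}
}
```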

At the moment, NeoML's CPU performance is roughly on par with its peers, which suits us completely.

Also, to simplify running ready-made models on iOS and Android, we added inference wrappers for Objective-C and Java.

Mobile GPUs

Almost all modern Android and iOS phones come with a separate GPU. "Interesting," we thought, and began to study how we could use it. The first experiments were done with RenderScript and yielded absolutely no results: everything was terribly slow. Experiments with OpenCL, Vulkan, and Metal, however, showed good results: on large networks the GPU could give a 5–7x speedup. On small ones the CPU was still faster because of the overhead, and even on large networks not every GPU paid off: only the expensive chips in top models performed well. It also turned out that different GPU families need different code: for example, shaders optimized for Adreno will not necessarily run just as well on Mali. Overall, using GPUs on mobile devices remains a debatable topic for us, but a potentially very promising one. We have implemented computing engines running on Vulkan and Metal, and use them in a limited number of tasks while continuing to develop them. I must say that computing on a mobile GPU is quite a capacious topic, different in many respects from computing on the desktop, and its story deserves a separate article.

ONNX

So we ended up with a completely self-sufficient framework: with it we train our own networks, easily integrate them into desktop applications, and move them to mobile platforms at no extra cost. One problem remained: while reading new papers and exploring new architectures and examples of their use, our data scientists constantly encounter other frameworks. To develop a model for any given problem quickly and effectively, they need a way to convert models from third-party frameworks into ours.

The new ONNX format was a big improvement here. Although the format is still young and its support in many frameworks leaves much to be desired, it is actively developed, and we consider it the best solution to this problem today. We added the ability to load neural network models from ONNX into our library. Of course, we did not support the entire format: it has a rather large specification and several versions, but that is not the main issue. The semantics of its use differ between frameworks; for example, the same model can look completely different depending on which framework exported it to ONNX. We decided to focus on PyTorch here. ONNX models from other frameworks will work too, of course, but perhaps not as efficiently.

As a result, the model development process may look, for example, like this: the first experiments with the model are done in PyTorch, then the model is saved to ONNX and loaded from ONNX into NeoML; the NeoML model is retrained, its speed and quality are measured, and the model then goes either back for revision or into production.

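On the NeoML side, loading such an exported model comes down to a single call into the NeoOnnx module. A hedged sketch, assuming a NeoOnnx::LoadFromOnnx entry point (verify the exact signature in the current headers; "model.onnx" is a hypothetical path):

```cpp
#include <NeoML/NeoML.h>
#include <NeoOnnx/NeoOnnx.h>
using namespace NeoML;

// Builds the layers of the exported model inside `dnn`, ready for
// inference or further training on NeoML's side.
void LoadOnnxModel( CDnn& dnn )
{
	CArray<const char*> inputs;  // receives the names of the input layers
	CArray<const char*> outputs; // receives the names of the output layers
	NeoOnnx::LoadFromOnnx( "model.onnx", dnn, inputs, outputs );
}
```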

Now we have everything needed to support the full development cycle of ML models.

Open-source

What did we decide to do next? We decided to make the library open source so that others could benefit from it.

This is a new step in the development of the library. We share our best practices with the community, and in return we would like to receive comments and suggestions that make our library even faster and more convenient.

The library already has several unique features and can become an effective means of deploying models in a wide range of applications across platforms. We hope that developers with similar scenarios will appreciate NeoML and, perhaps, join the work on the library in the near future.

Comparative measurements

We regularly compare the performance of our library on our tasks with that of its peers (most often TensorFlow) to understand where we stand. As an example, here I compare inference speed on a public network from the TorchVision package: a MobileNetV2 trained to classify the ImageNet dataset. The network input is 224x224x3. The measurements were made on a desktop CPU and on the several mobile phones I had at hand (as you might guess, this post was written during self-isolation).

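For reference, the measurement loop itself can be as simple as the sketch below (a hypothetical harness, not the exact one we used): run the network many times and average with std::chrono, assuming `dnn` is set up as in the earlier sketches.

```cpp
#include <chrono>
#include <cstdio>

// Times repeated forward passes of an already-built network.
void BenchmarkRuns( NeoML::CDnn& dnn, int runCount )
{
	using Clock = std::chrono::steady_clock;
	const auto start = Clock::now();
	for( int i = 0; i < runCount; ++i ) {
		dnn.RunOnce(); // forward pass on the 224x224x3 input blob
	}
	const std::chrono::duration<double> elapsed = Clock::now() - start;
	std::printf( "%d runs, %.3f ms per run\n",
		runCount, elapsed.count() * 1000.0 / runCount );
}
```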

On a PC with a Core i5-4400 processor running Ubuntu 20.04, we got the following results over 10,000 network runs:

[Image: run-time comparison table]

The memory consumption is as follows:

[Image: memory consumption table]

On Android phones, over 10,000 runs, the results are as follows:

[Images: Android run-time results]

On iOS phones:

[Images: iOS run-time results]

It is worth noting that runtime measurement on phones is a thankless task: with enough persistence you can measure almost any result, and multi-threaded measurements are even less indicative (which is why they are not shown here). Still, with a sufficiently large number of runs the overall picture can be seen.

For detailed analysis and optimization, we usually use various processor counters, such as cpu_cycles, cpu_instructions, cache_access, cache_miss, branch_count, branch_miss, bus_cycles, and so on. These, too, show that both libraries perform approximately the same.

Use the powerful NeoML framework to build, train, and deploy machine learning models

  • Neural networks with support for over 100 layer types
  • CPU and GPU support, fast inference
  • Languages: C++, Java, Objective-C
  • Traditional machine learning: 20+ algorithms (classification, regression, clustering, etc.)
  • ONNX support
  • Cross-platform: the same code can run on Windows, Linux, macOS, iOS, and Android

Deploy anywhere

NeoML is used by ABBYY engineers for computer vision and natural language tasks, including image preprocessing, classification, document layout analysis, OCR, and data extraction from structured and unstructured documents. You can deploy models in the cloud, on-prem, in the browser, or on-device.

What's next

NeoML supports the Open Neural Network Exchange (ONNX), a global open ecosystem for interoperable ML models, which improves tool compatibility and makes it easier for developers to use the right combinations to achieve their goals. The ONNX standard is supported jointly by Microsoft, Facebook, and other partners as an open-source project.

ABBYY invites developers, data scientists, and business analysts to use and contribute to NeoML on GitHub, where its code is licensed under the Apache License 2.0. The company offers personalized developer support, ongoing review of reports, regular updates, and performance enhancements. Going forward, ABBYY plans to add new algorithms and architectures, as well as further increase the speeds achievable using the framework algorithms.

Summing up, we can say that we ended up with a solid solution that lets us organize the full cycle of developing and deploying ML models. At the moment, NeoML is used in almost all of the company's products and proves its effectiveness every day.

Machine learning is one of ABBYY's top priorities. We plan to develop the library by releasing new versions regularly. In upcoming releases we want to add a Python wrapper, support new network architectures, expand ONNX support, and, of course, keep working on performance.

If you face scenarios similar to ours, visit our GitHub and try NeoML. We welcome any feedback. Also, tell us in the comments what your pipeline looks like and what problems you run into!

Translated from: https://towardsdatascience.com/abbyy-neoml-how-we-made-the-open-source-machine-learning-library-and-why-we-need-it-dc0a13e4c3f
