NDArray: A Java-based N-Dim Array Toolkit

weixin_26713521

Original article: https://towardsdatascience.com/ndarray-a-java-based-n-dim-array-toolkit-60b4035b10b8

Introduction

Many programming languages share a popular paradigm: N-dimensional arrays. They let you express numerical code that would otherwise need several levels of nested loops in just a few simple operations. Because these operations can be parallelized, they often run faster than the equivalent loops as well. This is now standard practice in fields such as data science, graphics, and deep learning, but it can be used in applications far beyond these. In Python, the de facto standard library for NDArrays is NumPy. However, there is no equivalent standard library in Java. One offering for Java developers interested in working with NDArrays is AWS’s Deep Java Library (DJL). Although it also covers deep learning, its core is a powerful NDArray system that can be used on its own to bring this paradigm to Java. With support for several deep learning frameworks (PyTorch, TensorFlow, MXNet), DJL lets NDArray operations run at large scale and across multiple platforms, whether on CPU or GPU, PC or Android. In this tutorial, we will walk through how to use DJL’s NDArray to write your NumPy code in Java and apply NDArray to a real-world application.

Setup

You can use the following configuration in a Gradle project. Or, you can skip the setup and try it directly in our interactive online console.

plugins {
    id 'java'
}
repositories {
    mavenCentral() // JCenter has been sunset; the DJL artifacts are published to Maven Central
}
dependencies {
    implementation "ai.djl:api:0.6.0"
    // PyTorch
    runtimeOnly "ai.djl.pytorch:pytorch-engine:0.6.0"
    runtimeOnly "ai.djl.pytorch:pytorch-native-auto:1.5.0"
}

That’s it. Now we can start our implementation.

Basic operation

Let’s first create a try block to define a scope for our code (if you are using the interactive console, you can skip this step):

try (NDManager manager = NDManager.newBaseManager()) {
    // NDArray code from the following sections goes here
}

NDManager helps manage the memory usage of the NDArrays. It creates them and helps clear them as well. Once you finish using an NDManager, it clears all of the NDArrays that were created within its scope. By tracking NDArray usage, NDManager helps the overall system use memory efficiently. For comparison, let’s see how the code looks in Python’s NumPy as well. We will start by importing the NumPy library with the standard alias.

import numpy as np

In the following sections, we are going to compare the implementation and results between NumPy and DJL’s NDArray.
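Returning to the NDManager scope described above: the idea of a manager that tracks what it creates and releases everything when its try block ends can be sketched in plain Java. This is only an illustration of the pattern, not DJL's implementation; `ToyManager` and `ToyArray` are hypothetical names that do not exist in DJL.

```java
import java.util.ArrayList;
import java.util.List;

public class ScopedManagerSketch {

    /** Hypothetical stand-in for an array backed by native memory. */
    static final class ToyArray {
        private boolean released;

        boolean isReleased() {
            return released;
        }

        void release() {
            released = true;
        }
    }

    /** Hypothetical manager: tracks what it creates and releases it all on close(). */
    static final class ToyManager implements AutoCloseable {
        private final List<ToyArray> tracked = new ArrayList<>();

        ToyArray create() {
            ToyArray array = new ToyArray();
            tracked.add(array);
            return array;
        }

        @Override
        public void close() {
            for (ToyArray array : tracked) {
                array.release();
            }
            tracked.clear();
        }
    }

    /** Creates two arrays inside a scope and returns them after the scope has closed. */
    static List<ToyArray> createInScope() {
        List<ToyArray> created = new ArrayList<>();
        try (ToyManager manager = new ToyManager()) {
            created.add(manager.create());
            created.add(manager.create());
        } // close() runs here, even if the body throws
        return created;
    }

    public static void main(String[] args) {
        for (ToyArray array : createInScope()) {
            System.out.println("released: " + array.isReleased());
        }
    }
}
```

Because the manager implements `AutoCloseable`, the caller never has to release individual arrays by hand; leaving the try block is enough.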

NDArray Creation

ones is an operation that generates an N-dim array filled with 1s.

NumPy

nd = np.ones((2, 3))
```
[[1. 1. 1.]
 [1. 1. 1.]]
```

NDArray

NDArray nd = manager.ones(new Shape(2, 3));
/*
ND: (2, 3) cpu() float32
[[1., 1., 1.],
 [1., 1., 1.],
]
*/

You can also try out random generation. For example, we will generate uniformly distributed random data between 0 and 1.

NumPy

nd = np.random.uniform(0, 1, (1, 1, 4))
# [[[0.7034806  0.85115891 0.63903668 0.39386125]]]

NDArray

NDArray nd = manager.randomUniform(0, 1, new Shape(1, 1, 4));
/*
ND: (1, 1, 4) cpu() float32
[[[0.932 , 0.7686, 0.2031, 0.7468],
 ],
]
*/

This is just a quick demo of some commonly used functions. NDManager now offers more than 20 NDArray creation methods that cover most of the methods available in NumPy.

Math operation

We can also try some math operations using NDArrays. Suppose we want to transpose an NDArray and add a number to each of its elements. We can achieve this by doing the following:

NumPy

nd = np.arange(1, 10).reshape(3, 3)
nd = nd.transpose()
nd = nd + 10
```
[[11 14 17]
 [12 15 18]
 [13 16 19]]
```

NDArray

NDArray nd = manager.arange(1, 10).reshape(3, 3);
nd = nd.transpose();
nd = nd.add(10);
/*
ND: (3, 3) cpu() int32
[[11, 14, 17],
 [12, 15, 18],
 [13, 16, 19],
]
*/

DJL now supports more than 60 different NumPy math methods, covering most of the basic and advanced math functions.
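For contrast, here is roughly what the transpose-and-add example looks like with ordinary nested loops in plain Java. It is a minimal sketch for a rectangular `int[][]` matrix; the class and method names are made up for illustration.

```java
import java.util.Arrays;

public class TransposeAddLoops {

    /** Returns the transpose of a rectangular matrix with the scalar added to every element. */
    static int[][] transposeAndAdd(int[][] matrix, int scalar) {
        int rows = matrix.length;
        int cols = matrix[0].length;
        int[][] result = new int[cols][rows];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                // Element (r, c) of the input lands at (c, r) of the output
                result[c][r] = matrix[r][c] + scalar;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] matrix = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        System.out.println(Arrays.deepToString(transposeAndAdd(matrix, 10)));
        // prints [[11, 14, 17], [12, 15, 18], [13, 16, 19]]
    }
}
```

The loop version is fixed to two dimensions; each extra dimension would need another level of nesting, which is the cost the NDArray operations avoid.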

Get and Set

One of the most powerful features of NDArray is its flexible data indexing, inspired by a similar feature in NumPy. Let’s assume we would like to filter out all values in a matrix that are smaller than 10.

NumPy

nd = np.arange(5, 14)
nd = nd[nd >= 10]
# [10 11 12 13]

NDArray:

NDArray nd = manager.arange(5, 14);
nd = nd.get(nd.gte(10));
/*
ND: (4) cpu() int32
[10, 11, 12, 13]
*/
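To make the two steps of boolean-mask indexing explicit, the sketch below does the same filtering by hand in plain Java: first build a mask with an element-wise comparison, then keep the elements whose mask entry is true. The helper names `greaterOrEqual` and `select` are hypothetical, chosen only for this illustration.

```java
import java.util.Arrays;

public class MaskFilterSketch {

    /** Element-wise comparison: mask[i] is true where values[i] >= threshold. */
    static boolean[] greaterOrEqual(int[] values, int threshold) {
        boolean[] mask = new boolean[values.length];
        for (int i = 0; i < values.length; i++) {
            mask[i] = values[i] >= threshold;
        }
        return mask;
    }

    /** Keeps only the elements whose mask entry is true, preserving order. */
    static int[] select(int[] values, boolean[] mask) {
        int kept = 0;
        for (boolean keep : mask) {
            if (keep) {
                kept++;
            }
        }
        int[] result = new int[kept];
        int next = 0;
        for (int i = 0; i < values.length; i++) {
            if (mask[i]) {
                result[next++] = values[i];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] values = {5, 6, 7, 8, 9, 10, 11, 12, 13}; // same data as arange(5, 14)
        int[] filtered = select(values, greaterOrEqual(values, 10));
        System.out.println(Arrays.toString(filtered)); // prints [10, 11, 12, 13]
    }
}
```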

Now let’s try something more complicated. Assume we have a 3x3 matrix and we would like to multiply the second column by 2.

NumPy

nd = np.arange(1, 10).reshape(3, 3)
nd[:, 1] *= 2
```
[[ 1  4  3]
 [ 4 10  6]
 [ 7 16  9]]
```

NDArray

NDArray nd = manager.arange(1, 10).reshape(3, 3);
nd.set(new NDIndex(":, 1"), array -> array.mul(2));
/*
ND: (3, 3) cpu() int32
[[ 1,  4,  3],
 [ 4, 10,  6],
 [ 7, 16,  9],
]
*/

In the above example, we introduce a concept in Java called NDIndex. It mirrors most of the NDArray get/set functionality that NumPy supports. By simply passing a String representation, developers can do all kinds of array manipulations seamlessly in Java.
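The index string ":, 1" reads as "every row, column 1". As a rough illustration of what that selection means, the plain-Java sketch below applies the same update to a matrix stored as a flat, row-major array. It assumes row-major storage and makes no claim about how DJL evaluates NDIndex internally; `scaleColumn` is a hypothetical helper.

```java
import java.util.Arrays;

public class ColumnUpdateSketch {

    /** Multiplies every element of one column in a row-major matrix, in place. */
    static void scaleColumn(int[] data, int rows, int cols, int column, int factor) {
        for (int r = 0; r < rows; r++) {
            data[r * cols + column] *= factor; // element (r, column)
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9}; // a 3x3 matrix stored row by row
        scaleColumn(data, 3, 3, 1, 2);
        System.out.println(Arrays.toString(data)); // prints [1, 4, 3, 4, 10, 6, 7, 16, 9]
    }
}
```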

Real world application

These operations are really helpful when we need to manipulate a huge dataset. Let’s walk through a specific use case: Token Classification. In this case, developers wanted to run Sentiment Analysis on text gathered from users by applying a deep learning algorithm to it. NDArray operations were applied in preprocessing and post-processing to encode and decode information.

Tokenization

Before we feed the data into an NDArray, we tokenize the input text into numbers. The tokenizer in the code block below is a Map<String, Integer> that serves as a vocabulary to convert text into a corresponding vector.

String text = "The rabbit cross the street and kick the fox";
String[] tokens = text.toLowerCase().split(" ");
/*
String[9] { "the", "rabbit", "cross", "the", "street",
"and", "kick", "the", "fox" }
*/
// tokenizer is the Map<String, Integer> vocabulary described above
int[] vector = new int[tokens.length];
for (int i = 0; i < tokens.length; i++) {
    vector[i] = tokenizer.get(tokens[i]);
}
// vector now holds:
/*
int[9] { 1, 6, 5, 1, 3, 2, 8, 1, 12 }
*/
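The snippet above leaves `tokenizer` undefined, so here is a self-contained version that can be run as is. The vocabulary is an assumption reconstructed from the ids in the sample output, and the `unknownId` fallback for words missing from the vocabulary is a hypothetical addition, not part of the original example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class TokenizeSketch {

    /** Hypothetical vocabulary; the ids are the ones shown in the sample output. */
    static Map<String, Integer> sampleVocabulary() {
        Map<String, Integer> vocab = new HashMap<>();
        vocab.put("the", 1);
        vocab.put("and", 2);
        vocab.put("street", 3);
        vocab.put("cross", 5);
        vocab.put("rabbit", 6);
        vocab.put("kick", 8);
        vocab.put("fox", 12);
        return vocab;
    }

    /** Lower-cases the text, splits on whitespace, and looks up each token's id. */
    static int[] encode(String text, Map<String, Integer> vocab, int unknownId) {
        String[] tokens = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        int[] vector = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // getOrDefault avoids the NullPointerException that unboxing a missing key would cause
            vector[i] = vocab.getOrDefault(tokens[i], unknownId);
        }
        return vector;
    }

    public static void main(String[] args) {
        int[] vector = encode("The rabbit cross the street and kick the fox", sampleVocabulary(), 0);
        System.out.println(Arrays.toString(vector)); // prints [1, 6, 5, 1, 3, 2, 8, 1, 12]
    }
}
```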

Processing

After that, we create an NDArray. To proceed further, we need to create a batch of tokens and apply some transformations to them.

NDArray array = manager.create(vector);
array = array.reshape(new Shape(vector.length, 1)); // form a batch
array = array.div(10.0);
/*
ND: (9, 1) cpu() float64
[[0.1],
 [0.6],
 [0.5],
 [0.1],
 [0.3],
 [0.2],
 [0.8],
 [0.1],
 [1.2],
]
*/

Then, we can send this data to a deep learning model. Achieving the same thing in pure Java would require far more work. To implement the reshape function above, we would need an N-dimensional array in Java that looks like List<List<List<...List<Float>...>>> to cover all the different dimensions. We would then have to dynamically insert new List<Float> instances containing the elements to build the resulting data structure.
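A common way to sidestep the nested lists described above is to keep the elements in one flat buffer and store the shape separately, so that reshape only changes the shape. The sketch below illustrates that idea in plain Java under a row-major layout assumption; `FlatArray` is a hypothetical class, not something DJL provides, and it omits per-dimension bounds checks.

```java
import java.util.Arrays;

public class ReshapeSketch {

    /** Minimal N-dimensional view: a flat row-major buffer plus a shape. */
    static final class FlatArray {
        final double[] data;
        final int[] shape;

        FlatArray(double[] data, int... shape) {
            int size = 1;
            for (int dim : shape) {
                size *= dim;
            }
            if (size != data.length) {
                throw new IllegalArgumentException("shape does not match element count");
            }
            this.data = data;
            this.shape = shape.clone();
        }

        /** Reshaping only swaps the shape; the underlying buffer is shared, not copied. */
        FlatArray reshape(int... newShape) {
            return new FlatArray(data, newShape);
        }

        /** Reads one element by converting an N-dimensional index to a flat offset. */
        double get(int... index) {
            if (index.length != shape.length) {
                throw new IllegalArgumentException("index rank does not match shape rank");
            }
            int offset = 0;
            for (int d = 0; d < shape.length; d++) {
                offset = offset * shape[d] + index[d];
            }
            return data[offset];
        }
    }

    public static void main(String[] args) {
        FlatArray vector = new FlatArray(new double[] {1, 6, 5, 1, 3, 2, 8, 1, 12}, 9);
        FlatArray batch = vector.reshape(9, 1); // same data, viewed as 9 rows of 1 column
        System.out.println(Arrays.toString(batch.shape)); // prints [9, 1]
        System.out.println(batch.get(8, 0)); // prints 12.0
    }
}
```

Even this small version has to handle shape validation and index arithmetic by hand, which gives a sense of how much work a general-purpose NDArray saves.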

Why should I use NDArray?

After the previous walkthrough, you should have some basic experience using NDArray in Java. To summarize, here are the three key advantages of using it:

Easy: Access to 60+ operators in Java with a simple input and the same output.
Fast: Full support for the most used deep learning frameworks, including TensorFlow, PyTorch, and MXNet. Your computation can be accelerated by MKLDNN on CPU, CUDA on GPU, and more.
Deep Learning ready: It supports high dimensional arrays and sparse NDArray inputs*. You can apply this toolkit on all platforms, including Apache Spark and Apache Beam, for large-scale data processing. It’s a perfect tool for data preprocessing and post-processing.

*Sparse currently only covers COO in PyTorch and CSR/Row_Sparse in MXNet.

About NDArray and DJL

After trying NDArray creation and operations, you might wonder how DJL implements NDArray to achieve these behaviors. In this section, we will briefly walk through the architecture of NDArray.

NDArray Architecture

As shown in the original article’s architecture diagram, there are three key layers to the NDArray. The Interface layer contains NDArray, a Java interface that defines what the NDArray should look like. We carefully evaluated it and made every function signature general and easy to use. The EngineProvider layer holds each engine’s implementation of NDArray. This layer serves as an interpretation layer that maps engine-specific behavior to NumPy behavior. As a result, every engine implementation behaves the same way NumPy does. In the C++ layer, we built JNI and JNA bindings that expose C++ methods for Java to call. This ensures we have enough methods to build the entire NDArray stack. It also ensures the best performance by calling directly from Java into C++, since all engines are implemented in C/C++.
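The split between an interface layer and per-engine implementations can be pictured with a small plain-Java sketch. Everything here is hypothetical (`ToyNDArray`, `DoubleEngineArray`, `FloatEngineArray`) and greatly simplified: the native C++ layer reached through JNI/JNA is left out, and both toy engines run on the JVM heap.

```java
import java.util.Arrays;

public class LayeredEngineSketch {

    /** Interface layer: the operations every engine must offer (tiny hypothetical subset). */
    interface ToyNDArray {
        ToyNDArray add(double value);

        double[] toArray();
    }

    /** Engine layer, variant 1: stores doubles. */
    static final class DoubleEngineArray implements ToyNDArray {
        private final double[] data;

        DoubleEngineArray(double[] data) {
            this.data = data.clone();
        }

        @Override
        public ToyNDArray add(double value) {
            double[] out = new double[data.length];
            for (int i = 0; i < data.length; i++) {
                out[i] = data[i] + value;
            }
            return new DoubleEngineArray(out);
        }

        @Override
        public double[] toArray() {
            return data.clone();
        }
    }

    /** Engine layer, variant 2: stores floats internally but exposes the same behavior. */
    static final class FloatEngineArray implements ToyNDArray {
        private final float[] data;

        FloatEngineArray(float[] data) {
            this.data = data.clone();
        }

        @Override
        public ToyNDArray add(double value) {
            float[] out = new float[data.length];
            for (int i = 0; i < data.length; i++) {
                out[i] = (float) (data[i] + value);
            }
            return new FloatEngineArray(out);
        }

        @Override
        public double[] toArray() {
            double[] out = new double[data.length];
            for (int i = 0; i < data.length; i++) {
                out[i] = data[i];
            }
            return out;
        }
    }

    /** User code depends only on the interface, so it works with any engine. */
    static double[] addTen(ToyNDArray array) {
        return array.add(10).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(addTen(new DoubleEngineArray(new double[] {1, 2, 3}))));
        System.out.println(Arrays.toString(addTen(new FloatEngineArray(new float[] {1f, 2f, 3f}))));
        // both lines print [11.0, 12.0, 13.0]
    }
}
```

The point of the design is that `addTen` never names an engine; swapping the backing implementation does not change the calling code.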

About DJL

Deep Java Library (DJL) is a deep learning framework written in Java, supporting both training and inference. DJL is built on top of modern deep learning frameworks (TensorFlow, PyTorch, MXNet, etc.). You can easily use DJL to train your model or deploy your favorite models from a variety of engines without any additional conversion. It contains a powerful ModelZoo design that allows you to manage trained models and load them in a single line. The built-in ModelZoo currently supports more than 70 pre-trained, ready-to-use models from GluonCV, HuggingFace, TorchHub and Keras. The addition of the NDArray makes DJL the best toolkit in Java to run your deep learning application. It can automatically identify the platform you are running on and figure out whether to leverage the GPU to run your application. As of the most recent release, 0.6.0, DJL officially supports MXNet 1.7.0, PyTorch 1.5.0 and TensorFlow 2.2.0. We also have experimental support for PyTorch on Android. Follow our GitHub, demo repository, Slack channel and Twitter for more documentation and examples of DJL!