AI Robotization with InterSystems IRIS Data Platform

Fixing the terminology

A robot need not be huge, humanoid, or even material (here we disagree with Wikipedia, although it softens its initial definition a paragraph later and admits that a robot can be virtual). From an algorithmic viewpoint, a robot is an automaton for the autonomous execution of concrete tasks. A light detector that switches street lights on at night is a robot. Email software that sorts messages into “external” and “internal” is also a robot.

Artificial intelligence (in an applied, narrow sense; Wikipedia again interprets it differently) is a set of algorithms for extracting dependencies from data. It does not execute any tasks on its own: for that, it has to be implemented as a concrete analytic process (input data, plus models, plus output data, plus process control). The analytic process acting as an “artificial intelligence carrier” can be launched by a human or by a robot. Either of the two can also stop it, and either can manage it.

Interaction with the environment

Artificial intelligence needs data that is suitable for analysis. When an analyst starts developing an analytic process, the analyst prepares the data for the model personally, usually by building a dataset with enough volume and features for model training and testing. Once the accuracy of the result (and, less often, its “local stability” over time) becomes satisfactory, a typical analyst considers the work done. Is that right? In reality, the work is only half done. What remains is to secure the “uninterrupted and efficient running” of the analytic process, and that is where our analyst may run into difficulties.

The tools used for developing artificial intelligence and machine learning mechanisms are, except in the simplest cases, not suited to efficient interaction with the external environment. For example, we can (for a short period of time) use Python to read and transform sensor data from a production process. But Python is not the right tool for monitoring the overall situation and switching control among several production processes, scaling the corresponding computation resources up and down, or analyzing and handling all types of “exceptions” (e.g., unavailability of a data source, infrastructure failure, user interaction issues). For that we need a data management and integration platform. And the more loaded and the more varied our analytic process is, the higher the bar we set for the platform’s integration and “DBMS” components. An analyst raised on scripting languages and traditional model-building environments (including utilities such as “notebooks”) will find it nearly impossible to give the analytic process an efficient production implementation.

Adaptability and adaptiveness

Environment changeability manifests itself in different ways. In some cases, the essence and nature of the things managed by artificial intelligence change (e.g., an enterprise enters new business areas, national and international regulators impose new requirements, customer preferences relevant to the enterprise evolve). In other cases, the information signature of the data coming from the external environment changes (e.g., new equipment with new sensors, higher-performance data transmission channels, new data “labeling” technologies become available).

Can an analytic process “reinvent itself” as the structure of the external environment changes? Let us simplify the question: how easy is it to adjust the analytic process when that structure changes? In our experience, the answer is plain and sad: in most known implementations (not ours!) it is necessary at least to rewrite the analytic process, and most probably the AI it contains as well. End-to-end rewriting may not always be the verdict, but programming to add something that reflects the new reality, or changing the “modeling part”, may well be needed. That can mean prohibitive overhead, especially if the environment changes frequently.

Agency: the limit of autonomy?

The reader may already have noticed that we are confronting artificial intelligence with an increasingly complex reality, while taking note of the possible “instrument-side consequences”, in the hope of finally being able to respond to the emerging challenges.

We are now approaching the need to equip an analytic process with a level of autonomy that lets it cope not only with the changeability of the environment but also with uncertainty about its state. No reference to the quantum nature of the environment is intended here (we will discuss that in one of our further publications); we simply consider the probability that an analytic process encounters the expected state, at the expected moment, in the expected “volume”. For example, the process “thought” it would manage to complete a model training run before new data arrived for the model to be applied to, but “failed” to complete it (e.g., because, for objective reasons, the training sample contained more records than usual). Another example: the labeling team has added a batch of new press material to the process and a vectorization model has been trained on that new material, while the neural network is still using the previous vectorization and treats some extremely relevant information as “noise”. Our experience shows that overcoming such situations requires splitting what used to be a single analytic process into several autonomous components and creating, for each of the resulting agent processes, its own “buffered projection” of the environment. Let us call this action (goodbye, Wikipedia) the agenting of an analytic process. And let us call agency the quality that an analytic process (or rather a system of analytic processes) acquires through agenting.
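One possible reading of a “buffered projection” is a private snapshot of the environment state that each agent refreshes on its own schedule and checks before acting. The sketch below illustrates that reading with the vectorization example; the class names, the `vectorizer_version` field and the staleness check are all hypothetical and are not taken from the implementation discussed in this article.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """Shared state that other agents may change at any time (hypothetical)."""
    vectorizer_version: int = 1


@dataclass
class Agent:
    name: str
    projection: dict = field(default_factory=dict)  # the agent's private snapshot

    def refresh(self, env: Environment) -> None:
        # Copy the part of the environment this agent depends on.
        self.projection = {"vectorizer_version": env.vectorizer_version}

    def is_stale(self, env: Environment) -> bool:
        # True when the environment has moved on since the last refresh.
        return self.projection.get("vectorizer_version") != env.vectorizer_version


env = Environment()
network = Agent("neural-network")
network.refresh(env)
stale_before = network.is_stale(env)   # snapshot matches the environment

env.vectorizer_version = 2             # another agent retrains the vectorizer
stale_after = network.is_stale(env)    # the network's snapshot is now outdated

network.refresh(env)                   # the network re-syncs before predicting
```

With a check like this, the neural network agent can notice that the vectorization it relies on has been superseded instead of silently treating new material as noise.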

A task for the robot

At this point, we will try to come up with a task that needs a robotized AI with all the qualities mentioned above. Ideas are not hard to find, given the wealth of interesting cases and solutions published on the Internet; we will simply reuse one such case and its solution (to obtain both the task and the solution formulation). The scenario we have chosen is the classification of postings (“tweets”) in the Twitter social network by their sentiment. To train the models, we have rather large samples of “labeled” tweets (i.e., with the sentiment specified), while classification will be performed on “unlabeled” tweets (i.e., without the sentiment specified):

Figure 1 Sentiment-based text classification (sentiment analysis) task formulation

An approach to creating mathematical models that learn from labeled texts and classify unlabeled texts of unknown sentiment is presented in a great example published on the Web.

The data for our scenario has been kindly made available on the Web.

With all of the above at hand, we could start to “assemble a robot”. However, we prefer to complicate the classical task by adding a condition: both labeled and unlabeled data are fed to the analytic process as standard-size files, and new files arrive as the process “consumes” the files already fed. Our robot will therefore need to begin operating on minimal volumes of training data and continually improve classification accuracy by repeating model training on gradually growing data volumes.
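The added condition can be sketched as a loop that retrains on an accumulating sample each time a new batch arrives. This is a minimal illustration only: the word-count model below is a toy stand-in for the neural network used in the scenario, and the class name, the batches and the texts are invented for the example.

```python
from collections import Counter


class WordCountClassifier:
    """Toy sentiment model: scores a text by word counts seen in each class."""

    def __init__(self):
        self.pos = Counter()
        self.neg = Counter()

    def train(self, labeled):
        # Retrain from scratch on the full accumulated sample.
        self.pos, self.neg = Counter(), Counter()
        for text, label in labeled:
            target = self.pos if label == 1 else self.neg
            target.update(text.lower().split())

    def prob_positive(self, text):
        # Smoothed share of "positive" evidence among the words of the text.
        p = n = 1.0
        for word in text.lower().split():
            p += self.pos[word]
            n += self.neg[word]
        return p / (p + n)


# Each inner list plays the role of one standard-size file of labeled tweets.
batches = [
    [("great day", 1), ("awful service", 0)],
    [("love this phone", 1), ("hate waiting", 0)],
    [("great phone love it", 1), ("awful awful delay", 0)],
]

model = WordCountClassifier()
accumulated = []
history = []  # size of the training sample at each training run
for batch in batches:
    accumulated.extend(batch)
    model.train(accumulated)
    history.append(len(accumulated))
```

After the last run the model has seen all six records, so `model.prob_positive("love great")` is well above 0.5 while `model.prob_positive("awful hate")` is well below it.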

To the InterSystems workshop

Taking the scenario just formulated as an example, we will demonstrate that InterSystems IRIS and ML Toolkit, a set of extensions, can robotize artificial intelligence and achieve efficient interaction with the external environment for the analytic processes we create, while keeping them adaptable, adaptive and agent (the “three A”).

Let us begin with agency. We deploy four business processes in the platform:

Figure 2 Configuration of an agent-based system of business processes with a component for interaction with Python

  • GENERATOR

    – as previously generated files are consumed by the other processes, generates new files with input data (labeled positive and negative tweets, as well as unlabeled tweets)

  • BUFFER

    – as already buffered records are consumed by the other processes, reads new records from the files created by GENERATOR and deletes each file once its records have been read

  • ANALYZER

    – consumes records from the unlabeled buffer, applies the trained RNN (recurrent neural network) to them, and transfers the “applied” records, with their “probability to be positive” values added, to the monitoring buffer; consumes records from the labeled (positive and negative) buffers and trains the neural network on them

  • MONITOR

    – consumes the records processed and transferred to its buffer by ANALYZER, evaluates the classification error metrics demonstrated by the neural network after the last training, and triggers new training by ANALYZER

Our agent-based system of processes can be illustrated as follows:

Figure 3 Data flows in the agent-based system
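The data flows between the four processes can be mimicked with in-memory queues. The sketch below is an assumption-laden stand-in, not the InterSystems IRIS implementation: the class and attribute names are hypothetical, the “model” is a set of words seen in positive tweets, signalling between processes is reduced to a single flag, and the true label is assumed to travel with each “unlabeled” record so that MONITOR has something to score against.

```python
from collections import deque


class AgentSystem:
    """In-memory stand-in for the four processes and the buffers between them."""

    def __init__(self):
        self.files = deque()         # written by GENERATOR, read by BUFFER
        self.labeled = deque()       # BUFFER -> ANALYZER (training records)
        self.unlabeled = deque()     # BUFFER -> ANALYZER (records to classify)
        self.monitoring = deque()    # ANALYZER -> MONITOR (scored records)
        self.positive_words = set()  # toy "model": words seen in positive tweets
        self.retrain = True          # set by MONITOR, cleared by ANALYZER
        self.training_runs = 0

    def generator(self, labeled, unlabeled):
        # GENERATOR: publish one new file of input data.
        self.files.append({"labeled": labeled, "unlabeled": unlabeled})

    def buffer(self):
        # BUFFER: read the records of one file, then drop ("delete") the file.
        if self.files:
            data = self.files.popleft()
            self.labeled.extend(data["labeled"])
            self.unlabeled.extend(data["unlabeled"])

    def analyzer(self):
        # ANALYZER: train when asked to, then score whatever is waiting.
        if self.retrain and self.labeled:
            while self.labeled:
                text, is_positive = self.labeled.popleft()
                if is_positive:
                    self.positive_words.update(text.split())
            self.training_runs += 1
            self.retrain = False
        while self.unlabeled:
            text, hidden_label = self.unlabeled.popleft()
            score = 1.0 if set(text.split()) & self.positive_words else 0.0
            self.monitoring.append((score, hidden_label))

    def monitor(self, target=0.9):
        # MONITOR: measure accuracy on scored records; request training if low.
        if not self.monitoring:
            return None
        hits = sum((score >= 0.5) == bool(label) for score, label in self.monitoring)
        accuracy = hits / len(self.monitoring)
        self.monitoring.clear()
        if accuracy < target:
            self.retrain = True
        return accuracy


def cycle(system, labeled, unlabeled):
    # One pass of data through all four processes.
    system.generator(labeled, unlabeled)
    system.buffer()
    system.analyzer()
    return system.monitor()


system = AgentSystem()
first = cycle(system,
              [("great day", 1), ("awful queue", 0)],
              [("lovely great weather", 1), ("lovely evening", 1)])
second = cycle(system,
               [("lovely surprise", 1)],
               [("lovely trip", 1), ("awful delay", 0)])
```

In this run the first pass scores only half of the records correctly, so MONITOR requests a new training; after the second training run the accuracy reaches the target and no further training is requested.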

All the processes in our system function independently of one another but listen to each other’s signals. For example, the signal for the GENERATOR process to start creating a new file with records is the deletion of the previous file by the BUFFER process.
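The file-deletion signal can be sketched with ordinary files: the producer writes only when the previous file is gone, and the consumer deletes a file once it has read it. The function names, the file name and the record format below are hypothetical; the actual processes run as business processes inside the platform.

```python
import os
import tempfile


def generator_step(path, batch_no):
    """Write the next file only if the previous one has been deleted."""
    if os.path.exists(path):
        return False  # previous file not consumed yet: stay idle
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(f"record from batch {batch_no}\n")
    return True


def buffer_step(path, records):
    """Read all records from the file, then delete it."""
    if not os.path.exists(path):
        return False
    with open(path, encoding="utf-8") as handle:
        records.extend(line.rstrip("\n") for line in handle)
    os.remove(path)  # the deletion is the signal the generator waits for
    return True


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "tweets.txt")
records = []
log = [
    generator_step(path, 1),     # file is written
    generator_step(path, 2),     # blocked: batch 1 has not been consumed
    buffer_step(path, records),  # batch 1 is read and its file deleted
    generator_step(path, 2),     # unblocked by the deletion
]
```

Neither function calls the other; the only coupling between them is the presence or absence of the file.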

Now let us look at adaptiveness. In our example, the adaptiveness of the analytic process is implemented by “encapsulating” the AI as a component that is independent of the logic of the carrier process and whose main functions, training and prediction, are isolated from one another:

Figure 4 Isolation of the AI’s main functions in an analytic process – training and prediction using mathematical models

Since the ANALYZER process fragment shown above is part of an “endless loop” (started at process startup and running until the whole agent-based system is shut down), and since the AI functions are executed concurrently, the process can adapt its use of AI to the situation: it trains models when the need arises and otherwise predicts with the available version of the trained models. The need to train the models is signaled by the adaptive MONITOR process, which functions independently of the ANALYZER process and applies its own criteria to estimate the accuracy of the models trained by ANALYZER:

Figure 5 Recognition of the model type and application of the respective accuracy metrics by MONITOR process
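A monitoring rule of this kind can be sketched as a lookup from model type to a metric and a “needs retraining” test. The two model types, the metrics chosen for them (accuracy and RMSE) and the thresholds are assumptions made for illustration; they are not the criteria actually used by the MONITOR process.

```python
import math


def accuracy(y_true, y_pred):
    # Share of predictions that match the true class.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    # Root mean squared error of numeric predictions.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# For each model type: the metric to apply and the test that flags a bad value.
METRICS = {
    "classification": (accuracy, lambda value, target: value < target),
    "regression": (rmse, lambda value, target: value > target),
}


def needs_retraining(model_type, y_true, y_pred, target):
    metric, is_bad = METRICS[model_type]
    return is_bad(metric(y_true, y_pred), target)


# A classifier that is right on 2 of 4 records misses a 0.8 accuracy target.
retrain_classifier = needs_retraining("classification", [1, 0, 1, 1], [1, 0, 0, 0], 0.8)

# A regressor with small errors stays within a 0.5 RMSE target.
retrain_regressor = needs_retraining("regression", [1.0, 2.0], [1.1, 1.9], 0.5)
```

Keeping the rule in a table means a new model type needs only a new entry, not a change to the process that sends the retraining signal.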

We continue with adaptability. An analytic process in InterSystems IRIS is a business process that has a graphical or XML representation in the form of a sequence of steps. The steps, in turn, can be sequences of other steps, loops, condition checks and other process controls. A step can execute code or transmit information (which can itself be code) for handling by other processes and external systems.

If an analytic process needs to change, we can change it either in the graphical editor or in the IDE. Changing it in the graphical editor allows the process logic to be adapted without programming:

Figure 6 ANALYZER process in the graphical editor with the menu open for adding process controls

Finally, interaction with the environment. In our case, the most important element of the environment is the mathematical toolset Python. For interaction with Python and R, corresponding functional extensions were developed: Python Gateway and R Gateway. Their key function is to enable comfortable interaction with a concrete toolset. We have already seen the component for interaction with Python in the configuration of our agent-based system. We have thus demonstrated that business processes containing AI implemented in the Python language can interact with Python.

The ANALYZER process, for instance, carries the model training and prediction functions implemented in InterSystems IRIS in the Python language, as shown below:

Figure 7 Model training function implemented in ANALYZER process in InterSystems IRIS using Python

Each step in this process is responsible for a specific interaction with Python: a transfer of input data from the InterSystems IRIS context to the Python context, a transfer of code to Python for execution, or a return of output data from the Python context to the InterSystems IRIS context.

The most frequent type of interaction in our example is the transfer of code for execution in Python:

Figure 8 Python code deployed in ANALYZER process in InterSystems IRIS is sent for execution to Python

In some interactions, output data is returned from the Python context to the InterSystems IRIS context:

Figure 9 Visual trace of ANALYZER process session with a preview of the output returned by Python in one of the process steps
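The three kinds of interaction (data in, code in, data out) can be emulated in plain Python with a separate namespace that stands in for the remote Python context. This is an emulation only: the `PythonContext` class and its method names are hypothetical and do not reproduce the Python Gateway API.

```python
class PythonContext:
    """Stand-in for a separate Python context reached through a gateway."""

    def __init__(self):
        self._namespace = {}

    def set_variable(self, name, value):
        # Transfer of input data into the Python context.
        self._namespace[name] = value

    def execute(self, code):
        # Transfer of code for execution inside the Python context.
        exec(code, self._namespace)

    def get_variable(self, name):
        # Return of output data from the Python context to the caller.
        return self._namespace[name]


context = PythonContext()
context.set_variable("tweets", ["good morning", "bad traffic again"])
context.execute("lengths = [len(t.split()) for t in tweets]")
result = context.get_variable("lengths")
```

The caller never touches the variables inside the context directly; everything passes through one of the three calls, which mirrors the one-interaction-per-step structure of the process described above.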

Launching the robot

Launching the robot right here in this article? Why not: here is the recording of our webinar in which (besides other interesting AI stories relevant to robotization!) the example discussed in this article was demoed. Webinar time is always limited, unfortunately, and we still prefer to showcase our work as illustratively, though as briefly, as possible; we are therefore sharing below a more complete overview of the outputs produced (7 training runs, including the initial training, instead of just 3 in the webinar):

Figure 10 Robot reaching a steady AUC above 0.8 on prediction

These results are in line with our intuitive expectations: as the training dataset fills with “labeled” positive and negative tweets, the accuracy of our classification model improves (as shown by the gradual increase of the AUC values on prediction).
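For readers less familiar with the metric, AUC can be computed as the share of (positive, negative) record pairs in which the positive record receives the higher “probability to be positive”, with ties counted as one half. The scores below are invented to illustrate the calculation and are not outputs of the experiment.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative record")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 0, 1, 0, 0]
early_scores = [0.6, 0.4, 0.5, 0.7, 0.3, 0.6]  # model after little training
later_scores = [0.8, 0.6, 0.4, 0.9, 0.2, 0.5]  # model after more training

early_auc = auc(labels, early_scores)  # 6.5 of 9 pairs ranked correctly
later_auc = auc(labels, later_scores)  # all 9 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why a value settling above 0.8 indicates a usable classifier.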

What conclusions can we draw at the end of the article?

• InterSystems IRIS is a powerful platform for robotizing processes that involve artificial intelligence

• Artificial intelligence can be implemented both in the external environment (e.g., Python or R with their modules containing ready-to-use algorithms) and in the InterSystems IRIS platform (using native function libraries or by writing algorithms in the Python and R languages). InterSystems IRIS secures interaction with external AI toolsets, allowing their capabilities to be combined with its native functionality

• InterSystems IRIS robotizes AI by applying the “three A”: adaptable, adaptive and agent business processes (in other words, analytic processes)

• InterSystems IRIS operates external AI (Python, R) via kits of specialized interactions: transfer/return of data, transfer of code for execution, etc. One analytic process can interact with several mathematical toolsets

• InterSystems IRIS consolidates input and output modeling data on a single platform and maintains historization and versioning of calculations

• Thanks to InterSystems IRIS, artificial intelligence can be used both as a specialized analytic mechanism and built into OLTP and integration solutions

For those who have read this article and become interested in the capabilities of InterSystems IRIS as a platform for developing and deploying machine learning and artificial intelligence mechanisms, we propose a further discussion of the scenarios relevant to your company and a collaborative definition of the next steps. The contact e-mail of our AI/ML expert team is MLToolkit@intersystems.com.

Translated from: https://habr.com/en/company/intersystems/blog/478822/
