Object representation for a machine learning system based on lattice theory

cullen2012

Original article: https://habr.com/en/post/510746/

This is the fourth article in a series (see also the first, second, and third) describing 'VKF-system', a machine learning system based on lattice theory. The program uses Markov chain algorithms to generate causes of the target property by computing a random subset of similarities between subsets of training objects. This article describes bitset representations of objects that allow these similarities to be computed as bit-wise multiplications of the corresponding encodings. Objects with discrete attributes require a technique from Formal Concept Analysis. Objects with continuous attributes call for logistic regression, an entropy-based separation of their ranges into subintervals, and a representation under which the similarity of subintervals is their convex envelope.

1 Discrete attributes

To encode an object with only discrete attributes, we first compute auxiliary bitset representations of the values of each attribute. We assume that an expert can relate these values by some 'general/special' partial order. The ordering must form a lower semilattice after adding the special value 'null' (abbreviated '_' in some cases), which denotes a trivial (absent) similarity between the values of the given attribute in the compared objects.

The representation of a whole object is the concatenation of the encodings of its attribute values in some fixed order. Bit-wise multiplication over the long bitset strings then reduces to multiplication over the values of each attribute. Hence the encoding must turn similarity between values into bit-wise multiplication.

Since any lower semilattice easily converts into a lattice (by adding a top element if it is absent), Formal Concept Analysis (FCA) provides all the essential tools.

The modern formulation of the Fundamental Theorem of FCA asserts the following. For every finite lattice $\langle L,\wedge,\vee\rangle$ let $G$ be a (super)set of all $\wedge$-irreducible elements and $M$ be a (super)set of all $\vee$-irreducible elements. For the relation $I\subseteq G\times M$ defined by $gIm\Leftrightarrow g\geq m$, the sample $(G,M,I)$ generates the lattice of all candidates $L(G,M,I)$, which is isomorphic to the original lattice $\langle L,\wedge,\vee\rangle$.

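An element $x\in L$ of the lattice is called $\wedge$-irreducible if $x\neq\top$ (the top element) and, for all $y,z\in L$, $x<y$ and $x<z$ imply $x<y\wedge z$. An element $x\in L$ of the lattice is called $\vee$-irreducible if $x\neq\bot$ (the bottom element) and, for all $y,z\in L$, $y<x$ and $z<x$ imply $y\vee z<x$.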

In the lattice diagram that accompanies the original article, red vertices mark the $\wedge$-irreducible elements and blue vertices mark the $\vee$-irreducible ones.

The Fundamental Theorem (initially proved by Prof. Rudolf Wille through the sample $(L,L,\geq)$) implies a minimal sample of the form

G\M	h	i	j	k
a	1	1	1	0
b	0	1	1	1
c	1	1	0	0
d	1	0	1	0
f	0	1	0	1
g	0	0	1	1

to generate the lattice of all candidates, which is isomorphic to the original lattice.

Note that Wille's sample uses 121 bits, whereas the new one needs only 24 bits (6 rows of 4 bits each).
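As a small illustration of how the rows of the table above act as codes, the following sketch computes similarities as bit-wise AND. The bit order (h, i, j, k) and the helper names `similarity` and `show` are assumptions made for this example; they are not part of the 'vkf' or 'vkfencoder' libraries.

```python
# Codes copied from the rows of the table above; bit order is (h, i, j, k).
CODES = {
    "a": 0b1110, "b": 0b0111, "c": 0b1100,
    "d": 0b1010, "f": 0b0101, "g": 0b0011,
}

def similarity(x: str, y: str) -> int:
    """Similarity of two values as bit-wise multiplication (AND) of their codes."""
    return CODES[x] & CODES[y]

def show(code: int) -> str:
    """Render a code as a 4-character bit string."""
    return format(code, "04b")

print(show(similarity("a", "b")))  # 0110: the common part of a and b
print(show(similarity("a", "c")))  # 1100: equals the code of c, so c lies below a
print(show(similarity("c", "g")))  # 0000: the trivial (absent) similarity
```

The all-zero result plays the role of the 'null' value: the two compared values have nothing in common.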

The author proposed the following algorithm to encode values by bitsets:

1. Sort the elements of the semilattice topologically.
2. In the order matrix, look for columns that coincide with the bit-wise multiplication of previous ones (every such column corresponds to a $\vee$-reducible element).
3. Remove all the ($\vee$-reducible) columns found.
4. The rows of the remaining matrix form the codes of the corresponding values.
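The four steps can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the code of the vkf or vkfencoder libraries: the function name `encode_values`, the input format (pairs of the Hasse diagram) and the toy semilattice are hypothetical, and Python sets stand in for bitsets, so set intersection plays the role of bit-wise multiplication.

```python
from itertools import chain

def encode_values(covers):
    """covers: iterable of (lower, upper) pairs of the Hasse diagram."""
    elements = sorted(set(chain.from_iterable(covers)))
    lower = {e: set() for e in elements}
    for lo, up in covers:
        lower[up].add(lo)

    def down_set(e):
        # All elements less than or equal to e.
        seen, stack = {e}, [e]
        while stack:
            for p in lower[stack.pop()]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    below = {e: down_set(e) for e in elements}
    # Step 1: topological sort (a larger element has a strictly larger down-set).
    order = sorted(elements, key=lambda e: (len(below[e]), e))
    # Order matrix: column[y] is the set of rows x with x >= y.
    column = {y: frozenset(x for x in order if y in below[x]) for y in order}
    # Steps 2-3: drop every column equal to the product of the columns of the
    # elements strictly below it (all of them precede it in the sorted order).
    kept = []
    for y in order:
        product = frozenset(order)
        for z in below[y] - {y}:
            product &= column[z]
        if product != column[y]:
            kept.append(y)
    # Step 4: rows of the remaining matrix are the codes of the values.
    codes = {x: "".join("1" if x in column[y] else "0" for y in kept)
             for x in order}
    return kept, codes

# Toy semilattice: '_' is the bottom, 'xy' is more special than 'x' and 'y'.
kept, codes = encode_values(
    [("_", "x"), ("_", "y"), ("_", "z"), ("x", "xy"), ("y", "xy")]
)
print(kept)   # ['x', 'y', 'z']
print(codes)  # {'_': '000', 'x': '100', 'y': '010', 'z': '001', 'xy': '110'}
```

In this toy example the column of the bottom element (all ones) and the column of 'xy' (the product of the columns of 'x' and 'y') are dropped, and the bit-wise AND of the codes of 'xy' and 'x' gives back the code of 'x'.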

This algorithm is part of both the 'vkfencoder' (as the vkfencoder.XMLImport class constructor) and 'vkf' (as the vkf.FCA class constructor) CPython libraries. The difference lies in the data source: vkf.FCA reads a MariaDB database table, whereas vkfencoder.XMLImport reads an XML file.

2 Continuous attributes

We discuss the steps of encoding continuous attributes in the order of their invention. First, we apply an idea from the C4.5 decision tree learning system to separate a variable's domain into subintervals through entropy considerations. Next, we encode the occurrence of a value in a subinterval by a bitset in such a way that bit-wise multiplication corresponds to the convex envelope of the compared subintervals. Finally, we consider how to combine several attributes to obtain their disjunctions and implications. The key is to compute a logistic regression between them.

2.1 Entropy approach

The range of a continuous attribute must be separated into several subintervals that may influence the target property differently. To choose the thresholds, we relate the attribute to the target property through entropy.

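Let $G=G^{+}\cup G^{-}$ be a disjoint union of training examples $G^{+}$ and counter-examples $G^{-}$. An interval $[a,b)\subseteq\mathbb{R}$ of values of a continuous attribute $V:G\to\mathbb{R}$ generates three subsets $G^{+}[a,b)=\{g\in G^{+}: a\leq V(g)<b\}$, $G^{-}[a,b)=\{g\in G^{-}: a\leq V(g)<b\}$, and $G[a,b)=\{g\in G: a\leq V(g)<b\}$.

The entropy of the interval $[a,b)$ of values of the continuous attribute $V$ is

$$\mathrm{ent}[a,b)=-\frac{|G^{+}[a,b)|}{|G[a,b)|}\log_2\frac{|G^{+}[a,b)|}{|G[a,b)|}-\frac{|G^{-}[a,b)|}{|G[a,b)|}\log_2\frac{|G^{-}[a,b)|}{|G[a,b)|}.$$

The mean information for a partition $a<r<b$ of the interval $[a,b)$ of values of the continuous attribute $V$ is

$$\mathrm{inf}[a,r,b)=\frac{|G[a,r)|}{|G[a,b)|}\,\mathrm{ent}[a,r)+\frac{|G[r,b)|}{|G[a,b)|}\,\mathrm{ent}[r,b).$$

A threshold is a value $V=r$ with minimal mean information.

For a continuous attribute $V$, denote $\min\{V\}$ by $v_0$ and let $v_{l+1}$ be an arbitrary number greater than $\max\{V\}$. The thresholds $v_1<\ldots<v_l$ are computed sequentially by splitting the subinterval with the greatest entropy.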
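A single splitting step can be sketched as follows. The function names `entropy` and `best_threshold` and the toy data are assumptions made for this illustration; labels are 1 for examples and 0 for counter-examples, and only one threshold is chosen here, whereas the full procedure repeats the step on the subinterval with the greatest entropy.

```python
from math import log2

def entropy(pos: int, neg: int) -> float:
    """Entropy of an interval holding `pos` examples and `neg` counter-examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * log2(p)
    return h

def best_threshold(values, labels):
    """Return (r, mean information) for the split [a, r), [r, b) with minimal
    mean information; candidate thresholds are the observed values."""
    data = list(zip(values, labels))
    n = len(data)
    best = None
    for r in sorted(set(values))[1:]:  # skip the minimum: left part must be non-empty
        left = [k for v, k in data if v < r]
        right = [k for v, k in data if v >= r]
        info = (len(left) / n) * entropy(sum(left), len(left) - sum(left)) \
             + (len(right) / n) * entropy(sum(right), len(right) - sum(right))
        if best is None or info < best[1]:
            best = (r, info)
    return best

# Toy attribute: small values are counter-examples, large values are examples.
print(best_threshold([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))  # (10, 0.0)
```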

2.2 Bitset encodings for convex envelope

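We represent a continuous attribute value by a bitset string of length $2l$, where $l$ is the number of thresholds. The bitset may be considered as a string of indicator (Boolean) variables

$$\delta_i^{V}(g)=1\Leftrightarrow V(g)\geq v_i,\qquad \sigma_i^{V}(g)=1\Leftrightarrow V(g)<v_i,$$

where $1\leq i\leq l$.

Then the string $\delta_1^{V}(g)\ldots\delta_l^{V}(g)\,\sigma_1^{V}(g)\ldots\sigma_l^{V}(g)$ is the bitset representation of the continuous attribute $V$ on the element $g\in G$.

The next Lemma asserts that the result of bit-wise multiplication of bitset representations is the convex envelope of its arguments' intervals.

Let $\delta_1^{(1)}\ldots\delta_l^{(1)}\sigma_1^{(1)}\ldots\sigma_l^{(1)}$ represent $v_i\leq V<v_j$ and $\delta_1^{(2)}\ldots\delta_l^{(2)}\sigma_1^{(2)}\ldots\sigma_l^{(2)}$ represent $v_n\leq V<v_m$. Then

$$(\delta_1^{(1)}\cdot\delta_1^{(2)})\ldots(\delta_l^{(1)}\cdot\delta_l^{(2)})\,(\sigma_1^{(1)}\cdot\sigma_1^{(2)})\ldots(\sigma_l^{(1)}\cdot\sigma_l^{(2)})$$

corresponds to $\min\{v_i,v_n\}\leq V<\max\{v_j,v_m\}$.

Note that the trivial similarity $0\ldots 0$ corresponds to the trivial condition $v_0\leq V<v_{l+1}$.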
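The Lemma can be checked on a small example. In the sketch below the first half of the bit list holds indicators of "value >= threshold" and the second half holds indicators of "value < threshold"; the function names `encode`, `multiply` and `decode` and the thresholds 2, 5, 8 are hypothetical choices for this illustration.

```python
def encode(value, thresholds):
    """Bits for 'value >= v_i' followed by bits for 'value < v_i'."""
    lower_bits = [int(value >= v) for v in thresholds]
    upper_bits = [int(value < v) for v in thresholds]
    return lower_bits + upper_bits

def multiply(x, y):
    """Bit-wise multiplication of two encodings."""
    return [p & q for p, q in zip(x, y)]

def decode(bits, thresholds):
    """Interval [low, high) described by the bits; None means that no
    threshold bounds the value on that side (the trivial bound)."""
    l = len(thresholds)
    lower_bits, upper_bits = bits[:l], bits[l:]
    low = max((v for v, b in zip(thresholds, lower_bits) if b), default=None)
    high = min((v for v, b in zip(thresholds, upper_bits) if b), default=None)
    return low, high

thresholds = [2, 5, 8]
x = encode(3, thresholds)   # value in [2, 5)
y = encode(6, thresholds)   # value in [5, 8)
print(decode(x, thresholds))               # (2, 5)
print(decode(y, thresholds))               # (5, 8)
print(decode(multiply(x, y), thresholds))  # (2, 8): the convex envelope
print(decode(multiply(encode(1, thresholds), encode(9, thresholds)), thresholds))
# (None, None): all bits are zero, the trivial similarity
```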

2.3 Relations between continuous attributes

The FCA-based approach to machine learning naturally considers a conjunction of several binary attributes as a possible cause of the target property. For a discrete attribute, an expert can express a disjunction of values through additional values (see the lattice structures in Section 1). The case of continuous attributes is different, so we need a technique that covers it too.

The key is the following Lemma.

The disjunction $p_1\vee\ldots\vee p_k$ of propositional variables (with truth values 0 and 1) is equivalent to the inequality $p_1+\ldots+p_k>\theta$ for any $0<\theta<1$.
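The statement is easy to verify exhaustively for a few variables: a sum of 0/1 values exceeds a threshold strictly between 0 and 1 exactly when at least one of them equals 1. The function name `exceeds` and the sampled thresholds are assumptions for this check.

```python
from itertools import product

def exceeds(bits, threshold):
    """True when the sum of 0/1 truth values is above the threshold."""
    return sum(bits) > threshold

# Compare with the ordinary disjunction on all assignments of three variables.
for threshold in (0.01, 0.5, 0.99):
    for bits in product((0, 1), repeat=3):
        assert exceeds(bits, threshold) == any(bits)
print("disjunction and threshold inequality agree")
```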

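Since we restrict ourselves to two target classes, we look for a classifier.

A classifier is a map $c:\mathbb{R}^d\to\{0,1\}$, where $\mathbb{R}^d$ is the domain of objects to classify (described by $d$ continuous attributes) and $\{0,1\}$ are the target labels.

As usual, we assume the existence of a probability distribution of $\langle X,K\rangle\in\mathbb{R}^d\times\{0,1\}$ that can be decomposed as

$$p_{X,K}(x,k)=p_X(x)\cdot p_{K\mid X}(k\mid x),$$

where $p_X(x)$ is the marginal distribution of objects and $p_{K\mid X}(k\mid x)$ is the conditional distribution of labels for a given object, i.e. for every $x\in\mathbb{R}^d$ the equality

$$p_{K\mid X}(k\mid x)=P\{K=k\mid X=x\}$$

holds.

The error probability of a classifier $c$ is

$$R(c)=P\{c(X)\neq K\}.$$

The Bayes classifier with respect to $p_{K\mid X}$ corresponds to

$$b(x)=1\Leftrightarrow p_{K\mid X}(1\mid x)>\tfrac{1}{2}>p_{K\mid X}(0\mid x).$$

We recall the well-known theorem on the optimality of the Bayes classifier.

The Bayes classifier $b$ has the minimal error probability:

$$\forall c\;\bigl[R(b)=P\{b(X)\neq K\}\leq R(c)\bigr].$$

Bayes' theorem implies

$$p_{K\mid X}(1\mid x)=\frac{p_{X\mid K}(x\mid 1)\,P\{K=1\}}{p_{X\mid K}(x\mid 1)\,P\{K=1\}+p_{X\mid K}(x\mid 0)\,P\{K=0\}}=\frac{1}{1+\exp\{-a(x)\}}=\sigma(a(x)),$$

where $a(x)=\log\dfrac{p_{X\mid K}(x\mid 1)\,P\{K=1\}}{p_{X\mid K}(x\mid 0)\,P\{K=0\}}$ and $\sigma(y)=\dfrac{1}{1+\exp\{-y\}}$ is the well-known logistic function.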

2.4 Logistic regression between attributes

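Let us approximate the unknown $a(x)$ by a linear combination $w^{T}\varphi(x)$ of basis functions $\varphi_i:\mathbb{R}^d\to\mathbb{R}$ ($i=1,\ldots,m$) with respect to unknown weights $w\in\mathbb{R}^m$.

For the training sample $\langle x_1,k_1\rangle,\ldots,\langle x_n,k_n\rangle$ introduce the signs $t_j=2k_j-1$. Then

$$P\{t_j\mid x_j,w\}=\sigma\bigl(t_j\,w^{T}\varphi(x_j)\bigr).$$

Hence the logarithm of the likelihood

$$L(w_1,\ldots,w_m)=-\sum_{j=1}^{n}\log\Bigl[1+\exp\Bigl\{-t_j\sum_{i=1}^{m}w_i\varphi_i(x_j)\Bigr\}\Bigr]\to\max$$

is concave.

The Newton-Raphson method leads to the iterative procedure

$$w_{t+1}=w_t-\bigl(\nabla_w\nabla_w^{T}L(w_t)\bigr)^{-1}\nabla_w L(w_t).$$

With the help of $s_j=\dfrac{1}{1+\exp\{t_j\,w^{T}\varphi(x_j)\}}$ we obtain

$$\nabla_w L(w)=\Phi^{T}\,\mathrm{diag}(t_1,\ldots,t_n)\,s,\qquad \nabla_w\nabla_w^{T}L(w)=-\Phi^{T}R\,\Phi,$$

where $\Phi$ is the $n\times m$ matrix with entries $\Phi_{ji}=\varphi_i(x_j)$, $R=\mathrm{diag}(s_1(1-s_1),\ldots,s_n(1-s_n))$ is the diagonal matrix with elements $s_j(1-s_j)$, and $\mathrm{diag}(t_1,\ldots,t_n)\,s$ is the vector with coordinates $t_js_j$. Therefore

$$w_{t+1}=w_t+(\Phi^{T}R\,\Phi)^{-1}\Phi^{T}\mathrm{diag}(t_1,\ldots,t_n)\,s=(\Phi^{T}R\,\Phi)^{-1}\Phi^{T}R\,z,$$

where $z=\Phi w_t+R^{-1}\mathrm{diag}(t_1,\ldots,t_n)\,s$ and $w_t$ are the iteratively calculated weights.

As usual, ridge regression with a regularization parameter $\lambda>0$ helps to avoid ill-conditioned situations:

$$w_{t+1}=(\Phi^{T}R\,\Phi+\lambda I)^{-1}\Phi^{T}R\,z.$$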
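The ridge-regularized Newton-Raphson iteration can be sketched in plain Python. The update solved at each step is $(\Phi^{T}R\,\Phi+\lambda I)\,w_{t+1}=\Phi^{T}R\,\Phi\,w_t+\Phi^{T}\mathrm{diag}(t)\,s$, with signs $t_j=2k_j-1$, $s_j=1/(1+\exp\{t_j w^{T}\varphi(x_j)\})$ and $R=\mathrm{diag}(s_j(1-s_j))$. This is an illustration under stated assumptions and not the code of 'VKF-system': the names `solve` and `fit_logistic`, the value of the ridge parameter, the fixed number of steps and the toy data are all hypothetical.

```python
from math import exp

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_logistic(xs, ks, ridge=0.1, steps=25):
    """xs: attribute vectors, ks: labels 0/1. Returns weights (constant first)."""
    phi = [[1.0] + list(x) for x in xs]   # basis: constant 1 and the attributes
    t = [2 * k - 1 for k in ks]           # signs t_j = 2 k_j - 1
    m = len(phi[0])
    w = [0.0] * m
    for _ in range(steps):
        a = [sum(wi * fi for wi, fi in zip(w, row)) for row in phi]
        s = [1.0 / (1.0 + exp(tj * aj)) for tj, aj in zip(t, a)]
        # h = Phi^T R Phi with R = diag(s_j (1 - s_j))
        h = [[sum(row[i] * row[j] * sj * (1.0 - sj) for row, sj in zip(phi, s))
              for j in range(m)] for i in range(m)]
        # Right-hand side Phi^T R z = h w + Phi^T diag(t) s
        rhs = [sum(h[i][j] * w[j] for j in range(m))
               + sum(row[i] * tj * sj for row, tj, sj in zip(phi, t, s))
               for i in range(m)]
        for i in range(m):
            h[i][i] += ridge              # ridge term: add lambda to the diagonal
        w = solve(h, rhs)
    return w

# Toy data: larger attribute values tend to carry label 1.
w = fit_logistic([[0], [1], [2], [3], [4], [5]], [0, 0, 1, 0, 1, 1])
print(w)  # [constant weight, attribute weight]; the attribute weight is positive
```

A fixed number of steps keeps the sketch short; a practical implementation would stop when the weights no longer change noticeably.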

In the computer program 'VKF-system' we use the standard basis: the constant 1 and the attributes themselves.

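Finally, we need a criterion for the significance of the regression. For logistic regression, two types of criteria were applied. Below, $L_k$ denotes the maximal log-likelihood of the regression on the constant 1 and the attributes $V_1,\ldots,V_k$, and $n$ is the size of the training sample.

The Cox-Snell criterion declares attribute $V_k$ significant if

$$1-\exp\bigl\{2\,(L_{k-1}-L_k)/n\bigr\}$$

is at least a prescribed threshold.

The McFadden criterion declares attribute $V_k$ significant if

$$1-\frac{L_k}{L_{k-1}}$$

is at least a prescribed threshold.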
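Both criteria compare the log-likelihood of the regression with and without the attribute in question. The sketch below uses the standard definitions of the Cox-Snell and McFadden pseudo-$R^2$ measures; applying them to the pair of models "without the attribute" and "with the attribute" is an assumption of this example, as are the function names and the sample numbers.

```python
from math import exp

def cox_snell(loglik_without: float, loglik_with: float, n: int) -> float:
    """Cox-Snell pseudo-R^2 for two nested models fitted on n observations."""
    return 1.0 - exp(2.0 * (loglik_without - loglik_with) / n)

def mcfadden(loglik_without: float, loglik_with: float) -> float:
    """McFadden pseudo-R^2 for two nested models."""
    return 1.0 - loglik_with / loglik_without

# Hypothetical log-likelihoods of a regression without and with an attribute.
print(cox_snell(-10.0, -10.0, 20))  # 0.0: the attribute adds nothing
print(mcfadden(-10.0, -5.0))        # 0.5
```

An attribute would then be declared significant when the chosen measure reaches a prescribed threshold.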

Conclusion

'VKF-system' was applied to the Wine Quality dataset from the UCI Machine Learning Repository (University of California, Irvine). The experiments demonstrated the prospects of the proposed approach. For high-quality red wines (with rating >7), all examples were classified correctly.

The disjunction situation (from Section 2.3) arose for the relationship between 'alcohol' and 'sulphates'. The positive (although slightly different) weights correspond to the different measurement scales of the attributes, and the threshold was strictly between 0 and 1. The situation with 'citric acid' and 'alcohol' was similar.

The situation with the pair ('pH', 'alcohol') was radically different. The weight of 'alcohol' was positive, whereas the weight of 'pH' was negative. But with an obvious logical transformation we get the implication ('pH' $\Rightarrow$ 'alcohol').

The author would like to thank his colleagues and students for their support and stimulus.