A Web Server for Machine Learning: 'VKF-solver'

The training of popular Machine Learning systems requires constantly increasing volumes of samples, and such systems are also unable to explain why a particular decision was made. Structural approaches to Machine Learning avoid these drawbacks; this article describes the software implementation of one of them. This is an English translation of the original post by the author.


We describe one of the Russian approaches to Machine Learning, called the «VKF-method of Machine Learning based on Lattice Theory». The origin and choice of the name are explained at the end of this article.


1. Method description

The initial system was created by the author as a console C++ application; it then gained support for MariaDB DBMS databases (through the mariadb++ library) and was finally converted into a CPython library (using the pybind11 package).


Several datasets from the UCI Machine Learning Repository were selected to validate the concept. The mushrooms dataset contains descriptions of 8124 mushrooms of North America, and the system achieved a 100% result on it. More precisely, the initial data was randomly divided into a training sample (2,088 edible and 1,944 poisonous mushrooms) and a test sample (2,120 edible and 1,972 poisonous mushrooms). After computing about 100 hypotheses about the causes of edibility, all test cases were predicted correctly. Since the algorithm uses a coupled Markov chain, the sufficient number of hypotheses may vary from run to run; often 50 random hypotheses were enough. Note that when generating the causes of poisonousness, the number of required hypotheses groups around 120; however, all test cases are predicted correctly in this case too.

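Such a random split is easy to reproduce; here is a minimal Python sketch, assuming a CSV export 'mushrooms.csv' of the dataset with a 'class' column marked 'e' (edible) or 'p' (poisonous). The file layout is an assumption, not part of the described system.

# Hypothetical sketch: random train/test split of the mushrooms data.
# Assumes 'mushrooms.csv' with a 'class' column ('e' = edible, 'p' = poisonous).
import csv
import random

with open('mushrooms.csv', newline='') as f:
    rows = list(csv.DictReader(f))

random.shuffle(rows)
half = len(rows) // 2
train, test = rows[:half], rows[half:]

print(sum(r['class'] == 'e' for r in train), 'edible mushrooms in train')
print(sum(r['class'] == 'p' for r in test), 'poisonous mushrooms in test')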

Kaggle.com hosts a Mushroom Classification competition where quite a few authors have achieved 100% accuracy. However, most of the solutions are neural networks. Our approach allows a mushroom picker to memorize only about 50 rules. Moreover, most features are insignificant, hence each hypothesis is a conjunction of a small number of values of essential features, which makes them easy to remember. After that, a human can go mushroom picking without being afraid of taking a toadstool or skipping an edible mushroom.


Here is a positive hypothesis that leads to the assumption that a mushroom is edible:


[('gill_attachment', 'free'), ('gill_spacing', 'close'), ('gill_size', 'broad'), ('stalk_shape', 'enlarging'), ('stalk_surface_below_ring', 'scaly'), ('veil_type', 'partial'), ('veil_color', 'white'), ('ring_number', 'one'), ('ring_type', 'pendant')]


Please note that only 9 of the 22 features are listed, since the similarity between the edible mushrooms that generate this cause is empty on the remaining 13 attributes.

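To make the notion of «similarity» concrete: each mushroom description can be viewed as a set of (feature, value) pairs, and the similarity of several descriptions is their intersection. A minimal sketch (the two fragments below are invented):

# Sketch: similarity of descriptions as set intersection.
a = {('gill_size', 'broad'), ('ring_number', 'one'), ('veil_color', 'white')}
b = {('gill_size', 'broad'), ('ring_number', 'two'), ('veil_color', 'white')}

similarity = a & b   # the features on which both examples agree
print(sorted(similarity))
# [('gill_size', 'broad'), ('veil_color', 'white')]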

The second dataset was SPECT Hearts. There, the accuracy of predicting test examples reached 86.1%, slightly higher than the result (84%) of the CLIP3 Machine Learning system (Cover Learning with Integer Programming, version 3) used by the authors of the data. I believe that, due to the structure of the descriptions of heart tomograms, which are already pre-encoded by binary attributes, it is impossible to significantly improve the quality of the forecast.


Recently the author discovered (and implemented) an extension of his approach to the processing of data described by continuous (numeric) features. In some ways, this approach is similar to the C4.5 system of Decision Tree Learning. The variant was tested on the Wine Quality dataset, which describes the quality of Portuguese wines. The results are encouraging: if you take high-quality red wines, the hypotheses fully explain their high ratings. A sketch of the underlying threshold idea follows.

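The continuous case can be illustrated by threshold encoding: a numeric feature is replaced by binary attributes of the form «value exceeds threshold», in the spirit of C4.5 splits. A toy sketch (the thresholds below are invented; in the system, thresholds and their number are experiment parameters):

# Sketch: turn a numeric feature into binary attributes via thresholds.
def encode_by_thresholds(value, thresholds):
    return tuple(value >= t for t in thresholds)

alcohol_thresholds = [9.5, 11.0, 12.5]   # invented values
print(encode_by_thresholds(10.2, alcohol_thresholds))   # (True, False, False)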

2. Framework choice

Students at the Intelligent Systems Department of RSUH are now developing a series of web servers for different research areas (using Nginx + Gunicorn + Django).


Here I'll describe a different variant (based on aiohttp, aiojobs, and aiomysql). The aiomcache module was rejected due to well-known security problems.


The proposed variant has several advantages:


  1. it uses the asynchronous framework aiohttp;
  2. it admits Jinja2 templates;
  3. it works with a pool of DB connections through aiomysql;
  4. it spawns background jobs via aiojobs.aiohttp.spawn.

It has obvious disadvantages (with respect to Django):


  1. no Object Relational Mapping (ORM);
  2. more difficult integration with Nginx as a proxy;
  3. no Django Template Language (DTL).

Each of the two options targets a different strategy of working with the web server. The synchronous strategy (in Django) is aimed at a single-user mode, in which an expert works with a single database at any given time. Although the probabilistic procedures of the VKF method parallelize well, it is still theoretically possible that the Machine Learning procedures will take a significant amount of time. Therefore, the second option is aimed at several experts, each of whom can simultaneously work (in different browser tabs) with different databases that differ not only in data, but also in the way the data is represented (different lattices on the values of discrete features, different significant regressions, and different numbers of thresholds for continuous ones). In this case, after starting a VKF computation in one tab, the expert can switch to another, where she can prepare or analyze an experiment with other data and/or parameters.


There is an auxiliary (service) database 'vkf' with two tables, 'users' and 'experiments', to account for multiple users, experiments, and the different stages they are at. The table 'users' stores the logins and passwords of all registered users. The table 'experiments' saves, in addition to the names of the auxiliary and main tables of each experiment, the readiness status of these tables. We rejected the aiohttp_session module, since we still need to use the Nginx proxy server to protect critical data.


The structure of the table 'experiments' is the following (a reconstructed DDL sketch is given after the list):


  • id int(11) NOT NULL PRIMARY KEY
  • expName varchar(255) NOT NULL
  • encoder varchar(255)
  • goodEncoder tinyint(1)
  • lattices varchar(255)
  • goodLattices tinyint(1)
  • complex varchar(255)
  • goodComplex tinyint(1)
  • verges varchar(255)
  • goodVerges tinyint(1)
  • vergesTotal int(11)
  • trains varchar(255) NOT NULL
  • goodTrains tinyint(1)
  • tests varchar(255)
  • goodTests tinyint(1)
  • hypotheses varchar(255) NOT NULL
  • goodHypotheses tinyint(1)
  • type varchar(255) NOT NULL
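
For reference, the listing above corresponds roughly to the following DDL. This is a reconstruction: AUTO_INCREMENT and other table options are assumptions, since the exact statement is not shown in the article.

# Hypothetical reconstruction of the 'experiments' table definition.
CREATE_EXPERIMENTS = """
CREATE TABLE IF NOT EXISTS vkf.experiments (
    id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
    expName varchar(255) NOT NULL,
    encoder varchar(255),
    goodEncoder tinyint(1),
    lattices varchar(255),
    goodLattices tinyint(1),
    complex varchar(255),
    goodComplex tinyint(1),
    verges varchar(255),
    goodVerges tinyint(1),
    vergesTotal int(11),
    trains varchar(255) NOT NULL,
    goodTrains tinyint(1),
    tests varchar(255),
    goodTests tinyint(1),
    hypotheses varchar(255) NOT NULL,
    goodHypotheses tinyint(1),
    type varchar(255) NOT NULL
)
"""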

It should be noted that there are certain sequences of data preparation steps for ML experiments which, unfortunately, differ radically between the discrete and continuous cases.


The case of mixed attributes combines both types of requirements.


discrete: => goodLattices (semi-automatic)

discrete: goodLattices => goodEncoder (automatic)

discrete: goodEncoder => goodTrains (semi-automatic)

discrete: goodEncoder, goodTrains => goodHypotheses (automatic)

discrete: goodEncoder => goodTests (semi-automatic)

discrete: goodTests, goodEncoder, goodHypotheses => prediction (automatic)

continuous: => goodVerges (manual)

continuous: goodVerges => goodTrains (manual)

continuous: goodTrains => goodComplex (automatic)

continuous: goodComplex, goodTrains => goodHypotheses (automatic)

continuous: goodVerges => goodTests (manual)

continuous: goodTests, goodComplex, goodHypotheses => prediction (automatic)
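
The chains above can be summarized as data; a sketch (not part of the server code) mapping each readiness flag to its prerequisites and the mode of the corresponding step:

# Sketch: the VKF workflow stages as data (flag -> (prerequisites, mode)).
DISCRETE_STAGES = {
    'goodLattices':   ((), 'semi-automatic'),
    'goodEncoder':    (('goodLattices',), 'automatic'),
    'goodTrains':     (('goodEncoder',), 'semi-automatic'),
    'goodHypotheses': (('goodEncoder', 'goodTrains'), 'automatic'),
    'goodTests':      (('goodEncoder',), 'semi-automatic'),
    'prediction':     (('goodTests', 'goodEncoder', 'goodHypotheses'), 'automatic'),
}

CONTINUOUS_STAGES = {
    'goodVerges':     ((), 'manual'),
    'goodTrains':     (('goodVerges',), 'manual'),
    'goodComplex':    (('goodTrains',), 'automatic'),
    'goodHypotheses': (('goodComplex', 'goodTrains'), 'automatic'),
    'goodTests':      (('goodVerges',), 'manual'),
    'prediction':     (('goodTests', 'goodComplex', 'goodHypotheses'), 'automatic'),
}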

The Machine Learning library is named vkf.cpython-36m-x86_64-linux-gnu.so under Linux or vkf.cp36-win32.pyd under OS Windows (36 is the version of Python that this library was built for).


The term «automatic» means using this library, while «semi-automatic» means usage of the auxiliary library 'vkfencoder.cpython-36m-x86_64-linux-gnu.so'. Finally, the «manual» mode corresponds to external programs that process data with continuous features; these are now being transferred into the vkfencoder library.


3. Implementation details

We follow the «View/Model/Control» paradigm during web server creation.


The Python code is distributed among 5 files:


  1. app.py — initialization
  2. control.py — coroutines of Machine Learning procedures
  3. models.py — data manipulation and DB connections
  4. settings.py — application settings
  5. views.py — visualizations and routes.

The file 'app.py' has a standard form:


#! /usr/bin/env python
import asyncio
import jinja2
import aiohttp_jinja2

from settings import SITE_HOST as siteHost
from settings import SITE_PORT as sitePort

from aiohttp import web
from aiojobs.aiohttp import setup

from views import routes

async def init(loop):
    app = web.Application(loop=loop)
    # install aiojobs.aiohttp
    setup(app)
    # install jinja2 templates
    aiohttp_jinja2.setup(app, 
        loader=jinja2.FileSystemLoader('./template'))
    # add routes from api/views.py
    app.router.add_routes(routes)
    return app

loop = asyncio.get_event_loop()
try:
    app = loop.run_until_complete(init(loop))
    web.run_app(app, host=siteHost, port=sitePort)
except Exception:
    loop.stop()

I don't think anything needs to be explained here. The next file is 'views.py':


import aiohttp_jinja2
from aiohttp import web
from aiojobs.aiohttp import spawn
from models import User
from models import Expert
from models import Experiment
from models import Solver
from models import Predictor

routes = web.RouteTableDef()

@routes.view(r'/tests/{name}', name='test-name')
class Predict(web.View):
    @aiohttp_jinja2.template('tests.html')
    async def get(self):
        return {'explanation': 'Please, confirm prediction!'}

    async def post(self):
        data = await self.request.post()
        db_name = self.request.match_info['name']
        analogy = Predictor(db_name, data)
        await analogy.load_data()
        job = await spawn(self.request, analogy.make_prediction())
        return await job.wait()

@routes.view(r'/vkf/{name}', name='vkf-name')
class Generate(web.View):
    #@aiohttp_jinja2.template('vkf.html')
    async def get(self):
        db_name = self.request.match_info['name']
        solver = Solver(db_name)
        await solver.load_data()
        context = { 'dbname': str(solver.dbname),
                    'encoder': str(solver.encoder),
                    'lattices': str(solver.lattices),
                    'good_lattices': bool(solver.good_lattices),
                    'verges': str(solver.verges),
                    'good_verges': bool(solver.good_verges),
                    'complex': str(solver.complex),
                    'good_complex': bool(solver.good_complex),
                    'trains': str(solver.trains),
                    'good_trains': bool(solver.good_trains),
                    'hypotheses': str(solver.hypotheses),
                    'type': str(solver.type)
            }
        response = aiohttp_jinja2.render_template('vkf.html', 
            self.request, context)
        return response
            
    async def post(self):
        data = await self.request.post()
        step = data.get('value')
        db_name = self.request.match_info['name']
        if step == 'init':
            location = self.request.app.router['experiment-name'].url_for(
                name=db_name)
            raise web.HTTPFound(location=location)
        solver = Solver(db_name)
        await solver.load_data()
        if step == 'populate':
            job = await spawn(self.request, solver.create_tables())
            return await job.wait()
        if step == 'compute':
            job = await spawn(self.request, solver.compute_tables())
            return await job.wait()
        if step == 'generate':
            # form fields arrive as strings; the induction expects integers
            hypotheses_total = int(data.get('hypotheses_total'))
            threads_total = int(data.get('threads_total'))
            job = await spawn(self.request, solver.make_induction(
                hypotheses_total, threads_total))
            return await job.wait()

@routes.view(r'/experiment/{name}', name='experiment-name')
class Prepare(web.View):
    @aiohttp_jinja2.template('expert.html')
    async def get(self):
        return {'explanation': 'Please, enter your data'}

    async def post(self):
        data = await self.request.post()
        db_name = self.request.match_info['name']
        experiment = Experiment(db_name, data)
        job = await spawn(self.request, experiment.create_experiment())
        return await job.wait()

I have shortened this file by dropping classes that serve auxiliary routes:


  1. The 'Auth' class corresponds to the root route '/' and outputs a request form for user identification. If the user is not registered, there is a SignIn button that redirects to the '/signin' route. If a user with the entered username and password is detected, the user is redirected to the route '/user/{name}'. (A minimal sketch of this class is given after the list.)
  2. The 'SignIn' class processes the '/signin' route and returns the user to the root route after successful registration.
  3. The 'Select' class processes the '/user/{name}' routes and asks which experiment and stage the user wants to perform. After checking for the existence of such an experiment DB, the user is redirected to the route '/vkf/{name}' or '/experiment/{name}' (depending on whether the declared experiment exists).
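
For completeness, here is a minimal sketch of what the omitted 'Auth' class could look like, in the style of the classes shown above; the template name 'auth.html', the route name 'user-name', and the User.check() coroutine are assumptions, not the actual code:

@routes.view('/', name='auth')
class Auth(web.View):
    @aiohttp_jinja2.template('auth.html')
    async def get(self):
        return {'explanation': 'Please, identify yourself'}

    async def post(self):
        data = await self.request.post()
        user = User(data)
        if await user.check():   # hypothetical helper: look up login/password
            location = self.request.app.router['user-name'].url_for(
                name=data.get('login'))
        else:
            location = '/signin'  # not registered yet
        raise web.HTTPFound(location=location)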

The remaining classes correspond to routes that are responsible for Machine Learning procedures:


  1. The 'Prepare' class processes the '/experiment/{name}' routes and collects the names of the service tables and the numeric parameters necessary to run the VKF-method procedures. After saving this information in the database, the user is redirected to the route '/vkf/{name}'.
  2. The 'Generate' class processes the '/vkf/{name}' routes and starts the various stages of the VKF-method induction procedure, depending on the data prepared by the expert.
  3. The 'Predict' class processes the '/tests/{name}' routes and starts the procedure of VKF prediction by analogy.

To pass a large number of parameters to the vkf.html template, the system uses the aiohttp_jinja2 construction:


response = aiohttp_jinja2.render_template('vkf.html', self.request, context)
return response

Note the usage of spawn from aiojobs.aiohttp:


job = await spawn(self.request, 
    solver.make_induction(hypotheses_total, threads_total))
return await job.wait()

This is necessary to safely call coroutines defined in the file 'models.py', processing user and experiment data stored in a database managed by the MariaDB DBMS:


import asyncio
import aiomysql
from aiohttp import web

from settings import AUX_NAME as auxName
from settings import AUTH_TABLE as authTable
from settings import AUX_TABLE as auxTable
from settings import SECRET_KEY as secretKey
from settings import DB_HOST as dbHost

from control import createAuxTables
from control import createMainTables
from control import computeAuxTables
from control import induction
from control import prediction

class Experiment():
    def __init__(self, dbName, data, **kw):
        self.encoder = data.get('encoder_table')
        self.lattices = data.get('lattices_table')
        self.complex = data.get('complex_table')
        self.verges = data.get('verges_table')
        self.verges_total = data.get('verges_total')
        self.trains = data.get('training_table')
        self.tests = data.get('tests_table')
        self.hypotheses = data.get('hypotheses_table')
        self.type = data.get('type')
        self.auxname = auxName
        self.auxtable = auxTable
        self.dbhost = dbHost
        self.secret = secretKey
        self.dbname = dbName

    async def create_db(self, pool):
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                await cur.execute("CREATE DATABASE IF NOT EXISTS " +
                    str(self.dbname)) 
                await conn.commit() 
        await createAuxTables(self)
 
    async def register_experiment(self, pool):
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "INSERT INTO " + str(self.auxname) + "." + 
                    str(self.auxtable)
                sql += " VALUES(NULL, '" 
                sql += str(self.dbname) 
                sql += "', '" 
                sql += str(self.encoder) 
                sql += "', 0, '" #goodEncoder
                sql += str(self.lattices) 
                sql += "', 0, '" #goodLattices
                sql += str(self.complex) 
                sql += "', 0, '" #goodComplex 
                sql += str(self.verges) 
                sql += "', 0, " #goodVerges
                sql += str(self.verges_total) 
                sql += ", '" 
                sql += str(self.trains) 
                sql += "', 0, '" #goodTrains 
                sql += str(self.tests) 
                sql += "', 0, '" #goodTests 
                sql += str(self.hypotheses) 
                sql += "', 0, '" #goodHypotheses 
                sql += str(self.type)
                sql += "')"
                await cur.execute(sql)
                await conn.commit() 

    async def create_experiment(self, **kw):
        pool = await aiomysql.create_pool(host=self.dbhost, 
            user='root', password=self.secret)
        task1 = self.create_db(pool=pool)
        task2 = self.register_experiment(pool=pool)
        tasks = [asyncio.ensure_future(task1), 
            asyncio.ensure_future(task2)]
        await asyncio.gather(*tasks)
        pool.close()
        await pool.wait_closed()
        raise web.HTTPFound(location='/vkf/' + self.dbname)        

class Solver():
    def __init__(self, dbName, **kw):
        self.auxname = auxName
        self.auxtable = auxTable
        self.dbhost = dbHost
        self.dbname = dbName
        self.secret = secretKey

    async def load_data(self, **kw):    
        pool = await aiomysql.create_pool(host=dbHost, 
            user='root', password=secretKey, db=auxName)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "SELECT * FROM "
                sql += str(auxTable)
                sql += " WHERE  expName='"
                sql += str(self.dbname)
                sql += "'"
                await cur.execute(sql)
                row = await cur.fetchone()
                await cur.close()
        pool.close()
        await pool.wait_closed()
        self.encoder = str(row[2])
        self.good_encoder = bool(row[3])
        self.lattices = str(row[4])
        self.good_lattices = bool(row[5])
        self.complex = str(row[6])
        self.good_complex = bool(row[7])
        self.verges = str(row[8])
        self.good_verges = bool(row[9])
        self.verges_total = int(row[10])
        self.trains = str(row[11])
        self.good_trains = bool(row[12])
        self.hypotheses = str(row[15])
        self.good_hypotheses = bool(row[16])
        self.type = str(row[17])

    async def create_tables(self, **kw):
        await createMainTables(self)
        pool = await aiomysql.create_pool(host=self.dbhost, user='root', 
            password=self.secret, db=self.auxname)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "UPDATE "
                sql += str(self.auxtable)
                sql += " SET encoderStatus=1 WHERE dbname='"
                sql += str(self.dbname)
                sql += "'"
                await cur.execute(sql) 
                await conn.commit() 
                await cur.close()
        pool.close()
        await pool.wait_closed()
        raise web.HTTPFound(location='/vkf/' + self.dbname)        

    async def compute_tables(self, **kw):
        await computeAuxTables(self)
        pool = await aiomysql.create_pool(host=self.dbhost, user='root', 
            password=self.secret, db=self.auxname)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "UPDATE "
                sql += str(self.auxtable)
                sql += " SET complexStatus=1 WHERE dbname='"
                sql += str(self.dbname)
                sql += "'"
                await cur.execute(sql) 
                await conn.commit() 
                await cur.close()
        pool.close()
        await pool.wait_closed()
        raise web.HTTPFound(location='/vkf/' + self.dbname)        

    async def make_induction(self, hypotheses_total, threads_total, **kw):
        await induction(self, hypotheses_total, threads_total)
        pool = await aiomysql.create_pool(host=self.dbhost, user='root', 
            password=self.secret, db=self.auxname)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "UPDATE "
                sql += str(self.auxtable)
                sql += " SET hypothesesStatus=1 WHERE dbname='"
                sql += str(self.dbname)
                sql += "'"
                await cur.execute(sql) 
                await conn.commit() 
                await cur.close()
        pool.close()
        await pool.wait_closed()
        raise web.HTTPFound(location='/tests/' + self.dbname)        

class Predictor():
    def __init__(self, dbName, data, **kw):
        self.auxname = auxName
        self.auxtable = auxTable
        self.dbhost = dbHost
        self.dbname = dbName
        self.secret = secretKey
        self.plus = 0
        self.minus = 0

    async def load_data(self, **kw):    
        pool = await aiomysql.create_pool(host=dbHost, user='root', 
            password=secretKey, db=auxName)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "SELECT * FROM "
                sql += str(auxTable)
                sql += " WHERE dbname='"
                sql += str(self.dbname)
                sql += "'"
                await cur.execute(sql) 
                row = await cur.fetchone()
                await cur.close()
        pool.close()
        await pool.wait_closed()
        self.encoder = str(row[2])
        self.good_encoder = bool(row[3])
        self.complex = str(row[6])
        self.good_complex = bool(row[7])
        self.verges = str(row[8])
        self.trains = str(row[11])
        self.tests = str(row[13])
        self.good_tests = bool(row[14])
        self.hypotheses = str(row[15])
        self.good_hypotheses = bool(row[16])
        self.type = str(row[17])

    async def make_prediction(self, **kw):
        if self.good_tests and self.good_hypotheses:
            await induction(self, 0, 1)
            await prediction(self)
            message_body = str(self.plus)
            message_body += " correct positive cases. "
            message_body += str(self.minus)
            message_body += " correct negative cases."
            raise web.HTTPOk(text=message_body)
        else:
            raise web.HTTPFound(location='/vkf/' + self.dbname)

Again, some auxiliary classes are omitted:


  1. The 'User' class corresponds to a site user. It allows the user to register and log in as an expert.
  2. The 'Expert' class allows the expert to select one of the experiments.

The remaining classes correspond to the main procedures:


  1. The 'Experiment' class allows the expert to set the names of the key and auxiliary tables and the parameters necessary for conducting VKF experiments.
  2. The 'Solver' class is responsible for inductive generalization in the VKF method.
  3. The 'Predictor' class is responsible for predictions by analogy in the VKF method.

It is important to use the create_pool() procedure from aiomysql: it creates multiple connections to a database simultaneously. To terminate database communication safely, the system uses the ensure_future() and gather() procedures from the asyncio module.


pool = await aiomysql.create_pool(host=self.dbhost, 
    user='root', password=self.secret)
task1 = self.create_db(pool=pool)
task2 = self.register_experiment(pool=pool)
tasks = [asyncio.ensure_future(task1), 
    asyncio.ensure_future(task2)]
await asyncio.gather(*tasks)
pool.close()
await pool.wait_closed()

The construction row = await cur.fetchone() returns a single record of the table, from which field values can be extracted (for example, str(row[2]) extracts the name of the table with the encoding of discrete feature values):


pool = await aiomysql.create_pool(host=dbHost, user='root', 
    password=secretKey, db=auxName)
async with pool.acquire() as conn:
    async with conn.cursor() as cur:
        await cur.execute(sql) 
        row = await cur.fetchone()
        await cur.close()
pool.close()
await pool.wait_closed()
self.encoder = str(row[2])

Key system parameters are imported from the file '.env' or (if it is absent) directly from the file 'settings.py'.


from os.path import isfile
from envparse import env

if isfile('.env'):
    env.read_envfile('.env')

AUX_NAME = env.str('AUX_NAME', default='vkf')
AUTH_TABLE = env.str('AUTH_TABLE', default='users')
AUX_TABLE = env.str('AUX_TABLE', default='experiments')
DB_HOST = env.str('DB_HOST', default='127.0.0.1')
DB_PORT = env.int('DB_PORT', default=3306)
DEBUG = env.bool('DEBUG', default=False)
SECRET_KEY = env.str('SECRET_KEY', default='toor')
SITE_HOST = env.str('HOST', default='127.0.0.1')
SITE_PORT = env.int('PORT', default=8080)
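
A matching '.env' file might look as follows (all values are placeholders; the file is optional, as the defaults above show):

AUX_NAME=vkf
AUTH_TABLE=users
AUX_TABLE=experiments
DB_HOST=127.0.0.1
DB_PORT=3306
DEBUG=False
SECRET_KEY=change-me
HOST=127.0.0.1
PORT=8080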

It is important to note that localhost must be specified by IP address; otherwise aiomysql will try to connect to the database via a Unix socket, which may not work under OS Windows.


Finally, the file 'control.py' has the following form:


import os
import asyncio
import vkf

async def createAuxTables(db_data):
    if db_data.type != "discrete":
        await vkf.CAttributes(db_data.verges, db_data.dbname, 
            '127.0.0.1', 'root', db_data.secret)
    if db_data.type != "continuous":
        await vkf.DAttributes(db_data.encoder, db_data.dbname, 
            '127.0.0.1', 'root', db_data.secret)
        await vkf.Lattices(db_data.lattices, db_data.dbname, 
            '127.0.0.1', 'root', db_data.secret) 

async def createMainTables(db_data):
    if db_data.type == "continuous":
        await vkf.CData(db_data.trains, db_data.verges, 
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        await vkf.CData(db_data.tests, db_data.verges, 
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)
    if db_data.type == "discrete":
        await vkf.FCA(db_data.lattices, db_data.encoder, 
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        await vkf.DData(db_data.trains, db_data.encoder, 
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        await vkf.DData(db_data.tests, db_data.encoder, 
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)
    if db_data.type == "full":
        await vkf.FCA(db_data.lattices, db_data.encoder, 
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        await vkf.FData(db_data.trains, db_data.encoder, db_data.verges, 
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        await vkf.FData(db_data.tests, db_data.encoder, db_data.verges, 
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)

async def computeAuxTables(db_data):
    if db_data.type != "discrete":
        async with vkf.Join(db_data.trains, db_data.dbname, '127.0.0.1', 
            'root', db_data.secret) as join:
            await join.compute_save(db_data.complex, db_data.dbname, 
                '127.0.0.1', 'root', db_data.secret)
        await vkf.Generator(db_data.complex, db_data.trains, db_data.verges, 
            db_data.dbname, db_data.dbname, db_data.verges_total, 1, 
            '127.0.0.1', 'root', db_data.secret)

async def induction(db_data, hypothesesNumber, threadsNumber):
    if db_data.type != "discrete":
        qualifier = await vkf.Qualifier(db_data.verges, 
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        beget = await vkf.Beget(db_data.complex, db_data.dbname, 
            '127.0.0.1', 'root', db_data.secret)
    if db_data.type != "continuous":
        encoder = await vkf.Encoder(db_data.encoder, db_data.dbname, 
            '127.0.0.1', 'root', db_data.secret)
    async with vkf.Induction() as induction: 
        if db_data.type == "continuous":
            await induction.load_continuous_hypotheses(qualifier, beget, 
                db_data.trains, db_data.hypotheses, db_data.dbname, 
                '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "discrete":
            await induction.load_discrete_hypotheses(encoder, 
                db_data.trains, db_data.hypotheses, db_data.dbname, 
                '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "full":
            await induction.load_full_hypotheses(encoder, qualifier, beget, 
                db_data.trains, db_data.hypotheses, db_data.dbname, 
                '127.0.0.1', 'root', db_data.secret)
        if hypothesesNumber > 0:
            await induction.add_hypotheses(hypothesesNumber, threadsNumber)
            if db_data.type == "continuous":
                await induction.save_continuous_hypotheses(qualifier, 
                    db_data.hypotheses, db_data.dbname, '127.0.0.1', 'root', 
                    db_data.secret)
            if db_data.type == "discrete":
                await induction.save_discrete_hypotheses(encoder, 
                    db_data.hypotheses, db_data.dbname, '127.0.0.1', 'root', 
                    db_data.secret)
            if db_data.type == "full":
                await induction.save_full_hypotheses(encoder, qualifier, 
                    db_data.hypotheses, db_data.dbname, '127.0.0.1', 'root', 
                    db_data.secret)

async def prediction(db_data):
    if db_data.type != "discrete":
        qualifier = await vkf.Qualifier(db_data.verges, 
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        beget = await vkf.Beget(db_data.complex, db_data.dbname, 
            '127.0.0.1', 'root', db_data.secret)
    if db_data.type != "continuous":
        encoder = await vkf.Encoder(db_data.encoder, 
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)
    async with vkf.Induction() as induction: 
        if db_data.type == "continuous":
            await induction.load_continuous_hypotheses(qualifier, beget, 
                db_data.trains, db_data.hypotheses, db_data.dbname, 
                '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "discrete":
            await induction.load_discrete_hypotheses(encoder, 
                db_data.trains, db_data.hypotheses, db_data.dbname, 
                '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "full":
            await induction.load_full_hypotheses(encoder, qualifier, beget, 
                db_data.trains, db_data.hypotheses, db_data.dbname, 
                '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "continuous":
            async with vkf.TestSample(qualifier, induction, beget, 
                db_data.tests, db_data.dbname, '127.0.0.1', 'root', 
                db_data.secret) as tests:
                db_data.plus = await tests.correct_positive_cases()
                db_data.minus = await tests.correct_negative_cases()
        if db_data.type == "discrete":
            async with vkf.TestSample(encoder, induction, 
                db_data.tests, db_data.dbname, '127.0.0.1', 'root', 
                db_data.secret) as tests:
                db_data.plus = await tests.correct_positive_cases()
                db_data.minus = await tests.correct_negative_cases()
        if db_data.type == "full":
            async with vkf.TestSample(encoder, qualifier, induction, 
                beget, db_data.tests, db_data.dbname, '127.0.0.1', 
                'root', db_data.secret) as tests:
                db_data.plus = await tests.correct_positive_cases()
                db_data.minus = await tests.correct_negative_cases()

I retain this file in full, since here you can see the names and the calling order of the arguments of the VKF-method procedures from the library 'vkf.cpython-36m-x86_64-linux-gnu.so'. All arguments after dbname can be omitted, since the default values in the CPython library are set to standard values.


4. Some comments

Anticipating professional programmers' question about why the logic controlling the VKF experiment is written out explicitly (through numerous if statements) rather than hidden through polymorphism in types, the answer is this: unfortunately, dynamic typing of the Python language does not allow shifting the decision about the type of the object used onto the language itself; in any case, this sequence of nested if statements would occur somewhere. Therefore, the author preferred to use explicit (C-like) syntax to make the logic as transparent (and efficient) as possible.


Let me comment on the missing components:


  1. Currently, databases for discrete-attributes-only experiments are prepared using the library 'vkfencoder.cpython-36m-x86_64-linux-gnu.so' (students make the web interface for it, while the author calls the corresponding methods directly, since he still works on localhost). For continuous features, work is underway to incorporate the corresponding methods into 'vkfencoder.cpython-36m-x86_64-linux-gnu.so' too.
  2. Hypotheses are currently displayed by third-party MariaDB client programs (the author uses DBeaver 7.1.1 Community, but there are many analogues). Students are developing a prototype system using the Django framework, where an ORM will allow experts to view hypotheses in a convenient way.
5. History of the method

The author has been engaged in data mining for more than 30 years. After graduating from the Mathematics Department of Lomonosov Moscow State University, he was invited to join a group of researchers under the leadership of Professor Victor K. Finn (VINITI, USSR Academy of Sciences). Victor K. Finn has been researching plausible reasoning and its formalization by means of multi-valued logics since the early 1980s.


The key ideas proposed by V. K. Finn are the following:


  1. Using the binary similarity operation (originally, the intersection operation in Boolean algebra);
  2. The idea of rejecting the generated similarity of a group of training examples if it is embedded into an example of the opposite sign (a counter-example); a toy sketch of the first two ideas is given after the list;
  3. The idea of predicting the target property of test examples by taking into account the arguments for and against it;
  4. The idea of checking the completeness of a set of hypotheses by finding, among the generated similarities, reasons for the presence or absence of the target property for every training example.
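
As a toy illustration of the first two ideas (with invented data; this is not the library's algorithm): candidate causes are intersections of positive examples, and a candidate is discarded as soon as it is embedded into a counter-example.

from itertools import combinations

def jsm_candidates(positives, negatives):
    """Pairwise similarities of positive examples (idea 1),
    filtered by the counter-example ban (idea 2)."""
    candidates = set()
    for a, b in combinations(positives, 2):
        sim = frozenset(a & b)
        if sim and not any(sim <= neg for neg in negatives):
            candidates.add(sim)
    return candidates

# Invented toy data: each case is a set of (feature, value) pairs.
pos = [{('ring_number', 'one'), ('gill_size', 'broad'), ('veil_color', 'white')},
       {('ring_number', 'one'), ('gill_size', 'broad'), ('veil_color', 'yellow')}]
neg = [{('ring_number', 'one'), ('gill_size', 'narrow'), ('veil_color', 'white')}]
print(jsm_candidates(pos, neg))
# {frozenset({('ring_number', 'one'), ('gill_size', 'broad')})}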

It should be noted that V. K. Finn attributes some of his ideas to foreign authors. Perhaps only the logic of argumentation can rightfully be considered his own invention. The idea of accounting for counter-examples he borrowed, by his own account, from K. R. Popper. The origins of the verification of completeness of inductive generalization were (completely obscure, in my opinion) works of the American mathematician and logician C. S. Peirce. He considers the generation of hypotheses about causes by means of the similarity operation to be borrowed from the ideas of the British economist, philosopher, and logician J. S. Mill. Therefore, he created a set of ideas called the «JSM-method» in honor of J. S. Mill.


Strangely, the much more useful ideas of Professor Rudolf Wille (Germany), which appeared in the late 1970s and form a modern part of algebraic Lattice Theory (the so-called Formal Concept Analysis, FCA), are not respected by Professor V. K. Finn. In my opinion, the reason for this is the unfortunate name, which repels a person who graduated first from the faculty of philosophy and then from the engineer requalification program at the Mathematics Department of Lomonosov Moscow State University.


As a continuation of his teacher's work, the author named his approach the «VKF-method» in his honor. However, in Russian there is another reading: a probabilistic-combinatorial formal ('veroyatnostno-kombinatornyi formalnyi') method of Machine Learning based on Lattice Theory.


Now V. K. Finn's group works at the Dorodnitsyn Computing Center of the Russian Academy of Sciences and at the Intelligent Systems Department of the Russian State University for the Humanities (RSUH).


For more information about the mathematics of the VKF-solver, see the dissertations of the author or his video lectures at Ulyanovsk State University (the author is grateful to A. B. Verevkin and N. G. Baranets for organizing the lectures and processing their recordings).


The full package of source files is stored on Bitbucket.


Source files (in C++) for the vkf library are in the process of being approved for placement on savannah.nongnu.org. If the decision is positive, the download link will be added here.


A final note: the author started learning Python on April 6, 2020. Prior to this, the only language he had programmed in was C++. But this fact does not absolve him of charges of possible inaccuracies in the code.


The author expresses his heartfelt gratitude to Tatyana A. Volkova robofreak for her support, constructive suggestions, and critical comments that made it possible to significantly improve the presentation (and even significantly simplify the code). However, the author is solely responsible for the remaining errors and decisions (even those made against her advice).


Translated from: https://habr.com/en/post/509480/
