Building a Question Answering System

weixin_26632361

Original article: https://medium.com/swlh/building-a-question-answering-system-bb49c320140c

Over three months, we had the chance to design and implement a question answering project with Serviceware SE. Question answering is a common task in natural language processing (NLP), a subfield of machine learning, in which the system processes a question and retrieves an answer for the user.

The idea is that users do not need to search for an answer in a long text or a catalogue of articles (the "dataset"), as they would with a normal search engine, but instead receive a preferably short answer to their question. This is a great opportunity for customers as well as companies: customers save time by helping themselves instead of contacting call centres, and companies can focus their efforts on more sophisticated requests.

This field of NLP is the subject of extensive research, and massive progress has been made over the past few years. Recent models from Google, such as BERT, exceed human-level precision in answering questions when trained properly. Still, the problem is far from solved. For example, a shortcoming of Google's solution is that it only works on single paragraphs: the model can answer a question only if the paragraph that likely contains the answer is already known.

To extend this approach, we built a question answering system that can answer a question not only on a single paragraph but on a whole dataset. Because it is meant to work only on a specific domain, for example a documentation database, it is called a closed-domain question answering system.

How does it work?

But how does it work? We divided our application into three parts:

First, we compare our question to every document in the dataset and determine the 20 documents most likely to contain the answer. Second, we pool all the paragraphs of these 20 documents, compare them with the question again, and determine the 20 paragraphs most likely to contain the answer.

Lastly, we take BERT (the model by Google), try to extract an answer from every paragraph, and select the answer with the highest confidence score, that is, the one the model is most certain is correct. The first two steps are called retrieval and the last one reading.
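
The reading step can be sketched roughly as follows. This is only an illustration, not the project's actual code: `best_answer`, `Reader`, and `toy_reader` are hypothetical names, and the toy reader fakes a confidence score with simple word overlap where the real system would call a fine-tuned BERT model.

```python
from typing import Callable, List, Optional, Tuple

# A reader maps (question, paragraph) to (answer_text, confidence).
Reader = Callable[[str, str], Tuple[str, float]]


def best_answer(
    question: str, paragraphs: List[str], reader: Reader
) -> Optional[Tuple[str, float]]:
    """Run the reader on every paragraph and keep the most confident answer."""
    candidates = [reader(question, paragraph) for paragraph in paragraphs]
    if not candidates:
        return None
    return max(candidates, key=lambda candidate: candidate[1])


def toy_reader(question: str, paragraph: str) -> Tuple[str, float]:
    """Hypothetical stand-in for BERT: answers with the first sentence and
    uses word overlap with the question as a fake confidence score."""
    q_words = set(question.lower().split())
    p_words = set(paragraph.lower().split())
    confidence = len(q_words & p_words) / max(len(q_words), 1)
    return paragraph.split(".")[0], confidence


paragraphs = [
    "Invoices are sent monthly. They arrive by email.",
    "Reset your password in the account settings. It takes one minute.",
]
answer, confidence = best_answer("how do I reset my password", paragraphs, toy_reader)
print(answer)  # Reset your password in the account settings
```

Keeping the reader behind a plain callable means the selection logic stays the same no matter which model produces the answers and scores.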

To compare documents or paragraphs with our question, we build vectors for the question as well as for every paragraph and document. We then place all vectors in one large vector space and determine how similar each vector is to the question vector by calculating the angle between them via the cosine similarity.
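
A minimal sketch of this comparison, using only the Python standard library. The article does not say how the vectors are built, so the plain word-count vectors here are an assumption made for illustration; the helper names `to_vector` and `cosine_similarity` are likewise hypothetical.

```python
import math
from collections import Counter


def to_vector(text):
    """Turn a text into a sparse word-count vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


question = to_vector("how do I reset my password")
doc_a = to_vector("reset your password in the account settings")
doc_b = to_vector("baseball season starts in spring")

# The password document shares words with the question; the baseball one does not.
print(cosine_similarity(question, doc_a) > cosine_similarity(question, doc_b))  # True
```

A value of 1 means both vectors point in the same direction, and 0 means the texts share no words at all under this representation.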

But why does this work?

Because these vectors and their directions represent different features: they can indicate which words or which meaning a given text has. For example, one direction can stand for baseball and another for natural language processing. A text about "machine learning" would then definitely be more similar to "NLP" than to "baseball".

For the next step, we choose the documents or paragraphs with the highest similarity.

For the reading part, we took a pre-trained multilingual model by Google and trained it to answer questions on the English Wikipedia. As previous studies have shown, the cross-lingual learning effects on other languages are significant, so we can answer questions in multiple languages.

Conclusion

As a result, we can answer questions on a given database in multiple languages. The results were quite promising: our system works more accurately and is even faster than existing solutions (like cdQA by BNP Paribas). But while delivering promising results, our solution faces some challenges. In a customer-facing situation, one would have to reduce the number of false positives to (almost) zero, because giving a customer a wrong answer could have economic consequences.

There are different approaches to accomplish this. For example:
- internal beta testing, to see which questions get asked in support,
- pre-screening, so that the customer sees only the right answers,
- using the model monolingually only,
- and, of course, training on questions specific to the given dataset.

It is an exciting development and a promising path that we were able to explore thanks to Serviceware and TU Darmstadt. We were astonished by how far we could get in merely three months, probably because of the good support from Serviceware and their amazing team.

A special thanks to Niklas and Adrian, as well as Luisa.

And of course to Ji Yune Whang, Joshua Bodemann, Marcel Nawrath, Sebastian Marcus Meier and Wladislav Miretski — we really have been a great team.
