How to create your own Question-Answering system easily with python

本文介绍了cdQA-suite,这是一个用于构建封闭领域问答系统(CDQA)的端到端软件套件。cdQA由Retriever和Reader两部分组成,可以利用预训练的深度学习模型如BERT进行答案检索。此外,文章还提到了cdQA-annotator工具用于数据注释,以及cdQA-ui作为用户界面,便于与网站集成。
摘要由CSDN通过智能技术生成

The history of Machine Comprehension (MC) has its origins along with the birth of first concepts in Artificial Intelligence (AI). The brilliant Allan Turing proposed in his famous article “Computing Machinery and Intelligence” what is now called the Turing test as a criterion of intelligence. Almost 70 years later, Question Answering (QA), a sub-domain of MC, is still one of the most difficult tasks in AI.

However, since last year, the field of Natural Language Processing (NLP) has experienced a fast evolution thanks to the development in Deep Learning research and the advent of Transfer Learning techniques. Powerful pre-trained NLP models such as OpenAI-GPT, ELMo, BERT and XLNet have been made available by the best researchers of the domain.

With such progress, several improved systems and applications to NLP tasks are expected to come out. One of such systems is the cdQA-suite, a package developed by some colleagues and me in a partnership between Telecom ParisTech, a French engineering school, and BNP Paribas Personal Finance, a European leader in financing for individuals.

Open-domain QA vs. closed-domain QA

When we think about QA systems we should be aware of two different kinds of systems: open-domain QA (ODQA) systems and closed-domain QA (CDQA) systems.

  • Open-domain systems deal with questions about nearly anything, and can only rely on general ontologies and world knowledge. One example of such a system is DrQA, an ODQA developed by Facebook Research that uses a large base of articles from Wikipedia as its source of knowledge. As these documents are related to several different topics and subjects we can understand why this system is considered an ODQA.

  • On the other hand, closed-domain systems deal with questions under a specific domain (for example, medicine or automotive maintenance), and can exploit domain-specific knowledge by using a model that is fitted to a unique-domain database. The cdQA-suite was built to enable anyone who wants to build a closed-domain QA system easily.

cdQA-suite

cdQA
An End-To-End Closed Domain Question Answering System. - cdQA
github.com

The cdQA-suite is comprised of three blocks:

  • cdQA: an easy-to-use python package to implement a QA pipeline
  • cdQA-annotator: a tool built to facilitate the annotation of question-answering datasets for model evaluation and fine-tuning
  • cdQA-ui: a user-interface that can be coupled to any website and can be connected to the back-end system.

I will explain how each module works and how you can use it to build your QA system on your own data.

cdQA

The cdQA architecture is based on two main components: the Retriever and the Reader. You can see below a schema of the system mechanism.

Mechanism of cdQA pipeline
在这里插入图片描述
When a question is sent to the system, the Retriever selects a list of documents in the database that are the most likely to contain the answer. It is based on the same retriever of DrQA, which creates TF-IDF features based on uni-grams and bi-grams and compute the cosine similarity betwee

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值