Abstract
“Retrieval-Augmented Generation (RAG), by incorporating external knowledge with the parametric memory of language models, has become the state-of-the-art architecture for open-domain QA tasks.”
[However,]
commonly used knowledge bases are inherently constrained by limited coverage and noisy information, making retrieval-based approaches inadequate for answering implicit reasoning questions.
[So, in this paper,]
the authors propose an Induction-Augmented Generation (IAG) framework.
This framework utilizes inductive knowledge along with the retrieved documents for implicit reasoning.
- the authors leverage large language models for deriving such knowledge via a novel prompting method based on inductive reasoning patterns. [Key words: prompting method; inductive reasoning]
- On top of this method, the authors implement two versions of IAG named IAG-GPT and IAG-Student. [Key words: IAG-GPT; IAG-Student]
- IAG-GPT directly uses the knowledge generated by GPT-3 for answer prediction, while IAG-Student gets rid of the dependency on the GPT service at inference time by incorporating a student inductor model. [What does the ‘inductor model’ look like? How does it get rid of the dependency?]
- The inductor is first trained via knowledge distillation and further optimized by back-propagating the generator's feedback via differentiable beam scores. [What does this mean???]
Experimental results show that IAG outperforms RAG baselines as well as ChatGPT on two open-domain QA tasks. [Datasets: 2 open-domain QA tasks]
The authors' best models have won first place on the official leaderboards of CSQA2.0 (since Nov 1, 2022) and StrategyQA (since Jan 8, 2023).
Introduction
[1]
Open-Domain Question Answering (ODQA) has attracted increasing research attention. Compared with the closed-domain setting, ODQA can empower search engines with the ability to respond to a wider range of user queries.
As a typical knowledge-intensive task, ODQA has been extensively studied within the scope of information retrieval, where access to external knowledge sources such as web pages or knowledge bases is required. [It means retrieving external knowledge from additional materials such as web pages.]
Another line of research exploits large language models such as GPT-3 and PaLM as the knowledge source and develops various prompting methods to elicit knowledge from LLMs that is implicitly encoded in the parameters. [It means using the LLM itself as the knowledge source, drawing on the knowledge encoded in its parameters rather than searching external materials.]
[However,]
to answer reasoning questions, i.e., questions that demand some degree of reasoning ability, either retrieval-based or prompting-based approaches suffer from their intrinsic limitations.
[2]
On the one hand, although RAG has become the SOTA architecture for ODQA (here I actually think it's not a specific architecture but rather a method, a kind of technique), documents retrieved from common knowledge bases generally suffer from the limitations of constrained coverage and noisy information (I think this sentence repeats the earlier content; it should be expressed differently), especially for implicit reasoning questions whose answers are not well documented.
For example,
it’s trivial for an average child to answer questions such as “Can you catch a jellyfish in the dumpster?”
[However,]
it’s unlikely to find the answer directly from the web or books.
As the authors show in Figure 1, the retrieved documents can hardly answer the question in such cases.
[Hence,]
relying entirely on information retrieval is insufficient to solve implicit reasoning questions.
[3]
On the other hand, prompting-based methods can exploit the considerable amount of knowledge encoded in the parameters of LLMs for QA tasks.
[But,]
the problem of hallucination imposes limitations on LLMs in terms of factuality and credibility.
[So,]
to better control the knowledge elicited from LLMs, various prompting methods such as chain-of-thought (CoT) have been proposed, which construct intermediate reasoning steps until arriving at the conclusion.
[However,]
the capability of an LLM is intrinsically constrained by its parameter size [the most important limitation of the prompting method!!!], making it unable to respond correctly to domain-specific questions beyond the scope of its training corpus.
[4]
[In view of the above challenges,]
a new paradigm is required for building models applicable to reasoning QA.
…
IAG (the authors' method) enhances conventional RAG with an inductor that generates inductive knowledge w.r.t. each question.
To derive such knowledge,
the authors propose a novel prompting method, intuitively inspired by the cognitive functions of inductive reasoning, to elicit inductive knowledge from an LLM (i.e., GPT-3). [Actually, this relies on help from an external LLM such as GPT.]
The authors' first IAG model, IAG-GPT, directly leverages the knowledge statements sampled from GPT-3 as evidence, alongside the retrieved documents, to feed into the answer generator. [This means obtaining the induced knowledge from GPT and then fusing it with the retrieved documents. But where do the retrieved documents come from?]
The authors show that IAG-GPT improves over SOTA models on multiple ODQA datasets and has won first place on the official leaderboards of CSQA2.0 and StrategyQA.
IAG-Student, the second variant, gets rid of the dependency on the GPT service during inference by training a student inductor model following a two-step optimization scheme.
[Specifically,]
The model is first warmed up through distillation, using the GPT-3 statements as pseudo labels, and then further optimized with a novel TAILBACK approach:
GradienT bAck-propagation dIfferentiabLe BeAm sCores feedbacK.
TAILBACK implements a differentiable beam search algorithm for the inductor and allows the feedback from the generator to be back-propagated to the inductor via beam scores. [How??? What does this method mean??]
[5]
Contributions
Related Work
[1] Open-Domain Reasoning QA.
[2] Prompting-Based Reasoning.
3.1 Overview
… In this paper, “the inductor takes the question as input and outputs knowledge statements in the form of inductive reasoning.” These statements, together with the retrieved documents, are used as supporting evidence to feed into the generator.
Section 3.2 introduces the inductive prompting method used for enhancing the factuality of the knowledge elicited from LLMs.
Section 3.3 presents the implementations of the proposed IAG framework: IAG-GPT and IAG-Student.
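[As a rough sketch of this dataflow, the following skeleton shows how the pieces fit together. The function names `retrieve`, `induce`, and `generate` are hypothetical stand-ins for the retriever, inductor, and generator, not names from the paper:]

```python
from typing import List

def retrieve(question: str, top_m: int = 5) -> List[str]:
    """Hypothetical retriever: returns the top-M documents for the question."""
    raise NotImplementedError

def induce(question: str, n: int = 5) -> List[str]:
    """Hypothetical inductor: returns N inductive knowledge statements
    (GPT-3 in IAG-GPT, a student model in IAG-Student)."""
    raise NotImplementedError

def generate(question: str, evidence: List[str]) -> str:
    """Hypothetical generator: predicts the answer from question + evidence."""
    raise NotImplementedError

def iag_answer(question: str) -> str:
    # Supporting evidence = retrieved documents + induced knowledge statements.
    evidence = retrieve(question) + induce(question)
    return generate(question, evidence)
```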
3.2 Knowledge Elicitation via Inductive Prompting
…
To enhance the credibility of the statements generated by LLMs, the authors propose a prompting method that is intuitively inspired by the idea of inductive reasoning. [Key Words: inductive reasoning]
Inductive reasoning is a method of logical thinking that draws general conclusions from specific observations, during which analogy and generalization are two fundamental cognitive tools. [Key Words: analogy; generalization]
Consider the question shown in Figure 1.
Appendix A - Prompting Templates
The authors use two prompting templates.
The template used for inductive prompting is presented in Table 5. It consists of 5 demonstrations constructed based on inductive reasoning, followed by the question of interest.
The authors also present in Table 6 the trivial prompting template used in Section 5.2. [This template is used for the contrast experiment.]
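[To make the structure concrete, here is a minimal sketch of how such a few-shot inductive prompt could be assembled. The demonstration content and the "Question:/Knowledge:" field names are invented placeholders, not actual entries from Table 5:]

```python
# Sketch of few-shot inductive prompt assembly. The demonstration below
# is an invented placeholder, NOT an actual Table 5 entry from the paper.
DEMONSTRATIONS = [
    (
        "Can a goldfish survive in a frozen pond?",
        "Goldfish are fish. Fish in general need liquid water to breathe "
        "through their gills. Therefore, a goldfish cannot survive in a "
        "fully frozen pond.",
    ),
    # ... the paper's template uses 5 such demonstrations in total.
]

def build_inductive_prompt(question: str) -> str:
    """Concatenate the demonstrations, then append the question of interest."""
    parts = [f"Question: {q}\nKnowledge: {k}" for q, k in DEMONSTRATIONS]
    parts.append(f"Question: {question}\nKnowledge:")
    return "\n\n".join(parts)

print(build_inductive_prompt("Can you catch a jellyfish in the dumpster?"))
```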
3.3 IAG Implementations
3.3.1 IAG-GPT
In this paper, the authors clarify that: “For IAG-GPT, the function of induction is fully delegated to the GPT-3 service API.”
In their implementation, the authors propose to enhance the validity of the result by aggregating multiple knowledge statements.
[However,] “instead of explicitly voting on multiple results, we let the generator implicitly reason over the collection of all sampled knowledge statements.”
[Here comes some ways to improve the knowledge! Maybe a good direction to try!!!]
[Where does the ‘generator’ come from??? What is the generator???]
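[A minimal sketch of how the evidence collection might be assembled for the generator, assuming a fusion-in-decoder-style seq2seq generator that encodes each evidence piece separately; the input format string is an assumption, not the paper's actual format:]

```python
from typing import List

def build_generator_inputs(
    question: str,
    retrieved_docs: List[str],  # top-M documents from the retriever
    knowledge: List[str],       # N statements sampled from GPT-3
) -> List[str]:
    """One encoder input per evidence piece; the generator then reasons over
    all of them jointly instead of explicitly voting on individual statements."""
    evidence = retrieved_docs + knowledge
    return [f"question: {question} evidence: {e}" for e in evidence]

inputs = build_generator_inputs(
    "Can you catch a jellyfish in the dumpster?",
    retrieved_docs=["<retrieved doc 1>", "<retrieved doc 2>"],
    knowledge=["<GPT-3 statement 1>", "<GPT-3 statement 2>"],
)
```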
3.3.2 IAG-Student (The hardest part!)
According to the authors, “To get rid of the dependencies on GPT-3 during inference, we replace GPT-3 with a student inductor model (we refer to it as the inductor for brevity when there’s no confusion).”
[Step 1: Distillation]
For each question-answer pair $(q, a^*)$ in the training set, the authors sample $N$ different knowledge statements from GPT-3 using the inductive prompting described in Section 3.2. The generated statements for the question are denoted as $K = \{K_n\}_{n=1}^{N}$. Besides, each question-answer pair is accompanied by the top-$M$ documents ranked by the retriever, represented as $R = \{R_m\}_{m=1}^{M}$.
The authors say: “Instead of directly supervising the inductor using all the knowledge statements generated by GPT-3, we claim that different statements should be distinguished according to their respective confidence during distillation.” The confidence of each statement can be measured by the probability of predicting the ground-truth answer when the statement is used as supporting evidence for the generator.
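[Read literally, this suggests a per-statement weighting of the distillation loss. A sketch of one way this could look; the softmax normalization and the detached weights are my assumptions, not details given in the paper:]

```python
import torch
import torch.nn.functional as F

def weighted_distillation_loss(
    nll_inductor: torch.Tensor,  # (N,) inductor NLL of each GPT-3 statement K_n
    logp_answer: torch.Tensor,   # (N,) generator log p(a* | q, K_n, R)
) -> torch.Tensor:
    # Confidence of each statement = probability the generator assigns to
    # the ground-truth answer when that statement serves as evidence.
    weights = F.softmax(logp_answer, dim=0)  # normalization scheme assumed
    # Down-weight the pseudo-label loss of low-confidence statements.
    return (weights.detach() * nll_inductor).sum()
```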
[Firstly,]
[Step 2: TAILBACK]
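[Based only on the description so far (a differentiable beam search, with generator feedback back-propagated through beam scores), one plausible reading is an expected-loss objective over the N beam hypotheses. This is a speculative sketch of that reading, not the paper's actual algorithm:]

```python
import torch
import torch.nn.functional as F

def tailback_loss(
    beam_log_scores: torch.Tensor,  # (N,) differentiable log-scores of the
                                    # inductor's N beam hypotheses
    logp_answer: torch.Tensor,      # (N,) generator log p(a* | q, K_n, R)
) -> torch.Tensor:
    # Turn beam scores into a distribution over hypotheses; since the scores
    # come from a differentiable beam search, gradients can flow back into
    # the inductor through this distribution.
    probs = F.softmax(beam_log_scores, dim=0)
    # Expected answer NLL under the beam distribution: hypotheses that help
    # the generator predict a* receive more probability mass after training.
    # Detaching the generator term (an assumption) makes this feedback-only.
    return -(probs * logp_answer.detach()).sum()
```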