Paramita Mirza, et al. ISWC 2018.
Note: some terms have no settled translation, so they are kept in English for now.
Counting quantifiers play an important role in question answering and knowledge base curation, but prior work has neglected them. This paper develops CINEX, the first full-fledged system for extracting counting information from text.
CINEX successfully deals with three challenges:
- non-maximal training seeds due to the incompleteness of knowledge bases;
- sparse and skewed observations in text sources;
- high diversity of linguistic patterns.
The CINEX architecture is shown in Figure 1. CINEX has two main stages: CQ (counting quantifier) Recognition and CQ Consolidation. First, CINEX takes seeds from Wikidata and trains two different models to generate CQ candidates: CRF++ with n-gram features and a bidirectional LSTM-CRF. Then CINEX consolidates the tokens expressing counting or compositionality information into a single prediction, based on mention consolidation with confidence scores and count zero.
Figure 1. Overview of the CINEX system.
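To make the consolidation stage more concrete, here is a minimal Python sketch of one possible reading of "mention consolidation with confidence scores and count zero". It is not the authors' implementation: the `Mention` class, the `parse_count` helper, the small word lexicon, and the 0.5 threshold are all hypothetical names and values chosen for illustration. The sketch assumes the recognition stage has already produced candidate mentions with a confidence score each, drops low-confidence candidates, and keeps the most confident remaining one. A negation word such as "no" is treated as a count of zero.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical lexicon mapping count words to integers.
# "no" and "never" stand in for count-zero expressions.
COUNT_WORDS = {"no": 0, "never": 0, "one": 1, "two": 2, "three": 3, "four": 4}


@dataclass
class Mention:
    """A candidate mention from the recognition stage (assumed shape)."""
    text: str          # surface form, e.g. "two" or "5"
    confidence: float  # tagger confidence in [0, 1]


def parse_count(text: str) -> Optional[int]:
    """Map a surface form to an integer count, or None if unknown."""
    token = text.strip().lower()
    if token in COUNT_WORDS:
        return COUNT_WORDS[token]
    if token.isdigit():
        return int(token)
    return None


def consolidate(mentions: List[Mention], threshold: float = 0.5) -> Optional[int]:
    """Return a single predicted count, or None if nothing is reliable.

    Assumed strategy: ignore mentions below the threshold or without a
    parseable count, then pick the mention with the highest confidence.
    """
    best_value: Optional[int] = None
    best_conf = -1.0
    for m in mentions:
        value = parse_count(m.text)
        # Compare against None explicitly so a count of zero is kept.
        if value is None or m.confidence < threshold:
            continue
        if m.confidence > best_conf:
            best_value, best_conf = value, m.confidence
    return best_value


if __name__ == "__main__":
    candidates = [Mention("two", 0.6), Mention("5", 0.9), Mention("no", 0.3)]
    print(consolidate(candidates))
```

Other consolidation strategies would fit the same interface, for example taking the median of the eligible values or summing mentions that describe parts of a whole. Which one works best is an empirical question that this sketch does not address.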