We introduce the LAMA (LAnguage Model Analysis) probe to test the factual and commonsense
knowledge in language models:
It provides a set of knowledge sources, each composed of a corpus of facts. Facts are either subject-relation-object triples or question-answer pairs.
We evaluate each model based on how highly it ranks the ground truth token against every other word in a fixed candidate vocabulary.
Our assumption is that models which rank ground truth tokens high for these cloze statements have more factual knowledge.
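This ranking-based evaluation can be sketched as follows. The scoring function here is a stand-in: `scores` would come from a language model's distribution over the candidate vocabulary for the masked position, which is not shown.

```python
def rank_of_truth(scores, vocab, truth):
    """Return the 1-based rank of the ground-truth token when all
    candidate-vocabulary words are sorted by model score (highest first)."""
    order = sorted(vocab, key=lambda w: scores[w], reverse=True)
    return order.index(truth) + 1

# Toy example: hypothetical log-probabilities for the cloze statement
# "Dante was born in [MASK]." over a three-word candidate vocabulary.
vocab = ["Florence", "Paris", "Rome"]
scores = {"Florence": -0.2, "Paris": -3.1, "Rome": -1.5}
rank = rank_of_truth(scores, vocab, "Florence")  # -> 1
```

A model that places the ground truth at rank 1 for many such statements is, under this assumption, judged to hold more factual knowledge.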
4.1 Knowledge Sources
We cover a variety of sources of factual and commonsense knowledge. For each source, we describe the origin of fact triples (or question-answer pairs), how we transform them into cloze templates, and to what extent aligned texts exist in Wikipedia that are known to express a particular fact. We use the latter information in supervised baselines that extract knowledge representations directly from the aligned text.
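The transformation into cloze templates can be sketched as below. The template strings and relation names are illustrative placeholders, not the actual templates used by the probe.

```python
# Hypothetical relation-to-template mapping; [X] is the subject slot
# and [Y] the object slot to be masked.
TEMPLATES = {
    "place_of_birth": "[X] was born in [Y].",
    "capital_of": "[X] is the capital of [Y].",
}

def to_cloze(subject, relation, mask_token="[MASK]"):
    """Turn a subject-relation pair into a cloze statement by filling
    the subject slot and masking the object slot."""
    template = TEMPLATES[relation]
    return template.replace("[X]", subject).replace("[Y]", mask_token)

to_cloze("Dante", "place_of_birth")  # -> "Dante was born in [MASK]."
```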
4.2 Models
4.3 Baselines
Freq: For a subject and relation pair......
RE: For the relation-based knowledge source......
DrQA: For open-domain question answering......
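As a rough illustration of a frequency-style baseline, the sketch below always ranks objects by how often they appear with the given relation in training data. The training triples and exact counting scheme are assumptions for the example, not the probe's actual data.

```python
from collections import Counter, defaultdict

def fit_freq_baseline(train_triples):
    """Map each relation to its objects sorted by training frequency,
    so the most frequent object is predicted first for any subject."""
    counts = defaultdict(Counter)
    for subj, rel, obj in train_triples:
        counts[rel][obj] += 1
    return {rel: [o for o, _ in c.most_common()] for rel, c in counts.items()}

# Illustrative training triples.
train = [("Dante", "place_of_birth", "Florence"),
         ("Galileo", "place_of_birth", "Pisa"),
         ("Petrarch", "place_of_birth", "Arezzo"),
         ("Machiavelli", "place_of_birth", "Florence")]
predictions = fit_freq_baseline(train)
predictions["place_of_birth"][0]  # -> "Florence"
```

Such a baseline ignores the subject entirely, which is exactly what makes it a useful lower bound for relation-conditioned prediction.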
4.4 Metrics
We consider rank-based metrics and compute results per relation along with mean values across all relations. To account for multiple valid objects for a subject-relation pair (i.e., for N-M relations), we follow Bordes et al. (2013) and, when ranking at test time, remove from the candidates all other valid objects in the training data other than the one we test. We use the mean precision at k (P@k). For a given fact, this value is 1 if the object is ranked among the top k results, and 0 otherwise.
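The filtered P@k computation described above can be sketched as follows, assuming a ranked candidate list and a set of other known-valid objects from training:

```python
def precision_at_k(ranked_candidates, truth, other_valid, k):
    """Return 1 if the test object is in the top k after removing all
    other valid objects (Bordes et al. 2013 filtered setting), else 0."""
    filtered = [c for c in ranked_candidates
                if c == truth or c not in other_valid]
    return 1 if truth in filtered[:k] else 0

ranked = ["Rome", "Florence", "Paris", "Pisa"]
# "Rome" is another valid object from training, so it is filtered out
# and no longer blocks "Florence" from the top-1 position.
precision_at_k(ranked, "Florence", {"Rome"}, k=1)  # -> 1
```

Mean P@k is then the average of this indicator over all facts, reported per relation and averaged across relations.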