Week 4-5: The Penn Treebank

Author: zypandora

Category: NLP(Michigan)

Article link: https://blog.csdn.net/zypandora/article/details/50085159

Description

Background

Early 1990s
Developed at the University of Pennsylvania
Most cited paper in NLP

Size

40000 training sentences
2400 test sentences

Genre

Mostly Wall Street Journal news stories and some spoken conversations

Importance

Helped launch modern automatic (statistical) parsing methods

Penn Treebank tagsets

[Figure: Penn Treebank tagset table (image not preserved)]
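The treebank stores each parsed sentence as a bracketed tree, with POS tags on the preterminal nodes and phrase labels (S, NP, VP, ...) above them. The following is a minimal Python sketch, not an official reader, that parses one bracketed tree and pulls out the (word, tag) pairs. The function names, the example sentence, and the small `TAG_GLOSS` dictionary (only a handful of tags, not the full tagset) are illustrative assumptions.

```python
import re

# A handful of Penn Treebank POS tags, for illustration only (not the full tagset).
TAG_GLOSS = {
    "DT": "determiner",
    "IN": "preposition or subordinating conjunction",
    "JJ": "adjective",
    "NN": "noun, singular or mass",
    "NNS": "noun, plural",
    "NNP": "proper noun, singular",
    "VBD": "verb, past tense",
    "VBZ": "verb, 3rd person singular present",
}


def parse_tree(text):
    """Parse a bracketed tree into nested (label, children) tuples; leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse():
        nonlocal pos
        pos += 1  # skip the opening "("
        if tokens[pos] == "(":
            label = ""  # treebank files often wrap each tree in an unlabeled bracket
        else:
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # skip the closing ")"
        return (label, children)

    return parse()


def tagged_words(tree):
    """Collect (word, POS tag) pairs from preterminal nodes, left to right."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        pairs.extend(tagged_words(child))
    return pairs


if __name__ == "__main__":
    tree = parse_tree("(S (NP (DT The) (NN cat)) (VP (VBD sat)))")
    for word, tag in tagged_words(tree):
        print(word, tag, TAG_GLOSS.get(tag, "(not in this mini glossary)"))
```

A hand-written recursive parser keeps the sketch dependency-free; for real treebank files a maintained reader (for example NLTK's tree utilities) is the safer choice.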

Peculiarities

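Complementizers
- e.g. “that” (tagged IN), which introduces an embedded clause
Gaps
- empty elements such as traces, tagged -NONE-
SBAR
- the label for a clause introduced by a (possibly empty) complementizer or subordinating conjunction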
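These peculiarities show up directly in the bracketed trees. A gap appears as an empty element tagged -NONE-: for example a trace `*T*-1` co-indexed with a wh-phrase in a relative clause, or `0` standing for an omitted complementizer inside an SBAR. Many parsing experiments strip these empty elements before training. Below is a minimal Python sketch of that preprocessing step; the function names and example trees are illustrative, and the small parser assumes a tree without the unlabeled outer bracket found in the treebank files.

```python
import re


def parse_tree(text):
    """Parse '(LABEL child ...)' into nested (label, children) tuples; leaves are strings.
    Assumes the tree has no unlabeled outer bracket."""
    tokens = iter(re.findall(r"\(|\)|[^\s()]+", text))

    def parse():
        # Called right after an opening "(" has been consumed.
        label = next(tokens)
        children = []
        for tok in tokens:
            if tok == ")":
                return (label, children)
            children.append(parse() if tok == "(" else tok)
        raise ValueError("unbalanced brackets")

    if next(tokens) != "(":
        raise ValueError("tree must start with '('")
    return parse()


def strip_empty(tree):
    """Drop -NONE- preterminals, then any constituent left with no children.
    Returns None if the whole subtree was empty."""
    label, children = tree
    if label == "-NONE-":
        return None
    kept = []
    for child in children:
        if isinstance(child, str):
            kept.append(child)
        else:
            pruned = strip_empty(child)
            if pruned is not None:
                kept.append(pruned)
    return (label, kept) if kept else None


def to_string(tree):
    """Write a (label, children) tree back in bracketed form."""
    label, children = tree
    parts = [c if isinstance(c, str) else to_string(c) for c in children]
    return "(" + label + " " + " ".join(parts) + ")"


if __name__ == "__main__":
    # Relative clause: the object gap after "bought" is a trace co-indexed with WHNP-1.
    relative = parse_tree(
        "(NP (NP (DT the) (NN book)) (SBAR (WHNP-1 (WDT which)) "
        "(S (NP-SBJ (PRP I)) (VP (VBD bought) (NP (-NONE- *T*-1))))))"
    )
    print(to_string(strip_empty(relative)))
```

Note that the NP that held only the trace disappears too, since it has no children left once the -NONE- node is removed.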

Uses of the Treebank

Disadvantages

It takes a lot more work to annotate 40,000+ sentences than to write a grammar

Advantages

- Statistics about different constituents and phenomena
- Training and evaluating systems
- Multilingual versions
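One concrete use of "statistics about different constituents" is reading a probabilistic context-free grammar straight off the trees: count every rule A → β that occurs, then estimate P(A → β | A) = count(A → β) / count(A). The sketch below shows this relative-frequency estimate in Python. The trees are written directly as nested (label, children) tuples to keep it short, and the function names and toy trees are hypothetical, not taken from the treebank.

```python
from collections import Counter


def productions(tree):
    """Yield (lhs, rhs) rules read off a (label, children) tree."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for c in children:
        if not isinstance(c, str):
            yield from productions(c)


def estimate_pcfg(trees):
    """Relative-frequency estimate: P(A -> beta | A) = count(A -> beta) / count(A)."""
    rule_counts = Counter()
    for t in trees:
        rule_counts.update(productions(t))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


if __name__ == "__main__":
    t1 = ("S", [("NP", [("DT", ["the"]), ("NN", ["dog"])]),
                ("VP", [("VBD", ["barked"])])])
    t2 = ("S", [("NP", [("NNP", ["Rex"])]),
                ("VP", [("VBD", ["barked"]), ("NP", [("DT", ["the"]), ("NN", ["cat"])])])])
    for (lhs, rhs), p in sorted(estimate_pcfg([t1, t2]).items()):
        print(f"{lhs} -> {' '.join(rhs)}  {p:.2f}")
```

In the toy data NP appears three times, twice as DT NN and once as NNP, so the two NP rules get probabilities 2/3 and 1/3.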

Evaluation methodology

Classification tasks

Document retrieval
POS tagging
Parsing

Data split

Training
Dev-test
Test
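The three-way split keeps the data used for tuning (dev-test) separate from the data used for the final report (test), which should be evaluated only once at the end. The notes do not give proportions, so the 80/10/10 ratio, the fixed-seed random shuffle, and the `split_data` name below are assumptions for a generic dataset; published Penn Treebank parsing results usually use a fixed split by WSJ section instead of a random one.

```python
import random


def split_data(items, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once with a fixed seed, then cut into train / dev-test / test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed makes the split reproducible
    n = len(items)
    n_test = int(n * test_frac)
    n_dev = int(n * dev_frac)
    test = items[:n_test]
    dev = items[n_test:n_test + n_dev]
    train = items[n_test + n_dev:]
    return train, dev, test


if __name__ == "__main__":
    train, dev, test = split_data(range(100))
    print(len(train), len(dev), len(test))
```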

Baseline

Dumb baseline
Intelligent baseline
Human performance (oracle)
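For POS tagging, a common "dumb" baseline assigns one tag to every word, and a common "intelligent" baseline gives each word the tag it received most often in training. The Python sketch below compares the two by tagging accuracy. The choice of NN as the constant tag and as the fallback for unseen words, the function names, and the toy sentences are all assumptions for illustration; human performance (the oracle) has to come from annotators and cannot be computed this way.

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(tagged_sents):
    """Map each training word to the tag it received most often."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def dumb_baseline(words):
    # Tag every word as a singular noun, regardless of the word or its context.
    return ["NN" for _ in words]


def make_intelligent_baseline(lexicon, unknown_tag="NN"):
    # Known words get their most frequent training tag; unseen words fall back to unknown_tag.
    def tagger(words):
        return [lexicon.get(w, unknown_tag) for w in words]
    return tagger


def tagging_accuracy(gold_sents, tagger):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for sent in gold_sents:
        predicted = tagger([w for w, _ in sent])
        for (_, gold_tag), guess in zip(sent, predicted):
            correct += gold_tag == guess
            total += 1
    return correct / total


if __name__ == "__main__":
    train = [
        [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
        [("the", "DT"), ("dogs", "NNS"), ("bark", "VBP")],
        [("a", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    ]
    test = [
        [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
        [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    ]
    smart = make_intelligent_baseline(train_most_frequent_tag(train))
    print("dumb:", tagging_accuracy(test, dumb_baseline))
    print("intelligent:", tagging_accuracy(test, smart))
```

A new method is only interesting if it beats the intelligent baseline, not just the dumb one.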

New methods

Evaluation methods

Accuracy
Precision and recall
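Accuracy counts how many items received the correct label, which suits tasks like POS tagging where every token gets a label. For document retrieval, precision (the fraction of retrieved documents that are relevant) and recall (the fraction of relevant documents that were retrieved) are more informative, because most documents are irrelevant. A minimal Python sketch follows; the function names, document ids, and the convention of returning 0.0 for an empty set are assumptions.

```python
def accuracy(gold, predicted):
    """Fraction of items whose predicted label matches the gold label."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def precision_recall(retrieved, relevant):
    """Precision = hits / |retrieved|, recall = hits / |relevant| (document-id sets)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    print(accuracy(["DT", "NN", "VBZ", "."], ["DT", "NN", "NNS", "."]))
    print(precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"]))
```

Retrieving four documents of which two are among the three relevant ones gives precision 2/4 and recall 2/3.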

Multiple references

Interjudge agreement

Kappa

$\kappa = \frac{P(A) - P(E)}{1 - P(E)}$

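Kappa compares the observed agreement with the agreement expected by chance:
- P(A) is the observed proportion of items on which the judges agree
- P(E) is the expected probability of agreement by chance
κ = 1 means perfect agreement; κ = 0 means agreement no better than chance.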
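The kappa formula can be computed directly from two judges' labels. This Python sketch assumes Cohen's two-judge formulation, where P(E) sums, over all labels, the product of each judge's own label frequencies; other variants (for example pooling both judges' labels before computing P(E)) give a slightly different P(E). The function name and example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Kappa = (P(A) - P(E)) / (1 - P(E)) for two judges labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # P(A): observed proportion of items on which the judges agree.
    p_a = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # P(E): chance agreement from each judge's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((freq_a[k] / n) * (freq_b[k] / n) for k in freq_a.keys() | freq_b.keys())
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_a - p_e) / (1 - p_e)


if __name__ == "__main__":
    judge1 = ["yes", "yes", "no", "no"]
    judge2 = ["yes", "no", "no", "no"]
    print(cohen_kappa(judge1, judge2))
```

In the example the judges agree on 3 of 4 items, so P(A) = 0.75; chance agreement is P(E) = 0.5 · 0.25 + 0.5 · 0.75 = 0.5, which gives κ = 0.25 / 0.5 = 0.5.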

Parsing evaluations

Precision and recall
Labeled precision and recall
F1 score
Crossing brackets
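These parsing metrics compare constituents of the predicted tree with those of the gold tree, with each constituent viewed as a (label, start, end) span over the words. Labeled precision and recall require label and span to match, unlabeled ones only the span, and a crossing bracket is a predicted span that overlaps a gold span without either one containing the other. The Python sketch below is a simplified illustration under these assumptions (end index exclusive, duplicate constituents collapsed by using sets); it is not a reimplementation of the standard evalb scorer, which has extra conventions such as ignoring some punctuation.

```python
def prf(gold, pred, labeled=True):
    """Precision, recall, F1 over constituents given as (label, start, end) tuples."""
    def key(c):
        return c if labeled else c[1:]  # drop the label for unlabeled scores
    g = {key(c) for c in gold}
    p = {key(c) for c in pred}
    matched = len(g & p)
    precision = matched / len(p) if p else 0.0
    recall = matched / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def crosses(a, b):
    """True if spans overlap but neither contains the other."""
    (i, j), (k, l) = a, b
    return i < k < j < l or k < i < l < j


def crossing_brackets(gold, pred):
    """Count predicted constituents that cross at least one gold constituent."""
    gold_spans = {(s, e) for _, s, e in gold}
    return sum(any(crosses((s, e), g) for g in gold_spans) for _, s, e in pred)


if __name__ == "__main__":
    gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
    pred = [("S", 0, 5), ("NP", 0, 2), ("VP", 1, 5), ("NP", 3, 5)]
    print(prf(gold, pred), crossing_brackets(gold, pred))
```

Here three of the four predicted constituents match, so precision, recall, and F1 are all 0.75, and the predicted VP spanning words 1 to 5 crosses the gold NP spanning words 0 to 2.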
