References
Komachi M, Kudo T, Shimbo M, et al. Graph-based Analysis of Semantic Drift in Espresso-like Bootstrapping Algorithms[C]// Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2008), 25-27 October 2008, Honolulu, Hawaii, USA. A Meeting of SIGDAT, a Special Interest Group of the ACL. 2008: 1011-1020.
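I had often come across the term "semantic drift" in papers, but they only mentioned the concept without explaining it in detail. Today I found an explanation in this paper, so I am writing it down here.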
Original text:
However, it is known that bootstrapping often acquires instances not related to seed instances. For example, consider the task of collecting the names of common tourist sites from web corpora. Given words like “Geneva” and “Bali” as seed instances, bootstrapping would eventually learn generic patterns such as “pictures” and “photos,” which also co-occur with many other unrelated instances. The subsequent iterations would likely acquire frequent words that co-o