The open course series on Coursera at https://www.coursera.org/course/textanalytics is very well taught.
1. Two kinds of word relations: Paradigmatic vs. Syntagmatic
• Paradigmatic: A & B have a paradigmatic relation if they can be substituted for each other (i.e., A & B are in the same class)
– E.g., “cat” and “dog”; “Monday” and “Tuesday” (paradigmatic: words of the same class, which have highly similar contexts)
• Syntagmatic: A & B have a syntagmatic relation if they can be combined with each other (i.e., A & B are related semantically)
– E.g., “cat” and “sit”; “car” and “drive” (syntagmatic: words that often appear together, with highly correlated occurrences but relatively low individual occurrences)
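The difference between the two relations can be made concrete on a toy corpus. The sketch below is only an illustration: the sentences are made up, and both scoring choices (Jaccard overlap of context words for the paradigmatic side, a plain count of shared sentences for the syntagmatic side) are assumptions picked for simplicity, not the measures used in the course.

```python
# Hypothetical toy corpus, for illustration only.
sentences = [
    "my cat eats fish",
    "my dog eats meat",
    "his cat sits on the mat",
    "his dog sits on the sofa",
    "i drive my car",
]

def context_words(target, sentences):
    """Collect every word that shares a sentence with `target`."""
    context = set()
    for sentence in sentences:
        tokens = sentence.split()
        if target in tokens:
            context.update(t for t in tokens if t != target)
    return context

def jaccard(a, b):
    """Set overlap |a & b| / |a | b| (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cooccurrence_count(w1, w2, sentences):
    """Number of sentences that contain both words."""
    return sum(1 for s in sentences if w1 in s.split() and w2 in s.split())

# Paradigmatic signal: do "cat" and "dog" appear in similar contexts?
context_similarity = jaccard(context_words("cat", sentences),
                             context_words("dog", sentences))

# Syntagmatic signal: do the two words appear in the same sentence?
cat_dog_together = cooccurrence_count("cat", "dog", sentences)
cat_sits_together = cooccurrence_count("cat", "sits", sentences)

print(context_similarity, cat_dog_together, cat_sits_together)
```

On this toy corpus “cat” and “dog” never occur in the same sentence, yet they share most of their context words, which is the paradigmatic pattern; “cat” and “sits” do occur together, which is the syntagmatic pattern.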
2. Mining paradigmatic relations:
2.1 How do we measure how strong the paradigmatic relation between two words (for example, dog and cat) is?
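Because a paradigmatic relation essentially reflects context similarity, we can start by collecting the context of each word from every sentence in the documents where dog or cat appears: the word immediately to its left, the word immediately to its right, and a window of surrounding words. For example: Left1(“cat”) = {“my”, “his”, “big”, “a”, “the”, …}, Right1(“cat”) = {“eats”, “ate”, “is”, “has”, …}, Window(“cat”) = {“my”, “his”, “big”, “eats”, “fish”, …}; in the same way we can obtain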