Exercise 8: TF/IDF ranking

DIS 2006/2007

In this exercise we look at how TF/IDF ranking works.

 

There are 5 different documents in the collection:

  • D1 = "If it walks like a duck and quacks like a duck, it must be a duck."
  • D2 = "Beijing Duck is mostly prized for the thin, crispy duck skin with authentic versions of the dish serving mostly the skin."
  • D3 = "Bugs' ascension to stardom also prompted the Warner animators to recast Daffy Duck as the rabbit's rival, intensely jealous and determined to steal back the spotlight while Bugs remained indifferent to the duck's jealousy, or used it to his advantage. This turned out to be the recipe for the success of the duo."
  • D4 = "6:25 PM 1/7/2007 blog entry: I found this great recipe for Rabbit Braised in Wine on cookingforengineers.com."
  • D5 = "Last week Li has shown you how to make the Sechuan duck. Today we'll be making Chinese dumplings (Jiaozi), a popular dish that I had a chance to try last summer in Beijing. There are many recipies for Jiaozi."

 

Task 1. For the query Q = "Beijing duck recipe", find the two top-ranked documents according to their TF/IDF scores. Assume the cosine similarity measure and the culinary term set T = {beijing, dish, duck, rabbit, recipe, roast}. Are the top-ranked documents relevant to the query?
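The ranking asked for in Task 1 can be checked with a short script. This is a minimal sketch, not the official solution, and it rests on several assumptions the exercise does not state: tokens are lowercased runs of letters, terms are matched exactly without stemming (so "recipies" in D5 does not count as recipe), TF is the raw term count, IDF is ln(N/df) with weight 0 for a term that appears in no document, and the query is weighted the same way as the documents.

```python
import math
import re
from collections import Counter

TERMS = ["beijing", "dish", "duck", "rabbit", "recipe", "roast"]

DOCS = {
    "D1": "If it walks like a duck and quacks like a duck, it must be a duck.",
    "D2": "Beijing Duck is mostly prized for the thin, crispy duck skin with "
          "authentic versions of the dish serving mostly the skin.",
    "D3": "Bugs' ascension to stardom also prompted the Warner animators to "
          "recast Daffy Duck as the rabbit's rival, intensely jealous and "
          "determined to steal back the spotlight while Bugs remained "
          "indifferent to the duck's jealousy, or used it to his advantage. "
          "This turned out to be the recipe for the success of the duo.",
    "D4": "6:25 PM 1/7/2007 blog entry: I found this great recipe for Rabbit "
          "Braised in Wine on cookingforengineers.com.",
    "D5": "Last week Li has shown you how to make the Sechuan duck. Today "
          "we'll be making Chinese dumplings (Jiaozi), a popular dish that I "
          "had a chance to try last summer in Beijing. There are many "
          "recipies for Jiaozi.",
}
QUERY = "Beijing duck recipe"


def term_counts(text):
    """Raw term frequencies over TERMS (assumed: exact match, no stemming)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[t] for t in TERMS]


def idf_weights(rows):
    """Assumed IDF variant: ln(N / df), and 0 when no document has the term."""
    n = len(rows)
    weights = []
    for i in range(len(TERMS)):
        df = sum(1 for row in rows if row[i] > 0)
        weights.append(math.log(n / df) if df else 0.0)
    return weights


def cosine(u, v):
    """Cosine of the angle between two vectors; 0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


tf = {name: term_counts(text) for name, text in DOCS.items()}
idf = idf_weights(list(tf.values()))


def weigh(row):
    """Turn a row of term counts into a TF/IDF vector."""
    return [count * w for count, w in zip(row, idf)]


q_vec = weigh(term_counts(QUERY))
scores = {name: cosine(weigh(row), q_vec) for name, row in tf.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(name, round(scores[name], 3))
```

Under these assumptions D2 and D3 share the top score (about 0.52), with D5 close behind (about 0.51). D3 is about a cartoon character rather than cooking, which is the point of the relevance question. A different IDF formula or the use of stemming could change the order, since the top three scores are close.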

 

Task 2. Assume that the author of document D5 goes on to tell more about her summer trip to China before doing the cooking and uses the word Beijing 3 times instead of just once. What happens to the rank of D5? How can this be interpreted in the vector retrieval model (vectors and the angles between them)? Is this change in the ranking of D5 a desirable property of TF/IDF? Why?
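The effect described in Task 2 can be explored numerically. The sketch below is illustrative only and makes the following assumptions: the term counts are tallied by hand from the five documents using exact, case-insensitive matching (possessives such as "duck's" count as duck, "recipies" does not count as recipe), TF is the raw count, and IDF is ln(N/df) with weight 0 for unseen terms. The helper name `rank` is hypothetical.

```python
import math

TERMS = ["beijing", "dish", "duck", "rabbit", "recipe", "roast"]

# Hand-tallied term counts per document, in the order of TERMS (assumption).
TF = {
    "D1": [0, 0, 3, 0, 0, 0],
    "D2": [1, 1, 2, 0, 0, 0],
    "D3": [0, 0, 2, 1, 1, 0],
    "D4": [0, 0, 0, 1, 1, 0],
    "D5": [1, 1, 1, 0, 0, 0],
}
QUERY_TF = [1, 0, 1, 0, 1, 0]  # "Beijing duck recipe"


def rank(tf_table):
    """Return the cosine score of every document against the query."""
    n = len(tf_table)
    idf = []
    for i in range(len(TERMS)):
        df = sum(1 for row in tf_table.values() if row[i] > 0)
        idf.append(math.log(n / df) if df else 0.0)

    def weigh(row):
        return [count * w for count, w in zip(row, idf)]

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    q = weigh(QUERY_TF)
    return {name: cosine(weigh(row), q) for name, row in tf_table.items()}


def angle_deg(score):
    """Angle between document and query vectors, in degrees."""
    return math.degrees(math.acos(min(1.0, score)))


before = rank(TF)
# D5 now mentions Beijing 3 times; its document frequency is unchanged.
after = rank(dict(TF, D5=[3, 1, 1, 0, 0, 0]))

for name in TF:
    print(name,
          round(before[name], 3), round(angle_deg(before[name]), 1),
          "->",
          round(after[name], 3), round(angle_deg(after[name]), 1))
```

With these assumptions the score of D5 rises from about 0.51 to about 0.67, which moves it to the top of the ranking, while the other scores stay the same because no document frequency changes. Geometrically, the D5 vector stretches along the beijing axis, so its angle to the query vector shrinks from roughly 59 degrees to roughly 48 degrees, even though the document is still about dumplings rather than Beijing duck.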

 

Solution

Excel sheet with calculations

 
Posted on 2012-09-23 08:12 by lexus

Reposted from: https://www.cnblogs.com/lexus/archive/2012/09/23/2698677.html
