Building decision trees to identify the intent

Data: taken from query logs and classified manually.
Feature Analysis
1. Number of terms in the query (nterms).
2. Number of clicks in query sessions (nclicks). A significant number of navigational queries concentrate on only a few clicks per session, which points to the navigational class.
3. Levenshtein distance: a distance function calculated between the terms that compose the query and the snippets (a snippet consists of the excerpt presented with the query result, the title, and the URL of the selected document). This is simply the edit distance, computed between the query and the returned snippets.
4. Number of sessions with fewer than n clicks over the total number of sessions associated with a query (nCS): among all sessions for a query, the share that have fewer than n clicks.
5. Number of clicks before the n-th position of the query ranking (nRS): the proportion of the query's sessions in which the clicks fell on the first n results.
6. PageRank: PageRank statistics of the documents in each category.
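
The features above are easy to compute once the log has been grouped by query. The following is only a minimal sketch, not the paper's implementation. It assumes a session is represented as a list of 1-based rank positions that were clicked, and that nRS counts a session when at least one click lands within the first n results. The function names (levenshtein, nterms, ncs, nrs) are hypothetical and chosen here for illustration.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (dynamic programming, two rows)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def nterms(query: str) -> int:
    """Number of whitespace-separated terms in the query."""
    return len(query.split())


def ncs(sessions: Sequence[Sequence[int]], n: int) -> float:
    """Share of a query's sessions that have fewer than n clicks."""
    if not sessions:
        return 0.0
    return sum(1 for clicks in sessions if len(clicks) < n) / len(sessions)


def nrs(sessions: Sequence[Sequence[int]], n: int) -> float:
    """Share of sessions with a click within the first n results.

    Assumption: ranks are 1-based and one click at rank <= n is enough.
    """
    if not sessions:
        return 0.0
    hits = sum(1 for clicks in sessions if any(rank <= n for rank in clicks))
    return hits / len(sessions)


if __name__ == "__main__":
    # Made-up sessions for one query: each inner list holds clicked ranks.
    example_sessions = [[1], [5, 7], [3, 9]]
    print(nterms("cheap flights to rome"))
    print(levenshtein("kitten", "sitting"))
    print(ncs(example_sessions, 2))
    print(nrs(example_sessions, 3))
```

In practice the Levenshtein feature would be computed between the query string and each snippet (excerpt, title, and URL) and then aggregated per query. How the paper aggregates it is not stated in these notes.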

Conclusions:
1. Navigational queries generally have fewer terms than informational queries. The behavior of this characteristic is not as clear for the transactional class.
2. Some informational queries register more than 9 different sites/pages selected in their sessions. This usually does not occur for navigational or transactional queries.
3. The Levenshtein distance calculated between query terms and snippets is smaller for navigational queries than for the other categories.
4. A good number of informational queries register clicks on pages/sites with low PageRank, as opposed to transactional or navigational queries.