IR & NLP & TC
magicblue
Thinking about a paper "A Refinement Approach to Handling Model Misfit in Text Categorization"
In this paper, I think the most interesting question is why there is no overfitting. Overfitting is inevitable when a model is trained too much on the training examples, because the decision line (surface) fits specifi…
Original · 2006-11-30 19:01:00 · 861 views · 0 comments
Rethinking "A refinement..."
The paper "Hierarchically classifying documents using very few words" gives a better explanation of why refinement works without overfitting. This paper proposes a new classification met…
Original · 2006-12-31 13:58:00 · 767 views · 0 comments
Notes about NLP and IR
NLP uses First-Order Predicate Calculus to represent the meaning of our languages. However, this method cannot completely express the meaning that we ourselves understand, because a machine has no perce…
Original · 2007-03-31 22:37:00 · 748 views · 0 comments
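Where Is Search Headed?
The Internet is an invention that has truly changed the way people live; probably only electrification bears comparison with that change. The Internet began as a military project, but since it was opened to civilian use, the amount of information on it has kept growing. The problem that follows is how to find the information you need in that mass of information, and this matters a great deal. Search technology has a short history, roughly a little over ten years. To this day, search is still, in essence, the keyword matching it started as. The key to getting high-quality results is to think the way the machine does about how to write those few query words, such as…
Original · 2007-07-24 01:24:00 · 806 views · 0 comments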
Language model of IR
Traditional IR is divided into two parts: indexing and retrieval. What connects them is word counts: tf and idf. Whatever model traditional IR uses (vector, probabilistic...), the core context of repres…
Original · 2007-05-11 11:57:00 · 1151 views · 0 comments
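The excerpt above describes word counts (tf and idf) as the link between indexing and retrieval. A minimal sketch of that idea follows, not taken from the post itself. The function names (`build_index`, `idf`, `score`, `search`), the whitespace tokenizer, and the smoothed idf formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def build_index(docs):
    """Index step: count terms per document (tf) and documents per term (df)."""
    tf = [Counter(doc.lower().split()) for doc in docs]  # naive whitespace tokenizer (assumption)
    df = Counter()
    for counts in tf:
        df.update(counts.keys())  # each document contributes at most 1 per term
    return tf, df


def idf(term, df, n_docs):
    """Smoothed inverse document frequency (one common variant, an assumption here)."""
    return math.log((1 + n_docs) / (1 + df.get(term, 0))) + 1.0


def score(query, tf, df):
    """Retrieval step: sum tf * idf over the query terms for every document."""
    n_docs = len(tf)
    terms = query.lower().split()
    return [sum(counts[t] * idf(t, df, n_docs) for t in terms) for counts in tf]


def search(query, docs):
    """Return document positions ordered from highest to lowest score."""
    tf, df = build_index(docs)
    scores = score(query, tf, df)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Calling `search("dog", docs)` on a small list of strings returns document positions, best match first. The index (`tf`, `df`) is built once from counts alone, and the retrieval step only reads those counts, which is the sense in which the counts connect the two parts.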