[Original] Hadoop: The Definitive Guide, Notes 2
Pig: Pig raises the level of abstraction for processing large datasets. With MapReduce, there is a map function and there is a reduce function, and working out how to fit your data processing into this pattern, which often requires multiple MapReduce sta
2010-09-26 15:59:00 3404
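The map/reduce pattern mentioned in the excerpt above can be sketched in plain Python. This is only an in-memory illustration, not Hadoop or Pig code: `map_fn`, `reduce_fn`, and `run_job` are hypothetical names chosen for this sketch, and the grouping step merely imitates the shuffle that a real framework performs between the two phases.

```python
from collections import defaultdict


def map_fn(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_fn(word, counts):
    """Reduce step: combine all counts emitted for one word."""
    return word, sum(counts)


def run_job(lines):
    """Imitate the shuffle by grouping mapped pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            grouped[key].append(value)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


# Hypothetical input lines, used only for illustration.
word_counts = run_job(["pig raises the level", "the level of abstraction"])
```

Even this small word count has to be bent into two separate functions plus a grouping step, which hints at why a higher-level language is attractive once a job needs several such stages.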
[Original] Hadoop: The Definitive Guide, Notes
First, why do we need Hadoop? The good news is that Big Data is here. The bad news is that we are struggling to store and analyze it. Faced with massive amounts of data, we need to store and analyze it efficiently, and Hadoop can do exactly that. This, in a nutshell, is what Hadoop provides: a reliable shared storage and analysis system. The sto
2010-09-21 11:12:00 3201
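[Original] Extracting Information from Text With NLTK
Most real-world data is "unstructured", such as plain txt documents, or "semi-structured", such as HTML, so some technique is needed to extract useful information from it. If all data were "structured", such as XML or a relational database, no special extraction would be needed, because you could retrieve whatever you want based on the metadata. So let's discuss how to extract information from text with NLTK. First, the raw text of the document is split into sentences using a sentence segmenter, and ea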
2010-09-08 16:58:00 5534 1
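The first step named in the excerpt above, splitting raw text into sentences, can be sketched with the standard library alone. This is a naive regular-expression stand-in, not NLTK's segmenter; the excerpt is cut off after that step, so the word-tokenization step shown here is an assumption about what typically follows. `split_sentences` and `tokenize` are hypothetical helper names.

```python
import re


def split_sentences(text):
    """Naive segmenter: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Naive tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


# Hypothetical sample text, used only for illustration.
raw_text = "Big Data is here. We are struggling to store it."
sentences = split_sentences(raw_text)
tokens = [tokenize(s) for s in sentences]
```

A rule this simple breaks on abbreviations such as "Dr." or "e.g.", which is one reason to prefer a trained segmenter like the one NLTK exposes through `nltk.sent_tokenize`.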
[Original] Classify Text With NLTK
Classification is the task of choosing the correct class label for a given input. A classifier is called supervised if it is built based on training corpora containing the correct label for each input. Here is an example showing how to use NLTK to implement classifier training and
2010-09-03 18:09:00 9663
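To make "supervised" concrete, here is a minimal sketch of a classifier built from labeled training inputs. It is a hand-rolled naive Bayes model in plain Python, swapped in so the example has no dependencies; it is not the NLTK code from the post, which is not visible in the excerpt. The functions `train` and `classify` and the tiny training set are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # log prior of the label
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:  # skip words never seen in training
                # Add-one smoothing avoids zero probabilities.
                smoothed = (word_counts[label][word] + 1) / (total_words + len(vocab))
                score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training corpus: each input comes with its correct label.
training = [
    ("great fun film", "pos"),
    ("loved this great story", "pos"),
    ("boring slow film", "neg"),
    ("awful boring plot", "neg"),
]
model = train(training)
prediction = classify(model, "great story")
```

In NLTK itself, the comparable entry point is `nltk.NaiveBayesClassifier.train`, which takes a list of (feature dictionary, label) pairs rather than raw strings.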