Original: Hadoop: The Definitive Guide, Notes 2

Pig. Pig raises the level of abstraction for processing large datasets. With MapReduce, there is a map function and there is a reduce function, and working out how to fit your data processing into this pattern, which often requires multiple MapReduce sta

2010-09-26 15:59:00 3318
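
The map function / reduce function pattern mentioned in the excerpt above can be illustrated with a small single-process sketch. This is not Hadoop or Pig code; `map_fn`, `reduce_fn`, and `run_job` are hypothetical names chosen for illustration, and the word-count task is an assumed example rather than something taken from the post.

```python
from collections import defaultdict


def map_fn(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_fn(word, counts):
    """Reduce step: combine all values that were emitted for one key."""
    return word, sum(counts)


def run_job(lines):
    """Toy driver (hypothetical): map, group values by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            grouped[key].append(value)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["a b a", "B c"]))
```

Even this small task has to be phrased as key/value pairs that are grouped and then combined, which is the kind of fitting work the excerpt refers to.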

Original: Hadoop: The Definitive Guide, Notes

First, why do we need Hadoop? The good news is that Big Data is here. The bad news is that we are struggling to store and analyze it. Faced with massive amounts of data, we need to store and analyze it efficiently, and Hadoop can do exactly that. This, in a nutshell, is what Hadoop provides: a reliable shared storage and analysis system. The sto

2010-09-21 11:12:00 3156

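Original: Extracting Information from Text With NLTK

Most real-world data is "unstructured data", such as plain txt documents, or "semi-structured data", such as HTML, and some technique is needed to extract useful information from it. If all data were "structured data", such as XML or a relational database, no special extraction would be needed, because you could retrieve whatever information you want based on the metadata. So let's discuss how to extract information from text with NLTK. First, the raw text of the document is split into sentences using a sentence segmenter, and ea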

2010-09-08 16:58:00 5485 1
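
The first step named in the excerpt above, splitting raw text into sentences, can be sketched with the standard library alone. This is a rough stand-in and not NLTK's segmenter: the regular expressions are naive, the function names are hypothetical, and the word-level `tokenize` step is an assumption added for illustration, since the excerpt is cut off before it says what comes next.

```python
import re


def split_sentences(text):
    """Naive sentence segmenter: split after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def tokenize(sentence):
    """Naive tokenizer (assumed step): word runs or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def preprocess(text):
    """Return one list of tokens per detected sentence."""
    return [tokenize(sentence) for sentence in split_sentences(text)]


if __name__ == "__main__":
    print(preprocess("Hadoop stores data. NLTK reads text!"))
```

A rule this simple breaks on abbreviations such as "Dr." or "e.g.", which is one reason to prefer a trained segmenter for real documents.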

Original: Classify Text With NLTK

Classification is the task of choosing the correct class label for a given input. A classifier is called supervised if it is built based on training corpora containing the correct label for each input. Here an example is used to show how to use nltk to implement classifier training and

2010-09-03 18:09:00 9622
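
To make the definition of a supervised classifier in the excerpt above concrete, here is a minimal sketch of one built from labeled training texts. It is a hand-written Naive Bayes with add-one smoothing using only the standard library; it is not NLTK's API and not necessarily the example the post goes on to describe. The function names and the tiny training set are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count label frequencies and per-label word frequencies
    from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest smoothed log-probability."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:  # skip words never seen during training
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Hypothetical training corpus: each input carries its correct label.
    training = [
        ("great fun movie", "pos"),
        ("loved it great", "pos"),
        ("boring bad movie", "neg"),
        ("bad awful plot", "neg"),
    ]
    model = train(training)
    print(classify(model, "great fun"))
```

The classifier is "supervised" in the excerpt's sense because `train` only ever sees inputs paired with their correct labels.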
