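3h: Chinese
Preprocessing 4: sentence segmentation
Preprocessing 8: punctuation cleaning
Preprocessing 12: stop-word removal
The code below has been tested; the processing steps are 0, 1, 2, 3.
http://blog.csdn.net/pipisorry/article/details/25909899 re API documentation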
http://www.cnblogs.com/huxi/archive/2010/07/04/1771073.html
http://www.cnblogs.com/NeilHappy/archive/2012/07/20/2600111.html common Python pitfalls
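Separately from the tested script in these notes, here is a minimal sketch of what the three preprocessing steps named above (sentence segmentation, punctuation cleaning, stop-word removal) might look like with the `re` module. The function names, the punctuation sets, and the three-word stop list are all assumptions made for illustration; none of them come from the original script. Stop-word removal is shown on tokens that are assumed to be already segmented into words by some other tool.

```python
# -*- coding: utf-8 -*-
import re

# Assumed sentence-ending marks (Chinese and ASCII); adjust to the corpus.
SENTENCE_RE = re.compile(r'[^。！？!?]+[。！？!?]?')

# Assumed punctuation set to strip; illustrative, not exhaustive.
PUNCTUATION_RE = re.compile(r'[，。！？、；：“”‘’（）《》,.!?;:()"\']')

# Hypothetical stop-word list; a real one would be loaded from a file.
STOP_WORDS = {'的', '了', '是'}


def split_sentences(text):
    """Return the sentences in text, each keeping its ending mark."""
    return [s for s in SENTENCE_RE.findall(text) if s.strip()]


def strip_punctuation(sentence):
    """Delete every character that belongs to the punctuation set."""
    return PUNCTUATION_RE.sub('', sentence)


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


if __name__ == '__main__':
    for sentence in split_sentences('今天天气好。我们去公园吧！'):
        print(strip_punctuation(sentence))
```

Using `findall` with a "run of text plus optional ending mark" pattern, instead of `re.split`, keeps the ending mark attached to each sentence, so the punctuation step can run afterwards as its own pass.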
# encoding: UTF-8
import re
fileBefPro=open('E:\\dataMining\\data.txt')
fileAftPro=open('E:\\dataMining\\after.txt','a')
iter_f=iter(fileBefPro)
for line in iter_f:  # process each line as soon as it is read
    # here we carry out