Word Segmentation and Counting in Python

Latest recommended article published on 2024-06-11 11:51:15

csdn_moming



Category: Python Cryptography  Tags: python, utf-8, windows, word segmentation and counting

Article link: https://blog.csdn.net/csdn_moming/article/details/50583566


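Python's slicing makes text processing very convenient. Below is a simple example that splits text into words and counts them.
(The input file must be UTF-8 encoded; the environment is Windows with Python 3.)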

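# -*- coding: utf-8 -*-
import re
import os

Total = 0  # total number of letters
words = []

# Collect all words
with open('Data.txt', encoding='utf-8') as readfile:
    for line in readfile:
        lineArr = line.strip().split()
        for word in lineArr:
            # '+' rather than '*', so findall does not return empty strings
            data = re.findall(r'[a-zA-Z]+', word)
            # The original listing is cut off right after this "for".
            # The loop body below is an assumed completion, not the author's code.
            for item in data:
                words.append(item)
                Total += len(item)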
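Since the listing above is truncated, the counting step it builds toward can be sketched independently. The following is a minimal sketch, not the author's code: the names `count_words`, `count_words_in_file` and `WORD_RE` are hypothetical, and lowercasing words before counting is an assumption made here so that "The" and "the" are counted together. It keeps the article's approach of extracting runs of ASCII letters and reading the file as UTF-8.

```python
import re
from collections import Counter

# Runs of ASCII letters, matching the character class used in the article.
WORD_RE = re.compile(r"[a-zA-Z]+")


def count_words(text):
    """Return (word frequencies, total letter count) for the given text."""
    # Lowercasing is an assumption of this sketch, not part of the article.
    words = [w.lower() for w in WORD_RE.findall(text)]
    total_letters = sum(len(w) for w in words)
    return Counter(words), total_letters


def count_words_in_file(path):
    """Read a UTF-8 text file and count its words."""
    # The encoding is passed explicitly because on Windows the default
    # for open() is typically the locale encoding rather than UTF-8.
    with open(path, encoding="utf-8") as f:
        return count_words(f.read())


if __name__ == "__main__":
    freq, total = count_words("The cat saw the dog. The dog ran!")
    print("total letters:", total)
    # Print the three most frequent words with their counts.
    for word, n in freq.most_common(3):
        print(word, n)
```

To run this against the article's input, call `count_words_in_file('Data.txt')`. `Counter.most_common` returns the words sorted by frequency, so no separate sorting step is needed.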
