Word Frequency Statistics

Latest recommended article published 2024-05-11 19:27:04

番茄要去皮


Category: Text Classification. Tags: word frequency statistics, machine learning

Article link: https://blog.csdn.net/weixin_44766179/article/details/90147506



Word frequency statistics for English text

import re
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# Read the whole file as one string
def get_data(file_path):
    with open(file_path, 'r', encoding='utf-8', errors='ignore') as file:
        text = file.read().strip()
    return text

# Expand common English contractions (the text is lowercased first)
def replace_abbreviations(text):
    text = text.lower().replace("it's", "it is").replace("i'm", "i am").replace("he's", "he is").replace("she's", "she is")\
        .replace("we're", "we are").replace("they're", "they are").replace("you're", "you are").replace("that's", "that is")\
        .replace("this's", "this is").replace("can't", "can not").replace("don't", "do not").replace("doesn't", "does not")\
        .replace("we've", "we have").replace("i've", "i have").replace("isn't", "is not").replace("won't", "will not")\
        .replace("hasn't", "has not").replace("wasn't", "was not").replace("weren't", "were not").replace("let's", "let us")\
        .replace("didn't", "did not").replace("hadn't", "had not").replace("what's", "what is").replace("couldn't", "could not")\
        .replace("you'll", "you will")
    # The source listing is cut off at this point; any further replacements are not shown.
    return text  # assumed: the function returns the expanded text
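The listing stops before the counting step itself. As a rough illustration of what that step could look like, here is a minimal sketch that uses only the standard library. It is not the author's code: the function name `count_words`, the `top_n` parameter, and the regex tokenizer are assumptions made for this example, and it does no lemmatization, whereas the imports above suggest the original used NLTK's `word_tokenize` and `WordNetLemmatizer`.

```python
import re
from collections import Counter


def count_words(text, top_n=None):
    """Return (word, count) pairs sorted by descending count.

    Hypothetical helper for illustration; not part of the original post.
    """
    # Lowercase, then treat runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Strip apostrophes left at token edges and drop tokens that become empty.
    tokens = [t.strip("'") for t in tokens]
    tokens = [t for t in tokens if t]
    # Counter.most_common(None) returns every word, most frequent first.
    return Counter(tokens).most_common(top_n)


if __name__ == "__main__":
    sample = "It is a cat. The cat is small, and the dog is not."
    for word, count in count_words(sample, top_n=3):
        print(word, count)
```

In a fuller pipeline the input would probably pass through `replace_abbreviations` first, so that contractions such as "isn't" are counted as "is" and "not" rather than as separate tokens.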
