Introduction to Natural Language Processing: Chapter 1 Example Code

光英的记忆

Category: NLTK

Article link: https://blog.csdn.net/qq_29678299/article/details/90487490

import nltk
import urllib.request as urllib
import re
from bs4 import BeautifulSoup

import operator


print("Python and NLTK installed successfully")


# urllib.request (imported here as urllib) is used to download the HTML content of the web page
response = urllib.urlopen('http://python.org/')
# read() returns the entire response body as bytes
html = response.read()
print(len(html))
print(html)

# Clean up the HTML tags. First attempt: split the raw HTML on whitespace
tokens = [tok for tok in html.split()]
print("Total number of tokens: " + str(len(tokens)))
# First 100 tokens
print(tokens[0:100])

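# The result still contains too many HTML tags and other irrelevant characters,
# so decode the bytes and split on runs of non-word characters instead
tokens = re.split(r'\W+', html.decode('utf-8'))
print(len(tokens))
print(tokens[0:100])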

# Use BeautifulSoup to strip the HTML markup (the "html5lib" parser must be installed)
clean = BeautifulSoup(html, "html5lib").get_text()
# clean holds the page text with the HTML noise removed
tokens = [tok for tok in clean.split()]
print(len(tokens))
print(tokens[:100])

freq_dis = {}
for tok in tokens:
    if
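The listing breaks off at this point. As a minimal sketch of what such a frequency count might look like, the block below counts tokens in a plain dict and then sorts them by count with `operator.itemgetter`, which would explain the otherwise unused `import operator` above. This is an assumption about the intent, not the original code: the helper name `count_tokens` and the sample sentence are hypothetical, and the sample text stands in for the cleaned page content so that the example runs without a network connection.

```python
import operator


def count_tokens(tokens):
    """Count how often each token occurs, using a plain dict (hypothetical helper)."""
    freq_dis = {}
    for tok in tokens:
        if tok in freq_dis:
            freq_dis[tok] += 1
        else:
            freq_dis[tok] = 1
    return freq_dis


# Hypothetical sample text standing in for the cleaned page content
sample = "python is fun and python is popular"
freq_dis = count_tokens(sample.split())

# Sort (token, count) pairs by count, highest first;
# operator.itemgetter(1) selects the count from each pair
sorted_freq = sorted(freq_dis.items(), key=operator.itemgetter(1), reverse=True)
print(sorted_freq[:3])
```

With the real page, `tokens` from the BeautifulSoup step would be passed to `count_tokens` instead of the sample sentence. `collections.Counter` or `nltk.FreqDist` would give the same counts with less code; the explicit dict is used here only because it matches the `freq_dis = {}` loop started in the listing.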
