Fetching Chinese web pages with Python and finding the page that contains a keyword

Latest recommended article published on 2023-03-26 00:51:13

pandaPwn


Article link: https://blog.csdn.net/ZHUJIANWEILI4/article/details/87344887


// First, set the source file's character encoding to UTF-8

#coding: UTF-8

import urllib
import chardet

total_cnt=29
target_str="顺受"

// Check which encoding the target string is in
print chardet.detect(target_str)
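The reason the encoding check matters can be shown with a small sketch. It is written for Python 3 (the post's own code is Python 2) and uses only the standard library; the sample page content and the variable names are hypothetical. The same text produces different bytes under UTF-8 and GBK, so a byte-level search is only expected to succeed when both sides use the same encoding.

```python
# Python 3 sketch: the same text yields different bytes under different encodings.
keyword = "顺受"
utf8_bytes = keyword.encode("utf-8")
gbk_bytes = keyword.encode("gbk")

# Hypothetical page content, stored as GBK bytes for illustration.
page_gbk = "逆来顺受".encode("gbk")

# Search with a keyword whose encoding does not match the page.
mismatch_found = utf8_bytes in page_gbk
# Search with a keyword encoded the same way as the page.
match_found = gbk_bytes in page_gbk

print("UTF-8 keyword in GBK page:", mismatch_found)
print("GBK keyword in GBK page:", match_found)
```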

for i in range(1, total_cnt+1):
    content=urllib.urlopen("http://bbs.tianya.cn/post-16-998474-%s.shtml" % (str(i))).read()

    # Check the page's encoding; if it differs from the target string's
    # encoding, convert one of them before comparing
    #print chardet.detect(content)
    if target_str in content:
        print "found target url, index is %d" % (i)
    else:
        print "still not found"
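The script above is Python 2 and leaves the conversion step out. A minimal Python 3 sketch of the same idea follows. It is not the author's method: instead of chardet, it tries a short list of candidate encodings (an assumption about what Chinese pages commonly use), decodes the page to text, and only then searches for the keyword. The helper names `decode_page`, `page_contains`, and `find_pages`, and the `fetch` parameter, are hypothetical.

```python
# Python 3 sketch; helper names are hypothetical, not from the original post.
from urllib.request import urlopen

# Assumption: the page is encoded as UTF-8 or a GB-family encoding.
CANDIDATE_ENCODINGS = ("utf-8", "gb18030")


def decode_page(raw, encodings=CANDIDATE_ENCODINGS):
    """Decode raw page bytes with the first candidate encoding that succeeds."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Fall back to UTF-8 with replacement characters rather than failing.
    return raw.decode("utf-8", errors="replace")


def page_contains(raw, keyword):
    """Return True if the decoded page text contains the keyword."""
    return keyword in decode_page(raw)


def find_pages(url_template, total_cnt, keyword, fetch=None):
    """Return the page numbers in 1..total_cnt whose content contains keyword."""
    if fetch is None:
        # Default fetcher: download the page bytes over HTTP.
        fetch = lambda url: urlopen(url, timeout=10).read()
    return [i for i in range(1, total_cnt + 1)
            if page_contains(fetch(url_template % i), keyword)]
```

With the values from the post, a call might look like `find_pages("http://bbs.tianya.cn/post-16-998474-%s.shtml", 29, "顺受")`. Passing a custom `fetch` function makes it possible to try the search logic on local data without touching the network.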
