Scraping a Novel

Latest recommended article published 2024-05-02 22:24:53

老肥码码码

Category: Python

Article link: https://blog.csdn.net/lyc44813418/article/details/80581182

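This post uses the urllib library and the re regular-expression module to scrape a novel: it fetches the novel's index page, extracts the chapter links, and downloads each chapter page.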

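from urllib import request
from urllib.parse import urljoin
import re

# Index page of the novel; the chapter links found on it are resolved against this URL.
first_url = "http://www.freexs.org/novel/0/896/"
# The page is decoded as GBK, the encoding this script assumes the site uses.
html = request.urlopen(first_url).read().decode('gbk')

novel_info = {}
# The content of the keywords <meta> tag is stored as the novel title.
novel_info['title'] = re.findall(r'<meta name="keywords" content=(.*?)>', html)
# print(novel_info['title'][0])

# Grab the chapter table, then every <dd><a ...>...</a></dd> entry inside it.
div_info = re.findall(r'<table width="100%"><tr><td><dl>(.*?)</div>', html)
tag_a = re.findall(r'<dd><a(.*?)</a></dd>', div_info[0])
for tag in tag_a:
    # Pull the href value out of the anchor and build the full chapter URL.
    second_url = re.findall(r'href="(.*?)">', tag)[0]
    # print(second_url)
    url = urljoin(first_url, second_url)
    # print(url)
    html2 = request.urlopen(url).read().decode('gbk')
    print(html2)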
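
The link-extraction step can be tried offline before pointing the script at the live site. The following is a minimal sketch, not the author's code: `SAMPLE_HTML` is made-up markup shaped like what the regular expressions above expect, and `extract_chapters` is a hypothetical helper name. It also returns an empty list instead of raising `IndexError` when the chapter table is not found, and passes `re.S` so the patterns still match if the markup spans several lines.

```python
import re
from urllib.parse import urljoin

# Hypothetical sample markup, shaped like what the regexes in the post expect.
SAMPLE_HTML = (
    '<table width="100%"><tr><td><dl>'
    '<dd><a href="1001.html">Chapter 1</a></dd>'
    '<dd><a href="1002.html">Chapter 2</a></dd>'
    '</dl></td></tr></table></div>'
)


def extract_chapters(html, base_url):
    """Return (absolute_url, title) pairs for each chapter link in html."""
    # re.S lets "." match newlines, in case the table spans several lines.
    blocks = re.findall(r'<table width="100%"><tr><td><dl>(.*?)</div>', html, re.S)
    if not blocks:
        return []  # layout changed, or the page did not load as expected
    links = re.findall(r'<dd><a\s+href="(.*?)">(.*?)</a></dd>', blocks[0], re.S)
    # urljoin resolves relative hrefs against the index page URL.
    return [(urljoin(base_url, href), title.strip()) for href, title in links]


chapters = extract_chapters(SAMPLE_HTML, "http://www.freexs.org/novel/0/896/")
for url, title in chapters:
    print(url, title)
```

Capturing the link text alongside the href gives each chapter a title, which may be handy later for naming saved files.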
