Scraping a novel from a website with procedural Python and saving it as a text file
The code is fairly simple. The steps are as follows:
1. Import requests, plus re for the regular expressions used in step 3
import re
import requests
2. Send an HTTP request, as a browser would, to get the page source of the novel's index page
novel_url = 'http://www.xs4.cc/book/9/3802/'
response = requests.get(novel_url)
response.encoding = 'utf-8'
html = response.text
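A plain requests.get call sends the default python-requests User-Agent, which some sites may reject. As a rough sketch, the request could carry a browser-like header, a timeout, and a status check. The header value and the fetch_html helper below are illustrative assumptions, not part of the original script.

```python
import requests

# Assumed, illustrative browser-like header; any realistic User-Agent string would do.
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}


def fetch_html(url, encoding='utf-8', timeout=10):
    """Hypothetical helper: return the page source, raising on HTTP errors."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()   # fail loudly on 4xx/5xx instead of parsing an error page
    response.encoding = encoding  # same explicit encoding as in the step above
    return response.text
```

Calling html = fetch_html(novel_url) would then stand in for the request lines above.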
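3. Use regular expressions to get the title and URL of each chapter
# Grab the block that holds the chapter list; [0] raises IndexError if the page layout differs
div = re.findall(r'<DIV class="clearfix dirconone">.*?</div>',html,re.S)[0]
# Each item is a (url, title) tuple, in the order of the two capture groups
chapter_list = re.findall(r'<a href="(.*?)" title=".*?">(.*?)</a>',div)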
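To see what the two patterns produce without hitting the site, here is a small sketch that runs them on a made-up HTML fragment. The sample markup and the chapters name are assumptions for illustration; the real page may differ. It also assumes the href values are relative and resolves them against novel_url with urljoin.

```python
import re
from urllib.parse import urljoin

# Hypothetical sample markup, invented for illustration only.
sample_html = '''
<DIV class="clearfix dirconone">
<li><a href="1001.html" title="Chapter 1">Chapter 1</a></li>
<li><a href="1002.html" title="Chapter 2">Chapter 2</a></li>
</div>
'''
novel_url = 'http://www.xs4.cc/book/9/3802/'

# Same patterns as in step 3, with an explicit check instead of a bare [0].
blocks = re.findall(r'<DIV class="clearfix dirconone">.*?</div>', sample_html, re.S)
if not blocks:
    raise ValueError('chapter list container not found')
chapter_list = re.findall(r'<a href="(.*?)" title=".*?">(.*?)</a>', blocks[0])

# Turn (href, title) tuples into (title, absolute_url) pairs.
chapters = [(title, urljoin(novel_url, href)) for href, title in chapter_list]
for title, url in chapters:
    print(title, url)
```

If the site already emits absolute links, urljoin leaves them unchanged, so the extra step is harmless either way.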