20200205
Crawling web page content with the requests and BeautifulSoup libraries
import requests
from bs4 import BeautifulSoup

def crawl():
    url = 'https://www.kanunu8.com/book3/7474/'
    req = requests.get(url=url)
    req.encoding = 'gbk'  # the site serves GBK-encoded pages
    html = req.text
    bf_1 = BeautifulSoup(html, 'lxml')
    # the chapter list sits in centered <tr> rows
    content_url = bf_1.find_all('tr', align='center')
    for center in content_url:
        print(center)  # print each matched row, not the whole result list

if __name__ == '__main__':
    crawl()
I saw an interesting example online of crawling novel text, so I found a site to try it on myself.
Press F12 in Chrome to inspect the page structure; the novel content turns out to live in this part of the HTML. After that, it's just a matter of adjusting the find_all parameters.
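The find_all step can be tried offline before hitting the real site. Here is a minimal sketch, assuming the chapter list uses centered <tr> rows containing <a> links; the inline HTML below is a made-up stand-in for the real page, not the actual source of kanunu8.com:

```python
from bs4 import BeautifulSoup

# Made-up snippet mimicking the assumed chapter-list structure.
html = """
<table>
  <tr align="center"><td><a href="164519.html">Chapter 1</a></td></tr>
  <tr align="center"><td><a href="164520.html">Chapter 2</a></td></tr>
</table>
"""

# html.parser is used here so the sketch runs without the lxml dependency.
soup = BeautifulSoup(html, 'html.parser')
rows = soup.find_all('tr', align='center')
# Collect (href, link text) pairs from every matched row.
links = [(a.get('href'), a.get_text()) for row in rows for a in row.find_all('a')]
print(links)
```

Once the selector works on a snippet like this, the same find_all call can be pointed at the live page's HTML.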