Purpose:
- Use Python to scrape the titles and authors of novels
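Classes and modules needed:
- UserAgent
Description: automatically generates a User-Agent string
- requests
Description: sends the HTTP request for the page
- etree
Description: parses the HTML and selects nodes with XPath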
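To make the role of etree concrete, here is a minimal offline sketch of parsing HTML and selecting nodes with XPath. The markup in SAMPLE_HTML is invented for illustration and is only assumed to resemble the target page; the real page structure may differ.

```python
from lxml import etree

# Hypothetical markup, made up for this example; not copied from the real site.
SAMPLE_HTML = """
<div class="book-list">
  <div class="book"><h2>Book A</h2><span class="book-author">Author A</span></div>
  <div class="book"><h2>Book B</h2><span class="book-author">Author B</span></div>
</div>
"""

# etree.HTML parses a string into an element tree (it tolerates imperfect HTML).
root = etree.HTML(SAMPLE_HTML)

# xpath() returns a list; text() selects the text nodes of the matched elements.
titles = root.xpath('//div/div/h2/text()')
authors = root.xpath('//span[@class="book-author"]/text()')

print(titles)
print(authors)
```

Testing an expression against a small saved snippet like this is usually quicker than re-requesting the page each time.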
Technologies used:
- python
- lxml
Code:
from fake_useragent import UserAgent
import requests
# PyCharm may flag the etree import as an error; that is expected and the code still runs
from lxml import etree
# URL of the ranking page
url = 'https://m.qidian.com/rank/readindex/'
# Request headers with a generated Chrome User-Agent
headers = {
    "User-Agent": UserAgent().chrome
}
# Send the request
resp = requests.get(url, headers=headers)
# Parse the HTML
e = etree.HTML(resp.text)
# Get the book titles
book_names = e.xpath('//div/div/h2/text()')
'''
The xpath-helper browser tool is handy for testing expressions; XPath syntax reference:
https://www.runoob.com/xpath/xpath-syntax.html
Examples:
//div/ol/li/a/h2/text()
//div/div/h2/text()
//div/div/span[@class="book-author"]/text()
//li/a/div//div/div/span[1]/text()
'''
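# Get the author names (the raw result also contains whitespace-only text nodes)
book_authors1 = e.xpath('//div/div/span[@class="book-author"][1]/text()')
# Drop the whitespace-only entries and trim the rest
book_authors2 = []
for author in book_authors1:
    if author.strip():
        book_authors2.append(author.strip())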
# Print each title with its author
for name, author in zip(book_names, book_authors2):
    print(name, ":", author)
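One caveat with pairing two separately collected lists through zip: if a book has no author text, every later title is matched with the wrong author. A possible alternative, sketched below, is to select each book's container first and then read the title and author relative to it. The markup, the "book" class, and the helper name extract_books are all hypothetical and exist only for this illustration; the XPath would need adapting to the real page.

```python
from lxml import etree

# Hypothetical markup for illustration; the second book deliberately has no author.
SAMPLE_HTML = """
<ul>
  <li class="book"><h2>Book A</h2><span class="book-author">
    Author A
  </span></li>
  <li class="book"><h2>Book B</h2></li>
  <li class="book"><h2>Book C</h2><span class="book-author">Author C</span></li>
</ul>
"""


def extract_books(html):
    """Return a list of (title, author) tuples, one per book container."""
    root = etree.HTML(html)
    books = []
    for item in root.xpath('//li[@class="book"]'):
        # string(...) yields the text of the first match, or '' if nothing matches.
        title = item.xpath('string(./h2)').strip()
        author = item.xpath('string(./span[@class="book-author"])').strip()
        # Use None for a missing author so titles and authors stay aligned.
        books.append((title, author or None))
    return books


for title, author in extract_books(SAMPLE_HTML):
    print(title, ":", author)
```

Because each pair is read from the same container, a missing field affects only that one book.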