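Task description:
Use requests and XPath to extract attractions, their URLs, and the review information for each attraction, then save the results to txt and csv files.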
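
As a minimal sketch of the XPath step on its own, the snippet below runs the same kind of query against a small inline HTML fragment instead of the live site. The fragment, the `example.com` links, and the helper name `extract_links` are hypothetical and only imitate the list structure the scraper expects.

```python
from lxml import etree

# Hypothetical HTML fragment imitating a search-result list; not the real page.
SAMPLE_HTML = """
<div class="right_bar">
  <ul>
    <li><div>img</div><div><h2><a href="https://example.com/a">Spot A</a></h2></div></li>
    <li><div>img</div><div><h2><a href="https://example.com/b">Spot B</a></h2></div></li>
  </ul>
</div>
"""


def extract_links(html):
    """Return (name, href) pairs for every attraction link in the fragment."""
    tree = etree.HTML(html)
    # The second <div> inside each <li> holds the <h2><a> title link.
    anchors = tree.xpath('//div[@class="right_bar"]/ul/li/div[2]/h2/a')
    return [(a.text, a.get('href')) for a in anchors]


print(extract_links(SAMPLE_HTML))
```

Testing an XPath expression on a saved or hand-written fragment like this makes it easier to tell a wrong expression apart from a blocked or changed page.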
Implementation:
import requests
from lxml import etree
import pandas as pd
from urllib.parse import urljoin

# Search results page listing attractions in Shandong on Qunar
url = 'https://travel.qunar.com/search/place/23-shandong-298984/4-----0/1'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'}

r = requests.get(url, headers=headers, timeout=10)
r.encoding = 'utf-8'
print(r.status_code)
tree = etree.HTML(r.text)

# Link to each attraction's detail page
links = tree.xpath('//div[@class="right_bar"]/ul/li/div[2]/h2/a/@href')

rows = []
for a in links:
    a = urljoin(url, a)  # resolve relative or protocol-relative links
    rep = requests.get(a, headers=headers, timeout=10)
    rep.encoding = 'utf-8'
    trees = etree.HTML(rep.text)
    # Attraction description paragraphs
    p = trees.xpath('//*[@id="gs"]/div[1]//p/text()')
    # Review text embedded in the detail page HTML
    q = trees.xpath('//*[@id="comment_box"]//li//div[1]//div//div[3]//p[1]/text()')
    info = ''.join(s.strip() for s in p)
    comments = ' | '.join(s.strip() for s in q)
    rows.append({'url': a, 'info': info, 'comments': comments})
    # Labels: 网址 = URL, 景点信息 = attraction info, 评论信息 = review info
    line = "网址:" + a + ",景点信息:" + info + ",评论信息:" + comments
    with open('./评论.txt', 'a', encoding='utf-8') as fp:
        fp.write(line + '\n')

# One row per attraction, with a header row and proper quoting
pd.DataFrame(rows).to_csv('./评论.csv', index=False, encoding='utf-8')
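
Attraction descriptions and reviews often contain commas, so a CSV writer that quotes fields keeps the columns intact. The following is a minimal sketch using the standard library `csv` module; the column names `url`, `info`, `comments`, the helper `rows_to_csv`, and the sample row are assumptions made for illustration, not data from the site.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of dict rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=['url', 'info', 'comments'])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample row; the 'info' field contains a comma on purpose.
sample = [{'url': 'https://example.com/a', 'info': 'Lake, hills', 'comments': 'Nice view'}]
text = rows_to_csv(sample)
print(text)

# Reading the text back recovers the original fields, comma included.
parsed = list(csv.DictReader(io.StringIO(text)))
```

When writing to a real file rather than a string buffer, open it with `newline=''` as the `csv` documentation recommends; `encoding='utf-8-sig'` can help spreadsheet programs detect UTF-8 text.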
Stored results:
csv file:
![](https://i-blog.csdnimg.cn/blog_migrate/93fac7a2dec7cf8fc4803743355d5a1a.png)
txt file:
![](https://i-blog.csdnimg.cn/blog_migrate/fb374059e8ba2f81a113fc7e82425641.png)