Where should you travel over the New Year holiday? Scraping a complete New Year travel guide with Python!

Latest recommended article published on 2024-09-18 20:39:31

爱摸鱼的菜鸟码农

Article tags: python, data analysis, data mining, web crawler

Original link: https://blog.csdn.net/qq_43791724/article/details/111825745

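Only a few days of 2020 are left. Have you decided where to travel on the first day of 2021? If not, let's use Python to scrape a complete travel guide! If you find this useful, please like, favorite, and share. Please credit the source when reposting.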

I. Implementation Approach

First, the site we are going to scrape is the travel site Qyer (穷游): https://place.qyer.com/

The official site page to be scraped

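I have written two versions here: one scrapes travel destinations across all of China, and the other scrapes destinations at the province level. I also use word analysis (word segmentation plus a word cloud) to present the results.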

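1. Analyze the site to work out the URLs of the pages we want to scrape
2. Send requests with requests and fetch the data
3. Parse out the data we want and discard what we do not need
4. Save the data to a CSV file
5. Use word analysis to generate an image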
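The URL-building, cleanup, and CSV-saving steps can be tried without touching the network. The following is only a minimal sketch: the helper names `page_url`, `clean_pois`, and `write_rows` are hypothetical, the sample values are made up, and dropping empty entries is an extra assumption that the scraper in this article does not make.

```python
import csv
import io

# URL prefix of the China city listing used in this article
BASE_URL = "https://place.qyer.com/china/citylist-0-0-"


def page_url(num):
    """Build the URL of listing page `num` (accepts int or str)."""
    return BASE_URL + str(num)


def clean_pois(raw_items):
    """Strip whitespace, drop empty entries (an assumption), join with '.'."""
    return ".".join(item.strip() for item in raw_items if item.strip())


def write_rows(f, rows):
    """Write already-parsed rows to an open text file as CSV."""
    writer = csv.writer(f)
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up sample values, only to show the data flow.
    pois = clean_pois(["  Sample Spot A ", "\n", "Sample Spot B"])
    buffer = io.StringIO()  # stands in for open(path, "a", newline="")
    write_rows(buffer, [["Sample City", pois, page_url(1)]])
    print(buffer.getvalue())
```

Passing the file object into `write_rows` keeps the helper easy to test with `io.StringIO`; in a real run it would receive a file opened with `newline=''`, as the scraper does.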

II. Code Implementation

Import the dependencies

import pypinyin
import requests
import parsel
import csv
from concurrent.futures import ThreadPoolExecutor
import jieba
from wordcloud import WordCloud

Code for scraping the China listing

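# Assumed placeholder: this snippet uses `headers` without showing its definition.
headers = {"User-Agent": "Mozilla/5.0"}
nameList = []

def China(num):
    # str() lets the caller pass the page number as an int or a string
    url = "https://place.qyer.com/china/citylist-0-0-" + str(num)
    html = requests.get(url, headers=headers)
    text = html.text
    dom = parsel.Selector(text)
    lilist = dom.xpath("//*[@class='plcCitylist']/li")
    print("Scraping page %s" % num)
    for li in lilist:  # renamed from `list` to avoid shadowing the built-in
        # City name (default="" avoids a TypeError below when the node is missing)
        travel_name = li.xpath(".//h3/a/text()").get(default="")
        # Number of people who have been there
        travel_number = li.xpath(".//p[@class='beento']/text()").get()
        # Image URL
        travel_image = li.xpath(".//p[@class='pics']/a/img/@src").get()
        # Highlights (points of interest)
        travel_hot = li.xpath(".//p[@class='pois']/a/text()").getall()
        # Strip surrounding whitespace
        travel_hot = [hot.strip() for hot in travel_hot]
        # Join into a single string
        travel_hot = '.'.join(travel_hot)
        # City URL
        travel_url = "https:" + li.xpath(".//h3/a/@href").get(default="")
        # Save the data
        # Note: join() on a string puts '.' between its characters
        nameList.append(".".join(travel_name))
        with open('穷游中国数据.csv', mode='a', encoding='utf-8', newline='') as f:
            csv_writer = csv.writer(f)
            csv_writer.writerow([travel_name, travel_number, travel_hot, travel_url, travel_image])
    print("Finished page %s" % num)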
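The imports include `ThreadPoolExecutor`, but its use is not shown in this part of the article. A minimal sketch of how a per-page function such as `China` might be run across several pages follows. The `run_all` helper, the page range, and the worker count are all assumptions, and `scrape_page` is a hypothetical stand-in so that the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_page(num):
    """Hypothetical stand-in for China(num): returns a label, no network access."""
    return "page-" + str(num)


def run_all(worker, first_page, last_page, max_workers=4):
    """Call worker(page) for every page in [first_page, last_page] on a thread pool."""
    pages = [str(n) for n in range(first_page, last_page + 1)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input pages
        return list(pool.map(worker, pages))


if __name__ == "__main__":
    print(run_all(scrape_page, 1, 5))
```

To drive the real scraper, `China` would be passed in place of `scrape_page`. Because each call appends to the same CSV file, rows from different pages may end up interleaved rather than in page order.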

Run results

Output of scraping the China listing