Web Scraping PYTHON CODING

A simple web scraper:

      Scraping eLong (艺龙网), the first Python code I ever wrote. Using the browser's F12 developer tools, you can capture the data the page sends with each request as well as the site's real request URL, and then use Python's requests module to send the same request and fetch the site's data (the response comes back as JSON).
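The snippets below assume the following imports and variables. This is a minimal setup sketch: url is only a placeholder for the real list-request endpoint captured in the F12 Network panel, and i is the page index used in the paging loop shown later.

import requests
import re
import time
import random
import pandas as pd

# Placeholder: paste the actual AJAX list-request URL captured with F12 here.
url = 'http://hotel.elong.com/...'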

     
The data format posted to the site:
dat={'code':'8872575',
'listRequest.areaID':'',
'listRequest.bookingChannel':'5',
'listRequest.cardNo':'192928',
'listRequest.checkInDate':'2018-01-18 00:00:00',
'listRequest.checkOutDate':'2018-01-19 00:00:00',
'listRequest.cityID':'0101',
'listRequest.cityName':'%E5%8C%97%E4%BA%AC%E5%B8%82',
'listRequest.customLevel':'11',
'listRequest.distance':'20',
'listRequest.endLat':'0',
'listRequest.endLng':'0',
'listRequest.facilityIds':'',
'listRequest.highPrice':'0',
'listRequest.hotelBrandIDs':'',
'listRequest.isAdvanceSave':'false',
'listRequest.isAfterCouponPrice':'true',
'listRequest.isCoupon':'false',
'listRequest.isDebug':'false',
'listRequest.isLimitTime':'false',
'listRequest.isLogin':'false',
'listRequest.isMobileOnly':'true',
'listRequest.isNeed5Discount':'true',
'listRequest.isNeedNotContractedHotel':'false',
'listRequest.isNeedSimilarPrice':'false',
'listRequest.isReturnNoRoomHotel':'true',
'listRequest.isStaySave':'false',
'listRequest.isTrace':'false',
'listRequest.isUnionSite':'false',
'listRequest.keywords':'',
'listRequest.keywordsType':'0',
'listRequest.language':'cn',
'listRequest.listType':'0',
'listRequest.lowPrice':'0',
'listRequest.orderFromID':'20008',
'listRequest.pageIndex':i,   # i = current page number, supplied by the paging loop shown below
'listRequest.pageSize':'20',
'listRequest.payMethod':'0',
'listRequest.personOfRoom':'0',
'listRequest.poiId':'0',
'listRequest.promotionChannelCode':'0000',
'listRequest.proxyID':'ZD',
'listRequest.rankType':'0',
'listRequest.returnFilterItem':'true',
'listRequest.sellChannel':'1',
'listRequest.seoHotelStar':'0',
'listRequest.sortDirection':'1',
'listRequest.sortMethod':'1',
'listRequest.starLevels':'',
'listRequest.startLat':'0',
'listRequest.startLng':'0',
'listRequest.taRecommend':'false',
'listRequest.themeIds':'',
'listRequest.ctripToken':'1893329a-ef66-4fc7-b102-2b2a75729672',
'listRequest.elongToken':'jcipx47l-78ea-4a22-85b1-b8f0d045a4c6',}
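Note that 'listRequest.pageIndex' is set to the variable i, so this request is meant to be sent once per results page. A minimal paging sketch (the range of 10 pages is an assumption, and header is the dict defined in the next block):

for i in range(1, 11):                      # assumed number of pages; adjust as needed
    dat['listRequest.pageIndex'] = str(i)   # same effect as rebuilding the dict above with the current i
    html = requests.post(url, data=dat, headers=header)
    # ...parse html.json() as shown further down...
    time.sleep(random.randint(2, 5))        # small random delay between pages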
 
Headers to get past basic anti-scraping checks (mimicking the browser's request). Note that requests normally computes Content-Length by itself, so that entry could be left out.
header={
'Accept':'application/json, text/javascript, */*; q=0.01',
'Accept-Encoding':'gzip, deflate',
'Accept-Language':'zh-CN,zh;q=0.9',
'Cache-Control':'no-cache',
'Connection':'keep-alive',
'Content-Length':'1602',
'Content-Type':'application/x-www-form-urlencoded; charset=UTF-8',
#'Cookie':'CookieGuid=c32eda0a-f900-4ccc-b0b2-5fc24ad6729a; SessionGuid=15481f07-b392-4808-a929-5ffac295f0a4; Esid=fc839781-069c-4780-ad3b-cde82add62c9; semid=ggnewbrand; outerFrom=ggnewbrand; com.eLong.CommonService.OrderFromCookieInfo=Status=1&Orderfromtype=5&Isusefparam=0&Pkid=20008&Parentid=2000&Coefficient=0.0&Makecomefrom=0&Cookiesdays=0&Savecookies=0&Priority=9000; fv=pcweb; s_cc=true; s_eVar44=ggnewbrand; _RF1=1.65.169.62; _RSG=7_aPlhfCJY1oPfSrfoA24B; _RDG=2877d3874f1ba325e2217ecb9d6c955855; _RGUID=1893329a-ef66-4fc7-b102-2b2a75729672; newjava1=ae5a4f7c29c02da29aec909ba9ac94ac; ADHOC_MEMBERSHIP_CLIENT_ID1.0=47da54db-e274-1391-9f48-db6fe4d5d77a; _fid=jcipx47l-78ea-4a22-85b1-b8f0d045a4c6; CitySearchHistory=0101%23%E5%8C%97%E4%BA%AC%E5%B8%82%23beijing%23; s_visit=1; JSESSIONID=C1AD3C44C66AE5DF686EEEC2FB1A9755; ShHotel=CityID=0101&CityNameCN=%E5%8C%97%E4%BA%AC%E5%B8%82&CityName=%E5%8C%97%E4%BA%AC%E5%B8%82&OutDate=2018-01-19&CityNameEN=beijing&InDate=2018-01-18; s_sq=elongcom%3D%2526pid%253Dhotel.elong.com%25252Fbeijing%2526pidt%253D1%2526oid%253Djavascript%25253Avoid(0)%2526ot%253DA
'Host':'hotel.elong.com',
'Origin':'http://hotel.elong.com',
'Pragma':'no-cache',
'Referer':'http://hotel.elong.com/beijing/',
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
'X-Requested-With':'XMLHttpRequest'
}
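The Cookie header above is commented out. As an alternative sketch (not what the original script does), a requests.Session can pick up the cookies automatically by visiting the listing page first:

s = requests.Session()
s.headers.update(header)                  # reuse the browser-like headers above
s.get('http://hotel.elong.com/beijing/')  # the first visit stores the session cookies
html = s.post(url, data=dat)              # later POSTs carry those cookies automatically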
Call the POST method, then use regular expressions to pull the fields we need out of the returned data:
html=requests.post(url,data=dat,headers=header)   # send the same request the browser makes
#time.sleep(random.randint(2,5))                  # optional pause between requests
hotel_html=html.json()['value']['hotelListHtml']  # the hotel list is an HTML fragment inside the JSON response
hotel_tri=re.findall('<span class="h_pri_num ">(.*?)</span>',hotel_html)   # prices
hotel_name=re.findall('target="_blank" title="(.*?)"',hotel_html)          # hotel names
data=list(zip(hotel_tri,hotel_name))              # pair each price with its hotel name
datacsv=pd.DataFrame(data)
datacsv.to_csv('H:\\YLD\\yilongsummary.csv',header=False,index=False)   # note: inside the paging loop, add mode='a' so pages append instead of overwriting
Finally, to_csv writes the collected data out to a CSV file, which can then be opened in Excel.
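If you want column headers, and want Chinese hotel names to display correctly when the CSV is opened in Excel, a small variation (my own suggestion, not part of the original script) is to name the columns and write the file with a BOM-aware encoding:

datacsv=pd.DataFrame(data,columns=['price','hotel_name'])
datacsv.to_csv('H:\\YLD\\yilongsummary.csv',index=False,encoding='utf_8_sig')   # utf_8_sig lets Excel detect UTF-8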
 
