Qunar (去哪网) front-end interview (offer received)

Copyright notice: This is an original article by the blogger, licensed under the CC 4.0 BY-SA agreement. Please include the original source link and this notice when reposting.
Original link: https://blog.csdn.net/Itistoolate/article/details/80208937

Self-introduction
My previous project at Autohome (lots of questions about project details)
GB2312 encoding issues
How I improved performance (reflow plus a debounce function; see the debounce sketch after this list)
rem and em units on mobile (see the rem setup sketch after this list)
The most robust mobile compatibility solution
Caching (I listed four mechanisms, and the interviewer hinted there was still cache-control???)
MVVM frameworks
Hand-write JSONP (see the sketch after this list)
Which hybrid-app approaches do you know?
Cross-origin: which techniques exist and how they are implemented
Adding two 30-digit big numbers (optimized; see the sketch after this list)
How webpack works
JS performance optimization
One more algorithm question I no longer remember
Vue's virtual DOM and diff
Vue source code
Do you know React? (I don't)
Chatted about WebGL for a while
JS asynchrony (see the event-loop ordering example after this list)
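
On the performance question above (reflow and a debounce function), the usual idea is to debounce reflow-heavy scroll/resize handlers so a burst of events triggers only one layout pass. A minimal sketch, not the code discussed in the interview; the handler body and the 750px breakpoint are made up for illustration:

```js
// Minimal debounce helper: only the last call within `delay` ms actually runs.
function debounce(fn, delay) {
  let timer = null;
  return function (...args) {
    clearTimeout(timer);
    timer = setTimeout(() => fn.apply(this, args), delay);
  };
}

// Hypothetical reflow-heavy handler: one layout read and one layout write,
// at most once per 200 ms instead of on every resize event.
window.addEventListener('resize', debounce(function () {
  const width = document.documentElement.clientWidth;    // layout read
  document.body.classList.toggle('narrow', width < 750); // layout write
}, 200));
```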
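For the mobile rem/em question, rem resolves against the root element's font-size while em resolves against the current element's font-size, and the common mobile trick is to scale the root font-size with the viewport. A sketch assuming a hypothetical 750px design-draft width (the number is an assumption, not from the post):

```js
// Scale 1rem with the viewport: at a 750px-wide viewport the root font-size is
// 100px, so an element that is 75px wide in the design draft becomes 0.75rem.
function setRootFontSize() {
  const designWidth = 750; // assumed design draft width
  const clientWidth = document.documentElement.clientWidth || designWidth;
  document.documentElement.style.fontSize =
    (clientWidth / designWidth) * 100 + 'px';
}
setRootFontSize();
window.addEventListener('resize', setRootFontSize);
```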
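For hand-written JSONP (one of the classic cross-origin workarounds, alongside CORS, postMessage, and server-side proxies), a minimal sketch; the endpoint URL and the `callback` query-parameter name are assumptions for illustration:

```js
// Minimal JSONP: register a one-off global callback, inject a <script> tag,
// and let the server respond with `cbName({...})`.
function jsonp(url, params, callback) {
  const cbName = 'jsonp_cb_' + Date.now();
  const script = document.createElement('script');

  window[cbName] = function (data) {
    callback(data);        // hand the payload to the caller
    delete window[cbName]; // clean up the temporary global
    script.remove();       // and the temporary <script> tag
  };

  const query = Object.keys(params)
    .map(k => encodeURIComponent(k) + '=' + encodeURIComponent(params[k]))
    .concat('callback=' + cbName)
    .join('&');

  script.src = url + '?' + query;
  document.body.appendChild(script);
}

// Hypothetical usage:
jsonp('https://example.com/api/search', { q: 'test' }, data => console.log(data));
```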
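For the 30-digit addition, the operands overflow the 53-bit integer precision of Number, so the standard answer is digit-by-digit string addition with a carry. A sketch with made-up operands:

```js
// Add two non-negative integers given as decimal strings, right to left with a carry.
function addBig(a, b) {
  let i = a.length - 1;
  let j = b.length - 1;
  let carry = 0;
  let out = '';
  while (i >= 0 || j >= 0 || carry) {
    const sum = (i >= 0 ? +a[i--] : 0) + (j >= 0 ? +b[j--] : 0) + carry;
    out = (sum % 10) + out;
    carry = sum >= 10 ? 1 : 0;
  }
  return out;
}

console.log(addBig('123456789012345678901234567890',
                   '987654321098765432109876543210'));
// -> '1111111110111111111011111111100'
```

As an optimization, the digits can be pushed into an array and joined once at the end instead of repeatedly prepending to a string; in engines that allow it, `BigInt('…') + BigInt('…')` does the job directly.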
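And for JS asynchrony, the conversation typically comes down to event-loop ordering: synchronous code first, then microtasks (Promise callbacks), then macrotasks (setTimeout). A quick self-contained check:

```js
console.log('script start');

setTimeout(() => console.log('setTimeout'), 0);   // macrotask

Promise.resolve()
  .then(() => console.log('promise 1'))           // microtasks run before the
  .then(() => console.log('promise 2'));          // next macrotask

console.log('script end');
// Logs: script start, script end, promise 1, promise 2, setTimeout
```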


Python crawler for Qunar popular attractions

06-22

I used a Python scraper to collect Qunar's popular-attraction listings, but it only ever crawls two pages of content and I can't tell where the problem is. Could someone take a look?

# -*- coding: utf-8 -*-

# created by:tianxing
# created date:2017-11-1
import scrapy
import re
import datetime
from practice.items import QvnaItem


class QuNaSpider(scrapy.Spider):
    name = 'qvnawang'
    #start_urls = ['http://sou.zhaopin.com/jobs/searchresult.ashx?pd=1&jl=%E9%80%89%E6%8B%A9%E5%9C%B0%E5%8C%BA&sm=0&sf=0&st=99999&isadv=1&sg=1545043c61dd44d5bf41f9913890abfa&p=1']
    start_urls = ['http://piao.qunar.com/ticket/list.htm?keyword=%E7%83%AD%E9%97%A8%E6%99%AF%E7%82%B9&region=&from=mpl_search_suggest&subject=']

    def parse(self, response):
        item = QvnaItem()
        # Base XPath for the links on the current listing page
        #pages = response.xpath('//div[@style="width: 224px;*width: 218px; _width:200px; float: left"]/a/@href')
        pages = response.xpath('//div[@class="sight_item_pop"]/table/tr[3]/td/a/@href')

        # Loop over every link on this page and hand each detail URL to parse_page
        for eachPage in pages:
            # Build the absolute link URL (each link is handled separately)
            #singleUrl = eachPage.extract()
            singleUrl = 'http://piao.qunar.com' + eachPage.extract()
            # Hand off to parse_page, passing the item through meta
            yield scrapy.Request(url=singleUrl, meta={'item': item}, callback=self.parse_page)

        # XPath of the 'next page' link, present on every page except the last
        try:
            if response.xpath('//div[@class="pager"]/a/@class').extract()[0] == 'next':
                nextPage = 'http://piao.qunar.com' + response.xpath('//div[@class="pager"]/a/@href').extract()[0]
                # Recurse: feed the next page's URL back into Request
                yield scrapy.Request(url=nextPage, callback=self.parse)
        except IndexError as ie:
            # The last page has no such XPath, so the lookup fails and the recursion stops
            try:
                exit()
            except SystemExit as se:
                pass

    # Parse the detail page behind a single link
    def parse_page(self, response):
        # Retrieve the item passed in via meta
        item = response.meta['item']

        tour_info = response.xpath('/html/body/div[2]/div[2]/div[@class="mp-description-detail"]')

        # Attraction name
        try:
            item['name'] = tour_info.xpath('div[1]/span[1]/text()').extract()[0]\
                .replace('\r','').replace('\n','').replace('\t','').replace(' ','').replace('\xa0','').replace('\u3000','')
        except IndexError as ie:
            item['name'] = ''

        # Attraction level
        try:
            item['rank'] = tour_info.xpath('div[1]/span[2]/text()').extract()[0]\
                .replace('\r','').replace('\n','').replace('\t','').replace(' ','').replace('\xa0','').replace('\u3000','')
        except IndexError as ie:
            item['rank'] = 0

        # Attraction description
        try:
            item['decription'] = tour_info.xpath('div[2]/text()').extract()[0]\
                .replace('/',',').replace('\r','').replace('\n','').replace('\t','').replace(' ','').replace('\xa0','').replace('\u3000','')
        except IndexError as ie:
            item['decription'] = ''

        # Attraction address
        try:
            item['address'] = tour_info.xpath('div[3]/span[3]/text()').extract()[0]
            item['address'] = item['address'].replace('/',',').replace(u'、','')\
                .replace(u'(',',').replace('(',',').replace(u')','').replace(')','')\
                .replace('\r','').replace('\n','').replace('\t','').replace(' ','').replace('\xa0','').replace('\u3000','')
        except IndexError as ie:
            item['address'] = ''

        # User reviews
        try:
            item['comment'] = tour_info.xpath('div[4]/span[3]/span/text()').extract()[0]\
                .replace('/',',').replace('\r','').replace('\n','').replace('\t','').replace(' ','').replace('\xa0','').replace('\u3000','')
        except IndexError as ie:
            item['comment'] = ''

        # Weather
        try:
            item['weather'] = tour_info.xpath('div[5]/span[3]/text()').extract()[0]\
                .replace('/',',').replace('\r','').replace('\n','').replace('\t','').replace(' ','').replace('\xa0','').replace('\u3000','')
        except IndexError as ie:
            item['weather'] = ''

        # Lowest ticket price
        try:
            item['lowprice'] = tour_info.xpath('div[7]/span/em/text()').extract()[0]\
                .replace('/',',').replace('\r','').replace('\n','').replace('\t','').replace(' ','').replace('\xa0','').replace('\u3000','')
        except IndexError as ie:
            item['lowprice'] = ''

        # Record date
        today = datetime.datetime.now()
        item['date'] = today.strftime('%Y-%m-%d')

        yield item
