xpath
Average article quality score: 76
Arthur54271
Life is short, I use Python
Python3 crawler: selenium / PhantomJS, Douban Music example
from selenium import webdriver
import os,time
from lxml import etree
# Douban Music
root_dir='douban'
if not os.path.exists(root_dir): os.mkdir(root_dir)
# Visit the site
driver=webdriver.PhantomJS()
base_url='https:...
Original · 2018-05-18 10:39:27 · 233 views · 0 comments
Python3 crawler: selenium / PhantomJS, handling the CAPTCHA during Douban login
# Douban login
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
import os,time
driver=webdriver.PhantomJS()
driver.get('https://www.douban.com/')
# Allow time for the network request
time.sle...
Original · 2018-05-18 12:57:14 · 550 views · 0 comments
Python3: xpath
from lxml import etree
from urllib import request
import ssl
ssl._create_default_https_context=ssl._create_unverified_context
html='''<bookstore>
 <title>新华书店</title>
 <bo...
Original · 2018-05-13 11:46:54 · 727 views · 0 comments
Python3: applying xpath to a Qiushibaike crawler
from urllib import request
from lxml import etree
import re
import ssl
import json
ssl._create_default_https_context=ssl._create_unverified_context
def spider(page):
    base_url='https://www.qiushi...
Original · 2018-05-14 14:07:12 · 268 views · 0 comments
Python3: scrapy project, crawling the current page and the next page
# -*- coding: utf-8 -*-
import scrapy
from urllib import request
from Py06_2018_3_16.items import TencentItem
class tencentNextPageSpider(scrapy.Spider):
    name = 'tencent_next_page'
    allowed_do...
Original · 2018-05-30 18:59:59 · 10367 views · 0 comments
Python3: scrapy project, downloading images from web pages
# -*- coding: utf-8 -*-
import scrapy,re,os
from PY_2018_03_17.items import TuKuItem
from urllib import request
class TukuSpider(scrapy.Spider):
    name = 'tuku'
    allowed_domains = ['lanrentuku.c...
Original · 2018-05-31 14:36:47 · 606 views · 0 comments