python爬虫 (Python web scraping)
derrick_lh
Scraping Jianshu with Scrapy plus Selenium

Spider body:

```python
# -*- coding: utf-8 -*-
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from js_spi.items import ArticleItem

class JsSpider(CrawlSpid...
```

Original · 2020-03-21 17:17:14 · 362 views · 3 comments
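The excerpt above cuts off at the `CrawlSpider` definition. As a minimal sketch of the URL filter such a spider's `Rule`/`LinkExtractor` might apply (the pattern and helper below are illustrative, not taken from the post — Jianshu article pages live under `/p/` followed by a 12-character hex slug):

```python
import re

# Hypothetical allow-pattern for Jianshu article pages, of the form
# https://www.jianshu.com/p/<12 hex characters>.
ARTICLE_RE = re.compile(r'https?://www\.jianshu\.com/p/[0-9a-f]{12}')

def is_article_url(url):
    """Return True if the URL looks like a Jianshu article page."""
    return bool(ARTICLE_RE.match(url))
```

In a `CrawlSpider`, the same pattern would typically go into `Rule(LinkExtractor(allow=...), callback=...)`.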
Scraping Autohome images with Scrapy: an improved pipeline approach

Part 1: without Scrapy's built-in image-download facility. Spider body:

```python
# -*- coding: utf-8 -*-
import scrapy
from car_spi.items import CarSpiItem

class CarSpider(scrapy.Spider):
    name = 'car'
    allowed_domains = ['car.autohome.com.cn'...
```

Original · 2020-03-19 14:33:58 · 402 views · 0 comments
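When images are saved by hand rather than through Scrapy's `ImagesPipeline`, the pipeline's core job is mapping each image URL to a per-category file path. A testable sketch of that step (function and folder names are mine, not the post's):

```python
import os
from urllib.parse import urlparse

def image_save_path(image_url, category, root='images'):
    """Map an image URL to a per-category file path, as a hand-rolled
    pipeline might do instead of Scrapy's ImagesPipeline.
    (Illustrative names, not taken from the post.)"""
    # Take the filename from the URL path, ignoring any query string.
    filename = os.path.basename(urlparse(image_url).path)
    folder = os.path.join(root, category)
    # A real pipeline would call os.makedirs(folder, exist_ok=True)
    # here before writing the downloaded bytes to the returned path.
    return os.path.join(folder, filename)
```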
Lagou crawler: driving Chrome with Selenium to scrape job listings

```python
import time
from lxml import etree
from selenium import webdriver

JOB_LIST = []

class Lagou_Spider(object):
    driver_path = r"C:\ChromeDriver\chromedriver.exe"
    ...
```

Original · 2020-03-16 12:25:43 · 293 views · 0 comments
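The crawl itself is a pagination loop: fetch a page of listings with the browser, parse it, move to the next page until there is nothing left. With the Selenium call injected as a parameter, that loop is testable without Chrome (a sketch under that assumption; none of these names are from the post):

```python
def collect_jobs(fetch_page, max_pages):
    """Paginate by calling fetch_page(page) until it returns an empty
    list or max_pages is reached. In the real crawler, fetch_page would
    drive Chrome via Selenium and parse the DOM with lxml."""
    jobs = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:  # no more listings: stop early
            break
        jobs.extend(batch)
    return jobs
```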
Multithreaded scraping of meme images in Python, sorted into folders by category

Scraped the first 100 pages in roughly four to five minutes.

```python
import requests
import re
import urllib
import os
import threading
from queue import Queue

gLock = threading.Lock()
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; W...
```

Original · 2020-03-14 14:46:11 · 192 views · 0 comments
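The pattern the excerpt hints at — a `Queue` of work items drained by worker threads, with a `Lock` guarding shared state — can be sketched with the download replaced by a stub, so the threading structure itself is runnable (illustrative code, not the post's):

```python
import threading
from queue import Queue, Empty

def crawl_categorized(items, worker_count=4):
    """Producer/consumer sketch: a queue of (category, url) pairs is
    drained by worker threads, which record each url under its
    category. The real download is replaced by the dict update."""
    q = Queue()
    for pair in items:
        q.put(pair)

    results = {}
    lock = threading.Lock()  # guards the shared results dict

    def worker():
        while True:
            try:
                category, url = q.get_nowait()
            except Empty:
                return  # queue drained: worker exits
            with lock:
                results.setdefault(category, []).append(url)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In the real crawler each worker would download the image and write it under a folder named after its category.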
Scraping Doutula memes: single-threaded vs. multithreaded comparison

Single-threaded version:

```python
import requests
import re
import urllib
import os

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537...
```

Original · 2020-03-14 14:36:04 · 93 views · 0 comments
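The reason the threaded version wins is that downloads are I/O-bound: while one request waits on the network, other threads can make progress. A measurable sketch with the HTTP request replaced by a sleep (all names here are illustrative):

```python
import threading
import time

def fake_fetch(url):
    """Stand-in for a network request; sleeps instead of downloading."""
    time.sleep(0.05)

def run_serial(urls):
    """Fetch every URL one after another; total time is the sum."""
    start = time.perf_counter()
    for u in urls:
        fake_fetch(u)
    return time.perf_counter() - start

def run_threaded(urls):
    """Fetch every URL in its own thread; waits overlap."""
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_fetch, args=(u,)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```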
Producer-consumer with multiple threads: comparing threading.Lock and threading.Condition

Lock version:

```python
import threading
import random
import time

gLock = threading.Lock()
ALL_MONEY = 1000
TIME_COUNT = 0

class producer(threading.Thread):
    def run(self):
        global ALL_MONEY
        glo...
```

Original · 2020-03-14 14:33:30 · 132 views · 0 comments
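The excerpt shows the Lock version. What a `Condition` adds over a bare `Lock` is `wait()`/`notify()`: a consumer can sleep until the balance predicate holds instead of repeatedly acquiring the lock to re-check. A runnable sketch of the Condition counterpart (all names are mine, not the post's):

```python
import threading

def transfer(deposits):
    """Condition-based producer/consumer: producers deposit amounts,
    one consumer waits until the full total is available, then takes it."""
    cond = threading.Condition()
    state = {'money': 0}

    def producer(amount):
        with cond:
            state['money'] += amount
            cond.notify_all()  # wake consumers waiting on the balance

    def consumer(amount, out):
        with cond:
            # wait_for releases the lock while blocking, unlike a bare
            # Lock, and re-checks the predicate on each wakeup.
            cond.wait_for(lambda: state['money'] >= amount)
            state['money'] -= amount
            out.append(amount)

    out = []
    c = threading.Thread(target=consumer, args=(sum(deposits), out))
    c.start()
    producers = [threading.Thread(target=producer, args=(a,)) for a in deposits]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    c.join()
    return out, state['money']
```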
First regex practice: scraping all poems from ten recommended pages on the Gushiwen site

```python
import requests
import re

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/5...
```

Original · 2020-03-12 21:28:46 · 302 views · 0 comments
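The key regex trick when parsing multi-line HTML is `re.DOTALL`, which lets `.` match newlines so a non-greedy `(.*?)` can span a whole element. A self-contained sketch on a made-up fragment (the markup below is a stand-in, not Gushiwen's actual HTML):

```python
import re

# A fragment shaped like a poem listing; stand-in markup only.
SAMPLE = '''
<div class="cont"><b>静夜思</b>
<p>床前明月光，疑是地上霜。</p></div>
<div class="cont"><b>春晓</b>
<p>春眠不觉晓，处处闻啼鸟。</p></div>
'''

def extract_poems(html):
    """Pair each poem title with its text. re.DOTALL makes '.' match
    newlines, so the non-greedy groups work across line breaks."""
    titles = re.findall(r'<b>(.*?)</b>', html, re.DOTALL)
    bodies = re.findall(r'<p>(.*?)</p>', html, re.DOTALL)
    return list(zip(titles, bodies))
```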
Scraping the minimum temperature for every city from the China Weather site and charting the ten lowest with matplotlib

```python
import requests
import lxml
from bs4 import BeautifulSoup
import pandas as pd
import matplotlib.pyplot as plt

url1 = 'http://www.weather.com.cn/textFC/hb.s...
```

Original · 2020-03-12 13:54:26 · 676 views · 0 comments
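Between scraping and plotting sits the selection step: out of all (city, minimum temperature) pairs, keep the ten coldest. That step is testable on sample data with the standard library alone (the post builds the real list by scraping www.weather.com.cn; the data below is invented):

```python
import heapq

def coldest(city_temps, n=10):
    """Pick the n cities with the lowest minimum temperature — the
    selection the post feeds into its matplotlib bar chart."""
    return heapq.nsmallest(n, city_temps, key=lambda pair: pair[1])
```

The result would then go to `plt.bar([c for c, t in top], [t for c, t in top])` or similar.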
Basic web scraping with Python's built-in urllib

1. POST request

```python
from urllib import request, parse

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
    "R...
```

Original · 2020-03-09 21:35:19 · 112 views · 0 comments
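A POST with `urllib` has two steps the excerpt is heading toward: url-encode the form fields, byte-encode them into the request body, then send with `request.urlopen(req)`. The construction half is testable without a network (the URL and form values below are placeholders):

```python
from urllib import request, parse

def build_post_request(url, form, headers):
    """Build (not send) a urllib POST request: form fields are
    urlencoded and byte-encoded into the body. Sending it would be
    request.urlopen(req)."""
    data = parse.urlencode(form).encode('utf-8')
    return request.Request(url, data=data, headers=headers, method='POST')
```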