scrapy
Scrapy
Original post · 2019-10-01 23:38:41 · 128 views · 0 comments

pip install Scrapy

scrapy.cfg
myproject/
    __init__.py
    items.py
    middlewares.py
    pipelines.py
    settings.py
    spiders/
        __init__.py
        spider1.py
        spider2.py
        ...
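The layout above is what the scrapy startproject command generates. Item definitions go in items.py; a minimal sketch of one follows, where the BookItem name and its two fields are assumptions for illustration rather than anything from the original post:

import scrapy

class BookItem(scrapy.Item):
    # Fields a spider can fill in and hand off to the pipelines defined in pipelines.py.
    title = scrapy.Field()
    content = scrapy.Field()

Spiders in the spiders/ folder yield such items, and settings.py decides which pipelines process them.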
Learning Scrapy: a simple Qidian novel crawler
Original post · 2019-10-03 14:50:50 · 957 views · 2 comments

# -*- coding: utf-8 -*-
import scrapy

class QidianSpider(scrapy.Spider):
    name = 'qidian'
    allowed_domains = ['qidian.com']
    start_urls = ['https://read.qidian.com/chapter/ZOJ_pWRoTg9wKI0S...']
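The preview cuts the spider off right after start_urls. A parse callback along these lines would round out the class; the CSS selectors are guesses about the chapter page, not taken from the original post:

    def parse(self, response):
        # Pull the chapter title and body paragraphs; the selectors are illustrative only.
        title = response.css('h3 span.content-wrap::text').get()
        paragraphs = response.css('div.read-content p::text').getall()
        yield {
            'title': title,
            'content': '\n'.join(paragraphs),
        }

Running scrapy crawl qidian -o chapter.json would then dump the yielded items to a file.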
Crawler: ignoring the SSL certificate
Original post · 2019-10-03 21:57:42 · 501 views · 0 comments

from urllib.request import Request, urlopen
import ssl

url = 'https://www.hao123.com'
r = Request(url)

# Ignore the certificate
context = ssl._create_unverified_context()
response = urlopen(r, context=context)
print(response.read().decode())
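ssl._create_unverified_context() is a private helper; the same effect with the public ssl API looks like the sketch below, which assumes nothing beyond what the snippet above already does:

import ssl
from urllib.request import Request, urlopen

url = 'https://www.hao123.com'

# Build a normal context, then turn off hostname checking and certificate verification.
context = ssl.create_default_context()
context.check_hostname = False
context.verify_mode = ssl.CERT_NONE

response = urlopen(Request(url), context=context)
print(response.getcode())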
爬虫,多个页面
from urllib.request import Request,urlopenfor i in range(3): url = 'http://hao123.zongheng.com/store/c0/w0/s0/p{0}/all.html'.format(i+1) r = Request(url) html = urlopen(r).read().decode...原创 2019-10-03 21:58:44 · 524 阅读 · 0 评论 -
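The preview stops at decode(); one way to finish the loop is sketched below, writing each listing page to its own file and pausing between requests. The filename pattern and the one-second delay are assumptions, not part of the original post:

import time
from urllib.request import Request, urlopen

for i in range(3):
    url = 'http://hao123.zongheng.com/store/c0/w0/s0/p{0}/all.html'.format(i + 1)
    html = urlopen(Request(url)).read().decode()
    # Save the page, then wait a moment so the requests are not fired back to back.
    with open('page_{0}.html'.format(i + 1), 'w', encoding='utf-8') as f:
        f.write(html)
    time.sleep(1)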
爬虫,post请求
from urllib.request import Request,urlopenfrom urllib.parse import urlencodeurl = 'https://www.zhihu.com/signin?next=%2F'args = { 'username':'16633970705', 'password':'16633970705'}f_dat...原创 2019-10-03 21:59:40 · 201 阅读 · 0 评论 -
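Everything past the urlencode call is cut off in the preview. Finishing a urllib POST from that point usually looks like the continuation below; whether the original post also set headers or handled cookies is not recoverable from the listing:

# The form data must be passed as bytes for urlopen to issue a POST instead of a GET.
request = Request(url, data=f_data.encode('utf-8'))
response = urlopen(request)
print(response.getcode())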
爬虫,get请求,添加参数
arg = '编程语言'url = 'https://www.baidu.com/s?wd={}'.format(quote(arg))#url = 'https://www.baidu.com/s?wd=python'headers = { "User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.3...原创 2019-10-03 22:00:41 · 637 阅读 · 1 评论 -
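quote() escapes just the keyword; when a request has several query parameters, urlencode() can build the whole query string instead. A small sketch of the same search done that way, with the User-Agent shortened to a stand-in value:

from urllib.parse import urlencode
from urllib.request import Request, urlopen

params = {'wd': '编程语言'}
url = 'https://www.baidu.com/s?' + urlencode(params)  # percent-encodes the keyword as well
headers = {'User-Agent': 'Mozilla/5.0'}  # stand-in; use a full browser UA string in practice
html = urlopen(Request(url, headers=headers)).read().decode()
print(len(html))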
Crawler: setting the User-Agent request header
Original post · 2019-10-03 22:02:03 · 513 views · 0 comments

from urllib.request import urlopen
from urllib.request import Request

url = 'https://www.hao123.com'
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like ...'
}
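The preview ends inside the header dict. Completing the request is just a matter of passing that dict to Request and reading the response; in the sketch below the UA is shortened to a stand-in:

from urllib.request import Request, urlopen

url = 'https://www.hao123.com'
header = {'User-Agent': 'Mozilla/5.0'}  # stand-in; paste a full browser UA string here

# Attach the headers when building the Request, then open it as usual.
request = Request(url, headers=header)
html = urlopen(request).read().decode()
print(html[:200])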