Web Scraping
名字1001
Scrapy
    pip install Scrapy

    scrapy.cfg
    myproject/
        __init__.py
        items.py
        middlewares.py
        pipelines.py
        settings.py
        spiders/
            __init__.py
            spider1.py
            spider2.py
            ...

Original · 2019-10-01 23:38:41 · 128 views · 0 comments
Using regular expressions in Python

    import re
    import requests
    from fake_useragent import UserAgent

    url = 'https://www.baidu.com'
    m = re.match(r'\w+', url)
    print(m.group())

    url2 = 'http://www.97xs.org/11/11389/3303898.html'
    headers = {...

Original · 2019-10-05 10:35:32 · 207 views · 0 comments
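The excerpt above is cut off in this archive view. A self-contained sketch of the same idea, runnable offline (the named-group pattern is illustrative, not from the original post):

```python
import re

url = 'https://www.baidu.com'

# \w+ anchored at the start matches the leading run of word
# characters, i.e. the URL scheme ("https").
m = re.match(r'\w+', url)
print(m.group())  # https

# An illustrative named-group pattern that splits a URL
# into its scheme and host.
pattern = re.compile(r'(?P<scheme>https?)://(?P<host>[^/]+)')
parts = pattern.match('http://www.97xs.org/11/11389/3303898.html')
print(parts.group('scheme'), parts.group('host'))  # http www.97xs.org
```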
Crawler proxy: a simple example

    from urllib.request import Request, build_opener
    from fake_useragent import UserAgent
    from urllib.request import ProxyHandler

    url = 'https://www.qidian.com'
    headers = {
        "User-Agent": "Mozilla/5.0...

Original · 2019-10-04 19:59:30 · 1648 views · 0 comments
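The snippet above is truncated. A minimal offline sketch of routing urllib requests through a proxy (the proxy address is a placeholder, not a real server, and the request is deliberately not sent):

```python
from urllib.request import Request, ProxyHandler, build_opener

# Placeholder proxy address — substitute a real one before use.
proxy = ProxyHandler({'http': 'http://127.0.0.1:8888'})
opener = build_opener(proxy)

request = Request('https://www.qidian.com',
                  headers={'User-Agent': 'Mozilla/5.0'})

# opener.open(request) would fetch the page through the proxy;
# it is not called here so the sketch runs without a network.
print(type(opener).__name__)  # OpenerDirector
```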
Python crawler: cookie example

    from urllib.request import Request, build_opener, HTTPCookieProcessor
    from urllib.parse import urlencode
    from fake_useragent import UserAgent
    from http.cookiejar import MozillaCookieJar

    def get_cookie...

Original · 2019-10-04 19:56:45 · 334 views · 0 comments
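The `get_cookie` function is cut off above. A hedged sketch of the usual pattern with these imports (the filename `cookie.txt` is illustrative; no request is actually made):

```python
from http.cookiejar import MozillaCookieJar
from urllib.request import HTTPCookieProcessor, build_opener

# A MozillaCookieJar can persist cookies in Netscape/Mozilla format.
cookie_jar = MozillaCookieJar('cookie.txt')  # illustrative filename
opener = build_opener(HTTPCookieProcessor(cookie_jar))

# After opener.open(some_url), the jar would hold the server's cookies;
# cookie_jar.save(ignore_discard=True, ignore_expires=True) writes them
# to disk, and cookie_jar.load(...) restores them in a later session.
print(len(cookie_jar))  # 0 — nothing fetched yet
```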
Crawler: setting the User-Agent request header

    from urllib.request import urlopen
    from urllib.request import Request

    url = 'https://www.hao123.com'
    header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like ...

Original · 2019-10-03 22:02:03 · 513 views · 0 comments
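The header value is truncated above. A complete, offline sketch showing how a custom User-Agent is attached to a `Request` (urlopen is not called, so no network is needed):

```python
from urllib.request import Request

header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) '
                        'AppleWebKit/537.36 (KHTML, like Gecko)'}
request = Request('https://www.hao123.com', headers=header)

# urllib normalizes header names to Capitalized-with-dashes form,
# so the stored key is 'User-agent'.
print(request.get_header('User-agent'))
```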
Crawler: GET request with parameters

    arg = '编程语言'
    url = 'https://www.baidu.com/s?wd={}'.format(quote(arg))
    # url = 'https://www.baidu.com/s?wd=python'
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.3...

Original · 2019-10-03 22:00:41 · 637 views · 1 comment
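The `quote` import is missing from the excerpt above. A complete sketch of encoding a non-ASCII query parameter into a GET URL (the `urlencode` variant is an added alternative, not from the original post):

```python
from urllib.parse import quote, urlencode

arg = '编程语言'

# quote() percent-encodes the UTF-8 bytes of the value.
url = 'https://www.baidu.com/s?wd={}'.format(quote(arg))
print(url)

# Alternative: urlencode() builds a full query string from a dict,
# quoting each value the same way.
print('https://www.baidu.com/s?' + urlencode({'wd': arg}))
```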
Crawler: POST request

    from urllib.request import Request, urlopen
    from urllib.parse import urlencode

    url = 'https://www.zhihu.com/signin?next=%2F'
    args = {
        'username': '16633970705',
        'password': '16633970705'
    }
    f_dat...

Original · 2019-10-03 21:59:40 · 201 views · 0 comments
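The excerpt stops mid-variable. A minimal offline sketch of the technique: url-encode the form fields, attach them as `data`, and urllib switches the request to POST (credentials below are placeholders; the request is not sent):

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder credentials — not real account data.
args = {'username': 'user@example.com', 'password': 'secret'}
form_data = urlencode(args).encode('utf-8')

# Passing data= makes urlopen issue a POST instead of a GET.
request = Request('https://www.zhihu.com/signin?next=%2F', data=form_data)
print(request.get_method())  # POST
```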
Crawler: multiple pages

    from urllib.request import Request, urlopen

    for i in range(3):
        url = 'http://hao123.zongheng.com/store/c0/w0/s0/p{0}/all.html'.format(i+1)
        r = Request(url)
        html = urlopen(r).read().decode...

Original · 2019-10-03 21:58:44 · 524 views · 0 comments
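The loop body is truncated above. The URL-generation part can be sketched on its own, without fetching anything:

```python
# Build the paginated URLs up front so the list is easy to inspect.
base = 'http://hao123.zongheng.com/store/c0/w0/s0/p{0}/all.html'
urls = [base.format(i + 1) for i in range(3)]
for url in urls:
    print(url)
# Each url would then be fetched with urlopen(Request(url))
# and decoded, as in the original post.
```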
Crawler: ignoring SSL certificates

    from urllib.request import Request, urlopen
    import ssl

    url = 'https://www.hao123.com'
    r = Request(url)
    # ignore the certificate
    context = ssl._create_unverified_context()
    response = urlopen(r, context=context)
    print(re...

Original · 2019-10-03 21:57:42 · 500 views · 0 comments
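A small offline sketch of what the unverified context actually changes (the fetch itself is omitted). Note this disables certificate verification entirely, which is fine for quick experiments but unsafe for production traffic:

```python
import ssl

# _create_unverified_context() returns an SSLContext with hostname
# checking off and certificate verification disabled.
context = ssl._create_unverified_context()
print(context.check_hostname)                 # False
print(context.verify_mode == ssl.CERT_NONE)   # True

# urlopen(Request(url), context=context) would then accept
# self-signed or expired certificates.
```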
Learning Scrapy: a simple Qidian novel crawler

    # -*- coding: utf-8 -*-
    import scrapy

    class QidianSpider(scrapy.Spider):
        name = 'qidian'
        allowed_domains = ['qidian.com']
        start_urls = ['https://read.qidian.com/chapter/ZOJ_pWRoTg9wKI0S...

Original · 2019-10-03 14:50:50 · 957 views · 2 comments
Baidu OCR API: image text recognition in Python

    from urllib.request import Request, urlopen

    # client_id is the API Key (AK) from the console, client_secret is the Secret Key (SK)
    url = 'https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=【API Key】&clien...

Original · 2019-10-06 00:40:26 · 1577 views · 1 comment
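The token URL is truncated above. A sketch of assembling it with `urlencode` rather than string concatenation (the key values are placeholders from the excerpt, not real credentials, and the request is not sent):

```python
from urllib.parse import urlencode

# Placeholders — real values come from the Baidu AI console.
API_KEY = '【API Key】'
SECRET_KEY = '【Secret Key】'

params = {
    'grant_type': 'client_credentials',
    'client_id': API_KEY,
    'client_secret': SECRET_KEY,
}
token_url = 'https://aip.baidubce.com/oauth/2.0/token?' + urlencode(params)
print(token_url)

# urlopen(Request(token_url)) would return JSON containing an
# "access_token", which is then attached to the OCR endpoint call.
```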