Python Learning Notes: Crawler Module - Hiding the Crawler (Part 13)

Background

Many websites restrict automated scraping, so a crawler has to disguise itself and make its requests look as if they come from a browser.
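
Out of the box, urllib announces itself in the User-Agent header as Python-urllib, which is exactly what such sites look for. A quick way to see this (a minimal sketch that just inspects the default opener):

import urllib.request

# The default opener sends a User-Agent of the form 'Python-urllib/3.x',
# which many sites reject outright:
opener = urllib.request.build_opener()
print(opener.addheaders)   # e.g. [('User-agent', 'Python-urllib/3.9')]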

Adding a header and delaying access

Method 1: pass the header when the Request is created

Sample code below; a User-Agent entry is added to the header dict:

import urllib.request
import urllib.parse
import json

content=input('Enter the word to be translated: ')
url='http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule' # the _o suffix must be removed from the path, otherwise the request fails with error_code: 50

header = {}
header['User-Agent']='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'


data={}
# Form fields copied from the browser inspector (press F12 on the translation page); only the i and doctype keys are required, the others can be deleted without affecting the translation
data['i']=content
data['from']='AUTO'
data['to']='AUTO'
data['smartresult']='dict'
data['client']='fanyideskweb'
data['salt']='15601659811655'
data['sign']='78817b046452f9663a2b36604f220360'
data['doctype']='json'
data['version']='2.1'
data['keyfrom']='fanyi.web'
data['action']='FY_BY_REALTTIME'
data=urllib.parse.urlencode(data).encode('utf-8')

req = urllib.request.Request(url,data,header)
response=urllib.request.urlopen(req)
html=response.read().decode('utf-8')
target=json.loads(html)
print('result:%s'%(target['translateResult'][0][0]['tgt']))
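
For reference, the JSON returned by the translate endpoint has roughly the shape below (an illustrative payload, not captured from a live request), which is why the result is read with translateResult[0][0]['tgt']:

import json

# illustrative response body; actual fields may differ
sample = '{"type":"AUTO","errorCode":0,"translateResult":[[{"src":"hello","tgt":"你好"}]]}'
target = json.loads(sample)
print(target['translateResult'][0][0]['tgt'])   # -> 你好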

The value for the User-Agent property can be found in the browser's developer tools, under the Network tab in the Request Headers of any request.

Method 2: add the header after the Request is created

After the Request object is created, call *request.add_header()* to set the header:

import urllib.request
import urllib.parse
import json

content=input('Enter the word to be translated: ')
url='http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule' # the _o suffix must be removed from the path, otherwise the request fails with error_code: 50

#header = {}
#header['User-Agent']='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'


data={}
# Form fields copied from the browser inspector (press F12 on the translation page); only the i and doctype keys are required, the others can be deleted without affecting the translation
data['i']=content
data['from']='AUTO'
data['to']='AUTO'
data['smartresult']='dict'
data['client']='fanyideskweb'
data['salt']='15601659811655'
data['sign']='78817b046452f9663a2b36604f220360'
data['doctype']='json'
data['version']='2.1'
data['keyfrom']='fanyi.web'
data['action']='FY_BY_REALTTIME'
data=urllib.parse.urlencode(data).encode('utf-8')

req = urllib.request.Request(url,data)
req.add_header('User-Agent','Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36')

response=urllib.request.urlopen(req)
html=response.read().decode('utf-8')
target=json.loads(html)
print('result:%s'%(target['translateResult'][0][0]['tgt']))
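
Both methods attach the same header. One detail worth knowing: add_header() stores the key via str.capitalize(), so it reads back as 'User-agent'. A small verification sketch (example.com stands in for any URL; no request is actually sent):

import urllib.request

req = urllib.request.Request('http://example.com')
req.add_header('User-Agent', 'Mozilla/5.0')
# add_header() capitalizes the key, hence 'User-agent'
print(req.get_header('User-agent'))   # -> Mozilla/5.0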

Delayed access

  • Add a time.sleep() pause between requests
  • Read an exit keyword from the input so the loop can be stopped
import urllib.request
import urllib.parse
import json
import time

while True:
    content=input('Enter the word to be translated (enter q! to exit): ')
    if (content == 'q!'):
        break
    url='http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule' # the _o suffix must be removed from the path, otherwise the request fails with error_code: 50

    #header = {}
    #header['User-Agent']='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'


    data={}
    # Form fields copied from the browser inspector (press F12 on the translation page); only the i and doctype keys are required, the others can be deleted without affecting the translation
    data['i']=content
    data['from']='AUTO'
    data['to']='AUTO'
    data['smartresult']='dict'
    data['client']='fanyideskweb'
    data['salt']='15601659811655'
    data['sign']='78817b046452f9663a2b36604f220360'
    data['doctype']='json'
    data['version']='2.1'
    data['keyfrom']='fanyi.web'
    data['action']='FY_BY_REALTTIME'
    data=urllib.parse.urlencode(data).encode('utf-8')

    req = urllib.request.Request(url,data)
    req.add_header('User-Agent','Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36')

    response=urllib.request.urlopen(req)
    html=response.read().decode('utf-8')
    target=json.loads(html)
    print('result:%s'%(target['translateResult'][0][0]['tgt']))
    time.sleep(3)
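
A fixed three-second pause is easy for a server to fingerprint. A common refinement (not in the original code) is to randomize the interval, for example:

import random
import time

# sleep a random 2-5 seconds so the request rhythm looks less mechanical
time.sleep(2 + random.random() * 3)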

Proxies

Steps

Access the target site through a proxy server.

  1. The argument is a dict mapping the protocol to the proxy, {'http': 'proxy-ip:port'}
    proxy_support = urllib.request.ProxyHandler({})
  2. Build a customized opener
    opener = urllib.request.build_opener(proxy_support)
  3. Install the opener
    urllib.request.install_opener(opener)
  4. Call the opener
    opener.open(url)

Putting the steps together:
import urllib.request
import random

url = 'http://www.whatismyip.com.tw'   # this page echoes back the visitor's IP

# pool of public proxies (ip:port); pick one at random per run
iplist = ['119.6.144.73:81','183.203.208.166:8118','111.1.32.28:81']
proxy_support = urllib.request.ProxyHandler({'http':random.choice(iplist)})

# build an opener that routes HTTP traffic through the chosen proxy
opener = urllib.request.build_opener(proxy_support)

# install it globally so plain urlopen() calls also use the proxy
urllib.request.install_opener(opener)

response = urllib.request.urlopen(url)
html = response.read().decode('utf-8')

print(html)
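
Free public proxies like the ones above die quickly, so a request through them often fails. A minimal retry sketch (the addresses are placeholders, and it calls opener.open() directly instead of installing the opener globally):

import random
import urllib.error
import urllib.request

url = 'http://www.whatismyip.com.tw'
iplist = ['119.6.144.73:81', '183.203.208.166:8118']   # placeholder proxies

for attempt in range(3):
    proxy = random.choice(iplist)
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({'http': proxy}))
    try:
        # a timeout keeps a dead proxy from hanging the whole run
        print(opener.open(url, timeout=5).read().decode('utf-8'))
        break
    except (urllib.error.URLError, OSError):
        print('proxy %s failed, retrying...' % proxy)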
