Python Learning Notes 55: Web Scraping (Hiding Requests)

1. To hide that the request comes from a Python program, there are two ways to set the User-Agent:

# Method one: build a headers dict and pass it to Request via its headers parameter
head = {}
head['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'
req = urllib.request.Request(url, data, head)

# Method two: call add_header() after the Request has been created
req = urllib.request.Request(url, data)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36')
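Both methods end up storing the header on the Request object in the same way, so either is fine. A minimal sketch (using example.com as a stand-in URL) that builds a Request each way and inspects the stored header, without making any network request:

```python
import urllib.request

url = 'http://example.com'  # stand-in URL for illustration
ua = ('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36')

# Method one: pass a headers dict to the Request constructor
req1 = urllib.request.Request(url, headers={'User-Agent': ua})

# Method two: call add_header() after constructing the Request
req2 = urllib.request.Request(url)
req2.add_header('User-Agent', ua)

# urllib capitalizes header keys internally, so look them up as 'User-agent'
print(req1.get_header('User-agent'))
print(req2.get_header('User-agent'))
```

Note that `urllib.request.Request` normalizes header names with `str.capitalize()`, which is why `get_header('User-agent')` (not `'User-Agent'`) retrieves the value.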


2. For better hiding, add a delay between requests or use a proxy.

1. Delayed access:

import urllib.request
import urllib.parse
import json
import time

while True:
    content = input('Enter the text to translate (enter "q" to quit): ')
    if content == 'q':
        break
    url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=dict2.top'


    data = {}
    data['type'] = 'AUTO'
    data['i'] = content
    data['doctype'] = 'json'
    data['xmlVersion'] = '1.8'
    data['keyfrom'] = 'fanyi.web'
    data['ue'] = 'UTF-8'
    data['action'] = 'FY_BY_CLICKBUTTON'
    data['typoResult'] = 'true'
    data = urllib.parse.urlencode(data).encode('utf-8')

    '''
    # The two methods of hiding that the request comes from a Python program
    # Method one: build a headers dict and pass it to Request via its headers parameter
    head = {}
    head['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'
    req = urllib.request.Request(url, data, head)
    '''
    # Method two: call add_header() after the Request has been created
    req = urllib.request.Request(url, data)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36')

    response = urllib.request.urlopen(req)
    html = response.read().decode('utf-8')

    target = json.loads(html)
    target = target['translateResult'][0][0]['tgt']
    print("Translation: %s" % target)
    time.sleep(5)
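A fixed `time.sleep(5)` produces requests at a perfectly regular rhythm, which is itself easy to detect. A small sketch of a randomized-delay helper (`polite_sleep` is a hypothetical name, not part of any library) that could replace the fixed sleep at the end of the loop:

```python
import random
import time

def polite_sleep(base=5.0, jitter=3.0):
    """Sleep between base and base + jitter seconds, so requests
    do not arrive at a perfectly regular interval."""
    delay = base + random.uniform(0, jitter)
    time.sleep(delay)
    return delay

# In the translation loop, use polite_sleep() instead of time.sleep(5)
```

Returning the chosen delay is just a convenience for logging or testing; the behavior is otherwise identical to `time.sleep` with a randomized argument.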
2. Proxies:

import urllib.request
import random

url = 'http://www.whatismyip.com.tw'
# Collect a few free IPs from a proxy-listing site
iplist = ['171.13.37.210:808','192.129.229.223:9001','61.237.131.59:80','222.94.144.86:808']

proxy_support = urllib.request.ProxyHandler({'http':random.choice(iplist)})

opener = urllib.request.build_opener(proxy_support)
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36')]
urllib.request.install_opener(opener)

response = urllib.request.urlopen(url)

html = response.read().decode('utf-8')

print(html)
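Since free proxies die frequently, picking one at random and hoping it works is fragile. A sketch of a fallback loop that tries the proxies in random order and returns the first successful response (`fetch_via_proxy` is a hypothetical helper, not a library function; the proxy IPs are the ones from the list above and may well be dead by now):

```python
import random
import urllib.error
import urllib.request

def fetch_via_proxy(url, proxies, timeout=5):
    """Try each proxy in random order; return the first successful
    response body, or None if every proxy fails."""
    for ip in random.sample(proxies, len(proxies)):
        handler = urllib.request.ProxyHandler({'http': 'http://' + ip})
        opener = urllib.request.build_opener(handler)
        opener.addheaders = [('User-Agent', 'Mozilla/5.0')]
        try:
            with opener.open(url, timeout=timeout) as resp:
                return resp.read().decode('utf-8')
        except (urllib.error.URLError, OSError):
            continue  # this proxy is dead or refused; try the next one
    return None

iplist = ['171.13.37.210:808', '192.129.229.223:9001',
          '61.237.131.59:80', '222.94.144.86:808']
html = fetch_via_proxy('http://www.whatismyip.com.tw', iplist)
```

Building a fresh opener per attempt avoids `install_opener`, which mutates global state and makes every later `urlopen` call go through the proxy.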
Test result: it sometimes works and sometimes fails, which is normal, since free proxies are unreliable.

