Python Web Crawler Example 1: Accessing Youdao Translate with Python

 

1. Open the Youdao Translate page, type some text, and click Translate.

2. Open the browser's developer tools (right-click → Inspect).

3. Click the Network tab and find the translate entry in the Name column.

4. Click Headers and find the Request URL under General.

5. Find the Form Data section.

6. Open Python and write the script.

Copy the Request URL you found into url:

url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'

Build a dictionary from the Form Data entries: the part before each colon becomes the key, the part after it the value (a note on the salt and sign fields follows the block):

data = {}

data['i'] = content
data['from'] = 'AUTO'
data['to'] = 'AUTO'
data['smartresult'] = 'dict'
data['client'] = 'fanyideskweb'
data['salt'] = '1532850198388'
data['sign'] = '364e87600cf20d7bdb57c669faa45306'
data['doctype'] = 'json'
data['version'] = '2.1'
data['keyfrom'] = 'fanyi.web'
data['action'] = 'FY_BY_REALTIME'
data['typoResult'] = 'false'
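A note on the salt and sign fields: both values above were captured from one live browser session. Judging by their shape, salt looks like a millisecond timestamp and sign like an MD5 digest that the page's JavaScript computes from the input and the salt; the exact formula is obfuscated and changes over time, so this example simply reuses the captured constants, which the plain translate endpoint used below does not appear to validate. A minimal sketch of regenerating the salt, assuming the millisecond-timestamp format:

import time

# Assumption: salt is the current time in milliseconds, as the captured value suggests.
# The matching sign would have to be recomputed the way the page's JS does;
# this sketch does not attempt that.
data['salt'] = str(int(time.time() * 1000))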

The complete code:

import urllib.request
import urllib.parse
import json

content = input('Enter the text to translate: ')
url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'

# Request body rebuilt from the Form Data fields captured in DevTools
data = {}
data['i'] = content
data['from'] = 'AUTO'
data['to'] = 'AUTO'
data['smartresult'] = 'dict'
data['client'] = 'fanyideskweb'
data['salt'] = '1532850198388'
data['sign'] = '364e87600cf20d7bdb57c669faa45306'
data['doctype'] = 'json'
data['version'] = '2.1'
data['keyfrom'] = 'fanyi.web'
data['action'] = 'FY_BY_REALTIME'
data['typoResult'] = 'false'

# URL-encode the form data and POST it to the translate endpoint
data = urllib.parse.urlencode(data).encode('utf-8')
req = urllib.request.Request(url, data)
response = urllib.request.urlopen(req)
html = response.read().decode('utf-8')

# Parse the JSON response and extract the translated text
target = json.loads(html)
print('Translation result: %s' % target['translateResult'][0][0]['tgt'])
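For reference, the body that comes back is JSON shaped roughly like this (values illustrative), which is why the translated text is read from target['translateResult'][0][0]['tgt']:

{"type": "EN2ZH_CN", "errorCode": 0, "elapsedTime": 1, "translateResult": [[{"src": "hello", "tgt": "你好"}]]}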

7. Run the script.

If the response is {"errorCode":50}, delete the _o from the copied Request URL (i.e. use translate instead of translate_o) and run again.
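In code this is a single string replacement on the copied URL (a sketch; the _o endpoint appears to validate salt/sign and rejects stale values with errorCode 50, while the plain translate endpoint does not):

url = url.replace('translate_o', 'translate')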

A sample run (illustrative; the translation depends on your input):
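Enter the text to translate: hello
Translation result: 你好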

8. Improvements

The server can distinguish a script from a browser by inspecting the User-Agent field under Request Headers, so to avoid being blocked you can:

8.1 Add headers

In the developer tools, find the User-Agent under Headers.

Put the User-Agent string into the code:

req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.90 Safari/537.36 2345Explorer/9.3.2.17331')
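Equivalently, the header can be supplied as a dict when the Request is constructed, instead of calling add_header afterwards:

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.90 Safari/537.36 2345Explorer/9.3.2.17331'}
req = urllib.request.Request(url, data, headers)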

The complete code:

import urllib.request
import urllib.parse
import json

content = input('Enter the text to translate: ')
url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'

# Request body rebuilt from the Form Data fields captured in DevTools
data = {}
data['i'] = content
data['from'] = 'AUTO'
data['to'] = 'AUTO'
data['smartresult'] = 'dict'
data['client'] = 'fanyideskweb'
data['salt'] = '1532850198388'
data['sign'] = '364e87600cf20d7bdb57c669faa45306'
data['doctype'] = 'json'
data['version'] = '2.1'
data['keyfrom'] = 'fanyi.web'
data['action'] = 'FY_BY_REALTIME'
data['typoResult'] = 'false'

# URL-encode the form data and POST it with a browser-like User-Agent
data = urllib.parse.urlencode(data).encode('utf-8')
req = urllib.request.Request(url, data)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.90 Safari/537.36 2345Explorer/9.3.2.17331')
response = urllib.request.urlopen(req)
html = response.read().decode('utf-8')

# Parse the JSON response and extract the translated text
target = json.loads(html)
print('Translation result: %s' % target['translateResult'][0][0]['tgt'])

 

8.2 Use a proxy

Another way to avoid being blocked is to send requests through proxies, so they appear to come from different IP addresses:

import random
import urllib.request

# Candidate proxy IP:port pairs; pick one at random per run
iplist = ['118.31.220.3:8080', '221.228.17.172:8181', '219.141.153.4:80']
proxies = {'http': random.choice(iplist)}

# Build an opener that routes HTTP requests through the chosen proxy
proxy_support = urllib.request.ProxyHandler(proxies)
opener = urllib.request.build_opener(proxy_support)
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.90 Safari/537.36 2345Explorer/9.3.2.17331')]
# Install it globally so later urllib.request.urlopen() calls use the proxy
urllib.request.install_opener(opener)
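With the opener installed, the translation request itself needs no changes: every subsequent urllib.request.urlopen() call is routed through the chosen proxy. A minimal sketch of wiring it into the earlier script, with error handling since free proxies are often unreliable:

import urllib.error

try:
    # req is the Request built in the script above; it now goes via the proxy
    response = urllib.request.urlopen(req, timeout=10)
    html = response.read().decode('utf-8')
except urllib.error.URLError as e:
    print('Proxy request failed: %s' % e.reason)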

 
