Friends, before going further you'll want a solid grasp of Python's basic crawling knowledge and libraries; it will make the later lessons much easier.
First, every crawler starts by importing the basic libraries:
import urllib.parse
import urllib.request
Next, set the API address of the translation service:
url = "https://fanyi.baidu.com/v2transapi?from=en&to=zh"
Spoof the User-Agent by sending browser-like request headers (note: the Cookie value below is session-specific; capture your own from DevTools):
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0',
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'Cookie':'BAIDUID_BFESS=579B999485928630240C6140B82D644A:FG=1; BIDUPSID=579B999485928630240C6140B82D644A; PSTM=1709694736; H_PS_PSSID=40008_40171_40207_40212_40216_40224_40059_40272_40282_40294_40291_40288_40286_40317_40080_40364_40351; ZFY=IXj5:BPw8E:AtcCjkXpl11MuRWM0peOhbqL5INKY5ca9U:C; RT="z=1&dm=baidu.com&si=84daad4d-a496-4eae-911c-e83a5019fcf7&ss=ltf8mmyn&sl=2&tt=2q6&bcn=https%3A%2F%2Ffclog.baidu.com%2Flog%2Fweirwood%3Ftype%3Dperf&ld=3bv&ul=3r8&hd=3rh"; APPGUIDE_10_7_0=1; REALTIME_TRANS_SWITCH=1; FANYI_WORD_SWITCH=1; HISTORY_SWITCH=1; SOUND_SPD_SWITCH=1; SOUND_PREFER_SWITCH=1; Hm_lvt_64ecd82404c51e03dc91cb9e8c025574=1710058842; Hm_lpvt_64ecd82404c51e03dc91cb9e8c025574=1710058842; ab_sr=1.0.1_NmJjZWU1ZDU2ZjIxYWExZjViMmM4YTM5ZmJmNWIwNDFmMGRlMzQ4Mzk4MzU2ODg4NGRlMDAyZWNhOGMwOTIwYzhlYTNmZDQyN2FhYTI3NTgzOTRlMDQ5MmZlMzI3OTNkMTFmMjEwMzBhYTUyM2QwNWRmYjdmYzI3ZWM1NmI4N2U1YTkyYTI4MjMxODkyZTdiZGU0OWUzYmEzNTI4NjMyNTA1ZTdmZTg1OTljZTE0YmFiYWEwY2IzMWU4ZDQyOWE4ZTE1OTMyZjU5ZTg2NWUzMTdiNzA4OTYwZGQ0MzZlNjM='
}
Form data:
Press F12, switch to the Network tab, and inspect the captured request to find its form data fields.
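The steps below need a `data` dict holding those form fields. Here is a minimal sketch; the field names follow a typical v2transapi capture but are assumptions to verify against your own DevTools session, and `sign`/`token` are deliberately left as placeholders, since they are session-specific:

```python
import urllib.parse

# Hypothetical form fields -- verify against your own DevTools capture.
data = {
    'from': 'en',
    'to': 'zh',
    'query': 'spider',        # the word to translate
    'transtype': 'realtime',
    'simple_means_flag': '3',
    'sign': '...',            # session-specific, copy from DevTools
    'token': '...',           # session-specific, copy from DevTools
}

# URL-encode and convert to bytes, ready to be used as the POST body
payload = urllib.parse.urlencode(data).encode('utf-8')
```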
Encode the POST parameters. Unlike GET parameters, POST parameters are not appended to the URL; they must be passed through the request object's data argument:
data = urllib.parse.urlencode(data).encode('utf-8')
request = urllib.request.Request(url=url, data=data, headers=headers)
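As a quick offline sanity check (illustrative only, using a toy payload and no network traffic): supplying bytes via the data argument is exactly what turns the request into a POST:

```python
import urllib.parse
import urllib.request

# Toy payload just to illustrate the mechanics; nothing is sent anywhere.
sample = {'from': 'en', 'to': 'zh', 'query': 'hello'}
body = urllib.parse.urlencode(sample).encode('utf-8')

req = urllib.request.Request(
    url='https://fanyi.baidu.com/v2transapi?from=en&to=zh',
    data=body,
    headers={'Content-Type': 'application/x-www-form-urlencoded'},
)

# Request infers the HTTP method from the presence of a body
print(req.get_method())  # → POST
```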
Now send the request to the server while impersonating a browser:
response = urllib.request.urlopen(request)
Then read and decode the response body,
content = response.read().decode('utf-8')
and parse the resulting JSON string into a Python object:
import json
obj = json.loads(content)
print(obj)
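If you cannot hit the live API yet, you can still rehearse the parsing step offline. The response shape below is an assumption based on a typical v2transapi reply; verify it against your own captured response:

```python
import json

# Assumed response shape -- check the real reply in DevTools
content = '{"trans_result": {"data": [{"src": "spider", "dst": "蜘蛛"}]}}'

obj = json.loads(content)
print(obj['trans_result']['data'][0]['dst'])  # → 蜘蛛
```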
With the hints above you can finish the Baidu Translate crawler yourself. For the complete code, give the post a like-favorite-follow and DM the blogger!