1. Install the requests library
1. Create a project
2. Run the requests install command in a terminal
pip install requests
2. Crawler code
1. Fetching data with GET
main.py
import requests  # import the requests package


def main():
    url = 'http://www.cntour.cn/'  # URL of the site to crawl
    strhtml = requests.get(url)  # fetch the page with a GET request
    print(strhtml.text)


if __name__ == '__main__':
    main()
Output (the HTML source of the page is printed to the console)
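The GET example above assumes the request always succeeds. A slightly more defensive variant is sketched here; the helper name fetch_html and the 10-second timeout are illustrative choices, not part of the original script.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page body as text, raising on HTTP errors (hypothetical helper)."""
    # timeout keeps the call from hanging forever on an unresponsive server
    response = requests.get(url, timeout=timeout)
    # raise requests.HTTPError for 4xx/5xx responses instead of printing an error page
    response.raise_for_status()
    # use the encoding guessed from the body in case the headers misreport the charset
    response.encoding = response.apparent_encoding
    return response.text
```

Calling fetch_html('http://www.cntour.cn/') in place of requests.get(url).text should give the same HTML, but a failed request surfaces as an exception rather than as unexpected output.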
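2. Fetching data with POST
Crawling Youdao Translate
1. Open https://fanyi.youdao.com/
2. Press F12 to open the developer tools, enter the text to translate, and click Translate.
In the Network tab you can see the request translate_o?smartresult=dict&smartresult=rule
3. Click translate_o?smartresult=dict&smartresult=rule to inspect it in detail.
It is clearly a POST request.
4. From there you can read the Request URL and the Form Data.
As preparation, remove the _o from the Request URL (https://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule) and use the result as url. If the _o is left in, the crawl fails with the error {"errorCode":50}.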
url = 'https://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
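The same rewrite can be done in code instead of by hand. This is only a sketch: strip_o_suffix is a hypothetical helper name, and it assumes the endpoint path copied from the developer tools ends in _o.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_o_suffix(request_url):
    """Drop a trailing '_o' from the URL path, keeping the query string intact."""
    parts = urlsplit(request_url)
    path = parts.path
    if path.endswith('_o'):
        path = path[:-2]
    return urlunsplit(parts._replace(path=path))


request_url = 'https://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
url = strip_o_suffix(request_url)
print(url)
```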
Copy the fields shown under Form Data and write them as a dictionary:
From_data = {
    'i': '今天玩得很开心!',
    'from': 'zh-CHS',
    'to': 'en',
    'smartresult': 'dict',
    'client': 'fanyideskweb',
    'salt': '16378460516426',
    'sign': 'c83e460e73708a30ae68bb914197b3f0',
    'lts': '1637846051642',
    'bv': 'b0ff5d17f404993192085bf8b1e93587',
    'doctype': 'json',
    'version': '2.1',
    'keyfrom': 'fanyi.web',
    'action': 'FY_BY_CLICKBUTTION'
}
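Retyping each field by hand is error-prone. Assuming the Form Data panel was copied as plain "key: value" lines (the exact clipboard format depends on the browser), a small hypothetical helper such as form_text_to_dict could build the dictionary. Only a few of the fields are shown to keep the sketch short.

```python
def form_text_to_dict(raw):
    """Convert 'key: value' lines pasted from the Form Data panel into a dict."""
    data = {}
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # split on the first colon only, so values containing ':' survive
        key, _, value = line.partition(':')
        data[key.strip()] = value.strip()
    return data


# Assumed clipboard contents: a shortened copy of the fields listed above
raw = """
i: 今天玩得很开心!
from: zh-CHS
to: en
doctype: json
"""
From_data = form_text_to_dict(raw)
print(From_data)
```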
5. The Response tab shows the data returned by the server:
{"translateResult":[[{"tgt":"I had a great time today!","src":"今天玩得很开心!"}]],"errorCode":0,"type":"zh-CHS2en","smartResult":{"entries":["","Today a lot of fun\n",""," \r\n","Today they're very happy\n",""," \r\n"],"type":1}}
The crawler will receive data in this kind of format.
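To see how the fields of that response are reached once it is decoded, here is a minimal sketch that parses the sample response shown above with the standard json module. The variable names are illustrative only.

```python
import json

# The sample response from the Response tab, pasted as a raw string
raw = r'''{"translateResult":[[{"tgt":"I had a great time today!","src":"今天玩得很开心!"}]],"errorCode":0,"type":"zh-CHS2en","smartResult":{"entries":["","Today a lot of fun\n",""," \r\n","Today they're very happy\n",""," \r\n"],"type":1}}'''

content = json.loads(raw)

# translateResult is a nested list; the first item of the first inner list holds the pair
first = content['translateResult'][0][0]
translation = first['tgt']

# smartResult entries contain empty strings and stray whitespace, so filter them
alternatives = [e.strip() for e in content['smartResult']['entries'] if e.strip()]

print(translation)
print(alternatives)
```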
6. Write the crawler
import requests  # import the requests package
import json


def translate(text=None):
    # Take the Request URL https://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule
    # and remove the _o to get url; with the _o left in, the crawl fails with {"errorCode":50}
    url = 'https://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
    From_data = {
        # text is the string to translate
        'i': text,
        'from': 'zh-CHS',
        'to': 'en',
        'smartresult': 'dict',
        'client': 'fanyideskweb',
        'salt': '16378460516426',
        'sign': 'c83e460e73708a30ae68bb914197b3f0',
        'lts': '1637846051642',
        'bv': 'b0ff5d17f404993192085bf8b1e93587',
        'doctype': 'json',
        'version': '2.1',
        'keyfrom': 'fanyi.web',
        'action': 'FY_BY_CLICKBUTTION'
    }
    # POST the form data
    response = requests.post(url, data=From_data)
    # convert the JSON string into a dictionary
    content = json.loads(response.text)
    print(content)


def main():
    translate('明天就要放假啦!')


if __name__ == '__main__':
    main()
Output
{'type': 'ZH_CN2EN', 'errorCode': 0, 'elapsedTime': 0, 'translateResult': [[{'src': '明天就要放假啦!', 'tgt': 'Will have a holiday tomorrow!'}]]}
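The printed dictionary still has to be unpacked to get the translated sentence, and a non-zero errorCode (such as the 50 mentioned earlier) should not be treated as a result. The following is a rough sketch: extract_translation is a hypothetical helper, and reading the nested lists as paragraphs of sentences is an assumption based on the shape of the output above.

```python
def extract_translation(content):
    """Return the translated text from a decoded response dict, or raise on failure."""
    if content.get('errorCode') != 0:
        raise ValueError(f"translation failed, errorCode={content.get('errorCode')}")
    # Assumed layout: a list of paragraphs, each a list of {'src': ..., 'tgt': ...} dicts
    return '\n'.join(
        ''.join(sentence['tgt'] for sentence in paragraph)
        for paragraph in content['translateResult']
    )


# The dictionary printed by the crawler above
result = {'type': 'ZH_CN2EN', 'errorCode': 0, 'elapsedTime': 0,
          'translateResult': [[{'src': '明天就要放假啦!', 'tgt': 'Will have a holiday tomorrow!'}]]}
print(extract_translation(result))
```

Inside translate, returning extract_translation(content) instead of printing content would hand the caller just the translated string.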