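Contents
1. GET and POST are two different HTTP request methods: GET is simple and direct but exposes its parameters in the URL; POST wraps them in a request body, which keeps them out of the URL.
8. Cookie anti-scraping (static cookies only; dynamic cookies cannot be handled this way)
26. How to use regular expressions; resp.encoding = 'utf-8'
33. Finding text that contains xxx; requires importing re (regular expressions) and beautifulsoup4.
35. Converting to a string: in XPath, use tostring to convert a node object into its tag markup.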
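1. GET and POST are two different HTTP request methods: GET is simple and direct but exposes its parameters in the URL; POST wraps them in a request body, which keeps them out of the URL.
A GET request has no request body; a POST request has one. Neither is encrypted unless the connection uses HTTPS.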
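The difference can be checked without sending anything over the network. The sketch below uses only the standard library (urllib rather than requests), and the example.com URL and the parameter values are placeholders chosen for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder parameters, not taken from any real API
params = {"q": "hello", "to": "Auto"}
encoded = urlencode(params)  # "q=hello&to=Auto"

# GET: the parameters travel in the URL and there is no body
get_req = Request("http://example.com/search?" + encoded)

# POST: the same parameters travel in the request body as bytes
post_req = Request("http://example.com/search", data=encoded.encode("utf-8"))

print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

Nothing is sent here; the Request objects only describe what would go on the wire.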
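2. Example code for a test server:
import socket

sock = socket.socket()
sock.bind(("127.0.0.1", 8080))
sock.listen(5)
while True:
    conn, addr = sock.accept()
    data = conn.recv(1024)  # raw bytes of the browser's request
    print('data:::', data)
    # status line, blank line, then the response body
    conn.send(b'HTTP/1.1 200 OK\r\n\r\n1111')
    conn.close()  # closing tells the browser the response is complete
Enter 127.0.0.1:8080 in a browser to reach the server; it returns 1111.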
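The same exchange can be reproduced without a browser. The following is a minimal sketch, assuming a one-shot variant of the server (it answers a single request and stops) bound to a port picked by the operating system; the helper name serve_once is hypothetical.

```python
import socket
import threading

def serve_once(server_sock):
    """Accept one connection, read the request, send a fixed response."""
    conn, _addr = server_sock.accept()
    with conn:
        conn.recv(1024)
        conn.sendall(
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Length: 4\r\n"
            b"Connection: close\r\n"
            b"\r\n"
            b"1111"
        )

server = socket.socket()
server.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
server.listen(1)
port = server.getsockname()[1]
worker = threading.Thread(target=serve_once, args=(server,))
worker.start()

# Play the browser's role with a raw socket
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"GET / HTTP/1.1\r\nHost: 127.0.0.1\r\n\r\n")
chunks = []
while True:
    chunk = client.recv(1024)
    if not chunk:  # empty bytes means the server closed the connection
        break
    chunks.append(chunk)
client.close()
worker.join()
server.close()

response = b"".join(chunks)
status_line = response.split(b"\r\n", 1)[0]
body = response.split(b"\r\n\r\n", 1)[1]
print(status_line, body)
```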
3. HTTP request format:
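An HTTP request consists of a request line, header lines, a blank line, and an optional body. The sketch below splits a hand-written request into those parts; the host, path, and form field are made up for illustration and are not from a captured request.

```python
# A hand-written POST request; every line ends with CRLF (\r\n)
raw_request = (
    b"POST /trans HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 7\r\n"
    b"\r\n"          # blank line separates the headers from the body
    b"q=hello"
)

# Everything before the first blank line is the head, the rest is the body
head, _, body = raw_request.partition(b"\r\n\r\n")
lines = head.decode("ascii").split("\r\n")

# Request line: method, path, protocol version
method, path, version = lines[0].split(" ")

# Remaining lines are "Name: value" headers
headers = dict(line.split(": ", 1) for line in lines[1:])

print(method, path, version)
print(headers)
print(body)
```

A GET request has the same layout with nothing after the blank line.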
4. The three things anti-scraping mechanisms check: UA (User-Agent), cookie, referer
5. Shortcut for formatting JSON
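This is not the editor shortcut itself, but JSON can also be formatted from Python with the standard json module. A minimal sketch with made-up sample data:

```python
import json

# Sample data for illustration only
data = {"title": "犯罪", "count": 2}

# indent=2 pretty-prints; ensure_ascii=False keeps Chinese text readable
pretty = json.dumps(data, ensure_ascii=False, indent=2)
print(pretty)
```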
6. Referer anti-scraping
import requests

my_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.5359.125 Safari/537.36",
    # the page the request claims to come from
    'Referer': 'https://movie.douban.com/explore'
}
url = 'https://m.douban.com/rexxar/api/v2/movie/recommend?refresh=0&start=0&count=20&selected_categories=%7B%22%E7%B1%BB%E5%9E%8B%22:%22%E7%8A%AF%E7%BD%AA%22%7D&uncollect=false&tags=%E7%8A%AF%E7%BD%AA'
res = requests.get(url, headers=my_headers)
print(res.text)
7. PyCharm shortcut:
Ctrl+/ comments out code
8. Cookie anti-scraping (static cookies only; dynamic cookies cannot be handled this way)
import requests
url = "https://stock.xueqiu.com/v5/stock/screener/quote/list.json?type=sha&order_by=percent&order=desc&size=10&page=1"
my_headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.5359.125 Safari/537.36",
'Referer': 'https://xueqiu.com/hq',
"Cookie": "s=ce1691zr64; xq_a_token=cf755d099237875c767cae1769959cee5a1fb37c; xq_r_token=e073320f4256c0234a620b59c446e458455626d9; xq_id_token=eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJ1aWQiOi0xLCJpc3MiOiJ1YyIsImV4cCI6MTcwMTk5NTg4MCwiY3RtIjoxNzAwNDgyODU2MTUwLCJjaWQiOiJkOWQwbjRBWnVwIn0.ahoNtkL30exAv-5UuI6_lC0iGfIhaH70FLFI1Z9A8TlVG3dXxDPHnT43IJo36pwmd6s9kGAIHlz7IU-f1bWQ5czH6W77as69qG6OThYTUVatXS0bvBVBVS-uzMSmXWvweqSPjFKAJHaHQH38F8SfjssNYXbEyGwXp2XfdieueRrgZVdjPwZ-sgczrolrJg-K-3XYa87gtcLzmM9vl5_8aKfsyUt5XTiSFADNQe-31aj8lm-ciiYHqabPsS19y8Gv9k2UMknYCEwaZlwc4_HUxU4BNgIqk1CU3cVVykeom5V3ggQ-W10OCzUkArX-fp26VjctK0cvKJNHDQF7_DnGVA; cookiesu=771700482860303; u=771700482860303; device_id=5196ccf1e7f4009737785bc93860f84e; Hm_lvt_1db88642e346389874251b5a1eded6e3=1700482887; Hm_lpvt_1db88642e346389874251b5a1eded6e3=1700482936"
}
res = requests.get(url,headers=my_headers)
print(res.text)
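A static Cookie header is just a string of name=value pairs separated by semicolons. The sketch below splits such a string into a dict with the standard library; the cookie names and values are short placeholders, not real tokens. A dict like this could be passed to requests through its cookies= parameter instead of a raw header.

```python
from http.cookies import SimpleCookie

# Placeholder cookie string, for illustration only
raw_cookie = "s=abc123; device_id=xyz789; u=42"

jar = SimpleCookie()
jar.load(raw_cookie)  # parse the "name=value; name=value" string

# Each entry is a Morsel; .value holds the cookie value
cookies = {name: morsel.value for name, morsel in jar.items()}
print(cookies)
```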
9. GET request parameters
import requests

my_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.5359.125 Safari/537.36",
    'Referer': 'https://movie.douban.com/explore'
}
url = 'https://m.douban.com/rexxar/api/v2/movie/recommend'
tags = input('Enter a movie genre: ')  # input() returns a plain str
print(tags)
my_params = {
    'start': 0,
    'count': 60,
    'tags': tags
}
# params holds the GET query parameters; requests appends them to the URL
res = requests.get(url, headers=my_headers, params=my_params)
# print(res.text)
print(res.json())
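To see what the query parameters look like once they are appended to the URL, the encoding can be reproduced with the standard library. This is a sketch of the idea rather than the exact code path requests uses, and the genre value 犯罪 is just a sample input.

```python
from urllib.parse import urlencode

base = 'https://m.douban.com/rexxar/api/v2/movie/recommend'
my_params = {'start': 0, 'count': 60, 'tags': '犯罪'}  # sample genre

# Non-ASCII values are UTF-8 encoded and then percent-escaped
full_url = base + '?' + urlencode(my_params)
print(full_url)
```

The percent-escaped tags value is the same form that appears in URLs copied from the browser's address bar.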
10. POST request body
import requests

url = 'https://aidemo.youdao.com/trans'  # Youdao AI Cloud (有道智云) demo site
word = input('Enter the word to translate: ')
my_data = {
    'q': word,
    'from': 'Auto',
    'to': 'Auto'
}
# data is the POST request body
res = requests.post(url, data=my_data)
print(res.json().get('translation'))
11. Scraping images and videos
import requests

url = 'http://img.netbian.com/file/2023/1113/224050jZgYd.jpg'
my_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.5359.125 Safari/537.36",
}
res = requests.get(url, headers=my_headers)
print(res.content)  # raw bytes of the image
with open('1.jpg', 'wb') as f:
    f.write(res.content)
# Note: 1. open the file in 'wb' mode instead of 'w'; 2. use res.content instead of res.text. The same applies to videos.
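Why the binary mode matters can be shown without downloading anything. In the sketch below a few placeholder bytes stand in for res.content, and the file goes into a temporary directory; the variable names are illustrative.

```python
import os
import tempfile

# Placeholder bytes standing in for res.content (JPEG magic bytes plus padding)
image_bytes = b"\xff\xd8\xff\xe0" + b"\x00" * 8
path = os.path.join(tempfile.mkdtemp(), "1.jpg")

# Text mode ('w') only accepts str, so writing bytes raises TypeError
try:
    with open(path, "w") as f:
        f.write(image_bytes)
    text_mode_ok = True
except TypeError:
    text_mode_ok = False

# Binary mode ('wb') writes the bytes unchanged
with open(path, "wb") as f:
    f.write(image_bytes)

with open(path, "rb") as f:
    saved = f.read()

print(text_mode_ok, saved == image_bytes)
```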