[Web Scraping] Python Web Scraping

I. Module for fetching data: requests

II. The three request headers that anti-scraping checks usually look at:

1. User-Agent

Example: 白DU网

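import requests

url = 'https://www.xxxxx.com/'
headers = {
    # Present the request as coming from a desktop Chrome browser
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36',
}
res = requests.get(url, headers=headers)
res.encoding = 'utf8'  # decode the response body as UTF-8
# Write with an explicit encoding so the file does not depend on the platform default
with open('bai.html', 'w', encoding='utf8') as f:
    f.write(res.text)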

2. Referer

Example: 豆BAN网

import requests

url = 'https://m.doudou.com/rexxar/api/v2/movie/recommend?refresh=0&start=0&count=40&selected_categories=KEYWORD&uncollect=false&tags=KEYWORD'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36',
    # The page this API call is normally made from
    'Referer': 'https://movie.douban.com/explore',
}
res = requests.get(url, headers=headers)
res.encoding = 'utf8'
data = res.json()  # parse the JSON body into a dict

print(data['items'][2])
for item in data['items']:
    try:
        print(item['title'] + '------' + item['card_subtitle'])
    except KeyError:
        # This entry has no title (or no card_subtitle)
        print('no title')

3. Cookie

Example: 雪QIU网

import requests

url = 'https://stock.xueqiu.com/v5/stock/screener/quote/list.json?page=1&size=30&order=desc&order_by=amount&exchange=CN&market=CN&type=sha'
cookie = 's=b611kg37f0; xq_a_token=a0f5e0d91bc0846f43452e89ae79e08167c42068; xq_r_token=76ed99965d5bffa08531a6a47501f096f61108e8; xq_id_token=eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJ1aWQiOi0xLCJpc3MiOiJ1YyIsImV4cCI6MTY5NTUxNTc5NCwiY3RtIjoxNjkzMjcxNTg5NDQ2LCJjaWQiOiJkOWQwbjRBWnVwIn0.U0RI9wJfaaUnb7FNVKs28Vqkjn10bTz5-FC7-shbgkoUuV4BNgK5GNs9SLbz1HPiwq7r60Zk_ySe8sOdFL1-R28xepOt2H4Kr3tJuA_0wmeSloQz8E43W0FcKrakcsIrX-zgG--gUGjy0yTOu7a3RyZuJfHBdGXQcI_97DORNwmpM2QaESZiT19oyIOaVs1UzoDCOfvquYG82XmSfmBY3Q6M1nQJmlkTLI91ZikGvu1mfQglLVvvoedwzIrN8waJU3tOmxaI7UavpqPa0eWRAoC52dYzAiNebEU0Zr55xhh21I5nsI_sQgw2J3GQD_Wo5JhR6TE7tjttpYIVKUBrVQ; cookiesu=981693271601194; u=981693271601194; Hm_lvt_1db88642e346389874251b5a1eded6e3=1693271600; device_id=23559baf9d6be36be8ef4b12c835c759; Hm_lpvt_1db88642e346389874251b5a1eded6e3=1693385189'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36',
    'Referer': 'https://xueqiu.com/hq',
    'Cookie':cookie,
}
res = requests.get(url, headers=headers)
data = res.json()  # parse the JSON body into a dict
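
A Cookie string copied from the browser is long, and editing a single value inside it is error-prone. Below is a minimal sketch of one way to split such a string into a dict using only the standard library. The helper name `parse_cookie_header` and the shortened sample string are made up for illustration and are not part of the original example.

```python
def parse_cookie_header(raw: str) -> dict:
    """Split a 'k1=v1; k2=v2' Cookie header string into a dict."""
    cookies = {}
    for pair in raw.split(';'):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first '=' only: a value may itself contain '='.
        name, _, value = pair.partition('=')
        cookies[name] = value
    return cookies


# Shortened, made-up cookie string (illustration only).
sample = 's=abc123; u=42; token=a=b'
parsed = parse_cookie_header(sample)
print(parsed)
```

The resulting dict could then be passed as `requests.get(url, headers=headers, cookies=parsed)` instead of placing the whole string in the `Cookie` header.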

III. Parameters for get and post

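import requests

url = 'https://www.xxxxx.com/'  # placeholder URL
res = requests.get(url, params={})   # params: sent in the URL query string
res = requests.post(url, data={})    # data: sent form-encoded in the request body

data = res.text      # str: the body decoded as text (an attribute, not a method)
data = res.json()    # the body parsed as JSON, usually a dict

data = res.content   # bytes: the raw body, used to save binary files such as videos and images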
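
To make the effect of `params` concrete, here is a small sketch that builds a query string with the standard library alone. `urlencode` is used here as a stand-in for the encoding that requests applies to a `params` dict. The base URL and the three parameters are taken from the 豆BAN example, shortened for readability.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base = 'https://m.doudou.com/rexxar/api/v2/movie/recommend'
params = {'refresh': 0, 'start': 0, 'count': 40}

# A params dict ends up appended to the URL after '?'.
full_url = base + '?' + urlencode(params)
print(full_url)

# Parsing the query string back shows the same key/value pairs (as strings).
print(parse_qs(urlsplit(full_url).query))
```

With `requests.post(url, data={...})` the same kind of encoded string travels in the request body rather than in the URL.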

IV. A small tip

import os
name = os.path.basename('/uplude/s.jpg')
print(name)  # s.jpg
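
`os.path.basename` combines naturally with `res.content` when saving images or videos. The sketch below derives a file name from a URL and writes raw bytes in binary mode. The helper names, the image URL, and the three stand-in bytes are hypothetical; in a real script the bytes would come from `res.content`.

```python
import os
import tempfile
from urllib.parse import urlsplit


def filename_from_url(url: str) -> str:
    """Return the last path segment of a URL, ignoring any query string."""
    return os.path.basename(urlsplit(url).path)


def save_bytes(content: bytes, folder: str, url: str) -> str:
    """Write raw bytes to folder/<name taken from url> and return the path."""
    path = os.path.join(folder, filename_from_url(url))
    with open(path, 'wb') as f:  # 'wb': binary mode, no text encoding involved
        f.write(content)
    return path


# Hypothetical image URL and stand-in bytes (illustration only).
img_url = 'https://www.xxxxx.com/uplude/s.jpg?size=large'
name = filename_from_url(img_url)
print(name)

with tempfile.TemporaryDirectory() as tmp:
    saved = save_bytes(b'\xff\xd8\xff', tmp, img_url)
    print(os.path.basename(saved))
```

Splitting the URL first keeps a query string such as `?size=large` out of the saved file name.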