Scraping short videos of pretty girls in 22 lines of code

1. First fetch the content of the target page, then analyze it.

import requests

url = 'https://www.*****.com/discover?modal_id=*******************'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'referer': 'https://www.*****.com/',
}
# Fetch the raw HTML of the page
response = requests.get(url, headers=headers).text

2. The target video's URL and its name are both embedded in this page's data.

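To locate the video URL, search the whole page source (for example, with the global search in the browser's developer tools) for strings such as "v3-web.******.com".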

This content turns out to be on line 141 of the page source we fetched earlier.
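You can also do this check from Python instead of the browser's search panel. The minimal sketch below reports which lines of the fetched HTML contain a given marker. The function name `find_lines` and the sample HTML are hypothetical. In practice you would pass it the `response` text from step 1. The exact line number (141 above) depends on the page version and will probably change over time.

```python
def find_lines(html: str, needle: str) -> list[int]:
    """Return the 1-based line numbers of `html` that contain `needle`."""
    return [n for n, line in enumerate(html.splitlines(), start=1) if needle in line]


if __name__ == "__main__":
    # Hypothetical sample; in practice pass the `response` text from step 1
    sample = "<html>\n<head></head>\n<script>...//v3-web.example.com/demo.mp4...</script>\n</html>"
    print(find_lines(sample, "v3-web."))  # [3]
```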

3. Use a regular expression to pull out the whole JSON string, then parse it with `json` to extract the target video's name and src.
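The data is encoded twice. Each `self.__pace_f.push(...)` call receives a JSON array, and the array's second element is itself JSON text with a short tag such as `3:` in front. The sketch below is a hypothetical helper, `extract_video_detail`, and the sample HTML is synthetic, built only to mimic the structure the article describes. It differs from the article's method in one respect: instead of hard-coding the fifth match (`[4]`), it scans every push and returns the first one that yields `videoDetail`. It also removes only the leading tag, whereas the original code removes every occurrence of `3:` in the string.

```python
import json
import re

# Matches the argument of each self.__pace_f.push(...) call in the page HTML
PUSH_RE = re.compile(r"self\.__pace_f\.push\((.*?)\)</script>")


def extract_video_detail(html: str) -> dict:
    """Return the first videoDetail dict found in the page's push payloads."""
    for raw in PUSH_RE.findall(html):
        try:
            outer = json.loads(raw)  # outer layer, e.g. [1, "3:[...]"]
        except json.JSONDecodeError:
            continue
        if not isinstance(outer, list) or len(outer) < 2 or not isinstance(outer[1], str):
            continue
        # Split off only the leading "<n>:" tag; the rest is JSON text again
        _tag, sep, body = outer[1].partition(":")
        if not sep:
            continue
        try:
            inner = json.loads(body)
            return inner[0][3]["videoDetail"]  # same path as the article's code
        except (json.JSONDecodeError, IndexError, KeyError, TypeError):
            continue
    raise ValueError("no videoDetail found in page")


if __name__ == "__main__":
    # Hypothetical page fragment built to mimic the structure the article describes
    inner = [["$", "$L1", None, {"videoDetail": {
        "desc": "demo",
        "video": {"playAddr": [{"src": "//v3-web.example.com/demo.mp4"}]},
    }}]]
    pushes = [[1, "0:[]"], [1, "3:" + json.dumps(inner)]]
    html = "".join(f"<script>self.__pace_f.push({json.dumps(p)})</script>" for p in pushes)
    print(extract_video_detail(html)["desc"])  # demo
```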

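4. The src needs a small fix: it is protocol-relative, so prepend 'https:'. Finally, download it with a simple `requests` call and write the bytes to a file.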
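The article simply concatenates 'https:' with the src. The sketch below is a slightly more defensive version: `to_absolute` adds the prefix only when the src actually starts with `//`. The video name comes from `desc`, which is user-written caption text, so it can contain characters like `/` or `?` that are not allowed in file names. For that reason, `safe_filename` cleans it up before it is used in a path. Both helper names are hypothetical.

```python
import re


def to_absolute(src: str, scheme: str = "https") -> str:
    """Prefix protocol-relative URLs ("//host/path") with a scheme; leave others alone."""
    if src.startswith("//"):
        return f"{scheme}:{src}"
    return src


def safe_filename(name: str, max_len: int = 100) -> str:
    """Replace characters that Windows or POSIX file systems reject, and cap the length."""
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "_", name).strip()
    return cleaned[:max_len] or "video"  # fall back if nothing usable is left


if __name__ == "__main__":
    print(to_absolute("//v3-web.example.com/demo.mp4"))  # https://v3-web.example.com/demo.mp4
    print(safe_filename('cute/cat: "day 1"?'))  # cute_cat_ _day 1__
```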

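import json
import re

import requests

url = 'https://www.******.com/discover?modal_id=***************'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'referer': 'https://www.******.com/',
}

# 1. Fetch the page HTML
response = requests.get(url, headers=headers).text

# 2. Take the 5th self.__pace_f.push(...) payload, which holds the video data
scripts = re.findall(r'self\.__pace_f\.push\((.*?)\)</script>', response)[4]

# 3. The payload is a JSON array; its 2nd element is more JSON text prefixed with "3:"
scripts1 = json.loads(scripts)[1]
scripts1 = re.sub(r'^3:', '', scripts1)  # strip only the leading "3:" prefix
scripts1 = json.loads(scripts1)
video_detail = scripts1[0][3]['videoDetail']

name = video_detail['desc']
print(name)

# 4. playAddr src is protocol-relative ("//..."), so prepend "https:"
url_x = 'https:' + video_detail['video']['playAddr'][0]['src']
# headers must be passed by keyword; the 2nd positional argument of requests.get is params
response2 = requests.get(url_x, headers=headers).content

# Replace characters that are invalid in file names; "wb" overwrites instead of appending
safe_name = re.sub(r'[\\/:*?"<>|\r\n]', '_', name)
with open(f'********/{safe_name}.mp4', mode='wb') as f:
    f.write(response2)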
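The script above loads the whole video into memory through `.content`. For larger files, one option (an assumption, not the article's method) is to request with `stream=True` and write the chunks as they arrive. The hypothetical helper below, `save_chunks`, accepts any iterable of bytes, such as the one returned by `iter_content(chunk_size=...)` on a `requests` response. It writes to a temporary `.part` file and renames that file only once the download is complete, so a failed download does not leave a half-written `.mp4` behind.

```python
import os
from typing import Iterable


def save_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write byte chunks to `path` via a temporary file; return the bytes written."""
    tmp = path + ".part"
    written = 0
    try:
        with open(tmp, "wb") as f:  # "wb" truncates, so re-runs overwrite
            for chunk in chunks:
                if chunk:  # skip empty keep-alive chunks
                    f.write(chunk)
                    written += len(chunk)
        os.replace(tmp, path)  # rename into place once the download is complete
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    return written


# Possible usage with requests (not executed here):
#   with requests.get(url_x, headers=headers, stream=True, timeout=30) as r:
#       r.raise_for_status()
#       save_chunks(r.iter_content(chunk_size=65536), path)
```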

Update: the short-video site has since changed its JSON structure, so the extraction part now looks like this:

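import json
import re
from urllib.parse import unquote

# The payload now looks like [1,"%7B..."]: a JSON array whose 2nd item is URL-encoded JSON
scripts = re.findall(r'self\.__pace_f\.push\((\[1,"%.*?)\)</script>', response)[0]
scripts = json.loads(scripts)[1]  # json.loads instead of eval: parses data without running code
scripts1 = unquote(scripts)  # percent-decode back into JSON text
scripts1 = json.loads(scripts1)
scripts1 = scripts1['app']['videoDetail']
name = scripts1['desc']
url_x = 'https:' + scripts1['video']['playAddr'][0]['src']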
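The updated structure can be exercised offline. The sketch below is a hypothetical helper, `parse_video_detail`, run against a synthetic payload shaped like the one described above: a `[1,"%..."]` array whose second element is URL-encoded JSON containing `app.videoDetail`. It uses `json.loads` rather than `eval`, which yields the same list but cannot execute anything embedded in the page. The sample is dumped with compact separators so that it starts with `[1,"%` exactly, since that is what the regex expects.

```python
import json
import re
from urllib.parse import quote, unquote

# Same pattern as the updated snippet: only pushes starting with [1,"% (URL-encoded payload)
PUSH_RE = re.compile(r'self\.__pace_f\.push\((\[1,"%.*?)\)</script>')


def parse_video_detail(html: str) -> dict:
    """Decode the URL-encoded JSON payload and return its app.videoDetail dict."""
    matches = PUSH_RE.findall(html)
    if not matches:
        raise ValueError("no URL-encoded push payload found")
    outer = json.loads(matches[0])  # e.g. [1, "%7B%22app%22..."]
    data = json.loads(unquote(outer[1]))  # percent-decode, then parse the JSON text
    return data["app"]["videoDetail"]


if __name__ == "__main__":
    # Hypothetical payload built to mirror the structure described above
    detail = {"app": {"videoDetail": {
        "desc": "demo",
        "video": {"playAddr": [{"src": "//v3-web.example.com/demo.mp4"}]},
    }}}
    # Compact separators so the text starts with [1,"% as the regex expects
    payload = json.dumps([1, quote(json.dumps(detail))], separators=(",", ":"))
    html = f"<script>self.__pace_f.push({payload})</script>"
    vd = parse_video_detail(html)
    print(vd["desc"], "https:" + vd["video"]["playAddr"][0]["src"])
```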
