Example code: scraping Douyin comment data with Python and an Android emulator

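Goal:

I built this Douyin comment scraper demo after talking about it with a friend, but never wrote it up. Now that I have some free time, here are my notes.

Tip: the general approach is to capture Douyin's traffic with Fiddler plus an Android emulator, then organize the captured data with Python.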

Tools to install:

Python 3 (download)

Fiddler (installation and configuration)

An Android emulator (download)

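Douyin setup:

Once the emulator is downloaded, open it.

Download Douyin from the emulator's app store.

Configure Fiddler to capture Douyin's traffic from the emulator. Once that works, you can use the emulator just like a phone.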

1. Tool configuration and packet capture:

Open any video, and Fiddler will list the newly captured packets.

Find the video URL in the JSON response.
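Captured responses are deeply nested, so hunting for a field by eye is slow. As a minimal sketch, the helper below walks a parsed JSON body and prints every path at which a given key occurs. The function name find_key_paths and the sample body are made up for illustration; a real response saved from Fiddler will have many more fields.

```python
import json


def find_key_paths(obj, target, path=()):
    """Yield (path, value) for every occurrence of `target` in nested JSON data."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == target:
                yield path + (key,), value
            # Keep descending: the same key may appear again deeper down.
            yield from find_key_paths(value, target, path + (key,))
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            yield from find_key_paths(item, target, path + (index,))


# Hypothetical, heavily trimmed response body for illustration only.
sample = json.loads(
    '{"data": [{"aweme_info": {"desc": "demo", "share_url": "https://example.com/v/1"}}]}'
)

for found_path, found_value in find_key_paths(sample, "share_url"):
    print(found_path, found_value)
```

The printed path tells you which keys and list indexes to chain together when reading the field in your own parsing code.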

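2. Adding the code that saves video comment responses to Fiddler

Add the following code to the Fiddler script. Two things to note:

(1) The URL path in the uriContains check must be reviewed and updated to match the request you want to capture.

(2) The download folder used in the script must be created manually beforehand.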

if (m_Hide304s && oSession.responseCode == 304) {
    oSession["ui-hide"] = "true";
}

// Save the body of every matching response to its own JSON file.
if (oSession.uriContains("https://aweme.snssdk.com/aweme/v1/general/search/single/")) {
    var strBody = oSession.GetResponseBodyAsString();
    // Use the last 58 characters of the request path as part of the file name.
    var sps = oSession.PathAndQuery.slice(-58);
    //FiddlerObject.alert(sps)
    var timestamp = new Date().getTime();
    // Backslashes must be escaped inside a string literal.
    var filename = "D:\\抖音评论资料" + "\\" + sps + timestamp + ".json";
    var sw : System.IO.StreamWriter;
    if (System.IO.File.Exists(filename)) {
        sw = System.IO.File.AppendText(filename);
    } else {
        sw = System.IO.File.CreateText(filename);
    }
    sw.Write(strBody);
    sw.Close();
    sw.Dispose();
}

Place this code in the response handler (OnBeforeResponse) of the Fiddler script. Don't forget to save after adding it!
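Since the Fiddler script writes into a folder that has to exist already, one option is to create it from Python before starting a capture session. This is only a sketch: the helper name ensure_output_dir is made up here, and the folder path in the comment is the one used in the script above, so adjust it if yours differs.

```python
import os


def ensure_output_dir(path):
    """Create the folder (and any missing parent folders) if it does not exist yet.

    Returns True when the folder is present afterwards.
    """
    os.makedirs(path, exist_ok=True)
    return os.path.isdir(path)


# Example call (commented out so that importing this file has no side effects):
# ensure_output_dir(r"D:\抖音评论资料")
```

Calling the function twice is harmless, because exist_ok=True suppresses the error that os.makedirs would otherwise raise for an existing folder.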

3. Python code: create a new .py file in PyCharm

The program:

import csv
import json
import os
import time


class Douyin(object):

    def __init__(self):
        # Captured request URLs and headers, kept for reference only:
        # parse() reads the JSON files saved by Fiddler and sends no requests.
        self.url1 = 'https://aweme.snssdk.com/aweme/v2/comment/list/?aweme_id=6885929189950737676&cursor=0&count=20&address_book_access=1&gps_access=1&forward_page_type=1&channel_id=0&city=310000&hotsoon_filtered_count=0&hotsoon_has_more=0&follower_count=0&is_familiar=0&page_source=0&os_api=25&device_type=VOG-AL00&ssmix=a&manifest_version_code=110301&dpi=240&uuid=868594157367551&app_name=aweme&version_name=11.3.0&ts=1603350069&cpu_support64=false&app_type=normal&ac=wifi&host_abi=armeabi-v7a&channel=aweGW&update_version_code=11309900&_rticket=1603350070959&device_platform=android&iid=1758845207590062&version_code=110300&mac_address=b0%3Ac4%3A2d%3Ad0%3Aed%3A38&cdid=7974198e-c4c0-49c2-bfaa-43686052706e&openudid=d0c6cffa7067bedd&device_id=844047245117672&resolution=720*1280&device_brand=HUAWEI&language=zh&os_version=7.1.2&aid=1128&mcc_mnc=46000'
        self.url2 = 'https://aweme.snssdk.com/aweme/v2/comment/list/?aweme_id=6885163969477086479&cursor=0&count=20'
        self.header = {
            'Accept-Encoding': 'gzip',
            'X-SS-REQ-TICKET': '1603350070957',
            'sdk-version': '1',
            'Cookie': 'install_id=1758845207590062; ttreq=1$34f012b99d70a66f681dc3d1f0b438fc1b161af3; d_ticket=77247c94236bf8055c233f8cabb6a5ddf3231; odin_tt=fccb20add45a15f08a2519eadcaaf22cba4b3f8f1fceec300a088407c2daf81ea76b260ef6c81dbc86dfedfea011f68c25238f9b3984fe4f5909441dfd1cc9c2; sid_guard=6de18a966e69dcbbf076f629a2ef6511%7C1603345424%7C5184000%7CMon%2C+21-Dec-2020+05%3A43%3A44+GMT; uid_tt=ba98af780b4e337f01463cf98a8afafd; sid_tt=6de18a966e69dcbbf076f629a2ef6511; sessionid=6de18a966e69dcbbf076f629a2ef6511',
            'x-tt-token': '006de18a966e69dcbbf076f629a2ef651189d3f6f73fd3d6319b543d50d2e2e5a4cf3e383f8da81f07e049bcf850de07d331',
            'X-Gorgon': '0404d8210000a6a3dca0dbc6b11483a82420c9a94dd050a3e511',
            'X-Khronos': '1603350070',
            'Host': 'aweme.snssdk.com',
            'Connection': 'Keep-Alive',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36',
        }
        # Folder that the Fiddler script writes its JSON files into.
        # A raw string keeps the backslash from being read as an escape.
        self.add = r'D:\抖音评论资料'
        self.videos_list = os.listdir(self.add)

    def parse(self):
        """Collect link, description, author nickname, publish time, and like, comment and share counts."""
        lists = []
        for vid in self.videos_list:
            # Read one captured response; the with block closes the file.
            with open(os.path.join(self.add, vid), encoding='utf-8') as a:
                content = json.load(a)
            for con in content['data']:
                meta = {}
                try:
                    meta['title'] = con['aweme_info']['desc']
                    meta['author_name'] = con['aweme_info']['author']['nickname']
                    meta['u_name'] = con['aweme_info']['author']['unique_id']
                    meta['create_time'] = con['aweme_info']['create_time']
                    # Convert the Unix timestamp to a readable local time.
                    timeArray = time.localtime(meta['create_time'])
                    meta['create_time'] = time.strftime("%Y-%m-%d %H:%M:%S", timeArray)
                    meta['digg_count'] = con['aweme_info']['statistics']['digg_count']
                    meta['comment_count'] = con['aweme_info']['statistics']['comment_count']
                    meta['share_count'] = con['aweme_info']['statistics']['share_count']
                    meta['share_url'] = con['aweme_info']['share_url']
                except Exception:
                    # Entries without the expected fields are blanked out and skipped below.
                    meta['title'] = ''
                    meta['author_name'] = ''
                    meta['u_name'] = ''
                    meta['create_time'] = ''
                    meta['digg_count'] = ''
                    meta['comment_count'] = ''
                    meta['share_count'] = ''
                    meta['share_url'] = ''
                if meta['u_name'] == '':
                    # Fall back to the handle of the music owner.
                    try:
                        meta['u_name'] = con['aweme_info']['music']['owner_handle']
                    except Exception:
                        meta['u_name'] = ''
                if meta['title'] != '':
                    lists.append(meta)
                    # print(meta)
        return lists

    def save_data(self, meta):
        header = ['share_url', 'title', 'author_name', 'u_name', 'create_time', 'digg_count', 'comment_count', 'share_count']
        print(meta)
        # Write the column names only when the file is new or empty,
        # so repeated runs do not add extra header rows.
        need_header = not os.path.exists('test.csv') or os.path.getsize('test.csv') == 0
        with open('test.csv', 'a', newline='', encoding='utf-8-sig') as f:
            writer = csv.DictWriter(f, fieldnames=header)
            if need_header:
                writer.writeheader()
            writer.writerows(meta)

    def run(self):
        meta = self.parse()
        self.save_data(meta)


if __name__ == '__main__':
    douyin = Douyin()
    douyin.run()

After the script runs, it writes test.csv, which can be opened in Excel, to the directory it was run from.
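To sanity-check the result without opening Excel, the file can be read back with the csv module. The sketch below assumes the column names used in save_data (in particular digg_count) and the utf-8-sig encoding; the helper names load_rows and top_by_likes are invented for this example.

```python
import csv


def load_rows(path):
    """Read a CSV written with utf-8-sig encoding into a list of dicts."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def top_by_likes(rows, n=5):
    """Return the n rows with the highest digg_count (like count)."""
    def likes(row):
        try:
            return int(row["digg_count"])
        except (KeyError, ValueError, TypeError):
            # Blank cells or stray header rows count as 0 and sort last.
            return 0
    return sorted(rows, key=likes, reverse=True)[:n]


# Example call (assumes test.csv exists in the current directory):
# for row in top_by_likes(load_rows("test.csv")):
#     print(row["digg_count"], row["title"])
```

Reading with utf-8-sig matters here: with plain utf-8 the byte order mark would be glued onto the first column name.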

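PS: Douyin does not return all comments in a single response. Each time you scroll down in the comment section, another 26 comments are loaded, so we can let the emulator do the scrolling for us.

Click More > Mouse Macro.

Click record, then scroll the page down once with the mouse.

Click stop, and the action you just performed is saved.

Click settings to replay the recorded action in a loop, which refreshes the comment section automatically.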
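A looping macro may cause the same page of data to be captured more than once, which would produce repeated rows in the output. Whether that happens depends on the capture session, so treat the following as an optional sketch: it removes repeats from a list of record dicts, using share_url as the identity. The function name dedupe is hypothetical, and the sample records are invented.

```python
def dedupe(records, key="share_url"):
    """Keep the first record seen for each value of `key`, preserving order."""
    seen = set()
    unique = []
    for record in records:
        marker = record.get(key)
        if marker in seen:
            continue  # already kept an earlier record with this value
        seen.add(marker)
        unique.append(record)
    return unique


# Invented sample records shaped like the dicts built in parse().
records = [
    {"share_url": "https://example.com/v/1", "title": "first"},
    {"share_url": "https://example.com/v/2", "title": "second"},
    {"share_url": "https://example.com/v/1", "title": "first again"},
]
print(len(dedupe(records)))
```

In the Douyin class this could be applied to the list returned by parse() before it is passed to save_data().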
