Scraping e-commerce data from the Toutiao (今日头条) app with Python 3

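Recently several readers asked us to help scrape data from the Toutiao (今日头条) app. Some needed the app's ad data, and others needed its e-commerce news data. I have already covered the ad data in an earlier blog post, so here I will use the e-commerce data as the example.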

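1. To scrape an app's data, you first need to capture the API endpoint it calls. I recommend using Charles for this. I won't cover how to capture the endpoint here (it is easy to look up, for example on Baidu); I'll just give the endpoint directly.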

http://is.snssdk.com/api/news/feed/v88/?list_count=229&support_rn=4&category=%E7%94%B5%E5%AD%90%E5%95%86%E5%8A%A1&refer=1&refresh_reason=1&session_refresh_idx=6&count=20&min_behot_time=1545011410&last_refresh_sub_entrance_interval=1545011511&loc_mode=7&tt_from=pull&plugin_enable=3&iid=52371751146&device_id=51411696552&ac=wifi&channel=xiaomi&aid=13&app_name=news_article&version_code=699&version_name=6.9.9&device_platform=android&ab_version=629152%2C607361%2C609338%2C326532%2C644516%2C641414%2C645870%2C646379%2C645275%2C644943%2C622716%2C644221%2C621629%2C622134%2C622993%2C641037%2C649190%2C640997%2C641074%2C643790%2C631607%2C631595%2C643841%2C650077%2C554836%2C549647%2C644131%2C472443%2C649122%2C572465%2C649270%2C644058%2C615291%2C606549%2C442255%2C651222%2C645527%2C650134%2C630218%2C621153%2C546702%2C648932%2C281291%2C632887%2C641825%2C622042%2C325616%2C649524%2C642450%2C634871%2C646070%2C625065%2C498375%2C638335%2C467514%2C640046%2C644240%2C631638%2C650567%2C648895%2C648270%2C595556%2C647947%2C640690%2C611287%2C647156%2C640178%2C486952%2C642202%2C571130%2C641921%2C638882%2C594582%2C239095%2C612191%2C641905%2C170988%2C643893%2C642341%2C594603%2C374119%2C641853%2C585064%2C520833%2C634646%2C649420%2C633720%2C550042%2C435215%2C603541%2C586999%2C633860%2C627125%2C649428%2C649497%2C614096%2C620526%2C522766%2C647910%2C416055%2C621360%2C643129%2C642529%2C639579%2C643098%2C545739%2C630235%2C558139%2C586260%2C555254%2C640008%2C635502%2C471406%2C603441%2C596392%2C550820%2C598626%2C644845%2C634911%2C646250%2C603386%2C603400%2C603403%2C603405%2C642681%2C649811%2C646564%2C648850%2C589102%2C633487%2C457480%2C649401%2C639235&ab_client=a1%2Cc4%2Ce1%2Cf1%2Cg2%2Cf7&ab_group=100167&ab_feature=94563%2C102749&abflag=3&ssmix=a&device_type=MI+5X&device_brand=xiaomi&language=zh&os_api=25&os_version=7.1.2&uuid=868392038519494&openudid=a28f8cc2cde1730f&manifest_version_code=699&resolution=1080*1920&dpi=480&update_version_code=69912&_rticket=1545011511705&fp=jlTqP2Ztc2q_FlHeFrU1FYmeFSGI&tma_jssdk_version=1.5.3.9&rom_version=miui_v9_v9.6.2.0.ndbcnfd&plugin=26958&ts=1545011511&as=a2854071d7034c81474355&mas=00fee2f9cc34755ca140c408a81e07206945ec26ea06686e60&cp=54c91d7202137q1
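
The captured URL is long, but it is an ordinary endpoint plus a query string. As a rough sketch using only the standard library, the snippet below splits a shortened copy of the URL above (only a few of its parameters are kept for readability) and decodes the category value. The helper name describe_feed_url is hypothetical and not part of the original project.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the captured URL: only a handful of the original
# query parameters are kept so the example stays readable.
SAMPLE_URL = (
    "http://is.snssdk.com/api/news/feed/v88/"
    "?list_count=229&category=%E7%94%B5%E5%AD%90%E5%95%86%E5%8A%A1"
    "&count=20&min_behot_time=1545011410&device_platform=android"
)


def describe_feed_url(url):
    """Return (endpoint, params); params maps each name to its first value."""
    parts = urlsplit(url)
    endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # parse_qs percent-decodes the values and returns a list per key.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return endpoint, params


endpoint, params = describe_feed_url(SAMPLE_URL)
print(endpoint)
print(params["category"], params["count"])
```

The percent-encoded category decodes to 电子商务 (e-commerce), which is what selects this feed.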

2. Once we have the endpoint, we can fetch the data with Python.

response = requests.get(url, headers=self.getHeader(), verify=False)

Pass the captured URL into this line of code. For the request headers we can use:

header = {"Host": "is.snssdk.com",
          "Accept-Language": "zh-Hans;q=1",
          "tt-request-time": str(int(time.time() * 1000)),
          "Connection": "keep-alive",
          "Accept-Encoding": "gzip,deflate",
          "Cookie": "CNZZDATA1272189606=1385639719-1525687011-%7C1525692411;alert_coverage=76;install_id=31781370987;ttreq=1$b79c6e66ea460b1579579c027e8073593305644e;odin_tt=4c07858cc8b75143c593d0a99a04aa8fcf10136c3dca9badd9c31a2aa9cc415022834c64d7f52952d9290e3028876735;UM_distinctid=1633a13d9fd41b-0910970a30f79a8-12485712-3d10d-1633a13d9fe84a;_ga=GA1.2.555016291.1525687770;_gid=GA1.2.96631484.1525687770;qh[360]=1;__tea_sdk__ssid=957b8ce1-d5b3-4010-bd9c-bfec73bdf526;__tea_sdk__user_unique_id=6552731409432937992;tt_webid=6552731409432937992",
          "X-SS-Cookie": "CNZZDATA1272189606=1385639719-1525687011-%7C1525692411;alert_coverage=76;install_id=31781370987;ttreq=1$b79c6e66ea460b1579579c027e8073593305644e;odin_tt=4c07858cc8b75143c593d0a99a04aa8fcf10136c3dca9badd9c31a2aa9cc415022834c64d7f52952d9290e3028876735;UM_distinctid=1633a13d9fd41b-0910970a30f79a8-12485712-3d10d-1633a13d9fe84a;_ga=GA1.2.555016291.1525687770;_gid=GA1.2.96631484.1525687770;qh[360]=1;__tea_sdk__ssid=957b8ce1-d5b3-4010-bd9c-bfec73bdf526;__tea_sdk__user_unique_id=6552731409432937992;tt_webid=6552731409432937992",
          "User-Agent": "News/6.6.5(iPhone;iOS10.2;Scale/2.00)",
          "Accept": "*/*"}

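Because tt-request-time is a timestamp, the headers are better rebuilt for every request than defined once. The following is a minimal sketch of that idea: build_headers is a hypothetical helper, the cookie passed in is a shortened placeholder, and sending the same value as both Cookie and X-SS-Cookie is an assumption based on the two values above looking identical.

```python
import time


def build_headers(cookie, user_agent="News/6.6.5(iPhone;iOS10.2;Scale/2.00)"):
    """Build the request headers with a fresh millisecond timestamp on every call."""
    return {
        "Host": "is.snssdk.com",
        "Accept-Language": "zh-Hans;q=1",
        # Milliseconds since the epoch, sent as a string.
        "tt-request-time": str(int(time.time() * 1000)),
        "Connection": "keep-alive",
        "Accept-Encoding": "gzip,deflate",
        # Assumption: both cookie headers carry the same value.
        "Cookie": cookie,
        "X-SS-Cookie": cookie,
        "User-Agent": user_agent,
        "Accept": "*/*",
    }


# Placeholder cookie for illustration; use the full captured cookie string in practice.
headers = build_headers("install_id=31781370987")
print(headers["tt-request-time"])
```

The resulting dict can be passed as the headers argument of the requests.get call shown earlier.
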
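With that we can fetch the e-commerce category data. Running the project shows the data that Toutiao returns (I formatted it with a JSON tool). The data field holds the e-commerce items we want, and each item's content field contains the details of that item.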

Next we need to pull the detailed data out of content. The code is as follows:

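# jsonStr is the response body from the request above (presumably response.text).
json_list = json.loads(jsonStr)["data"]
for json_str in json_list:
    # "content" is itself a JSON-encoded string, so it has to be decoded again.
    content = json.loads(json_str["content"])
    self.savaDataInfo(content)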
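
The loop above decodes JSON twice: once for the response and once for each item's content string. A slightly more defensive version of the same idea might look like the sketch below. The sample payload is made up and only mimics the shape described above, and iter_contents is a hypothetical helper name.

```python
import json

# Made-up payload that only mimics the shape described above:
# "data" is a list, and each item's "content" is itself a JSON string.
SAMPLE_RESPONSE = json.dumps({
    "data": [
        {"content": json.dumps({"title": "Example item A", "source": "Example source"})},
        {"content": "not valid json"},
        {"content": json.dumps({"title": "Example item B", "source": "Another source"})},
    ]
})


def iter_contents(json_str):
    """Yield each item's decoded "content" dict, skipping items that fail to parse."""
    payload = json.loads(json_str)
    for item in payload.get("data") or []:
        try:
            content = json.loads(item["content"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or malformed content: skip this item
        if isinstance(content, dict):
            yield content


titles = [content["title"] for content in iter_contents(SAMPLE_RESPONSE)]
print(titles)
```

Skipping a malformed item keeps one bad record from aborting the whole batch; whether to skip or to log and stop is a design choice.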

Each item carries a lot of information; we extract just the fields we need and save them to the database. The code for that is as follows:

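def savaDataInfo(self, content):
    DataInfo.title = content["title"]
    DataInfo.type = 1
    DataInfo.channel = "jinritoutiao"
    # Fall back to an empty dict in case an item has no raw_ad_data.
    raw_ad_data = content.get("raw_ad_data") or {}
    if "download_url" in raw_ad_data:
        # Attribute name matches the one read in insert_inspection_list.
        DataInfo.app_download = raw_ad_data["download_url"]
    self.saveBitmapUrlOrPath(content)
    DataInfo.device_type = "ios"

    DataInfo.app_name = content["source"]
    # 3 is the table id that insert_inspection_list passes to getTableName.
    MySqlManager().insert_inspection_list(3)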
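
Note that the function above stores values as class attributes on DataInfo, so a field that is only set conditionally (such as the download URL) may keep the value from a previous item. One possible alternative, sketched here under assumed names (FeedRecord and to_record are hypothetical and not part of the original project), is to build a fresh record for every item:

```python
from dataclasses import dataclass


@dataclass
class FeedRecord:
    """One scraped item; field names follow the attributes used above."""
    title: str
    app_name: str
    app_download: str = ""
    channel: str = "jinritoutiao"
    type: int = 1


def to_record(content):
    """Map one decoded content dict to a fresh FeedRecord."""
    raw_ad_data = content.get("raw_ad_data") or {}
    return FeedRecord(
        title=content.get("title", ""),
        app_name=content.get("source", ""),
        app_download=raw_ad_data.get("download_url", ""),
    )


ad = to_record({
    "title": "Ad item",
    "source": "Shop",
    "raw_ad_data": {"download_url": "http://example.com/app.apk"},
})
plain = to_record({"title": "Plain item", "source": "News"})
print(ad.app_download, repr(plain.app_download))
```

Each record starts from its defaults, so an item without a download URL ends up with an empty string rather than a leftover value.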

Inserting into the database:

def insert_inspection_list(self, table_id):
    # Debug output: the picture URLs and local paths about to be stored.
    print(str(DataInfo.pic_list))
    print(str(DataInfo.pic_path))

    # Only the table name is concatenated (it comes from our own getTableName);
    # every value is passed as a parameter so quotes in a title cannot break the SQL.
    sql = ("INSERT INTO " + self.getTableName(table_id) +
           "(title,app_download,time,channel,type,content,gif,video,source_type,"
           "pic_list,pic_path,device_type,material_size,app_name,created_at,updated_at)"
           " VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)")
    params = (DataInfo.title, DataInfo.app_download, DataInfo.time, DataInfo.channel, DataInfo.type,
              DataInfo.content, json.dumps(DataInfo.gif), json.dumps(DataInfo.video), DataInfo.source_type,
              json.dumps(DataInfo.pic_list), json.dumps(DataInfo.pic_path), DataInfo.device_type,
              DataInfo.material_size, DataInfo.app_name, self.getCurrentTime(), self.getCurrentTime())
    cursor = self.conn.cursor()
    # %s is the placeholder style of common MySQL drivers such as pymysql (assumed here).
    cursor.execute(sql, params)
    self.conn.commit()
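
To show why passing values as query parameters matters, here is a small self-contained sketch. It deliberately swaps MySQL for the standard library's sqlite3 so it runs without a database server. The table name inspection_list, the reduced column set, and the helper insert_record are assumptions made for illustration. SQLite uses ? as its placeholder, whereas common MySQL drivers use %s.

```python
import json
import sqlite3

# In-memory SQLite database standing in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inspection_list (title TEXT, channel TEXT, type INTEGER, pic_list TEXT)"
)


def insert_record(conn, title, channel, type_, pic_list):
    """Insert one row using placeholders so quotes in values cannot break the SQL."""
    sql = "INSERT INTO inspection_list (title, channel, type, pic_list) VALUES (?, ?, ?, ?)"
    conn.execute(sql, (title, channel, type_, json.dumps(pic_list)))
    conn.commit()


# A title containing a single quote would break SQL built by string formatting.
insert_record(conn, "It's a deal", "jinritoutiao", 1, ["http://example.com/a.jpg"])
row = conn.execute("SELECT title, type, pic_list FROM inspection_list").fetchone()
print(row)
```

The driver handles quoting and type conversion, so the title is stored exactly as given and the integer column stays an integer.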

 

At this point we have basically covered how to fetch e-commerce data from the Toutiao app.
