Extracting image links when scraping Toutiao (今日头条) galleries (Ajax)

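Toutiao loads its gallery images via Ajax, so when scraping them I used the re module to pull the links out of the page. During extraction I ran into a small problem. The result looked like this: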

['"{\\"count\\":9,\\"sub_images\\":[{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/418185332_tt\\",\\"width\\":1200,\\"url_list\\":[{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/418185332_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb9.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/418185332_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb1.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/418185332_tt\\"}],\\"uri\\":\\"origin\\\\/tuchong.fullscreen\\\\/418185332_tt\\",\\"height\\":1200},{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/529858694_tt\\",\\"width\\":1200,\\"url_list\\":[{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/529858694_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb9.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/529858694_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb1.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/529858694_tt\\"}],\\"uri\\":\\"origin\\\\/tuchong.fullscreen\\\\/529858694_tt\\",\\"height\\":1200},{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/374079621_tt\\",\\"width\\":1200,\\"url_list\\":[{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/374079621_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb9.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/374079621_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb1.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/374079621_tt\\"}],\\"uri\\":\\"origin\\\\/tuchong.fullscreen\\\\/374079621_tt\\",\\"height\\":1200},{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/583008374_tt\\",\\"width\\":1200,\\"url_list\\":[{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/583008374_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb9.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/583008374_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb1.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/583008374_tt\\"}],\\"uri\\":
\\"origin\\\\/tuchong.fullscreen\\\\/583008374_tt\\",\\"height\\":1200},{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/458686594_tt\\",\\"width\\":1200,\\"url_list\\":[{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/458686594_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb9.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/458686594_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb1.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/458686594_tt\\"}],\\"uri\\":\\"origin\\\\/tuchong.fullscreen\\\\/458686594_tt\\",\\"height\\":1200},{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/147390595_tt\\",\\"width\\":1200,\\"url_list\\":[{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/147390595_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb9.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/147390595_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb1.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/147390595_tt\\"}],\\"uri\\":\\"origin\\\\/tuchong.fullscreen\\\\/147390595_tt\\",\\"height\\":1200},{\\"url\\":\\"http:\\\\/\\\\/p1.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/22543963_tt\\",\\"width\\":1200,\\"url_list\\":[{\\"url\\":\\"http:\\\\/\\\\/p1.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/22543963_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/22543963_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb9.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/22543963_tt\\"}],\\"uri\\":\\"origin\\\\/tuchong.fullscreen\\\\/22543963_tt\\",\\"height\\":1200},{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/552992907_tt\\",\\"width\\":1200,\\"url_list\\":[{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/552992907_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb9.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/552992907_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb1.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\
\\/552992907_tt\\"}],\\"uri\\":\\"origin\\\\/tuchong.fullscreen\\\\/552992907_tt\\",\\"height\\":1200},{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/420610157_tt\\",\\"width\\":1200,\\"url_list\\":[{\\"url\\":\\"http:\\\\/\\\\/p3.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/420610157_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb9.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/420610157_tt\\"},{\\"url\\":\\"http:\\\\/\\\\/pb1.pstatp.com\\\\/origin\\\\/tuchong.fullscreen\\\\/420610157_tt\\"}],\\"uri\\":\\"origin\\\\/tuchong.fullscreen\\\\/420610157_tt\\",\\"height\\":1200}],\\"max_img_width\\":1200,\\"labels\\":[\\"\\\\u6444\\\\u5f71\\"],\\"sub_abstracts\\":[\\" \\\\u6444\\\\u5f71\\\\uff1a\\\\u61d2\\\\u4ebade\\\\u903b\\\\u8f91\\",\\" \\",\\" \\",\\" \\",\\" \\",\\" \\",\\" \\",\\" \\",\\" \\"],\\"sub_titles\\":[\\"\\\\u56fe\\\\u866b\\\\u8857\\\\u62cd\\\\u6444\\\\u5f71\\\\uff1a\\\\u8857\\\\u62cd06\\",\\"\\\\u56fe\\\\u866b\\\\u8857\\\\u62cd\\\\u6444\\\\u5f71\\\\uff1a\\\\u8857\\\\u62cd06\\",\\"\\\\u56fe\\\\u866b\\\\u8857\\\\u62cd\\\\u6444\\\\u5f71\\\\uff1a\\\\u8857\\\\u62cd06\\",\\"\\\\u56fe\\\\u866b\\\\u8857\\\\u62cd\\\\u6444\\\\u5f71\\\\uff1a\\\\u8857\\\\u62cd06\\",\\"\\\\u56fe\\\\u866b\\\\u8857\\\\u62cd\\\\u6444\\\\u5f71\\\\uff1a\\\\u8857\\\\u62cd06\\",\\"\\\\u56fe\\\\u866b\\\\u8857\\\\u62cd\\\\u6444\\\\u5f71\\\\uff1a\\\\u8857\\\\u62cd06\\",\\"\\\\u56fe\\\\u866b\\\\u8857\\\\u62cd\\\\u6444\\\\u5f71\\\\uff1a\\\\u8857\\\\u62cd06\\",\\"\\\\u56fe\\\\u866b\\\\u8857\\\\u62cd\\\\u6444\\\\u5f71\\\\uff1a\\\\u8857\\\\u62cd06\\",\\"\\\\u56fe\\\\u866b\\\\u8857\\\\u62cd\\\\u6444\\\\u5f71\\\\uff1a\\\\u8857\\\\u62cd06\\"]}"']

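Everything in the extracted text is escaped. The fix is simple: undo the escaping with replace(), first collapsing the doubled backslashes and then turning \" back into ":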

replace('\\\\','\\')

replace('\\"','"')
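A minimal sketch of the two replace() calls above, assuming the regex returns exactly the string printed earlier. The sample `raw` is a hypothetical trimmed excerpt of that string (only the first image and the `labels` field are kept), and `unescape_by_replace` is an illustrative name, not from the original code.

```python
import json

# Hypothetical trimmed excerpt of the extracted string shown above.
# A raw string keeps every backslash exactly as the regex returned it.
raw = r'"{\"count\":9,\"sub_images\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/tuchong.fullscreen\\/418185332_tt\",\"width\":1200,\"height\":1200}],\"labels\":[\"\\u6444\\u5f71\"]}"'


def unescape_by_replace(text):
    """Undo one level of escaping using the two replace() calls."""
    # Order matters: collapse each doubled backslash into one first...
    text = text.replace('\\\\', '\\')
    # ...then turn every \" back into a plain double quote.
    text = text.replace('\\"', '"')
    # The printed value is also wrapped in an extra pair of double quotes.
    return text[1:-1]


data = json.loads(unescape_by_replace(raw))
print(data["sub_images"][0]["url"])
print(data["labels"])
```

json.loads() then turns the remaining \/ into / and \u6444\u5f71 into the Chinese label, so no further replace() is needed for those.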


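Then use json.loads() to convert the str into a dict. Note that the value printed above is also wrapped in an extra pair of double quotes; strip them first (e.g. text[1:-1]), otherwise json.loads() raises a JSONDecodeError.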
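As an alternative to hand-written replace() calls (a swapped-in technique, not the method described here): the extracted value, outer quotes included, appears to be a valid JSON string literal, so json.loads() can be applied twice. The first pass unescapes it into JSON text, the second parses that text into a dict. The sample `raw` is a hypothetical trimmed excerpt of the printed string, and `decode_twice` is an illustrative name.

```python
import json

# Hypothetical trimmed excerpt of the extracted string shown above.
raw = r'"{\"count\":9,\"sub_images\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/tuchong.fullscreen\\/418185332_tt\",\"width\":1200,\"height\":1200}],\"labels\":[\"\\u6444\\u5f71\"]}"'


def decode_twice(text):
    """Decode a JSON document that was itself stored as a JSON string."""
    # Pass 1: the outer "..." is a JSON string literal; decoding it yields
    # the inner JSON text with \" and \\ already unescaped.
    inner = json.loads(text)
    # Pass 2: parse the inner JSON text into a dict.
    return json.loads(inner)


data = decode_twice(raw)
print(data["sub_images"][0]["url"])
```

This handles every JSON escape sequence (such as \n or \t), not only \\ and \", and it removes the outer quotes as part of the first pass.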


At this point you have ordinary JSON data to work with.
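Once the data is a dict, the image links can be read from the `sub_images` list. A minimal sketch, assuming the structure shown above: each entry has a `url` plus a `url_list` that appears to hold the same image path on other hosts (p3, pb9, pb1). The `gallery` dict is a hypothetical trimmed version of the parsed result (two of the nine images), and `image_urls` / `mirror_urls` are illustrative helper names.

```python
# Hypothetical parsed result, trimmed to two of the nine images shown above.
gallery = {
    "count": 9,
    "sub_images": [
        {
            "url": "http://p3.pstatp.com/origin/tuchong.fullscreen/418185332_tt",
            "url_list": [
                {"url": "http://p3.pstatp.com/origin/tuchong.fullscreen/418185332_tt"},
                {"url": "http://pb9.pstatp.com/origin/tuchong.fullscreen/418185332_tt"},
                {"url": "http://pb1.pstatp.com/origin/tuchong.fullscreen/418185332_tt"},
            ],
        },
        {
            "url": "http://p3.pstatp.com/origin/tuchong.fullscreen/529858694_tt",
            "url_list": [],
        },
    ],
}


def image_urls(data):
    """Return the main url of every entry in sub_images."""
    return [img["url"] for img in data.get("sub_images", [])]


def mirror_urls(image):
    """Return every url listed in one image's url_list."""
    return [entry["url"] for entry in image.get("url_list", [])]


for url in image_urls(gallery):
    print(url)
```

Using .get() with an empty-list default keeps the helpers from raising KeyError if a gallery lacks either field.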
