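1. JSON string conversion
When writing Python code, especially when parsing JSON data returned to a web crawler, you often need to convert between JSON text and Python objects. The json module provides four main functions for this: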
json.dumps
json.loads
json.dump
json.load
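json.dumps # serializes a Python object (such as a dict) to a JSON str
json.loads # parses a JSON str into a Python object (such as a dict)
json.dump # serializes a Python object to JSON and writes it to an open file object
json.load # reads JSON from an open file object and parses it into a Python object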
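The four functions can be tried together in a minimal sketch like the one below. The sample record and the temporary file name record.json are made up for illustration and are not part of the original notes.

```python
import json
import os
import tempfile

# Hypothetical sample record, used only for illustration.
record = {"room_id": "6862493695615503112", "follow_status": 0, "title": "直播"}

# dumps: Python object -> JSON string
# (ensure_ascii=False keeps non-ASCII text readable instead of \uXXXX escapes)
text = json.dumps(record, ensure_ascii=False)

# loads: JSON string -> Python object
parsed = json.loads(text)

# dump / load do the same thing, but against a file object instead of a string
path = os.path.join(tempfile.mkdtemp(), "record.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(record, f, ensure_ascii=False, indent=2)
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(type(text).__name__, type(parsed).__name__, loaded == record)
```

Note that dump and load take an already opened file object, not a file path.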
2. Generating requirements.txt for a project
Once a project is finished, you usually need to generate a requirements.txt file. These are the steps I use:
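# Install pipreqs
pip install pipreqs
# Generate requirements.txt in the current directory (--force overwrites an existing file)
pipreqs . --encoding=utf8 --force
# Install the dependencies listed in requirements.txt
pip install -r requirements.txt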
3. URL parsing
When developing a crawler, you often run into URL parameters that have been percent-encoded, for example:
entrance_info=%257B%2522request_id%2522%253A%252220200819092019010198053016517F3251%2522%252C%2522sdk_version%2522%253A%25221590%2522%252C%2522action_type%2522%253A%2522draw%2522%252C%2522room_id%2522%253A%25226862493695615503112%2522%252C%2522_param_live_platform%2522%253A%2522live%2522%252C%2522enter_from_merge%2522%253A%2522live_merge%2522%252C%2522anchor_id%2522%253A%252274384867376%2522%252C%2522enter_method%2522%253A%2522live_cover%2522%252C%2522follow_status%2522%253A%25220%2522%252C%2522enter_from%2522%253A%2522live%2522%252C%2522category_name%2522%253A%2522live_merge_temai_live_cover%2522%252C%2522carrier_type%2522%253A%2522live_list_card%2522%257D
To make the parameters easier to read and parse, we usually decode them with a short program. The Python code below decodes and parses the parameters.
from urllib import parse
import json
url = 'author_id=74384867376&sec_author_id=MS4wLjABAAAAe6O0Qg6qAOqDdGt9ebqAKafSR_ItnstxQo8nN4h5C1U&room_id=6862493695615503112&entrance_info=%257B%2522request_id%2522%253A%252220200819092019010198053016517F3251%2522%252C%2522sdk_version%2522%253A%25221590%2522%252C%2522action_type%2522%253A%2522draw%2522%252C%2522room_id%2522%253A%25226862493695615503112%2522%252C%2522_param_live_platform%2522%253A%2522live%2522%252C%2522enter_from_merge%2522%253A%2522live_merge%2522%252C%2522anchor_id%2522%253A%252274384867376%2522%252C%2522enter_method%2522%253A%2522live_cover%2522%252C%2522follow_status%2522%253A%25220%2522%252C%2522enter_from%2522%253A%2522live%2522%252C%2522category_name%2522%253A%2522live_merge_temai_live_cover%2522%252C%2522carrier_type%2522%253A%2522live_list_card%2522%257D&first_enter=false&os_api=23&device_type=MI%205s&ssmix=a&manifest_version_code=110801&dpi=240&uuid=860000000001817&app_name=aweme&version_name=11.8.0&ts=1597800075&cpu_support64=false&storage_type=0&app_type=normal&ac=wifi&host_abi=armeabi-v7a&update_version_code=11809900&channel=update&_rticket=1597800074940&device_platform=android&iid=632911030130744&version_code=110800&mac_address=08%3A00%3A27%3A02%3AE0%3A6E&cdid=ccf7c839-5038-439c-b6e1-c90497ea6e10&openudid=6a3c967dbf0500ea&device_id=2981527398675694&resolution=720*1280&os_version=6.0.1&language=zh&device_brand=Xiaomi&aid=1128&mcc_mnc=46006'
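# Decode: entrance_info is percent-encoded twice, so unquote() removes one layer
# and parse_qsl() splits the key/value pairs and decodes the remaining layer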
url = parse.unquote(url)
res = dict(parse.parse_qsl(url))
print(json.dumps(res))
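Calling unquote() on the whole string before splitting works for this data, but it could split in the wrong place if a decoded value ever contained & or =. A possible alternative, sketched below, splits first with parse_qsl() and only then decodes the nested value. The query string here is a shortened, hypothetical one in the same shape as the example above, and treating entrance_info as JSON is an assumption based on how it decodes.

```python
import json
from urllib import parse

# Shortened, hypothetical query string: entrance_info holds a JSON object
# that has been percent-encoded twice, as in the example above.
query = (
    "room_id=6862493695615503112"
    "&entrance_info=%257B%2522action_type%2522%253A%2522draw%2522"
    "%252C%2522sdk_version%2522%253A%25221590%2522%257D"
    "&device_type=MI%205s"
)

# Step 1: split on & and = first; parse_qsl also decodes each value once.
params = dict(parse.parse_qsl(query))

# Step 2: entrance_info is still encoded one more time, so decode it again
# and parse the resulting JSON text into a dict.
params["entrance_info"] = json.loads(parse.unquote(params["entrance_info"]))

print(json.dumps(params, indent=2))
```

With this order, the nested entrance_info ends up as a real dict rather than a JSON string embedded inside another JSON string.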