Python Web Scraping: Self-Check for Common Problems
hrflex
Writing a list to a file and reading it back (Python)
[Code] Writing a list to a file and reading it back (Python)
Original · 2022-08-29 11:21:53 · 4976 views · 0 comments
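The listing above shows only the article title, so the author's actual code is not visible here. As a minimal sketch of one common approach (an assumption, not the original method), a list can be serialized as JSON and loaded back. The helper names `write_list` and `read_list` are hypothetical.

```python
import json


def write_list(items, path):
    """Serialize a list to a UTF-8 text file as JSON."""
    with open(path, "w", encoding="utf-8") as fp:
        # ensure_ascii=False keeps Chinese text readable in the file
        json.dump(items, fp, ensure_ascii=False)


def read_list(path):
    """Load the list back from the JSON file written by write_list."""
    with open(path, encoding="utf-8") as fp:
        return json.load(fp)
```

JSON preserves element types such as numbers and nested lists, which a plain line-per-item text file would flatten into strings.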
Replacing single quotes with double quotes in Python
1. If the object is a string:
    text = "{'err_no': 0, 'err_str': 'OK', 'pic_id': '1169213517976400008', 'pic_str': 'xoet', 'md5': 'ca9bc4fda521498d2b3aba5dbb4ee4ac'}"
    json_str = text.replace("'", '"')
2. If the object is a dict:
    import json
    data = {'err_no': 0, 'err_str': 'OK', 'pic_id': ..
Original · 2022-02-20 17:20:49 · 5222 views · 0 comments
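Both cases can be sketched end to end as follows. The string case uses the `replace` call from the snippet. The dict case is cut off in the listing, so `json.dumps` here is an assumption about a reasonable way to finish it, not the author's confirmed code. `ast.literal_eval` is used only to build a dict from the same sample text.

```python
import ast
import json

raw = ("{'err_no': 0, 'err_str': 'OK', 'pic_id': '1169213517976400008', "
       "'pic_str': 'xoet', 'md5': 'ca9bc4fda521498d2b3aba5dbb4ee4ac'}")

# Case 1: the object is a string. Swap the quote characters, then parse.
# Caveat: this breaks if any value itself contains an apostrophe.
json_str = raw.replace("'", '"')
parsed = json.loads(json_str)

# Case 2: the object is a dict. json.dumps always emits double quotes.
record = ast.literal_eval(raw)  # build a real dict from the sample text
json_from_dict = json.dumps(record)
```

When the input is a Python-literal string whose values may contain apostrophes, `ast.literal_eval` followed by `json.dumps` is usually the safer route than a blanket `replace`.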
Extracting specific fields from JSON data with Python
Suppose we want the value stored under 'pic_str'.
JSON data:
    {'err_no': 0, 'err_str': 'OK', 'pic_id': '1169213517976400008', 'pic_str': 'xoet', 'md5': 'ca9bc4fda521498d2b3aba5dbb4ee4ac'}
1. The JSON data is a string:
    import json
    text = "{'err_no': 0, 'err_str': 'OK', 'pic_id': '116921351797.
Original · 2022-02-20 15:33:32 · 23634 views · 1 comment
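A small sketch of the extraction, covering both a string payload and an already-parsed dict. The helper name `get_field` is hypothetical, and the quote replacement is an assumption needed because the sample uses single quotes, which `json.loads` rejects.

```python
import json


def get_field(payload, key, default=None):
    """Return payload[key], accepting either a dict or a JSON-like string."""
    if isinstance(payload, str):
        # The sample uses single quotes, so normalize them before parsing.
        payload = json.loads(payload.replace("'", '"'))
    return payload.get(key, default)


sample = ("{'err_no': 0, 'err_str': 'OK', 'pic_id': '1169213517976400008', "
          "'pic_str': 'xoet', 'md5': 'ca9bc4fda521498d2b3aba5dbb4ee4ac'}")
pic_str = get_field(sample, "pic_str")
```

Using `.get` with a default avoids a `KeyError` when a response is missing the field, which is common with error responses.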
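How to download rar and zip files with the requests module
    import requests
    url = 'https://downsc.chinaz.net/Files/DownLoad/jianli/202201/jianli16910.rar'
    r = requests.get(url).content
    with open('demo1.rar', 'wb') as fp:
        fp.write(r)
Note: read the response body as binary data via .content, fetch the page with a GET request, and open the output file in binary write mode ('wb')...
Original · 2022-02-16 14:42:40 · 1033 views · 2 comments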
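For larger archives, holding the whole body in memory through `.content` may be undesirable. The following is a hedged variant, not the author's code, that streams the response in chunks with `stream=True` and `iter_content`. The function name `download_file`, the `session` parameter, the timeout, and the chunk size are all illustrative assumptions.

```python
import requests


def download_file(url, dest_path, session=None, chunk_size=8192):
    """Stream a binary file to disk; return the number of bytes written."""
    http = session or requests  # anything exposing a requests-style .get()
    written = 0
    with http.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()  # fail loudly on 4xx/5xx instead of saving an error page
        with open(dest_path, "wb") as fp:  # binary mode, as in the note above
            for chunk in resp.iter_content(chunk_size=chunk_size):
                if chunk:  # skip empty keep-alive chunks
                    fp.write(chunk)
                    written += len(chunk)
    return written


# Example call (performs a real network request):
# download_file('https://downsc.chinaz.net/Files/DownLoad/jianli/202201/jianli16910.rar', 'demo1.rar')
```

Checking the status code before writing avoids ending up with a "rar" file that actually contains an HTML error page.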
Fixing empty lists returned by text() in XPath
When scraping web pages with XPath, you may find that the list returned by text() is empty. This can happen when the text you want is not the direct content of the selected node.
Note: text() only returns a node's direct text content, while string(.) returns the whole conte.
Original · 2022-02-10 22:33:43 · 1960 views · 0 comments
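The difference can be illustrated with a small sketch using `lxml`, assumed here as the XPath library since the listing does not name one. The HTML fragment is made up for the example.

```python
from lxml import etree

html = '<div class="post"><p>Hello <b>bold</b> world</p></div>'
tree = etree.HTML(html)

# The div has no text of its own (only a child <p>), so text() yields nothing.
direct_div_text = tree.xpath('//div[@class="post"]/text()')

# text() on <p> returns only its direct text nodes, skipping the text inside <b>.
direct_p_text = tree.xpath('//div[@class="post"]/p/text()')

# string(.) concatenates the text of the node and all of its descendants.
div = tree.xpath('//div[@class="post"]')[0]
whole_text = div.xpath('string(.)')
```

If an XPath ending in `/text()` returns an empty list, either move the path down to the element that directly holds the text or switch to `string(.)` (or `//text()`) on the parent.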
Solutions for garbled Chinese text in web scraping
Method 1: set response.encoding
    response = requests.get(url=url, headers=headers)
    response.encoding = 'utf-8'
If this has no effect, try the following general solution.
Method 2: first encode the text as 'iso-8859-1' and then decode..
Original · 2022-02-10 20:33:50 · 703 views · 0 comments
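Method 2 can be sketched offline by simulating the wrong decode. The listing is truncated before it names the target encoding, so `'gbk'` below is an assumption chosen for illustration, and `fix_mojibake` is a hypothetical helper name.

```python
def fix_mojibake(text, real_encoding):
    """Undo a wrong ISO-8859-1 decode by recovering the bytes and decoding again."""
    return text.encode("iso-8859-1").decode(real_encoding)


# Simulate a server that sends GBK bytes which were decoded as ISO-8859-1.
original = "爬虫中文乱码"
raw_bytes = original.encode("gbk")
garbled = raw_bytes.decode("iso-8859-1")  # what the garbled text looks like

repaired = fix_mojibake(garbled, "gbk")
```

This works because ISO-8859-1 maps every byte to exactly one character, so re-encoding the garbled text recovers the original bytes without loss.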