Pitfalls I hit when saving scraped articles as JSON

最美的情郎

Category: python | Tags: json, crawler json

Article link: https://blog.csdn.net/qq_40604853/article/details/82853378

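import json
import requests

chapter = []
# url_list is a placeholder for the relative arXiv paths collected earlier;
# the original snippet does not show where they come from.
for url in url_list:
    urls = "https://arxiv.org/%s" % str(url)
    ress = requests.get(urls)
    # ress.encoding = "gbk"  # an earlier attempt at forcing the encoding
    json_response = ress.text  # the response body decoded to a str
    chapter.append({'content': json_response})
# Pass encoding explicitly so the platform default codec is not used
with open('ComputerScience.json', 'a', encoding='utf-8') as fp:  # store the collected data as a JSON file
    fp.write(json.dumps(chapter, ensure_ascii=False, indent=4, sort_keys=True))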
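
One thing to watch with mode 'a' in the listing above: each run appends another complete JSON array, so after a second run the file no longer parses as a single JSON document. Below is a minimal sketch of one way around this, assuming it is acceptable to store one JSON object per line (JSON Lines). The helper names `append_records` and `load_records` and the `.jsonl` file name are hypothetical and not from the original post.

```python
import json
import os
import tempfile


def append_records(path, records):
    """Append each record as one JSON object per line (JSON Lines)."""
    with open(path, 'a', encoding='utf-8') as fp:
        for record in records:
            # Without indent, json.dumps escapes newlines inside strings,
            # so each record stays on a single physical line.
            fp.write(json.dumps(record, ensure_ascii=False) + '\n')


def load_records(path):
    """Read the file back, parsing one JSON object per non-empty line."""
    with open(path, encoding='utf-8') as fp:
        return [json.loads(line) for line in fp if line.strip()]


# Two separate "runs" appending to the same (temporary, made-up) file.
demo_path = os.path.join(tempfile.mkdtemp(), 'ComputerScience.jsonl')
append_records(demo_path, [{'content': 'first page'}])
append_records(demo_path, [{'content': '第二页\nwith a newline'}])
records = load_records(demo_path)
```

If a single JSON array is required instead, the alternative is to read the existing file, extend the list, and rewrite it with mode 'w'.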

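Problem 1: TypeError: Object of type 'Response' is not JSON serializable

This happened because I put the Response object returned by requests straight into the dictionary, and json.dumps cannot serialize it.

Before: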

chapter.append({'content':ress})

After this change it worked:

json_response = ress.text  # the response body as a str, which json can serialize
chapter.append({'content':json_response})

Reference: https://www.cnblogs.com/Lin-Yi/p/7640147.html
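
The difference between the two versions can be reproduced without any network access. This is only a sketch: `FakeResponse` is a hypothetical stand-in for `requests.Response` that carries nothing but a `text` attribute, and the HTML string is made up.

```python
import json


class FakeResponse:
    """Hypothetical stand-in for requests.Response, so no network is needed."""

    def __init__(self, text):
        self.text = text  # the decoded body, analogous to Response.text


ress = FakeResponse('<html>abstract page</html>')

# Passing the object itself fails: json only handles dict, list, str,
# numbers, bool and None unless a custom encoder is supplied.
try:
    json.dumps([{'content': ress}])
    failed = False
except TypeError:
    failed = True

# Passing the str body works.
serialized = json.dumps([{'content': ress.text}], ensure_ascii=False)
```

Note that `ress.text` for an arXiv page is HTML held in a Python str; json.dumps stores it as a JSON string value, it is not itself JSON.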

Problem 2: 'utf-8' codec can't decode byte 0xbf in position 10: invalid start byte. This is an encoding problem.
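
The post does not record how this one was resolved. As a hedged illustration of where such an error comes from, the sketch below decodes some made-up bytes that are not valid UTF-8 and shows two common ways out. The byte string and the guess that the data is really gbk are assumptions for the example, not facts about the arXiv response.

```python
raw = b'0123456789\xbf\xc6'  # made-up bytes: ten ASCII characters, then two non-UTF-8 bytes

try:
    raw.decode('utf-8')
    error = None
except UnicodeDecodeError as exc:
    error = exc  # exc.start is the offset of the offending byte

# Option 1: decode with the codec the bytes were actually written in
# (gbk is assumed here purely for illustration).
as_gbk = raw.decode('gbk')

# Option 2: keep going and substitute U+FFFD for undecodable bytes.
lossy = raw.decode('utf-8', errors='replace')
```

Option 2 is lossy: every undecodable byte becomes the replacement character U+FFFD. As far as I know, `Response.text` in requests decodes with `errors='replace'`, which is one plausible source of the '\ufffd' character that shows up in the next problem.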

Problem 3: UnicodeEncodeError: 'gbk' codec can't encode character '\ufffd' in position 670: illegal multibyte sequence

Before:

with open('ComputerScience.json', 'a') as fp:

After the change:

with open('ComputerScience.json', 'a',encoding='utf-8') as fp:

For this fix I followed this article: https://www.cnblogs.com/themost/p/6603409.html

Article: https://blog.csdn.net/jiang_1603/article/details/77856720
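
Why the explicit `encoding='utf-8'` matters can be shown in a few lines. This is a sketch under assumptions: the sample text is made up, and the temporary file path only reuses the name ComputerScience.json for illustration. When `open()` is called without `encoding`, Python uses the locale's preferred encoding, which on a Chinese-locale Windows machine is typically gbk, and gbk has no mapping for U+FFFD.

```python
import locale
import os
import tempfile

text = 'abstract \ufffd text'  # U+FFFD stands in for a character that failed to decode

# What happens implicitly when the platform default codec is gbk:
try:
    text.encode('gbk')
    gbk_ok = True
except UnicodeEncodeError:
    gbk_ok = False

# The encoding that open() falls back to when none is passed.
default_encoding = locale.getpreferredencoding(False)

# With an explicit UTF-8 encoding the write succeeds on any platform.
path = os.path.join(tempfile.mkdtemp(), 'ComputerScience.json')
with open(path, 'a', encoding='utf-8') as fp:
    fp.write(text)
with open(path, encoding='utf-8') as fp:
    round_trip = fp.read()
```

Printing `default_encoding` on the machine that raised the error is a quick way to confirm that the default codec, not the data, was the cause.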
