Avoiding Garbled Text When Writing a Web Crawler

想做个自由的人

Category: Crawlers. Tags: crawler, garbled text, chardet

Article link: https://blog.csdn.net/tongjinrui/article/details/79273374

 
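    import urllib

    import chardet


    def get_page_content(url):
        # Download the raw bytes of the page (urllib.urlopen is Python 2 only).
        url_content = urllib.urlopen(url).read()
        # Let chardet guess the encoding; 'encoding' may be None.
        char_det = chardet.detect(url_content)
        get_encoding_charset = char_det['encoding']
        if not (get_encoding_charset and get_encoding_charset.lower() == 'utf-8'):
            # Not UTF-8: treat the page as gb2312 and decode it, skipping
            # undecodable bytes. In Python 2 there is no need to call
            # encode("utf-8") afterwards.
            url_content = url_content.decode('gb2312', 'ignore')
        # UTF-8 pages are returned unchanged.
        return url_content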
 
The libraries required are chardet and urllib.

The code above is written for Python 2.7.
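
Because the listing above only runs on Python 2.7, here is a minimal sketch of the same idea for Python 3. It is not the author's code. The helper name `decode_page_bytes`, the `detect` parameter, and the `gb18030` fallback are assumptions made for illustration. Unlike the original, it decodes with whatever encoding chardet reports, and falls back to `gb18030` (a superset of gb2312) only when detection fails.

```python
import urllib.request


def decode_page_bytes(raw, detect=None, fallback="gb18030"):
    """Decode raw page bytes to str using a detected encoding.

    `detect` is a hypothetical hook: any callable that takes bytes and
    returns a dict with an 'encoding' key, like chardet.detect.
    """
    if detect is None:
        # Imported lazily so the helper can also be used with a stub detector.
        import chardet
        detect = chardet.detect
    # chardet reports None when it cannot guess; use the fallback then.
    encoding = detect(raw).get("encoding") or fallback
    try:
        return raw.decode(encoding, errors="ignore")
    except LookupError:
        # The detector named a codec that Python does not know.
        return raw.decode(fallback, errors="ignore")


def get_page_content(url):
    """Fetch `url` and return its body as text."""
    with urllib.request.urlopen(url) as response:
        return decode_page_bytes(response.read())
```

With this split, `get_page_content(url)` always returns `str`, and the decoding step can be checked on sample bytes without touching the network.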
