Garbled Text in Web Scraping (Part 1)

Latest recommended article published 2023-03-25 22:59:44

聆听我的召唤，菜鸟进化

Categories: Notes, Web Scraping. Tags: python, html

Article link: https://blog.csdn.net/qq_45889931/article/details/119858642


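import requests
from fake_useragent import UserAgent
from lxml import etree

# Send a random User-Agent so the request looks like an ordinary browser.
headers = {'User-Agent': UserAgent().random}
url = 'http://www.stats.gov.cn/tjsj/pcsj/rkpc/6rp/html/B0203.htm'

response = requests.get(url=url, headers=headers)
# If the printed text is garbled, override the encoding before reading response.text:
# response.encoding = 'GBK'
# response.encoding = 'utf-8'
# response.encoding = 'gb2312'
page_text = response.text
tree = etree.HTML(page_text)
div_list = tree.xpath('/html/body/table//tr')  # every row of the page's table(s)

for div in div_list:
    try:
        title0 = div.xpath('./td//text()')
        # If the second text node is only a non-breaking space plus a space,
        # drop it and merge the following node into the first cell.
        if title0[1] == '\xa0 ':
            del title0[1]
            title0[0] = title0[0] + title0[1]
            del title0[1]
        print(title0)
    except IndexError:
        # Rows with too few text nodes are skipped.
        continue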
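
The commented-out `response.encoding` lines in the script point at the usual cause of garbled output: the page bytes get decoded with the wrong charset. Below is a minimal offline sketch of that idea, not the author's method. The helpers `pick_encoding` and `decode_page` are hypothetical names introduced here, and `sample` is made-up data standing in for `response.content`; the sketch assumes the page declares its charset in a `<meta>` tag.

```python
import re


def pick_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Return the charset declared near the top of the HTML, or `default`."""
    match = re.search(rb'charset=["\']?([\w-]+)', raw[:2048], re.IGNORECASE)
    return match.group(1).decode("ascii").lower() if match else default


def decode_page(raw: bytes) -> str:
    """Decode raw page bytes using the charset the page itself declares."""
    encoding = pick_encoding(raw)
    # GB2312 is a subset of GBK, so decoding as GBK also covers rarer characters.
    if encoding == "gb2312":
        encoding = "gbk"
    return raw.decode(encoding, errors="replace")


# Hypothetical sample standing in for response.content (assumption, not real page data).
sample = (
    '<html><head><meta charset="gb2312"></head>'
    '<body>人口普查</body></html>'
).encode("gbk")

garbled = sample.decode("iso-8859-1")  # a wrong charset guess yields mojibake
fixed = decode_page(sample)            # the declared charset recovers the text

print("人口普查" in garbled)
print("人口普查" in fixed)
```

With `requests`, the same effect can usually be had by passing `response.content` to a helper like this, or by setting `response.encoding = response.apparent_encoding` before reading `response.text`, which lets the library guess the charset from the body instead of relying on the HTTP header.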