Web Scraping (Static Pages)

Latest recommended article published on 2023-12-22 02:49:36

Albin2015

Tags: web scraping, python

Original link: http://www.cnblogs.com/17storyteller/p/6826587.html

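What I learned today: web scraping
Honestly, I have been fiddling with this for quite a while.
Materials: python, chrome, BeautifulSoup, requests
Process:
1: Import the libraries
2: Send the request
3: Process the data with BeautifulSoup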
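
The three steps above can be sketched roughly as follows. This is only a minimal sketch, not the script I actually ran: the URL is a placeholder, the helper names fetch_html and extract_title are made up for illustration, and the built-in "html.parser" is assumed as the parser.

```python
# Step 1: import the libraries
import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Step 2: request the page and return its HTML as text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def extract_title(html):
    """Step 3: parse the HTML with BeautifulSoup and return the <title> text."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.text if soup.title else None


if __name__ == "__main__":
    # Placeholder URL (an assumption); swap in the static page to scrape.
    print(extract_title(fetch_html("https://example.com")))
```

Keeping the request and the parsing in separate functions means the parsing step can be tried on a saved HTML string without touching the network.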

Difficulties
1: Encoding issues
Damn, I really need to understand this completely someday.
Common fixes:
import io
import sys
sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='gb18030')

# -*- coding: utf-8 -*-

f=open('nee.txt','w',encoding='utf-8')

content = content.decode('gbk', 'ignore')  # decode GBK bytes into a str (Unicode), skipping undecodable bytes
content = content.encode('utf-8', 'ignore')  # encode the str as UTF-8 bytes
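
To make the decode-then-save idea concrete, here is a small sketch under some assumptions: the raw bytes are simulated rather than downloaded (in a real run they would probably come from something like response.content), and the helper names gbk_bytes_to_text and save_as_utf8 are hypothetical.

```python
import os
import tempfile


def gbk_bytes_to_text(data: bytes) -> str:
    """Decode GBK-encoded bytes into a str, skipping undecodable bytes."""
    return data.decode("gbk", "ignore")


def save_as_utf8(text: str, path: str) -> None:
    """Write text to disk encoded as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


# Stand-in for the raw bytes of a GBK-encoded page.
raw = "爬虫".encode("gbk")
text = gbk_bytes_to_text(raw)

# Use a temporary directory so the demo leaves no file behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "nee.txt")
    save_as_utf8(text, path)
    with open(path, "rb") as f:
        utf8_bytes = f.read()

print(text == "爬虫")                        # True
print(utf8_bytes == "爬虫".encode("utf-8"))  # True
```

The point is that decoding happens once, at the boundary where bytes come in, and the target encoding is chosen once, where the text is written out.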
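2: Using BeautifulSoup
Today I only used BeautifulSoup(), .select(), and .text
.select() takes a CSS selector: a tag name (h1, a), #id, or .class; an attribute of a matched element is read like a dictionary key, [attribute]
Those are the commonly used methods.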
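
A quick sketch of those selectors on a made-up HTML snippet (the markup, the id "main", and the class "link" are all invented for illustration):

```python
from bs4 import BeautifulSoup

# Made-up HTML used only to exercise the selectors.
html = """
<html><body>
  <h1>Headline</h1>
  <div id="main">
    <a class="link" href="/page1">First</a>
    <a class="link" href="/page2">Second</a>
  </div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

headline = soup.select("h1")[0].text   # select by tag name, then read the text
main_div = soup.select("#main")[0]     # select by id
links = soup.select(".link")           # select by class; returns a list of tags
hrefs = [a["href"] for a in links]     # attributes are read like dictionary keys
texts = [a.text for a in links]

print(headline)
print(hrefs)
print(texts)
```

Note that .select() always returns a list, even for an id selector, so a single element has to be taken out by index.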
Wow, a whole day of studying and this is all I have to write down...
PS: starting on distributed crawlers tomorrow

Reposted from: https://www.cnblogs.com/17storyteller/p/6826587.html