Python Beginner's Study Notes (1): A Simple Web Scraper

zhutou_

Column: Python Notes · Tags: python, web scraping

Article link: https://blog.csdn.net/weixin_37608233/article/details/61414680

This is just one of my study notes, written so I can come back and review it later.
Every program here has actually been run.

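```python
# -*- coding: utf-8 -*-
import requests
from lxml import etree

# Cookie string copied from Fiddler, e.g. "name1=value1; name2=value2".
# It goes into the Cookie header as-is: requests' cookies= argument would treat
# the whole string as the value of a single cookie named "cookie".
cook = {"Cookie": "paste the cookie obtained from Fiddler here"}
url = 'http://weibo.cn/'
# .content is the raw response body (bytes)
html = requests.get(url, headers=cook, timeout=10).content
print(html)
selector = etree.HTML(html)
# On weibo.cn each <span class="ctt"> holds the text of one post
content = selector.xpath('//span[@class="ctt"]')
for each in content:
    # string(.) joins every text node inside the span, nested tags included
    text = each.xpath('string(.)')
    print(text)
```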

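1. Put the cookie copied from Fiddler into a dict.
The URL uses weibo.cn because the mobile version of the site is easier to scrape; this little program can't scrape the desktop pages anyway ><. The desktop and mobile versions actually carry the same content.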
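2. Store the URL in a string.
3. Fetch the page with requests.get (the most basic GET request).
`.content` is used here instead of `.text` because `.text` came out garbled; I had forgotten exactly why. The likely cause: `.text` decodes the body with the encoding requests reads from the `Content-Type` header, and when a `text/*` response declares no charset it falls back to ISO-8859-1, which garbles UTF-8 Chinese. `.content` is the raw bytes, so lxml can use the charset declared in the page itself.
I haven't tried the other request types yet, but they seem to take the same form: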

```python
requests.post("<website>/post")       # POST request
requests.put("<website>/put")         # PUT request
requests.delete("<website>/delete")   # DELETE request
requests.head("<website>/get")        # HEAD request
requests.options("<website>/get")     # OPTIONS request
```

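4. Parse the page data with etree.HTML.
5. Use an XPath expression to select every `<span class="ctt">` (on weibo.cn these hold the text of each post), then call `string(.)` on each match to extract its full text and print it.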
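Step 1 keeps the cookie in a dict. requests' `cookies=` argument expects one dict entry per cookie name, while the value copied from Fiddler's Cookie header holds every cookie in one string such as `name1=value1; name2=value2`. Below is a minimal sketch, standard library only, of splitting that string into a proper dict; the helper name `parse_cookie_header` and the sample cookie names are hypothetical.

```python
def parse_cookie_header(raw: str) -> dict:
    """Split a raw Cookie header value ("a=1; b=2") into {name: value}.

    Hypothetical helper for illustration; values are kept verbatim.
    """
    cookies = {}
    for part in raw.split(";"):
        # Split only at the first "=" so values like "a=b==" stay intact
        name, sep, value = part.strip().partition("=")
        # Skip empty fragments (e.g. a trailing ";") and fragments without "="
        if sep and name:
            cookies[name] = value
    return cookies


if __name__ == "__main__":
    raw = "SUB=abc123; _T_WM=xyz; lang=zh-cn"
    print(parse_cookie_header(raw))
```

The result can then be passed as `requests.get(url, cookies=parse_cookie_header(raw))`. With a dict like `{"cookie": "<whole string>"}`, requests instead treats `cookie` as the name of one single cookie whose value is the entire string.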
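Step 3 attributes the switch from `.text` to `.content` to garbled output. The sketch below reproduces that effect offline with the standard library: UTF-8 bytes decoded as ISO-8859-1 (the fallback requests uses for `text/*` responses whose header carries no charset) turn into mojibake, while decoding the raw bytes with the right codec gives readable text. The sample string is made up for illustration.

```python
# Bytes as a UTF-8 encoded Chinese page might send them (sample text is hypothetical)
raw = "微博正文".encode("utf-8")

# What .text would show if requests fell back to ISO-8859-1
garbled = raw.decode("iso-8859-1")

# Decoding the raw bytes (what .content gives you) with the right codec
correct = raw.decode("utf-8")

# ISO-8859-1 maps every byte to exactly one character, so the mistake is reversible
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(garbled)
print(correct, repaired == correct)
```

With a live response the equivalent fix is to set `resp.encoding = resp.apparent_encoding` (requests' guess based on the body bytes), or `resp.encoding = "utf-8"` when the page's charset is known, before reading `resp.text`.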
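The other request methods listed in step 3 really do share one form: each `requests.<method>()` helper is a thin wrapper around `requests.request(method, url, **kwargs)`. A minimal sketch that builds, but does not send, such requests with `requests.Request(...).prepare()`, so it runs without network access; the httpbin.org base URL, the form data, and the helper name `build` are placeholders assumed for illustration.

```python
import requests

BASE = "https://httpbin.org"  # placeholder test service; nothing is actually sent


def build(method: str, path: str, **kwargs) -> requests.PreparedRequest:
    """Hypothetical helper: prepare a request to inspect it without sending it."""
    return requests.Request(method, BASE + path, **kwargs).prepare()


post = build("POST", "/post", data={"key": "value"})
print(post.method, post.url)          # POST https://httpbin.org/post
print(post.headers["Content-Type"])   # application/x-www-form-urlencoded
print(post.body)                      # key=value

# The remaining methods are built exactly the same way
for method in ("PUT", "DELETE", "HEAD", "OPTIONS"):
    print(build(method, "/anything").method)
```

To actually send a prepared request, pass it to `requests.Session().send(prepared)`; for everyday use the one-line helpers are simpler.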
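Steps 4 and 5 parse the page and pull out the post text. The sketch below runs the same `etree.HTML` / XPath / `string(.)` calls against a small inline HTML snippet, so it can be tried without logging in; the markup is invented to mimic the `span class="ctt"` structure the scraper targets.

```python
from lxml import etree

# Hypothetical page fragment imitating the structure the scraper expects
html = """
<html><body>
  <div><span class="ctt">First post <a href="#">#topic#</a> text</span></div>
  <div><span class="ctt">Second post</span></div>
  <div><span class="other">Not a post</span></div>
</body></html>
""".encode("utf-8")

selector = etree.HTML(html)
# Same XPath as the scraper: every <span> whose class attribute is exactly "ctt"
spans = selector.xpath('//span[@class="ctt"]')

# string(.) concatenates all text inside the span, including the nested <a>
texts = [str(span.xpath("string(.)")) for span in spans]
for text in texts:
    print(text)
```

Because `@class="ctt"` is an exact string comparison, a span with `class="ctt extra"` would not match; `contains(@class, "ctt")` is the usual looser alternative.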
