Python study notes (1) --- Getting started with Python web scraping


辉火_


Tags: python, web scraping, data mining

Article link: https://blog.csdn.net/qq_41838340/article/details/122543564


Getting started with Python web scraping

The urllib module

import urllib.request

Fetching the page at a URL

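url = 'https://www.baidu.com'  # example target page
with urllib.request.urlopen(url) as resp:  # send the request; the with block closes the connection
    content = resp.read().decode('utf-8')  # the body arrives as bytes, so decode it to str
print(content)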
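The snippet above hard-codes UTF-8, but many Chinese sites still serve GBK. The sketch below takes a more defensive approach. It assumes the server declares its encoding in the Content-Type response header, reads the charset from there, and falls back to UTF-8 when none is given. It also turns urllib's network errors into readable messages. The helper name `fetch_text` and the 10-second timeout are illustrative choices, not part of urllib. The demo uses a `data:` URL (RFC 2397), which urllib opens like a tiny response, so it runs without network access.

```python
import urllib.error
import urllib.parse
import urllib.request


def fetch_text(url, timeout=10):
    """Fetch `url` and return its body as str (hypothetical helper, a sketch).

    The charset comes from the Content-Type response header when the server
    declares one; otherwise UTF-8 is assumed.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            raw = resp.read()
            charset = resp.headers.get_content_charset() or 'utf-8'
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 403 or 404.
        raise RuntimeError(f'HTTP {e.code} while fetching {url}') from e
    except urllib.error.URLError as e:
        # DNS failure, refused connection, connect timeout, etc.
        raise RuntimeError(f'could not reach {url}: {e.reason}') from e
    # errors='replace' keeps a single bad byte from aborting the whole page.
    return raw.decode(charset, errors='replace')


# Offline demo: a data: URL whose Content-Type declares GBK.
demo = 'data:text/plain;charset=gbk,' + urllib.parse.quote('你好'.encode('gbk'))
result = fetch_text(demo)  # '你好', decoded with the declared GBK charset
```

On a real page you would call `fetch_text('https://www.baidu.com')` instead. Note that HTTPError is caught before URLError because it is a subclass of URLError.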

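Setting a user agent and other request headers (for sites that reject the default Python client, or that require login, in which case the cookie header carries the logged-in session)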

headers = {
    'Accept-Language':'zh-Hans-CN, zh-Hans; q=0.5',
    'Connection':'close',
    'referer': 'https://www.baidu.com',
    'User-Agent':'Mozilla/5.0 (Linux; Android 10; HMA-AL00; HMSCore 5.3.0.312; GMSCore 20.15.16) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.93 HuaweiBrowser/11.1.2.301 Mobile Safari/537.36',
    'Upgrade-Insecure-Requests':'1',
    'Cache-Control':'max-age=0',
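    'cookie':'cookie=abc;'  # placeholder: copy the real cookie from your logged-in browser session
    }
req = urllib.request.Request(url=url, headers=headers)  # build a request that carries the custom headers
content = urllib.request.urlopen(req).read().decode('utf-8')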
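Two urllib details matter here:

* If a request carries no User-Agent, urllib sends its own `Python-urllib/3.x`, which some sites reject. A User-Agent supplied on the `Request` takes precedence over that default.
* `Request` normalizes header names with `str.capitalize()`. So `User-Agent` is stored as `User-agent` and `cookie` as `Cookie`, and `get_header()` only finds the normalized spelling.

The sketch below shows both points. It uses a hypothetical helper `build_request`, a placeholder UA string and a placeholder cookie. It only builds the request, so it needs no network access.

```python
import urllib.request

# Placeholder values (assumptions): use your own browser's UA string and the
# cookie copied from a logged-in session.
UA = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'


def build_request(url, user_agent=UA, cookie=None):
    """Return a urllib Request carrying browser-like headers (hypothetical helper)."""
    headers = {'User-Agent': user_agent, 'Referer': 'https://www.baidu.com'}
    if cookie:
        headers['Cookie'] = cookie  # only needed for pages behind a login
    return urllib.request.Request(url, headers=headers)


req = build_request('https://example.com/', cookie='sessionid=abc')

# Request.add_header() stores names via str.capitalize(), so the keys are now
# 'User-agent', 'Referer' and 'Cookie'; get_header() needs that exact spelling.
print(req.get_header('User-agent'))  # → Mozilla/5.0 (Windows NT 10.0; Win64; x64)
print(req.get_header('User-Agent'))  # → None
```

Passing `req` to `urllib.request.urlopen(req)` then sends these headers instead of urllib's defaults.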
