Scraping Information from the Web with Python

SuperYang_

Last modified: 2023-04-26 23:26:19

First published: 2020-12-15 00:59:30

Article link: https://blog.csdn.net/SuperYang_/article/details/111185674

The webbrowser module: its open() function opens a given URL in a browser. That is about the only thing the webbrowser module can do.

import webbrowser
# Include the scheme so the argument is treated as a web URL
webbrowser.open("https://www.baidu.com")
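
A URL without a scheme (such as "www.baidu.com") may be treated as a local file path on some platforms. The sketch below shows one way to guard against that. The helper names with_scheme and open_in_browser are hypothetical, and defaulting to https is an assumption.

```python
import webbrowser
from urllib.parse import urlparse


def with_scheme(url, default_scheme="https"):
    """Return url unchanged if it already has a scheme, else prefix one."""
    if urlparse(url).scheme:
        return url
    return f"{default_scheme}://{url}"


def open_in_browser(url):
    """Open url in the default browser and report whether a browser was launched."""
    # webbrowser.open() returns True if a browser was successfully launched
    return webbrowser.open(with_scheme(url))
```

Calling open_in_browser("www.baidu.com") would then open https://www.baidu.com in the default browser.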

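The requests module:

1> It is not part of the Python standard library and has to be installed: pip install requests

2> requests was written because Python's urllib2 module (urllib.request in Python 3) is too complicated to use. When you need to download something from the web, just use requests.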

import requests
res = requests.get("https://jingyan.baidu.com/article/2a138328efdb44074a134fc5.html")
print(type(res))
print(res.status_code == requests.codes.ok)
print(len(res.text))
print(res.text[:250])
=======================================================================================
result:
<class 'requests.models.Response'>
True
160814
<!DOCTYPE html><html><!--STATUS OK--><head><meta http-equiv="X-UA-Compatible" content="IE=Edge" /><meta charset="utf-8" /><meta name="referrer"

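Checking for errors: call the raise_for_status() method on the Response object. If the download failed, it raises an exception; if the download succeeded, it does nothing.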

import requests
res = requests.get("http://inventwithpython.com/page_that_does_not_exist")
# Raises requests.exceptions.HTTPError if the server answers with a 4xx or 5xx status
res.raise_for_status()
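
Left uncaught, the exception from raise_for_status() stops the program. A minimal sketch of handling it follows. The helper names check_response and fetch and the 10-second timeout are assumptions for illustration, not part of requests.

```python
import requests


def check_response(res):
    """Return True if res reports success, False if raise_for_status() raised."""
    try:
        res.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        print(f"There was a problem: {exc}")
        return False
    return True


def fetch(url):
    """Download url and return the Response, or None if anything went wrong."""
    try:
        res = requests.get(url, timeout=10)
        res.raise_for_status()
    except requests.exceptions.RequestException as exc:
        # RequestException is the base class of HTTPError, ConnectionError and Timeout
        print(f"Download failed: {exc}")
        return None
    return res
```

Catching RequestException in fetch also covers failures that happen before any status code exists, such as a host name that cannot be resolved.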

The BeautifulSoup module:

1> It is used to extract information from HTML pages

2> The module is named bs4; to use it, write import bs4

example.html:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <title>The Website Title</title>
</head>
<body>
<p>Download my <strong>Python</strong></p>
<p><span id="author">Super Yang</span></p>
</body>
</html>

main.py:

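import bs4

# Read the sample page; the with block closes the file when done
with open("example.html", encoding="utf-8") as exampleFile:
    # Name the parser explicitly so bs4 does not have to guess one
    exampleSoup = bs4.BeautifulSoup(exampleFile.read(), "html.parser")
elems = exampleSoup.select("#author")  # CSS selector: the element with id="author"
print(len(elems))
print(elems[0].getText())
print(str(elems[0]))
print(elems[0].attrs)
print(elems[0].get('id'))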
===================================================
result:
1
Super Yang
<span id="author">Super Yang</span>
{'id': 'author'}
author
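
select() accepts other CSS selectors besides an id. The sketch below tries a few of them on an inline copy of the sample page, so it runs without example.html. The HTML constant and the variable names are illustrative, and the built-in "html.parser" is assumed as the parser.

```python
import bs4

# Inline copy of the sample page used above
HTML = """<!DOCTYPE html>
<html lang="en">
<head><title>The Website Title</title></head>
<body>
<p>Download my <strong>Python</strong></p>
<p><span id="author">Super Yang</span></p>
</body>
</html>"""

soup = bs4.BeautifulSoup(HTML, "html.parser")

title_text = soup.select("title")[0].getText()       # select by tag name
paragraphs = soup.select("p")                        # every <p> element
author_spans = soup.select("p #author")              # id="author" nested inside a <p>
strong_text = soup.select("p > strong")[0].getText() # <strong> directly inside a <p>

print(title_text)         # The Website Title
print(len(paragraphs))    # 2
print(len(author_spans))  # 1
print(strong_text)        # Python
```

select() always returns a list, so a selector that matches nothing yields an empty list. It is worth checking the length before indexing with [0].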
