Python Web Scraping: Basic Usage of BeautifulSoup

蚂蚁快跑007

Category: Python web scraping. Tags: BeautifulSoup, python, web scraping

Article link: https://blog.csdn.net/yuheni/article/details/51155271

Basic Usage of BeautifulSoup

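# coding=utf-8

# urllib.request and urllib.error provide the Python 3 equivalents of urllib2
import urllib.request
import urllib.error
from bs4 import BeautifulSoup

url = "http://www.baidu.com"

try:
    request = urllib.request.Request(url, data=None)
    response = urllib.request.urlopen(request, timeout=2)
except urllib.error.HTTPError as e:
    # The server replied with an error status.
    # HTTPError is a subclass of URLError, so it has to be caught first.
    print(e.code)
    raise SystemExit(1)
except urllib.error.URLError as e:
    # The server could not be reached (DNS failure, refused connection, timeout)
    print(e.reason)
    raise SystemExit(1)
except Exception:
    print("Error")
    raise SystemExit(1)

# Reached only when the request succeeded, so response is always defined here
data = response.read()
soup = BeautifulSoup(data, "lxml")

# Print every direct child of each <div class="qrcode-text">
for tag in soup.find_all('div', class_="qrcode-text"):
    for item in tag.children:
        print(item)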

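The find_all('div', class_="qrcode-text") method
1. The first argument can be a name filter: a string such as 'a' or 'div', a list such as ['a', 'p'], a regular expression such as re.compile('^b'), or True, which matches every tag.
2. Keyword arguments filter on attributes, for example id="link2" or href=re.compile('baidu'). Because class is a reserved word in Python, the class attribute is written class_.
3. The text argument matches the strings inside tags rather than the tags themselves, for example text="baidu". Beautiful Soup 4.4.0 and later call this argument string; text is the older name.
4. These filters can be combined, as in the program above.
5. tag.children iterates over the direct children of tag. It returns an iterator rather than a list; tag.contents gives the same nodes as a list.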
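
The five points above can be tried offline with a minimal sketch like the one below. The HTML snippet, the names helper and the "html.parser" backend are assumptions made for this illustration and are not part of the original program. The sketch is written for Python 3 and uses string=, the newer spelling of text=.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration
HTML = """
<div class="qrcode-text"><b>Scan</b> the code <a id="link1" href="http://www.baidu.com/app">baidu</a></div>
<p>plain paragraph</p>
<a id="link2" href="http://example.com/">other</a>
"""

# "html.parser" ships with Python, so no lxml install is needed for this sketch
soup = BeautifulSoup(HTML, "html.parser")


def names(tags):
    """Illustrative helper: reduce a result list to its tag names."""
    return [t.name for t in tags]


# 1. Name filters: a string, a list, a regular expression, or True
by_string = soup.find_all("a")
by_list = soup.find_all(["a", "p"])
by_regex = soup.find_all(re.compile("^b"))  # tag names starting with "b"
everything = soup.find_all(True)            # every tag in the document

# 2. Attribute filters passed as keyword arguments
by_id = soup.find_all(id="link2")
by_href = soup.find_all(href=re.compile("baidu"))

# 3. String filter: returns the matching strings, not the tags around them
by_text = soup.find_all(string="baidu")

# 4. Name and attribute filters combined (class_ because class is reserved)
mixed = soup.find_all("div", class_="qrcode-text")

# 5. .children is an iterator over direct children; .contents is a list
div = mixed[0]
children = list(div.children)

print(names(by_list))      # ['a', 'p', 'a']
print(names(everything))   # ['div', 'b', 'a', 'p', 'a']
print(by_href[0]["id"])    # link1
for child in children:
    print(repr(child))
```

The results come back in document order, so a list filter such as ['a', 'p'] interleaves the two tag types rather than grouping them. When a result has to be indexed or measured with len(), tag.contents or list(tag.children) is the safer choice, because the iterator can be consumed only once.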
