Attributes and methods of the Python bs4 module

天黑前最后的余辉

Article link: https://blog.csdn.net/a961634066/article/details/118089932

A BeautifulSoup object represents the entire content of a document.

I. Tags can be accessed as dot attributes. Dot access returns only the first tag with the given name.

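# -*- coding: utf-8 -*-
# Python 3 version of the example (urllib2 and the print statement exist only in Python 2)

import urllib.request

import chardet
from bs4 import BeautifulSoup

# Named `req` so that it does not shadow the standard `re` module
req = urllib.request.Request('https://www.baidu.com')

response = urllib.request.urlopen(req)

print("Response type: %s" % type(response))

page = response.read()  # raw bytes of the response body
print("Detected encoding: %s" % chardet.detect(page))
print(page.decode('utf-8'))

soup = BeautifulSoup(page, features="html.parser")
ht = soup.body  # dot access: the first <body> tag
print("get_text:", ht.get_text())
print("string:", ht.string)
print("name:", ht.name)
print("text:", ht.text)
print("contents:", ht.contents)
print("attrs:", ht.attrs)
# Walk every node below the root, children and deeper levels alike
for item in soup.descendants:
    print(item.name)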
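The "first tag only" behaviour of dot access is easier to see on a small fixed document than on a live page. The sketch below is illustrative only: the HTML string is made up for the demonstration and is not part of the original example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used here instead of a live page
html = """
<html><head><title>Demo</title></head>
<body>
<p class="intro">first paragraph</p>
<p>second paragraph</p>
</body></html>
"""

soup = BeautifulSoup(html, features="html.parser")

first_p = soup.p                # only the first <p>, the second one is ignored
title_text = soup.title.string  # dot access can be chained, e.g. soup.head.title
missing = soup.table            # there is no <table>, so the result is None

print(first_p)
print(title_text)
print(missing)
```

To reach the second `<p>`, dot access is not enough; a search method such as find_all() is needed.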

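Attributes:

1. Get the content of a tag

get_text(), text, string. string returns None when the tag is empty, and also when the tag has more than one child node; get_text() and text return an empty string for an empty tag.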

2. Get the tag name

name

3. Get all attributes as a dictionary / get a single attribute value

attrs, ht["class"]

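4. Returns a list of the direct child nodes, each of which can be traversed further

contents

5. Returns a generator over all descendant nodes (children, grandchildren, and so on)

descendants

6. CSS selector; returns a list of all matching tags

select()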
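A minimal sketch that exercises each of the attributes listed above on one small document. The HTML string and the variable names are assumptions made for illustration. Note that select() relies on the soupsieve package, which current bs4 releases install as a dependency.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample markup for the demonstration
html = '<div id="box" class="card big"><p>one</p><p>two <b>bold</b></p><span></span></div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.div

# 1. Content: get_text()/text join all nested text; string needs exactly one child
all_text = div.get_text()
div_string = div.string            # None: <div> has several children
p_string = soup.p.string           # the single text node of the first <p>
empty_string = soup.span.string    # None: <span> is empty
empty_text = soup.span.get_text()  # empty string, not None

# 2. Tag name
tag_name = div.name

# 3. Attributes: class is multi-valued, so it comes back as a list
attrs = div.attrs
box_id = div["id"]
classes = div["class"]
no_title = div.get("title")        # get() returns None instead of raising KeyError

# 4. contents: direct children only
child_names = [child.name for child in div.contents]

# 5. descendants: every nested node, text nodes included
all_nodes = list(div.descendants)
descendant_tags = [node.name for node in all_nodes if isinstance(node, Tag)]

# 6. select(): CSS selectors, always returns a list
paragraphs = [p.get_text() for p in soup.select("div.card > p")]
bold = soup.select("#box b")

print(all_text, div_string, p_string, child_names, descendant_tags, paragraphs)
```

The isinstance check in step 5 is one way to separate tags from text nodes; it is a choice made for this sketch, not something the original post prescribes.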

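II. The document tree can be searched with search methods

Methods: the two most commonly used are find() and find_all(). findAll() is the older name for find_all() and behaves identically.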

find_all() returns every matching tag as a list:
find_all(self, name=None, attrs={}, recursive=True, text=None, limit=None, **kwargs)

find() returns only the first matching tag, or None when nothing matches:

find(self, name=None, attrs={}, recursive=True, text=None, **kwargs)
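The following sketch shows how the name, attrs, recursive, limit, and keyword parameters of these two methods behave. The menu markup is a made-up sample, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for the demonstration
html = """
<ul id="menu">
  <li class="item"><a href="/a">A</a></li>
  <li class="item active"><a href="/b">B</a></li>
  <li><a href="/c">C</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

all_links = soup.find_all("a")   # every <a> in the document
first_link = soup.find("a")      # only the first <a>
nothing = soup.find("table")     # no match: find() returns None
empty = soup.find_all("table")   # no match: find_all() returns an empty list

# attrs filters on attribute values; a class matches if any one of the tag's classes matches
items = soup.find_all("li", attrs={"class": "item"})
# class is a Python keyword, so the keyword-argument form is spelled class_
active = soup.find_all("li", class_="active")

# limit stops the search after the given number of results
first_two = soup.find_all("a", limit=2)

# recursive=False looks at direct children only; the <a> tags sit one level deeper
direct_links = soup.ul.find_all("a", recursive=False)
direct_items = soup.ul.find_all("li", recursive=False)

print([a["href"] for a in all_links])
print(first_link["href"], nothing, len(items), len(active))
```

Because find() can return None, checking the result before accessing attributes on it avoids an AttributeError when a page lacks the expected tag.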
