Extracting the href of <a> tags under <li> tags

Latest recommended article published 2022-12-28 15:00:25

scan724

Category: Python web scraping

Article link: https://blog.csdn.net/zhaoyangjian724/article/details/83302667

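This post shows how to parse an HTML document with Python's lxml library and use an XPath expression to extract the href attribute of <a> tags, such as those under <li> tags, for use in web scraping.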


#!/usr/bin/env python
# -*- coding: utf-8 -*-
from lxml import etree

# Parse the file into an element tree.
# etree.parse uses the XML parser by default, so test02.html must be well-formed.
htmlEmt = etree.parse('test02.html')
# Select the href attribute of every <a> element in the document.
# To match only <a> elements directly under <li>, use '//li/a/@href' instead.
result = htmlEmt.xpath('//a/@href')
print(result)
print(type(result))
for x in result:
    print(x)

	
C:\Python27\python.exe C:/Users/TLCB/PycharmProjects/untitled/xpath/l1.py
['aaa', 'bbb']
<type 'list'>
aaa
bbb

Process finished with exit code 0
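
The contents of test02.html are not shown, so the following is only a minimal sketch of the same technique on an inline document. The `SAMPLE_HTML` string and its link values are hypothetical stand-ins, and `etree.HTML` is used here in place of `etree.parse` so that the lenient HTML parser handles the markup. It compares the `//a/@href` expression from the script above with `//li/a/@href`, which keeps only links that sit directly under an `<li>`.

```python
from lxml import etree

# Hypothetical stand-in for test02.html; the original file is not shown.
SAMPLE_HTML = """
<html><body>
  <ul>
    <li><a href="aaa">first</a></li>
    <li><a href="bbb">second</a></li>
  </ul>
  <p><a href="ccc">outside the list</a></p>
</body></html>
"""

# etree.HTML parses a string with lxml's HTML parser and returns the root element.
root = etree.HTML(SAMPLE_HTML)

# Every href in the document, regardless of where the <a> sits.
all_hrefs = root.xpath('//a/@href')
# Only hrefs of <a> elements that are direct children of an <li>.
li_hrefs = root.xpath('//li/a/@href')

print(all_hrefs)  # ['aaa', 'bbb', 'ccc']
print(li_hrefs)   # ['aaa', 'bbb']
```

If the links may be nested deeper inside the list item (for example wrapped in a `<span>`), `//li//a/@href` matches them at any depth under the `<li>`.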
