Requesting and parsing HTML with Python

Latest recommended article published 2024-06-08 11:12:05

weixin_39867327

Tags: Python request HTML

I'm not a coder, but I need to implement a simple HTML parser.

After some research, I was able to put together the following from an example:

from lxml import html
import requests

page = requests.get('https://URL.COM')
tree = html.fromstring(page.content)

# This will create a list of buyers:
buyers = tree.xpath('//div[@title="buyer-name"]/text()')

# This will create a list of prices:
prices = tree.xpath('//span[@class="item-price"]/text()')

print('Buyers:', buyers)
print('Prices:', prices)

How can I use tree.xpath to find all words that start with "://" and end with ".com.br"?

Solution

As @nosklo pointed out, you are looking for anchor (a) elements and the links in their href attributes. A parse tree is organized by the HTML elements themselves, so you find text by querying those elements specifically. For URLs, that looks like this (using the lxml library in Python 3.6):

from lxml import etree
from io import StringIO
import requests

# Set an explicit HTMLParser
parser = etree.HTMLParser()

page = requests.get('https://URL.COM')

# Decode the page content from bytes to a string
html = page.content.decode("utf-8")

# Create the etree from a StringIO object, which behaves like a file handle
tree = etree.parse(StringIO(html), parser=parser)

# Call this function and pass in your tree
def get_links(tree):
    # This will get the anchor tags
    refs = tree.xpath("//a")
    # Get the URL from each anchor (empty string if it has no href)
    links = [link.get('href', '') for link in refs]
    # Return only the links that end with .com.br
    return [l for l in links if l.endswith('.com.br')]

# Example call
links = get_links(tree)
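
One caveat with the filter above: it tests the whole href, so a link such as http://www.example.com.br/produtos would be skipped because the string ends with the path, not with ".com.br". A minimal sketch of one possible variation follows, assuming you want to match on the host name instead. The SAMPLE markup, the example.com.br addresses, and the get_com_br_links name are all made up for illustration and are not part of the original answer.

```python
from urllib.parse import urlparse

from lxml import html

# Hypothetical sample markup standing in for a downloaded page
SAMPLE = """
<html><body>
  <a href="https://loja.example.com.br">store</a>
  <a href="http://www.example.com.br/produtos?id=1">product</a>
  <a href="https://example.com/about">about</a>
  <a>no href</a>
</body></html>
"""


def get_com_br_links(tree):
    # //a/@href returns the href values directly and skips anchors without one
    hrefs = tree.xpath("//a/@href")
    # Compare the host name rather than the whole URL, so a trailing
    # path or query string does not hide a .com.br address
    return [h for h in hrefs if (urlparse(h).hostname or "").endswith(".com.br")]


tree = html.fromstring(SAMPLE)
links = get_com_br_links(tree)
print(links)
```

With a real page, the tree built from requests.get(...) in the answer above can be passed to the same function. Relative links have no host name, so they are dropped by this filter.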
