Reading HTML with Python: how do I fetch an HTML file using Python?

何处话死两个

I am not very familiar with Python. I am trying to extract the artist names (for a start :)) from the following page: http://www.infolanka.com/miyuru_gee/art/art.html.

How do I retrieve the page? My two main concerns are: which functions to use, and how to filter out useless links from the page.

Solution

Example using urllib and lxml.html:

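# Python 3: urlopen lives in urllib.request (Python 2 used urllib.urlopen)
from urllib.request import urlopen

from lxml import html

url = "http://www.infolanka.com/miyuru_gee/art/art.html"

# Download the page and parse it into an element tree
page = html.fromstring(urlopen(url).read())

# Visit every <a> element and print its text and its href attribute
for link in page.xpath("//a"):
    print("Name", link.text, "URL", link.get("href"))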

Output, shown as (name, href) pairs:

[('Aathma Liyanage', 'athma.html'),
 ('Abewardhana Balasuriya', 'abewardhana.html'),
 ('Aelian Thilakeratne', 'aelian_thi.html'),
 ('Ahamed Mohideen', 'ahamed.html'),
]
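
The question also asks how to filter out useless links, which the loop above does not do. The following is a minimal sketch of one way to approach it, not a confirmed description of the page's layout. It assumes that artist pages are relative links to .html files in the same directory as art.html and that each has visible link text. The function name filter_artist_links and the last three sample entries are hypothetical and exist only for illustration.

```python
from urllib.parse import urljoin, urlparse

# Assumed base URL: the index page named in the question.
BASE_URL = "http://www.infolanka.com/miyuru_gee/art/art.html"


def filter_artist_links(pairs, base_url=BASE_URL):
    """Keep (name, href) pairs that look like artist pages.

    The rules are guesses about the page layout, not facts about it:
    a link is kept if it has visible text and its href is a relative
    path to an .html file in the same directory as the index page.
    """
    kept = []
    for name, href in pairs:
        if not name or not name.strip():
            continue  # anchors without text (e.g. image links)
        if not href:
            continue  # anchors without an href attribute
        parts = urlparse(href)
        if parts.scheme or parts.netloc:
            continue  # absolute links (http:, mailto:, ...) leave the index
        if "/" in parts.path or not parts.path.endswith(".html"):
            continue  # other directories and non-HTML targets
        # Resolve the relative href against the index page URL
        kept.append((name.strip(), urljoin(base_url, href)))
    return kept


sample = [
    ("Aathma Liyanage", "athma.html"),
    ("Ahamed Mohideen", "ahamed.html"),
    (None, "logo.html"),                         # hypothetical: anchor with no text
    ("Home", "../index.html"),                   # hypothetical: navigation link
    ("Elsewhere", "http://example.com/x.html"),  # hypothetical: external link
]

for name, url in filter_artist_links(sample):
    print(name, url)
```

With lxml, the input pairs can be built from the parsed page as [(a.text, a.get("href")) for a in page.xpath("//a")]. The filter rules will likely need adjusting after inspecting the real page.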
