How Python parses table tags: how to parse an HTML file that contains a table using Python

I have an HTML file containing a table (it is a large one, so only a sample is given). I want to retrieve the values in the table. I tried the HTMLParser library from Python.

I started coding as shown below. Then I found that the attribute name "class" is the same as a reserved Python keyword, so it gives me an error.

    class MyHTMLParser(HTMLParser):
        def handle_starttag(self, tag, attrs):
            if tag == 'tr':
                for class in attrs:
                    if class == 'Table_row'

    p = MyHTMLParser()
    p.feed(ht)

HTML code for table

STATION CODE | STATION NAME | SCHEDULED ARRIVAL | SCHEDULED DEPARTURE | ACTUAL/ EXPECTED ARRIVAL | ACTUAL/ EXPECTED DEPARTURE
TVC | ORIGON | Starting Station | 05:00, 07 May 2011 | Starting Station | 05:00, 07 May 2011
TVP | NEY YORK | 05:04, 07 May 2011 | 05:05, 07 May 2011 | 05:04, 07 May 2011 | 05:05, 07 May 2011

UPDATE

How could I get data between the tags?

Solution

Note that the documentation of the handle_starttag method states:

The tag argument is the name of the tag converted to lower case. The attrs argument is a list of (name, value) pairs containing the attributes found inside the tag's <> brackets.
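To make the shape of those arguments concrete, here is a minimal sketch written for Python 3, where the module is named html.parser. The class name AttrDump and the one-line input are made up for illustration; they are not from the question.

```python
from html.parser import HTMLParser


class AttrDump(HTMLParser):
    """Record (tag, attrs) exactly as handle_starttag receives them."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


dump = AttrDump()
# Hypothetical input; note the upper-case tag name in the markup.
dump.feed('<TR class="Table_row" id="r1">')
dump.close()

tag, attrs = dump.seen[0]
print(tag)                       # tr
print(attrs)                     # [('class', 'Table_row'), ('id', 'r1')]
print(dict(attrs).get('class'))  # Table_row
```

Because attrs is a list of pairs rather than a list of names, comparing each element directly with a string such as 'Table_row' never matches; unpacking the pairs or converting them with dict(attrs) is one way to reach the value.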

So, you're probably looking for something like:

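    from html.parser import HTMLParser  # Python 3; in Python 2 the module is HTMLParser

    class MyHTMLParser(HTMLParser):
        def handle_starttag(self, tag, attrs):
            # attrs is a list of (name, value) pairs, so unpack each pair
            if tag == 'tr':
                for name, value in attrs:
                    if name == 'class':
                        print('Found class', value)

    p = MyHTMLParser()
    p.feed(ht)  # ht is the string holding the HTML from the question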

Prints:

    Found class Table_Heading
    Found class Table_row
    Found class alternat_table_row
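The UPDATE in the question asks how to get the data between the tags. One possible approach, sketched here for Python 3 (module html.parser), is to also override handle_data and handle_endtag and collect the text of each cell. The class name TableParser and the sample markup are assumptions for illustration: the asker's original HTML is not available, so only the class names and cell values shown in the question are reused.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Called with the text between tags; keep it only inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical sample markup, not the asker's original file.
sample = """
<table>
  <tr class="Table_Heading"><th>STATION CODE</th><th>STATION NAME</th></tr>
  <tr class="Table_row"><td>TVC</td><td>ORIGON</td></tr>
  <tr class="alternat_table_row"><td>TVP</td><td>NEY YORK</td></tr>
</table>
"""

parser = TableParser()
parser.feed(sample)
parser.close()
for row in parser.rows:
    print(row)
```

Each entry in parser.rows is a list of cell strings for one tr element, so the header row and the data rows can be separated afterwards. This sketch does not handle nested tables.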

P.S. I also recommend BeautifulSoup for parsing HTML with Python.
