I have an HTML file containing a table (it is a large one, so only sample code is given). I want to retrieve the values in the table. I tried the HTMLParser library from Python.
I started coding as shown below. Then I found that the attribute name "class" is the same as a reserved Python keyword, so it gives me an error.
class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            for class in attrs:
                if class == 'Table_row'

p = MyHTMLParser()
p.feed(ht)
HTML code for table
STATION CODE | STATION NAME | SCHEDULED ARRIVAL | SCHEDULED DEPARTURE | ACTUAL/ EXPECTED ARRIVAL | ACTUAL/ EXPECTED DEPARTURE |
TVC | ORIGON | Starting Station | 05:00, 07 May 2011 | Starting Station | 05:00, 07 May 2011 |
TVP | NEY YORK | 05:04, 07 May 2011 | 05:05, 07 May 2011 | 05:04, 07 May 2011 | 05:05, 07 May 2011 |
UPDATE
How can I get the data between the tags?
Solution
Note that the documentation of the handle_starttag method states:
"The tag argument is the name of the tag converted to lower case. The attrs argument is a list of (name, value) pairs containing the attributes found inside the tag’s <> brackets."
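To make that concrete, here is a small sketch that records what handle_starttag receives. It is written for Python 3, where the module is html.parser (an assumption on my part; the code in this thread uses the Python 2 module name). The AttrDump class and the one-line sample markup are made up for illustration.

```python
# Python 3 module name; in Python 2 this is `from HTMLParser import HTMLParser`.
from html.parser import HTMLParser


class AttrDump(HTMLParser):
    """Hypothetical helper: records (tag, attrs) for every start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples, not a list of bare names.
        self.seen.append((tag, attrs))


dump = AttrDump()
# Made-up sample markup; note the upper-case tag name in the input.
dump.feed('<TR class="Table_row" id="r1"><td>TVC</td></TR>')
print(dump.seen)
```

Each element of attrs is a pair such as ('class', 'Table_row'), so the loop variable should be unpacked into a name and a value rather than compared directly against the class name.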
So, you're probably looking for something like:
from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            for name, value in attrs:
                if name == 'class':
                    print 'Found class', value

p = MyHTMLParser()
p.feed(ht)
Prints:
Found class Table_Heading
Found class Table_row
Found class alternat_table_row
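As for the update (getting the data between the tags), one possible approach is to also override handle_data and handle_endtag and collect the text while the parser is inside a cell. The following is only a sketch, written for Python 3 (html.parser). The TableTextParser class, its attribute names, and the sample markup are assumptions for illustration, since the real HTML is not shown here; the class names and cell values are borrowed from the question and the output above.

```python
from html.parser import HTMLParser  # Python 2: from HTMLParser import HTMLParser


class TableTextParser(HTMLParser):
    """Hypothetical parser that collects the text of each table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the current row, or None outside <tr>
        self._cell = None  # text pieces of the current cell, or None outside <td>/<th>

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Called with the text between tags; keep it only while inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None and self._row is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Made-up, shortened markup standing in for the real table.
sample = """
<table>
<tr class="Table_Heading"><th>STATION CODE</th><th>STATION NAME</th></tr>
<tr class="Table_row"><td>TVC</td><td>ORIGON</td></tr>
<tr class="alternat_table_row"><td>TVP</td><td>NEY YORK</td></tr>
</table>
"""

parser = TableTextParser()
parser.feed(sample)
parser.close()
for row in parser.rows:
    print(row)
```

Keeping the state in two attributes (current row, current cell) means text outside the cells, such as the newlines between rows, is ignored. Real-world pages with nested tables or unclosed cells would probably need more care than this.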
P.S. I also recommend BeautifulSoup for parsing HTML with Python.