<tbody>
<tr class="odd">
<td>5</td>
<td>20</td>
<td>
<a href="/film/6063" title="功夫瑜伽">功夫瑜伽</a>
</td>
<td>
<img src="http://img.58921.com/sites/all/movie/files/protec/00aaa311611439d303e5971a8bc659fa.png" alt="" />
</td>
<td>--</td>
<td>--</td>
<td>2017</td>
<td>
<a href="/boxoffice/history/6063" title="数据纠错">数据纠错</a>
</td>
</tr>
<tr class="even">
<td>4</td>
<td>21</td>
<td>
<a href="/film/8651" title="飞驰人生">飞驰人生</a>
</td>
<td>
<img src="http://img.58921.com/sites/all/movie/files/protec/b301c4763c6ca030fa346e500b9f9cc6.png" alt="" />
</td>
<td>--</td>
<td>--</td>
<td>2019</td>
<td>
<a href="/boxoffice/history/8651" title="数据纠错">数据纠错</a>
</td>
</tr>
<tr class="odd">
<td>9</td>
<td>22</td>
<td>
<a href="/film/6865" title="侏罗纪世界2">侏罗纪世界2</a>
</td>
<td>
<img src="http://img.58921.com/sites/all/movie/files/protec/6a6ee871ec8371067a77b9a2f34bd400.png" alt="" />
</td>
<td>--</td>
<td>--</td>
<td>2018</td>
<td>
<a href="/boxoffice/history/6865" title="数据纠错">数据纠错</a>
</td>
</tr>
</tbody>
import xml.dom.minidom

# Parse the test file; its root element is <tbody>.
dom = xml.dom.minidom.parse('d:/test.xml')
root = dom.documentElement
movies = root.getElementsByTagName("tr")

for movie in movies:
    # Each <tr> carries only a class attribute ("odd"/"even"), not a title.
    if movie.hasAttribute("class"):
        print(movie.getAttribute("class"))
    # Collect the row's cells once instead of searching again per field.
    cells = movie.getElementsByTagName('td')
    rank = cells[0]
    history = cells[1]
    title = cells[2].getElementsByTagName('a')[0]
    year = cells[6]
    print("Title: " + title.childNodes[0].data)
    print("Yearly rank: " + rank.childNodes[0].data)
    print("All-time rank: " + history.childNodes[0].data)
    print("Release year: " + year.childNodes[0].data)
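I ran into two problems while debugging:
1. 'NodeList' object has no attribute 'getElementsByTagName'
This error appears when getElementsByTagName is called directly on movies, because movies is a NodeList rather than a single element. Iterate over the items in movies and call the method on each one instead.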
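2. xml.parsers.expat.ExpatError: junk after document element:
The XML file used for testing was copied over by hand and the root node was left out, i.e. there was no <tbody></tbody> wrapper, so the parser found several top-level <tr> elements. Adding the root element fixed it.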
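Both errors can be reproduced in a few lines. The following is a minimal sketch, not the original test setup: it assumes a shortened two-row version of the sample data held in the hypothetical strings WITH_ROOT and WITHOUT_ROOT, and uses parseString so that no file on disk is needed.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# Hypothetical miniature of the sample data, kept inline so the sketch is self-contained.
WITH_ROOT = (
    "<tbody>"
    "<tr class='odd'><td>5</td><td>20</td></tr>"
    "<tr class='even'><td>4</td><td>21</td></tr>"
    "</tbody>"
)
# The same rows without the <tbody> wrapper: two top-level elements.
WITHOUT_ROOT = (
    "<tr class='odd'><td>5</td><td>20</td></tr>"
    "<tr class='even'><td>4</td><td>21</td></tr>"
)

rows = parseString(WITH_ROOT).documentElement.getElementsByTagName("tr")

# Problem 1: rows is a NodeList, which has no getElementsByTagName method.
try:
    rows.getElementsByTagName("td")
    nodelist_error = None
except AttributeError as exc:
    nodelist_error = str(exc)

# Iterating and calling the method on each element works.
ranks = [row.getElementsByTagName("td")[0].childNodes[0].data for row in rows]

# Problem 2: without a single root element, expat rejects the second <tr>.
try:
    parseString(WITHOUT_ROOT)
    root_error = None
except ExpatError as exc:
    root_error = str(exc)

print(nodelist_error)
print(ranks)
print(root_error)
```

Catching the exceptions here is only for demonstration; in the real script the fix is simply to loop over the rows and to keep the <tbody> root in the file.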