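Background: for a project, some test logs are produced as HTML files. They need to be parsed, the relevant data extracted, and the results put into Excel for easier analysis.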
The rendered HTML page looks like the screenshot below (partial).
Opened in Notepad++, the HTML source (excerpt) looks like this:
<table cellspacing="0"><tr><td><UL>
<b>Additional Test Run Data</b><UL CLASS="tob"><LI CLASS="close" ONMOUSEOVER="over(event);" ONMOUSEOUT="out(event);" ONCLICK="showIt(event);">
<span>Test Time...</span><UL>
<LI>Start Time: 17:35:07</LI>
<LI>Stop Time: 17:37:52</LI>
<LI>Duration: 00:02:45 174ms</LI>
</UL>
</LI></UL>
<UL CLASS="tob"><LI CLASS="close" ONMOUSEOVER="over(event);" ONMOUSEOUT="out(event);" ONCLICK="showIt(event);">
<span>DUT Information...</span><UL>
<LI>DUT ID: 48F3F32D8AA6</LI>
<LI>Name: DuPods Pro-73F</LI>
</UL>
</LI></UL>
<UL CLASS="tob"><LI CLASS="close" ONMOUSEOVER="over(event);" ONMOUSEOUT="out(event);" ONCLICK="showIt(event);">
<span>Script Information...</span><UL>
<LI>Name: script#3</LI>
<LI>File Path: C:/itc/Bluetooth/paraCfg/lastPara.dat</LI>
</UL>
</LI></UL>
<UL CLASS="tob"><LI CLASS="close" ONMOUSEOVER="over(event);" ONMOUSEOUT="out(event);" ONCLICK="showIt(event);">
<span>Test Station Information...</span><UL>
<LI>Name: Bluetooth Test Set</LI>
<LI>Model: ITC-RT550</LI>
<LI>SN: 186351</LI>
</UL>
</LI></UL>
</UL></td></tr></table>
<table border="1" width="80%" cellspacing="0" style="table-layout:fixed;">
<tr>
<td bgcolor="#FFFFCC" align="center"><b>Output Power</b></td>
<td bgcolor="#FFFFCC" align="center"><b>Limits</b></td>
<td bgcolor="#6699CC" align="center"><b> Summary </b></td>
</tr>
<tr>
<td align="center">Avg Max Txp</td>
<td align="center">(-6 dBm, 20 dBm)</td>
<td align="center">10.74 dBm</td>
</tr>
<tr>
<td align="center">Avg Min Txp</td>
<td align="center">(-6 dBm, 20 dBm)</td>
<td align="center">7.3 dBm</td>
</tr>
<tr>
<td align="center">Avg Txp</td>
<td align="center">(-6 dBm, 20 dBm)</td>
<td align="center">9.07 dBm</td>
</tr>
<tr>
<td align="center">Peak Txp</td>
<td align="center">< 23 dBm</td>
<td align="center">11.09 dBm</td>
</tr>
<tr>
<td align="center">Result</td>
<td align="center"> --/-- </td>
<td align="center">Pass</td>
</tr>
</table>
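As shown above, the data in this HTML is laid out with table, tr, and td elements. BeautifulSoup's find_all method can therefore collect every table, every row (tr) in each table, and every cell (td) in each row: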
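from bs4 import BeautifulSoup

# filepath: path to the HTML log file
with open(filepath, 'r') as htmlfile:
    htmlhandle = htmlfile.read()

soup = BeautifulSoup(htmlhandle, "lxml")  # the "lxml" parser requires the lxml package
# Walk every table, every row in it, and every cell in that row
for table in soup.find_all('table'):
    for tr in table.find_all('tr'):
        for td in tr.find_all('td'):
            print(td.text)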
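Printing every cell mixes the two tables together. The whole metadata block comes out as one text blob, and the header row (Output Power / Limits / Summary) is printed as if it were data. Below is a minimal sketch of a more targeted pass. It assumes the result table can be recognised by the text of its first cell: it picks that table, reads the first row as column names, and turns each remaining row into a dict. `find_table_by_header` and `table_to_records` are hypothetical helper names. The sample is a trimmed version of the log above, with `<` written as `&lt;`. It uses Python's built-in "html.parser", so lxml is not required.

```python
from bs4 import BeautifulSoup

# Trimmed version of the log: a metadata table followed by the "Output Power" table.
SAMPLE = """
<table><tr><td><UL><LI>Start Time: 17:35:07</LI></UL></td></tr></table>
<table border="1">
<tr><td><b>Output Power</b></td><td><b>Limits</b></td><td><b> Summary </b></td></tr>
<tr><td>Avg Max Txp</td><td>(-6 dBm, 20 dBm)</td><td>10.74 dBm</td></tr>
<tr><td>Peak Txp</td><td>&lt; 23 dBm</td><td>11.09 dBm</td></tr>
<tr><td>Result</td><td> --/-- </td><td>Pass</td></tr>
</table>
"""


def find_table_by_header(soup, first_header):
    """Hypothetical helper: return the first <table> whose first cell text equals first_header."""
    for table in soup.find_all('table'):
        first_td = table.find('td')
        if first_td is not None and first_td.get_text(strip=True) == first_header:
            return table
    return None


def table_to_records(table):
    """Hypothetical helper: use the first row as column names, turn later rows into dicts."""
    rows = table.find_all('tr')
    header = [td.get_text(strip=True) for td in rows[0].find_all('td')]
    records = []
    for tr in rows[1:]:
        cells = [td.get_text(strip=True) for td in tr.find_all('td')]
        records.append(dict(zip(header, cells)))
    return records


soup = BeautifulSoup(SAMPLE, "html.parser")
power_table = find_table_by_header(soup, "Output Power")
records = table_to_records(power_table)
for record in records:
    print(record)
```

get_text(strip=True) trims the padding in cells such as " Summary " and " --/-- ". Each record therefore maps cleanly onto one spreadsheet row.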
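The first table stores its data as "Key: Value" list items grouped under section titles. Several sections reuse the same key: Name appears under DUT, Script, and Test Station information. For that reason, the section title has to be kept alongside each key. The sketch below assumes every section is an `<LI CLASS="close">` holding a `<span>` title and an inner `<UL>`, as in the excerpt above. It writes CSV with the standard `csv` module as a stand-in for the Excel step, since Excel can open CSV files directly. `parse_sections` and `to_csv_rows` are hypothetical names.

```python
import csv
import io

from bs4 import BeautifulSoup

# Trimmed version of the "Additional Test Run Data" block from the log above.
SAMPLE = """
<table cellspacing="0"><tr><td><UL>
<b>Additional Test Run Data</b>
<UL CLASS="tob"><LI CLASS="close"><span>Test Time...</span><UL>
<LI>Start Time: 17:35:07</LI>
<LI>Duration: 00:02:45 174ms</LI>
</UL></LI></UL>
<UL CLASS="tob"><LI CLASS="close"><span>DUT Information...</span><UL>
<LI>DUT ID: 48F3F32D8AA6</LI>
<LI>Name: DuPods Pro-73F</LI>
</UL></LI></UL>
</UL></td></tr></table>
"""


def parse_sections(soup):
    """Map each section title (the <span> text) to a dict of its "Key: Value" items."""
    sections = {}
    for section in soup.find_all('li', class_='close'):
        title = section.find('span').get_text(strip=True).rstrip('.')
        items = {}
        for li in section.find('ul').find_all('li'):
            # partition splits on the first colon only, so "17:35:07" stays whole
            key, sep, value = li.get_text(strip=True).partition(':')
            if sep:
                items[key.strip()] = value.strip()
        sections[title] = items
    return sections


def to_csv_rows(sections):
    """Flatten the sections into (Section, Field, Value) rows, header row first."""
    rows = [("Section", "Field", "Value")]
    for title, items in sections.items():
        for key, value in items.items():
            rows.append((title, key, value))
    return rows


soup = BeautifulSoup(SAMPLE, "html.parser")
sections = parse_sections(soup)

# Written to an in-memory buffer here; for a real file use
# open("summary.csv", "w", newline="", encoding="utf-8") instead.
buf = io.StringIO()
csv.writer(buf).writerows(to_csv_rows(sections))
print(buf.getvalue())
```

One option for producing a real .xlsx file instead of CSV is pandas DataFrame.to_excel, which needs an Excel writer such as openpyxl installed.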