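If the table contains many special symbols, the conversion may not handle them correctly; adjust the script yourself as needed.
in.txt (the table HTML, copied from the browser's developer tools with "Copy element"):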
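A typical adjustment is escaping characters that have meaning inside a Markdown table cell. The helper below is a minimal sketch; the name `escape_cell` is hypothetical and not part of the original script. It collapses line breaks (which would end a table row early) into spaces and escapes `|` so it is not read as a column separator.

```python
import re


def escape_cell(text: str) -> str:
    """Make arbitrary text safe to put inside one Markdown table cell (sketch)."""
    # Collapse every run of whitespace, including newlines, into one space;
    # a raw newline would terminate the table row.
    text = re.sub(r"\s+", " ", text).strip()
    # Escape pipes so they are not treated as column separators.
    return text.replace("|", "\\|")


print(escape_cell("a|b\nc"))  # a\|b c
```

Other Markdown-sensitive characters (`*`, `_`, backticks) are left untouched here; whether to escape them depends on whether the source HTML already intends them as formatting.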
<table class="data-table"><tbody>
<tr>
<th>Name</th>
<th>Description</th>
<th>Type</th>
<th>Default</th>
<th>Valid Values</th>
<th>Importance</th>
</tr>
<tr>
<td>blacklist</td><td>Fields to exclude. This takes precedence over the whitelist.</td><td>list</td><td>""</td><td></td><td>medium</td></tr>
<tr>
<td>renames</td><td>Field rename mappings.</td><td>list</td><td>""</td><td>list of colon-delimited pairs, e.g. <code>foo:bar,abc:xyz</code></td><td>medium</td></tr>
<tr>
<td>whitelist</td><td>Fields to include. If specified, only these fields will be used.</td><td>list</td><td>""</td><td></td><td>medium</td></tr>
</tbody></table>
Python script:
# -*- coding:utf-8 -*-
# Requires: pip install beautifulsoup4 html5lib
from bs4 import BeautifulSoup

with open('in.txt', encoding='utf-8') as f:
    contents = f.read()

soup = BeautifulSoup(contents, 'html5lib')

with open('out.md', 'w', encoding='utf-8') as f_out:
    for idx, tr in enumerate(soup.find_all('tr')):
        if idx != 0:
            # Data row: one Markdown cell per <td>
            tds = tr.find_all('td')
            row_str = "|"
            for td in tds:
                td_content_list = []
                for content in td.contents:
                    # Force conversion to str (child tags keep their HTML markup)
                    str2 = str(content)
                    # Replace <code> and </code> with ```
                    str3 = str2.replace("<code>", "```").replace("</code>", "```")
                    td_content_list.append(str3)
                # Join the list into one string; escape "|" and flatten line breaks,
                # which would otherwise split the cell or end the row early
                td_content_str = ''.join(td_content_list).replace("|", "\\|").replace("\n", " ")
                row_str = row_str + " " + td_content_str + " |"
            f_out.write(row_str + "\n")
        else:
            # Header row: cell text plus the separator line (":-" = left-aligned)
            ths = tr.find_all('th')
            row_str = "|"
            row_str2 = "|"
            for th in ths:
                row_str = row_str + " " + th.text + " |"
                row_str2 = row_str2 + " :- |"
            f_out.write(row_str + "\n")
            f_out.write(row_str2 + "\n")
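If installing BeautifulSoup and html5lib is inconvenient, the same conversion can be sketched with only the standard library's `html.parser.HTMLParser`. This is a swapped-in technique, not the script above: the class and function names are hypothetical, it assumes well-formed `<tr>`/`<th>`/`<td>` markup with the header in the first row, and it emits single-backtick code spans instead of ```.

```python
from html.parser import HTMLParser


class TableToMarkdown(HTMLParser):
    """Collect the cells of every <tr>; <code> becomes a Markdown code span."""

    def __init__(self):
        super().__init__()
        self.rows = []     # each row is a list of cell strings
        self.parts = None  # pieces of the cell being read; None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self.parts = []
        elif tag == "code" and self.parts is not None:
            self.parts.append("`")

    def handle_endtag(self, tag):
        if tag == "code" and self.parts is not None:
            self.parts.append("`")
        elif tag in ("td", "th") and self.parts is not None:
            # Collapse whitespace (incl. newlines) and escape pipes for Markdown.
            cell = " ".join("".join(self.parts).split())
            self.rows[-1].append(cell.replace("|", "\\|"))
            self.parts = None

    def handle_data(self, data):
        # Text outside a cell (e.g. newlines between tags) is ignored.
        if self.parts is not None:
            self.parts.append(data)


def html_table_to_markdown(html: str) -> str:
    parser = TableToMarkdown()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows  # assumes the first row holds the headers
    lines = ["| " + " | ".join(header) + " |",
             "|" + " :- |" * len(header)]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines) + "\n"


sample = ('<table><tr><th>Name</th><th>Valid Values</th></tr>'
          '<tr><td>renames</td><td>e.g. <code>foo:bar,abc:xyz</code></td></tr></table>')
print(html_table_to_markdown(sample), end="")
```

To convert the file, pass it the contents of in.txt, e.g. `html_table_to_markdown(open('in.txt', encoding='utf-8').read())`. Character references such as `&amp;` are decoded by `HTMLParser` because `convert_charrefs` defaults to `True`.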
The converted result is written to out.md:
| Name | Description | Type | Default | Valid Values | Importance |
| :- | :- | :- | :- | :- | :- |
| blacklist | Fields to exclude. This takes precedence over the whitelist. | list | "" | | medium |
| renames | Field rename mappings. | list | "" | list of colon-delimited pairs, e.g. ```foo:bar,abc:xyz``` | medium |
| whitelist | Fields to include. If specified, only these fields will be used. | list | "" | | medium |
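When the input has unusual symbols, a quick way to spot a broken row in out.md is to compare each row's cell count with the header's. The sketch below assumes every row starts and ends with `|` (as the script's output does) and treats a backslash-escaped `\|` as part of a cell; `count_cells` and `bad_rows` are hypothetical helper names.

```python
import re

# A "|" not preceded by a backslash separates two cells.
CELL_SEPARATOR = re.compile(r"(?<!\\)\|")


def count_cells(line: str) -> int:
    # "| a | b |" splits into ["", " a ", " b ", ""]; drop the two empty ends.
    return len(CELL_SEPARATOR.split(line.strip())) - 2


def bad_rows(markdown: str) -> list:
    """Return 1-based line numbers whose cell count differs from the header row's."""
    lines = markdown.splitlines()
    expected = count_cells(lines[0])
    return [n for n, line in enumerate(lines, start=1)
            if line.strip() and count_cells(line) != expected]


demo = "| a | b |\n| :- | :- |\n| x\\|y | z |\n| p | q | r |\n"
print(bad_rows(demo))  # [4]
```

For the real file, run `bad_rows(open('out.md', encoding='utf-8').read())`; an empty list means every row has as many cells as the header.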