XML Processing
1. Formatting XML
import requests

xml_text = '<?xml version="1.0" encoding="ISO-8859-1"?><note><to>George</to><from>John</from><heading>Reminder</heading><body>Do not forget the meeting!</body></note>'
# Online XML formatter endpoint used in this example
url = "http://web.chacuo.net/formatxml"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36",
    "Host": "web.chacuo.net",
    "X-Requested-With": "XMLHttpRequest",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
}
# Form fields sent to the formatter
form_data = {"data": xml_text, "type": "format", "beforeSend": "undefined"}
resp = requests.post(url, data=form_data, headers=headers, timeout=20)
# The formatted XML is the first item of the "data" list in the JSON response
print(resp.json()['data'][0])
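If the online formatter is unreachable, the same kind of formatting can be done locally. The following is a minimal sketch that swaps in the standard library's xml.dom.minidom instead of the web service above; the helper name pretty_xml_local and the two-space indent are assumptions made for illustration, not part of the original approach.

```python
from xml.dom import minidom


def pretty_xml_local(text: str, indent: str = "  ") -> str:
    """Format an XML string locally (hypothetical helper, no network needed)."""
    # parseString builds a DOM tree from the raw XML text
    document = minidom.parseString(text)
    # toprettyxml puts each element on its own line with the given indent
    return document.toprettyxml(indent=indent)


xml_text = '<?xml version="1.0" encoding="ISO-8859-1"?><note><to>George</to><from>John</from><heading>Reminder</heading><body>Do not forget the meeting!</body></note>'
print(pretty_xml_local(xml_text))
```

Note that toprettyxml writes its own XML declaration, so the encoding attribute of the original declaration is not carried over unless an encoding argument is passed.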
2. Converting XML to a dictionary (import xmltodict)
- xmltodict.parse() converts an XML string into a dictionary
- xmltodict.unparse() converts a dictionary back into an XML string
import xmltodict
format_ed_xml = '<?xml version="1.0" encoding="ISO-8859-1"?><note><to>George</to><from>John</from><heading>Reminder</heading><body>Do not forget the meeting!</body></note>'
dict_xml = xmltodict.parse(format_ed_xml)
print(dict_xml)
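The reverse direction can be sketched the same way. The example below assumes the third-party xmltodict package is installed; the sample dictionary is made up for illustration. It converts a dictionary to XML with xmltodict.unparse() and then parses the result back to check that nothing was lost.

```python
import xmltodict

# Sample data for illustration; the single top-level key becomes the root element
note = {"note": {"to": "George", "from": "John", "heading": "Reminder"}}

# pretty=True adds line breaks and indentation to the generated XML
xml_string = xmltodict.unparse(note, pretty=True)
print(xml_string)

# Parsing the generated XML again should give back the same data
round_tripped = xmltodict.parse(xml_string)
print(round_tripped["note"]["to"])
```

The dictionary passed to unparse() needs exactly one top-level key, because an XML document has exactly one root element.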
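3. Fixing the error xml.parsers.expat.ExpatError: XML or text declaration not at start of entity
- Option 1: format the XML string as in step 1 first, then convert it to a dictionary.
- Option 2: if errors remain after formatting as in step 1, the cause is usually characters such as <, >, and & in the text content. The parser treats them as markup and then cannot find a matching pair. Replace these characters at the line and column given in the error message. (To locate them, save the formatted text as an .xml file and open it in Notepad++. When you edit and save, it reports which line is wrong, and the current line and column are shown in the bottom-right corner.)
- Option 3: open the XML file in Internet Explorer, copy the XML content from there, and format it again. This should resolve the problem. If errors remain, fix them as in option 2.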
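The options above can also be approximated in code. The sketch below uses only the standard library and rests on an assumption about the cause: the parser raises this error when something, typically whitespace or a byte order mark, comes before the <?xml ...?> declaration. The helper names strip_before_declaration and parses_cleanly are hypothetical. The last lines show xml.sax.saxutils.escape, which replaces <, >, and & in text content with entities.

```python
from xml.parsers import expat
from xml.sax.saxutils import escape


def strip_before_declaration(text: str) -> str:
    """Drop a byte order mark and whitespace in front of the XML declaration."""
    return text.lstrip("\ufeff \t\r\n")


def parses_cleanly(text: str) -> bool:
    """Return True if expat accepts the text as a complete XML document."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(text, True)
    except expat.ExpatError as err:
        print(f"parse failed: {err}")
        return False
    return True


# A leading newline and spaces before the declaration trigger the error
raw = '\n  <?xml version="1.0" encoding="ISO-8859-1"?><note><body>Hi</body></note>'
print(parses_cleanly(raw))
print(parses_cleanly(strip_before_declaration(raw)))

# Special characters in text content must be escaped before building the XML
body = escape("a < b & c")
print(parses_cleanly(f"<note><body>{body}</body></note>"))
```

Escaping should be applied to text values only, not to the whole document, otherwise the real tags would be escaped as well.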
4. Complete code
import requests
import xmltodict


def pretty_xml(text: str) -> str:
    """
    Format an unformatted XML string.
    :param text: the XML string to format
    :return: the formatted string
    """
    url = "http://web.chacuo.net/formatxml"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36",
        "Host": "web.chacuo.net",
        "X-Requested-With": "XMLHttpRequest",
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    }
    form_data = {"data": text, "type": "format", "beforeSend": "undefined"}
    resp = requests.post(url, data=form_data, headers=headers, timeout=20)
    # Decode the JSON response once and reuse the result
    formatted = resp.json()['data'][0]
    print(formatted)
    return formatted


def save_xml(pretty_xml_str: str):
    """Save the XML string to an .xml file."""
    with open("test.xml", "w", encoding="utf-8") as fp:
        fp.write(pretty_xml_str)


def xml_to_dict(format_ed_xml: str):
    """Convert the XML string to a dictionary and print the note body."""
    dict_xml = xmltodict.parse(format_ed_xml)
    print(f"\n>>>>{dict_xml['note']['body']}")


if __name__ == "__main__":
    xml_text = '<?xml version="1.0" encoding="ISO-8859-1"?><note><to>George</to><from>John</from><heading>Reminder</heading><body>Do not forget the meeting!</body></note>'
    format_xml = pretty_xml(xml_text)
    xml_to_dict(format_xml)
    save_xml(format_xml)