[Data Engineering] How to convert XML to CSV

Latest recommended article published on 2022-12-27 21:31:39

guaguastd

Category: Engineering Practice
Tags: data analysis, csv, xml

Article link: https://blog.csdn.net/guaguastd/article/details/107830118


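This article shows how to convert an XML file to CSV for data analysis tasks. The approach parses the XML structure and arranges its records as comma-separated values, which makes the data easy to load and process.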


import xml.etree.ElementTree as ELT

import pandas as pd
from bs4 import BeautifulSoup
from tqdm import tqdm

def parse_xml_to_csv(path, save_path=None):
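    """
    Open an xml posts dump, extract plain text from each post's HTML body,
    and optionally save the result as a csv file
    :param path: path to the xml document containing posts
    :param save_path: optional path of the csv file to write
    :return: a dataframe with one row per post and an added body_text column
    """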

    # Use python's standard library to parse xml file
    doc = ELT.parse(path)
    root = doc.getroot()

    # Each <row> element is a post; its XML attributes become the columns
    all_rows = [row.attrib for row in root.findall('row')]

    # Using tqdm to display progress since preprocessing takes time
    for item in tqdm(all_rows):
        # Strip the HTML markup from the post body, keeping only the text
        soup = BeautifulSoup(item['Body'], features='html.parser')
        item['body_text'] = soup.get_text()

    # Create dataframe from our list of dicts
    df = pd.DataFrame.from_dict(all_rows)
    if save_path:
        df.to_csv(save_path)
    return df
    
parse_xml_to_csv("MiniPosts.xml", "1.csv")
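
The rows in a posts dump do not all carry the same attributes (for example, only some have `AcceptedAnswerId`), so the CSV header has to cover the union of attribute names. The function above leaves that to pandas. As a rough illustration of the same step without pandas, here is a minimal sketch using only the standard library. The helper name `rows_to_csv` and the shortened sample data are assumptions made for this example and are not part of the original code.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened, made-up sample in the same shape as MiniPosts.xml
SAMPLE_XML = """<posts>
  <row Id="5" PostTypeId="1" Score="9" Title="First question" />
  <row Id="7" PostTypeId="1" AcceptedAnswerId="10" Score="4" Title="Second question" />
</posts>"""


def rows_to_csv(xml_text):
    """Hypothetical helper: turn the attributes of each <row> into CSV text."""
    root = ET.fromstring(xml_text)
    rows = [dict(row.attrib) for row in root.findall("row")]

    # Build the header from the union of attribute names, in first-seen order
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    # restval fills in an empty string where a row lacks an attribute
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(SAMPLE_XML)
print(csv_text)
```

Missing attributes come out as empty cells here, whereas the pandas version stores them as NaN in the dataframe; both write an empty field to the CSV file.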

'''

MiniPosts.xml

<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="5" PostTypeId="1" CreationDate="2014-05-13T23:58:30.457" Score="9" ViewCount="516" Body="&lt;p&gt;I've always been interested in machine learning, but I can't figure out one thing about starting out with a simple &quot;Hello World&quot; example - how can I avoid hard-coding behavior?&lt;/p&gt;&#xA;&#xA;&lt;p&gt;For example, if I wanted to &quot;teach&quot; a bot how to avoid randomly placed obstacles, I couldn't just use relative motion, because the obstacles move around, but I don't want to hard code, say, distance, because that ruins the whole point of machine learning.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;Obviously, randomly generating code would be impractical, so how could I do this?&lt;/p&gt;&#xA;" OwnerUserId="5" LastActivityDate="2014-05-14T00:36:31.077" Title="How can I do simple machine learning without hard-coding behavior?" Tags="&lt;machine-learning&gt;" AnswerCount="1" CommentCount="1" FavoriteCount="1" ClosedDate="2014-05-14T14:40:25.950" />
  <row Id="7" PostTypeId="1" AcceptedAnswerId="10" CreationDate="2014-05-14T00:11:06.457" Score="4" ViewCount="411" Body="&lt;p&gt;As a researcher and instructor, I'm looking for open-source books (or similar materials) that provide a relatively thorough overview of data science from an applied perspective. To be clear, I'm especially interested in a thorough overview that provides material suitable for a college-level course, not particular pieces or papers.&lt;/p&gt;&#xA;" OwnerUserId="36" LastEditorUserId="97" LastEditDate="2014-05-16T13:45:00.237" LastActivityDate="2014-05-16T13:45:00.237" Title="What open-source books (or other materials) provide a relatively thorough overview of data science?" Tags="&lt;education&gt;&lt;open-source&gt;" A
