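Reading and Writing XML Files with Spark, and Things to Watch For

A follower recently asked 浪尖 (Langjian) how to read and write XML files with Spark, especially nested ones. Spark itself does not support reading XML files, but Databricks has open-sourced a jar, spark-xml, that supports both reading and writing XML. Here is a walkthrough of how to use it.

Along similar lines, Langjian previously covered how to read tar files in the 星球 community; the approach is much the same.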

Adding the dependency

The minor version has already reached 0.9:

<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-xml_2.11</artifactId>
    <version>0.9.0</version>
</dependency>
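
For sbt-based builds, the same coordinates could be declared roughly as follows. This is only a sketch and assumes the project is built against Scala 2.11, so that `%%` resolves to the `spark-xml_2.11` artifact shown in the Maven snippet.

```scala
// build.sbt
// Assumption: scalaVersion is a 2.11.x release, so %% expands to spark-xml_2.11.
libraryDependencies += "com.databricks" %% "spark-xml" % "0.9.0"
```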

Sample XML file

Below is a sample XML file describing books:

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>

         An in-depth look at creating applications
         with XML.This manual describes Oracle XML DB, and how you can use it to store, generate, manipulate, manage,
         and query XML data in the database.

         After introducing you to the heart of Oracle XML DB, namely the XMLType framework and Oracle XML DB repository,
         the manual provides a brief introduction to design criteria to consider when planning your Oracle XML DB
         application. It provides examples of how and where you can use Oracle XML DB.

         The manual then describes ways you can store and retrieve XML data using Oracle XML DB, APIs for manipulating
         XMLType data, and ways you can view, generate, transform, and search on existing XML data. The remainder of
         the manual discusses how to use Oracle XML DB repository, including versioning and security,
         how to access and manipulate repository resources using protocols, SQL, PL/SQL, or Java, and how to manage
         your Oracle XML DB application using Oracle Enterprise Manager. It also introduces you to XML messaging and
         Oracle Streams Advanced Queuing XMLType support.
      </description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies,
      an evil sorceress, and her own childhood to become queen
      of the world.</description>
   </book>
   <book id="bk103">
      <author>Corets, Eva</author>
      <title>Maeve Ascendant</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-11-17</publish_date>
      <description>After the collapse of a nanotechnology
      society in England, the young survivors lay the
      foundation for a new society.</description>
   </book>
   <book id="bk104">
      <author>Corets, Eva</author>
      <title>Oberon's Legacy</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2001-03-10</publish_date>
      <description>In post-apocalypse England, the mysterious
      agent known only as Oberon helps to create a new life
      for the inhabitants of London. Sequel to Maeve
      Ascendant.</description>
   </book>
   <book id="bk105">
      <author>Corets, Eva</author>
      <title>The Sundered Grail</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2001-09-10</publish_date>
      <description>The two daughters of Maeve, half-sisters,
      battle one another for control of England. Sequel to
      Oberon's Legacy.</description>
   </book>
   <book id="bk106">
      <author>Randall, Cynthia</author>
      <title>Lover Birds</title>
      <genre>Romance</genre>
      <price>4.95</price>
      <publish_date>2000-09-02</publish_date>
      <description>When Carla meets Paul at an ornithology
      conference, tempers fly as feathers get ruffled.</description>
   </book>
   <book id="bk107">
      <author>Thurman, Paula</author>
      <title>Splish Splash</title>
      <genre>Romance</genre>
      <price>4.95</price>
      <publish_date>2000-11-02</publish_date>
      <description>A deep sea diver finds true love twenty
      thousand leagues beneath the sea.</description>
   </book>
   <book id="bk108">
      <author>Knorr, Stefan</author>
      <title>Creepy Crawlies</title>
      <genre>Horror</genre>
      <price>4.95</price>
      <publish_date>2000-12-06</publish_date>
      <description>An anthology of horror stories about roaches,
      centipedes, scorpions  and other insects.</description>
   </book>
   <book id="bk109">
      <author>Kress, Peter</author>
      <title>Paradox Lost</title>
      <genre>Science Fiction</genre>
      <price>6.95</price>
      <publish_date>2000-11-02</publish_date>
      &l
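
As a rough illustration of how such a file might be loaded with the dependency above, the following sketch treats every `<book>` element as one row. It is a minimal example under stated assumptions, not a definitive implementation: the paths `books.xml` and `books_out` and the object name `BooksXmlSketch` are hypothetical, the complete sample file is assumed to be available, and Spark is assumed to run in local mode. With the library's default settings, XML attributes are exposed as columns prefixed with `_`, so the `id` attribute is expected to appear as `_id`.

```scala
import org.apache.spark.sql.SparkSession

object BooksXmlSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation; cluster settings would differ.
    val spark = SparkSession.builder()
      .appName("spark-xml-sketch")
      .master("local[*]")
      .getOrCreate()

    // rowTag names the element that becomes one DataFrame row.
    // "books.xml" is a hypothetical path to the complete sample file.
    val books = spark.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "book")
      .load("books.xml")

    // Inspect the inferred schema before relying on any column type.
    books.printSchema()

    // "_id" assumes the default attribute prefix "_".
    books.select("_id", "author", "title", "price").show(5, truncate = false)

    // Write a subset back out as XML; "books_out" is a hypothetical
    // output path, and Spark creates it as a directory of part files.
    books.select("_id", "author", "title")
      .write
      .format("com.databricks.spark.xml")
      .option("rootTag", "catalog")
      .option("rowTag", "book")
      .save("books_out")

    spark.stop()
  }
}
```

When writing, `rootTag` sets the enclosing element and `rowTag` the per-row element; columns carrying the attribute prefix are written back as attributes rather than child elements.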