Scala HTML formatting, XHTML - Scala and HTML parsing - Stack Overflow

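This code shows how to use the TagSoup library to parse HTML into scala.xml nodes. TagSoup is a SAX parser for malformed, real-world HTML: it accepts markup that is not well-formed and reports it as a clean stream of SAX events. The class below extends scala.xml's FactoryAdapter to consume those events. It provides the methods that create elements and text nodes, and it ignores processing instructions.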

/*
Copyright (c) 2008 Florian Hars, BIK Aschpurwis+Behrens GmbH, Hamburg
Copyright (c) 2002-2008 EPFL, Lausanne, unless otherwise specified.
All rights reserved.

This software was developed by the Programming Methods Laboratory of the
Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland.

Permission to use, copy, modify, and distribute this software in source
or binary form for any purpose with or without fee is hereby granted,
provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

3. Neither the name of the EPFL nor the names of its contributors
   may be used to endorse or promote products derived from this
   software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
*/

package tagsoup

import org.xml.sax.InputSource
import javax.xml.parsers.SAXParser
import org.ccil.cowan.tagsoup.jaxp.SAXFactoryImpl
import scala.xml.parsing.FactoryAdapter
import scala.xml._

class TagSoupFactoryAdapter extends FactoryAdapter {

  // TagSoup's JAXP parser factory, with namespace processing switched off.
  val parserFactory = new SAXFactoryImpl
  parserFactory.setNamespaceAware(false)

  // HTML elements that never have content.
  val emptyElements = Set("area", "base", "br", "col", "hr", "img",
                          "input", "link", "meta", "param")

  /** Tests if an XML element contains text.
   * @return true if element named localName contains text.
   */
  def nodeContainsText(localName: String) = !(emptyElements contains localName)

  /** Creates a node.
   */
  def createNode(pre: String, label: String, attrs: MetaData,
                 scpe: NamespaceBinding, children: List[Node]): Elem = {
    Elem(pre, label, attrs, scpe, children: _*)
  }

  /** Creates a text node.
   */
  def createText(text: String) =
    Text(text)

  /** Ignores processing instructions.
   */
  def createProcInstr(target: String, data: String) = Nil

  /** Loads an XML document.
   * @param source the input to parse
   * @return a new XML document object
   */
  override def loadXML(source: InputSource) = {
    val parser: SAXParser = parserFactory.newSAXParser()
    // The adapter acts as the SAX handler; TopScope is the initial namespace scope.
    scopeStack.push(TopScope)
    parser.parse(source, this)
    scopeStack.pop
    rootElem
  }
}
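
A minimal usage sketch, not part of the original answer: it assumes the `TagSoupFactoryAdapter` class above is compiled and on the classpath together with the TagSoup jar, and that the Scala version is one whose `FactoryAdapter` still has the single-argument `loadXML` overridden above. The `TagSoupDemo` object name and the sample HTML string are hypothetical.

```scala
import java.io.StringReader
import org.xml.sax.InputSource
import tagsoup.TagSoupFactoryAdapter

// Hypothetical demo object; the name and the sample markup are illustrative only.
object TagSoupDemo {
  def main(args: Array[String]): Unit = {
    // Deliberately sloppy HTML: unclosed <p> tags, a bare <br>, no closing </html>.
    val html = "<html><body><p>Hello<br>world<p>second paragraph</body>"

    // The adapter defined above drives TagSoup and builds scala.xml nodes.
    val adapter = new TagSoupFactoryAdapter
    val doc = adapter.loadXML(new InputSource(new StringReader(html)))

    // Once parsed, the usual scala.xml projections apply.
    val paragraphs = doc \\ "p"
    paragraphs.foreach(p => println(p.text))
  }
}
```

The point of the sketch is that callers never touch SAX directly: they hand an `InputSource` to `loadXML` and query the resulting tree with `\` and `\\` as they would for any well-formed XML document.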
