Parsing XML with DOM4J and validating it against a schema
Validation
Currently dom4j does not come with a validation engine, so you have to use an external validator. In the past we recommended Xerces, but now you can also use Sun's Multi-Schema XML Validator (MSV). Xerces can validate against DTDs and XML Schema, but not against TREX or RELAX. MSV supports all of the mentioned kinds of validation.
Validation consumes valuable resources. Use it wisely. Two approaches are described below.
Approach 1: Apache Xerces 1.4.x + dom4j
Using Apache's Xerces 1.4.x and dom4j for validation
It is easy to use Xerces 1.4.x for validation. Download Xerces from Apache's XML web site. Experience shows that the newest version is not always the best. Check the Xerces mailing lists to find out about issues with specific versions. Xerces provides Schema support starting from 1.4.0.
Turn on validation mode, which is off by default, on a SAXReader instance.
Set the Xerces property http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation to the schema URI.
Create a SAX XMLErrorHandler and install it on your SAXReader instance.
Parse and validate the Document.
Output validation/parsing errors.
The following example shows these steps:
import org.dom4j.Document;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.SAXReader;
import org.dom4j.io.XMLWriter;
import org.dom4j.util.XMLErrorHandler;

public class SimpleValidationDemo {

    public static void main(String[] args) throws Exception {
        SAXReader reader = new SAXReader();
        reader.setValidation(true);

        // specify the schema to use
        reader.setProperty(
            "http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation",
            "prices.xsd"
        );

        // add error handler which turns any errors into XML
        XMLErrorHandler errorHandler = new XMLErrorHandler();
        reader.setErrorHandler(errorHandler);

        // parse the document (args[0] is the path or URL of the XML file)
        Document document = reader.read(args[0]);

        // output the errors XML (XMLWriter writes to System.out by default)
        XMLWriter writer = new XMLWriter(OutputFormat.createPrettyPrint());
        writer.write(errorHandler.getErrors());
    }
}
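The same sequence of steps (attach a schema, install an error handler that collects problems instead of stopping at the first one, parse, report) can also be sketched with nothing but the JDK's JAXP validation API. This is not the dom4j/Xerces property mechanism shown above; it swaps in javax.xml.validation so that the example runs without extra jars. The schema, the element names and the class name CollectingValidationSketch are assumptions made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class CollectingValidationSketch {

    // Hypothetical schema: a <prices> root holding decimal <price> elements.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='prices'><xs:complexType><xs:sequence>"
      + "<xs:element name='price' type='xs:decimal' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Parses the XML with schema validation on and returns every reported problem. */
    static List<String> validate(String xml) throws Exception {
        // compile the schema and attach it to a namespace-aware parser factory
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setSchema(schema);

        // collect warnings and errors; fatal (well-formedness) errors still throw
        final List<String> problems = new ArrayList<>();
        DefaultHandler collector = new DefaultHandler() {
            @Override
            public void warning(SAXParseException e) {
                problems.add("warning: " + e.getMessage());
            }
            @Override
            public void error(SAXParseException e) {
                problems.add("error: " + e.getMessage());
            }
        };

        // parse and validate in one pass
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), collector);
        return problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<prices><price>9.99</price></prices>"));
        System.out.println(validate("<prices><price>cheap</price></prices>"));
    }
}
```

The first call should report no problems, while the second should report at least one, because "cheap" is not a valid xs:decimal. Collecting messages in a list plays the role that XMLErrorHandler plays in the dom4j example, where the errors are gathered into an XML tree.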
Both Xerces and Crimson are JAXP-capable parsers. Be careful when Crimson and Xerces are on the same classpath: Xerces will work correctly only when it appears on the classpath before Crimson. At this time I recommend that you use either Xerces or Crimson, not both.
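When more than one parser may be on the classpath, it can help to check which implementation JAXP actually selects. The following is a minimal sketch using only JDK classes; the class name WhichParser is made up for illustration, and it assumes the parser is looked up through JAXP, which is how dom4j's SAXReader normally obtains one when no XMLReader is set explicitly.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

public class WhichParser {

    /** Returns the class name of the XMLReader implementation that JAXP selects. */
    static String activeParser() throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        return parser.getXMLReader().getClass().getName();
    }

    public static void main(String[] args) throws Exception {
        // The package name in the output shows which parser won the lookup.
        System.out.println("JAXP selected: " + activeParser());
    }
}
```

Running this with the same classpath as the application shows whether the expected parser is picked up before any Xerces-specific property is set.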
Approach 2: MSV + dom4j (a perfect team)
A perfect team - Multi Schema Validator (MSV) and dom4j
Kohsuke Kawaguchi, a developer from Sun, created an extremely useful tool for XML validation. Multi Schema Validator (MSV) supports the following specifications:
RELAX NG
RELAX
TREX
XML DTDs
XML Schema
Currently it is not clear whether XML Schema will be the next standard for validation. RELAX NG has an ever-growing lobby. If you want to build an open application that is not tied to a specific XML parser or a specific type of XML validation, you should use this powerful tool. As using MSV is not trivial, the next section shows how to use it in a simpler way.
Simplified Multi-Schema Validation by using Java API for RELAX Verifiers (JARV)
The Java API for RELAX Verifiers (JARV) defines a set of interfaces and provides a schema-neutral and vendor-neutral API for validating XML documents. MSV, described above, offers a factory that supports JARV. So you can use the JARV API on top of MSV and dom4j to validate dom4j documents.
import org.iso_relax.verifier.Schema;
import org.iso_relax.verifier.Verifier;
import org.iso_relax.verifier.VerifierFactory;
import org.iso_relax.verifier.VerifierHandler;
import com.sun.msv.verifier.jarv.TheFactoryImpl;
import org.apache.log4j.Category;
import org.dom4j.Document;
import org.dom4j.io.SAXWriter;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class Validator {

    private static final Category CATEGORY = Category.getInstance(Validator.class);

    private String schemaURI;
    private Document document;

    public Validator(Document document, String schemaURI) {
        this.schemaURI = schemaURI;
        this.document = document;
    }

    public boolean validate() throws Exception {
        // (1) use autodetection of schemas
        VerifierFactory factory = new TheFactoryImpl();
        Schema schema = factory.compileSchema(schemaURI);

        // (2) configure a Verifier
        Verifier verifier = schema.newVerifier();
        verifier.setErrorHandler(
            new ErrorHandler() {
                public void error(SAXParseException saxParseEx) {
                    CATEGORY.error("Error during validation.", saxParseEx);
                }
                public void fatalError(SAXParseException saxParseEx) {
                    CATEGORY.fatal("Fatal error during validation.", saxParseEx);
                }
                public void warning(SAXParseException saxParseEx) {
                    CATEGORY.warn(saxParseEx);
                }
            }
        );

        // (3) start validation by resolving the dom4j document into SAX events
        VerifierHandler handler = verifier.getVerifierHandler();
        SAXWriter writer = new SAXWriter(handler);
        writer.write(document);

        return handler.isValid();
    }
}
All the work in the above example is done in the validate() method. First we create a factory instance and use it to create a JARV org.iso_relax.verifier.Schema instance. In the second step we create and configure an org.iso_relax.verifier.Verifier using an org.xml.sax.ErrorHandler. I use Apache's Log4j API to log possible errors. You can also use System.out.println() or, depending on how robust the application needs to be, any other method of reporting failures. The third and last step resolves the org.dom4j.Document instance into SAX events in order to start the validation. Finally we return a boolean value that reports whether the validation succeeded.
Using the teamwork of dom4j, MSV, JARV and good old SAX simplifies multi-schema validation while keeping the power of MSV.
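For readers who want to try the same three steps without MSV and JARV on the classpath, here is a rough JDK-only analogue. It is a sketch under stated assumptions, not the JARV API: it uses javax.xml.validation and a W3C DOM document in place of org.iso_relax.verifier and a dom4j document, and the schema, the element names and the class name InMemoryValidationSketch are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class InMemoryValidationSketch {

    // Hypothetical schema: a <prices> root holding decimal <price> elements.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='prices'><xs:complexType><xs:sequence>"
      + "<xs:element name='price' type='xs:decimal' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Validates an already-parsed document and reports success as a boolean. */
    static boolean isValid(Document document, String xsd) throws Exception {
        // (1) compile the schema
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));

        // (2) configure a validator whose error handler records any failure
        final boolean[] valid = { true };
        Validator validator = schema.newValidator();
        validator.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) {
                System.err.println("warning: " + e.getMessage());
            }
            public void error(SAXParseException e) {
                valid[0] = false;
                System.err.println("error: " + e.getMessage());
            }
            public void fatalError(SAXParseException e) {
                valid[0] = false;
                System.err.println("fatal: " + e.getMessage());
            }
        });

        // (3) validate the in-memory tree
        validator.validate(new DOMSource(document));
        return valid[0];
    }

    /** Helper for the demo: parses a string into a namespace-aware DOM tree. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(parse("<prices><price>9.99</price></prices>"), XSD));
        System.out.println(isValid(parse("<prices><price>cheap</price></prices>"), XSD));
    }
}
```

As in the validate() method, the error handler only records and logs problems, so the caller decides what an invalid document means for the application.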
XSLT defines a declarative, rule-based way to transform an XML tree into plain text, HTML, FO or any other text-based format. XSLT is very powerful. Ironically it does not need variables to hold data. As Michael Kay's XSLT Reference says: "This style of coding without assignment statements, is called Functional Programming. The earliest and most famous functional programming language was Lisp ..., while modern examples include ML and Scheme." In XSLT you define a so-called template that matches a certain XPath expression. The XSLT processor traverses the source tree using a recursive tree descent algorithm and performs the commands you defined when a specific tree branch matches the template rule.
dom4j offers an API that supports XSLT-like rule-based processing. The API can be found in the org.dom4j.rule package, and this chapter will introduce you to this powerful feature of dom4j.
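To make the idea of a template rule concrete, here is a small sketch that runs an XSLT stylesheet through the JDK's built-in transformer. The stylesheet, the input document and the class name TemplateRuleSketch are assumptions made up for illustration. Each xsl:template names a pattern, and the processor applies the body of the matching rule as it descends the tree.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TemplateRuleSketch {

    // Hypothetical stylesheet with two template rules:
    // one for the <prices> root element and one for each <price> element.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/prices'><xsl:apply-templates select='price'/></xsl:template>"
      + "<xsl:template match='price'>[<xsl:value-of select='.'/>]</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given XML and returns the text output. */
    static String transform(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // prints [1.50][2.75]
        System.out.println(transform("<prices><price>1.50</price><price>2.75</price></prices>"));
    }
}
```

No variable is assigned anywhere in the stylesheet: the output is determined entirely by which rule matches which node.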