Java and XML: Java, Unicode and XML

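This article explains how to process XML documents with Java, in particular how to work with a document's structure when no DTD is available. It covers XML basics, including its portability for structured data, and Java's support for Unicode. Through examples, you will learn effective ways to handle XML and Unicode, including strategies for dealing with incomplete or erroneous data. The article suits readers who want to get started with Java and XML quickly without digging into complex details.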

What you need to know about... Java and XML

by Aaron Elkiss

Introduction

XML stands for eXtensible Markup Language. It is a structured document format that allows us, among other things, to represent semi-structured data in a portable fashion. That means that (at least in theory) we can use a single toolkit (the XML parser) to extract structure and content from our document instead of having to adapt to many different ad-hoc conventions for text and binary data. XML alone doesn't give us any clue as to the meaning of a document. There are many ways of encoding various layers of meaning using XML, ranging from XML Schema to RDF and even Cascading Style Sheets, but they aren't covered here.
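To make the "single toolkit" idea concrete, here is a minimal sketch that uses the DOM parser bundled with the JDK to pull structure (element and attribute names) and content (text) out of a small document. The sample document, the class name ParseSketch, and the method name bookTitles are hypothetical and chosen only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ParseSketch {
    // Hypothetical sample document; the element and attribute names are assumptions.
    static final String SAMPLE =
        "<library>"
        + "<book lang=\"en\">First Title</book>"
        + "<book lang=\"fr\">Second Title</book>"
        + "</library>";

    // Returns one "lang: title" string per <book> element found in the document.
    static String[] bookTitles(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        NodeList books = doc.getElementsByTagName("book");
        String[] titles = new String[books.getLength()];
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            // Structure comes from the attribute, content from the element text.
            titles[i] = book.getAttribute("lang") + ": " + book.getTextContent();
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        for (String title : bookTitles(SAMPLE)) {
            System.out.println(title);
        }
    }
}
```

The same parser code would work for any well-formed document; only the element and attribute names we ask for depend on the particular format.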

XML files can have associated Document Type Definitions (or DTDs) which explicitly specify the allowed structure. DTDs are mainly used by validating parsers, which check that a document conforms to its DTD. In this tutorial, we'll assume that there is no DTD for our XML documents but that we know the structure of the documents we're working with ahead of time.

Java is among many modern languages that support the manipulation of XML. One crucial component of that is Java's support for Unicode. In this article you will learn how to effectively work with XML and Unicode with Java. The focus is on short real-world examples that will allow you to get started working with Java and XML right away. The presentation is geared towards those who need to work with XML and Unicode but don't want to spend a lot of time learning all the intricacies.
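As a small illustration of how Java's Unicode support and XML fit together, the sketch below encodes a document containing non-ASCII text as UTF-8 bytes and hands the raw bytes to the parser, which reads the encoding from the XML declaration and returns an ordinary Java String. The sample text and the names UnicodeSketch and rootText are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class UnicodeSketch {
    // Hypothetical sample: "naive" with a diaeresis, followed by three CJK characters.
    // Unicode escapes keep this source file pure ASCII.
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<word>na\u00EFve \u65E5\u672C\u8A9E</word>";

    // Parses raw bytes; the parser picks the decoder from the XML declaration.
    static String rootText(byte[] xmlBytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xmlBytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Always name the charset explicitly instead of relying on the platform default.
        byte[] utf8 = SAMPLE.getBytes(StandardCharsets.UTF_8);
        String text = rootText(utf8);

        System.out.println("chars in text: " + text.length());
        System.out.println("UTF-8 bytes in text: "
            + text.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Passing bytes rather than a Reader lets the parser honor the declared encoding. If you decode the bytes yourself first, the declaration is ignored and a wrong guess at the charset silently corrupts the text.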

In addition, I include several suggestions on how to handle malformed data. This is a frequent scenario in many domains including my field, natural language processing. Data can be corrupted in transit and it's not always possible to get the supplier to fix it. I'll provide you with practical tools and ideas for dealing with such data.
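A first step in dealing with malformed data is finding out where it is broken. The following is only a sketch of one possible approach, not the specific technique discussed later in the tutorial: it catches the SAXParseException that the JDK parser throws for ill-formed input and reports the position. The class name MalformedSketch and the method name check are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class MalformedSketch {
    // Returns "ok" for well-formed input, otherwise a short description of the problem.
    static String check(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow parse errors so we handle them below instead of
            // letting the default handler print them to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return "ok";
        } catch (SAXParseException e) {
            // Line and column point at where the parser gave up.
            return "malformed at line " + e.getLineNumber()
                + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            return "failed: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(check("<a><b></b></a>"));
        System.out.println(check("<a><b></a>")); // <b> is never closed
    }
}
```

Knowing the line and column makes it possible to log, skip, or hand-repair the offending record rather than discarding a whole batch of documents.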

If you're not familiar with Unicode, you may want to skim the section on character sets and encodings or check out Joel Spolsky's "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" before reading the material on XML. Otherwise, you should be able to jump right in. This tutorial assumes a basic level of familiarity with Java, i.e. that you're fairly comfortable with the material in chapters 1-7 of Dive Into Java.
