The Pros and Cons of JDOM

A Short History of JDOM

Bill Venners : Tell me about JDOM.

Elliotte Rusty Harold : The convention center we're in now, the Santa Clara convention center, is where JDOM was born three years ago at the very first O'Reilly Enterprise Java conference. Brett McLaughlin, who was then working on O'Reilly's Java and XML book, was giving a talk on DOM. Jason Hunter was in the audience. Jason noticed that about every third slide in Brett's talk was "gotcha!" Something in DOM doesn't work like you would naturally expect it to work. So Jason said to himself, there's got to be a better way. He and Brett went outside and had lunch on the lawn, and in the course of their conversation decided to create what would become JDOM. Over the next couple of weeks, they did some work on it, and I think released their first alpha version to the world.

JDOM, like DOM, is a tree-based object model. It loads the whole document into memory like DOM, but it's much simpler. JDOM uses concrete classes rather than the interfaces DOM uses, and I saw how that made life simpler. JDOM is designed just for Java. It is not designed to support C++, Python, Perl, or any other language. Interoperability is achieved through XML, not through the API. The API doesn't need to be portable: a Java API runs on one system. The XML document is what moves between systems, so the XML document is what needs to be portable. JDOM is in many ways what DOM should have been. It's simple. It's mostly correct. It's easy enough to use for people who aren't experts in both XML and JDOM. To use DOM correctly, you really need to be an expert in both XML and DOM.

JDOM Offers Many Convenience Methods

Bill Venners : In your talk, one complaint you had about JDOM is: "There's more than one way to do it." What's that all about?

Elliotte Rusty Harold : JDOM often provides convenience methods. For example, suppose you have an item element in RSS, and you want to get the content of the title of that item. You can call getChild("title").getText(). You can call getChildText("title"). You can call getChildTextTrim("title"). You can call getChildTextNormalize("title"). If you want an attribute, there are still more methods you can call.

JDOM has lots of convenience methods. The idea is that sometimes you want the white space removed from either end of the element text you're reading, and sometimes you don't, so they give you methods to do both. Looked at individually, any one of these methods is fine. My concern is that when you add them all up, the sheer number of them becomes intimidating. You can't read the Javadoc for the Element class in JDOM without saying, "There's just so much here." It's just too big. I would prefer not to provide so many convenience methods. I would prefer a simpler, smaller API that can be grokked in one sitting, an API all of whose methods you can see on maybe one screen.
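To make the overlap concrete, here is a minimal sketch using a hypothetical MiniElement class (not JDOM's real org.jdom.Element). Its three child-text helpers approximate the behavior of JDOM's getChildText, getChildTextTrim, and getChildTextNormalize: raw text, trimmed text, and trimmed text with internal white-space runs collapsed to one space. The white-space handling uses Java's String.trim() and a \s+ regex, which is an assumption and may differ from JDOM's own rules in edge cases.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a JDOM-style element; NOT the real org.jdom.Element.
// It stores only the text of each child element, keyed by the child's name.
class MiniElement {
    private final Map<String, String> childText = new HashMap<>();

    // Hypothetical helper used only to build the example.
    void addChild(String name, String text) {
        childText.put(name, text);
    }

    // Raw text of the named child, or null if there is no such child.
    String getChildText(String name) {
        return childText.get(name);
    }

    // The same text with leading and trailing white space removed.
    String getChildTextTrim(String name) {
        String text = getChildText(name);
        return text == null ? null : text.trim();
    }

    // Trimmed text with every internal run of white space collapsed to one space.
    String getChildTextNormalize(String name) {
        String text = getChildTextTrim(name);
        return text == null ? null : text.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        MiniElement item = new MiniElement();
        item.addChild("title", "  JDOM \n  Makes   It Easy  ");
        System.out.println("[" + item.getChildText("title") + "]");
        System.out.println("[" + item.getChildTextTrim("title") + "]");
        System.out.println("[" + item.getChildTextNormalize("title") + "]"); // [JDOM Makes It Easy]
    }
}
```

Each helper is a one-liner layered on the previous one, which illustrates the point above: individually they are harmless, but together they multiply the methods a reader of the class has to learn.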

JDOM Allows Malformed Documents

Bill Venners : You also complained that JDOM XML documents are not always well-formed. Could you differentiate between well-formed and valid documents, and explain your concerns about JDOM?

Elliotte Rusty Harold : XML documents must be well-formed. There are, depending on how you count, anywhere from a hundred to several thousand different well-formedness rules. These rules are the minimum requirements for an XML document. They cover things like which characters are allowed in element names: the letter 'a' is OK, the letter omega is OK, the asterisk character is not OK, and white space is not OK. The rules say that every start-tag has to have a matching end-tag. Elements can nest, but they cannot overlap. Processing instructions have the form <?, a target, white space, the data, and ?>. Comments cannot contain a double hyphen. There are many such rules governing the well-formedness of XML documents.
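Every conforming XML parser enforces these rules, so one quick way to see them in action is to feed small strings to the parser that ships with the JDK (javax.xml.parsers). The class WellFormednessCheck and its sample inputs are illustrative assumptions, not part of any library.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormednessCheck {
    // Returns true if the JDK's built-in parser accepts the string as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors but, unlike the default handler, prints nothing.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a well-formedness error in the input
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e); // a setup problem, not a property of the input
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<a><b/></a>",            // nested elements: well-formed
            "<a><b></a></b>",         // overlapping elements: malformed
            "<my element/>",          // white space in a name: malformed
            "<a*b/>",                 // asterisk in a name: malformed
            "<\u03C9/>",              // Greek omega as a name: well-formed
            "<a><!-- x -- y --></a>", // double hyphen inside a comment: malformed
            "<?target data?><a/>"     // processing instruction before the root: well-formed
        };
        for (String s : samples) {
            System.out.println(isWellFormed(s) + "  " + s);
        }
    }
}
```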

Validity talks about which elements and attributes are allowed where. Well-formedness only talks about the structure of any XML document, irrespective of what the names are. Validity says, we're only going to allow these elements with these names in these positions. Validity is not required. Well-formedness is.

JDOM, and for that matter DOM, allows you to create malformed documents. They do not check everything they could possibly check. For instance, they do not currently check that the content of a text node does not contain the null character, which is completely illegal in an XML document. So are vertical tabs, form feeds, and most other control characters. So one way you can create a malformed document using either JDOM or DOM is to pass a string containing some of these control characters to the Text constructor. In my opinion, an XML API shouldn't allow that. It shouldn't rely on the programmer using the API to know which characters are and are not legal. If a programmer tries to do something that would result in a malformed document, the API should stop them by throwing an exception.
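A minimal sketch of the check being asked for: a hypothetical CheckedText class (not the Text class of JDOM or DOM) whose constructor rejects any code point outside the XML 1.0 Char production by throwing an exception. Using the unchecked IllegalArgumentException here is a design assumption.

```java
// Hypothetical text-node class; NOT the Text class of JDOM or DOM.
// Its constructor refuses content that could never appear in a well-formed XML 1.0 document.
final class CheckedText {
    private final String value;

    CheckedText(String value) {
        java.util.Objects.requireNonNull(value, "value");
        int bad = firstIllegalChar(value);
        if (bad != -1) {
            throw new IllegalArgumentException(
                String.format("U+%04X is not a legal XML 1.0 character", bad));
        }
        this.value = value;
    }

    String getValue() {
        return value;
    }

    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Returns the first illegal code point in s, or -1 if every code point is legal.
    // An unpaired surrogate comes back as its own value (0xD800-0xDFFF), which is illegal.
    static int firstIllegalChar(String s) {
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            if (!isXmlChar(codePoint)) {
                return codePoint;
            }
            i += Character.charCount(codePoint);
        }
        return -1;
    }
}
```

With this design, new CheckedText("form\ffeed") fails immediately with "U+000C is not a legal XML 1.0 character" instead of silently producing a document no parser will accept.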

Bill Venners : You also mentioned the internal DTD subset in this portion of your talk.

Elliotte Rusty Harold : An XML document's document type declaration (the DOCTYPE) points to its Document Type Definition (DTD). If the DTD is actually contained inside the instance document, between square brackets, then that part of the DTD is called the internal DTD subset. In some cases the document type declaration also points to an external part, which is why we distinguish internal from external. We merge the two DTD subsets to get the complete DTD. Sometimes the whole DTD is in the internal DTD subset. Sometimes it's all in the external part.

In JDOM, the internal DTD subset is not checked. You could put absolutely any string in there whatsoever, including strings that are totally illegal in an internal DTD subset. For example, you could just put the text of the Declaration of Independence as your internal DTD subset in JDOM, even though that would not be well-formed. It's just another thing that JDOM decided they would not check for well-formedness, because checking the internal DTD subset would be too onerous.

DOM solves that problem in a different way, incidentally. DOM makes the DocumentType node, which represents the document type declaration, read-only, so it can't be changed at all. Therefore, it can't be changed into something malformed.
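To see what an internal DTD subset looks like, the sketch below parses a small document whose DOCTYPE carries the entire DTD between square brackets, using the JDK's DOM parser. The greeting element and who entity are made up for illustration. For comparison, standard DOM's DOMImplementation.createDocumentType(qualifiedName, publicId, systemId) takes no argument for an internal subset, and the DocumentType node's properties are read-only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

class InternalSubsetDemo {
    // Everything between the square brackets is the internal DTD subset.
    // The element and entity names are hypothetical, chosen for this example.
    static final String XML =
          "<!DOCTYPE greeting [\n"
        + "  <!ELEMENT greeting (#PCDATA)>\n"
        + "  <!ENTITY who \"world\">\n"
        + "]>\n"
        + "<greeting>Hello, &who;!</greeting>";

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        DocumentType doctype = doc.getDoctype(); // read-only view of the DOCTYPE
        System.out.println(doctype.getName());                          // greeting
        // The entity declared in the internal subset is expanded during parsing.
        System.out.println(doc.getDocumentElement().getTextContent()); // Hello, world!
    }
}
```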

JDOM Ignores Setter Method Conventions

Bill Venners : How about "setter methods don't return void"?

Elliotte Rusty Harold : I learned from JavaBeans that one of the ways you recognize a setter method is that it returns void, as in public void setColor(Color color). You know that method sets the color property because it follows a naming convention: the name begins with the word set, the first letter in Color is capitalized, and so forth. JDOM follows a different pattern, called method invocation chaining, where, for example, the setName method on the Element class returns that Element object. To me, that just makes no sense. There's no reason for setter methods to return anything.

Bill Venners : The set methods return this?

Elliotte Rusty Harold : You might have an element object e in class X, and you call e.setName(), which returns e. From inside the method, yes, it's returning this. From outside the method, it's returning whatever object you invoked it on. That pattern is used, for example, in Java's New I/O library (java.nio), where I also don't like it. But the designers of JDOM do like it. To me, it does not seem semantically correct. The return value reflects how the method is being used rather than what the method is doing.
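A minimal sketch contrasting the two conventions, using two hypothetical classes (BeanElement and ChainedElement, neither of which is JDOM's Element):

```java
// JavaBeans style: a setter is recognized by "set" + capitalized property name and a void return.
class BeanElement {
    private String name;

    public String getName() { return name; }

    public void setName(String name) { this.name = name; }
}

// Method-invocation-chaining style: each setter returns the object it was called on.
class ChainedElement {
    private String name;
    private String text;

    public String getName() { return name; }

    public String getText() { return text; }

    public ChainedElement setName(String name) {
        this.name = name;
        return this; // lets callers chain another call onto the same object
    }

    public ChainedElement setText(String text) {
        this.text = text;
        return this;
    }

    public static void main(String[] args) {
        BeanElement bean = new BeanElement();
        bean.setName("title");                 // one statement per property

        ChainedElement chained = new ChainedElement().setName("title").setText("JDOM");
        System.out.println(bean.getName() + " " + chained.getName() + " " + chained.getText());
    }
}
```

The java.nio classes mentioned above work the same way: ByteBuffer.allocate(8).putInt(1).putInt(2) compiles because each putInt call returns the buffer itself.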

JDOM Uses Java Collections

Bill Venners : You asked, "Is JDOM too Java centric?"

Elliotte Rusty Harold : When JDOM was designed, Brett and Jason said, we're going to go whole hog. We're not going to invent a separate NodeList class, like DOM does. We're going to use the Java Collections API. We're not going to have a cloneNode method like DOM does. We're going to use the Java clone method. We're going to implement Serializable, because good Java classes implement Serializable. We're going to implement Cloneable. We're going to have equals and hashCode methods, all the nice, normal things Java programmers have learned to love. The problem is, five or six years down the road, we've learned that some of those things aren't so nice. The Cloneable interface is a disaster. Joshua Bloch talks about this in Effective Java, and flat out recommends that people ignore it and implement their own copy constructors instead, just because Cloneable is so poorly designed.
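For contrast with Cloneable: Object.clone() is protected, makes a shallow field-by-field copy by default, throws the checked CloneNotSupportedException unless the class implements Cloneable, and is declared to return Object. A copy constructor avoids all of that. Below is a minimal sketch with a hypothetical CopyableElement class (not JDOM's Element) that deep-copies its children.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical element class (not JDOM's Element). Instead of implementing Cloneable and
// overriding clone(), it offers a copy constructor, as Effective Java recommends.
final class CopyableElement {
    private final String name;
    private final List<CopyableElement> children = new ArrayList<>();

    CopyableElement(String name) {
        this.name = name;
    }

    // Copy constructor: recursively copies the children, so the copy shares no nodes
    // with the original. No Cloneable, no cast, no CloneNotSupportedException.
    CopyableElement(CopyableElement original) {
        this.name = original.name;
        for (CopyableElement child : original.children) {
            this.children.add(new CopyableElement(child));
        }
    }

    void addChild(CopyableElement child) {
        children.add(child);
    }

    String getName() {
        return name;
    }

    // Read-only view of the children.
    List<CopyableElement> getChildren() {
        return Collections.unmodifiableList(children);
    }
}
```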

The Serializable interface is useful in some circumstances, but I think in XML the serialization format should be XML, not binary object serialization, so I'm not sure it's necessary. And the Collections API suffers seriously from two things. One is Java's lack of true generics, what C++ programmers know as templates. The other is that Java has primitive data types, and the Collections API can't be used for ints or doubles. I'm not so sure the second one is relevant here, but the first one is. When you expose the children of an Element as a java.util.List, what you get back is a list of Objects. Every time you get something out of that List, you have to cast it back to its type. We don't know what it is, so we have to have a big switch block that says if (o instanceof Element) { e = (Element) o; }, then do the same thing for Text, Comment, and ProcessingInstruction, and it gets really messy.

DOM, by contrast, does have a separate NodeList interface that contains Nodes. When you get something out of that list, you know it's a Node, and you've got certain operations you can use on a Node. Often that's all you need. Sometimes you need something more: sometimes you do need to know whether it's an Element node, an Attribute node, or a Text node. But a lot of times, it's enough to know it's a Node. It's not enough to know that it's an Object.
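A minimal sketch of the difference, using a hypothetical node hierarchy (Node, ElementNode, TextNode, CommentNode are illustrative names, not JDOM or DOM classes). With children typed as Object, every use needs an instanceof test and a cast; with a common Node supertype, operations declared on Node need no cast at all.

```java
import java.util.List;

// Hypothetical node hierarchy, loosely modelled on the XML node kinds named above.
interface Node {
    // An operation every node supports, so callers can use it without a cast.
    String getValue();
}

final class ElementNode implements Node {
    private final String name;
    private final String text;
    ElementNode(String name, String text) { this.name = name; this.text = text; }
    String getName() { return name; }
    public String getValue() { return text; }
}

final class TextNode implements Node {
    private final String text;
    TextNode(String text) { this.text = text; }
    public String getValue() { return text; }
}

final class CommentNode implements Node {
    private final String text;
    CommentNode(String text) { this.text = text; }
    public String getValue() { return text; }
}

final class ChildAccess {
    // Style forced by a list of Objects: test and cast for every possible node type.
    static String describe(Object o) {
        if (o instanceof ElementNode) {
            ElementNode e = (ElementNode) o;
            return "element " + e.getName();
        } else if (o instanceof TextNode) {
            TextNode t = (TextNode) o;
            return "text " + t.getValue();
        } else if (o instanceof CommentNode) {
            CommentNode c = (CommentNode) o;
            return "comment " + c.getValue();
        }
        throw new IllegalArgumentException("unexpected child: " + o);
    }

    // Style allowed by a common Node type: operations declared on Node need no cast.
    static String concatenateValues(List<? extends Node> children) {
        StringBuilder sb = new StringBuilder();
        for (Node child : children) {
            sb.append(child.getValue());
        }
        return sb.toString();
    }
}
```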

JDOM Uses Too Many Checked Exceptions

Bill Venners : You also suggested in your talk that JDOM had too many checked exceptions.

Elliotte Rusty Harold : JDOM does check many of the things that can make an XML document malformed; not all of them, but many. For example, you can't have an element name that contains white space. Generally speaking, if JDOM detects a problem, it throws a checked exception, specifically a JDOMException. That means that when you're writing JDOM code, you have a lot of try-catch blocks: try such and such, catch JDOMException, respond appropriately. As Bruce Eckel has pointed out, a lot of people just write catch (JDOMException e) {} with an empty body, and don't actually do anything to handle the failure.

Perhaps the appropriate response is to throw runtime exceptions instead of checked exceptions. That way the exception handling doesn't get in the way of your code or make it any messier, but the signal is still there if the problem arises. The way Joshua Bloch explains this is that any problem that could possibly be caught in testing should be a RuntimeException. Setting the name of an element, for example, should throw a RuntimeException, because if you use a bad String for the name, you'll catch it in testing, if you have good testing. On the other hand, parsing an external document should not throw a RuntimeException; it should throw a checked exception, because the outcome depends on which document is passed into your program. Sometimes it will be well-formed and sometimes not. There's no way to know that in general, so that's a legitimate checked exception. But I have come to learn, in a way I didn't understand a few years ago, that many exceptions that are currently checked exceptions should really be RuntimeExceptions.
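A minimal sketch of that rule, using two hypothetical classes: NamedElement reports a bad name, a programming error, with the unchecked IllegalArgumentException, while Documents.parse keeps the checked SAXException and IOException from the JDK parser because well-formedness depends on the input. The name check is deliberately simplified and is not the full XML Name production.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical element class (not JDOM's). A bad name is a programming error that good
// tests will catch, so it is reported with an unchecked IllegalArgumentException.
final class NamedElement {
    private String name;

    NamedElement(String name) {
        setName(name);
    }

    void setName(String name) {
        // Deliberately simplified check: the real XML Name production is much longer.
        if (name == null || name.isEmpty() || name.chars().anyMatch(Character::isWhitespace)) {
            throw new IllegalArgumentException("Illegal element name: \"" + name + "\"");
        }
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Whether an incoming document is well-formed depends on the input, not on the program,
// so parsing keeps the checked SAXException and IOException in its signature.
final class Documents {
    static Document parse(String xml) throws SAXException, IOException {
        DocumentBuilder builder;
        try {
            builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            // A misconfigured parser factory is a setup bug, so it is rethrown unchecked.
            throw new IllegalStateException(e);
        }
        builder.setErrorHandler(new DefaultHandler()); // report errors by exception only
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

Callers can write new NamedElement("title") with no try block at all, while every call to Documents.parse must either catch or declare SAXException and IOException.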

Bill Venners : So you think JDOM goes a bit overboard with the checked exceptions.

Elliotte Rusty Harold : Yes, and that's probably my fault. In the very early days of JDOM, I was the one who argued most strongly for putting in lots of exceptions and checking everything. Others argued against putting in any exceptions at all. What we were missing then, I think, was anybody standing in the middle saying, "Hey, guys, RuntimeExceptions would satisfy both of you at the same time." I just didn't know that then. I've learned it from Bruce Eckel and Joshua Bloch.

Will JDOM Remain Class-Based?

Bill Venners : In your talk you asked, "Are JDOM committers committed to classes?" What did you mean by that?

Elliotte Rusty Harold : That's a completely separate issue. I had a conversation with Jason Hunter, one of the two or three committers to the JDOM CVS tree. Jason said that if JDOM used interfaces rather than classes, then it could be used, for example, as the API for a native XML database, and he thought that was an important use case. On further reflection, I think I agree with him. There is, perhaps, a need for such an API. However, I also think there's a need for a simple, concrete, class-based API. And going forward, I'm just not certain that JDOM will still be a class-based API when it gets to 1.0. So I think it's useful to have my little XOM API, which I know is going to be a class-based API.
