Hadoop is a great piece of technology. But it’s not the technology that helps you solve the great problems. It’s the attitude you gain after absorbing the knowledge, and the courage to attack the problems.
For Hadoop, the “hello world” application is WordCount. Basically, you feed it a document, on the assumption that it can be huge, and the map reduce program outputs the unique words and their counts. In real life, however, the challenges you face are not as trivial. Some are not yet answered and are subject to active exploration and development. Dependency injection, for instance, is a hot topic. But for this post I’ll focus on one specific problem and present its solution.
If you ever have to deal with XML in a map reduce environment, you may get a stack trace similar to the one below.
The reason is that the XML libraries supplied with the JDK are a bit out of date. To get rid of this error, you’ll need to provide recent versions of both Xalan and Xerces with your job configuration, which means making them available on your classpath.
If you’re using Maven (you are using Maven for map reduce jobs, right?), it’s just a couple of lines to include in the pom file.
The versions of Xalan and Xerces matter. You need to supply the versions listed or later.
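If you want to confirm that the newer libraries are actually being picked up, a small check like the one below can help. It is only a sketch: the class name `XmlImplCheck` and the helper `describe` are made up for illustration, and it relies on nothing beyond the standard JAXP factories. It prints which parser and transformer implementations the JVM resolved and where they were loaded from.

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical helper, not part of Hadoop: reports which JAXP
// implementations are resolved on the current classpath.
public class XmlImplCheck {

    // Returns "<implementation class> from <location>".
    // A missing code source usually means the class came from the JDK itself.
    static String describe(Object factory) {
        Class<?> impl = factory.getClass();
        CodeSource source = impl.getProtectionDomain().getCodeSource();
        String location = (source == null || source.getLocation() == null)
                ? "the JDK itself (no code source)"
                : source.getLocation().toString();
        return impl.getName() + " from " + location;
    }

    public static void main(String[] args) {
        // XML parser implementation (Xerces, if it is on the classpath)
        System.out.println(describe(DocumentBuilderFactory.newInstance()));
        // XSLT implementation (Xalan, if it is on the classpath)
        System.out.println(describe(TransformerFactory.newInstance()));
    }
}
```

Run it with the same classpath as your job. If the printed locations point at the Xalan and Xerces jars rather than at the JDK, the dependencies are being resolved as intended.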