Almost a year ago I published some lines about Snappy installation on HBase 0.94.x. Since both Hadoop 2.2.0 and HBase 0.96.0 are now out, I have decided to install a new cluster with those 2 versions.
Installation was quite simple, but when I tried to move the data into this new cluster, things did not work well since I was missing Snappy, again.
Since getting it installed was not that straightforward, the goal of this post is to share some feedback on my experience.
Again, and first thing, the command line to confirm whether Snappy is working or not is the following:
bin/hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/test.txt snappy
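Note that the test file has to exist before you run the command; any small file will do. For example:

```shell
# Create a small file for the compression test (its contents don't matter).
echo "snappy compression test" > /tmp/test.txt
```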
The goal is to get this final output:
2013-12-25 18:08:02,820 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2013-12-25 18:08:03,903 INFO  [main] util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32
2013-12-25 18:08:03,905 INFO  [main] util.ChecksumType: Checksum can use org.apache.hadoop.util.PureJavaCrc32C
2013-12-25 18:08:04,143 INFO  [main] compress.CodecPool: Got brand-new compressor [.snappy]
2013-12-25 18:08:04,149 INFO  [main] compress.CodecPool: Got brand-new compressor [.snappy]
2013-12-25 18:08:04,685 INFO  [main] compress.CodecPool: Got brand-new decompressor [.snappy]
SUCCESS
And here are the steps to get there. The main difficulty in getting the Snappy libs is getting the right versions of the right tools. To get the Snappy libs for your infrastructure, you need to compile both Snappy and Hadoop (since Hadoop doesn't ship with 64-bit native libs). To compile the Snappy lib and get the related .so files, you can refer to the previous post. Getting the .so files for Hadoop is a bit more complicated. Hadoop 2.2.0 depends on Maven 3.0.5 and on protobuf 2.5. Debian Wheezy (stable) doesn't include those versions of those tools; only Jessie (testing) does. So you have 2 options to compile it: either you switch your distribution to the testing version, or you compile in a VM and then deploy the result. I chose the 2nd option.
Maven and protobuf require a recent version of libc6. You will need to install that version on all the servers you will deploy the libs to.
First, those packages are required:
- subversion
- maven
- build-essential
- cmake
- zlib1g-dev
- libsnappy-dev
- pkg-config
- libssl-dev
To install them, just run, as root:
apt-get install subversion maven build-essential cmake zlib1g-dev libsnappy-dev pkg-config libssl-dev
Make sure your JAVA_HOME is set up correctly. I tested with Sun JDK 1.7.0u45.
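A quick way to check is something like the following; the JDK path below is only an example, so adjust it to wherever your JDK actually lives:

```shell
# /usr/lib/jvm/java-7-oracle is a hypothetical path -- point it at your JDK.
export JAVA_HOME=${JAVA_HOME:-/usr/lib/jvm/java-7-oracle}
echo "JAVA_HOME is $JAVA_HOME"
```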
When everything is installed correctly, you can start with the core of the operation. First, you will need to check out the Hadoop source code. Make sure you check out the right tag; adjust it based on the Hadoop version you will install those libraries to.
svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.2.0/
If you are using 2.2.0, some files will fail to compile. They are related to the tests; we don't need them, so simply delete them:
release-2.2.0/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/client/AuthenticatorTestCase.java
release-2.2.0/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/client/TestPseudoAuthenticator.java
release-2.2.0/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/client/TestKerberosAuthenticator.java
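From the directory where you ran the checkout, something like this removes them (the -f flag keeps rm quiet if a file is already gone):

```shell
# Delete the test sources that break the 2.2.0 build.
rm -f release-2.2.0/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/client/AuthenticatorTestCase.java \
      release-2.2.0/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/client/TestPseudoAuthenticator.java \
      release-2.2.0/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/client/TestKerberosAuthenticator.java
```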
Now the tricky part. Hadoop 2.2.0 depends on protobuf 2.5. However, this version is only available in Debian experimental! So you will need to add the experimental repository to your apt sources.list file and run this command:
apt-get -t experimental install protobuf-compiler
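The experimental line to add to /etc/apt/sources.list looks like this (the mirror is just an example; use whichever one you normally do):

```
deb http://ftp.debian.org/debian/ experimental main
```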
This will also update those packages to their experimental versions:
libc-dev-bin libc6 libc6-dev locales
So you will also have to install those versions on the servers you are going to deploy to.
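You can compare glibc versions between the build VM and the target servers with ldd, which reports the version of the libc it belongs to:

```shell
# Print the glibc version on this machine (run on both the VM and the targets).
ldd --version | head -n 1
```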
Now move into the release-2.2.0 folder and build using the following command:
mvn package -Drequire.snappy -Pdist,native,src -DskipTests -Dtar
If everything goes as expected, you should find your .so files under hadoop-dist/target/hadoop-2.2.0/lib/native/. Look for both libhdfs.so and libhadoop.so.
You should now have 3 files:
- libsnappy.so
- libhdfs.so
- libhadoop.so
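A quick sanity check, assuming you have gathered the three files in one directory (NATIVE_DIR below is just a placeholder), is to run file on them and confirm they are 64-bit ELF shared objects:

```shell
# Adjust NATIVE_DIR to wherever you collected the three .so files.
NATIVE_DIR=${NATIVE_DIR:-hadoop-dist/target/hadoop-2.2.0/lib/native}
for f in libsnappy.so libhdfs.so libhadoop.so; do
  if [ -f "$NATIVE_DIR/$f" ]; then
    file "$NATIVE_DIR/$f"      # expect: ELF 64-bit LSB shared object
  else
    echo "missing: $f"
  fi
done
```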
Restore your sources.list to its initial version (remove the testing and experimental lines), and copy those 3 files under hbase/lib/native/Linux-amd64-64 (or under hbase/lib/native/Linux-i386-32 if you are not running 64 bits).
With your 3 files now in place, you can re-run the initial command and Snappy should work fine. Don't forget to copy those files to all your region servers.
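To push the files to every region server, a small loop over scp does the job. The host names below are hypothetical, and the leading echo is there so you can review the commands before actually running them:

```shell
# Replace with the real host names of your region servers.
SERVERS="regionserver1 regionserver2 regionserver3"
NATIVE_DIR=hbase/lib/native/Linux-amd64-64
for h in $SERVERS; do
  # Drop the leading 'echo' to really copy the files.
  echo scp "$NATIVE_DIR"/libsnappy.so "$NATIVE_DIR"/libhdfs.so "$NATIVE_DIR"/libhadoop.so "$h:$NATIVE_DIR/"
done
```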
References:
http://www.spaggiari.org/index.php/hbase/how-to-install-snappy-with-1#.Us-8R_QW1CE
http://www.spaggiari.org/index.php/hbase/how-to-install-snappy-with#.Us--2vQW1CE