Hadoop Installation Guide: Ubuntu 12.04

Original article (single node):

http://askubuntu.com/questions/144433/how-to-install-hadoop


Multi-node installation and deployment guide:

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/


Reference articles:

Install Hadoop on Ubuntu:

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/


Python on Hadoop:

http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/


C++ on Hadoop:

http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_2.2_--_Running_C%2B%2B_Programs_on_Hadoop

http://blog.changecong.com/2013/10/ubuntu-%E7%BC%96%E8%AF%91%E5%AE%89%E8%A3%85-hadoop-2-2-0/


I am trying to install Hadoop on Ubuntu 12.04. Following the instructions from http://michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/, I installed java-6-openjdk from the Ubuntu Software Center. I have set JAVA_HOME in .bashrc and also set JAVA_HOME in Hadoop's conf/hadoop-env.sh. While formatting the namenode, I am getting the following error:

usr/lib/jvm/java-6-openjdk/bin/java no such file or directory.

Thank you. But it's a 64-bit OS.
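
Before fixing JAVA_HOME, it helps to confirm where the JDK binary actually lives on your machine (a minimal sketch; the exact paths vary by JDK package and architecture):

    # List every java binary registered with the alternatives system
    update-alternatives --list java

    # Resolve the real path behind the java on your PATH
    readlink -f "$(which java)"

    # JAVA_HOME should point at the JDK root, i.e. the directory
    # containing bin/java, e.g. /usr/lib/jvm/java-6-openjdk-amd64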

Why is there a bounty when the user has found the solution? What kind of answer is expected? – green7, Apr 9 at 16:02

@green7 The intention is to find an answer that actually is marked as accepted, is highly detailed, and the like. Since there doesn't seem to be one, I can understand Jorge adding the bounty. – Thomas W., Apr 9 at 16:08

@TheLordofTime The detailed answer would at most contain 5 lines, as the question is too localized. And if an answer containing the installation instructions for Hadoop is expected, it would be redundant, as the link mentioned in the question explains it wonderfully. Moreover, since the answer with the most votes was posted by the asker him/herself, it is very unlikely he/she would accept any other answer. – green7, Apr 9 at 16:12

@green7 If the question is too localized, then vote to close it as such? – Thomas W., Apr 9 at 17:21

6 Answers

The guides I followed when I had 12.04 included the MyLearning one. I was actually opposed to the MyLearning guide because the first thing it recommended was Oracle Java 7 instead of OpenJDK 7, but I had some issues with OpenJDK 7 when trying this out, so I had to go with Oracle.

The guide is mostly straightforward, and here it is:

  1. Install Java

    sudo add-apt-repository ppa:webupd8team/java  
    sudo apt-get update && sudo apt-get upgrade  
    sudo apt-get install oracle-java7-installer  
    
  2. Create Hadoop user

    sudo addgroup hadoop  
    sudo adduser --ingroup hadoop hduser
    

    Where hduser is the Hadoop user you want to have.
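
    To confirm the user and group were created correctly, a quick check (a minimal sketch):

        # Show hduser's uid, gid and group membership
        id hduser

        # Confirm the hadoop group exists
        getent group hadoop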

  3. Configuring SSH

    su - hduser   
    ssh-keygen -t rsa -P ""
    cat .ssh/id_rsa.pub >> .ssh/authorized_keys
    

    To be sure that the SSH installation went well, you can open a new terminal and try to create an SSH session as hduser with the following command:

    ssh localhost
    

    Reinstall SSH if connecting to localhost fails:

    sudo apt-get install openssh-server
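
    To verify that key-based login works non-interactively, you can force SSH to fail rather than prompt for a password (a minimal sketch):

        # Should print OK without asking for a password
        ssh -o BatchMode=yes localhost 'echo OK'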
    
  4. Edit Sudoers

    sudo visudo
    
  5. Add the following line at the end to add hduser to the sudoers

    hduser ALL=(ALL:ALL) ALL
    

    To save, press CTRL+X, type Y, and press ENTER.
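
    To confirm the entry took effect, you can list hduser's sudo privileges (a minimal sketch):

        # Should report that hduser may run ALL commands
        sudo -l -U hduser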

  6. Disable IPv6

    sudo gedit /etc/sysctl.conf
    

    Copy the following lines to the end of the file:

    #disable ipv6  
    net.ipv6.conf.all.disable_ipv6 = 1  
    net.ipv6.conf.default.disable_ipv6 = 1   
    net.ipv6.conf.lo.disable_ipv6 = 1  
    

    If you get an error telling you that you don't have permissions, just run the previous command as the root account (in case sudo is not enough; for me it was).

  7. Now reboot.

    You can also run sudo sysctl -p, but I would rather reboot.

    After rebooting, check to make sure IPv6 is off:

    cat /proc/sys/net/ipv6/conf/all/disable_ipv6
    

    It should say 1. If it says 0, you missed something.

  8. Installing Hadoop

    There are several ways of doing this. The one the guide suggests is to download the archive from the Apache Hadoop site and decompress it in your hduser home folder. Rename the extracted folder to hadoop.

    The other way is to use a PPA that was tested for 12.04:

    sudo add-apt-repository ppa:hadoop-ubuntu/stable  
    sudo apt-get update && sudo apt-get upgrade  
    sudo apt-get install hadoop  
    

    NOTE: The PPA may work for some and not for others. The route I tried was downloading from the official site, because I did not know about the PPA.
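
    For the manual route, the steps look roughly like this (a minimal sketch; the exact release and mirror URL are assumptions, check the Apache site for the current download links):

        # Download and unpack a Hadoop 1.x release into hduser's home
        cd /home/hduser
        wget https://archive.apache.org/dist/hadoop/core/hadoop-1.0.4/hadoop-1.0.4.tar.gz
        tar -xzf hadoop-1.0.4.tar.gz
        mv hadoop-1.0.4 hadoop

        # Make sure hduser owns the tree
        sudo chown -R hduser:hadoop hadoop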

  9. Update $HOME/.bashrc

    You will need to update .bashrc for hduser (and for every user that needs to administer Hadoop). To edit the .bashrc file, you will need to open it as root:

    sudo gedit /home/hduser/.bashrc  
    

    Then add the following configuration at the end of the .bashrc file:

    # Set Hadoop-related environment variables   
    export HADOOP_HOME=/home/hduser/hadoop  
    
    # Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
    export JAVA_HOME=/usr/lib/jvm/java-7-oracle  
    

    Now, if you have OpenJDK 7, it would look something like this:

    export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
    

    The thing to watch out for here is the folder where Java resides for the amd64 version. If the above does not work, you can look in that folder, or set the Java that will be in use with:

    sudo update-alternatives --config java  
    

    Now for some helpful aliases:

    # Some convenient aliases and functions for running Hadoop-related commands  
    unalias fs &> /dev/null   
    alias fs="hadoop fs"    
    unalias hls &> /dev/null  
    alias hls="fs -ls"  
    
    # Add Hadoop bin/ directory to PATH  
    export PATH=$PATH:$HADOOP_HOME/bin  
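
    After saving, reload the file and check that the variables resolve (a minimal sketch to sanity-check the setup):

        # Apply the changes to the current shell and verify them
        source /home/hduser/.bashrc
        echo "$HADOOP_HOME"
        $HADOOP_HOME/bin/hadoop version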
    
  10. Configuring Hadoop

    The following are the scripts we will use to start and stop Hadoop:

    start-dfs.sh - Starts the Hadoop DFS daemons, the namenode and datanodes. Use this before start-mapred.sh

    stop-dfs.sh - Stops the Hadoop DFS daemons.

    start-mapred.sh - Starts the Hadoop Map/Reduce daemons, the jobtracker and tasktrackers.

    stop-mapred.sh - Stops the Hadoop Map/Reduce daemons.

    start-all.sh - Starts all Hadoop daemons, the namenode, datanodes, the jobtracker and tasktrackers. Deprecated; use start-dfs.sh then start-mapred.sh

    stop-all.sh - Stops all Hadoop daemons. Deprecated; use stop-mapred.sh then stop-dfs.sh

    But before we start using them, we need to modify several files in the conf folder.

    hadoop-env.sh

    Look for the file hadoop-env.sh; we only need to update the JAVA_HOME variable in this file:

    sudo gedit /home/hduser/hadoop/conf/hadoop-env.sh
    

    or

    sudo nano /home/hduser/hadoop/conf/hadoop-env.sh
    

    Then change the following line:

    # export JAVA_HOME=/usr/lib/j2sdk1.5-sun
    

    To

    export JAVA_HOME=/usr/lib/jvm/java-7-oracle
    

    Note: if you get an "Error: JAVA_HOME is not set" error while starting the services, you forgot to uncomment the previous line (just remove the #).

    core-site.xml

    Now we need to create a temp directory for the Hadoop framework. If you need this environment for testing or quick prototyping (e.g. developing simple Hadoop programs for your personal tests), I suggest creating this folder under the /home/hduser/ directory. Otherwise, you should create it in a shared place (like /usr/local), but you may face some security issues. To avoid the exceptions that security can cause (like java.io.IOException), I created the tmp folder under hduser's space.

    To create this folder, type the following command:

    sudo mkdir /home/hduser/tmp   
    

    Please note that if you want to add another admin user (e.g. hduser2 in the hadoop group), you should grant it read and write permissions on this folder with the following commands:

    sudo chown hduser2:hadoop /home/hduser/tmp
    sudo chmod 755 /home/hduser/tmp
    sudo chmod -R a+w /home/hduser/tmp    # this is valid in Ubuntu 12.04
    

    Now we can open hadoop/conf/core-site.xml to edit the hadoop.tmp.dir entry. We can open core-site.xml using a text editor:

    sudo gedit /home/hduser/hadoop/conf/core-site.xml  
    

    or

    nano /home/hduser/hadoop/conf/core-site.xml
    

    Then add the following configuration between the <configuration> ... </configuration> XML elements (make sure the hadoop.tmp.dir value matches the temp directory you created above):

    <property>
      <name>hadoop.tmp.dir</name>
      <value>/app/hadoop/tmp</value>
      <description>A base for other temporary directories.</description>
    </property>
    
    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:54310</value>
      <description>The name of the default file system.  A URI whose
      scheme and authority determine the FileSystem implementation.  The
      uri's scheme determines the config property (fs.SCHEME.impl) naming
      the FileSystem implementation class.  The uri's authority is used to
      determine the host, port, etc. for a filesystem.</description>
    </property>
    

    Now edit mapred-site.xml

    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:54311</value>
      <description>The host and port that the MapReduce job tracker runs
      at.  If "local", then jobs are run in-process as a single map
      and reduce task.
      </description>
    </property>
    

    Now edit hdfs-site.xml

    <property>
      <name>dfs.replication</name>
      <value>1</value>
      <description>Default block replication.
      The actual number of replications can be specified when the file is created.
      The default is used if replication is not specified in create time.
      </description>
    </property> 
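
    If you want to catch typos in these files before starting anything, you can check that each one is well-formed XML (a minimal sketch; xmllint comes from the libxml2-utils package):

        # No output means the file parses as valid XML
        xmllint --noout /home/hduser/hadoop/conf/core-site.xml
        xmllint --noout /home/hduser/hadoop/conf/mapred-site.xml
        xmllint --noout /home/hduser/hadoop/conf/hdfs-site.xml
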
  11. Formatting NameNode

    Now you can start working on the node. First, format it:

    ~/hadoop/bin/hadoop namenode -format
    

    or

    /home/hduser/hadoop/bin/hadoop namenode -format
    

    This formats the NameNode of your HDFS. You should not do this step while the system is running; it is usually done only once, the first time you install.

  12. Starting Hadoop Cluster

    You will need to navigate to the hadoop/bin directory and run the ./start-all.sh script:

    cd ~/hadoop/bin/
    ./start-all.sh
    

    If you have a different version from the one shown in the guides (which you will most likely have if you installed from the PPA or a newer release), then try it this way:

    cd ~/hadoop/bin/
    ./start-dfs.sh
    ./start-mapred.sh
    

    This will start a Namenode, Datanode, Jobtracker and a Tasktracker on your machine.
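
    Hadoop 1.x also exposes web interfaces for the daemons, which you can probe once they are up (the ports below are the 1.x defaults):

        # NameNode web UI (default port 50070)
        curl -sI http://localhost:50070/ | head -n 1
        # JobTracker web UI (default port 50030)
        curl -sI http://localhost:50030/ | head -n 1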

  13. Checking if Hadoop is running

    There is a nice tool called jps. You can use it to check that all the services are up. In your hadoop bin folder, type:

    jps
    

    It should show you all Hadoop-related processes.
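
    A quick way to script that check (a minimal sketch; the names are the standard Hadoop 1.x daemons):

        # Each of these should appear exactly once on a single-node setup:
        # NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker
        jps | grep -E 'NameNode|DataNode|JobTracker|TaskTracker'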

    NOTE: Since I did this around 6 months ago, let me know if any part is not working.

Hadoop Using Juju (A Juju Charm for Hadoop)

Taken from Charming Hadoop

I will assume the following is already set up:

  • You have a server with Juju already set up
  • You have access to the server (Locally or remotely)
  • You have Juju configured and ready to start adding charms
  • You are using 12.04 (This is because I tested all of this with 12.04)
  • You have already configured ~/.juju/environments.yaml with the information for the server you will be using, including the PPA origin.

OK, now follow these steps to get a Hadoop service running:

  1. Bootstrap the environment for Hadoop

    juju bootstrap
    

    Wait until it finishes, then check that it is connecting correctly:

    juju status
    
  2. Deploy Hadoop (Master and Slave)

    juju deploy --constraints="instance-type=m1.large" hadoop hadoop-master   
    
    juju deploy --constraints="instance-type=m1.medium" -n 5 hadoop hadoop-slave
    
  3. Create Relations

    juju add-relation hadoop-master:namenode hadoop-slave:datanode  
    
    juju add-relation hadoop-master:jobtracker hadoop-slave:tasktracker
    
  4. Expose Hadoop (since you have already deployed and created the relations, the service should be running)

    juju expose hadoop-master
    

    And check the status to see if it is working correctly:

    juju status hadoop-master
    

At this point you have Hadoop running. There are many more things you can do, which can be found in the link provided or in the official Juju Charm for Hadoop.


I successfully installed Hadoop by setting the path of JAVA_HOME to /usr/lib/jvm/java-6-openjdk-amd64.


I think the path of java-6-openjdk in 12.04 is /usr/lib/jvm/java-6-openjdk-i386 if you have a 32-bit OS.


To be able to install sun-java with the apt-get command, you need to add a line to a file called sources.list. This file can be found at /etc/apt/sources.list.

Open the file using this command:

sudo nano /etc/apt/sources.list

Then at the very end (bottom) of that file, copy/paste this line:

deb http://us.archive.ubuntu.com/ubuntu/ hardy multiverse

Now press Ctrl+X to exit, and Y to save.


Now type the command:

sudo apt-get update 

And when that is done, you can successfully run the command:

sudo apt-get install sun-java6-jdk
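
Once that finishes, you can confirm which Java the system will use (a minimal sketch):

    # Select sun-java6 if more than one JVM is installed
    sudo update-alternatives --config java

    # Verify the active version
    java -version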

It seems that you missed something while updating .bashrc, or the JDK was installed into a different folder. I followed the same steps from the blog you mentioned and faced some problems too. You should make sure the JDK is installed in the correct path and that JAVA_HOME has the correct value.


For a more up-to-date tutorial (not sure about the differences), look at the Hadoop Screencasts video tutorials. They provide the video and the actual installation commands underneath. Also, if you email the author, he is very happy to respond and help you out if you get stuck with anything.

These instructions are largely similar to the ones that @Luis replied with.

