
BigData Handler

Handles Velocity, Volume, Variety and Complexity

Installing single node Hadoop 2.2.0 on Ubuntu

Posted on November 2, 2013 by aravindu012

Please find the complete step-by-step process for installing the Hadoop 2.2.0 stable version on Ubuntu, as requested by many of this blog's visitors, friends and subscribers.


The Apache Hadoop 2.2.0 release has significant changes compared to the previous stable release, Apache Hadoop 1.2.1 (setting up Hadoop 1.2.1 can be found here).

In short, this release has a number of changes compared to the earlier 1.2.1 version:

  • YARN – a general-purpose resource management system for Hadoop that allows MapReduce and other data-processing frameworks and services, such as Hive and Pig, to share cluster resources
  • High availability for HDFS
  • HDFS federation and snapshots
  • NFSv3 access to data in HDFS
  • Introduced the ApplicationMaster to manage the application life cycle
  • Support for running Hadoop on Microsoft Windows
  • HDFS – the symlinks feature is disabled and will be removed in future versions
  • The JobTracker has been replaced with the ResourceManager and NodeManager

Before starting into setting up Apache Hadoop 2.2.0, please understand the concepts of Big Data and Hadoop from my previous blog posts:

Big Data Characteristics, Problems and Solution.

What is Apache Hadoop?

Setting up Single node Hadoop Cluster.

Setting up Multi node Hadoop Cluster.

Understanding HDFS architecture (in comic format).

Setting up the environment:

This tutorial gives you the step-by-step process for setting up a single-node Hadoop cluster, so that you can play around with the framework and learn more about it.

This tutorial uses the following software versions; you can download them by clicking the hyperlinks:

If you are using PuTTY to access your Linux box remotely, please install openssh-server by running this command; it also makes configuring SSH access easier in a later part of the installation:

sudo apt-get install openssh-server

Prerequisites:

  1. Installing Java v1.7
  2. Adding dedicated Hadoop system user.
  3. Configuring SSH access.
  4. Disabling IPv6.

Before installing any applications or software, please make sure your list of packages from all repositories and PPAs is up to date, or update it by using this command:

sudo apt-get update

1. Installing Java v1.7:

Running Hadoop requires Java v1.7 or later.

a. Download the latest Oracle Java (JDK) Linux version from the Oracle website by using this command:

wget https://edelivery.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-x64.tar.gz

If the download fails, try the command given below, which sends the cookie header Oracle expects so that you are not asked for a username and password.

wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com" "https://edelivery.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-x64.tar.gz"


b. Unpack the compressed Java binaries in the download directory:

sudo tar xvzf jdk-7u45-linux-x64.tar.gz


c. Create a Java directory under /usr/local/ using mkdir and change to the /usr/local/Java directory by using these commands:

sudo mkdir -p /usr/local/Java
cd /usr/local/Java


d. Copy the unpacked Oracle Java binaries into the /usr/local/Java directory:

sudo cp -r jdk1.7.0_45 /usr/local/Java


e. Edit the system PATH file /etc/profile and add the following system variables to your system path

sudo nano /etc/profile    or  sudo gedit /etc/profile


f. Scroll down to the end of the file using your arrow keys and add the following lines to the end of your /etc/profile file:

JAVA_HOME=/usr/local/Java/jdk1.7.0_45
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
export JAVA_HOME
export PATH


g. Inform your Ubuntu Linux system where your Oracle Java JDK/JRE is located. This will tell the system that the new Oracle Java version is available for use.

sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/Java/jdk1.7.0_45/bin/javac" 1
sudo update-alternatives --set javac /usr/local/Java/jdk1.7.0_45/bin/javac


  • These commands notify the system that the Oracle Java JDK is available for use
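Note that the commands above register only javac. If the java launcher itself is not yet pointing at the new JDK, you can register it the same way; this is an addition to the original steps, assuming the same install path:

sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/Java/jdk1.7.0_45/bin/java" 1
sudo update-alternatives --set java /usr/local/Java/jdk1.7.0_45/bin/java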

h. Reload your system-wide PATH /etc/profile by typing the following command:

. /etc/profile

Test to see if Oracle Java was installed correctly on your system.

java -version
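If the installation succeeded, you should see output similar to the following (the exact version and build strings depend on the JDK you installed):

java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)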


2. Adding a dedicated Hadoop system user.
We will use a dedicated Hadoop user account for running Hadoop. While that's not required, it is recommended, because it helps to separate the Hadoop installation from other software applications and user accounts running on the same machine.

a. Adding group:

sudo addgroup hadoop

b. Creating a user and adding the user to a group:

sudo adduser --ingroup hadoop hduser

It will ask you to provide a new UNIX password and some user information.


3. Configuring SSH access:

SSH key-based authentication is required so that the master node can log in to the slave nodes (and the secondary node) to start and stop them, and also to the local machine if you want to use Hadoop on it. For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost for the hduser user we created in the previous section.

Before this step, make sure that SSH is up and running on your machine and is configured to allow SSH public-key authentication.
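On Ubuntu, a quick way to check that the SSH server is running is:

sudo service ssh status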

Generating an SSH key for the hduser user:
a. Log in as hduser (for example with su - hduser)
b. Run this key-generation command:

ssh-keygen -t rsa -P ""


c. It will ask for the file name in which to save the key; just press Enter so that it generates the key at /home/hduser/.ssh/

d. Enable SSH access to your local machine with this newly created key.

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

e. The final step is to test the SSH setup by connecting to your local machine with the hduser user.

ssh hduser@localhost

This will add localhost permanently to the list of known hosts.
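If ssh still prompts for a password at this point, the usual cause is overly permissive modes on the .ssh files; tightening them is a common fix (not part of the original steps):

chmod 700 $HOME/.ssh
chmod 600 $HOME/.ssh/authorized_keys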


4. Disabling IPv6.

We need to disable IPv6 because Ubuntu uses the 0.0.0.0 IP for different Hadoop configurations, and with IPv6 enabled Hadoop can end up binding to IPv6 addresses instead. You will need to run the following command using a root account:

sudo gedit /etc/sysctl.conf

Add the following lines to the end of the file and reboot the machine to apply the configuration correctly.

#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
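
To apply and verify the change without a full reboot, you can reload sysctl and read the flag back; a value of 1 means IPv6 is disabled:

sudo sysctl -p
cat /proc/sys/net/ipv6/conf/all/disable_ipv6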


Hadoop Installation:

Go to Apache Downloads and download Hadoop version 2.2.0 (prefer to download stable versions).

i. Run the following command to download Hadoop version 2.2.0:

wget http://apache.mirrors.pair.com/hadoop/common/stable2/hadoop-2.2.0.tar.gz


ii. Unpack the compressed Hadoop file by using this command:

tar -xvzf hadoop-2.2.0.tar.gz


iii. Rename the hadoop-2.2.0 directory to hadoop by using the given command:

mv hadoop-2.2.0 hadoop


iv. Move the hadoop package to a location of your choice; I picked /usr/local for my convenience:

sudo mv hadoop /usr/local/


v. Make sure to change the owner of all the files to the hduser user and hadoop group by using this command:

sudo chown -R hduser:hadoop /usr/local/hadoop


Configuring Hadoop:

The following are the files we will edit to configure the single-node Hadoop cluster:

a. yarn-site.xml:
b. core-site.xml
c. mapred-site.xml
d. hdfs-site.xml
e. Update $HOME/.bashrc

These files can be found in the Hadoop configuration directory:

cd /usr/local/hadoop/etc/hadoop


a. yarn-site.xml:

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>


b. core-site.xml:

i. Change the user to “hduser”. Change the directory to /usr/local/hadoop/etc/hadoop and edit the core-site.xml file.

vi core-site.xml

ii. Add the following entry to the file, then save and quit:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>


c. mapred-site.xml:

If this file does not exist, copy mapred-site.xml.template to mapred-site.xml, as shown below.
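Assuming you are still in /usr/local/hadoop/etc/hadoop, that is a simple copy:

cp mapred-site.xml.template mapred-site.xml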

i. Edit the mapred-site.xml file

vi mapred-site.xml

ii. Add the following entry to the file, then save and quit:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>


d. hdfs-site.xml:

i. Edit the hdfs-site.xml file

vi hdfs-site.xml

ii. Create two directories to be used by the namenode and the datanode:

mkdir -p $HADOOP_HOME/yarn_data/hdfs/namenode
mkdir -p $HADOOP_HOME/yarn_data/hdfs/datanode

iii. Add the following entry to the file, then save and quit:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/yarn_data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/yarn_data/hdfs/datanode</value>
  </property>
</configuration>


e. Update $HOME/.bashrc

i. Go back to your home directory and edit the .bashrc file:

vi ~/.bashrc


ii. Add the following configuration lines to the end of the file:

# Set Hadoop-related environment variables
export HADOOP_PREFIX=/usr/local/hadoop
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
# Native Path
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"
#Java path
export JAVA_HOME='/usr/local/Java/jdk1.7.0_45'
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/sbin
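
Reload .bashrc so the new variables take effect in your current shell:

source ~/.bashrc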


Formatting and Starting/Stopping the HDFS filesystem via the NameNode:

i. The first step in starting up your Hadoop installation is formatting the Hadoop filesystem, which is implemented on top of the local filesystem of your cluster. You need to do this the first time you set up a Hadoop cluster. Do not format a running Hadoop filesystem, as you will lose all the data currently in the cluster (in HDFS). To format the filesystem (which simply initializes the directory specified by the dfs.namenode.name.dir property), run the following command:

hadoop namenode -format


ii. Start Hadoop Daemons by running the following commands:

Name node:

$ hadoop-daemon.sh start namenode

Data node:

$ hadoop-daemon.sh start datanode

Resource Manager:

$ yarn-daemon.sh start resourcemanager


Node Manager:

$ yarn-daemon.sh start nodemanager

Job History Server:

$ mr-jobhistory-daemon.sh start historyserver
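
At this point you can confirm that all five daemons are up with the JDK's jps command; the output should list entries like these (the process IDs will differ):

jps

2287 NameNode
2342 DataNode
2756 ResourceManager
2987 NodeManager
3125 JobHistoryServer
3210 Jps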


iii. Stop Hadoop by running the following commands:

stop-dfs.sh
stop-yarn.sh

Hadoop Web Interfaces:

Hadoop comes with several web interfaces, which are available by default at the following locations:

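With the stock Hadoop 2.2.0 port configuration (assuming you did not override the defaults), the main web UIs are:

  • NameNode – http://localhost:50070/
  • ResourceManager – http://localhost:8088/
  • MapReduce JobHistory Server – http://localhost:19888/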

With this, we are done setting up a single-node Hadoop 2.2.0 cluster. I hope this step-by-step guide helps you set up the same environment at your end.

Please leave a comment/suggestion in the comment section; I will try to answer as soon as possible. And don't forget to subscribe to the newsletter and leave a Facebook like :)

 
Categories: Hadoop

4 Comments

  1. BigData

Hi, this is a great tutorial. It really helped me. But I have one problem: when I type jps after hadoop-daemon.sh start namenode or after hadoop-daemon.sh start datanode, I do not get any NameNode printed in the output; only a number and Jps are printed. Have I done something wrong?

    • Hi Arundhati

      I am glad that this tutorial helped you.
Coming to your question/issue: please check whether all the configuration files are configured correctly (setting up the Hadoop home directory), or change to the hadoop bin directory (/usr/local/hadoop/bin) and run the start commands from there; it should work.

  2. BigData

Hi, I am getting this error:

    13/11/29 16:12:32 FATAL namenode.NameNode: Exception in namenode join
    java.io.IOException: Cannot create directory /usr/local/hadoop/yarn_data/hdfs/namenode/current
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:301)
    at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:523)
    at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:544)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:147)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:837)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1213)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1320)
    13/11/29 16:12:32 INFO util.ExitUtil: Exiting with status 1
    13/11/29 16:12:32 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************

    • aravindu012

Hi,

Can you please check the directory permissions?

Make sure to change the owner of all the files to the hduser user and hadoop group by using this command:
sudo chown -R hduser:hadoop /usr/local/hadoop
