Hadoop (Day 03) -- Introduction and Installation

First Part: Preface

1. What is mass data processing?

Mass data processing deals with very large volumes of data that vary in complexity, in the size of individual files, and in source. The workload is mainly storage-intensive and may also involve high concurrency.

Examples: 12306, Taobao....



2. Origin:

Hadoop and its related technologies grew out of three papers published by Google:

HDFS ------------- GFS

MAPREDUCE -------- MapReduce

HBASE ------------ BigTable


3.Introduction:

Apache Hadoop (/həˈduːp/) is an open-source software framework used for distributed storage and processing of datasets of big data using the MapReduce programming model. It consists of computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.[2]

The core of Apache Hadoop consists of a storage part, known as Hadoop Distributed File System (HDFS), and a processing part which is a MapReduce programming model. Hadoop splits files into large blocks and distributes them across nodes in a cluster. It then transfers packaged code into nodes to process the data in parallel. This approach takes advantage of data locality,[3] where nodes manipulate the data they have access to. This allows the dataset to be processed faster and more efficiently than it would be in a more conventional supercomputer architecture that relies on a parallel file system where computation and data are distributed via high-speed networking. -- Wiki

In practice, Hadoop is not a single technology but the sum of a family of related projects:

HDFS --------------- Mass data storage (large files, batch access)
MAPREDUCE ---------- Distributed data processing (cf. SPARK)
HIVE --------------- Hadoop data warehouse (HQL)
HBASE -------------- Hadoop column store (small files, real-time access)
SQOOP/PIG ---------- ETL tools (data cleaning, conversion, loading)
FLUME -------------- Data channel
AZKABAN ------------ Job scheduling
AVRO --------------- Serialization
ZOOKEEPER ---------- Global coordinator





4. Installation

1. Map the hostname to the machine's IP address:

vi /etc/hosts
192.168.16.100 hadoop


Get the IP address:
ifconfig -a
Get the hostname:
hostname
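
A quick sanity check that the mapping works (assuming the hostname and IP used above):

# resolve the hostname through /etc/hosts and confirm it answers
getent hosts hadoop
ping -c 1 hadoop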


2. Remove the pre-installed OpenJDK:
rpm -qa |grep java
java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
tzdata-java-2013g-1.el6.noarch
java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64


rpm -e java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64 --nodeps
rpm -e java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64 --nodeps
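
Query again to confirm the OpenJDK packages are gone (only tzdata-java should remain):

rpm -qa | grep java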


3. Stop the firewall:

service iptables stop
service ip6tables stop


Disable SELinux:
vi /etc/selinux/config
Change SELINUX=enforcing to SELINUX=disabled

Disable the firewall permanently:
chkconfig iptables off
chkconfig ip6tables off
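
To confirm the changes (SELinux only fully switches to disabled after a reboot):

service iptables status
chkconfig --list iptables
getenforce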


4. Set up a local YUM repository from the installation DVD:

vim /etc/yum.repos.d/rhel-source.repo
mount /dev/sr0 /mnt

cd /media/RHEL_6.5\ x86_64\ Disc\ 1/Packages/

pwd

mkdir /rpms_YUM

ll /

cp * /rpms_YUM

cd /rpms_YUM

rpm -ivh deltarpm-3.5-0.5.20090913git.el6.x86_64.rpm 
rpm -ivh python-deltarpm-3.5-0.5.20090913git.el6.x86_64.rpm
rpm -ivh createrepo-0.9.9-18.el6.noarch.rpm 

createrepo .   

rm -rf /etc/yum.repos.d/*

vi /etc/yum.repos.d/yum.local.repo

[local]
name=yum local repo
baseurl=file:///rpms_YUM
enabled=1
gpgcheck=0

yum clean all
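
Check that YUM now sees only the local repository defined above:

yum repolist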


5. Install the required packages with YUM:

yum -y install openssh*
yum -y install man*
yum -y install compat-libstdc++-33*
yum -y install libaio-0.*
yum -y install libaio-devel*
yum -y install sysstat-9.*
yum -y install glibc-2.*
yum -y install glibc-devel-2.* glibc-headers-2.*
yum -y install ksh-2*
yum -y install libgcc-4.*
yum -y install libstdc++-4.*
yum -y install libstdc++-4.*.i686*
yum -y install libstdc++-devel-4.*
yum -y install gcc-4.*x86_64*
yum -y install gcc-c++-4.*x86_64*
yum -y install elfutils-libelf-0*x86_64* elfutils-libelf-devel-0*x86_64*
yum -y install elfutils-libelf-0*i686* elfutils-libelf-devel-0*i686*
yum -y install libtool-ltdl*i686*
yum -y install ncurses*i686*
yum -y install ncurses*
yum -y install readline*
yum -y install unixODBC*
yum -y install zlib
yum -y install zlib*
yum -y install openssl*
yum -y install patch
yum -y install git
yum -y install lzo-devel zlib-devel gcc autoconf automake libtool
yum -y install lzop
yum -y install lrzsz
yum -y install lzo-devel  zlib-devel  gcc autoconf automake libtool
yum -y install nc
yum -y install glibc
yum -y install gzip
yum -y install zlib
yum -y install gcc
yum -y install gcc-c++
yum -y install make
yum -y install protobuf
yum -y install protoc
yum -y install cmake
yum -y install openssl-devel
yum -y install ncurses-devel
yum -y install unzip
yum -y install telnet
yum -y install telnet-server
yum -y install wget
yum -y install svn
yum -y install ntpdate


6. Disable unnecessary services:
chkconfig autofs off
chkconfig acpid off
chkconfig sendmail off
chkconfig cups-config-daemon off
chkconfig cups off
chkconfig xfs off
chkconfig lm_sensors off
chkconfig gpm off
chkconfig openibd off
chkconfig pcmcia off
chkconfig cpuspeed off
chkconfig nfslock off
chkconfig iptables off
chkconfig ip6tables off
chkconfig rpcidmapd off
chkconfig apmd off
chkconfig arptables_jf off
chkconfig microcode_ctl off
chkconfig rpcgssd off


7. Install Java:
Transfer the JDK package to /usr with Xshell (or another SFTP tool), then:


tar xzvf jdk-8u45-linux-x64.tar.gz


mv jdk1.8.0_45 java

vi /etc/profile

export JAVA_HOME=/usr/java
export JRE_HOME=/usr/java/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin

source /etc/profile
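
Confirm the environment took effect:

java -version
echo $JAVA_HOME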

Second Part: HADOOP


1.

Transfer the Hadoop package to /usr/local.


2.
tar xzvf hadoop-2.7.2.tar.gz

mv hadoop-2.7.2 hadoop

3.

vi /etc/profile

export HADOOP_HOME=/usr/local/hadoop
#export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib:$HADOOP_PREFIX/lib/native"
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
export HADOOP_COMMON_LIB_NATIVE_DIR=/usr/local/hadoop/lib/native
export HADOOP_OPTS="-Djava.library.path=/usr/local/hadoop/lib"
#export HADOOP_ROOT_LOGGER=DEBUG,console
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

source /etc/profile
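
Confirm the Hadoop binaries are now on the PATH:

hadoop version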

************************************************************************
************************************************************************
Modify the configuration files (important):


cd /usr/local/hadoop/etc/hadoop

①:hadoop-env.sh

vim hadoop-env.sh
# around line 27
export JAVA_HOME=/usr/java

②:core-site.xml -- the Hadoop/HDFS core configuration file

vi core-site.xml

<!-- default file system URI (schema) and NameNode address -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop:9000</value>
</property>


<!-- base directory for Hadoop's local data storage -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop/tmp</value>
</property>
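
Optionally pre-create the storage directory referenced above; Hadoop will also create it when the NameNode is formatted:

mkdir -p /var/hadoop/tmp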


③:hdfs-site.xml 


vi hdfs-site.xml
                
<!-- number of HDFS block replicas (1 for a single-node cluster) -->
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<!-- disable HDFS permission checking -->
<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>


④:mapred-site.xml 

mv mapred-site.xml.template mapred-site.xml
vim mapred-site.xml

<!-- run MapReduce on YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>


⑤:yarn-site.xml

vi yarn-site.xml

<!-- define ResourceManager Address -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop</value>
</property>
<!-- how reducers fetch map output: the shuffle service -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>

Set up SSH mutual trust (passwordless login):

sh sshUserSetup.sh -user root -hosts "hadoop" -advanced -noPromptPassphrase
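
sshUserSetup.sh is an external helper script; if it is not at hand, a minimal equivalent for this single-node setup using the standard OpenSSH tools (root user, hostname as above) is:

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa   # generate a key pair with an empty passphrase
ssh-copy-id root@hadoop                    # append the public key to authorized_keys
ssh hadoop hostname                        # should log in without a password prompt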


Start hadoop cluster:


First, format the NameNode (the SHUTDOWN_MSG block below indicates the format completed normally):


hdfs namenode -format

/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop/192.168.16.100
************************************************************/

Start Hadoop:
start-all.sh
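
start-all.sh still works in Hadoop 2.x but prints a deprecation notice; the equivalent split commands are:

start-dfs.sh     # NameNode, DataNode, SecondaryNameNode
start-yarn.sh    # ResourceManager, NodeManager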


Verify with jps; on a single-node cluster all five daemons should be running:
jps
7010 Jps
6387 SecondaryNameNode
6599 ResourceManager
6249 DataNode
6153 NameNode
6698 NodeManager

http://192.168.16.100:50070 (HDFS web UI)
http://192.168.16.100:8088 (YARN/MapReduce web UI)
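
A quick HDFS smoke test (the /test path is just an example):

hdfs dfs -mkdir -p /test
hdfs dfs -put /etc/hosts /test/
hdfs dfs -ls /test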


You may see the following warning. The functionality is actually supported, but Hadoop checks the glibc version automatically: the bundled native library requires glibc 2.14, while this system ships 2.12.

ldd --version
2.12

17/03/16 17:41:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


Resolution:
First, check whether the bundled native library is 32-bit or 64-bit:

file /usr/local/hadoop/lib/native/libhadoop.so.1.0.0

If the output looks like this:
libhadoop.so.1.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped

the library is 64-bit and the warning is harmless.

In that case, simply suppress the WARN message:

vi /usr/local/hadoop/etc/hadoop/log4j.properties

log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
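
Hadoop also has a built-in check that shows which native components it can load:

hadoop checknative -a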


*Install Hadoop with a script (optional):

vi hadoop-install.sh

#!/bin/bash
clear
echo "Transfer your Hadoop package under /usr/local, or your own directory!"
echo
read -p "Your path (absolute path) is: " d
echo
read -p "Finished? Yes -- y! No -- n: " a
echo
case $a in
Y|y)
# pick up the Hadoop tarball in the given directory
f=$(ls -l $d | grep 'hadoop.*\.tar\.gz' | awk '{print $9}')
echo "Is it this file: $f?"
read -p "Yes -- y! No -- n: " b
if [ "$b" == 'y' ];then
cd $d
echo
echo "Preparing tar!!!"
sleep 5
echo
echo "Starting tar!!!"
sleep 2
echo
tar xzvf $d/$f
echo
echo "Finished!!!"
sleep 2
echo
# name of the extracted directory (everything matching hadoop except the tarball)
aa=$(ls -al | grep hadoop | grep -v '\.tar\.gz' | awk '{print $9}')
echo "The result is: $aa"
echo
mv $d/$aa $d/hadoop
else
exit
fi
;;
N|n)
echo "Transfer again, please!!!"
;;
*)
echo "Input error, try again!!!"
esac
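
Usage (assuming the tarball was uploaded to /usr/local):

sh hadoop-install.sh
# enter /usr/local at the path prompt, then answer y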
