First Part: Preface
1.What is mass data processing?
Large volumes of data of varying complexity, with individual files of different sizes and from different sources; the workload is mainly storage-intensive and may also involve high concurrency.
For example: 12306, Taobao...
2.Why did it emerge?
The development of Hadoop and related technologies. More precisely, we owe thanks to three papers published by Google:
HDFS------GFS
MAPREDUCE------MAPREDUCE
HBASE------BIGTABLE
3.Introduction:
Apache Hadoop ( /həˈduːp/) is an open-source software framework used for distributed storage and processing of dataset of big data using the MapReduce programming model. It consists of computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.[2]
The core of Apache Hadoop consists of a storage part, known as Hadoop Distributed File System (HDFS), and a processing part which is a MapReduce programming model. Hadoop splits files into large blocks and distributes them across nodes in a cluster. It then transfers packaged code into nodes to process the data in parallel. This approach takes advantage of data locality,[3] where nodes manipulate the data they have access to. This allows the dataset to be processed faster and more efficiently than it would be in a more conventional supercomputer architecture that relies on a parallel file system where computation and data are distributed via high-speed networking. -- Wiki
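The map/shuffle/reduce flow described above can be mimicked on a single machine with a plain shell pipeline. This is only a local analogy of the classic word-count job, not real Hadoop:

```shell
# "map": emit one word per line; "shuffle": sort groups identical keys;
# "reduce": uniq -c counts each group; final sort ranks by frequency
echo "hadoop hdfs hadoop yarn" | tr ' ' '\n' | sort | uniq -c | sort -rn
```

In real MapReduce the same three phases run in parallel across the cluster, with the shuffle moving data between nodes.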
Actually, Hadoop is not just a single technique but a family of related technologies:
HDFS---------------Mass data storage (large files, batch access)
MAPREDUCE----------DPA----------SPARK
HIVE---------------HADOOP data warehouse--------HQL
HBASE--------------HADOOP columnar database-----(small files, real-time access)
SQOOP/PIG----------ETL tools---------------data cleaning, conversion, loading
FLUME--------------Data channel
AZKABAN------------Job scheduling
AVRO---------------Serialization
ZOOKEEPER----------Global coordinator
4.Installation
1.
vi /etc/hosts
192.168.16.100 hadoop
Get the IP address:
ifconfig -a
Get hostname:
hostname
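A quick way to confirm the /etc/hosts entry took effect is to resolve the name and ping it once (the hadoop/192.168.16.100 pair is the one configured above):

```shell
# getent consults /etc/hosts, so this should print "192.168.16.100 hadoop"
getent hosts hadoop
# one ping confirms the host answers at that address
ping -c 1 hadoop
```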
2.Remove the existing Java:
rpm -qa |grep java
java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
tzdata-java-2013g-1.el6.noarch
java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64
rpm -e java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64 --nodeps
rpm -e java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64 --nodeps
3.Stop the firewall:
service iptables stop
service ip6tables stop
Disable SELinux:
vi /etc/selinux/config
Change SELINUX=enforcing to SELINUX=disabled
Disable the firewall permanently:
chkconfig iptables off
chkconfig ip6tables off
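The SELinux edit above can also be scripted instead of done by hand in vi; a sed sketch, assuming the file still contains the stock SELINUX=enforcing line:

```shell
# keep a backup, then flip enforcing -> disabled in place
cp /etc/selinux/config /etc/selinux/config.bak
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config
# the change applies after a reboot; disable enforcement now as well
setenforce 0
```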
4.Set up a local YUM repository:
vim /etc/yum.repos.d/rhel-source.repo
mount /dev/sr0 /mnt
cd /media/RHEL_6.5\ x86_64\ Disc\ 1/Packages/
pwd
mkdir /rpms_YUM
ll /
cp * /rpms_YUM
cd /rpms_YUM
rpm -ivh deltarpm-3.5-0.5.20090913git.el6.x86_64.rpm
rpm -ivh python-deltarpm-3.5-0.5.20090913git.el6.x86_64.rpm
rpm -ivh createrepo-0.9.9-18.el6.noarch.rpm
createrepo .
rm -rf /etc/yum.repos.d/*
vi /etc/yum.repos.d/yum.local.repo
[local]
name=yum local repo
baseurl=file:///rpms_YUM
enabled=1
gpgcheck=0
yum clean all
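The repo file can also be written in one shot with a heredoc instead of vi; the content is exactly what is shown above:

```shell
# write the local repo definition in a single command
cat > /etc/yum.repos.d/yum.local.repo <<'EOF'
[local]
name=yum local repo
baseurl=file:///rpms_YUM
enabled=1
gpgcheck=0
EOF
yum clean all
yum repolist        # the "local" repo should now be listed
```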
5.Install YUM packages:
yum -y install openssh*
yum -y install man*
yum -y install compat-libstdc++-33*
yum -y install libaio-0.*
yum -y install libaio-devel*
yum -y install sysstat-9.*
yum -y install glibc-2.*
yum -y install glibc-devel-2.* glibc-headers-2.*
yum -y install ksh-2*
yum -y install libgcc-4.*
yum -y install libstdc++-4.*
yum -y install libstdc++-4.*.i686*
yum -y install libstdc++-devel-4.*
yum -y install gcc-4.*x86_64*
yum -y install gcc-c++-4.*x86_64*
yum -y install elfutils-libelf-0*x86_64* elfutils-libelf-devel-0*x86_64*
yum -y install elfutils-libelf-0*i686* elfutils-libelf-devel-0*i686*
yum -y install libtool-ltdl*i686*
yum -y install ncurses*i686*
yum -y install ncurses*
yum -y install readline*
yum -y install unixODBC*
yum -y install zlib
yum -y install zlib*
yum -y install openssl*
yum -y install patch
yum -y install git
yum -y install lzo-devel zlib-devel gcc autoconf automake libtool
yum -y install lzop
yum -y install lrzsz
yum -y install lzo-devel zlib-devel gcc autoconf automake libtool
yum -y install nc
yum -y install glibc
yum -y install gzip
yum -y install zlib
yum -y install gcc
yum -y install gcc-c++
yum -y install make
yum -y install protobuf
yum -y install protoc
yum -y install cmake
yum -y install openssl-devel
yum -y install ncurses-devel
yum -y install unzip
yum -y install telnet
yum -y install telnet-server
yum -y install wget
yum -y install svn
yum -y install ntpdate
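The long run of `yum -y install` commands follows one pattern, so a loop with a failure list does the same job; the package names here are just a representative subset of the list above:

```shell
# install each package and remember any that fail, for review afterwards
failed=""
for pkg in openssh man gcc gcc-c++ make cmake zlib-devel openssl-devel \
           ncurses-devel unzip telnet wget svn ntpdate; do
    yum -y install "$pkg" || failed="$failed $pkg"
done
if [ -n "$failed" ]; then echo "Failed to install:$failed"; fi
```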
6.Disable unnecessary services:
chkconfig autofs off
chkconfig acpid off
chkconfig sendmail off
chkconfig cups-config-daemon off
chkconfig cups off
chkconfig xfs off
chkconfig lm_sensors off
chkconfig gpm off
chkconfig openibd off
chkconfig pcmcia off
chkconfig cpuspeed off
chkconfig nfslock off
chkconfig iptables off
chkconfig ip6tables off
chkconfig rpcidmapd off
chkconfig apmd off
chkconfig arptables_jf off
chkconfig microcode_ctl off
chkconfig rpcgssd off
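The chkconfig calls above can likewise be collapsed into a loop; errors for services that do not exist on a given machine are silenced:

```shell
# turn each service off at boot; ignore ones that are not installed
for svc in autofs acpid sendmail cups-config-daemon cups xfs lm_sensors gpm \
           openibd pcmcia cpuspeed nfslock iptables ip6tables rpcidmapd \
           apmd arptables_jf microcode_ctl rpcgssd; do
    chkconfig "$svc" off 2>/dev/null
done
```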
7.Install Java:
Transfer the JDK package to /usr with Xshell, then:
tar xzvf jdk-8u45-linux-x64.tar.gz
mv jdk1.8.0_45 java
vi /etc/profile
export JAVA_HOME=/usr/java
export JRE_HOME=/usr/java/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
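After sourcing /etc/profile, a quick check confirms the JDK is wired up:

```shell
echo "$JAVA_HOME"     # expect /usr/java
java -version         # expect java version "1.8.0_45"
which java            # expect /usr/java/bin/java
```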
Second Part: HADOOP
1.
Transfer the Hadoop package to /usr/local
2.
tar xzvf hadoop-2.7.2.tar.gz
mv hadoop-2.7.2 hadoop
3.
vi /etc/profile
export HADOOP_HOME=/usr/local/hadoop
#export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib:$HADOOP_PREFIX/lib/native"
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
export HADOOP_COMMON_LIB_NATIVE_DIR=/usr/local/hadoop/lib/native
export HADOOP_OPTS="-Djava.library.path=/usr/local/hadoop/lib"
#export HADOOP_ROOT_LOGGER=DEBUG,console
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile
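Before editing any configuration, verify that the hadoop command resolves from the new PATH:

```shell
echo "$HADOOP_HOME"   # expect /usr/local/hadoop
hadoop version        # the first line should read: Hadoop 2.7.2
```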
************************************************************************
************************************************************************
Modify the configuration files (important!)
cd /usr/local/hadoop/etc/hadoop
①:hadoop-env.sh
vim hadoop-env.sh
#27 line
export JAVA_HOME=/usr/java
②:core-site.xml -- the Hadoop/HDFS core configuration file
vi core-site.xml
<!-- define Hadoop's file system schema (URI), i.e. the NameNode address -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop:9000</value>
</property>
<!-- define hadoop's storage path -->
<property>
<name>hadoop.tmp.dir</name>
<value>/var/hadoop/tmp</value>
</property>
③:hdfs-site.xml
vi hdfs-site.xml
<!-- define the number of HDFS replicas -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- disable permission checking -->
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
④:mapred-site.xml
mv mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
<!-- run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
⑤:yarn-site.xml
vi yarn-site.xml
<!-- define ResourceManager Address -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop</value>
</property>
<!-- how reducers fetch data -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
Set up SSH mutual trust:
sh sshUserSetup.sh -user root -hosts "hadoop" -advanced -noPromptPassphrase
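If the sshUserSetup.sh helper script is not at hand, the same single-node trust can be built manually with standard OpenSSH commands:

```shell
# generate a passphrase-less key pair in the default location
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# authorize the key for logins to this host
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# should now run without a password prompt
ssh hadoop date
```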
Start hadoop cluster:
First, format NAMENODE:
hdfs namenode -format
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop/192.168.16.100
************************************************************/
Start Hadoop:
start-all.sh
Verify:
jps
7010 Jps
6387 SecondaryNameNode (important)
6599 ResourceManager (significant)
6249 DataNode (unnecessary)
6153 NameNode (important)
6698 NodeManager (unnecessary)
http://192.168.16.100:50070 (HDFS manage view)
http://192.168.16.100:8088 (MR manage view)
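Beyond jps and the web pages, a short round trip through HDFS proves the cluster can actually store and return data:

```shell
# create a directory, upload a small file, read it back, clean up
hdfs dfs -mkdir -p /smoke
echo "hello hdfs" > /tmp/smoke.txt
hdfs dfs -put -f /tmp/smoke.txt /smoke/
hdfs dfs -cat /smoke/smoke.txt     # should print: hello hdfs
hdfs dfs -rm -r /smoke
```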
ldd --version shows glibc 2.12 here, and the following warning may appear when running Hadoop commands:
17/03/16 17:41:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Resolution:
First, check whether our Hadoop native library is 64-bit:
file /usr/local/hadoop/lib/native/libhadoop.so.1.0.0
If it gives the following result:
libhadoop.so.1.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
then it is 64-bit, and we can suppress the WARN message with the following setting:
vi /usr/local/hadoop/etc/hadoop/log4j.properties
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
*Install Hadoop with a script:
vi hadoop-install.sh
#!/bin/bash
clear
echo "Transfer your Hadoop package under /usr/local, or your own directory!"
echo
read -p "Your path (absolute path) is: " d
echo
read -p "Finished? Yes -- y! No -- n: " a
echo
case $a in
Y|y)
    f=`ls -l "$d" | grep 'hadoop.*\.tar\.gz' | awk '{print $9}'`
    echo "Is it this file: $f?"
    read -p "Yes -- y! No -- n: " b
    if [ "$b" == 'y' ];then
        cd "$d"
        echo
        echo "Preparing tar!!!"
        sleep 5
        echo
        echo "Starting tar!!!"
        sleep 2
        echo
        tar xzvf "$d/$f"
        echo
        echo "Finished!!!"
        sleep 2
        echo
        aa=`ls -al | grep hadoop | grep -v '\.tar\.gz' | awk '{print $9}'`
        echo "The result is: $aa"
        echo
        mv "$d/$aa" "$d/hadoop"
    else
        exit
    fi
    ;;
N|n)
    echo "Transfer again, please!!!"
    ;;
*)
    echo "Input error, try again!!!"
    ;;
esac