-
Deploying HDFS (for more accurate and detailed steps see the book: CDH4-Installation-Guide, p. 104 onward)
-
Set up the network hostnames
Edit the hosts file:
vim /etc/hosts
192.168.1.2 server1
192.168.1.3 server2
192.168.1.4 server3
192.168.1.5 server4
Perform this step on server1, then push the file to the other nodes in the cluster with scp.
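The copy step above can be sketched as a small loop. The hostnames come from the hosts file above; root SSH access is an assumption. With DRYRUN=1 the function only prints the commands, so the sketch can be checked without touching any server:

```shell
#!/bin/sh
# Push /etc/hosts from server1 to the remaining nodes via scp.
# Set DRYRUN=1 to print the commands instead of executing them.
distribute_hosts() {
  for host in server2 server3 server4; do
    if [ -n "$DRYRUN" ]; then
      echo scp /etc/hosts "root@$host:/etc/hosts"
    else
      scp /etc/hosts "root@$host:/etc/hosts"
    fi
  done
}
```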
-
HDFS architecture
About Hadoop: \cdh\hadoop-2.2.0\share\doc\hadoop\index.html
Remember this URL: http://archive.cloudera.com/cdh4/
HDFS is a highly fault-tolerant distributed file system designed to run on inexpensive commodity hardware. It provides high-throughput data access and suits applications with very large data sets. Its architecture is master/slave overall and consists of the following components: Client, NameNode, Secondary NameNode, and DataNode.
-
Client
The client accesses files in HDFS by interacting with the NameNode and the DataNodes.
-
NameNode
The NameNode is the hub of the entire system.
HDFS distributed file system: the core
namenode: stores the metadata
datanode: stores the actual file data
jobtracker: allocates resources and monitors processes
Only the reduce output is stored in HDFS.
HDFS replica placement policy: the first replica goes on the local (writer's) node;
the second replica on a node in a different rack;
the third replica on another node in the same remote rack as the second.
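The placement rule can be sketched as below. The rack_of mapping is hypothetical (a real cluster resolves racks through a configured topology script), and server5 is added only to make the second rack non-trivial:

```shell
#!/bin/sh
# Sketch of the default HDFS block placement for replication factor 3.
# rack_of is a made-up mapping; HDFS resolves racks via a topology script.
rack_of() {
  case "$1" in
    server2|server3) echo /rack1 ;;
    server4|server5) echo /rack2 ;;
  esac
}

place_replicas() {
  writer=$1
  nodes="server2 server3 server4 server5"
  echo "replica1 -> $writer $(rack_of "$writer")"    # 1st: the writer's own node
  second=""
  for n in $nodes; do                                # 2nd: any node on a different rack
    if [ "$(rack_of "$n")" != "$(rack_of "$writer")" ]; then
      second=$n
      break
    fi
  done
  echo "replica2 -> $second $(rack_of "$second")"
  for n in $nodes; do                                # 3rd: same rack as the 2nd, different node
    if [ "$n" != "$second" ] && [ "$(rack_of "$n")" = "$(rack_of "$second")" ]; then
      echo "replica3 -> $n $(rack_of "$n")"
      break
    fi
  done
}
```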
Modules deployed on each node:
server1: RM (ResourceManager), HS (JobHistory Server), PS (Web App Proxy Server), ZK-server, client
server2: NN (NameNode), NM (NodeManager), DN (DataNode), MR (MapReduce), ZK, client
server3: NM, DN, MR, ZK, client
server4: NM, DN, MR, ZK, client
chmod og+r <file>: make a file (or directory) readable by group and others
cat /etc/passwd
cat /etc/group
Configure CDH4 across the cluster
Create cloudera-cdh4.repo (scp it to all nodes):
baseurl = http://192.168.1.2/cdh/4/
gpgkey = http://192.168.1.2/cdh/RPM-GPG-KEY-cloudera
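A complete cloudera-cdh4.repo might look like the following sketch. The baseurl and gpgkey values come from the notes above; the section name, name line, and gpgcheck setting are assumptions about a typical yum repo file:

```ini
[cloudera-cdh4]
name=Cloudera CDH4 (local mirror)
baseurl=http://192.168.1.2/cdh/4/
gpgkey=http://192.168.1.2/cdh/RPM-GPG-KEY-cloudera
gpgcheck=1
```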
server1:
yum install zookeeper-server
yum install hadoop-yarn-resourcemanager
yum install hadoop-mapreduce-historyserver
yum install hadoop-yarn-proxyserver
yum install hadoop-client
scp cloudera-cdh4.repo to server2, server3, and server4 as well
server2:
yum install hadoop-hdfs-namenode
yum install hadoop-yarn-nodemanager
yum install hadoop-hdfs-datanode
yum install hadoop-mapreduce
yum install zookeeper
yum install hadoop-client
server3, server4:
yum install hadoop-yarn-nodemanager
yum install hadoop-hdfs-datanode
yum install hadoop-mapreduce
yum install zookeeper
yum install hadoop-client
Copy out the default configuration files (mind the directories): (rm /var/lib/hadoop-hdfs/cache/hdfs/dfs/* -rf)
cp -r /etc/hadoop/conf.dist /opt/hadoop/conf/
alternatives --verbose --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.empty 10
alternatives --verbose --install /etc/hadoop/conf hadoop-conf /opt/hadoop/conf 50
Linux distributions update frequently; managing the config directory through alternatives keeps the configuration files untouched during updates.
vim /var/lib/alternatives/hadoop-conf
cd /etc/hadoop/conf/
vim core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://server2/</value>
</property>
</configuration>
vim hdfs-site.xml
<configuration>
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/1/dfs/nn,/nfsmount/dfs/nn</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
Configure the local HDFS storage directories:
server2:
mkdir -p /data/1/dfs/nn /nfsmount/dfs/nn
chown -R hdfs:hdfs /data/1/dfs/nn /nfsmount/dfs/nn
server2, server3, server4:
mkdir -p /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
chown -R hdfs:hdfs /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
vim mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>server1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>server1:19888</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>
</configuration>
vim yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>server1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>server1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>server1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>server1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>server1:8088</value>
</property>
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
$YARN_HOME/*,$YARN_HOME/lib/*
</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/data/1/yarn/local,/data/2/yarn/local,/data/3/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/data/1/yarn/logs,/data/2/yarn/logs,/data/3/yarn/logs</value>
</property>
<property>
<description>Where to aggregate logs</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/var/log/hadoop-yarn/apps</value>
</property>
</configuration>
Format the namenode:
sudo -u hdfs hadoop namenode -format
Re-format filesystem in /data/namedir ? (Y or N)
Note: respond with an upper-case Y
Configure the local YARN storage directories:
mkdir -p /data/1/yarn/local /data/2/yarn/local /data/3/yarn/local /data/4/yarn/local
mkdir -p /data/1/yarn/logs /data/2/yarn/logs /data/3/yarn/logs /data/4/yarn/logs
chown -R yarn:yarn /data/1/yarn/local /data/2/yarn/local /data/3/yarn/local /data/4/yarn/local
chown -R yarn:yarn /data/1/yarn/logs /data/2/yarn/logs /data/3/yarn/logs /data/4/yarn/logs
Start the namenode and datanodes:
cd /etc/init.d
ls hadoop-hdfs-*
server2: hadoop-hdfs-namenode hadoop-hdfs-datanode
server3, server4: hadoop-hdfs-datanode
server2, server3, server4:
for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done
The following steps are all performed on server2:
Create the HDFS /tmp directory (permissions drwxrwxrwt):
sudo -u hdfs hdfs dfs -mkdir /tmp
sudo -u hdfs hdfs dfs -chmod -R 1777 /tmp
Create the history directory:
sudo -u hdfs hadoop fs -mkdir /user/history
sudo -u hdfs hadoop fs -chmod -R 1777 /user/history
sudo -u hdfs hadoop fs -chown yarn /user/history
Create the log directory:
sudo -u hdfs hadoop fs -mkdir /var/log/hadoop-yarn
sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
Check the HDFS directory structure:
sudo -u hdfs hadoop fs -ls -R /
You should see:
drwxrwxrwt - hdfs supergroup 0 2012-04-19 14:31 /tmp
drwxr-xr-x - hdfs supergroup 0 2012-05-31 10:26 /user
drwxrwxrwt - yarn supergroup 0 2012-04-19 14:31 /user/history
drwxr-xr-x - hdfs supergroup 0 2012-05-31 15:31 /var
drwxr-xr-x - hdfs supergroup 0 2012-05-31 15:31 /var/log
drwxr-xr-x - yarn mapred 0 2012-05-31 15:31 /var/log/hadoop-yarn
clusterID: CID-d218af4f-4418-4820-9067-c0a4cc34b165
启动YARN:
server1:
service hadoop-yarn-resourcemanager start
service hadoop-mapreduce-historyserver start
server2, server3, server4:
service hadoop-yarn-nodemanager start
Create a home directory for each user:
sudo -u hdfs hadoop fs -mkdir /user/$USER
sudo -u hdfs hadoop fs -chown $USER /user/$USER
$USER is the current user; check it with echo $USER
Set HADOOP_MAPRED_HOME:
cd /etc/profile.d
vim hadoop.sh
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
Because each server in the cluster has a different set of modules installed, the services on each server must be started individually; a script such as start-all.sh cannot be used.
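One way to script the per-server startup is a loop over SSH: each node starts whatever hadoop-* init scripts it happens to have. Root SSH access is an assumption, and with DRYRUN=1 the function only prints the ssh commands so the sketch can be verified offline:

```shell
#!/bin/sh
# Start every installed hadoop-* init script on each node over SSH.
# Set DRYRUN=1 to print the ssh commands instead of running them.
start_cluster() {
  cmd='for s in $(cd /etc/init.d && ls hadoop-* 2>/dev/null); do service "$s" start; done'
  for host in server1 server2 server3 server4; do
    if [ -n "$DRYRUN" ]; then
      echo "ssh root@$host \"$cmd\""
    else
      ssh "root@$host" "$cmd"
    fi
  done
}
```

Note that this pattern only covers the hadoop-* services; zookeeper-server on server1 would still need its own `service zookeeper-server start`.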
Check the results in the browser:
resourcemanager: server1:8088/cluster/nodes
nodemanager: server2/3/4:8042/node
jobhistory: server1:19888/jobhistory
namenode: server2:50070/dfshealth.jsp
datanode: server4:50075 (port 1006 is unavailable)
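The pages above can also be probed from the shell, assuming curl is available. With DRYRUN=1 the function only lists the URLs it would check (hosts and ports are the ones from these notes):

```shell
#!/bin/sh
# Probe each web UI and report its HTTP status code.
# Set DRYRUN=1 to list the URLs without making any requests.
check_web_uis() {
  for url in \
      server1:8088/cluster/nodes \
      server2:8042/node \
      server1:19888/jobhistory \
      server2:50070/dfshealth.jsp \
      server4:50075; do
    if [ -n "$DRYRUN" ]; then
      echo "http://$url"
    else
      code=$(curl -s -o /dev/null -w '%{http_code}' "http://$url")
      echo "$code http://$url"
    fi
  done
}
```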