Teaching Myself Big Data: Knowledge Is Built Up Bit by Bit (notes from self-study)

Article link: https://blog.csdn.net/qq_29379335/article/details/78502749

vi /etc/sysconfig/network    # set the hostname

vi /etc/hosts    # map the IP address to the hostname
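
As a rough illustration of what goes into those two files, the sketch below prints one line for each. The hostname hadoop1 and the address 192.168.1.106 both appear later in these notes, but pairing them is an assumption, and the helper names are hypothetical.

```shell
#!/bin/sh
# Assumed values: replace with your own IP address and hostname.
HOST_IP="192.168.1.106"
HOST_NAME="hadoop1"

# Line for /etc/hosts: "<ip> <hostname>"
hosts_entry() {
    printf '%s %s\n' "$1" "$2"
}

# Line for /etc/sysconfig/network (CentOS 6 style): "HOSTNAME=<hostname>"
network_entry() {
    printf 'HOSTNAME=%s\n' "$1"
}

hosts_entry "$HOST_IP" "$HOST_NAME"
network_entry "$HOST_NAME"
```

The printed lines are meant to be pasted into the files by hand (as root) rather than appended automatically.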

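1. Install the JDK (on a CentOS virtual machine)

rpm -qa | grep java    # list the Java packages already installed

Remove the preinstalled Java packages:

rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.50.1.11.5.el6_3.x86_64 tzdata-java-2012j-1.el6.noarch java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64

Unpack the JDK: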

tar -zxvf jdk-7u79-linux-x64.tar.gz -C /opt/modules/

Set the environment variables:

vi /etc/profile

export JAVA_HOME=/opt/modules/jdk1.7.0_79
export PATH=$PATH:$JAVA_HOME/bin

Reload the profile:

source /etc/profile
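
A quick way to confirm the variable points at a real JDK is to check for the java binary under it. This is a minimal sketch; check_java_home is a hypothetical helper, and the fallback path is the one used in these notes.

```shell
#!/bin/sh
# Report whether a directory looks like a usable JDK home.
check_java_home() {
    if [ -x "$1/bin/java" ]; then
        echo "ok"
    else
        echo "missing"
    fi
}

# JAVA_HOME comes from /etc/profile; fall back to the path used above.
check_java_home "${JAVA_HOME:-/opt/modules/jdk1.7.0_79}"
```

If this prints "missing", the export lines in /etc/profile or the unpack location are worth rechecking before moving on to Hadoop.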

Install Hadoop

Unpack:

tar -zxvf hadoop-2.5.0-cdh5.3.6.tar.gz -C /opt/modules/

Go to etc/hadoop under the unpacked Hadoop directory.

In hadoop-env.sh, set:

export JAVA_HOME=/opt/modules/jdk1.7.0_79

Then create an input directory inside the unpacked Hadoop directory:

mkdir input

Copy the Hadoop XML config files into input:

cp etc/hadoop/*.xml input/

Run Hadoop in local (standalone) mode:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.6.jar grep input output 'dfs[a-z.]+'

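Create the wcinput directory before running the next job. In local mode the MapReduce program runs in a single local JVM, and no Hadoop daemons are started.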

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.6.jar wordcount wcinput wcoutput
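
The wordcount run above needs wcinput to exist and contain at least one text file. A minimal sketch of preparing it, run from the Hadoop directory, might look like this; the sample text is made up, and the file name wc.input matches the one uploaded to HDFS later in these notes.

```shell
#!/bin/sh
# Create the input directory the wordcount example reads from.
mkdir -p wcinput

# Hypothetical sample text; any plain-text file works.
cat > wcinput/wc.input <<'EOF'
hadoop yarn
hadoop mapreduce
hadoop hdfs
EOF

# The job refuses to overwrite an existing output directory,
# so clear any output left over from a previous run.
rm -rf wcoutput
```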

Configure pseudo-distributed HDFS

Edit etc/hadoop/core-site.xml. localhost.localdomain is this machine's hostname, as printed by the hostname command.

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost.localdomain:8020</value> 
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/modules/hadoop-2.5.0-cdh5.3.6/data/tmp</value> 
</property>

<property>
<name>fs.trash.interval</name>
<value>420</value> 
</property>
</configuration>

Configure hdfs-site.xml:

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value> 
</property>
</configuration>

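Before the first start, format the NameNode:

bin/hdfs namenode -format

Start the NameNode and DataNode:

sbin/hadoop-daemon.sh start namenode

sbin/hadoop-daemon.sh start datanode

jps    # check that both daemons are running

http://192.168.1.106:50070/ (NameNode web UI)

Create the input directory on HDFS:

bin/hdfs dfs -mkdir -p /user/beifeng/mapreduce/wordcount/input/

Upload a file to that directory:

bin/hdfs dfs -put wcinput/wc.input /user/beifeng/mapreduce/wordcount/input/

Run the MapReduce job: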

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.6.jar wordcount /user/beifeng/mapreduce/wordcount/input/ /user/beifeng/mapreduce/wordcount/output

Configure yarn-site.xml:

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost.localdomain</value>
</property>

<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>

<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>640800</value>
</property>

</configuration>
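
The two time-based settings used so far are easier to judge after a unit conversion: fs.trash.interval in core-site.xml is in minutes, and yarn.log-aggregation.retain-seconds is in seconds. The sketch below just does the arithmetic; the seven-day comparison is only a guess at what may have been intended, not something these notes state.

```shell
#!/bin/sh
TRASH_INTERVAL_MIN=420     # fs.trash.interval, in minutes
LOG_RETAIN_SEC=640800      # yarn.log-aggregation.retain-seconds

trash_hours=$((TRASH_INTERVAL_MIN / 60))
retain_hours=$((LOG_RETAIN_SEC / 3600))
seven_days_sec=$((7 * 24 * 3600))

echo "trash is kept for ${trash_hours} hours"
echo "aggregated logs are kept for ${retain_hours} hours"
echo "exactly seven days would be ${seven_days_sec} seconds"
```

So the configured values keep deleted files for 7 hours and aggregated logs for 178 hours, a little over a week.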

In yarn-env.sh, set:

export JAVA_HOME=/opt/modules/jdk1.7.0_79

In the slaves file, list:

localhost.localdomain

Start the NodeManager and ResourceManager:

sbin/yarn-daemon.sh start nodemanager

sbin/yarn-daemon.sh start resourcemanager

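If a port is already in use, find the process holding it with netstat -apn | grep 4040, then kill that process by its PID (kill takes a process ID, not a port number): kill -9 <PID>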
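
The PID is the part before the slash in the last column of the netstat -apn output. A small sketch of pulling it out is shown below; the sample line and the helper name are hypothetical, and it assumes the program name contains no spaces.

```shell
#!/bin/sh
# Extract the PID from one line of `netstat -apn` output.
# The last column has the form "PID/program".
pid_from_netstat_line() {
    printf '%s\n' "$1" | awk '{ split($NF, parts, "/"); print parts[1] }'
}

# Hypothetical sample line for a process listening on port 4040.
sample='tcp 0 0 0.0.0.0:4040 0.0.0.0:* LISTEN 12345/java'
pid_from_netstat_line "$sample"
```

The printed number is what goes after kill -9.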

In mapred-env.sh, set:

export JAVA_HOME=/opt/modules/jdk1.7.0_79

Rename mapred-site.xml.template to mapred-site.xml.

Then edit mapred-site.xml and add, inside its <configuration> element:

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

The job fails if the HDFS output directory already exists, so delete it first:

bin/hdfs dfs -rm -R /user/beifeng/mapreduce/wordcount/output/

Run the MapReduce job, this time on YARN:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.6.jar wordcount /user/beifeng/mapreduce/wordcount/input/ /user/beifeng/mapreduce/wordcount/output

View the result:

bin/hdfs dfs -cat /user/beifeng/mapreduce/wordcount/output/part*

Update hdfs-site.xml to add the SecondaryNameNode HTTP address:

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop1:50090</value>
</property>
</configuration>

Add the NodeManager resource limits to yarn-site.xml, inside the existing <configuration> element:

<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
</property>

<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>4</value>
</property>

Start the JobHistory server:

sbin/mr-jobhistory-daemon.sh start historyserver

Add the JobHistory server addresses to mapred-site.xml:

<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop1:10020</value>
</property>

<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop1:19888</value>
</property>

http://192.168.1.106:19888/jobhistory (JobHistory web UI, for the logs of jobs run on YARN)

http://192.168.1.106:8088/cluster (YARN ResourceManager web UI)

Ways to start the services

*Start each daemon individually

*hdfs

hadoop-daemon.sh start|stop namenode|datanode|secondarynamenode

*yarn

yarn-daemon.sh start|stop resourcemanager|nodemanager

*mapreduce

sbin/mr-jobhistory-daemon.sh start historyserver

*Start each module separately

*hdfs

start-dfs.sh

stop-dfs.sh

*yarn

start-yarn.sh

stop-yarn.sh

Start everything at once

start-all.sh

stop-all.sh    # both all-in-one scripts are deprecated and not recommended
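
The per-daemon approach can be wrapped in a small script. The sketch below is one possible arrangement, not something from the original setup: run and start_daemons are hypothetical helpers, HADOOP_HOME defaults to the install path used in these notes, and DRY_RUN=1 (the default here) only prints the commands instead of running them.

```shell
#!/bin/sh
HADOOP_HOME="${HADOOP_HOME:-/opt/modules/hadoop-2.5.0-cdh5.3.6}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start the five daemons used in these notes: HDFS, then YARN, then JobHistory.
start_daemons() {
    run "$HADOOP_HOME/sbin/hadoop-daemon.sh" start namenode
    run "$HADOOP_HOME/sbin/hadoop-daemon.sh" start datanode
    run "$HADOOP_HOME/sbin/yarn-daemon.sh" start resourcemanager
    run "$HADOOP_HOME/sbin/yarn-daemon.sh" start nodemanager
    run "$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh" start historyserver
}

start_daemons
```

Setting DRY_RUN=0 before calling the script would actually launch the daemons; running jps afterwards shows which ones came up.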

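Configure passwordless SSH login

cd to your home directory, run ll -a to find the .ssh directory, and go into it.

ssh-keygen -t rsa    # press Enter four times to generate the key pair

ssh-copy-id hadoop1    # the argument is the IP address or the mapped hostname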
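
The same two steps can be scripted without the interactive prompts. This is a sketch under assumptions: setup_ssh and run are hypothetical helpers, the key path is the ssh-keygen default for RSA, an empty passphrase is used, and DRY_RUN=1 (the default here) only prints the commands.

```shell
#!/bin/sh
KEY_FILE="${KEY_FILE:-$HOME/.ssh/id_rsa}"
TARGET_HOST="${TARGET_HOST:-hadoop1}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_ssh() {
    # Generate a key pair only if none exists yet (-N "" means no passphrase).
    if [ ! -f "$KEY_FILE" ]; then
        run ssh-keygen -t rsa -N "" -f "$KEY_FILE"
    fi
    # Install the public key on the target host.
    run ssh-copy-id "$TARGET_HOST"
}

setup_ssh
```

After a real run, ssh hadoop1 should log in without asking for a password.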

Deployment review

HDFS

NameNode

core-site.xml

<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop1:8020</value> 
</property>

DataNode

slaves

hadoop1

SecondaryNameNode

hdfs-site.xml

<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop1:50090</value>
</property>

YARN

ResourceManager

yarn-site.xml

<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop1</value>
</property>

NodeManager

slaves

hadoop1

MapReduce HistoryServer

mapred-site.xml

<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop1:10020</value>
</property>

<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop1:19888</value>
</property>