1 How Spark, Hadoop, and YARN fit together
Spark: computation
Hadoop (HDFS): storage
YARN: resource management
This note mainly covers configuring HDFS and YARN:
hdfs
yarn
mapreduce (computation framework; Spark fills the same role)
YARN main daemon: ResourceManager
Start YARN: sbin/start-yarn.sh
Stop YARN: sbin/stop-yarn.sh
Web UI: http://localhost:8088
HDFS daemons:
NameNode
DataNode
Start/stop HDFS: sbin/start-dfs.sh , sbin/stop-dfs.sh
Web UI: http://localhost:50070
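A quick way to verify the daemons are up after running the start scripts is to probe their web-UI ports (8088 and 50070 from above). A minimal bash sketch, assuming the default ports:

```shell
#!/usr/bin/env bash
# Probe the Hadoop web-UI ports to see whether the daemons are reachable.
# Ports assume the default pseudo-distributed setup from this note.
check_port() {
  # return 0 if something accepts TCP connections on localhost:$1
  # (uses bash's built-in /dev/tcp; the subshell closes the fd on exit)
  (exec 3<>"/dev/tcp/localhost/$1") 2>/dev/null
}

HDFS_UI=down; check_port 50070 && HDFS_UI=up
YARN_UI=down; check_port 8088  && YARN_UI=up
echo "HDFS web UI (50070): $HDFS_UI"
echo "YARN web UI (8088):  $YARN_UI"
```

If a UI shows as down right after start-dfs.sh or start-yarn.sh, give the daemons a few seconds to bind their ports before re-checking.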
2 Setting up a Hadoop 2.6 cluster environment
Download and unpack Hadoop
Set environment variables:
2.1 Set HADOOP_HOME
2.2 Set HADOOP_CONF_DIR to $HADOOP_HOME/etc/hadoop
2.3 Set YARN_CONF_DIR to $HADOOP_HOME/etc/hadoop
Concretely, edit ~/.bashrc (vim ~/.bashrc) and add:
export JAVA_HOME=/usr/lib/java/jdk1.8.0_45
export JRE_HOME=${JAVA_HOME}/jre
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.6.0
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
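The exports above can also be applied as a script. A sketch, assuming the JDK and Hadoop paths used in this note (written to a temp file here rather than ~/.bashrc, to keep the example side-effect free):

```shell
#!/usr/bin/env bash
# Sketch: collect the Hadoop environment variables in a file and source it.
# In a real setup, append these lines to ~/.bashrc instead of a temp file.
env_file=$(mktemp)
cat > "$env_file" <<'EOF'
export JAVA_HOME=/usr/lib/java/jdk1.8.0_45
export JRE_HOME=${JAVA_HOME}/jre
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.6.0
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
EOF
. "$env_file"
# Quick sanity check that the derived variables expanded as expected:
echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
```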
core-site.xml settings (fs.default.name is the legacy key; Hadoop 2.x prefers fs.defaultFS, though both still work)
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/hadoop-2.6.0/tmp</value>
</property>
</configuration>
- hdfs-site.xml settings
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/usr/local/hadoop/hadoop-2.6.0/dfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/usr/local/hadoop/hadoop-2.6.0/dfs/data</value>
</property>
</configuration>
- mapred-site.xml settings (Hadoop 2.6 ships only mapred-site.xml.template; copy it to mapred-site.xml first. Note that mapred.job.tracker is an MRv1 key; to run MapReduce on YARN, set mapreduce.framework.name to yarn.)
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
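The three XML files above can be generated with heredocs so the whole setup is scriptable. A sketch that writes them into a temp directory (in a real install they belong in $HADOOP_HOME/etc/hadoop):

```shell
#!/usr/bin/env bash
# Sketch: write the config files from this note with heredocs.
# A temp dir keeps the example side-effect free; point conf_dir at
# $HADOOP_HOME/etc/hadoop for a real installation.
conf_dir=$(mktemp -d)

cat > "$conf_dir/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/hadoop-2.6.0/tmp</value>
  </property>
</configuration>
EOF

cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/hadoop-2.6.0/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/hadoop-2.6.0/dfs/data</value>
  </property>
</configuration>
EOF

cat > "$conf_dir/mapred-site.xml" <<'EOF'
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF

ls "$conf_dir"
```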
- hadoop-env.sh: set $JAVA_HOME
# The java implementation to use.
export JAVA_HOME=/usr/lib/java/jdk1.8.0_45
- yarn-env.sh: set $JAVA_HOME
export JAVA_HOME=/usr/lib/java/jdk1.8.0_45
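Since both env scripts only need JAVA_HOME pinned, a sed substitution can set it non-interactively. A sketch on a throwaway copy (in a real install the files are $HADOOP_HOME/etc/hadoop/hadoop-env.sh and yarn-env.sh):

```shell
#!/usr/bin/env bash
# Sketch: pin JAVA_HOME in hadoop-env.sh with sed instead of editing by hand.
# We fabricate a minimal stand-in file here so the example is self-contained.
envsh=$(mktemp)
printf '%s\n' \
  '# The java implementation to use.' \
  'export JAVA_HOME=${JAVA_HOME}' > "$envsh"

# Replace whatever JAVA_HOME line is present with the JDK path from this note.
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/java/jdk1.8.0_45|' "$envsh"
grep '^export JAVA_HOME=' "$envsh"
```

The same substitution applied to yarn-env.sh covers both files in one pass.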