Setting up a Hadoop 2.6.0 environment (Standalone):
1. Download Hadoop: http://hadoop.apache.org/releases.html#Download
2. Configure environment variables:
- JAVA_HOME:
#JAVA
export JAVA_HOME=/usr/java/jdk1.8.0_40
export PATH=$PATH:$JAVA_HOME/bin
- HADOOP_INSTALL:
#HADOOP
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
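After editing the profile (e.g. ~/.bashrc; adjust to wherever these exports live), reload it and verify both variables resolve:
source ~/.bashrc
java -version      # should report 1.8.0_40
hadoop version     # should report Hadoop 2.6.0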
3. Set up passwordless SSH login: http://my.oschina.net/mup/blog/387289
Note: (how to resolve "ssh: connect to host localhost port 22: Connection refused")
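A minimal sketch of the passwordless-SSH setup (assuming OpenSSH; the linked post walks through the same steps):
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost      # should log in without prompting for a password
If port 22 refuses the connection, sshd is usually not running; on Ubuntu-like systems, installing it (sudo apt-get install openssh-server) typically fixes this.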
4. Edit the configuration files:
- hadoop/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
- hadoop/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/dfs/data</value>
  </property>
  <property>
    <!-- dfs.permissions.enabled is the 2.x name; the older dfs.permissions still works via deprecation mapping -->
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
</configuration>
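The directories named above must exist and be writable by the user running Hadoop; a sketch, assuming the current user will run the daemons:
sudo mkdir -p /usr/local/hadoop/tmp /usr/local/hadoop/dfs/name /usr/local/hadoop/dfs/data
sudo chown -R $USER /usr/local/hadoop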
- hadoop/etc/hadoop/mapred-site.xml (copied from mapred-site.xml.template; mapreduce.framework.name belongs here, not in yarn-site.xml)
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
- hadoop/etc/hadoop/yarn-site.xml
<?xml version="1.0"?>
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
- hadoop/etc/hadoop/masters
localhost
- hadoop/etc/hadoop/slaves
localhost
5. Start/stop the Hadoop daemons:
- Start/stop HDFS:
sbin/start-dfs.sh
sbin/stop-dfs.sh
- Start/stop YARN:
sbin/start-yarn.sh
sbin/stop-yarn.sh
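Note: before the very first start, the NameNode must be formatted (a one-time step), and jps confirms the daemons came up:
bin/hdfs namenode -format
sbin/start-dfs.sh
sbin/start-yarn.sh
jps    # expect NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager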
6. Access the web UIs:
- HDFS: http://localhost:50070
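- YARN: http://localhost:8088 (the ResourceManager UI's default port)
A quick end-to-end check is the bundled pi example; the jar path below matches the 2.6.0 binary distribution, adjust if yours differs:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 5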
Setting up a Spark 1.3.0 environment (Standalone):
1. Download Spark: http://spark.apache.org/downloads.html
2. Configure environment variables:
- SPARK_HOME:
#SPARK
export SPARK_HOME=/home/mupeng/Hadoop/spark-1.3.0-bin-hadoop2.4
export PATH=$SPARK_HOME/bin:$PATH
- Configure ./conf/slaves
First make a copy of the template: cp slaves.template slaves
Then edit the slaves file. For a multi-node cluster it lists one worker host per line, e.g.:
# A Spark Worker will be started on each of the machines listed below.
spark-master
ubuntu-worker
spark-worker1
Here I use the standalone single-machine setup, so the slaves file is just:
# A Spark Worker will be started on each of the machines listed below.
localhost
- Configure ./conf/spark-env.sh
Copy the template: cp spark-env.sh.template spark-env.sh
Then append to spark-env.sh:
export JAVA_HOME=/home/java/jdk1.8.0_40
export SCALA_HOME=/home/scala/scala-2.11.6
export SPARK_MASTER_IP=localhost
export SPARK_WORKER_MEMORY=1g
3. Distribute the files:
Copy the configured spark-1.3.0-bin-hadoop2.4 directory to every node.
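A sketch of the copy, assuming a worker host named ubuntu-worker (one of the hostnames from the multi-node slaves example above) and the same path on every node:
rsync -a /home/mupeng/Hadoop/spark-1.3.0-bin-hadoop2.4 ubuntu-worker:/home/mupeng/Hadoop/
(For the single-machine localhost setup, no copying is needed.)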
4. Start/stop Spark:
sbin/start-master.sh
- Starts a master instance on the machine the script is executed on.
sbin/start-slaves.sh
- Starts a slave instance on each machine specified in the conf/slaves file.
sbin/start-all.sh
- Starts both a master and a number of slaves as described above.
sbin/stop-master.sh
- Stops the master that was started via the sbin/start-master.sh script.
sbin/stop-slaves.sh
- Stops all slave instances on the machines specified in the conf/slaves file.
sbin/stop-all.sh
- Stops both the master and the slaves as described above.
5. Access the web UI:
- Master's web UI: http://localhost:8080
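To smoke-test the cluster: after sbin/start-all.sh, jps should show a Master and a Worker, and the bundled SparkPi example can be submitted against the standalone master (the examples jar name below matches the prebuilt 1.3.0/Hadoop 2.4 package; adjust if yours differs):
bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://localhost:7077 lib/spark-examples-1.3.0-hadoop2.4.0.jar 10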