Installation prerequisites:
1. Install and configure the Java 1.8.0_141 environment.
2. Add the master node address mapping:
vim /etc/hosts
Append the following:
127.0.0.1 master
127.0.0.1 iZuf6hxhy307mpxxtvmtb3Z
iZuf6hxhy307mpxxtvmtb3Z is the hostname of my Aliyun server; mapping it prevents the error: SHUTDOWN_MSG: Shutting down NameNode at java.net.UnknownHostException
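To confirm the mapping took effect, a quick grep of the hosts file can be used. This is a minimal sketch; `check_mapping` is a hypothetical helper, and the optional file argument exists only so the check is easy to try against a scratch file:

```shell
# Check whether a hostname appears in a hosts file (defaults to /etc/hosts).
check_mapping() {
    # $1 = hostname to look for, $2 = hosts file (optional)
    file=${2:-/etc/hosts}
    if grep -qE "[[:space:]]$1([[:space:]]|\$)" "$file" 2>/dev/null; then
        echo "mapping for $1 found"
    else
        echo "mapping for $1 missing"
    fi
}
check_mapping master
```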
Download, install, and configure Hadoop 2.7.5
Download:
wget -c http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.7.5/hadoop-2.7.5.tar.gz
Extract and install:
mkdir /opt/hadoop/
tar -zxvf hadoop-2.7.5.tar.gz -C /opt/hadoop/
Configure Hadoop standalone mode:
vim /etc/profile
# Append the following to /etc/profile:
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.5
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export PATH=$PATH:$HADOOP_HOME/lib
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Reload the profile:
source /etc/profile
# Set the JAVA_HOME variable in hadoop-env.sh:
vim $HADOOP_HOME/etc/hadoop/hadoop-env.sh
# Append the following:
export JAVA_HOME=/opt/jdk/jdk1.8.0_141/
# Reload hadoop-env.sh:
source $HADOOP_HOME/etc/hadoop/hadoop-env.sh
# Check the Hadoop version:
hadoop version
# If the version prints correctly, the standalone installation succeeded.
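Before continuing, the environment variables set above can be sanity-checked. A sketch with a hypothetical `check_dir_var` helper; it only inspects the current shell environment:

```shell
# Report whether an environment variable points at an existing directory.
check_dir_var() {
    # $1 = variable name (for the message), $2 = its value
    if [ -n "$2" ] && [ -d "$2" ]; then
        echo "$1 ok: $2"
    else
        echo "$1 not set or not a directory: $2"
    fi
}
check_dir_var HADOOP_HOME "$HADOOP_HOME"
check_dir_var JAVA_HOME "$JAVA_HOME"
check_dir_var HADOOP_CONF_DIR "$HADOOP_CONF_DIR"
```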
Configure the Hadoop native library:
vim /etc/profile
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
source /etc/profile
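Hadoop ships a `checknative` subcommand that reports which native libraries actually load. The wrapper below is a sketch (`check_native` is a hypothetical helper name); it only invokes the real command when the binary is on the PATH:

```shell
# Run 'hadoop checknative -a' if the hadoop binary is available.
check_native() {
    # $1 = binary name (defaults to hadoop)
    bin=${1:-hadoop}
    if command -v "$bin" >/dev/null 2>&1; then
        "$bin" checknative -a
    else
        echo "$bin not on PATH; run 'source /etc/profile' first"
    fi
}
check_native
```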
Configure Hadoop pseudo-distributed mode:
vim $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/opt/hadoop/hadoop-2.7.5/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
</configuration>
vim $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/hadoop/hadoop-2.7.5/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/hadoop/hadoop-2.7.5/hdfs/data</value>
    </property>
</configuration>
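The tmp, name, and data directories referenced in the configs can be created up front. A sketch (`prepare_dirs` is a hypothetical helper; the paths are the ones assumed by the configuration above):

```shell
# Create the directories used by hadoop.tmp.dir, dfs.namenode.name.dir
# and dfs.datanode.data.dir under the given install root.
prepare_dirs() {
    # $1 = Hadoop install root
    mkdir -p "$1/tmp" "$1/hdfs/name" "$1/hdfs/data" \
        && echo "created under $1" \
        || echo "could not create under $1 (check permissions)"
}
prepare_dirs /opt/hadoop/hadoop-2.7.5
```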
cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml
vim $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>
vim $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
</configuration>
After editing the configuration files, format the NameNode:
hdfs namenode -format
Start the Hadoop daemons
Start DFS:
start-dfs.sh
Start YARN:
start-yarn.sh
Start the JobHistoryServer:
mr-jobhistory-daemon.sh start historyserver
# Because JobHistoryServer is configured in mapred-site.xml, it must be started for Hadoop to run properly.
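Once all daemons are up, `jps` should list them. The sketch below checks jps-style output for the daemon names of a stock Hadoop 2.7 pseudo-distributed setup; `check_daemons` is a hypothetical helper that reads from stdin so it works on live or saved output (note it matches plain substrings, so it is only a rough check):

```shell
# Report which of the expected Hadoop daemons appear in jps-style input.
check_daemons() {
    out=$(cat)
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager JobHistoryServer; do
        case "$out" in
            *"$d"*) echo "$d running" ;;
            *)      echo "$d MISSING" ;;
        esac
    done
}
if command -v jps >/dev/null 2>&1; then
    jps | check_daemons
else
    echo "jps not on PATH"
fi
```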
Download, install, and configure Spark
Download:
wget -c http://mirrors.hust.edu.cn/apache/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
Extract and install:
mkdir /opt/spark/
tar -xvf spark-2.2.1-bin-hadoop2.7.tgz -C /opt/spark/
Configure Spark:
vim /etc/profile
export SPARK_HOME=/opt/spark/spark-2.2.1-bin-hadoop2.7/
export PATH=${SPARK_HOME}/bin:$PATH
source /etc/profile
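After sourcing the profile, the Spark binaries should be reachable; `spark-submit --version` prints the build info. A hedged check (`spark_check` is a hypothetical helper) that only runs when the binary is on the PATH:

```shell
# Print the Spark version if spark-submit is available.
spark_check() {
    # $1 = binary name (defaults to spark-submit)
    bin=${1:-spark-submit}
    if command -v "$bin" >/dev/null 2>&1; then
        "$bin" --version
    else
        echo "$bin not on PATH; run 'source /etc/profile' first"
    fi
}
spark_check
```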
Configure PySpark:
vim /etc/profile
# Append the following:
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
# Reload the profile:
source /etc/profile
# Unpack the bundled py4j sources so Python can find them:
unzip $SPARK_HOME/python/lib/py4j-0.10.4-src.zip -d $SPARK_HOME/python
After this you can use the pyspark module in Python with import pyspark.
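The import can be verified from the shell. A sketch with a hypothetical `pyspark_check` helper; the interpreter argument defaults to `python`, the interpreter assumed above:

```shell
# Try importing pyspark with the given interpreter and report the result.
pyspark_check() {
    # $1 = Python interpreter (defaults to python)
    py=${1:-python}
    if command -v "$py" >/dev/null 2>&1; then
        if "$py" -c 'import pyspark' 2>/dev/null; then
            echo "pyspark import ok ($py)"
        else
            echo "pyspark import failed ($py); check PYTHONPATH"
        fi
    else
        echo "$py not on PATH"
    fi
}
pyspark_check
```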