Spark Installation and Deployment
Tags (space-separated): spark
Hadoop/Spark/Kafka discussion group (QQ): 224209501
1. Installing the Spark environment
Create four working directories:
sudo mkdir /opt/modules
sudo mkdir /opt/softwares
sudo mkdir /opt/tools
sudo mkdir /opt/datas
sudo chmod -R 777 /opt/
1. Install JDK 1.7
First uninstall the JDK bundled with the OS:
rpm -qa | grep java
sudo rpm -e --nodeps (name of each bundled Java package)
Then install JDK 1.7 and set the environment variables:
export JAVA_HOME=/opt/modules/jdk1.7.0_67
export PATH=$PATH:$JAVA_HOME/bin
2. Spark compilation
Install Maven:
export MAVEN_HOME=/usr/local/apache-maven-3.0.5
export PATH=$PATH:$MAVEN_HOME/bin
3. Install Scala
export SCALA_HOME=/opt/modules/scala-2.10.4
export PATH=$PATH:$SCALA_HOME/bin
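The JDK, Maven, and Scala exports above typically live together in /etc/profile or ~/.bashrc. Here is a quick sanity check (install paths assumed from the sections above) that each bin directory actually landed on PATH:

```shell
# Export the three tool homes (install paths assumed from this guide)
export JAVA_HOME=/opt/modules/jdk1.7.0_67
export MAVEN_HOME=/usr/local/apache-maven-3.0.5
export SCALA_HOME=/opt/modules/scala-2.10.4
export PATH=$PATH:$JAVA_HOME/bin:$MAVEN_HOME/bin:$SCALA_HOME/bin

# Verify that each bin directory is on PATH
for d in "$JAVA_HOME/bin" "$MAVEN_HOME/bin" "$SCALA_HOME/bin"; do
  case ":$PATH:" in
    *":$d:"*) echo "on PATH: $d" ;;
    *)        echo "MISSING: $d" ;;
  esac
done
```

If any directory prints as MISSING, re-source the profile file before continuing.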
4. Modify the Maven mirror sources
Before compiling, configure mirrors and DNS servers to speed up downloads, which in turn speeds up compilation. Open /opt/compileHadoop/apache-maven-3.0.5/conf/settings.xml with Notepad++ (Notepad++ is already connected to the machine via SFTP) and add:
<mirror>
    <id>nexus-spring</id>
    <mirrorOf>cdh.repo</mirrorOf>
    <name>spring</name>
    <url>http://repo.spring.io/repo/</url>
</mirror>
<mirror>
    <id>nexus-spring2</id>
    <mirrorOf>cdh.releases.repo</mirrorOf>
    <name>spring2</name>
    <url>http://repo.spring.io/repo/</url>
</mirror>
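Note that Maven only picks up `<mirror>` entries when they are nested inside the top-level `<mirrors>` element of settings.xml. A minimal skeleton for context (the `<settings>` root is standard Maven; the mirror bodies are the two entries above):

```xml
<settings>
  <mirrors>
    <mirror>
      <id>nexus-spring</id>
      <mirrorOf>cdh.repo</mirrorOf>
      <name>spring</name>
      <url>http://repo.spring.io/repo/</url>
    </mirror>
    <!-- the second mirror entry (nexus-spring2) goes here as well -->
  </mirrors>
</settings>
```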
5. Configure DNS servers
sudo vi /etc/resolv.conf
Add the following:
nameserver 8.8.8.8
nameserver 8.8.4.4
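The same entries can be written non-interactively. This sketch stages the file in /tmp (an illustrative path) so it can be reviewed before copying it over /etc/resolv.conf:

```shell
# Google public DNS servers, as listed above
printf 'nameserver 8.8.8.8\nnameserver 8.8.4.4\n' > /tmp/resolv.conf.new
cat /tmp/resolv.conf.new
# After reviewing: sudo cp /tmp/resolv.conf.new /etc/resolv.conf
```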
6. Compile Spark
To speed up compilation, modify make-distribution.sh as follows, hard-coding the version variables and commenting out the slow mvn evaluations:
VERSION=1.3.0
SPARK_HADOOP_VERSION=2.6.0-cdh5.4.0
SPARK_HIVE=1
#VERSION=$("$MVN" help:evaluate -Dexpression=project.version 2>/dev/null | grep -v "INFO" | tail -n 1)
#SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
# | grep -v "INFO"\
# | tail -n 1)
#SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
# | grep -v "INFO"\
# | fgrep --count "<id>hive</id>";\
# # Reset exit status to 0, otherwise the script stops here if the last grep finds nothing\
# # because we use "set -o pipefail"
# echo -n)
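The edit above amounts to commenting out the three slow `mvn help:evaluate` calls and hard-coding their results. A sketch of automating the VERSION substitution with sed, run here against a throwaway copy so it is safe to try (the real target is make-distribution.sh in the Spark source tree):

```shell
# Throwaway copy mimicking the relevant line of make-distribution.sh
cat > /tmp/make-distribution-snippet.sh <<'EOF'
VERSION=$("$MVN" help:evaluate -Dexpression=project.version 2>/dev/null | grep -v "INFO" | tail -n 1)
EOF

# Replace the mvn evaluation with the hard-coded release version
sed -i 's|^VERSION=.*|VERSION=1.3.0|' /tmp/make-distribution-snippet.sh
grep '^VERSION=' /tmp/make-distribution-snippet.sh
```

The same sed pattern works for SPARK_HADOOP_VERSION and SPARK_HIVE.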
Run the build command:
./make-distribution.sh --tgz -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0-cdh5.4.0 -Phive-0.13.1 -Phive-thriftserver
Compilation is much faster if `clean` is removed from the line below (in make-distribution.sh), because a failed build then no longer wipes out previous output every time:
-DskipTests clean package
4. Install Hadoop 2.6
1. Set the Java home directory in:
hadoop-env.sh
mapred-env.sh
yarn-env.sh
Add the following to each file:
export JAVA_HOME=/opt/modules/jdk1.7.0_67
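Appending the export to all three env scripts can be done in one loop. This sketch works on empty stand-in files under /tmp (an illustrative path) so it runs anywhere; point conf_dir at your real Hadoop etc/hadoop directory instead:

```shell
# Stand-in for the real Hadoop conf directory
conf_dir=/tmp/hadoop-conf
mkdir -p "$conf_dir"

# Append the JAVA_HOME export to each of the three env scripts
for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
  touch "$conf_dir/$f"
  echo 'export JAVA_HOME=/opt/modules/jdk1.7.0_67' >> "$conf_dir/$f"
done

# Confirm every script now sets JAVA_HOME
grep -H JAVA_HOME "$conf_dir"/*.sh
```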
2. Configure core-site.xml
<property>
<name>hadoop.tmp.dir</name>