Installing Spark on Ubuntu
1. Install Ubuntu.
2. Set the root password: sudo passwd root
   [sudo] password for you: ---> type your password (it will not be echoed)
3. Install VMware Tools: copy the archive to the desktop, extract it, switch to root with su, then run ./vm...install...
4. System Settings -> Language Support -> check for and apply updates.
5. Reboot.
Check whether the ssh service is installed on Ubuntu:
ps -e | grep ssh     # if the service is running you should see both "ssh-agent" and "sshd"; otherwise it is not installed or not started at boot
Install the ssh service: sudo apt-get install openssh-server
Start the service (as root): /etc/init.d/ssh start
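A quick sanity check that sshd is up (passwordless login is configured later, so a password prompt is expected at this point):
sudo service ssh status
ssh localhost     # should prompt for the user's password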
sudo apt-get install rpm
Install the JDK (remove any preinstalled OpenJDK first; skip the removal if none is present).
Note: the rpm package names below come from a RHEL/CentOS image; on Ubuntu you would normally check with dpkg -l | grep -i jdk and remove with apt-get remove instead.
rpm -qa | grep java
rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.50.1.11.5.el6_3.x86_64
rpm -e --nodeps java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64
rpm -e --nodeps tzdata-java-2012j-1.el6.noarch
mkdir ~/modules
mkdir ~/tools
mkdir ~/software
Extract the JDK archive and move it into place:
tar -zxvf jdk-7u79-linux-x64.tar.gz
mv jdk1.7.0_79 ~/modules/jdk1.7
Configure the environment variables (use gedit or vi):
sudo gedit ~/.bashrc
vi ~/.bashrc
##JAVA
export JAVA_HOME=/home/spark/modules/jdk1.7
export PATH=$PATH:$JAVA_HOME/bin
source ~/.bashrc
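To verify the JDK is picked up from the new environment:
echo $JAVA_HOME     # should print /home/spark/modules/jdk1.7
java -version       # should report java version "1.7.0_79"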
Set the hostname (switch to root with su first):
vi /etc/hosts     # format: ip fqdn alias
192.168.192.138 jarvan.dragon.org jarvan
Permanent change:
vi /etc/hostname     # on Ubuntu this file contains only the hostname itself, not a HOSTNAME= line
jarvan
Temporary change (lasts until reboot):
hostname jarvan
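A quick check that the hostname and the /etc/hosts entry line up:
hostname                        # should print jarvan
ping -c 1 jarvan.dragon.org     # should resolve to 192.168.192.138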
Install Hadoop 2.6
Set up passwordless SSH to the local machine:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
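sshd may refuse the key if the file permissions are too open, so tighten them and confirm that login no longer asks for a password:
chmod 600 ~/.ssh/authorized_keys
ssh jarvan.dragon.org     # should log in without a password prompt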
tar -zxvf hadoop-2.6.0-x64.tar.gz -C ~/modules/
vi ~/.bashrc
vi editing note: after deleting with x/Del, press a (append) before you can type again.
##HADOOP
export HADOOP_HOME=/home/spark/modules/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
source ~/.bashrc
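To confirm the Hadoop binaries are now on the PATH:
hadoop version     # should report Hadoop 2.6.0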
cd ~/modules/hadoop-2.6.0/etc/hadoop
slaves (one worker host per line):
jarvan.dragon.org
hadoop-env.sh:
export JAVA_HOME=/home/spark/modules/jdk1.7/
yarn-env.sh:
export JAVA_HOME=/home/spark/modules/jdk1.7/
mapred-env.sh:
export JAVA_HOME=/home/spark/modules/jdk1.7/
core-site.xml (inside <configuration>):
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/spark/tools/hadoopdata</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://jarvan.dragon.org:9000</value>
</property>
hdfs-site.xml (inside <configuration>):
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
yarn-site.xml (inside <configuration>; for this single-node cluster the ResourceManager host is this node's FQDN):
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>jarvan.dragon.org</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
cp mapred-site.xml.template mapred-site.xml
mapred-site.xml (inside <configuration>):
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
Format the NameNode:
bin/hdfs namenode -format
Start HDFS and YARN: start-dfs.sh and start-yarn.sh
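To check that everything came up on this single node, list the Java processes; the web UI ports below are the Hadoop 2.6 defaults:
jps     # expect NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager
HDFS web UI: http://192.168.192.138:50070/
YARN web UI: http://192.168.192.138:8088/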
Install Scala and Spark
tar -zxvf scala-2.10.5.tgz -C ~/modules/
tar -zxvf spark-1.6.0-bin-hadoop2.6.tgz -C ~/modules/
vi ~/.bashrc
export SCALA_HOME=/home/spark/modules/scala-2.10.5
export PATH=$PATH:$SCALA_HOME/bin
export SPARK_HOME=/home/spark/modules/spark-1.6.0-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin
source ~/.bashrc
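To verify both are on the PATH:
scala -version            # should report Scala 2.10.5
spark-submit --version    # should report Spark 1.6.0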
Spark Standalone (cluster) mode
cp spark-env.sh.template spark-env.sh
conf/spark-env.sh (see the comments in the template for details):
JAVA_HOME=/home/spark/modules/jdk1.7
SCALA_HOME=/home/spark/modules/scala-2.10.5
HADOOP_CONF_DIR=/home/spark/modules/hadoop-2.6.0/etc/hadoop
SPARK_MASTER_IP=jarvan.dragon.org
SPARK_MASTER_PORT=7077 # default is 7077
SPARK_MASTER_WEBUI_PORT=8080 # default is 8080
SPARK_WORKER_CORES=1
SPARK_WORKER_MEMORY=1000m
SPARK_WORKER_PORT=7078
SPARK_WORKER_WEBUI_PORT=8081
SPARK_WORKER_INSTANCES=1
cp slaves.template slaves
conf/slaves (worker hosts):
jarvan.dragon.org
cp spark-defaults.conf.template spark-defaults.conf
conf/spark-defaults.conf (default master URL):
spark.master spark://jarvan.dragon.org:7077
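With spark.master set here, spark-shell and spark-submit use the standalone master even without an explicit --master flag, for example:
spark-shell --executor-memory 300m     # picks up spark://jarvan.dragon.org:7077 from spark-defaults.conf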
Start Standalone mode:
sbin/start-master.sh
sbin/start-slaves.sh
Check the master web UI:
http://192.168.192.138:8080/
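jps should now also show the standalone daemons:
jps     # Master and Worker appear alongside the Hadoop processes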
Launch a spark-shell against the standalone master:
spark-shell --master spark://jarvan.dragon.org:7077 --executor-memory 300m
Check the application web UI:
http://192.168.192.138:4040/
Test in the shell:
val num=sc.parallelize(1 to 10)
val rdd = num.map(x=>(x,1))
rdd.collect
rdd.saveAsTextFile("hdfs://192.168.192.138:9000/data/output1")
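To inspect the saved result on HDFS (part-* is the usual saveAsTextFile output layout):
hdfs dfs -ls /data/output1
hdfs dfs -cat /data/output1/part-*     # should print (1,1) through (10,1)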
Local mode:
spark-shell --master local[1]