Big Data Basics (5): Installing and Configuring Hadoop 2.7.2 + Spark 2.0.0 on Ubuntu 16.04 from Scratch

From a bare system to a running Spark cluster


0 install ubuntu 14.04.01 desktop x64


1 Basic system configuration
(All of the following steps are performed as root.)


1.3 root password
sudo passwd root


1.5 Allow root login on the desktop (LightDM)
a. In a terminal, run:
vi /usr/share/lightdm/lightdm.conf.d/50-ubuntu.conf
b. Set the content as follows:
[SeatDefaults]
autologin-user=root    # optional; if omitted, you choose the account at the login screen
user-session=ubuntu
greeter-show-manual-login=true
c. In a terminal, run:
gedit /root/.profile
d. Reboot with the reboot command.
e. If an error dialog about /root/.profile pops up, replace the line
mesg n
with
tty -s && mesg n
(a sed one-liner for this follows below)
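If you prefer to apply the /root/.profile fix non-interactively, a minimal sketch (assuming the file still contains a plain "mesg n" line) is:
sed -i 's/^mesg n$/tty -s \&\& mesg n/' /root/.profile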


1.6 Permit root ssh
$ sudo vi /etc/ssh/sshd_config
Find the PermitRootLogin line and change it to PermitRootLogin yes.
Restart the OpenSSH server:
$ sudo service ssh restart
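The same edit can be scripted instead of done in vi; a small sketch, assuming the PermitRootLogin directive is present and not commented out:
$ sudo sed -i 's/^PermitRootLogin .*/PermitRootLogin yes/' /etc/ssh/sshd_config
$ sudo service ssh restart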


1.1 install openssh-server
apt-get -y install openssh-server
reboot
Disable the firewall:
ufw disable


1.2 vim
apt-get -y install vim-gtk






1.5 Static IP
It is best to do this from the GUI; once you change it on the command line, the desktop no longer offers a usable network option.
GUI: Method 1
edit network
manual
ip 192.168.10.121
255.255.255.0
gateway 192.168.10.2
dns 192.168.10.2
Leave the other fields unset and reboot.


Command line: Method 2
vi /etc/network/interfaces
# interfaces(5) file used by ifup(8) and ifdown(8)
# The loopback network interface
auto lo
iface lo inet loopback


# The primary network interface
auto eth0
iface eth0 inet static
address 192.168.10.121
netmask 255.255.255.0
gateway 192.168.10.2


DNS
vi /etc/resolv.conf
nameserver 192.168.10.2


To keep the DNS setting from disappearing after a reboot:
vi /etc/resolvconf/resolv.conf.d/base
nameserver 192.168.10.2
Then run:
/etc/init.d/networking restart
The machine should now be able to reach the network.
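A quick sanity check after the restart (the addresses are just this guide's example values):
ifconfig eth0                 # should show 192.168.10.121
ping -c 3 192.168.10.2        # the gateway is reachable
ping -c 3 www.ubuntu.com      # DNS resolution works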


xx1.6 hosts
vi /etc/hosts
192.168.10.121  spark01
192.168.10.122  spark02
192.168.10.123  spark03
vi /etc/hostname
spark01
/etc/init.d/networking restart
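To confirm that name resolution works, a simple check (assuming all three entries above are in /etc/hosts on each node):
for h in spark01 spark02 spark03; do ping -c 1 $h; done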


1.4 teamviewer
dpkg -i teamviewer.xxx
If dpkg complains about missing dependencies:
apt-get install -f
Start TeamViewer:
$ teamviewer


Screen lock
System Settings -> Brightness & Lock -> turn the screen lock off


xx1.7 Passwordless ssh login
su root
a. root@py-server:/home/py# ssh-keygen -t rsa -P ''
b. root@py-server:~# ssh-copy-id -i "root@10.1.1.7"
Check that all the keys were copied: vi /root/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDiMzpM0xIinRKMmC3HRxV91kl2cNpSkEFwJEm/P9NGRKdhqCtErA8Lo9rt+oI/7db0FrGgw15hJQnzr0Nht0raSFthb1HYttG0bpcfJJhp9BZmxNBSRlaWLHGe1B1NQL2micXaTa+SXCBFedrUIpNFSaJ7TCkCkTMJdsoWlrF8iE/IMCazK71jhD2k+MaomzdVfuAKR68tu2CK/D79+q9Apy8MusLhkrYmOPBPXtt72x1rVG7BqkCwz7AYqH39IJJCj0VSxdYSXnEMrnNzsA8kyAfnqz6hzyuerZfG7sp/u+4hcvgzCtO+pGfoy+m0lOGn+SJ0PBAhjiAZquI+ncGr root@spark02
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCqH4QiuqYmA92JLE500L/02xiKUACbw1iBTZFGpthEYsAi31sWPWt6cE6ydEB7qklyMXX6fMkQ1/RhRrLVEuNho8YSwCMyoioLyXg2iue540/Ft12pifa30Buu+V1tTSwlpYBuQuyM9qhmXJ91OMGDochaj0E7MtOddLAqWxlxlsMeo+Bln/QzMPe0F99QasUHNUKAXWf77XOLGR4CMYhV/pVpoCuCLiO3sK/8yv6wJa61DrRtX9+/ANW2J4dXM7Iv4OebYlDdr0POSA0Qsu/pE71Wk2BKF52RLXGxsSAak/UgsjT4Ye3r73ZS7SCUWtRleI3NLZMM/3pQWLY7uKHH root@spark03
Each entry ends with the other machine's user and hostname.
[Important: on virtual machines, also copy the machine's own key to itself, otherwise logging in to the local machine may fail!!!]
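A compact way to generate, distribute and verify the keys from each node; this is only a sketch and assumes the three hostnames above and that root ssh login is enabled:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa                               # skip if the key pair already exists
for h in spark01 spark02 spark03; do ssh-copy-id -i ~/.ssh/id_rsa.pub root@$h; done
for h in spark01 spark02 spark03; do ssh root@$h hostname; done        # should print each hostname without a password prompt
Note that the loop includes the local machine itself, which covers the warning above.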


2 Spark base environment




2.1 jdk 1.8
The official download arrives as jdk-8u91-linux-x64.gz; rename it to jdk-8u91-linux-x64.tar.gz. (The environment variables below use jdk1.8.0_101; point JAVA_HOME at whatever JDK directory you actually unpack.)
http://download.oracle.com/otn-pub/java/jdk/8u91-b14/jdk-8u91-linux-x64.tar.gz?AuthParam=1461460538_c9fec92cd12aba54d9b6cdefeb14a986
mkdir /usr/lib/java
tar -xvf jdk-8u91-linux-x64.tar.gz
export JAVA_HOME=/usr/lib/java/jdk1.8.0_101
export JRE_HOME=${JAVA_HOME}/jre
export CLASS_PATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
java -version
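To make these exports survive a new shell, append them to ~/.bashrc and re-source it; a minimal sketch, assuming the JDK was unpacked into /usr/lib/java/jdk1.8.0_101 as above:
cat >> ~/.bashrc <<'EOF'
export JAVA_HOME=/usr/lib/java/jdk1.8.0_101
export JRE_HOME=${JAVA_HOME}/jre
export CLASS_PATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
EOF
source ~/.bashrc
java -version        # should report java version 1.8.0_xx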


2.2 scala

tar vxf scala-2.11.8.tgz 
mkdir /usr/lib/scala
export SCALA_HOME=/usr/lib/scala/scala-2.11.8
export PATH=$SCALA_HOME/bin:$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH


2.3 python
Anaconda 2 (Python 2.7)
bash Anaconda...
Install prefix: /server/anaconda2
source ~/.bashrc


2.4 sbt?
http://www.scala-sbt.org/0.13/docs/Installing-sbt-on-Linux.html

xx2.5 zookeeper
tar xvzf zookeeper.xxx and move the extracted directory to /server/zookeeper/
export ZOOKEEPER_HOME=/server/zookeeper
export JAVA_HOME=/usr/lib/java/jdk1.8.0_101
export JRE_HOME=${JAVA_HOME}/jre
export SCALA_HOME=/usr/lib/scala/scala-2.11.8
export CLASS_PATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf:$SCALA_HOME/bin:${JAVA_HOME}/bin:$PATH
root@ubuntu:/server/zookeeper# mkdir data
root@ubuntu:/server/zookeeper# mkdir logs


Configure the myid file (the unique identifier of each node):
root@spark01:/server/zookeeper/conf# echo 1 > /server/zookeeper/data/myid


root@spark02:/server/zookeeper/conf# echo 2 > /server/zookeeper/data/myid


root@spark03:/server/zookeeper/conf# echo 3 > /server/zookeeper/data/myid




root@ubuntu:/server/zookeeper/conf# cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
dataDir=/server/zookeeper/data
dataLogDir=/server/zookeeper/logs
server.1=spark01:2888:3888
server.2=spark02:2888:3888
server.3=spark03:2888:3888

Change the log location [by default the logs land in whatever directory you start from, which is ugly].
Edit zkEnv.sh under $ZOOKEEPER_HOME/bin and set ZOO_LOG_DIR to the directory the logs should go to;
you can also change ZOO_LOG4J_PROP to use the INFO,ROLLINGFILE log appender.
ZOO_LOG_DIR="/server/zookeeper/logs"
Reference: http://www.programgo.com/article/8705462646/




3 Hadoop installation


3.1 hadoop 2.7.2


http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
tar xvzf hadoop...gz
Environment variables:
vi ~/.bashrc 
export HADOOP_HOME=/server/hadoop
export ZOOKEEPER_HOME=/server/zookeeper
export JAVA_HOME=/usr/lib/java/jdk1.8.0_101
export JRE_HOME=${JAVA_HOME}/jre
export SCALA_HOME=/usr/lib/scala/scala-2.11.8
export CLASS_PATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf:$SCALA_HOME/bin:${JAVA_HOME}/bin:$PATH
Configuration:
Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh:
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/lib/java/jdk1.8.0_101
Edit $HADOOP_HOME/etc/hadoop/yarn-env.sh:
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
export JAVA_HOME=/usr/lib/java/jdk1.8.0_101
Edit core-site.xml:
mkdir /server/hadoop/tmp
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://spark01:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/server/hadoop/tmp</value>
    </property>
    <property>
        <name>hadoop.native.lib</name>
        <value>true</value>
    </property>
</configuration>
Edit hdfs-site.xml:
mkdir -p /server/hadoop/dfs/name
mkdir -p /server/hadoop/dfs/data
<configuration>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>spark01:50090</value>
                <description>The secondary namenode http server address and port.</description>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>/server/hadoop/dfs/name</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>/server/hadoop/dfs/data</value>
        </property>
        <property>
                <name>dfs.namenode.checkpoint.dir</name>
                <value>file:///server/hadoop/dfs/namesecondary</value>
                <description>Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.</description>
     </property>
</configuration>


Edit yarn-site.xml:
<configuration>
<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>spark01</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>


Edit mapred-site.xml (if only mapred-site.xml.template exists, copy it to mapred-site.xml first):
<configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
</configuration>
Edit slaves (this file lists the worker nodes; it may contain a default localhost entry, replace that with the cluster hostnames):
vi slaves
spark01
spark02
spark03


At this point the Hadoop (and optional HA) configuration files are done; what remains is passwordless ssh login and formatting the Hadoop filesystem.
Once all the software (ZooKeeper + HBase) is installed and the machine has been cloned, we will set up passwordless ssh and format Hadoop. After cloning, you also need to change each node's hostname in /etc/hostname, and, if you configure ResourceManager HA, change the yarn.resourcemanager.ha.id property in $HADOOP_HOME/etc/hadoop/yarn-site.xml on master2 to rm2.
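If you install on separate machines instead of cloning the VM image, the already-configured directories can simply be pushed out from spark01; a sketch that assumes identical paths on every node and working passwordless ssh (remember to give each node its own ZooKeeper myid afterwards):
for h in spark02 spark03; do
    ssh root@$h mkdir -p /server
    scp -r /usr/lib/java /usr/lib/scala root@$h:/usr/lib/
    scp -r /server/hadoop /server/zookeeper root@$h:/server/
    scp ~/.bashrc root@$h:~/.bashrc
done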




4 spark


4.1 spark-1.6.2-bin-hadoop2.6.tgz (there is no prebuilt Spark binary matching Hadoop 2.7 for this version, so use this one for now; building spark-1.6.2.tgz yourself with sbt is far too slow, on a slow connection it does not finish even after several hours.)
tar xvzf spark-1.6.2-bin-hadoop2.6.tgz
Move the extracted directory to /server/spark.
vi ~/.bashrc
export SPARK_MASTER_IP=192.168.10.121
export SPARK_WORKER_MEMORY=1g
export SPARK_HOME=/server/spark
export HADOOP_HOME=/server/hadoop
export HADOOP_CONF_DIR=/server/hadoop/etc/hadoop
export ZOOKEEPER_HOME=/server/zookeeper
export JAVA_HOME=/usr/lib/java/jdk1.8.0_101
export JRE_HOME=${JAVA_HOME}/jre
export SCALA_HOME=/usr/lib/scala/scala-2.11.8
export CLASS_PATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf:$SCALA_HOME/bin:${JAVA_HOME}/bin:$PATH
(Put these exports near the top of ~/.bashrc, above the default "# If not running interactively, don't do anything" check, so that non-interactive ssh sessions also pick them up.)
Edit slaves:
root@spark01:/server/spark/conf# cp slaves.template slaves
Delete the localhost entry and add:
spark01
spark02
spark03
Test that it works:
spark-shell --version
1.6.2


[4.1.1 Upgrade to Spark 2.0.0]
mv /server/spark/ /server/spark-1.6.2/ 
cd /server
tar xvzf spark-2.0.0-bin-hadoop2.7.tgz
mv /server/spark-2.0.0-bin-hadoop2.7 /server/spark
cd spark/conf
cp slaves.template slaves
vi slaves
spark01
spark02
spark03
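A quick check that the upgraded version is the one on the PATH (SPARK_HOME still points at /server/spark):
spark-shell --version        # should now report 2.0.0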


4.2 Install on the other machines, or copy the VM image to the other virtual machines


Copy the Ubuntu image to two new virtual machines, or install the environment above on two other servers.


Change the new machines' IP and hostname; the rest of the configuration stays the same.
Regenerate and redistribute the ssh keys on the new machines, then format Hadoop.
You can open /root/.ssh/authorized_keys with vi to see every machine's key.
If any key is missing, make sure to copy them all!!!
!!! Note:
On cloned virtual machines, different machines may carry the same hostname inside id_rsa.pub even though /etc/hosts and /etc/hostname are correct, which means the cloned image is slightly off; just correct the hostname in id_rsa.pub and copy the key again.
!!! Note:
On virtual machines, each machine must also ssh-copy-id to itself, otherwise it still fails.


4.3 Format HDFS
On spark01, i.e. the master, run hdfs namenode -format to format the namenode (see the example below).
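One way to confirm the format took effect, assuming dfs.namenode.name.dir from hdfs-site.xml above (re-running the format on an existing cluster wipes the HDFS metadata):
root@spark01:~# hdfs namenode -format
root@spark01:~# ls /server/hadoop/dfs/name/current/    # a VERSION file and fsimage files should now exist here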




5 Startup


5.1 Start the ZooKeeper cluster


### Note: follow the steps below strictly in this order.


Start ZooKeeper on Master, Worker1, and Worker2.
Master spark01
root@spark01:~# cd /server/zookeeper/bin/
root@spark01:/server/zookeeper/bin# ./zkServer.sh start


Worker1 spark02
root@spark02:~# cd /server/zookeeper/bin/
root@spark02:/server/zookeeper/bin# ./zkServer.sh start


Worker2 spark03
root@spark03:~# cd /server/zookeeper/bin/
root@spark03:/server/zookeeper/bin# ./zkServer.sh start


# Check the status: there should be one leader and two followers
root@spark01:/server/zookeeper/bin# ./zkServer.sh status
root@spark02:/server/zookeeper/bin# ./zkServer.sh status
root@spark03:/server/zookeeper/bin# ./zkServer.sh status
[Note: the earlier "not running" error was because myid had not been configured.]
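An extra liveness check using ZooKeeper's four-letter command (assuming nc is installed and the default client port 2181):
for h in spark01 spark02 spark03; do echo ruok | nc $h 2181; echo; done    # each server should answer imok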


5.2 Start the Hadoop cluster
Steps 5.2 and 5.3 below are all performed on the master node, spark01.
5.2.1 Start DFS
cd /server/hadoop/sbin/
root@spark01:/server/hadoop/sbin# ./start-dfs.sh
If ssh still asks for a password here, the passwordless login setup has a problem??? (see the fix below)
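If the password prompt does appear, re-copying the local key usually fixes it; a sketch that reuses the key pair from section 1.7:
ssh-copy-id -i ~/.ssh/id_rsa.pub root@spark01      # repeat for any node that still prompts
ssh root@spark01 exit                              # should return without asking for a password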
5.2.2 Start YARN
root@spark01:/server/hadoop/sbin# ./start-yarn.sh
5.2.3 Start the job history server [optional]


5.3 Start the Spark cluster
cd /server/spark/sbin
root@spark01:/server/spark/sbin# ./start-all.sh
Start the history server [optional]:
root@spark01:/server/spark/sbin# ./start-history-server.sh


5.4 Verification
5.4.1 jps
spark01 should show at least these 6 processes:
QuorumPeerMain (ZooKeeper)
NameNode
Master
ResourceManager
SecondaryNameNode
HistoryServer


root@spark01:/server/spark/sbin# jps
4194 Main
3314 ResourceManager
2678 NameNode
4199 Jps
3961 Master
3468 NodeManager
4093 Worker
2830 DataNode
2415 QuorumPeerMain
3039 SecondaryNameNode






spark02 should show at least these 4 processes:
DataNode
NodeManager
Worker
QuorumPeerMain




root@spark02:/server/zookeeper/bin# jps
2640 DataNode
3169 Jps
2486 QuorumPeerMain
3048 Worker
2798 NodeManager


root@spark03:/server/zookeeper/bin# jps
2817 NodeManager
2659 DataNode
2505 QuorumPeerMain
3194 Jps
3067 Worker




5.4.2 Web UIs
To open the URLs below from Windows, add the IPs and hostnames of spark01/02/03 to C:\Windows\System32\drivers\etc\hosts:
192.168.10.121      spark01
192.168.10.122      spark02
192.168.10.123      spark03
[If the file cannot be saved in place, save it to another path first and then copy it back over the original.]
hadoop
http://spark01:50070/
http://spark01:8088
spark
http://spark01:8080/
http://spark01:18080/


5.4.3 spark submit
root@spark01:/server/spark/sbin# spark-submit --master yarn-cluster --class org.apache.spark.examples.SparkLR --name SparkLR /server/spark/lib/spark-examples-1.6.2-hadoop2.6.0.jar
Result:
16/07/25 06:22:32 INFO yarn.Client: 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: 192.168.10.121
     ApplicationMaster RPC port: 0
     queue: default
     start time: 1469452521987
     final status: SUCCEEDED
     tracking URL: http://spark01:8088/proxy/application_1469451821296_0002/
     user: root
16/07/25 06:22:32 INFO yarn.Client: Deleting staging directory .sparkStaging/application_1469451821296_0002
16/07/25 06:22:33 INFO util.ShutdownHookManager: Shutdown hook called
16/07/25 06:22:33 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-0a4ae85e-0e77-4e57-bd46-a2371a6a20ee
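To pull the full driver and executor logs of a finished YARN application, use the application id shown in the tracking URL above (this only returns output if YARN log aggregation is enabled):
yarn logs -applicationId application_1469451821296_0002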


5.4.4 hadoop submit
root@spark01:/home/alex# vi words
java
c++
hello
hello
python
java
java
java
c
c


root@spark01:/home/alex# hadoop fs -mkdir /data
root@spark01:/home/alex# hadoop fs -put words /data
root@spark01:/home/alex# hadoop jar /server/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /data /output
root@spark01:/home/alex# hadoop fs -cat /output/part-r-00000
c   2
c++ 1
hello   2
java    4
python  1






6 Install Ubuntu 16.04.01 Desktop x64
6.0 Configure Linux as in sections 1.1-1.7
ip 192.168.10.124 host spark04
6.1 Install Anaconda2
On 124:
bash Anaconda2-4.1.1-Linux-x86_64.sh
6.2 Passwordless ssh
On 121:
ssh-copy-id -i "root@192.168.10.124"
On 124:
ssh-copy-id -i "root@192.168.10.121"
ssh-copy-id -i "root@192.168.10.122"
ssh-copy-id -i "root@192.168.10.123"
ssh-copy-id -i "root@192.168.10.124"
On 122:
ssh-copy-id -i "root@192.168.10.124"
On 123:
ssh-copy-id -i "root@192.168.10.124"
6.3 Copy the already-configured programs (all of them are binaries that need no recompilation)
On 121:
scp -r /usr/lib/java root@192.168.10.124:/usr/lib/java
scp -r /usr/lib/scala root@192.168.10.124:/usr/lib/scala
scp -r /server/zookeeper/ root@192.168.10.124:/server/zookeeper/
scp -r /server/hadoop/ root@192.168.10.124:/server/hadoop/
scp -r /server/spark/ root@192.168.10.124:/server/spark/
# optional: scp -r /server/spark-1.6.2/ root@192.168.10.124:/server/spark-1.6.2/
6.4 Update the configuration
6.4.1 ~/.bashrc
On 124, edit it by hand or copy it over with scp:
export SPARK_MASTER_IP=192.168.10.121
export SPARK_WORKER_MEMORY=1g
export SPARK_HOME=/server/spark
export HADOOP_HOME=/server/hadoop
export HADOOP_CONF_DIR=/server/hadoop/etc/hadoop
export ZOOKEEPER_HOME=/server/zookeeper
export JAVA_HOME=/usr/lib/java/jdk1.8.0_101
export JRE_HOME=${JAVA_HOME}/jre
export SCALA_HOME=/usr/lib/scala/scala-2.11.8
export CLASS_PATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf:$SCALA_HOME/bin:${JAVA_HOME}/bin:$PATH
root@spark04:/server# vi ~/.bashrc
root@spark04:/server# source ~/.bashrc
6.4.2 hosts
On 121:
vi /etc/hosts
192.168.10.121  spark01
192.168.10.122  spark02
192.168.10.123  spark03
192.168.10.124  spark04
scp /etc/hosts root@192.168.10.122:/etc/hosts
scp /etc/hosts root@192.168.10.123:/etc/hosts
scp /etc/hosts root@192.168.10.124:/etc/hosts
6.4.3 zookeeper
On 121:
vi /server/zookeeper/conf/zoo.cfg
dataDir=/server/zookeeper/data
dataLogDir=/server/zookeeper/logs
server.1=spark01:2888:3888
server.2=spark02:2888:3888
server.3=spark03:2888:3888
server.4=spark04:2888:3888
scp /server/zookeeper/conf/zoo.cfg root@192.168.10.122:/server/zookeeper/conf/zoo.cfg
scp /server/zookeeper/conf/zoo.cfg root@192.168.10.123:/server/zookeeper/conf/zoo.cfg
scp /server/zookeeper/conf/zoo.cfg root@192.168.10.124:/server/zookeeper/conf/zoo.cfg
On 124:
vi /server/zookeeper/data/myid
change the 1 to 4
6.4.4 hadoop slaves
On 121:
vi /server/hadoop/etc/hadoop/slaves 
spark01
spark02
spark03
spark04
Method 1: put the loop in a small script:
vi copy.sh
destfile=('root@192.168.10.122' 'root@192.168.10.123' 'root@192.168.10.124')
for dest in "${destfile[@]}"; do
    scp /server/hadoop/etc/hadoop/slaves "${dest}":/server/hadoop/etc/hadoop/slaves
done
chmod +x copy.sh
bash copy.sh
Method 2:
scp /server/hadoop/etc/hadoop/slaves root@192.168.10.122:/server/hadoop/etc/hadoop/slaves
scp /server/hadoop/etc/hadoop/slaves root@192.168.10.123:/server/hadoop/etc/hadoop/slaves
scp /server/hadoop/etc/hadoop/slaves root@192.168.10.124:/server/hadoop/etc/hadoop/slaves
6.4.5 Format the namenode on the new node
Adding a slave does not require reformatting the existing namenode.
To format the namenode on the new node:
On 124:
cd $HADOOP_HOME/bin
./hdfs namenode -format
If prompted with Re-format, answer Y to reformat.
http://www.cnblogs.com/simplestupid/p/4695644.html
On 121:
Run start-balancer.sh from $HADOOP_HOME/sbin/.
See the appendix of this article for details.


6.4.6 spark slaves
On 121:
vi /server/spark/conf/slaves
spark01
spark02
spark03
spark04
scp /server/spark/conf/slaves root@192.168.10.122:/server/spark/conf/slaves
scp /server/spark/conf/slaves root@192.168.10.123:/server/spark/conf/slaves
scp /server/spark/conf/slaves root@192.168.10.124:/server/spark/conf/slaves
6.4.7 Windows hosts file
On Windows, add the following to C:\Windows\System32\drivers\etc\hosts:
192.168.10.124      spark04
###########################
Optional 1: also update the slaves file of the old spark-1.6.2 install:
vi /server/spark-1.6.2/conf/slaves
spark01
spark02
spark03
spark04
scp /server/spark-1.6.2/conf/slaves root@192.168.10.122:/server/spark-1.6.2/conf/slaves
scp /server/spark-1.6.2/conf/slaves root@192.168.10.123:/server/spark-1.6.2/conf/slaves
scp /server/spark-1.6.2/conf/slaves root@192.168.10.124:/server/spark-1.6.2/conf/slaves




7 Shutdown
7.1 zookeeper
cd /server/zookeeper/bin/
root@spark01:/server/zookeeper/bin# ./zkServer.sh stop
cd /server/zookeeper/bin/
root@spark02:/server/zookeeper/bin# ./zkServer.sh stop
cd /server/zookeeper/bin/
root@spark03:/server/zookeeper/bin# ./zkServer.sh stop
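All three can also be stopped from one node in a loop; a sketch that relies on the passwordless ssh set up earlier and on JAVA_HOME being visible to non-interactive ssh sessions:
for h in spark01 spark02 spark03; do ssh root@$h /server/zookeeper/bin/zkServer.sh stop; done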


7.2 spark
root@spark01:/server/spark/sbin# ./stop-all.sh


7.3 hadoop
root@spark01:/server/hadoop/sbin# ./stop-all.sh


############################
References
http://www.aboutyun.com/thread-17546-1-1.html
http://blog.csdn.net/onepiecehuiyu/article/details/45271493 (time synchronization, HBase, etc.)
http://blog.csdn.net/yeruby/article/details/49805121
http://my.oschina.net/amui/blog/610288
http://blog.chinaunix.net/uid-20682147-id-4220311.html
http://www.cnblogs.com/simplestupid/p/4695644.html (installing a new node and checking load)
http://ribbonchen.blog.163.com/blog/static/118316505201421824512391/ (adding and removing nodes)


############################
If ssh to the Ubuntu 16 machine does not work:
On 124, run sudo service ssh restart.
On 121-123, first run ssh spark04 and answer yes; after that, starting Hadoop no longer shows the yes/no ssh host-key prompt (an alternative follows below).
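An alternative that avoids the interactive prompt entirely (a convenience step, not part of the original setup): pre-populate known_hosts with ssh-keyscan on 121-123:
ssh-keyscan spark04 >> ~/.ssh/known_hosts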


If a worker does not start, first check whether the slaves file on the master contains that worker.


############################
Load balancing after adding a node
HDFS load balancing
4. Balance blocks
Reposted from: http://ribbonchen.blog.163.com/blog/static/118316505201421824512391/
http://www.cnblogs.com/simplestupid/p/4695644.html
root@spark04:/server/hadoop# ./sbin/start-balancer.sh
1) Without balancing, the cluster stores all new data on the new node, which reduces MapReduce efficiency.
2) Set the balancing threshold; the default is 10%. A lower value gives a more even balance across nodes, but the balancing takes longer.
root@spark04:/server/hadoop# ./sbin/start-balancer.sh -threshold 5
3) Set the balancer bandwidth; the default is only 1 MB/s (in Hadoop 2.x the property name is dfs.datanode.balance.bandwidthPerSec):
<property>
  <name>dfs.balance.bandwidthPerSec</name>  
  <value>1048576</value>  
  <description>  
    Specifies the maximum amount of bandwidth that each datanode   
    can utilize for the balancing purpose in term of   
    the number of bytes per second.   
  </description> 
</property>
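After the balancer has run, the per-datanode usage can be compared with a quick report (run on any node with the Hadoop client configured):
hdfs dfsadmin -report | grep -E 'Name|DFS Used%'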

####################################
Spark log errors on the physical machine
They appear when starting the history server.
Either skip starting the history server, or pass a log directory path as an argument when starting it.
The relevant settings are changed in conf/spark-defaults.conf.
