Spark / Hadoop / Kafka Deployment and Installation Guide

Spark Cluster Port Usage

Service           Port   Notes
spark-master      7077
spark-slave       -
hadoop-master     9000
kafka-zookeeper   2181
kafka-master      9092

Note: the ports of the master services must be exposed to business applications. Part of the communication between the hadoop master/slaves and the spark master/slaves runs over SSH, so passwordless SSH login must be enabled between the masters and slaves.

 

1. System environment

1.1 Passwordless SSH login between master and slaves


# Generate a key pair first if one does not already exist
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# Ensure .ssh            has permissions 700
#        authorized_keys has permissions 600
#        id_rsa.pub      has permissions 644
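
For a cluster, the master's public key must also be appended to authorized_keys on every slave. A minimal sketch, with {slave host} as a placeholder for each slave's hostname or IP:

# Push the master's public key to a slave (repeat for each slave)
ssh-copy-id {slave host}
# Verify: this should run without a password prompt
ssh {slave host} hostname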

1.2 Configure master/slave hostnames: vim /etc/hosts

{host ip}  spark-master
{host ip}  hdfs-master
{host ip}  kafka-master
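
A quick check that the aliases defined above actually resolve:

getent hosts spark-master hdfs-master kafka-master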

2 hadoop

2.1 Download and extract


# Download
wget http://********/hadoop-2.7.3/hadoop-2.7.3.tar.gz

# Extract
tar -zxvf hadoop-2.7.3.tar.gz

cd hadoop-2.7.3

2.2 Configure environment variables: vim ~/.bashrc


export JAVA_HOME=/usr/java/latest
export HADOOP_HOME=/*****/hadoop-2.7.3
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export HADOOP_HOME_WARN_SUPPRESS=1
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$PATH
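
Reload the shell configuration and confirm the hadoop binary is on the PATH:

source ~/.bashrc
hadoop version   # should report Hadoop 2.7.3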

2.3 Hadoop configuration

2.3.1 core-site.xml

vim etc/hadoop/core-site.xml

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://hdfs-master:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>********/hadoop/tmp</value>
        </property>
         <property>
                 <name>fs.trash.interval</name>
                 <value>1440</value>
        </property>
</configuration>

mkdir -p ******/hadoop/tmp

2.3.2 hdfs-site.xml

 
vim etc/hadoop/hdfs-site.xml


<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.support.append</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
        <value>never</value>
    </property>
</configuration>
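
A spot check that the settings are picked up:

hdfs getconf -confKey dfs.replication   # should print 1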

2.3.3 log4j.properties

vim etc/hadoop/log4j.properties

# Silence the recurring NativeCodeLoader warning
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR

2.3.4 mapred-site.xml

cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
vim etc/hadoop/mapred-site.xml


<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

2.3.5 yarn-site.xml

 
vim etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
</configuration>

2.4 Format the namenode

hdfs namenode -format

If the last lines of the output include

xx/xx/xx xx:xx:xx INFO common.Storage: Storage directory /*****/hadoop/tmp/dfs/name has been successfully formatted.

the namenode was formatted successfully.

2.5 Start Hadoop


$HADOOP_HOME/sbin/start-dfs.sh
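
Because mapreduce.framework.name is set to yarn above, the YARN daemons also need to be started before running any MapReduce job (pure HDFS usage works without this):

$HADOOP_HOME/sbin/start-yarn.sh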

2.6 Test HDFS


hdfs dfs -put README.txt /README.txt
hdfs dfs -cat /README.txt
# If the README content is printed, HDFS is working

hadoop (hdfs) listens on hdfs-master:9000
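
Business programs reach HDFS through this port; the same listing works with an explicit URI from any host that resolves hdfs-master:

hdfs dfs -ls hdfs://hdfs-master:9000/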

3 spark

3.1 Download


wget http://*****/spark-2.1.0-bin-hadoop2.7.tgz

tar -zxvf spark-2.1.0-bin-hadoop2.7.tgz
cd spark-2.1.0-bin-hadoop2.7

3.2 vim ~/.bashrc


export SPARK_HOME=/home/*****/spark-2.1.0-bin-hadoop2.7

3.3 Edit conf/spark-defaults.conf

cp spark-defaults.conf.template spark-defaults.conf


spark.master                     spark://spark-master:7077
spark.eventLog.enabled           true
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.driver.memory              1g
spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"

spark.ui.enabled                 false
spark.executor.memory            1g
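
With spark.eventLog.enabled set to true, Spark also needs an event-log directory that already exists (the default is file:///tmp/spark-events, local to each node). A hedged example pointing it at the HDFS instance set up above:

spark.eventLog.dir               hdfs://hdfs-master:9000/spark-events

hdfs dfs -mkdir /spark-events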

3.4 Edit $SPARK_HOME/conf/log4j.properties

cp log4j.properties.template log4j.properties

Change:
log4j.rootCategory=INFO, console, file

Add:
log4j.appender.file=org.apache.log4j.DailyRollingFileAppender
log4j.appender.file.File=/home/******/spark/log
log4j.appender.file.DatePattern='.'yyyy-MM-dd
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1} - %m%n


mkdir -p /home/******/spark/

3.5 Start Spark


$SPARK_HOME/sbin/start-all.sh
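
start-all.sh launches the master plus every worker listed in conf/slaves (one hostname per line; it defaults to localhost). A minimal sketch, with {slave host} as a placeholder:

cd $SPARK_HOME/conf
cp slaves.template slaves
# then add one line per worker, e.g.
#   {slave host}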

 

3.6 Verify that Spark started

$ jps
14340 SecondaryNameNode
14132 DataNode
13960 NameNode
14760 Master
14953 Jps
14892 Worker

If Master and Worker show up in the list, Spark is running.

The Spark master listens on spark-master:7077
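
A quick end-to-end smoke test (run-example submits through spark-defaults.conf, so it targets the spark://spark-master:7077 master configured above):

$SPARK_HOME/bin/run-example SparkPi 10
# look for a line like: Pi is roughly 3.14...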

4 kafka

4.1 Download


wget http://apache.mirror.iweb.ca/kafka/0.10.2.0/kafka_2.11-0.10.2.0.tgz
tar -zxvf kafka_2.11-0.10.2.0.tgz
cd kafka_2.11-0.10.2.0

4.2 Kafka port configuration

vim config/server.properties


############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from 
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
#listeners=PLAINTEXT://:9092
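
By default the broker binds to port 9092 on all interfaces. To expose it under the alias defined in /etc/hosts, uncomment and set the listeners line; an example matching the hosts entry above:

listeners=PLAINTEXT://kafka-master:9092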

4.3 Start


nohup bin/zookeeper-server-start.sh config/zookeeper.properties  >/dev/null 2>&1 &
nohup bin/kafka-server-start.sh config/server.properties >/dev/null 2>&1 &
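
Both commands detach into the background; jps should now list a QuorumPeerMain (ZooKeeper) process and a Kafka process:

jps | grep -E 'QuorumPeerMain|Kafka'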

4.4 Create a topic


bin/kafka-topics.sh --create --zookeeper kafka-master:2181 --replication-factor 1 --partitions 1 --topic log
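
Confirm the topic was created:

bin/kafka-topics.sh --describe --zookeeper kafka-master:2181 --topic log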

4.5 Test


# Create a test topic
bin/kafka-topics.sh --create --zookeeper kafka-master:2181 --replication-factor 1 --partitions 1 --topic test
# Send a message
bin/kafka-console-producer.sh --broker-list kafka-master:9092 --topic test
>  it's a test message!


bin/kafka-console-consumer.sh --bootstrap-server kafka-master:9092 --topic test --from-beginning

# If the consumer receives "it's a test message!", everything is OK
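
kafka uses ports kafka-master:9092 (broker) and kafka-master:2181 (zookeeper)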