1. Flume installation and deployment
1.1 Download the installation package and extract it:
cd /usr/local/
wget http://archive.cloudera.com/cdh5/cdh/5/flume-ng-1.6.0-cdh5.7.0.tar.gz
tar -zxvf flume-ng-1.6.0-cdh5.7.0.tar.gz
ln -s apache-flume-1.6.0-cdh5.7.0-bin/ flume
1.2 Configure Flume
cd /usr/local/flume/conf
cp flume-env.sh.template flume-env.sh
vi flume-env.sh   # add the JAVA_HOME path
export JAVA_HOME=/usr/java/jdk1.8.0_152
vi /etc/profile
export FLUME_HOME=/usr/local/flume
export PATH=$FLUME_HOME/bin:$PATH
source /etc/profile
1.3 Verify the Flume installation: write a configuration file
Create example.conf. Note: this configuration uses a netcat source, a memory channel, and a logger sink.
vi example.conf
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = 192.168.0.186
a1.sources.r1.port = 45678
# Describe the sink
a1.sinks.k1.type = logger
a1.sinks.k1.maxBytesToLog = 10
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
1.4 Test
Start the Flume agent:
flume-ng agent --name a1 \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/example.conf -Dflume.root.logger=INFO,console
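The value passed to --name has to match the prefix used in the configuration file (a1 here); if it does not, the agent typically starts without any source, channel, or sink. A minimal preflight sketch, assuming bash; flume_agent_defined is a hypothetical helper, not part of Flume:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Flume): succeed only if CONF declares
# sources, channels and sinks for the agent name given to --name.
flume_agent_defined() {
  local conf="$1" agent="$2" kind
  for kind in sources channels sinks; do
    # Look for a declaration line such as "a1.sources = r1"
    if ! grep -Eq "^[[:space:]]*${agent}\.${kind}[[:space:]]*=" "$conf"; then
      echo "agent '${agent}' has no '${kind}' line in ${conf}" >&2
      return 1
    fi
  done
}

# Usage (paths as in this guide):
# flume_agent_defined "$FLUME_HOME/conf/example.conf" a1 && \
#   flume-ng agent --name a1 --conf "$FLUME_HOME/conf" \
#     --conf-file "$FLUME_HOME/conf/example.conf" -Dflume.root.logger=INFO,console
```

The check only looks at the three declaration lines; it does not validate component types or channel bindings.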
Install telnet and connect to the netcat source:
yum install -y telnet
telnet 192.168.0.186 45678
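2. ZooKeeper deployment
Kafka stores its metadata in ZooKeeper (ZK), so ZK is a hard requirement: without it Kafka cannot run. In older versions the consumer also depended on ZK; the new consumer introduced in Kafka 0.9 removed the client-side dependency on ZK, but the broker still depends on it. ZK must therefore be deployed before Kafka is configured.
2.1 Download the installation package and extract it: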
cd /usr/local/
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
tar -zxvf zookeeper-3.4.6.tar.gz
2.2 Configure ZooKeeper
chown -R root:root zookeeper-3.4.6
ln -s zookeeper-3.4.6 zookeeper
vi /etc/profile
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH
source /etc/profile
cd zookeeper
mkdir data
cd conf
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
dataDir=/usr/local/zookeeper/data
2.3 Start ZooKeeper
Server: zkServer.sh start | stop | status
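zkServer.sh is only found if ZOOKEEPER_HOME in /etc/profile points at the directory that actually contains bin/zkServer.sh. A small sketch of that check, assuming bash; check_home is a hypothetical helper, and the same call works for FLUME_HOME and KAFKA_HOME:

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that HOME_DIR/bin/SCRIPT exists and is executable.
check_home() {
  local home_dir="$1" script="$2"
  if [ -x "${home_dir}/bin/${script}" ]; then
    echo "ok: ${home_dir}/bin/${script}"
  else
    echo "missing: ${home_dir}/bin/${script}" >&2
    return 1
  fi
}

# Usage after "source /etc/profile":
# check_home "$ZOOKEEPER_HOME" zkServer.sh && zkServer.sh start && zkServer.sh status
```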
3. Kafka deployment
3.1 Download the installation package and extract it:
cd /usr/local/
wget https://archive.apache.org/dist/kafka/0.10.0.1/kafka_2.11-0.10.0.1.tgz
tar -zxvf kafka_2.11-0.10.0.1.tgz
ln -s kafka_2.11-0.10.0.1 kafka
3.2 Configure Kafka
cd kafka/config/
vi server.properties
# Broker ID; must be unique within the cluster
broker.id=1
# Socket server port
port=9092
# Socket server IP address
host.name=192.168.0.186
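# Directory where Kafka stores its log (data) files
log.dirs=/usr/local/kafka/kafka-logs
# ZooKeeper connection string; Kafka metadata is stored under the /kafka chroot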
zookeeper.connect=192.168.0.186:2181/kafka
mkdir -p /usr/local/kafka/kafka-logs
vi /etc/profile
export KAFKA_HOME=/usr/local/kafka
export PATH=$KAFKA_HOME/bin:$PATH
source /etc/profile
3.3 Verify the Kafka installation
Start Kafka:
nohup kafka-server-start.sh $KAFKA_HOME/config/server.properties &
Create a Kafka topic:
kafka-topics.sh --create \
--zookeeper 192.168.0.186:2181/kafka \
--replication-factor 1 --partitions 1 --topic test
Producer:
kafka-console-producer.sh \
--broker-list 192.168.0.186:9092 --topic test
Consumer:
kafka-console-consumer.sh \
--zookeeper 192.168.0.186:2181/kafka \
--from-beginning --topic test
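The address given to --broker-list has to match the host.name and port values in server.properties; a mistyped port makes the producer fail to connect. A minimal sketch that derives the address from the file instead of typing it, assuming bash; prop and broker_addr are hypothetical helpers, and the localhost and 9092 fallbacks are assumptions for when a key is not set:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last value assigned to KEY in a properties FILE.
prop() {
  local file="$1" key
  key=$(printf '%s' "$2" | sed 's/\./\\./g')   # escape dots for the regex
  sed -n -E "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*([^[:space:]]*).*\$/\1/p" "$file" | tail -n 1
}

# Hypothetical helper: build "host:port" from host.name and port.
broker_addr() {
  local file="$1" host port
  host=$(prop "$file" host.name)
  port=$(prop "$file" port)
  # Assumption: fall back to localhost and 9092 when a key is not set.
  echo "${host:-localhost}:${port:-9092}"
}

# Usage:
# kafka-console-producer.sh \
#   --broker-list "$(broker_addr "$KAFKA_HOME/config/server.properties")" --topic test
```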
4. Flume + Kafka
4.1 Flume configuration
vi flume-kafka-memory.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the netcat source
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = test
a1.sinks.k1.brokerList = 192.168.0.186:9092
# a1.sinks.k1.kafka.producer.compression.type = snappy
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.keep-alive = 90
a1.channels.c1.capacity = 2000000
a1.channels.c1.transactionCapacity = 6000
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
4.2 Start Flume
flume-ng agent --name a1 --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/flume-kafka-memory.conf -Dflume.root.logger=INFO,console
Start Kafka:
kafka-server-start.sh -daemon /usr/local/kafka/config/server.properties
Create a Kafka topic:
kafka-topics.sh --create \
--zookeeper 192.168.0.186:2181/kafka \
--replication-factor 1 --partitions 1 --topic topicD
List topics:
kafka-topics.sh --list --zookeeper localhost:2181/kafka
Spark Streaming program that consumes from Kafka (Scala):
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object DkafkatoStreaming {
  def main(args: Array[String]): Unit = {
    val sparkconf = new SparkConf().setAppName("project").setMaster("local")
    // Process the stream in 5-second micro-batches
    val ssc = new StreamingContext(sparkconf, Seconds(5))
    // Kafka consumer settings; bootstrap.servers must match the broker's host and port
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "116.207.129.109:9082",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "use_a_separate_group_id_for_each_stream",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )
    //ssc.checkpoint("hdfs://116.207.129.109:9000/checkproint")
    val topics = Array("streaming_topic")
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](topics, kafkaParams)
    )
    // Keep only the message value of each record
    val lines = stream.map(_.value)
    // Split on commas and print the per-batch count of each token
    lines.flatMap(_.split(",")).map(x => (x, 1)).reduceByKey(_ + _).print()
    ssc.start()
    ssc.awaitTermination()
  }
}
5. Errors encountered
Error while executing topic command : replication factor: 1 larger than available brokers: 0
[2018-04-24 01:26:45,715] ERROR kafka.admin.AdminOperationException: replication factor: 1 larger than available brokers: 0
at kafka.admin.AdminUtils$.assignReplicasToBrokers(AdminUtils.scala:117)
at kafka.admin.AdminUtils$.createTopic(AdminUtils.scala:403)
at kafka.admin.TopicCommand$.createTopic(TopicCommand.scala:110)
at kafka.admin.TopicCommand$.main(TopicCommand.scala:61)
at kafka.admin.TopicCommand.main(TopicCommand.scala)
(kafka.admin.TopicCommand$)
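Fix: append the /kafka chroot path to the --zookeeper address in the command, so that it matches zookeeper.connect in server.properties. Without it the command looks under the ZooKeeper root, where no broker is registered.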
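This error shows up when the --zookeeper address and the broker's zookeeper.connect use different chroot paths, since the brokers register themselves under the chroot. A minimal sketch that compares the two before running kafka-topics.sh, assuming bash; zk_chroot, broker_zk_connect and same_chroot are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the chroot part of a ZooKeeper connection string,
# e.g. "/kafka" for "192.168.0.186:2181/kafka", or "/" when there is none.
zk_chroot() {
  local connect="$1"
  case "$connect" in
    */*) printf '/%s\n' "${connect#*/}" ;;
    *)   printf '/\n' ;;
  esac
}

# Hypothetical helper: read zookeeper.connect from a server.properties file.
broker_zk_connect() {
  sed -n -E 's/^[[:space:]]*zookeeper\.connect[[:space:]]*=[[:space:]]*([^[:space:]]*).*$/\1/p' "$1" | tail -n 1
}

# Succeed only if the CLI address uses the same chroot as the broker config.
same_chroot() {
  local props="$1" cli="$2"
  [ "$(zk_chroot "$(broker_zk_connect "$props")")" = "$(zk_chroot "$cli")" ]
}

# Usage:
# same_chroot "$KAFKA_HOME/config/server.properties" 192.168.0.186:2181/kafka \
#   || echo "chroot mismatch: topic commands will see 0 brokers"
```

Only the chroot is compared; the host list may legitimately differ between the two addresses.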
6. Tuning points:
timeout heap rpc
producer:
acks
buffer.memory
compression.type
retries
batch.size: upper bound on the size of one batch in bytes (per partition), not the number of records
broker:
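message.max.bytes: maximum size of a single message, e.g. 2 MB (max.message.bytes is the per-topic override)
replica.fetch.max.bytes: maximum number of bytes a replica tries to fetch; must be greater than or equal to the value above, e.g. 4 MB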
zookeeper.connection.timeout.ms
consumer:
fetch.message.max.bytes
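The broker rule above (replica fetch size at least as large as the maximum message size) can be checked mechanically. A minimal sketch, assuming bash and the broker-level keys message.max.bytes and replica.fetch.max.bytes; get_prop and check_fetch_size are hypothetical helpers, and the fallback numbers are the defaults listed in the Kafka 0.10 broker documentation, which should be verified for other versions:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last value assigned to KEY in a properties FILE.
get_prop() {
  local file="$1" key
  key=$(printf '%s' "$2" | sed 's/\./\\./g')   # escape dots for the regex
  sed -n -E "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*([^[:space:]]*).*\$/\1/p" "$file" | tail -n 1
}

# Succeed only if replica.fetch.max.bytes >= message.max.bytes.
check_fetch_size() {
  local file="$1" msg fetch
  msg=$(get_prop "$file" message.max.bytes)
  fetch=$(get_prop "$file" replica.fetch.max.bytes)
  # Assumption: Kafka 0.10 defaults apply when a key is absent.
  msg=${msg:-1000012}
  fetch=${fetch:-1048576}
  if [ "$fetch" -ge "$msg" ]; then
    echo "ok"
  else
    echo "replica.fetch.max.bytes ($fetch) < message.max.bytes ($msg)" >&2
    return 1
  fi
}

# Usage:
# check_fetch_size "$KAFKA_HOME/config/server.properties"
```

With the 2 MB and 4 MB values from the list (2097152 and 4194304 bytes) the check passes; raising only message.max.bytes makes it fail, which is the situation where followers can no longer replicate large messages.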
[Source: @若泽大数据]