Install Flume
Write the Flume configuration file: decide on the Flume data source and where the data should be sent.
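As a rough sketch of what such a file can look like (the agent name a1 and the ports 33333/44444 are placeholder assumptions, not values prescribed by this setup): a netcat source that telnet can write to, wired through a memory channel to an avro sink that pushes events toward Spark Streaming.

# example flume-to-spark.conf -- agent name and ports are hypothetical
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# netcat source: lines typed over telnet become Flume events
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 33333
a1.sources.r1.channels = c1

# avro sink: pushes the events to the Spark Streaming receiver
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = localhost
a1.sinks.k1.port = 44444
a1.sinks.k1.channel = c1

# memory channel buffering between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000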
Install telnet
apt-get install xinetd telnetd
After installing, trying to use it shows:
root@master:/usr/local/hadoop-2.7.5/sbin# telnet
bash: telnet: command not found
Since telnetd is launched through xinetd, xinetd has to be started first:
root@master:/etc/xinetd.d# service xinetd status
 * is not running
root@master:/etc/xinetd.d# service xinetd staart
Usage: /etc/init.d/xinetd {start|stop|reload|force-reload|restart|status}
root@master:/etc/xinetd.d# service xinetd start
 * Starting internet superserver xinetd [ OK ]
root@master:/etc/xinetd.d#
But a problem appears:
root@master:/etc/xinetd.d# apt-get install telnetd
Reading package lists... Done
Building dependency tree
Reading state information... Done
telnetd is already the newest version (0.17-40).
0 upgraded, 0 newly installed, 0 to remove and 4 not upgraded.
root@master:/etc/xinetd.d# telnetd
bash: telnetd: command not found
root@master:/etc/xinetd.d#
Then I searched Baidu, and ended up finding my own earlier blog post about this exact problem......
Problem solved
The command isn't apt-get install xinetd telnetd, it's apt-get install telnet: xinetd/telnetd is the server side, while what's missing here is the telnet client.
After installing with apt-get install telnet it works. The same mistake, twice......
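With the client in place, telnet is what will later feed test lines into the Flume netcat source (the port here follows the placeholder config sketched above):

telnet localhost 33333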
Set up the jar needed to connect Flume to Spark Streaming
First, check the local Scala and Spark versions:
root@master:/usr/local/spark/bin# spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
18/04/26 01:28:22 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://172.17.0.2:4040
Spark context available as 'sc' (master = local[*], app id = local-1524706101693).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.2
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_162)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
Then download the package matching those versions (Scala 2.11, Spark 2.0.2):
http://mvnrepository.com/artifact/org.apache.spark/spark-streaming-flume_2.11/2.0.2
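Alternatively, the same jar can be pulled straight from Maven Central on the command line (the URL follows the standard repository layout):

wget https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-flume_2.11/2.0.2/spark-streaming-flume_2.11-2.0.2.jar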
Then copy it into the Docker container:
sudo docker cp spark-streaming-flume_2.11-2.0.2.jar master:/root/build
and put it under Spark's jars directory:
root@master:/usr/local/spark/jars# mkdir flume
root@master:/usr/local/spark/jars# cd flume/
root@master:/usr/local/spark/jars/flume# ll
total 8
drwxr-xr-x 2 root root 4096 Apr 26 01:33 ./
drwxr-xr-x 3  500  500 4096 Apr 26 01:33 ../
root@master:/usr/local/spark/jars/flume# cp /root/build/spark-streaming-flume_2.11-2.0.2.jar .
root@master:/usr/local/spark/jars/flume# ll
total 112
drwxr-xr-x 2 root root   4096 Apr 26 01:34 ./
drwxr-xr-x 3  500  500   4096 Apr 26 01:33 ../
-rw------- 1 root root 105087 Apr 26 01:34 spark-streaming-flume_2.11-2.0.2.jar
root@master:/usr/local/spark/jars/flume#
Modify the SPARK_DIST_CLASSPATH variable in spark-env.sh.
It was originally:
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop-2.7.5/bin/hadoop classpath)
Change it to:
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop-2.7.5/bin/hadoop classpath):/usr/local/spark/jars/flume/*:/usr/local/flume/lib/*
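To sanity-check the edit (assuming spark-env.sh is at the usual $SPARK_HOME/conf location, i.e. /usr/local/spark/conf/spark-env.sh), source it and print the variable; the two new flume entries should appear at the end:

source /usr/local/spark/conf/spark-env.sh
echo $SPARK_DIST_CLASSPATH

Note that the variable is evaluated when Spark launches, so any already-running spark-shell has to be restarted before it takes effect.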
Write a Spark program to test
Found some code online, from http://dblab.xmu.edu.cn/blog/1745-2/:
from __future__ import print_function
import sys
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.flume import FlumeUtils