Hadoop Cluster Setup (4) -- Installing Sqoop and Flume

This post continues the series and covers how to install and configure Flume and Sqoop in the virtual machines.

Ⅰ. Related components (with download links)

The Hadoop ecosystem contains many more components, such as Spark, HBase, Hive, and so on. For reasons of length they are not covered here; download links are given below, and installation guides for the rest will appear in other posts. (This article uses the Flume and Sqoop packages from the link; upload those installation packages to the root directory.)

Cluster component download link
Password: zccy

Ⅱ. Installing Flume

1. Extract the installation package

Extract the prepared Flume installation package:

tar -xvf /root/apache-flume-1.8.0-bin.tar.gz
mv apache-flume-1.8.0-bin ./flume

2. Set environment variables

vi .bash_profile

Add the following lines:

export FLUME_HOME=/root/flume
export FLUME_CONF_DIR=$FLUME_HOME/conf
export PATH=$PATH:$FLUME_HOME/bin

Reload the configuration:

source .bash_profile
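
As a quick sanity check that the PATH is picked up, you can print the Flume version (the exact output depends on your build):

flume-ng version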

3. Configure flume-env.sh

Copy the template file and edit it:

cp /root/flume/conf/flume-env.sh.template /root/flume/conf/flume-env.sh
vi /root/flume/conf/flume-env.sh

Add the following (use your own JAVA_HOME path):

JAVA_HOME=/root/jdk/jdk1.8.0_144/
JAVA_OPTS="-Xms100m -Xmx200m -Dcom.sun.management.jmxremote"

4. Modify the flume-conf configuration file

Copy the template file and edit it:

cp /root/flume/conf/flume-conf.properties.template /root/flume/conf/flume-conf.properties
vi /root/flume/conf/flume-conf.properties

Edit it as follows:


# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# For each one of the sources, the type is defined

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# The channel can be defined as follows.

a1.sources.r1.channels = c1

# Each sink's type must be defined

a1.sinks.k1.type = logger

#Specify the channel the sink should use

a1.sinks.k1.channel = c1

# Each channel's type is defined.

a1.channels.c1.type = memory

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

(Note: a1.sources.r1.bind should be the IP of the machine that receives the data. Binding to localhost only accepts local connections; to accept the telnet test from another node shown below, use the host's actual IP, e.g. 192.168.0.2, or 0.0.0.0.)

5. Functional verification

1. Import the jar packages

To let Flume talk to Hadoop, first copy the six jar packages from the link (password zccy) into flume/lib/; a sketch of doing this from a local Hadoop installation follows.
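
If you prefer to pull them from your local Hadoop installation instead of the download link, the jars the HDFS sink typically needs are listed below; the paths and version numbers are assumptions based on a Hadoop 2.7.7 layout, so match them to the files actually present under your Hadoop share directory:

# paths and versions are assumptions -- locate the jars under your own Hadoop installation
cd /root/hadoop-2.7.7/share/hadoop
cp common/hadoop-common-2.7.7.jar \
   common/lib/hadoop-auth-2.7.7.jar \
   common/lib/commons-configuration-1.6.jar \
   common/lib/commons-io-2.4.jar \
   common/lib/htrace-core-3.1.0-incubating.jar \
   hdfs/hadoop-hdfs-2.7.7.jar \
   /root/flume/lib/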
Then go into the Flume installation directory:

cd /root/flume

First install a few services:


yum install telnet          # may need to run this twice
yum install telnet-server
yum -y install xinetd
systemctl enable xinetd.service
systemctl enable telnet.socket
systemctl start telnet.socket
systemctl start xinetd

Enter the command to start the agent:

/root/flume/bin/flume-ng agent --conf ./conf/ --conf-file ./conf/flume-conf.properties --name a1 -Dflume.root.logger=INFO,console

Open another node and install the same services there:

yum install telnet
yum install telnet-server
yum -y install xinetd
systemctl enable xinetd.service
systemctl enable telnet.socket
systemctl start telnet.socket
systemctl start xinetd

Connect and type hello:

telnet 192.168.0.2 44444
hello

If "OK" is echoed back, the configuration works.

At this point the receiving agent has received the message, and it shows up in its console log.

2. Test collecting logs to HDFS

(This part follows the configuration from the linked reference.
Notes:
① a2.sources.r2.command points at the log to collect and must be adjusted to your own location (here we watch Hive's log, which by default lives under /tmp/<user>/).
② a2.sinks.k2.hdfs.path is the upload destination and must be adjusted to your own location.
)

cd /root/flume/conf
touch flume-file-hdfs.conf
vim flume-file-hdfs.conf

Edit it as follows:

# Name the components on this agent
a2.sources = r2
a2.sinks = k2
a2.channels = c2

# Describe/configure the source
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /tmp/root/hive.log
a2.sources.r2.shell = /bin/bash -c

# Describe the sink
a2.sinks.k2.type = hdfs
a2.sinks.k2.hdfs.path = hdfs://master:9000/flume/%Y%m%d/%H
# prefix for uploaded files
a2.sinks.k2.hdfs.filePrefix = logs-
# whether to roll folders based on time
a2.sinks.k2.hdfs.round = true
# how many time units before a new folder is created
a2.sinks.k2.hdfs.roundValue = 1
# the time unit used for rounding
a2.sinks.k2.hdfs.roundUnit = hour
# use the local timestamp
a2.sinks.k2.hdfs.useLocalTimeStamp = true
# number of events to accumulate before flushing to HDFS
a2.sinks.k2.hdfs.batchSize = 1000
# file type; compression is also supported
a2.sinks.k2.hdfs.fileType = DataStream
# how long (seconds) before rolling a new file
a2.sinks.k2.hdfs.rollInterval = 600
# roll size of each file (bytes), roughly 128 MB
a2.sinks.k2.hdfs.rollSize = 134217700
# timeout for HDFS operations (ms)
a2.sinks.k2.hdfs.callTimeout = 3600000
# rolling is independent of the number of events
a2.sinks.k2.hdfs.rollCount = 0
# minimum number of replicas
a2.sinks.k2.hdfs.minBlockReplicas = 1

# Use a channel which buffers events in memory
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2

Go into /root/flume and test it:

cd /root/flume
bin/flume-ng agent --conf conf/ --name a2 --conf-file conf/flume-file-hdfs.conf

If you get the error "Could not find or load main class org.apache.flume.tools.GetJavaProperty":

comment out one line of configuration in HBase's hbase-env.sh, as sketched below.
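
A minimal sketch of the fix, assuming HBase lives under /root/hbase (as in the sqoop-env.sh later in this post) and that its classpath points at the Hadoop conf directory; only the '#' prefix is the actual change:

# in /root/hbase/conf/hbase-env.sh, comment out the HBASE_CLASSPATH export:
# export HBASE_CLASSPATH=$HBASE_CLASSPATH:/root/hadoop-2.7.7/etc/hadoop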


Open another terminal and start Hive so that new lines are written to hive.log.
You can see the log was uploaded successfully.
In my run one directory was generated at nine o'clock and one at ten o'clock.
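
You can also confirm from the command line that the hourly directories exist (the path matches a2.sinks.k2.hdfs.path above):

hdfs dfs -ls /flume/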

3. Set up automatic uploads from a monitored directory

Create a new directory for testing:

mkdir  /root/flume/upload

Still in the conf directory:

touch flume-dir-hdfs.conf
vim flume-dir-hdfs.conf

Add the following content.
(a3.sinks.k3.hdfs.path is the upload path and must be changed to your own;
a3.sources.r3.spoolDir is the monitored directory and must be changed to your own.)

a3.sources = r3
a3.sinks = k3
a3.channels = c3

# Describe/configure the source
a3.sources.r3.type = spooldir
a3.sources.r3.spoolDir = /root/flume/upload
a3.sources.r3.fileSuffix = .COMPLETED
a3.sources.r3.fileHeader = true
# ignore files ending in .tmp; they will not be uploaded
a3.sources.r3.ignorePattern = ([^ ]*\.tmp)

# Describe the sink
a3.sinks.k3.type = hdfs
a3.sinks.k3.hdfs.path = hdfs://master:9000/flume/upload/%Y%m%d/%H
# prefix for uploaded files
a3.sinks.k3.hdfs.filePrefix = upload-
# whether to roll folders based on time
a3.sinks.k3.hdfs.round = true
# how many time units before a new folder is created
a3.sinks.k3.hdfs.roundValue = 1
# the time unit used for rounding
a3.sinks.k3.hdfs.roundUnit = minute
# use the local timestamp
a3.sinks.k3.hdfs.useLocalTimeStamp = true
# number of events to accumulate before flushing to HDFS
a3.sinks.k3.hdfs.batchSize = 100
# file type; compression is also supported
a3.sinks.k3.hdfs.fileType = DataStream
# how long (seconds) before rolling a new file
a3.sinks.k3.hdfs.rollInterval = 600
# roll size of each file (bytes), roughly 128 MB
a3.sinks.k3.hdfs.rollSize = 134217700
# rolling is independent of the number of events
a3.sinks.k3.hdfs.rollCount = 0
# minimum number of replicas
a3.sinks.k3.hdfs.minBlockReplicas = 1

# Use a channel which buffers events in memory
a3.channels.c3.type = memory
a3.channels.c3.capacity = 1000
a3.channels.c3.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r3.channels = c3
a3.sinks.k3.channel = c3

Test it:

/root/flume/bin/flume-ng agent --conf conf/ --name a3 --conf-file conf/flume-dir-hdfs.conf


Drop a file into the monitored directory:

cd /root/flume/upload
vi a

Type hello (any content will do) and save.
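
Note that the spooling-directory source expects a file to be complete once it lands in the directory, so if editing it in place with vi causes errors, a safer sketch is to write the file elsewhere and move it in (the file name here is arbitrary):

echo hello > /tmp/test.txt
mv /tmp/test.txt /root/flume/upload/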

Then check HDFS.

You can see the file has been uploaded automatically, exactly as configured.
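
For example, a recursive listing of the upload path should show the time-stamped directories and the upload- files:

hdfs dfs -ls -R /flume/upload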

Ⅲ. Installing Sqoop

1. Extract the installation package

Extract the prepared Sqoop installation package:

tar -xvf /root/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

mv  /root/sqoop-1.4.7.bin__hadoop-2.6.0 ./sqoop

2. Set environment variables

vi .bash_profile

Add the following lines:

export SQOOP_HOME=/root/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
export CLASSPATH=$CLASSPATH:${SQOOP_HOME}/lib

Reload the configuration:

source .bash_profile

3. Modify the sqoop-env.sh configuration file

cd /root/sqoop/conf
cp sqoop-env-template.sh sqoop-env.sh
vim sqoop-env.sh 

Adjust according to your own setup; fill in the path for each component you actually have installed.

export HADOOP_COMMON_HOME=/root/hadoop-2.7.7

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/root/hadoop-2.7.7

#set the path to where bin/hbase is available
export HBASE_HOME=/root/hbase

#Set the path to where bin/hive is available
export HIVE_HOME=/root/hive

#Set the path for where zookeper config dir is
export ZOOCFGDIR=/root/zoo        

4. Copy the required dependency jar to $SQOOP_HOME/lib

I am using the MySQL JDBC driver here. If you have not downloaded it yourself, it can also be found in the download link above; just copy it into the lib directory under the sqoop path, as sketched below.
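
A minimal sketch of the copy; the connector file name and version are assumptions, so use whichever jar you actually downloaded:

cp /root/mysql-connector-java-5.1.47.jar /root/sqoop/lib/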

5. Modify $SQOOP_HOME/bin/configure-sqoop

Comment out the checks for components you do not use (here I commented out HCatalog, Accumulo, and other such Hadoop-ecosystem components); see the sketch after the commands below.

cd /root/sqoop/bin
vi configure-sqoop
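
The checks to comment out usually look like the blocks below (exact wording varies with the Sqoop version); the change is simply prefixing every line of each unwanted check with '#':

# if [ ! -d "${HCAT_HOME}" ]; then
#   echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#   echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
# fi
# if [ ! -d "${ACCUMULO_HOME}" ]; then
#   echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#   echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
# fi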




6. Verification

Run:

sqoop version

The Sqoop version information should be printed.

Finally, test the connection to the database (adjust the command to your own setup).
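
A minimal sketch, assuming a MySQL server running on the master node with user root and password 123456 (all three are assumptions to replace with your own values):

sqoop list-databases \
  --connect jdbc:mysql://master:3306/ \
  --username root \
  --password 123456

If the databases on the server are listed, the driver and the connection settings are working.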
