Hadoop Cluster Setup (4) -- Installing Sqoop and Flume

This post continues the series and covers how to install and configure Flume and Sqoop in the virtual machines.

Ⅰ. Related components (with download links)

The Hadoop ecosystem contains many more components, such as Spark, HBase, Hive, and so on. For reasons of length they are not covered here; download links are given below, and installation guides for the rest will appear in other posts. (This article uses the Flume and Sqoop packages from the link; upload those installation packages to the root directory.)

Cluster component download link
Password: zccy

Ⅱ. Installing Flume

1. Extract the installation package

Extract the prepared Flume installation package:

tar -xvf /root/apache-flume-1.8.0-bin.tar.gz
mv apache-flume-1.8.0-bin ./flume

2. Set environment variables

vi .bash_profile

Add the following lines:

export FLUME_HOME=/root/flume
export FLUME_CONF_DIR=$FLUME_HOME/conf
export PATH=$PATH:$FLUME_HOME/bin

Reload the configuration:

source .bash_profile
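
As a quick sanity check that the PATH is picked up, you can print the Flume version (the exact output depends on your build):

flume-ng version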

3. Configure flume-env.sh

Copy the template file and edit it:

cp /root/flume/conf/flume-env.sh.template /root/flume/conf/flume-env.sh
vi /root/flume/conf/flume-env.sh

Add the following (use your own JAVA_HOME path):

JAVA_HOME=/root/jdk/jdk1.8.0_144/
JAVA_OPTS="-Xms100m -Xmx200m -Dcom.sun.management.jmxremote"

4. Modify the flume-conf configuration file

Copy the template file and edit it:

cp /root/flume/conf/flume-conf.properties.template /root/flume/conf/flume-conf.properties
vi /root/flume/conf/flume-conf.properties

Edit it as follows:


# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# For each one of the sources, the type is defined

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# The channel can be defined as follows.

a1.sources.r1.channels = c1

# Each sink's type must be defined

a1.sinks.k1.type = logger

#Specify the channel the sink should use

a1.sinks.k1.channel = c1

# Each channel's type is defined.

a1.channels.c1.type = memory

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

(Note: a1.sources.r1.bind should be the IP of the machine that receives the data. Binding to localhost only accepts local connections; to accept the telnet test from another node shown below, use the host's actual IP, e.g. 192.168.0.2, or 0.0.0.0.)

5. Functional verification

1. Import the jar packages

To let Flume talk to Hadoop, first copy the six jar packages from the link (password zccy) into flume/lib/; a sketch of doing this from a local Hadoop installation follows.
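
If you prefer to pull them from your local Hadoop installation instead of the download link, the jars the HDFS sink typically needs are listed below; the paths and version numbers are assumptions based on a Hadoop 2.7.7 layout, so match them to the files actually present under your Hadoop share directory:

# paths and versions are assumptions -- locate the jars under your own Hadoop installation
cd /root/hadoop-2.7.7/share/hadoop
cp common/hadoop-common-2.7.7.jar \
   common/lib/hadoop-auth-2.7.7.jar \
   common/lib/commons-configuration-1.6.jar \
   common/lib/commons-io-2.4.jar \
   common/lib/htrace-core-3.1.0-incubating.jar \
   hdfs/hadoop-hdfs-2.7.7.jar \
   /root/flume/lib/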
Then go into the Flume installation directory:

cd /root/flume

First install a few services:


yum install telnet          # may need to run this twice
yum install telnet-server
yum -y install xinetd
systemctl enable xinetd.service
systemctl enable telnet.socket
systemctl start telnet.socket
systemctl start xinetd

Enter the command to start the agent:

/root/flume/bin/flume-ng agent --conf ./conf/ --conf-file ./conf/flume-conf.properties --name a1 -Dflume.root.logger=INFO,console

Open another node and install the same services there:

yum install telnet
yum install telnet-server
yum -y install xinetd
systemctl enable xinetd.service
systemctl enable telnet.socket
systemctl start telnet.socket
systemctl start xinetd

Connect and type hello:

telnet 192.168.0.2 44444
hello

If "OK" is echoed back, the configuration works.

At this point the receiving agent has received the message, and it shows up in its console log.

2. Test collecting logs to HDFS

(This part follows the configuration from the linked reference.
Notes:
① a2.sources.r2.command points at the log to collect and must be adjusted to your own location (here we watch Hive's log, which by default lives under /tmp/<user>/).
② a2.sinks.k2.hdfs.path is the upload destination and must be adjusted to your own location.
)

cd /root/flume/conf
touch flume-file-hdfs.conf
vim flume-file-hdfs.conf

Edit it as follows:

# Name the components on this agent
a2.sources = r2
a2.sinks = k2
a2.channels = c2

# Describe/configure the source
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /tmp/root/hive.log
a2.sources.r2.shell = /bin/bash -c

# Describe the sink
a2.sinks.k2.type = hdfs
a2.sinks.k2.hdfs.path = hdfs://master:9000/flume/%Y%m%d/%H
# prefix for uploaded files
a2.sinks.k2.hdfs.filePrefix = logs-
# whether to roll folders based on time
a2.sinks.k2.hdfs.round = true
# how many time units before a new folder is created
a2.sinks.k2.hdfs.roundValue = 1
# the time unit used for rounding
a2.sinks.k2.hdfs.roundUnit = hour
# use the local timestamp
a2.sinks.k2.hdfs.useLocalTimeStamp = true
# number of events to accumulate before flushing to HDFS
a2.sinks.k2.hdfs.batchSize = 1000
# file type; compression is also supported
a2.sinks.k2.hdfs.fileType = DataStream
# how long (seconds) before rolling a new file
a2.sinks.k2.hdfs.rollInterval = 600
# roll size of each file (bytes), roughly 128 MB
a2.sinks.k2.hdfs.rollSize = 134217700
# timeout for HDFS operations (ms)
a2.sinks.k2.hdfs.callTimeout = 3600000
# rolling is independent of the number of events
a2.sinks.k2.hdfs.rollCount = 0
# minimum number of replicas
a2.sinks.k2.hdfs.minBlockReplicas = 1

# Use a channel which buffers events in memory
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2

Go into /root/flume and test it:

cd /root/flume
bin/flume-ng agent --conf conf/ --name a2 --conf-file conf/flume-file-hdfs.conf

If you get the error "Could not find or load main class org.apache.flume.tools.GetJavaProperty":

comment out one line of configuration in HBase's hbase-env.sh, as sketched below.
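
A minimal sketch of the fix, assuming HBase lives under /root/hbase (as in the sqoop-env.sh later in this post) and that its classpath points at the Hadoop conf directory; only the '#' prefix is the actual change:

# in /root/hbase/conf/hbase-env.sh, comment out the HBASE_CLASSPATH export:
# export HBASE_CLASSPATH=$HBASE_CLASSPATH:/root/hadoop-2.7.7/etc/hadoop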


Open another terminal and start Hive so that new lines are written to hive.log.
You can see the log was uploaded successfully.
In my run one directory was generated at nine o'clock and one at ten o'clock.
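
You can also confirm from the command line that the hourly directories exist (the path matches a2.sinks.k2.hdfs.path above):

hdfs dfs -ls /flume/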

3. Set up automatic uploads from a monitored directory

Create a new directory for testing:

mkdir  /root/flume/upload

Still in the conf directory:

touch flume-dir-hdfs.conf
vim flume-dir-hdfs.conf

Add the following content.
(a3.sinks.k3.hdfs.path is the upload path and must be changed to your own;
a3.sources.r3.spoolDir is the monitored directory and must be changed to your own.)

a3.sources = r3
a3.sinks = k3
a3.channels = c3

# Describe/configure the source
a3.sources.r3.type = spooldir
a3.sources.r3.spoolDir = /root/flume/upload
a3.sources.r3.fileSuffix = .COMPLETED
a3.sources.r3.fileHeader = true
# ignore files ending in .tmp; they will not be uploaded
a3.sources.r3.ignorePattern = ([^ ]*\.tmp)

# Describe the sink
a3.sinks.k3.type = hdfs
a3.sinks.k3.hdfs.path = hdfs://master:9000/flume/upload/%Y%m%d/%H
# prefix for uploaded files
a3.sinks.k3.hdfs.filePrefix = upload-
# whether to roll folders based on time
a3.sinks.k3.hdfs.round = true
# how many time units before a new folder is created
a3.sinks.k3.hdfs.roundValue = 1
# the time unit used for rounding
a3.sinks.k3.hdfs.roundUnit = minute
# use the local timestamp
a3.sinks.k3.hdfs.useLocalTimeStamp = true
# number of events to accumulate before flushing to HDFS
a3.sinks.k3.hdfs.batchSize = 100
# file type; compression is also supported
a3.sinks.k3.hdfs.fileType = DataStream
# how long (seconds) before rolling a new file
a3.sinks.k3.hdfs.rollInterval = 600
# roll size of each file (bytes), roughly 128 MB
a3.sinks.k3.hdfs.rollSize = 134217700
# rolling is independent of the number of events
a3.sinks.k3.hdfs.rollCount = 0
# minimum number of replicas
a3.sinks.k3.hdfs.minBlockReplicas = 1

# Use a channel which buffers events in memory
a3.channels.c3.type = memory
a3.channels.c3.capacity = 1000
a3.channels.c3.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r3.channels = c3
a3.sinks.k3.channel = c3

Test it:

/root/flume/bin/flume-ng agent --conf conf/ --name a3 --conf-file conf/flume-dir-hdfs.conf


Drop a file into the monitored directory:

cd /root/flume/upload
vi a

Type hello (any content will do) and save.
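
Note that the spooling-directory source expects a file to be complete once it lands in the directory, so if editing it in place with vi causes errors, a safer sketch is to write the file elsewhere and move it in (the file name here is arbitrary):

echo hello > /tmp/test.txt
mv /tmp/test.txt /root/flume/upload/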

Then check HDFS.

You can see the file has been uploaded automatically, exactly as configured.
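
For example, a recursive listing of the upload path should show the time-stamped directories and the upload- files:

hdfs dfs -ls -R /flume/upload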

Ⅲ. Installing Sqoop

1. Extract the installation package

Extract the prepared Sqoop installation package:

tar -xvf /root/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

mv  /root/sqoop-1.4.7.bin__hadoop-2.6.0 ./sqoop

2. Set environment variables

vi .bash_profile

Add the following lines:

export SQOOP_HOME=/root/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
export CLASSPATH=$CLASSPATH:${SQOOP_HOME}/lib

Reload the configuration:

source .bash_profile

3. Modify the sqoop-env.sh configuration file

cd /root/sqoop/conf
cp sqoop-env-template.sh sqoop-env.sh
vim sqoop-env.sh 

Adjust according to your own setup; fill in the path for each component you actually have installed.

export HADOOP_COMMON_HOME=/root/hadoop-2.7.7

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/root/hadoop-2.7.7

#set the path to where bin/hbase is available
export HBASE_HOME=/root/hbase

#Set the path to where bin/hive is available
export HIVE_HOME=/root/hive

#Set the path for where zookeper config dir is
export ZOOCFGDIR=/root/zoo        

4. Copy the required dependency jar to $SQOOP_HOME/lib

I am using the MySQL JDBC driver here. If you have not downloaded it yourself, it can also be found in the download link above; just copy it into the lib directory under the sqoop path, as sketched below.
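
A minimal sketch of the copy; the connector file name and version are assumptions, so use whichever jar you actually downloaded:

cp /root/mysql-connector-java-5.1.47.jar /root/sqoop/lib/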

5. Modify $SQOOP_HOME/bin/configure-sqoop

Comment out the checks for components you do not use (here I commented out HCatalog, Accumulo, and other such Hadoop-ecosystem components); see the sketch after the commands below.

cd /root/sqoop/bin
vi configure-sqoop
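
The checks to comment out usually look like the blocks below (exact wording varies with the Sqoop version); the change is simply prefixing every line of each unwanted check with '#':

# if [ ! -d "${HCAT_HOME}" ]; then
#   echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#   echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
# fi
# if [ ! -d "${ACCUMULO_HOME}" ]; then
#   echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#   echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
# fi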




6. Verification

Run:

sqoop version

The Sqoop version information should be printed.

Finally, test the connection to the database (adjust the command to your own setup).
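
A minimal sketch, assuming a MySQL server running on the master node with user root and password 123456 (all three are assumptions to replace with your own values):

sqoop list-databases \
  --connect jdbc:mysql://master:3306/ \
  --username root \
  --password 123456

If the databases on the server are listed, the driver and the connection settings are working.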
