Data import exercise (master): Flume to Kafka


1) Task overview

2) Commands to import the data into the database (taken from the data set's README)
mysql -uroot -p12345 < employees.sql

mysql -uroot -p12345 < employees_partitioned.sql
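
As a quick sanity check after loading, you can count rows in one of the tables; a minimal example, assuming the schema is named employees as in the standard test data set:

mysql -uroot -p12345 -e "SELECT COUNT(*) FROM employees.employees"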

3) Supplement: statements to export database tables to local files (.csv format)
1) Without a header row:
SELECT * INTO OUTFILE '/var/lib/mysql-files/employees.csv' FIELDS TERMINATED BY ',' FROM employees;

2) With a header row:
SELECT * INTO OUTFILE '/var/lib/mysql-files/departments.csv' FIELDS TERMINATED BY ',' FROM
(SELECT 'dept_no','dept_name' UNION SELECT dept_no,dept_name FROM departments) t;

SELECT * INTO OUTFILE '/var/lib/mysql-files/dept_emp.csv' FIELDS TERMINATED BY ',' FROM
(SELECT 'emp_no','dept_no','from_date','to_date' UNION SELECT emp_no,dept_no,from_date,to_date FROM dept_emp) t;

SELECT * INTO OUTFILE '/var/lib/mysql-files/dept_manager.csv' FIELDS TERMINATED BY ',' FROM
(SELECT 'emp_no','dept_no','from_date','to_date' UNION SELECT emp_no,dept_no,from_date,to_date FROM dept_manager) t;

SELECT * INTO OUTFILE '/var/lib/mysql-files/employees.csv' FIELDS TERMINATED BY ',' FROM
(SELECT 'emp_no','birth_date','first_name','last_name','gender','hire_date'
UNION SELECT emp_no,birth_date,first_name,last_name,gender,hire_date FROM employees) t;

SELECT * INTO OUTFILE '/var/lib/mysql-files/salaries.csv' FIELDS TERMINATED BY ',' FROM
(SELECT 'emp_no','salary','from_date','to_date'
UNION SELECT emp_no,salary,from_date,to_date FROM salaries) t;

SELECT * INTO OUTFILE '/var/lib/mysql-files/titles.csv' FIELDS TERMINATED BY ',' FROM
(SELECT 'emp_no','title','from_date','to_date'
UNION SELECT emp_no,title,from_date,to_date FROM titles) t;
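
Note that on most installations MySQL only allows INTO OUTFILE to write under the directory set by secure_file_priv (commonly /var/lib/mysql-files, as used above). If the statements fail with error 1290, check the configured directory:

mysql -uroot -p12345 -e "SHOW VARIABLES LIKE 'secure_file_priv'"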

1. Use Sqoop to import the data into HDFS (.csv format)
1) Import all data from the MySQL tables departments, dept_emp, dept_manager, employees, salaries and titles (schema employees) into HDFS. The titles table is shown as the example; a loop covering all six tables follows the command.
sqoop import \
--connect jdbc:mysql://nodefour:3306/employees \
--username root \
--password 12345 \
--table titles \
--target-dir /sqoop/data/titles \
-m 1
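
The remaining tables follow the same pattern; a minimal shell sketch, assuming the same connection settings as above, that runs the import for all six tables:

for t in departments dept_emp dept_manager employees salaries titles; do
  sqoop import \
    --connect jdbc:mysql://nodefour:3306/employees \
    --username root \
    --password 12345 \
    --table "$t" \
    --target-dir /sqoop/data/"$t" \
    -m 1
done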

Supplementary statement: filter the imported rows with a WHERE clause
sqoop import \
--connect jdbc:mysql://nodefour:3306/scott \
--table emp \
--where "sal > 1000" \
--username root \
--password 12345 \
--target-dir /sqoop/data/emp-where \
-m 1
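
To spot-check the result, list and peek at the output in HDFS; with -m 1, Sqoop writes a single part file (the exact file name below is illustrative):

hdfs dfs -ls /sqoop/data/titles
hdfs dfs -cat /sqoop/data/titles/part-m-00000 | head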

2. Flume: import the files into Kafka
Agent file format (example)

**********************************************************************************

# Deploy the following content into Flume
# -------------------------------------------------

# Initialize the agent's source, channel and sink
users.sources = usersSource
users.channels = usersChannel
users.sinks = usersSink

# Use a channel which buffers events in a directory
# (both directories below must be created first)
users.channels.usersChannel.type = file
users.channels.usersChannel.checkpointDir = /var/flume/checkpoint/users
users.channels.usersChannel.dataDirs = /var/flume/data/users

# Set the source to the spool directory where the files will appear
users.sources.usersSource.type = spooldir
users.sources.usersSource.deserializer = LINE
# adjust maxLineLength to the longest line in your data
users.sources.usersSource.deserializer.maxLineLength = 6400
users.sources.usersSource.spoolDir = /events/input/intra/users
# adjust the file-name pattern for each table
users.sources.usersSource.includePattern = users_[0-9]{4}-[0-9]{2}-[0-9]{2}.csv

# Define / configure the sink
users.sinks.usersSink.type = org.apache.flume.sink.kafka.KafkaSink
users.sinks.usersSink.batchSize = 640
users.sinks.usersSink.brokerList = nodetwo:9092,nodethree:9092,nodefour:9092
# be sure to change the topic name for each table
users.sinks.usersSink.topic = users

# Bind the source and sink to the channel
users.sources.usersSource.channels = usersChannel
users.sinks.usersSink.channel = usersChannel

1) Create the agent files (I did this on nodeone), one per table; a sketch for generating them from the template follows below:
departments.conf, dept_emp.conf, dept_manager.conf, employees.conf, salaries.conf, titles.conf
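
One possible way to derive the six files from the example above, where template.conf is a hypothetical copy of that example; only the topic name and the file-name pattern change per table:

for t in departments dept_emp dept_manager employees salaries titles; do
  sed -e "s|^users.sinks.usersSink.topic = .*|users.sinks.usersSink.topic = $t|" \
      -e "s|includePattern = users_|includePattern = ${t}_|" \
      template.conf > "$t.conf"
done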

2) Commands to create the topics (any one of nodethree, nodetwo or nodefour will do)
bin/kafka-topics.sh --zookeeper nodethree:2181,nodefour:2181,nodetwo:2181 --create --topic departments --partitions 3 --replication-factor 3
bin/kafka-topics.sh --zookeeper nodethree:2181,nodefour:2181,nodetwo:2181 --create --topic dept_emp --partitions 3 --replication-factor 3
bin/kafka-topics.sh --zookeeper nodethree:2181,nodefour:2181,nodetwo:2181 --create --topic dept_manager --partitions 3 --replication-factor 3
bin/kafka-topics.sh --zookeeper nodethree:2181,nodefour:2181,nodetwo:2181 --create --topic employees --partitions 3 --replication-factor 3
bin/kafka-topics.sh --zookeeper nodethree:2181,nodefour:2181,nodetwo:2181 --create --topic salaries --partitions 3 --replication-factor 3
bin/kafka-topics.sh --zookeeper nodethree:2181,nodefour:2181,nodetwo:2181 --create --topic titles --partitions 3 --replication-factor 3
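
Equivalently, as a loop (same options, assuming it is run from the Kafka directory):

for t in departments dept_emp dept_manager employees salaries titles; do
  bin/kafka-topics.sh --zookeeper nodethree:2181,nodefour:2181,nodetwo:2181 \
    --create --topic "$t" --partitions 3 --replication-factor 3
done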

3) On the Flume host (I used nodeone), create the local directories:
mkdir -p /events/input/intra/users
mkdir -p /var/flume/data/users
mkdir -p /var/flume/checkpoint/users

4) Start the agent (on nodeone, from the Flume directory; the config file is in the current directory). Be sure to adjust the config file name and the agent name to match:
bin/flume-ng agent --conf conf --conf-file departments.conf --name users -Dflume.root.logger=INFO,console
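
The agent stays in the foreground with console logging, so use a separate terminal for the next steps. A possible background invocation instead (log file name is illustrative):

nohup bin/flume-ng agent --conf conf --conf-file departments.conf --name users > flume-departments.log 2>&1 &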

5) Start a console consumer
bin/kafka-console-consumer.sh --zookeeper nodethree:2181,nodefour:2181,nodetwo:2181 --topic departments --from-beginning

6) Deliver the file to the target directory (on nodeone; the file must first be renamed to the name+date pattern). Use install or mv:
install departments_2021-01-27.csv /events/input/intra/users
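
For example, if the exported file is still under /var/lib/mysql-files (paths here are illustrative), rename it first so it matches the includePattern:

mv /var/lib/mysql-files/departments.csv departments_2021-01-27.csv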

7) The remaining tables follow the same steps

####### dept_emp.csv
//nodeone
bin/flume-ng agent --conf conf --conf-file dept_emp.conf --name users -Dflume.root.logger=INFO,console
//nodefour
bin/kafka-console-consumer.sh --zookeeper nodethree:2181,nodefour:2181,nodetwo:2181 --topic dept_emp --from-beginning
//nodeone
install dept_emp_2021-01-27.csv /events/input/intra/users

####### dept_manager.csv
bin/flume-ng agent --conf conf --conf-file dept_manager.conf --name users -Dflume.root.logger=INFO,console
bin/kafka-console-consumer.sh --zookeeper nodethree:2181,nodefour:2181,nodetwo:2181 --topic dept_manager --from-beginning
install dept_manager_2021-01-27.csv /events/input/intra/users

####### employees.csv
bin/flume-ng agent --conf conf --conf-file employees.conf --name users -Dflume.root.logger=INFO,console
bin/kafka-console-consumer.sh --zookeeper nodethree:2181,nodefour:2181,nodetwo:2181 --topic employees --from-beginning
install employees_2021-01-27.csv /events/input/intra/users

####### salaries.csv
bin/flume-ng agent --conf conf --conf-file salaries.conf --name users -Dflume.root.logger=INFO,console
bin/kafka-console-consumer.sh --zookeeper nodethree:2181,nodefour:2181,nodetwo:2181 --topic salaries --from-beginning
install salaries_2021-01-27.csv /events/input/intra/users

####### titles.csv
bin/flume-ng agent --conf conf --conf-file titles.conf --name users -Dflume.root.logger=INFO,console
bin/kafka-console-consumer.sh --zookeeper nodethree:2181,nodefour:2181,nodetwo:2181 --topic titles --from-beginning
install titles_2021-01-27.csv /events/input/intra/users

8) Verify the import succeeded

### Start a console consumer (substitute the topic to check, e.g. departments)
bin/kafka-console-consumer.sh --zookeeper nodethree:2181,nodefour:2181,nodetwo:2181 --topic users --from-beginning
#### Count the records in a topic
bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list nodethree:9092,nodefour:9092,nodetwo:9092 --topic departments --time -1 --offsets 1
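
GetOffsetShell prints the latest offset per partition; summing the three partition offsets gives the total record count, which should match the row count in MySQL. A quick comparison, assuming the credentials used earlier:

mysql -uroot -p12345 -e "SELECT COUNT(*) FROM employees.departments"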

Appendix: common Kafka topic commands
List topics:
bin/kafka-topics.sh --zookeeper nodethree:2181,nodefour:2181,nodetwo:2181 --list
Create a topic:
bin/kafka-topics.sh --zookeeper nodethree:2181,nodefour:2181,nodetwo:2181 --create --topic users --partitions 3 --replication-factor 3
Describe a topic in detail:
bin/kafka-topics.sh --zookeeper nodethree:2181,nodefour:2181,nodetwo:2181 --describe --topic users

Producer:
bin/kafka-console-producer.sh --broker-list nodethree:9092,nodefour:9092,nodetwo:9092 --topic users

Consumer:
bin/kafka-console-consumer.sh --zookeeper nodethree:2181,nodefour:2181,nodetwo:2181 --topic users --from-beginning
