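This post shows how to collect incremental data from MySQL in near real time into an integrated HBase and Hive table. In practice this means creating a table in Hive that maps to an HBase table, so that both tables expose the same data. HBase is suited to online, real-time data storage, while Hive is suited to offline data processing (it translates SQL statements into MapReduce jobs). Flume has no official source for collecting data from MySQL in near real time, so I first tried an open-source custom source: https://github.com/keedio/flume-ng-sql-source
However, after starting Flume, the DEBUG log reported an error on the statement: select * from student limit ?,? even though my query contained no limit clause. I was stuck on this for a whole day, and nothing I tried worked.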
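What follows is a source I developed myself. It works along the same lines as flume-ng-sql-source and is clearly less powerful, but it meets my own needs. Unlike flume-ng-sql-source, it stores the current index in a MySQL table (flume_meta).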
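The idea of keeping the current index in a table can be sketched as follows. This is only an illustration under stated assumptions, not the actual Flume source: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server, the flume_meta column names (source_table, current_index) and the function name poll_new_rows are hypothetical, and the third row ('li', 22) is made-up sample data.

```python
import sqlite3


def poll_new_rows(conn, table, key_column="id"):
    """Return rows whose key exceeds the stored index, then advance the index."""
    cur = conn.cursor()

    # Look up the last index recorded for this table (0 if nothing was collected yet).
    cur.execute(
        "SELECT current_index FROM flume_meta WHERE source_table = ?", (table,)
    )
    row = cur.fetchone()
    last_index = row[0] if row else 0

    # Table and column names cannot be bound as parameters; in this sketch
    # they are assumed to come from trusted configuration.
    cur.execute(
        f"SELECT * FROM {table} WHERE {key_column} > ? ORDER BY {key_column}",
        (last_index,),
    )
    rows = cur.fetchall()

    if rows:
        # Assumes the incrementing key is the first column of the table.
        new_index = rows[-1][0]
        # SQLite syntax; on MySQL this would be REPLACE INTO or
        # INSERT ... ON DUPLICATE KEY UPDATE.
        cur.execute(
            "INSERT OR REPLACE INTO flume_meta (source_table, current_index) "
            "VALUES (?, ?)",
            (table, new_index),
        )
        conn.commit()
    return rows


# In-memory database standing in for the MySQL 'flume' database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, age INTEGER)"
)
conn.execute(
    "CREATE TABLE flume_meta (source_table TEXT PRIMARY KEY, current_index INTEGER)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)", [(1, "wang", 20), (2, "zhang", 18)]
)

first = poll_new_rows(conn, "student")   # both existing rows
second = poll_new_rows(conn, "student")  # nothing new since the last poll
conn.execute("INSERT INTO student (name, age) VALUES ('li', 22)")
third = poll_new_rows(conn, "student")   # only the newly inserted row
```

Because the index lives in the database rather than in the agent's memory, a restarted collector would resume from the last recorded id instead of re-reading the whole table. This only works if the key column increases monotonically, which is why the source table needs an incrementing column.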
Let's walk through the steps. If you spot any mistakes, corrections are welcome.
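First, create the student table in MySQL and insert two rows.
The student table must have an incrementing column; here id is set as an auto-increment primary key.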
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| flume |
| hive |
| mysql |
| performance_schema |
| sys |
+--------------------+
6 rows in set (0.00 sec)
mysql> use flume;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> create table student(id int primary key auto_increment,name varchar(20),age int );
Query OK, 0 rows affected (0.36 sec)
mysql> show tables;
+-----------------+
| Tables_in_flume |
+-----------------+
| student |
+-----------------+
1 row in set (0.00 sec)
mysql> insert into student values(1,"wang",20);
Query OK, 1 row affected (0.05 sec)
mysql> insert into student values(2,"zhang",18);
Query OK, 1 row affected (0.12 sec)
mysql> select *from student;
+----+-------+------+
| id | name | age |
+----+-------+------+
| 1 | wang | 20 |
| 2 | zhang | 18 |
+----+-------+------+
2 rows in set (0.00 sec)
Next, create a metadata table that stores the names of the collected tables and their current index.