Big Data Learning Notes - 2020-10-09 - Flume Custom Source - MySQLSource Example

Custom source example: a custom MySQLSource

  • A source is the component responsible for receiving data into a Flume agent. Flume ships with a wide variety of sources covering all kinds of needs, but sometimes they still cannot satisfy a specific requirement, and in that case we need to implement a custom source ourselves.
  • According to the official documentation, a custom MySQLSource needs to extend the AbstractSource class, implement the Configurable and PollableSource interfaces, and provide the following methods:
    • getBackOffSleepIncrement() //not used here
    • getMaxBackOffSleepInterval() //not used here
    • configure(Context context) //initialize the context
    • process() //fetch the data
      • Because reading from MySQL involves fairly complex logic, we additionally define a helper class, SQLSourceHelper, to handle it
      • The process method only needs to wrap the data into Events and write them to the channel
      • This method is called in a loop
    • stop() //release related resources
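
  • The polling lifecycle described above can be sketched without Flume on the classpath. The interface below is a simplified stand-in for Flume's real PollableSource contract, not Flume's API itself (MiniSource, CountingSource, and Status are illustrative names):

```java
import java.util.ArrayList;
import java.util.List;

public class PollLoopSketch {

    enum Status { READY, BACKOFF }

    //simplified stand-in for the PollableSource contract
    interface MiniSource {
        Status process(); //called in a loop by the framework
        void stop();      //release resources when the agent shuts down
    }

    //a toy source that "reads" three records, then reports BACKOFF
    static class CountingSource implements MiniSource {
        private int remaining = 3;
        final List<String> channel = new ArrayList<>();

        public Status process() {
            if (remaining == 0) return Status.BACKOFF; //nothing new: back off
            channel.add("record-" + remaining--);      //wrap data and hand it to the channel
            return Status.READY;
        }

        public void stop() { /* close connections etc. */ }
    }

    public static void main(String[] args) {
        CountingSource source = new CountingSource();
        //the framework keeps calling process() until the source backs off
        while (source.process() == Status.READY) { }
        source.stop();
        System.out.println(source.channel.size()); // prints 3
    }
}
```

    The real framework adds backoff sleeps between BACKOFF rounds (which is what getBackOffSleepIncrement and getMaxBackOffSleepInterval control), but the calling pattern is the same.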

Code implementation

  • Add the Maven dependencies:

    <dependencies>
        <dependency>
            <groupId>org.apache.flume</groupId>
            <artifactId>flume-ng-core</artifactId>
            <version>1.7.0</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.27</version>
        </dependency>
    </dependencies>
    
  • Configure JDBC and log4j (jdbc.properties and log4j.properties):

    # jdbc.properties
    dbDriver=com.mysql.jdbc.Driver
    dbUrl=jdbc:mysql://hadoop201:3306/mysqlsource?useUnicode=true&characterEncoding=utf-8
    dbUser=root
    dbPassword=123456
    
    # log4j.properties
    #--------console-----------
    log4j.rootLogger=info,myconsole,myfile
    log4j.appender.myconsole=org.apache.log4j.ConsoleAppender
    log4j.appender.myconsole.layout=org.apache.log4j.SimpleLayout
    #log4j.appender.myconsole.layout.ConversionPattern =%d [%t] %-5p [%c] - %m%n
    
    #log4j.rootLogger=error,myfile
    log4j.appender.myfile=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.myfile.File=/tmp/flume.log
    log4j.appender.myfile.layout=org.apache.log4j.PatternLayout
    log4j.appender.myfile.layout.ConversionPattern =%d [%t] %-5p [%c] - %m%n
    

  • Write the code (SQLSourceHelper first, then SQLSource):

    package cn.bigdata.defineSource;
    
    import org.apache.flume.Context;
    import org.apache.flume.conf.ConfigurationException;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    
    import java.io.IOException;
    import java.sql.*;
    import java.text.ParseException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Properties;
    
    public class SQLSourceHelper {
    
        private static final Logger LOG = LoggerFactory.getLogger(SQLSourceHelper.class);
    
        private int runQueryDelay,   //interval between two queries
                startFrom,           //starting id
                currentIndex,        //current id
                recordSixe = 0,      //running total of rows returned, used as the offset
                maxRow;              //maximum number of rows per query

        private String table,        //the table to operate on
                columnsToSelect,     //columns to select, supplied by the user
                customQuery,         //custom query supplied by the user
                query,               //the constructed query
                defaultCharsetResultSet; //charset of the result set

        //context, used to read the configuration file
        private Context context;
    
        //default values for the variables above; they can be overridden in the Flume job configuration file
        private static final int DEFAULT_QUERY_DELAY = 10000;
        private static final int DEFAULT_START_VALUE = 0;
        private static final int DEFAULT_MAX_ROWS = 2000;
        private static final String DEFAULT_COLUMNS_SELECT = "*";
        private static final String DEFAULT_CHARSET_RESULTSET = "UTF-8";
    
        private static Connection conn = null;
        private static PreparedStatement ps = null;
        private static String connectionURL, connectionUserName, connectionPassword;
    
        //load static resources: jdbc.properties and the JDBC driver
        static {
    
            Properties p = new Properties();
    
            try {
                p.load(SQLSourceHelper.class.getClassLoader().getResourceAsStream("jdbc.properties"));
                connectionURL = p.getProperty("dbUrl");
                connectionUserName = p.getProperty("dbUser");
                connectionPassword = p.getProperty("dbPassword");
                Class.forName(p.getProperty("dbDriver"));
    
            } catch (ClassNotFoundException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    
        //obtain a JDBC connection
        private static Connection InitConnection(String url, String user, String pw) {
            try {
    
                Connection conn = DriverManager.getConnection(url, user, pw);
    
                if (conn == null)
                    throw new SQLException();
    
                return conn;
    
            } catch (SQLException e) {
                e.printStackTrace();
            }
    
            return null;
        }
    
        //constructor
        SQLSourceHelper(Context context) throws ParseException {

            //initialize the context
            this.context = context;

            //parameters with defaults: read from the Flume job configuration file, falling back to the defaults above
            this.columnsToSelect = context.getString("columns.to.select", DEFAULT_COLUMNS_SELECT);
    
            this.runQueryDelay = context.getInteger("run.query.delay", DEFAULT_QUERY_DELAY);
    
            this.startFrom = context.getInteger("start.from", DEFAULT_START_VALUE);
    
            this.defaultCharsetResultSet = context.getString("default.charset.resultset", DEFAULT_CHARSET_RESULTSET);
    
            //parameters without defaults: read from the Flume job configuration file
            this.table = context.getString("table");
            this.customQuery = context.getString("custom.query");
    
            connectionURL = context.getString("connection.url");
    
            connectionUserName = context.getString("connection.user");
    
            connectionPassword = context.getString("connection.password");
    
            conn = InitConnection(connectionURL, connectionUserName, connectionPassword);
    
            //validate the configuration; throw if a mandatory parameter without a default was not set
            checkMandatoryProperties();

            //get the current id
            currentIndex = getStatusDBIndex(startFrom);

            //build the query
            query = buildQuery();
        }
    
        //validate the configuration (table and database connection parameters)
        private void checkMandatoryProperties() {
    
            if (table == null) {
                throw new ConfigurationException("property table not set");
            }
    
            if (connectionURL == null) {
                throw new ConfigurationException("connection.url property not set");
            }
    
            if (connectionUserName == null) {
                throw new ConfigurationException("connection.user property not set");
            }
    
            if (connectionPassword == null) {
                throw new ConfigurationException("connection.password property not set");
            }
        }
    
        //build the SQL statement
        private String buildQuery() {

            String sql = "";

            //get the current id
            currentIndex = getStatusDBIndex(startFrom);
            LOG.info(currentIndex + "");
    
            if (customQuery == null) {
                sql = "SELECT " + columnsToSelect + " FROM " + table;
            } else {
                sql = customQuery;
            }
    
            StringBuilder execSql = new StringBuilder(sql);
    
            //use id as the offset
            if (!sql.contains("where")) {
                execSql.append(" where ");
                execSql.append("id").append(">").append(currentIndex);
    
                return execSql.toString();
            } else {
                //for a custom query, replace the trailing offset value with the current one
                //(this assumes the user's query ends with the offset value)
                int length = execSql.toString().length();

                return execSql.toString().substring(0, length - String.valueOf(currentIndex).length()) + currentIndex;
            }
        }
    
        //execute the query
        List<List<Object>> executeQuery() {

            try {
                //rebuild the SQL on every query, because the id changes
                customQuery = buildQuery();

                //collection holding the results
                List<List<Object>> results = new ArrayList<>();

                //the SQL changes between calls, so prepare a new statement each time
                ps = conn.prepareStatement(customQuery);

                ResultSet result = ps.executeQuery();

                while (result.next()) {

                    //collection holding one row (multiple columns)
                    List<Object> row = new ArrayList<>();

                    //copy the returned columns into the row
                    for (int i = 1; i <= result.getMetaData().getColumnCount(); i++) {
                        row.add(result.getObject(i));
                    }
    
                    results.add(row);
                }
    
                LOG.info("execSql:" + customQuery + "\nresultSize:" + results.size());
    
                return results;
            } catch (SQLException e) {
                LOG.error(e.toString());
    
                // reconnect
                conn = InitConnection(connectionURL, connectionUserName, connectionPassword);
    
            }
    
            return null;
        }
    
        //convert the result set to strings: each row is a list, and each list is joined into one comma-separated string
        List<String> getAllRows(List<List<Object>> queryResult) {

            List<String> allRows = new ArrayList<>();
    
            if (queryResult == null || queryResult.isEmpty())
                return allRows;
    
            StringBuilder row = new StringBuilder();
    
            for (List<Object> rawRow : queryResult) {
    
                Object value = null;
    
                for (Object aRawRow : rawRow) {
    
                    value = aRawRow;
    
                    if (value == null) {
                        row.append(",");
                    } else {
                        row.append(aRawRow.toString()).append(",");
                    }
                }
    
                allRows.add(row.toString());
                row = new StringBuilder();
            }
    
            return allRows;
        }
    
        //update the offset metadata; called after each batch. The latest id must be persisted so that an interrupted job can resume, using id as the offset
        void updateOffset2DB(int size) {
            //source_tab is the key: insert if absent, update if present (one record per source table)
            String sql = "insert into flume_meta(source_tab,currentIndex) VALUES('"
                    + this.table
                    + "','" + (recordSixe += size)
                    + "') on DUPLICATE key update source_tab=values(source_tab),currentIndex=values(currentIndex)";
    
            LOG.info("updateStatus Sql:" + sql);
    
            execSql(sql);
        }
    
        //execute a SQL statement
        private void execSql(String sql) {
    
            try {
                ps = conn.prepareStatement(sql);
    
                LOG.info("exec::" + sql);
    
                ps.execute();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    
        //get the offset (the current id)
        private Integer getStatusDBIndex(int startFrom) {

            //query the current id from the flume_meta table
            String dbIndex = queryOne("select currentIndex from flume_meta where source_tab='" + table + "'");
    
            if (dbIndex != null) {
                return Integer.parseInt(dbIndex);
            }
    
            //no record yet: this is the first query, or the metadata table is empty; return the initial value
            return startFrom;
        }
    
        //run a query that returns a single value (the current id)
        private String queryOne(String sql) {
    
            ResultSet result = null;
    
            try {
                ps = conn.prepareStatement(sql);
                result = ps.executeQuery();
    
                while (result.next()) {
                    return result.getString(1);
                }
            } catch (SQLException e) {
                e.printStackTrace();
            }
    
            return null;
        }
    
        //close related resources
        void close() {
    
            try {
                ps.close();
                conn.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    
        int getCurrentIndex() {
            return currentIndex;
        }
    
        void setCurrentIndex(int newValue) {
            currentIndex = newValue;
        }
    
        int getRunQueryDelay() {
            return runQueryDelay;
        }
    
        String getQuery() {
            return query;
        }
    
        String getConnectionURL() {
            return connectionURL;
        }
    
        private boolean isCustomQuerySet() {
            return (customQuery != null);
        }
    
        Context getContext() {
            return context;
        }
    
        public String getConnectionUserName() {
            return connectionUserName;
        }
    
        public String getConnectionPassword() {
            return connectionPassword;
        }
    
        String getDefaultCharsetResultSet() {
            return defaultCharsetResultSet;
        }
    }
    
    package cn.bigdata.defineSource;
    
    import org.apache.flume.Context;
    import org.apache.flume.Event;
    import org.apache.flume.EventDeliveryException;
    import org.apache.flume.PollableSource;
    import org.apache.flume.conf.Configurable;
    import org.apache.flume.event.SimpleEvent;
    import org.apache.flume.source.AbstractSource;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    
    import java.text.ParseException;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    
    public class SQLSource extends AbstractSource
            implements Configurable, PollableSource {
    
        //logger
        private static final Logger LOG = LoggerFactory.getLogger(SQLSource.class);
    
        //the SQL source helper
        private SQLSourceHelper sqlSourceHelper;
    
    
        public long getBackOffSleepIncrement() {
            return 0;
        }
    
        public long getMaxBackOffSleepInterval() {
            return 0;
        }
    
    
        public void configure(Context context) {
    
            try {
                //initialize the helper
                sqlSourceHelper = new SQLSourceHelper(context);
            } catch (ParseException e) {
                e.printStackTrace();
            }
        }
    
    
        public Status process() throws EventDeliveryException {
    
            try {
                //query the data table
                List<List<Object>> result = sqlSourceHelper.executeQuery();

                //collection of events
                List<Event> events = new ArrayList<>();

                //event header map
                HashMap<String, String> header = new HashMap<>();

                //if data was returned, wrap it into events (executeQuery returns null after a SQL error)
                if (result != null && !result.isEmpty()) {
    
                    List<String> allRows = sqlSourceHelper.getAllRows(result);
    
                    Event event = null;
    
                    for (String row : allRows) {
                        event = new SimpleEvent();
                        event.setBody(row.getBytes());
                        event.setHeaders(header);
                        events.add(event);
                    }
    
                    //write the events to the channel
                    this.getChannelProcessor().processEventBatch(events);

                    //update the offset in the metadata table
                    sqlSourceHelper.updateOffset2DB(result.size());
                }
    
                //wait before the next poll
                Thread.sleep(sqlSourceHelper.getRunQueryDelay());
    
                return Status.READY;
            } catch (InterruptedException e) {
                LOG.error("Error processing row", e);
    
                return Status.BACKOFF;
            }
        }
    
        @Override
        public synchronized void stop() {
    
            LOG.info("Stopping sql source {} ...", getName());
    
            try {
                //close the helper's resources
                sqlSourceHelper.close();
            } finally {
                super.stop();
            }
        }
    }
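
  • Before packaging, the incremental-query logic is worth a sanity check. The snippet below re-implements the default (no custom.query) branch of buildQuery in isolation; it is a simplified copy for illustration, not the class above:

```java
public class BuildQuerySketch {

    //simplified version of SQLSourceHelper.buildQuery for the default case
    static String buildQuery(String columnsToSelect, String table, int currentIndex) {
        StringBuilder execSql = new StringBuilder("SELECT " + columnsToSelect + " FROM " + table);
        //use id as the offset: only rows newer than the recorded index are fetched
        execSql.append(" where ").append("id").append(">").append(currentIndex);
        return execSql.toString();
    }

    public static void main(String[] args) {
        //matches the query that later appears in the Flume log
        System.out.println(buildQuery("*", "student", 3)); // prints SELECT * FROM student where id>3
    }
}
```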
    
  • Package it into a jar with Maven and upload it to the Linux machine:

    [kgg@hadoop201 ~]$ ls /opt/software/*.jar
    /opt/software/flumeSource-1.0-SNAPSHOT.jar
    
  • Put the MySQL driver and our own jar into Flume's lib directory:

    [kgg@hadoop201 software]$ cp /opt/module/hive/lib/mysql-connector-java-5.1.27-bin.jar /opt/module/flume/lib/
    [kgg@hadoop201 software]$ ls /opt/module/flume/lib/*mysql*
    /opt/module/flume/lib/mysql-connector-java-5.1.27-bin.jar
    
    [kgg@hadoop201 flume]$ cp /opt/software/flumeSource-1.0-SNAPSHOT.jar /opt/module/flume/lib/
    [kgg@hadoop201 flume]$ ls ./lib/flumeSource*
    ./lib/flumeSource-1.0-SNAPSHOT.jar
    
  • Write the job configuration file:

    [kgg@hadoop201 flume]$ vi job/f-mysql.conf
    
    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    
    # Describe/configure the source
    a1.sources.r1.type = cn.bigdata.defineSource.SQLSource 
    a1.sources.r1.connection.url = jdbc:mysql://192.168.1.201:3306/mysqlsource
    a1.sources.r1.connection.user = root  
    a1.sources.r1.connection.password = 123456  
    a1.sources.r1.table = student  
    a1.sources.r1.columns.to.select = *  
    #a1.sources.r1.incremental.column.name = id  
    #a1.sources.r1.incremental.value = 0 
    a1.sources.r1.run.query.delay=5000
    
    # Describe the sink
    a1.sinks.k1.type = logger
    
    # Describe the channel
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    
    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
    
  • Create the database and tables beforehand:

    mysql> create database mysqlsource;
    Query OK, 1 row affected (0.01 sec)
    
    mysql> CREATE TABLE `student` (
        -> `id` int(11) NOT NULL AUTO_INCREMENT,
        -> `name` varchar(255) NOT NULL,
        -> PRIMARY KEY (`id`)
        -> );
    Query OK, 0 rows affected (0.02 sec)
    
    mysql> CREATE TABLE `flume_meta` (
        -> `source_tab` varchar(255) NOT NULL,
        -> `currentIndex` varchar(255) NOT NULL,
        -> PRIMARY KEY (`source_tab`)
        -> );
    Query OK, 0 rows affected (0.01 sec)
    
  • Insert a few rows:

    mysql> insert into student values(1,'zhangsan');
    Query OK, 1 row affected (0.00 sec)
    
    mysql> insert into student values(2,'lisi');
    Query OK, 1 row affected (0.01 sec)
    
    mysql> insert into student values(3,'wangwu');
    Query OK, 1 row affected (0.01 sec)
    
  • Run the Flume job:

    [kgg@hadoop201 flume]$ bin/flume-ng agent -f job/f-mysql.conf -n a1
    
    
    20/10/09 20:08:40 INFO sink.LoggerSink: Event: { headers:{} body: 31 2C 7A 68 61 6E 67 73 61 6E 2C                1,zhangsan, }
    20/10/09 20:08:40 INFO sink.LoggerSink: Event: { headers:{} body: 32 2C 6C 69 73 69 2C                            2,lisi, }
    20/10/09 20:08:40 INFO sink.LoggerSink: Event: { headers:{} body: 33 2C 77 61 6E 67 77 75 2C                      3,wangwu, }
    20/10/09 20:08:45 INFO defineSource.SQLSourceHelper: 3
    20/10/09 20:08:45 INFO defineSource.SQLSourceHelper: execSql:SELECT * FROM student where id>3
    resultSize:0
    ##The rows in the table were read, and the source keeps polling the table periodically for newly inserted data
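
  • The trailing comma in each event body above (e.g. `1,zhangsan,`) comes from getAllRows, which appends a delimiter after every column. The conversion can be reproduced in isolation (a simplified copy of that loop, for a single row):

```java
import java.util.Arrays;
import java.util.List;

public class RowToStringSketch {

    //simplified copy of SQLSourceHelper.getAllRows for one row
    static String rowToString(List<Object> rawRow) {
        StringBuilder row = new StringBuilder();
        for (Object value : rawRow) {
            if (value == null) {
                row.append(",");               //null column: delimiter only
            } else {
                row.append(value).append(","); //column value followed by a delimiter
            }
        }
        return row.toString();
    }

    public static void main(String[] args) {
        //same shape as the first event in the log
        System.out.println(rowToString(Arrays.asList(1, "zhangsan"))); // prints 1,zhangsan,
    }
}
```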
    
  • Insert one more row into the table:

    mysql> insert into student values(4,'erdan');
    Query OK, 1 row affected (0.00 sec)
    
  • The new row is read as well:

    20/10/09 20:11:05 INFO defineSource.SQLSourceHelper: updateStatus Sql:insert into flume_meta(source_tab,currentIndex) VALUES('student','4') on DUPLICATE key update source_tab=values(source_tab),currentIndex=values(currentIndex)
    20/10/09 20:11:05 INFO defineSource.SQLSourceHelper: exec::insert into flume_meta(source_tab,currentIndex) VALUES('student','4') on DUPLICATE key update source_tab=values(source_tab),currentIndex=values(currentIndex)
    20/10/09 20:11:10 INFO sink.LoggerSink: Event: { headers:{} body: 34 2C 65 72 64 61 6E 2C                         4,erdan, }
    

    The custom source was implemented and runs successfully.
