Testing Flume data collection from SQL Server

zhcha_xr

Published 2022-04-13 00:00:00

Tags: elk, big data, flume, sql, sqlserver

Article link: https://blog.csdn.net/a150791038/article/details/124128799

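This document records a duplicate-data problem encountered while using Flume to collect data from a SQL Server database. Testing and log analysis showed that the cause was the special character '_' in the status file name, which prevented Flume from reading the status file correctly. The fix is to avoid special characters in the configuration file and to make sure the value of the incrementing field is greater than the existing value, so that incremental collection works.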

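1. Background

After a customer's database migration, the Flume configuration was changed and the collected data began to contain duplicates. The problem was therefore reproduced and verified in a test environment.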

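2. Preparation

SQL Server: the company had no SQL Server environment available, so Docker was installed on a Linux server and SQL Server was run in a Docker container.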
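A minimal sketch of how such a container could be started follows. The container name and image tag are illustrative, and reusing port 1435 and the SA password from the Flume configuration in section 3.2 is an assumption; the original test does not record the exact command.

```shell
#!/usr/bin/env bash
# Assumed values: the host port and SA password mirror the Flume JDBC settings;
# the container name and image tag are illustrative only.
HOST_PORT=1435
SA_PASSWORD='Aa-111111'
CONTAINER_NAME='sqlserver-test'
IMAGE='mcr.microsoft.com/mssql/server:2019-latest'

# Start SQL Server in the background.
# 1433 is the port SQL Server listens on inside the container.
start_sqlserver() {
  docker run -d --name "$CONTAINER_NAME" \
    -e 'ACCEPT_EULA=Y' \
    -e "MSSQL_SA_PASSWORD=$SA_PASSWORD" \
    -p "$HOST_PORT:1433" \
    "$IMAGE"
}
```

Calling `start_sqlserver` launches the container; `docker logs sqlserver-test` can then be used to confirm that the server has finished starting.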

3. Test setup

3.1 SQL Server

Connect with a database client, create a new database, and insert 4 rows into it.
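The same setup could also be scripted instead of done through a GUI client. The sketch below is hypothetical: the database name comes from the JDBC URL in the Flume configuration, while the table name and the `id` / `count` columns are guessed from the `table`, `identity_field`, and `up_columns` settings. The real schema is not documented here.

```shell
#!/usr/bin/env bash
# Hypothetical schema: names are inferred from the Flume configuration,
# not taken from the real database.
SEED_SQL="$(mktemp)"

cat > "$SEED_SQL" <<'EOF'
CREATE DATABASE mcafee;
GO
USE mcafee;
GO
CREATE TABLE NewView (
    id      INT IDENTITY(1,1) PRIMARY KEY,
    [count] INT NOT NULL
);
GO
INSERT INTO NewView ([count]) VALUES (1), (2), (3), (4);
GO
EOF

# Run the script with sqlcmd. The host, port, user, and password are
# assumptions that mirror the JDBC settings in the Flume configuration.
seed_database() {
  sqlcmd -S '119.91.130.53,1435' -U SA -P 'Aa-111111' -i "$SEED_SQL"
}
```

Calling `seed_database` applies the script; it requires the `sqlcmd` client to be installed on the machine running it.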

3.2 Flume

The collection configuration file is as follows:

agent.sources = sqlSource
agent.sinks = k1
agent.channels = c1

agent.sources.sqlSource.start_time =17:00:00
agent.sources.sqlSource.channels = c1
agent.sources.sqlSource.cycle_time =5*60*1000
agent.sources.sqlSource.is_select_table =false
agent.sources.sqlSource.is_up_than_field =false
agent.sources.sqlSource.identity_field =id
agent.sources.sqlSource.type = com.fusionskye.flume.dbSource.source.DBSource
agent.sources.sqlSource.url = jdbc:sqlserver://119.91.130.53:1435;databaseName=mcafee
agent.sources.sqlSource.driver_class = com.microsoft.sqlserver.jdbc.SQLServerDriver
agent.sources.sqlSource.database = sqlserver
agent.sources.sqlSource.user =SA
agent.sources.sqlSource.password =Aa-111111
agent.sources.sqlSource.table = NewView
agent.sources.sqlSource.columns.to.select = *
agent.sources.sqlSource.where =
agent.sources.sqlSource.up_columns = count
agent.sources.sqlSource.as_columns = count
agent.sources.sqlSource.column_type=integer
agent.sources.sqlSource.time_type=date
agent.sources.sqlSource.time_type_type=timestamp
agent.sources.sqlSource.start.from = 0
agent.sources.sqlSource.begin_time =2020-10-01 14:48:56
agent.sources.sqlSource.end_time =2020-11-02 14:48:56
agent.sources.sqlSource.max.rows = 500
agent.sources.sqlSource.run.query.delay=180000
agent.sources.sqlSource.run.query.delayTimes=180000
agent.sources.sqlSource.status.file.path = /opt/accur/flume/virus/
agent.sources.sqlSource.status.file.name = virus_count.status
agent.sources.sqlSource.custom.query = 0
agent.sources.sqlSource.batch.size=1000
agent.sources.sqlSource.interceptors=i1
agent.sources.sqlSource.interceptors.i1.type = static
agent.sources.sqlSource.interceptors.i1.key=key
agent.sources.sqlSource.interceptors.i1.value={"vendor":"McAfee","product":"防病毒","agent_ip":"172.16.1.58","agent_name":"virus","topicName":"flume1", "hostIP":"172.16.1.58"}

agent.channels.c1.type = memory
agent.channels.c1.capacity = 100000
agent.channels.c1.transactionCapacity = 10000

agent.sinks.k1.type = thrift
agent.sinks.k1.channel=c1
agent.sinks.k1.hostname = 172.16.1.58
agent.sinks.k1.port = 5330
agent.sinks.k1.connect.timeout = 0
agent.sinks.k1.request.timeout = 0
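Because this test concludes that special characters such as '_' in the status file name caused the duplicates, a small pre-flight check on that property may be useful. The helper names `get_prop` and `check_status_file_name`, and the rule of accepting only letters, digits, and dots, are assumptions made for this sketch; they are not part of Flume or of the DBSource plugin.

```shell
#!/usr/bin/env bash
# Print the value of one property from a properties file (last match wins).
# $1 = property key, $2 = path to the file.
get_prop() {
  # Escape dots so they match literally in the regular expression.
  key_re=$(printf '%s' "$1" | sed 's/\./\\./g')
  grep -E "^[[:space:]]*${key_re}[[:space:]]*=" "$2" \
    | tail -n 1 \
    | cut -d '=' -f 2- \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Print "ok" if the name contains only letters, digits, and dots,
# otherwise print "suspicious" (assumed rule, see the text above).
check_status_file_name() {
  case "$1" in
    ''|*[!A-Za-z0-9.]*) echo "suspicious" ;;
    *)                  echo "ok" ;;
  esac
}
```

For example, `check_status_file_name "$(get_prop agent.sources.sqlSource.status.file.name ../conf/mcafee.conf)"` flags the name `virus_count.status` used in the configuration above.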

4. Test verification

4.1 Go to the kafka/bin directory and run ./kafka-console-consumer.sh --bootstrap-server 172.16.1.58:9092 --topic accurLogic | grep McAfee to view the collected data.

4.2 Go to the flume/bin directory and run ./flume-ng agent -n agent -c ../conf/ -f ../conf/mcafee.conf -Dflume.root.logger=INFO,console to start Flume and watch its log on the console.
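Spotting duplicates by eye in the consumer output is error-prone, so a small counter can help. The sketch below is an assumption about how one might check, not part of the original procedure, and `count_duplicates` is a hypothetical helper name. It only detects records that are byte-for-byte identical, so records that differ in a timestamp or offset field will not be counted.

```shell
#!/usr/bin/env bash
# Count the distinct lines that occur more than once on standard input.
count_duplicates() {
  sort | uniq -d | awk 'END { print NR }'
}

# Possible usage from the kafka/bin directory (reads the topic from the start
# and stops after 10 seconds without new messages):
#   ./kafka-console-consumer.sh --bootstrap-server 172.16.1.58:9092 \
#       --topic accurLogic --from-beginning --timeout-ms 10000 \
#     | grep McAfee | count_duplicates
```

A result of 0 means no record was collected twice; any larger number is the count of distinct records that appear more than once.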

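Result verification (carried out step by step as follows):

First verification: start Flume:

The consumer from step 4.1 received the complete data set.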