Monitoring Flume with Ganglia


yidianyidei

Column: Learning Big Data from Scratch - Flume. Tags: big data, flume

Article link: https://blog.csdn.net/yidianyidei/article/details/108399142

Contents

I. Monitoring Flume with Ganglia
II. Custom Source
III. Custom Sink
IV. Custom MySQLSource

I. Monitoring Flume with Ganglia

1 Installing and deploying Ganglia

1) Install the httpd service and PHP

[root@hadoop102 flume]$ sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

[root@hadoop102 flume]$ sudo yum -y install httpd php

2) Install the other dependencies

[root@hadoop102 flume]$ sudo yum -y install rrdtool perl-rrdtool rrdtool-devel

[root@hadoop102 flume]$ sudo yum -y install apr-devel

3) Install Ganglia

[root@hadoop102 flume]$ sudo yum -y install ganglia-gmetad

[root@hadoop102 flume]$ sudo yum -y install ganglia-web

[root@hadoop102 flume]$ sudo yum install -y ganglia-gmond

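Ganglia consists of three parts: gmond, gmetad, and gweb.

gmond (Ganglia Monitoring Daemon) is a lightweight service installed on every node whose metrics you want to collect. It makes it easy to gather many system metrics, such as CPU, memory, disk, network, and active-process data.

gmetad (Ganglia Meta Daemon) is the service that aggregates all of this information and stores it on disk in RRD format.

gweb (Ganglia Web) is Ganglia's visualization tool: a PHP front end that displays the data stored by gmetad in a browser. The web interface charts the various metrics collected while the cluster is running.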

4) Edit the configuration file /etc/httpd/conf.d/ganglia.conf

[root@hadoop102 flume]$ sudo vim /etc/httpd/conf.d/ganglia.conf

5) Edit the configuration file /etc/ganglia/gmetad.conf

[root@hadoop102 flume]$ sudo vim /etc/ganglia/gmetad.conf

Change it to (replace the address with that of your own host):

data_source "hadoop102" 192.168.1.102

6) Edit the configuration file /etc/ganglia/gmond.conf

[root@hadoop102 flume]$ sudo vim /etc/ganglia/gmond.conf

Change it to:


cluster {
  name = "hadoop102"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

udp_send_channel {
  #bind_hostname = yes # Highly recommended, soon to be default.
                       # This option tells gmond to use a source address
                       # that resolves to the machine's hostname. Without
                       # this, the metrics may appear to come from any
                       # interface and the DNS names associated with
                       # those IPs will be used to create the RRDs.
  # mcast_join = 239.2.11.71
  host = 192.168.1.102
  port = 8649
  ttl = 1
}

udp_recv_channel {
  # mcast_join = 239.2.11.71
  port = 8649
  bind = 192.168.1.102
  retry_bind = true
  # Size of the UDP buffer. If you are handling lots of metrics you really
  # should bump it up to e.g. 10MB or even higher.
  # buffer = 10485760
}

7) Edit the configuration file /etc/selinux/config

[root@hadoop102 flume]$ sudo vim /etc/selinux/config

Change it to:

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#   enforcing - SELinux security policy is enforced.
#   permissive - SELinux prints warnings instead of enforcing.
#   disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#   targeted - Targeted processes are protected,
#   mls - Multi Level Security protection.
SELINUXTYPE=targeted

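Important: disabling SELinux in this file only takes effect after a reboot. If you do not want to reboot right now, you can switch enforcement off temporarily: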

[root@hadoop102 flume]$ sudo setenforce 0

8) Start Ganglia

[root@hadoop102 flume]$ sudo service httpd start

[root@hadoop102 flume]$ sudo service gmetad start

[root@hadoop102 flume]$ sudo service gmond start

9) Open the Ganglia page in a browser

http://hadoop102/ganglia

Important: if you still get a permission-denied error after completing the steps above, change the permissions of the /var/lib/ganglia directory:

[root@hadoop102 flume]$ sudo chmod -R 777 /var/lib/ganglia

2 Testing the monitoring with Flume

1) Edit flume-env.sh in the /opt/module/flume/conf directory (192.168.1.102 is the IP of this host; use your own):

JAVA_OPTS="-Dflume.monitoring.type=ganglia -Dflume.monitoring.hosts=192.168.1.102:8649 -Xms100m -Xmx200m"
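The two -D flags become JVM system properties of the agent process. As a rough illustration only (this is not Flume's own parsing code), the sketch below splits a comma-separated host:port list such as the value of flume.monitoring.hosts into socket addresses. The class name MonitoringHosts and its parse method are made up for this example.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flume: parses "host:port,host:port".
final class MonitoringHosts {

    static List<InetSocketAddress> parse(String hosts) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String entry : hosts.split(",")) {
            String trimmed = entry.trim();
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("Expected host:port but got: " + trimmed);
            }
            String host = trimmed.substring(0, colon);
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            // createUnresolved avoids a DNS lookup while parsing
            result.add(InetSocketAddress.createUnresolved(host, port));
        }
        return result;
    }

    public static void main(String[] args) {
        // Falls back to the address used in gmond.conf when the property is not set
        String hosts = System.getProperty("flume.monitoring.hosts", "192.168.1.102:8649");
        for (InetSocketAddress address : parse(hosts)) {
            System.out.println(address.getHostString() + " -> port " + address.getPort());
        }
    }
}
```

Running it with -Dflume.monitoring.hosts=... on the command line is a quick way to check that the value you put into JAVA_OPTS is well formed before starting the agent.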

2) Start the Flume job

[root@hadoop102 flume]$ bin/flume-ng agent \
--conf conf/ \
--name a1 \
--conf-file job/flume-netcat-logger.conf \
-Dflume.root.logger=INFO,console \
-Dflume.monitoring.type=ganglia \
-Dflume.monitoring.hosts=192.168.1.102:8649

3) Send data and watch the Ganglia graphs

[root@hadoop102 flume]$ nc localhost 44444

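Legend:

Field (chart name)	Meaning
EventPutAttemptCount	Total number of events the source attempted to put into the channel
EventPutSuccessCount	Total number of events successfully put into the channel and committed
EventTakeAttemptCount	Total number of times the sink attempted to take an event from the channel. Not every attempt returns an event, because the channel may be empty when the sink polls.
EventTakeSuccessCount	Total number of events the sink took successfully
StartTime	Time at which the channel started (milliseconds)
StopTime	Time at which the channel stopped (milliseconds)
ChannelSize	Number of events currently in the channel
ChannelFillPercentage	Percentage of the channel's capacity in use
ChannelCapacity	Capacity of the channel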
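To make the relationship between these counters concrete, here is a small sketch. It assumes the usual reading that the fill percentage is ChannelSize divided by ChannelCapacity, and that events committed into the channel minus events taken out successfully should roughly equal ChannelSize. The class ChannelMetrics and the sample numbers are invented for illustration and are not part of Flume.

```java
// Hypothetical helper for reading the channel counters; not part of Flume.
final class ChannelMetrics {

    // Fill percentage derived from ChannelSize and ChannelCapacity
    static double fillPercentage(long channelSize, long channelCapacity) {
        if (channelCapacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        return 100.0 * channelSize / channelCapacity;
    }

    // Events committed into the channel that have not yet been taken out successfully
    static long backlog(long putSuccess, long takeSuccess) {
        return putSuccess - takeSuccess;
    }

    public static void main(String[] args) {
        // Sample numbers invented for illustration only
        long putSuccess = 1200;
        long takeSuccess = 1150;
        long size = 50;
        long capacity = 1000;
        System.out.println("fill % = " + fillPercentage(size, capacity));
        System.out.println("backlog = " + backlog(putSuccess, takeSuccess)
                + ", matches ChannelSize: " + (backlog(putSuccess, takeSuccess) == size));
    }
}
```

If the backlog keeps growing while ChannelFillPercentage approaches 100, the sink is probably slower than the source.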

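II. Custom Source

1 Introduction

A Source is the component that receives data into a Flume agent. Sources can handle log data of many types and formats, including avro, thrift, exec, jms, spooling directory, netcat, sequence generator, syslog, http, and legacy. The official distribution already ships many source types, but they do not always meet the needs of a real project; in that case we write a custom source to fit the requirement.

The official documentation also describes the interface for custom sources: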

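https://flume.apache.org/FlumeDeveloperGuide.html#source

According to the official guide, a custom MySource must extend the AbstractSource class and implement the Configurable and PollableSource interfaces.
Methods to implement:
getBackOffSleepIncrement() // not used for now
getMaxBackOffSleepInterval() // not used for now
configure(Context context) // initializes the context (reads the configuration file)
process() // fetches data, wraps it in an event, and writes it to the channel; this method is called in a loop
Use case: reading data from MySQL or from a file system.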
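The two back-off methods matter when process() reports that it has nothing to deliver. A polling runner can use them to compute a sleep that grows with each consecutive BACKOFF result and is capped at a maximum. The sketch below shows only that arithmetic; it is a hedged illustration with a made-up class name (BackoffSketch), not a copy of Flume's runner.

```java
// Hypothetical sketch of a polling loop's back-off arithmetic; not Flume's runner code.
final class BackoffSketch {

    // Sleep grows linearly with the number of consecutive BACKOFF results, capped at maxInterval
    static long sleepMillis(long consecutiveBackoffs, long increment, long maxInterval) {
        return Math.min(consecutiveBackoffs * increment, maxInterval);
    }

    public static void main(String[] args) {
        long increment = 1000; // what getBackOffSleepIncrement() might return
        long max = 5000;       // what getMaxBackOffSleepInterval() might return
        for (long n = 1; n <= 7; n++) {
            System.out.println("backoff #" + n + " -> sleep " + sleepMillis(n, increment, max) + " ms");
        }
        // With both values at 0 the computed sleep is always 0
        System.out.println("zero config -> " + sleepMillis(3, 0, 0) + " ms");
    }
}
```

Under this model, returning 0 from both methods means a source that keeps answering BACKOFF is polled again immediately, so a non-zero increment is worth considering for sources that are often idle.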

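2 Requirement

Use Flume to receive data, add a prefix to each record, and print it to the console. The prefix can be set in the Flume configuration file.

3 Analysis

4 Implementation

Add the pom dependency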

<dependencies>
    <dependency>
        <groupId>org.apache.flume</groupId>
        <artifactId>flume-ng-core</artifactId>
        <version>1.7.0</version>
    </dependency>
</dependencies>

package com.atguigu;

import org.apache.flume.Context;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.SimpleEvent;
import org.apache.flume.source.AbstractSource;

import java.nio.charset.StandardCharsets;
import java.util.HashMap;

public class MySource extends AbstractSource implements Configurable, PollableSource {

    // Fields read from the configuration file
    private long delay;
    private String field;

    // Read the configuration
    @Override
    public void configure(Context context) {
        // Fall back to 1000 ms (an assumed default) so a missing "delay" cannot cause a NullPointerException
        delay = context.getLong("delay", 1000L);
        field = context.getString("field", "Hello!");
    }

    @Override
    public Status process() throws EventDeliveryException {

        try {
            // Build and send five events per call
            for (int i = 0; i < 5; i++) {
                // Create a fresh event each time; the channel may still hold the previous one
                SimpleEvent event = new SimpleEvent();
                // Set the (empty) event headers
                event.setHeaders(new HashMap<String, String>());
                // Set the event body
                event.setBody((field + i).getBytes(StandardCharsets.UTF_8));
                // Write the event to the channel
                getChannelProcessor().processEvent(event);
                Thread.sleep(delay);
            }
        } catch (Exception e) {
            e.printStackTrace();
            return Status.BACKOFF;
        }
        return Status.READY;
    }

    @Override
    public long getBackOffSleepIncrement() {
        return 0;
    }

    @Override
    public long getMaxBackOffSleepInterval() {
        return 0;
    }
}

5 Testing

1) Package

Package the code as a jar and put it in Flume's lib directory (under /opt/module/flume).

2) Configuration file

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = com.atguigu.MySource
a1.sources.r1.delay = 1000
#a1.sources.r1.field = atguigu

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
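The keys that configure(Context) reads, such as delay and field, are the parts of the property names that follow the component prefix a1.sources.r1. in the file above. The sketch below illustrates that mapping with plain java.util.Properties. It is only an approximation of the idea, not Flume's configuration parser, and the class name ComponentSettings is made up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical illustration of how component settings relate to Context keys; not Flume's parser.
final class ComponentSettings {

    // Collects every property under the given prefix and strips the prefix from the key
    static Map<String, String> forComponent(Properties props, String prefix) {
        Map<String, String> settings = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(prefix)) {
                settings.put(name.substring(prefix.length()), props.getProperty(name));
            }
        }
        return settings;
    }

    public static void main(String[] args) throws IOException {
        String conf = String.join("\n",
                "a1.sources.r1.type = com.atguigu.MySource",
                "a1.sources.r1.delay = 1000",
                "#a1.sources.r1.field = atguigu",
                "a1.sinks.k1.type = logger");
        Properties props = new Properties();
        props.load(new StringReader(conf));
        // The commented-out "field" line is ignored, so only "delay" and "type" remain for r1
        System.out.println(forComponent(props, "a1.sources.r1."));
    }
}
```

Because the field line is commented out in the configuration, the source falls back to the default passed to context.getString("field", "Hello!").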

3) Start the job

[root@hadoop102 flume]$ pwd

/opt/module/flume

[root@hadoop102 flume]$ bin/flume-ng agent -c conf/ -f job/mysource.conf -n a1 -Dflume.root.logger=INFO,console

4) Result

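III. Custom Sink

3.1 Introduction

A sink continuously polls the channel for events, removes them in batches, and either writes those batches to a storage or indexing system or sends them on to another Flume agent.

Sinks are fully transactional. Before removing a batch of events from the channel, each sink starts a transaction with the channel. Once the batch has been written successfully to the storage system or to the next Flume agent, the sink commits the transaction. After the commit, the channel deletes those events from its internal buffer.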
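The take, commit, and rollback behavior described above can be modeled with a toy in-memory queue. The class ToyChannel below is hypothetical and much simpler than any real Flume channel (single-threaded, one open transaction at a time), but it shows why a failed delivery does not lose events.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy in-memory channel that mimics take/commit/rollback; hypothetical, not Flume's MemoryChannel.
final class ToyChannel {
    private final Deque<String> queue = new ArrayDeque<>();
    private final List<String> takenInTxn = new ArrayList<>();

    void put(String event) {
        queue.addLast(event);
    }

    // Take moves an event into the open transaction; returns null when the channel is empty
    String take() {
        String event = queue.pollFirst();
        if (event != null) {
            takenInTxn.add(event);
        }
        return event;
    }

    // Commit: the taken events are removed for good
    void commit() {
        takenInTxn.clear();
    }

    // Rollback: the taken events return to the head of the queue in their original order
    void rollback() {
        for (int i = takenInTxn.size() - 1; i >= 0; i--) {
            queue.addFirst(takenInTxn.get(i));
        }
        takenInTxn.clear();
    }

    int size() {
        return queue.size();
    }

    public static void main(String[] args) {
        ToyChannel ch = new ToyChannel();
        ch.put("e1");
        ch.put("e2");

        ch.take();
        ch.rollback(); // delivery failed: e1 goes back into the channel
        System.out.println("after rollback: " + ch.size());

        ch.take();
        ch.commit();   // delivery succeeded: e1 is removed
        System.out.println("after commit: " + ch.size());
    }
}
```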

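Sink destinations include hdfs, logger, avro, thrift, ipc, file, null, HBase, solr, and custom sinks. The official distribution already ships many sink types, but they do not always meet the needs of a real project; in that case we write a custom sink to fit the requirement.

The official documentation also describes the interface for custom sinks:

https://flume.apache.org/FlumeDeveloperGuide.html#sink

According to the official guide, a custom MySink must extend the AbstractSink class and implement the Configurable interface.

Methods to implement:

configure(Context context) // initializes the context (reads the configuration file)

process() // reads data (events) from the channel; this method is called in a loop

Use case: reading data from the channel and writing it to MySQL or to a file system.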

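3.2 Requirement

Use Flume to receive data, add a prefix and a suffix to each record in the sink, and print it to the console. The prefix and suffix can be set in the Flume job configuration file.

[Figure: flow analysis]

3.3 Implementation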

package com.atguigu;

import org.apache.flume.*;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.nio.charset.StandardCharsets;

public class MySink extends AbstractSink implements Configurable {

    // Logger for this sink
    private static final Logger LOG = LoggerFactory.getLogger(MySink.class);

    private String prefix;
    private String suffix;

    @Override
    public Status process() throws EventDeliveryException {

        // Status to return
        Status status;

        // Channel bound to this sink
        Channel ch = getChannel();

        // Get a transaction
        Transaction txn = ch.getTransaction();

        // Begin the transaction
        txn.begin();

        try {
            // Take one event; null means the channel is currently empty
            Event event = ch.take();

            if (event != null) {
                // Handle the event (log it)
                LOG.info(prefix + new String(event.getBody(), StandardCharsets.UTF_8) + suffix);
                status = Status.READY;
            } else {
                // Nothing to read; back off before process() is called again
                status = Status.BACKOFF;
            }

            // Commit the transaction
            txn.commit();
        } catch (Exception e) {

            // On failure, roll back so the event stays in the channel
            txn.rollback();
            status = Status.BACKOFF;
        } finally {

            // Close the transaction
            txn.close();
        }
        return status;
    }

    @Override
    public void configure(Context context) {

        // Read from the configuration file, with a default value
        prefix = context.getString("prefix", "hello:");

        // Read from the configuration file, without a default value (null if not set)
        suffix = context.getString("suffix");
    }
}
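One detail worth seeing in isolation is what happens to a setting that has no default. The sketch below uses a made-up Decorator class (not part of Flume) to show the string handling: a missing prefix falls back to its default, while a missing suffix is concatenated as the literal text "null".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper that mirrors the string handling of a prefix/suffix sink; not part of Flume.
final class Decorator {

    // Falls back to the given default when the configured value is missing
    static String orDefault(String configured, String fallback) {
        return configured != null ? configured : fallback;
    }

    static String decorate(String prefix, byte[] body, String suffix) {
        return prefix + new String(body, StandardCharsets.UTF_8) + suffix;
    }

    public static void main(String[] args) {
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
        // Prefix missing, so the default applies; suffix configured
        System.out.println(decorate(orDefault(null, "hello:"), body, ":atguigu"));
        // Suffix missing and no default: Java concatenates the text "null"
        System.out.println(decorate("hello:", body, null));
    }
}
```

If a trailing "null" is not wanted, giving suffix an empty-string default in configure() would avoid it.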

3.4 Testing

1) Package

Package the code as a jar and put it in Flume's lib directory (under /opt/module/flume).

2) Configuration file

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = com.atguigu.MySink
#a1.sinks.k1.prefix = atguigu:
a1.sinks.k1.suffix = :atguigu

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

3) Start the job

[root@hadoop102 flume]$ pwd

/opt/module/flume

[root@hadoop102 flume]$ bin/flume-ng agent -c conf/ -f job/mysink.conf -n a1 -Dflume.root.logger=INFO,console

[root@hadoop102 ~]$ nc localhost 44444

hello

atguigu

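IV. Custom MySQLSource

4.1 About the custom source

To monitor MySQL in real time and ship its data to HDFS or another storage framework, we need to implement a MySQLSource ourselves.

The official documentation also describes the interface for custom sources:

Official guide: https://flume.apache.org/FlumeDeveloperGuide.html#source

4.2 Components of the custom MySQLSource

Figure 6-1 Components of the custom MySQLSource

4.3 Steps for the custom MySQLSource

According to the official guide, a custom MySQLSource must extend the AbstractSource class and implement the Configurable and PollableSource interfaces.

Methods to implement:

getBackOffSleepIncrement() // not used for now

getMaxBackOffSleepInterval() // not used for now

configure(Context context) // initializes the context

process() // fetches data, wraps it in events, and writes them to the channel; this method is called in a loop. Fetching from MySQL involves fairly complex logic, so a dedicated class, SQLSourceHelper, handles all interaction with MySQL.

stop() // releases the related resources

PollableSource: pulls data from the source and sends it to the channel.

Configurable: any class that implements Configurable is given a context and uses it to read its configuration.

4.4 Code

1 Add the pom dependencies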

<dependencies>
    <dependency>
        <groupId>org.apache.flume</groupId>
        <artifactId>flume-ng-core</artifactId>
        <version>1.7.0</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.27</version>
    </dependency>
</dependencies>

2 Add the configuration files

Add jdbc.properties and log4j.properties to the classpath.

jdbc.properties:

dbDriver=com.mysql.jdbc.Driver
dbUrl=jdbc:mysql://hadoop102:3306/mysqlsource?useUnicode=true&characterEncoding=utf-8
dbUser=root
dbPassword=000000
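A file like this is typically read from the classpath with java.util.Properties. The following is a minimal sketch under that assumption; the class name JdbcSettings and its methods are invented for illustration and are not the tutorial's SQLSourceHelper.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical loader for jdbc.properties; the class name is made up for this sketch.
final class JdbcSettings {

    // Reads the file from the classpath; returns empty Properties when the file is absent
    static Properties loadFromClasspath(String resource) throws IOException {
        Properties props = new Properties();
        try (InputStream in = JdbcSettings.class.getClassLoader().getResourceAsStream(resource)) {
            if (in != null) {
                props.load(in);
            }
        }
        return props;
    }

    // Fails fast when one of the four expected keys is missing
    static void requireKeys(Properties props) {
        for (String key : new String[] {"dbDriver", "dbUrl", "dbUser", "dbPassword"}) {
            if (props.getProperty(key) == null) {
                throw new IllegalStateException("Missing property: " + key);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Properties props = loadFromClasspath("jdbc.properties");
        System.out.println("loaded keys: " + props.stringPropertyNames());
    }
}
```

Checking the keys up front gives a clearer error than a failed JDBC connection later on.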

log4j.properties:

#--------console-----------
log4j.rootLogger=info,myconsole,myfile
log4j.appender.myconsole=org.apache.log4j.ConsoleAppender
log4j.appender.myconsole.layout=org.apache.log4j.SimpleLayout
#log4j.appender.myconsole.layout.ConversionPattern =%d [%t] %-5p [%c] - %m%n

#log4j.rootLogger=error,myfile
log4j.appender.myfile=org.apache.log4j.DailyRollingFileAppender
log4j.appender.myfile.File=/tmp/flume.log
log4j.appender.myfile.layout=org.apache.log4j.PatternLayout
log4j.appender.myfile.layout.ConversionPattern =%d [%t] %-5p [%c] - %m%n

3 SQLSourceHelper

1) Properties:

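Property	Description (default value in parentheses)
runQueryDelay	Interval between queries (10000)
batchSize	Buffer size (100)
startFrom	id at which the query starts (0)
currentIndex	Current id for the query; the metadata table must be checked before every query
recordSixe	Number of rows returned by the query
table	Name of the monitored table
columnsToSelect	Columns to select (*)
customQuery	Query supplied by the user
query	The query statement
defaultCharsetResultSet	Character encoding (UTF-8)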
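The defaults in the table can be applied when the settings are read. The sketch below does this over a plain Map. The map keys simply reuse the property names from the table, and treating table as required is an assumption of this example; the real helper may use different configuration keys and checks.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical settings holder; defaults follow the table, key names are assumptions.
final class SqlSourceConfig {
    final int runQueryDelay;
    final int batchSize;
    final int startFrom;
    final String columnsToSelect;
    final String defaultCharsetResultSet;
    final String table;

    SqlSourceConfig(Map<String, String> settings) {
        runQueryDelay = Integer.parseInt(settings.getOrDefault("runQueryDelay", "10000"));
        batchSize = Integer.parseInt(settings.getOrDefault("batchSize", "100"));
        startFrom = Integer.parseInt(settings.getOrDefault("startFrom", "0"));
        columnsToSelect = settings.getOrDefault("columnsToSelect", "*");
        defaultCharsetResultSet = settings.getOrDefault("defaultCharsetResultSet", "UTF-8");
        // No default for the table name in this sketch: it has to be configured
        table = settings.get("table");
        if (table == null || table.isEmpty()) {
            throw new IllegalArgumentException("table must be set");
        }
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        settings.put("table", "student"); // made-up table name
        settings.put("batchSize", "50");
        SqlSourceConfig config = new SqlSourceConfig(settings);
        System.out.println(config.runQueryDelay + " " + config.batchSize + " " + config.columnsToSelect);
    }
}
```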

2) Methods:

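Method	Description
SQLSourceHelper(Context context)	Constructor; initializes the properties and obtains the JDBC connection
InitConnection(String url, String user, String pw)	Obtains the JDBC connection
checkMandatoryProperties()	Checks that the required properties are set (more checks can be added in real projects)
buildQuery()	Builds the SQL statement for the current situation; returns a String
executeQuery()	Runs the SQL query; returns List<List>
getAllRows(List<List> queryResult)	Converts the query result to Strings for later processing
updateOffset2DB(int size)	Writes the offset to the metadata table after each query, based on the result
execSql(String sql)	Executes the given SQL statement
getStatusDBIndex(int startFrom)	Reads the offset from the metadata table
queryOne(String sql)	Runs the SQL statement that reads the offset from the metadata table
close()	Releases resources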
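To give a feel for how buildQuery(), getAllRows(), and the offset fit together, here is a simplified sketch. It assumes the monitored table has a numeric id column with consecutive values, so that the offset can advance by the number of rows read. The class QuerySketch and its method bodies are illustrative guesses, not the tutorial's actual SQLSourceHelper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of query building and row conversion; not the tutorial's actual helper.
final class QuerySketch {

    // Builds an incremental query, assuming a numeric "id" column on the monitored table
    static String buildQuery(String table, String columnsToSelect, long currentIndex) {
        return "SELECT " + columnsToSelect + " FROM " + table
                + " WHERE id > " + currentIndex + " ORDER BY id";
    }

    // Turns each row into one comma-separated line; null column values become empty strings
    static List<String> getAllRows(List<List<Object>> queryResult) {
        List<String> rows = new ArrayList<>();
        for (List<Object> row : queryResult) {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    line.append(',');
                }
                Object value = row.get(i);
                line.append(value == null ? "" : value.toString());
            }
            rows.add(line.toString());
        }
        return rows;
    }

    // Next offset, assuming ids are consecutive so the row count equals the id advance
    static long nextOffset(long currentIndex, int rowsRead) {
        return currentIndex + rowsRead;
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("student", "*", 0)); // "student" is a made-up table name
        List<List<Object>> result = Arrays.asList(
                Arrays.<Object>asList(1, "a"),
                Arrays.<Object>asList(2, null));
        System.out.println(getAllRows(result));
        System.out.println("next offset: " + nextOffset(0, result.size()));
    }
}
```

The table and column names are concatenated into the SQL text here, so they should come only from trusted configuration, never from user input.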
