Chukwa: A Status Analysis System for Hadoop

 

Reposted from: http://blog.csdn.net/anghlq/article/details/6271820

The Apache open-source project Hadoop has been widely adopted in industry as a distributed storage and computing system, and many large companies have built their own Hadoop-based applications and extensions. Now that Hadoop clusters of 1,000+ nodes are common, how should the cluster's own operational data be collected and analyzed? Apache offers a solution to this problem as well: Chukwa.

Overview
The official Chukwa website describes the project as follows: Chukwa is an open-source data collection system for monitoring large distributed systems. It is built on top of Hadoop's HDFS and MapReduce framework and inherits Hadoop's scalability and robustness. Chukwa also includes a powerful and flexible toolkit for displaying, monitoring, and analyzing the collected data.
Some sites even call Chukwa a "full stack solution" for log processing and analysis.

Sounds appealing, doesn't it?
Let's first take a look at what Chukwa looks like:

(Screenshots from the original post are omitted here.)

What Chukwa is not
1. Chukwa is not a single-node system. Deploying Chukwa on a single node is essentially pointless: Chukwa is a distributed log processing system built on top of Hadoop. In other words, before setting up Chukwa you need a Hadoop environment, and Chukwa is then layered on top of it, as the architecture diagram later in this article shows. This follows from Chukwa's assumption that the data to be processed is on the terabyte scale.
2. Chukwa is not a real-time error monitoring system. Systems such as Ganglia and Nagios already handle that well, with sensitivity down to the second. Chukwa works at minute-level granularity; for metrics such as the cluster's overall CPU utilization, a delay of a few minutes is acceptable.
3. Chukwa is not a closed system. Although it ships with many analyses aimed at Hadoop clusters, it is not limited to monitoring and analyzing Hadoop. Chukwa provides a complete solution and framework for collecting, storing, analyzing, and displaying large volumes of log-like data, covering every stage of that data's life cycle, as its architecture shows.

What Chukwa is
Having covered what Chukwa is not, let's look at what it actually does.
Specifically, Chukwa focuses on the following areas:
1. Overall, Chukwa can monitor the health of large Hadoop clusters (2,000+ nodes producing terabytes of data per day) and analyze their logs.
2. For cluster users: Chukwa shows how long their jobs have run, how many resources they consume and how many remain available, why a job failed, and on which node a read or write operation went wrong.
3. For cluster operations engineers: Chukwa shows hardware errors, performance changes, and where the cluster's resource bottlenecks are.
4. For cluster managers: Chukwa shows resource consumption and overall job execution across the cluster, which helps with budgeting and coordinating cluster resources.
5. For cluster developers: Chukwa highlights the cluster's main performance bottlenecks and the most frequent errors, so effort can be focused on the most important problems.

Basic architecture
With this intuition in place, let's look at Chukwa's architecture. Its overall structure is as follows:

(Architecture diagram from the original post is omitted here.)
The main components are:
1. agents: collect the raw data and send it to the collectors.
2. adaptors: the interfaces and tools that actually gather the data; a single agent can manage data collection for multiple adaptors.
3. collectors: receive the data sent by the agents and periodically write it to the cluster.
4. map/reduce jobs: run on a schedule to classify, sort, de-duplicate, and merge the data on the cluster.
5. HICC: displays the data.

Design details
Adaptors and agents
On each node that produces data (essentially every node in the cluster), Chukwa runs an agent to collect the data it is interested in. Each kind of data is gathered by an adaptor, and the data type (DataType) is specified in the corresponding configuration. Out of the box, Chukwa provides adaptors for common data sources such as command output, log files, and HttpSender. These adaptors run either periodically (for example, reading the output of df once a minute) or event-driven (for example, when the kernel logs an error). If the built-in adaptors are not enough, users can easily implement their own.

To guard against failures on the collection side, Chukwa's agent uses a "watchdog" mechanism that automatically restarts terminated collection processes, preventing loss of raw data.
Duplicated data, on the other hand, is automatically de-duplicated during Chukwa's later processing, so critical data can be collected by identical agents on several machines to provide fault tolerance.
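How does an adaptor actually get started? Typically the agent reads a list of "add" commands at startup, and the same commands can be sent to a running agent over its control socket. The sketch below is only an illustration: it assumes the commonly documented default control port 9093 and uses the "add <adaptor> <datatype> <params> <offset>" syntax shown in the adaptor examples later in this article; verify the port and the reply format against your agent configuration.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;

// Minimal sketch: ask a running Chukwa agent to start tailing a log file.
// The control port (9093) and the reply format are assumptions.
public class AddAdaptor {
  public static void main(String[] args) throws Exception {
    try (Socket agent = new Socket("localhost", 9093)) {
      Writer out = new OutputStreamWriter(agent.getOutputStream(), "UTF-8");
      // Tail /var/log/messages as datatype SysLog, starting at offset 0.
      out.write("add filetailer.CharFileTailingAdaptorUTF8 SysLog /var/log/messages 0\n");
      out.flush();
      BufferedReader in = new BufferedReader(
          new InputStreamReader(agent.getInputStream(), "UTF-8"));
      System.out.println(in.readLine());  // agent's acknowledgement line
    }
  }
}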
Collectors
The data collected by the agents is stored on a Hadoop cluster. Hadoop handles a small number of large files well, but a large number of small files is not its strength. Chukwa therefore introduces the collector role, which partially merges the data before writing it to the cluster, preventing a flood of small files.
In addition, to keep a collector from becoming a performance bottleneck or a single point of failure, Chukwa allows and encourages running multiple collectors. Agents pick a collector at random from the collector list and send their data to it; if that collector fails or is busy, they move on to the next one. This balances the load, and in practice the load across multiple collectors turns out to be nearly even.
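The failover behaviour just described is easy to picture in code. The sketch below is purely illustrative: the class, method, and URL names are invented for this example and are not Chukwa's actual sender implementation. It simply picks a random starting point in the collector list and walks the list until one collector accepts the data.

import java.util.List;
import java.util.Random;

// Illustrative only: "pick a random collector, fail over to the next".
class CollectorChooser {
  private final List<String> collectors;   // e.g. ["http://collector1:8080/", "http://collector2:8080/"]
  private final Random random = new Random();

  CollectorChooser(List<String> collectors) {
    this.collectors = collectors;
  }

  // Try collectors starting from a random index; return the URL that accepted the data.
  String send(byte[] chunk) {
    int start = random.nextInt(collectors.size());
    for (int i = 0; i < collectors.size(); i++) {
      String url = collectors.get((start + i) % collectors.size());
      if (tryPost(url, chunk)) {            // hypothetical HTTP POST helper
        return url;
      }
    }
    throw new IllegalStateException("no collector reachable");
  }

  private boolean tryPost(String url, byte[] chunk) {
    // Post the chunk over HTTP and report success; omitted in this sketch.
    return false;
  }
}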

Demux and archive
The data stored on the cluster is analyzed by MapReduce jobs. For this stage Chukwa provides two built-in job types: demux and archive.
The demux job classifies, sorts, and de-duplicates the data. The agents section introduced the notion of a data type (DataType); every piece of data written to the cluster by a collector carries its type. While it runs, the demux job looks up the processing class configured for each data type and performs the corresponding analysis, typically turning unstructured data into structured records by extracting the relevant attributes. Because demux is in essence a MapReduce job, you can write your own demux job to perform arbitrarily complex analysis; the demux interface Chukwa provides is easy to extend in Java.
The archive job merges files of the same data type. This keeps data of one type together, which makes further analysis easier, and it reduces the number of files, easing the storage pressure on the Hadoop cluster.
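To give a feel for what such an extension looks like, here is a minimal sketch of a custom mapper-side demux processor that splits "key=value" log lines into record fields. It assumes the 0.4-era demux API (AbstractProcessor, ChukwaRecord, ChukwaRecordKey, and the inherited buildGenericRecord helper and key/archiveKey fields); check the exact package names and signatures against the Chukwa version you run.

import org.apache.hadoop.chukwa.extraction.demux.processor.mapper.AbstractProcessor;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecord;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecordKey;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Sketch of a demux processor: one ChukwaRecord per log entry, one field per key=value pair.
public class KeyValueProcessor extends AbstractProcessor {
  @Override
  protected void parse(String recordEntry,
                       OutputCollector<ChukwaRecordKey, ChukwaRecord> output,
                       Reporter reporter) throws Throwable {
    ChukwaRecord record = new ChukwaRecord();
    // buildGenericRecord (inherited) fills in the standard metadata for this chunk;
    // its exact signature may differ between Chukwa versions.
    buildGenericRecord(record, recordEntry, archiveKey.getTimePartition(), "KeyValue");
    for (String pair : recordEntry.split("\\s+")) {
      String[] kv = pair.split("=", 2);
      if (kv.length == 2) {
        record.add(kv[0], kv[1]);
      }
    }
    output.collect(key, record);   // 'key' is the ChukwaRecordKey prepared by the framework
  }
}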
dbadmin
Although keeping the data on the cluster satisfies long-term storage and large-scale computation needs, it is not convenient for display. Chukwa addresses this in two ways:
1. Using the MDL language, data is extracted from the cluster into a MySQL database. Data from the most recent week is kept in full; older data is thinned out according to its age, so the older the data, the coarser its time granularity. MySQL then serves as the data source for display.
2. Using HBase or a similar technology, the indexed data is stored directly on the cluster.
Up to Chukwa 0.4.0 the first approach is used, but the second is more elegant and more convenient.
HICC
HICC is Chukwa's data presentation front end. It provides a set of default display widgets that can render one or more kinds of data as lists, line charts, multi-line charts, bar charts, and area charts, giving users an intuitive view of trends. For continuously arriving data and for historical data, HICC uses a round-robin strategy to keep ever-growing data from overloading the server, and "thins out" data along the time axis so that long time ranges can still be displayed.
Under the hood, HICC is a web service built on Jetty, implemented with JSP and JavaScript. The data types to display and the page layout can be arranged by simple drag and drop; more complex views can be assembled with SQL queries that combine the data as needed. And if even that is not enough, you can simply modify its JSP code.
Other data interfaces
If you have further needs for the raw data, you can also access it directly on the cluster with MapReduce jobs or Pig scripts to produce whatever results you need. Chukwa also provides a command-line interface for direct access to the data on the cluster.
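For example, the sink and archive files are Hadoop SequenceFiles of Chukwa chunks, so a small Java program (or the mapper of a custom MapReduce job) can iterate over them directly. The sketch below is a rough illustration, assuming the ChukwaArchiveKey and ChunkImpl classes of the 0.4-era API and the old-style SequenceFile.Reader constructor; verify class locations against your installation.

import org.apache.hadoop.chukwa.ChukwaArchiveKey;
import org.apache.hadoop.chukwa.ChunkImpl;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

// Sketch: print the datatype, source host, and size of every chunk in one file.
public class DumpChunks {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path(args[0]);                   // e.g. an .arc or .done file under /chukwa
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, file, conf);
    ChukwaArchiveKey key = new ChukwaArchiveKey();
    ChunkImpl chunk = ChunkImpl.getBlankChunk();     // reusable value object for deserialization
    while (reader.next(key, chunk)) {
      System.out.println(chunk.getDataType() + "\t"
          + chunk.getSource() + "\t" + chunk.getData().length + " bytes");
    }
    reader.close();
  }
}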
Built-in data support
For Hadoop-related metrics such as per-node CPU, memory, and disk utilization, cluster-wide average CPU utilization, cluster-wide memory and storage utilization, changes in the number of files, and changes in the number of jobs, Chukwa provides built-in support for the whole pipeline from collection to display; it only needs to be configured, which is quite convenient.
As you can see, Chukwa supports the entire life cycle of the data: generation, collection, storage, analysis, and display.

 

 

Collector

PipelineStageWriter, SocketTeeWriter

One particularly useful Writer class is PipelineStageWriter, which lets you string together a series of PipelineableWriters for pre-processing or post-processing incoming data. As an example, the SocketTeeWriter class allows other programs to get incoming chunks fed to them over a socket by the collector. The collector configuration below enables the pipeline writer and chains a SocketTeeWriter in front of the standard SeqFileWriter:

<property>
<name>chukwaCollector.writerClass</name>
<value>org.apache.hadoop.chukwa.datacollection.writer.PipelineStageWriter</value>
</property>
 
<property>
<name>chukwaCollector.pipeline</name>
<value>org.apache.hadoop.chukwa.datacollection.writer.SocketTeeWriter,org.apache.hadoop.chukwa.datacollection.writer.SeqFileWriter</value>
</property>
   

SocketTeeWriter

The SocketTeeWriter allows external processes to watch the stream of chunks passing through the collector. This allows certain kinds of real-time monitoring to be done on-top of Chukwa.

SocketTeeWriter listens on a port (specified by conf option chukwaCollector.tee.port, defaulting to 9094). Applications that want Chunks should connect to that port and issue a command of the form RAW|WRITABLE <filter>\n. Filters use the same syntax as the Dump command. If the filter is accepted, the Writer will respond OK\n.

Subsequently, Chunks matching the filter will be serialized and sent back over the socket. Specifying "WRITABLE" will cause the chunks to be written using Hadoop's Writable serialization framework. "RAW" will send the internal data of the Chunk, without any metadata, prefixed by its length encoded as a 32-bit int, big-endian. "HEADER" is similar to "RAW", but with a one-line header in front of the content. Header format is hostname datatype stream name offset, separated by spaces.

The filter will be de-activated when the socket is closed.

// Requires java.net.Socket and java.io.DataInputStream.
Socket s2 = new Socket("host", SocketTeeWriter.DEFAULT_PORT);
s2.getOutputStream().write("RAW datatype=XTrace\n".getBytes());
DataInputStream dis = new DataInputStream(s2.getInputStream());
dis.readFully(new byte[3]);            // read the "OK\n" acknowledgement
while (true) {
  int len = dis.readInt();             // 32-bit big-endian length prefix
  byte[] data = new byte[len];
  dis.readFully(data);
  DoSomethingUsing(data);              // application-specific handling of the raw chunk data
}

 

 

Simple Archiver

The simple archiver is designed to consolidate a large number of data sink files into a small number of archive files, with the contents grouped in a useful way. Archive files, like raw sink files, are in Hadoop sequence file format. Unlike the data sink, however, duplicates have been removed. (Future versions of the Simple Archiver will indicate the presence of gaps.)

The simple archiver moves every .done file out of the sink, and then runs a MapReduce job to group the data. Output Chunks will be placed into files with names of the form hdfs:///chukwa/archive/clustername/Datatype_date.arc . Date corresponds to when the data was collected; Datatype is the datatype of each Chunk.

If archived data corresponds to an existing filename, a new file will be created with a disambiguating suffix.

Demux

A key use for Chukwa is processing arriving data, in parallel, using MapReduce. The most common way to do this is using the Chukwa demux framework. As data flows through Chukwa, the demux job is often the first job that runs.

By default, Chukwa will use the default TsProcessor. This parser will try to extract the real log statement from the log entry using the ISO8601 date format. If it fails, it will use the time at which the chunk was written to disk (collector timestamp).
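To route a given datatype to its own parser instead, the datatype is mapped to a processor class in the demux configuration. A sketch, assuming the chukwa-demux-conf.xml convention in which the property name is the datatype and the value is the fully-qualified processor class (the class below is hypothetical); check the convention against your Chukwa version:

<property>
<name>MyAppLog</name>
<value>org.example.chukwa.MyAppLogProcessor</value>
<description>Demux parser for chunks whose datatype is MyAppLog (hypothetical class)</description>
</property>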

 

 

 

Data Model

Chukwa Adaptors emit data in Chunks. A Chunk is a sequence of bytes, with some metadata. Several of these are set automatically by the Agent or Adaptors. Two of them require user intervention: cluster name and datatype. Cluster name is specified in conf/chukwa-env.sh, and is global to each Agent process. Datatype describes the expected format of the data collected by an Adaptor instance, and it is specified when that instance is started.

The following table lists the Chunk metadata fields.

Field          Meaning                               Source
Source         Hostname where Chunk was generated    Automatic
Cluster        Cluster host is associated with       Specified by user in agent config
Datatype       Format of output                      Specified by user when Adaptor started
Sequence ID    Offset of Chunk in stream             Automatic, initial offset specified when Adaptor started
Name           Name of data source                   Automatic, chosen by Adaptor

Conceptually, each Adaptor emits a semi-infinite stream of bytes, numbered starting from zero. The sequence ID specifies how many bytes each Adaptor has sent, including the current chunk. So if an adaptor emits a chunk containing the first 100 bytes from a file, the sequenceID of that Chunk will be 100. And the second hundred bytes will have sequence ID 200. This may seem a little peculiar, but it's actually the same way that TCP sequence numbers work.

Adaptors need to take sequence ID as a parameter so that they can resume correctly after a crash, and not send redundant data. When starting adaptors, it's usually safe to specify 0 as an ID, but it's sometimes useful to specify something else. For instance, it lets you do things like only tail the second half of a file.
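As an illustration (the file path is hypothetical; the trailing number is the initial sequence ID just described), the following would start tailing at byte offset 5000 rather than from the beginning of the file:
    add filetailer.CharFileTailingAdaptorUTF8 SysLog /var/log/messages 5000
Chunks emitted by this adaptor would then carry sequence IDs counted from 5000 onward.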

 

 

Adaptors

This section lists the standard adaptors, and the arguments they take.

  • FileAdaptor : Pushes a whole file, as one Chunk, then exits. Takes one mandatory parameter; the file to push.
    add FileAdaptor FooData /tmp/foo 0
    This pushes file /tmp/foo as one chunk, with datatype FooData .
  • filetailer.LWFTAdaptor Repeatedly tails a file, treating the file as a sequence of bytes, ignoring the content. Chunk boundaries are arbitrary. This is useful for streaming binary data. Takes one mandatory parameter; a path to the file to tail. If log file is rotated while there is unread data, this adaptor will not attempt to recover it.
    add filetailer.LWFTAdaptor BarData /foo/bar 0
    This pushes /foo/bar in a sequence of Chunks of type BarData
  • filetailer.FileTailingAdaptor Repeatedly tails a file, again ignoring content and with unspecified Chunk boundaries. Takes one mandatory parameter; a path to the file to tail. Keeps a file handle open in order to detect log file rotation.
    add filetailer.FileTailingAdaptor BarData /foo/bar 0
    This pushes /foo/bar in a sequence of Chunks of type BarData
  • filetailer.RCheckFTAdaptor An experimental modification of the above, which avoids the need to keep a file handle open. Same parameters and usage as the above.
  • filetailer.CharFileTailingAdaptorUTF8 The same as the base FileTailingAdaptor, except that chunks are guaranteed to end only at carriage returns. This is useful for most ASCII log file formats.
  • filetailer.CharFileTailingAdaptorUTF8NewLineEscaped The same, except that chunks are guaranteed to end only at non-escaped carriage returns. This is useful for pushing Chukwa-formatted log files, where exception stack traces stay in a single chunk.
  • DirTailingAdaptor Takes a directory path and an adaptor name as mandatory parameters; repeatedly scans that directory and all subdirectories, and starts the indicated adaptor running on each file. Since the DirTailingAdaptor does not, itself, emit data, the datatype parameter is applied to the newly-spawned adaptors. Note that if you try this on a large directory with an adaptor that keeps file handles open, it is possible to exceed your system's limit on open files. A file pattern can be specified as an optional second parameter.
    add DirTailingAdaptor logs /var/log/ *.log filetailer.CharFileTailingAdaptorUTF8 0
  • ExecAdaptor Takes a frequency (in milliseconds) as optional parameter, and then program name as mandatory parameter. Runs that program repeatedly at a rate specified by frequency.
    add ExecAdaptor Df 60000 /bin/df -x nfs -x none 0
    This adaptor will run df every minute, labeling output as Df.
  • UDPAdaptor Takes a port number as mandatory parameter. Binds to the indicated UDP port, and emits one Chunk for each received packet.
    add UDPAdaptor Packets 1234 0
    This adaptor will listen for incoming traffic on port 1234, labeling output as Packets.
  • edu.berkeley.chukwa_xtrace.XtrAdaptor (available in contrib ) Takes an Xtrace ReportSource class name [without package] as mandatory argument, and no optional parameters. Listens for incoming reports in the same way as that ReportSource would.
    add edu.berkeley.chukwa_xtrace.XtrAdaptor Xtrace UdpReportSource 0
    This adaptor will create and start a UdpReportSource , labeling its output datatype as Xtrace.

 

 

 

HDFS File System Structure

The general layout of the Chukwa filesystem is as follows.

/chukwa/
    archivesProcessing/
    dataSinkArchives/
    demuxProcessing/
    finalArchives/
    logs/
    postProcess/
    repos/
    rolling/
    temp/

Raw Log Collection and Aggregation Workflow

What data is stored where is best described by stepping through the Chukwa workflow.

  1. Collectors write chunks to logs/*.chukwa files until a 64MB chunk size is reached or a given time interval has passed.
    • logs/*.chukwa
  2. Collectors close chunks and rename them to *.done
    • from logs/*.chukwa
    • to logs/*.done
  3. DemuxManager checks for *.done files every 20 seconds.
    1. If *.done files exist, moves files in place for demux processing:
      • from: logs/*.done
      • to: demuxProcessing/mrInput
    2. The Demux MapReduce job is run on the data in demuxProcessing/mrInput .
    3. If demux is successful within 3 attempts, archives the completed files:
      • from: demuxProcessing/mrOutput
      • to: dataSinkArchives/[yyyyMMdd]/*/*.done
    4. Otherwise moves the completed files to an error folder:
      • from: demuxProcessing/mrOutput
      • to: dataSinkArchives/InError/[yyyyMMdd]/*/*.done
  4. PostProcessManager wakes up every few minutes and aggregates, orders and de-dups record files.
    • from: postProcess/demuxOutputDir_*/[clusterName]/[dataType]/[dataType]_[yyyyMMdd]_[HH].R.evt
    • to: repos/[clusterName]/[dataType]/[yyyyMMdd]/[HH]/[mm]/[dataType]_[yyyyMMdd]_[HH]_[N].[N].evt
  5. HourlyChukwaRecordRolling runs M/R jobs at 16 past the hour to group 5 minute logs to hourly.
    • from: repos/[clusterName]/[dataType]/[yyyyMMdd]/[HH]/[mm]/[dataType]_[yyyyMMdd]_[mm].[N].evt
    • to: temp/hourlyRolling/[clusterName]/[dataType]/[yyyyMMdd]
    • to: repos/[clusterName]/[dataType]/[yyyyMMdd]/[HH]/[dataType]_HourlyDone_[yyyyMMdd]_[HH].[N].evt
    • leaves: repos/[clusterName]/[dataType]/[yyyyMMdd]/[HH]/rotateDone/
  6. DailyChukwaRecordRolling runs M/R jobs at 1:30AM to group hourly logs to daily.
    • from: repos/[clusterName]/[dataType]/[yyyyMMdd]/[HH]/[dataType]_[yyyyMMdd]_[HH].[N].evt
    • to: temp/dailyRolling/[clusterName]/[dataType]/[yyyyMMdd]
    • to: repos/[clusterName]/[dataType]/[yyyyMMdd]/[dataType]_DailyDone_[yyyyMMdd].[N].evt
    • leaves: repos/[clusterName]/[dataType]/[yyyyMMdd]/rotateDone/
  7. ChukwaArchiveManager every half hour or so aggregates and removes dataSinkArchives data using M/R.
    • from: dataSinkArchives/[yyyyMMdd]/*/*.done
    • to: archivesProcessing/mrInput
    • to: archivesProcessing/mrOutput
    • to: finalArchives/[yyyyMMdd]/*/chukwaArchive-part-*

Log Directories Requiring Cleanup

The following directories will grow over time and will need to be periodically pruned:

  • finalArchives/[yyyyMMdd]/*
  • repos/[clusterName]/[dataType]/[yyyyMMdd]/*.evt