Flume + Elasticsearch + Kibana: Environment Setup and Explanation

pincharensheng

Article link: https://blog.csdn.net/pincharensheng/article/details/52052966

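1. Software Overview

1.1 Flume

1.1.1 Introduction to Flume

1) Flume concepts

1. Flume is a distributed log collection system. It offers high reliability, high availability, transaction management and restart after failure. It processes data quickly and is fully suitable for production use.

2. The core of Flume is the agent. An agent is a Java process that runs on the machine where logs are collected. It receives logs, buffers them temporarily, and then sends them to a destination.

3. An agent contains three core components: source, channel and sink.

The source component collects log data and can handle many types and formats of logs, including avro, thrift, exec, jms, spooling directory, netcat, sequence generator, syslog, http, legacy and custom sources. Once the source has collected data, it stores it temporarily in the channel.

The channel component stores data temporarily inside the agent. It can be backed by memory, jdbc, file or a custom implementation. Data in a channel is removed only after the sink has delivered it successfully.

The sink component sends data to its destination. Supported destinations include hdfs, logger, avro, thrift, ipc, file, null, hbase, solr, Elasticsearch and custom sinks.

4. What flows through the whole pipeline is the event, Flume's basic unit of data. An event carries the log data (as a byte array) together with header information. Events are created by the source from data that arrives from outside the agent.

5. Flume's transactional guarantees apply at the event level.

6. Flume supports multi-hop agent topologies, including fan-in and fan-out.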

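2) Flume data flow model

The agent is Flume's smallest independently running unit, and one agent is one JVM. A single agent consists of three main components: Source, Channel and Sink.

Flume lets users build multi-hop flows, in which several agents work together. Common models are:

1. Single-agent model

2. Multi-agent models

(Figures 1 to 3: data flow diagrams of these models; the images are not reproduced here.)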
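The source, channel and sink flow of a single agent can be sketched in plain Java as follows. This is an illustration only and not Flume's API. `MiniEvent` and `MiniAgent` are hypothetical names. An event is modeled as described above, as a header map plus a byte-array body, and a bounded in-memory queue stands in for a memory channel.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical event type: string headers plus an opaque byte-array body.
final class MiniEvent {
    final Map<String, String> headers;
    final byte[] body;

    MiniEvent(Map<String, String> headers, byte[] body) {
        this.headers = headers;
        this.body = body;
    }

    String bodyAsString() {
        return new String(body, StandardCharsets.UTF_8);
    }
}

// Hypothetical single agent: the "source" puts events into a bounded channel,
// and the "sink" drains them to a destination (here just an in-memory list).
final class MiniAgent {
    private final BlockingQueue<MiniEvent> channel;
    final List<String> destination = new ArrayList<>();

    MiniAgent(int capacity) {
        this.channel = new ArrayBlockingQueue<>(capacity);
    }

    // Source side: wrap a raw log line into an event; returns false when the channel is full.
    boolean sourceReceive(String logLine, String host) {
        Map<String, String> headers = new HashMap<>();
        headers.put("host", host);
        return channel.offer(new MiniEvent(headers, logLine.getBytes(StandardCharsets.UTF_8)));
    }

    // Sink side: deliver everything currently buffered; returns the number of events sent.
    int sinkDrain() {
        int sent = 0;
        MiniEvent e;
        while ((e = channel.poll()) != null) {
            destination.add(e.headers.get("host") + " " + e.bodyAsString());
            sent++;
        }
        return sent;
    }
}
```

The source and sink share only the channel, which is what lets Flume chain agents: the "destination" of one agent can be the source of the next.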

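3) How Flume reads files

Sources that read files directly fall into two main types:

1. Exec source

The data comes from a Unix command, most commonly tail -F [file]. This gives real-time delivery. However, data is lost when Flume is not running or the command fails, and reading cannot resume from where it stopped. Because the source does not record how far it has read, it cannot know where to start reading next time.

This matters especially when the log file keeps growing. If the Flume source goes down, lines appended before it restarts are never read.

Flume does have an execStream extension. With it, you write your own tool that monitors the log for new content and forwards the appended data to the Flume node, which then passes it on to the sink node. Ideally, a tail-style source would support this itself: after a node outage, it would send the content written during the downtime once the node restarts.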
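The missing piece described above is a persisted read position. Below is a minimal sketch of the idea, under these assumptions:

- a single, append-only log file;
- `CheckpointedTailer` and the position-file format are hypothetical;
- rotation handling is reduced to starting over when the file shrinks.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical tail-with-checkpoint reader: it stores the byte offset it has consumed
// in a side file, so a restarted reader resumes where the previous one stopped.
final class CheckpointedTailer {
    private final Path logFile;
    private final Path positionFile;

    CheckpointedTailer(Path logFile, Path positionFile) {
        this.logFile = logFile;
        this.positionFile = positionFile;
    }

    private long loadOffset() throws IOException {
        if (!Files.exists(positionFile)) {
            return 0L; // first run: start at the beginning of the file
        }
        String text = new String(Files.readAllBytes(positionFile), StandardCharsets.UTF_8).trim();
        return Long.parseLong(text);
    }

    // Returns the complete lines appended since the last checkpoint and saves the new offset.
    // A real source should save the offset only after the events are committed to the channel.
    List<String> readNewLines() throws IOException {
        long offset = loadOffset();
        byte[] buf;
        try (RandomAccessFile raf = new RandomAccessFile(logFile.toFile(), "r")) {
            long length = raf.length();
            if (offset > length) {
                offset = 0L; // file was truncated or replaced: start over
            }
            buf = new byte[(int) (length - offset)]; // sketch: assumes < 2 GB of new data
            raf.seek(offset);
            raf.readFully(buf);
        }
        int lastNewline = -1;
        for (int i = buf.length - 1; i >= 0; i--) {
            if (buf[i] == '\n') {
                lastNewline = i;
                break;
            }
        }
        List<String> lines = new ArrayList<>();
        if (lastNewline < 0) {
            return lines; // only a partial line so far: wait for its newline
        }
        String chunk = new String(buf, 0, lastNewline, StandardCharsets.UTF_8);
        lines.addAll(Arrays.asList(chunk.split("\n", -1)));
        offset += lastNewline + 1;
        Files.write(positionFile, Long.toString(offset).getBytes(StandardCharsets.UTF_8));
        return lines;
    }
}
```

A trailing partial line is left unread until its newline arrives, so a line is never emitted half-written.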

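2. Spooling Directory Source

The spooling directory source (SpoolSource) watches a configured directory for new files and reads the data out of them, giving near-real-time collection. Two points to note:

(1) a file copied into the spool directory must not be opened and edited afterwards;

(2) the spool directory must not contain subdirectories.

In practice it can be combined with log4j. Set log4j's file rolling to once per minute, and copy the rolled files into the monitored spool directory. log4j has a TimeRolling plugin that can move the rolled files into the spool directory, which gives essentially real-time monitoring.

After Flume has finished sending a file, it renames the file with the suffix .COMPLETED. The suffix can be changed in the configuration file.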
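The core loop of a spooling source can be sketched like this. This is a hypothetical helper, not Flume's implementation.

It reads each finished file once and then renames it with the completed suffix. That is why a file must not change after it is dropped into the directory: later appends would never be read, and Flume treats such a modification as an error. Subdirectories are simply skipped here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class SpoolDirSketch {
    // One pass over the spool directory: read every file not yet marked as done,
    // then rename it with the completed suffix so it is never read again.
    static List<String> processOnce(Path dir, String completedSuffix) throws IOException {
        List<Path> pending = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                // subdirectories are not supported by the spooling source, so skip them
                if (Files.isRegularFile(p) && !p.getFileName().toString().endsWith(completedSuffix)) {
                    pending.add(p);
                }
            }
        }
        pending.sort(null); // natural (name) order keeps the sketch deterministic
        List<String> lines = new ArrayList<>();
        for (Path p : pending) {
            lines.addAll(Files.readAllLines(p, StandardCharsets.UTF_8));
            Files.move(p, p.resolveSibling(p.getFileName().toString() + completedSuffix));
        }
        return lines;
    }
}
```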

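Exec vs. Spooling Directory:

ExecSource collects logs in real time. However, if Flume is not running or the command fails, log data is not collected, so completeness is not guaranteed.

SpoolSource cannot collect data in real time, but if files are rolled every minute it comes close. If the application cannot roll its log files every minute, the two approaches can be combined.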

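4) Flume channels

There are several channel types: MemoryChannel, JDBC Channel, MemoryRecoverChannel and FileChannel.

- MemoryChannel provides high throughput but cannot guarantee data integrity.
- MemoryRecoverChannel: the official documentation recommends replacing it with FileChannel.
- FileChannel guarantees data integrity and consistency. When configuring it, put its directories on a different disk from the application's log files to improve performance.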

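5) Flume features

1. High reliability

Software running in production must be highly reliable. Within a single agent, Flume uses transaction-based delivery to make event delivery reliable. The source and the sink each wrap their work in a transaction. An event stays in the channel until it has been processed, and only then is it removed from the channel. This is Flume's point-to-point reliability mechanism.

In a multi-hop flow, the sink of the upstream agent and the source of the downstream agent likewise use their own transactions to keep the data safe.

The transaction flow is shown in a figure in the original article (not reproduced here).

2. Recoverability

Recoverability also relies on the channel. FileChannel is recommended because it persists events to the local file system, at the cost of lower performance than MemoryChannel.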
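The transaction semantics described under "High reliability" can be modeled in a few lines of Java. `TxChannel` and `TakeTxn` are hypothetical names and not Flume's Channel/Transaction API.

A sink takes a batch inside a transaction:

- on commit the events are gone for good;
- on rollback, for example when the destination is unreachable, they return to the head of the channel in their original order, so nothing is lost.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical channel with Flume-like take transactions (not Flume's Channel API).
final class TxChannel {
    private final Deque<String> events = new ArrayDeque<>();
    private final int capacity;

    TxChannel(int capacity) {
        this.capacity = capacity;
    }

    // Source side: returns false when the channel is full.
    synchronized boolean put(String event) {
        if (events.size() >= capacity) {
            return false;
        }
        events.addLast(event);
        return true;
    }

    synchronized int size() {
        return events.size();
    }

    TakeTxn begin() {
        return new TakeTxn();
    }

    // One sink-side transaction.
    final class TakeTxn {
        private final Deque<String> taken = new ArrayDeque<>();

        // Take up to batchSize events (the analogue of transactionCapacity).
        List<String> take(int batchSize) {
            synchronized (TxChannel.this) {
                while (taken.size() < batchSize && !events.isEmpty()) {
                    taken.addLast(events.pollFirst());
                }
            }
            return new ArrayList<>(taken);
        }

        // Delivery succeeded: the taken events are dropped for good.
        void commit() {
            taken.clear();
        }

        // Delivery failed: put the events back at the head, preserving their order.
        void rollback() {
            synchronized (TxChannel.this) {
                while (!taken.isEmpty()) {
                    events.addFirst(taken.pollLast());
                }
            }
        }
    }
}
```

In this model, `capacity` corresponds to a channel's capacity setting and the batch size corresponds to transactionCapacity. A file channel adds persistence of the same queue so that it also survives a process restart.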

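1.1.2 Flume configuration

1. Common source configurations (see the official documentation for full details)

1) Avro source

Flume can listen for content sent by an Avro client (avro-client) and process it.

Configuration file: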

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = avro

a1.sources.r1.bind = 0.0.0.0

a1.sources.r1.port = 4141

# Describe the sink

a1.sinks.k1.type = logger

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1
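Agent files like the one above use standard Java properties-file syntax. Every key is a dotted path of the form `<agent>.<kind>.<name>.<property>`.

The hypothetical helper below, a sketch only, collects the properties of one component by prefix using `java.util.Properties`. It also shows why a stray space after a dot, such as `a1.sinks. k1.type`, silently breaks a key: whitespace ends the key, and everything after it becomes the value.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

final class AgentConfigSketch {
    // Returns the properties of one component with the prefix stripped,
    // e.g. prefix "a1.sources.r1." -> {channels=c1, port=4141, type=avro, ...}.
    static Map<String, String> component(String configText, String prefix) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(configText));
        Map<String, String> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                result.put(key.substring(prefix.length()), props.getProperty(key));
            }
        }
        return result;
    }
}
```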

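2) Spool source

The spool source watches the configured directory for new files and reads the data out of them. Two points to note:

1. a file copied into the spool directory must not be opened and edited afterwards;

2. the spool directory must not contain subdirectories.

Configuration file: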

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = spooldir

a1.sources.r1.spoolDir = /home/flume-1.6.0-bin/logs

a1.sources.r1.fileHeader = true

# Describe the sink

a1.sinks.k1.type = logger

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

3) Exec source

The exec source runs a given command, such as tail, and uses its output as the data source.

Configuration file:

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = exec

a1.sources.r1.command = tail -F /home/flume-1.6.0-bin/log_exec_tail

# Describe the sink

a1.sinks.k1.type = logger

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

4) Syslogtcp source

The syslogtcp source listens on a TCP port and uses the incoming syslog messages as its data.

Configuration file:

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = syslogtcp

a1.sources.r1.port = 5140

a1.sources.r1.host = localhost

# Describe the sink

a1.sinks.k1.type = logger

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1
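To feed the syslogtcp source above (localhost:5140) from Java, a client only has to write newline-terminated syslog lines to the socket. The sketch below builds a classic RFC 3164 style message, `<PRI>TIMESTAMP HOSTNAME TAG: MSG`, with PRI = facility × 8 + severity.

`SyslogTcpSketch` and its method names are hypothetical. The facility and severity values in the usage are only an example: facility 1 is "user" and severity 5 is "notice".

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class SyslogTcpSketch {
    // RFC 3164 timestamp, e.g. "Jul 28 10:37:45"; single-digit days are space-padded.
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("MMM ppd HH:mm:ss", Locale.ENGLISH);

    // Builds "<PRI>TIMESTAMP HOSTNAME TAG: MSG\n" with PRI = facility * 8 + severity.
    static String format(int facility, int severity, LocalDateTime time,
                         String host, String tag, String msg) {
        return "<" + (facility * 8 + severity) + ">" + TS.format(time)
                + " " + host + " " + tag + ": " + msg + "\n";
    }

    // Sends one message to a syslogtcp source, e.g. send("localhost", 5140, ...).
    static void send(String host, int port, String message) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(message.getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
    }
}
```

For example, `SyslogTcpSketch.send("localhost", 5140, SyslogTcpSketch.format(1, 5, LocalDateTime.now(), "web1", "app", "hello"))` should make the logger sink print one event.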

2. Common sink configurations (see the official documentation for full details)

1) Hadoop (HDFS) sink

Configuration file:

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = syslogtcp

a1.sources.r1.port = 5140

a1.sources.r1.host = localhost

# Describe the sink

a1.sinks.k1.type = hdfs

a1.sinks.k1.hdfs.path =hdfs://m1:9000/user/flume/syslogtcp

a1.sinks.k1.hdfs.filePrefix = Syslog

a1.sinks.k1.hdfs.round = true

a1.sinks.k1.hdfs.roundValue = 10

a1.sinks.k1.hdfs.roundUnit = minute

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1
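The `round`, `roundValue` and `roundUnit` settings above round the event timestamp down before it is substituted into time escape sequences in `hdfs.path`, such as `%Y%m%d/%H%M`. With a 10-minute setting, events are bucketed into one directory per 10 minutes.

Two consequences follow:

- The example path contains no escape sequences, so the rounding has no visible effect there.
- When escapes are used, each event needs a timestamp header (or `hdfs.useLocalTimeStamp = true`).

The Java below only illustrates the rounding arithmetic. It is not Flume's code, the method names are hypothetical, and it uses UTC for determinism.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.TimeUnit;

final class HdfsRoundingSketch {
    // Rounds an epoch-millisecond timestamp down to a multiple of roundValue units.
    static long roundDown(long epochMillis, int roundValue, TimeUnit unit) {
        long bucket = unit.toMillis(roundValue);
        return epochMillis - Math.floorMod(epochMillis, bucket);
    }

    // Expands a path like <base>/%Y%m%d/%H%M for a 10-minute bucket (UTC in this sketch).
    static String bucketPath(String base, long epochMillis) {
        long rounded = roundDown(epochMillis, 10, TimeUnit.MINUTES);
        DateTimeFormatter f = DateTimeFormatter.ofPattern("yyyyMMdd/HHmm").withZone(ZoneOffset.UTC);
        return base + "/" + f.format(Instant.ofEpochMilli(rounded));
    }
}
```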

2) File Roll Sink

Configuration file:

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = syslogtcp

a1.sources.r1.port = 5555

a1.sources.r1.host = localhost

# Describe the sink

a1.sinks.k1.type = file_roll

a1.sinks.k1.sink.directory = /home/flume-1.6.0-bin/logs

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

3) Elasticsearch sink

Configuration file:

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = avro

a1.sources.r1.bind = localhost

a1.sources.r1.port = 44444

# Describe the sink

a1.sinks.k1.type = elasticsearch

a1.sinks.k1.hostNames = 127.0.0.1:9300

a1.sinks.k1.indexName = test_index

a1.sinks.k1.indexType = test_type_1

a1.sinks.k1.clusterName = vie61_yanshi

a1.sinks.k1.batchSize = 10

a1.sinks.k1.ttl = 5d

a1.sinks.k1.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchDynamicSerializer

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1
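Flume's Elasticsearch sink, with its default time-based index name builder, does not write to `indexName` verbatim. It appends the event date, so `test_index` becomes daily indices such as `test_index-2016-07-28`.

The date comes from the event's `timestamp` header when present and otherwise from the current time. It is formatted as `yyyy-MM-dd` in UTC by default.

This matters later when you define index patterns in Kibana, for example `test_index-*`. The sketch below only imitates that naming rule. `EsIndexNameSketch` is a hypothetical name, and the behavior should be checked against the Flume version you actually run.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;

final class EsIndexNameSketch {
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    // indexName + "-" + event day; uses the "timestamp" header (epoch millis) if present.
    static String indexFor(String indexName, Map<String, String> headers, long nowMillis) {
        String ts = headers.get("timestamp");
        long millis = (ts != null) ? Long.parseLong(ts) : nowMillis;
        return indexName + "-" + DAY.format(Instant.ofEpochMilli(millis));
    }
}
```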

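3. Common Flume sink processor configurations (see the official documentation for full details)

1) Failover SinkProcessor

Groups several sinks. Events are sent first to the sink with the highest priority in the group; if sending fails, they go to a sink with lower priority.

A larger priority value means a higher priority, so in the example below k2 (10) is used before k1 (5). maxpenalty caps, in milliseconds, how long a failed sink is kept out of use.

Configuration file: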

a1.sinkgroups = g1

a1.sinkgroups.g1.sinks = k1 k2

a1.sinkgroups.g1.processor.type =failover

a1.sinkgroups.g1.processor.priority.k1= 5

a1.sinkgroups.g1.processor.priority.k2= 10

a1.sinkgroups.g1.processor.maxpenalty= 10000
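The failover behavior can be modeled as "try live sinks in descending priority order, and sideline a failed sink for a while". This is a sketch loosely modeled on the description above, not Flume's FailoverSinkProcessor.

`FailoverSketch` is hypothetical. The doubling penalty schedule (1 s, 2 s, 4 s and so on, capped at maxpenalty) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FailoverSketch {
    private final Map<String, Integer> priority = new HashMap<>();
    private final Map<String, Long> retryAfter = new HashMap<>();
    private final Map<String, Integer> failures = new HashMap<>();
    private final long maxPenaltyMillis;

    FailoverSketch(Map<String, Integer> priority, long maxPenaltyMillis) {
        this.priority.putAll(priority);
        this.maxPenaltyMillis = maxPenaltyMillis;
    }

    // Sinks that may be tried at time 'now', highest priority value first.
    List<String> candidates(long now) {
        List<String> live = new ArrayList<>();
        for (String sink : priority.keySet()) {
            if (retryAfter.getOrDefault(sink, 0L) <= now) {
                live.add(sink);
            }
        }
        live.sort(Comparator.comparing((String s) -> priority.get(s)).reversed());
        return live;
    }

    // Penalty doubles with each consecutive failure, up to maxPenaltyMillis.
    void markFailed(String sink, long now) {
        int n = failures.merge(sink, 1, Integer::sum);
        long penalty = Math.min(maxPenaltyMillis, 1000L << Math.min(n - 1, 30));
        retryAfter.put(sink, now + penalty);
    }

    void markSucceeded(String sink) {
        failures.remove(sink);
        retryAfter.remove(sink);
    }
}
```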

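2) Load balancing SinkProcessor

Groups several sinks and balances events across them. Selection can be round-robin (round_robin), random (random) or a custom implementation. With backoff = true, a sink that fails is temporarily excluded from selection, for an exponentially growing period.

Configuration file: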

a1.sinkgroups = g1

a1.sinkgroups.g1.sinks = k1 k2

a1.sinkgroups.g1.processor.type =load_balance

a1.sinkgroups.g1.processor.backoff =true

a1.sinkgroups.g1.processor.selector =random
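The two built-in selection policies are easy to picture in code. `LoadBalanceSketch` below is hypothetical and leaves out the backoff bookkeeping. It shows round_robin cycling through the sinks in order and random picking one uniformly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class LoadBalanceSketch {
    private final List<String> sinks;
    private int next = 0;

    LoadBalanceSketch(List<String> sinks) {
        this.sinks = new ArrayList<>(sinks);
    }

    // round_robin: k1, k2, k1, k2, ...
    String roundRobin() {
        String s = sinks.get(next);
        next = (next + 1) % sinks.size();
        return s;
    }

    // random: any sink with equal probability.
    String random(Random rnd) {
        return sinks.get(rnd.nextInt(sinks.size()));
    }
}
```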

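4. Common channel configurations (see the official documentation for full details)

1) Memory Channel

This channel keeps events in memory. Its advantage is high throughput. Its drawback is that data held in the channel cannot be recovered once lost, for example when the agent process dies.

Configuration file: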

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = avro

a1.sources.r1.bind = 0.0.0.0

a1.sources.r1.port = 4141

# Describe the sink

a1.sinks.k1.type = logger

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 100

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

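2) File Channel

This channel keeps events on the file system. Its advantage is that once data has reached the channel, it can still be sent after a Flume crash and restart. Its drawback is lower throughput.

Configuration file: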

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = avro

a1.sources.r1.bind = 0.0.0.0

a1.sources.r1.port = 4141

# Describe the sink

a1.sinks.k1.type = logger

# Use a channel which persists events to files

a1.channels.c1.type = file

a1.channels.c1.checkpointDir = /mnt/flume/checkpoint

a1.channels.c1.dataDirs = /mnt/flume/data

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

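5. Common fan-out configurations (see the official documentation for full details)

Fan-out comes in two flavors: replicating and multiplexing.

1) Replicating

Replicating sends each event from a source to all of the source's configured channels.

Configuration file: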

# List the sources, sinks and channels for the agent

<Agent>.sources = <Source1>

<Agent>.sinks = <Sink1> <Sink2>

<Agent>.channels = <Channel1> <Channel2>

# set list of channels for source (separated by space)

<Agent>.sources.<Source1>.channels = <Channel1> <Channel2>

# set channel for sinks

<Agent>.sinks.<Sink1>.channel = <Channel1>

<Agent>.sinks.<Sink2>.channel = <Channel2>

<Agent>.sources.<Source1>.selector.type = replicating

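2) Multiplexing

Multiplexing sends each event from a source only to the subset of channels mapped to the value of a given event header. Events whose value has no mapping go to the default channels.

Configuration file: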

# Mapping for multiplexing selector

<Agent>.sources.<Source1>.selector.type= multiplexing

<Agent>.sources.<Source1>.selector.header= <someHeader>

<Agent>.sources.<Source1>.selector.mapping.<Value1>= <Channel1>

<Agent>.sources.<Source1>.selector.mapping.<Value2>= <Channel1> <Channel2>

<Agent>.sources.<Source1>.selector.mapping.<Value3>= <Channel2>

#...

<Agent>.sources.<Source1>.selector.default= <Channel2>

A concrete example:

# list the sources, sinks and channels in the agent

agent_foo.sources = avro-AppSrv-source1

agent_foo.sinks = hdfs-Cluster1-sink1 avro-forward-sink2

agent_foo.channels = mem-channel-1 file-channel-2

# set channels for source

agent_foo.sources.avro-AppSrv-source1.channels= mem-channel-1 file-channel-2

# set channel for sinks

agent_foo.sinks.hdfs-Cluster1-sink1.channel= mem-channel-1

agent_foo.sinks.avro-forward-sink2.channel= file-channel-2

# channel selector configuration

agent_foo.sources.avro-AppSrv-source1.selector.type= multiplexing

agent_foo.sources.avro-AppSrv-source1.selector.header= State

agent_foo.sources.avro-AppSrv-source1.selector.mapping.CA= mem-channel-1

agent_foo.sources.avro-AppSrv-source1.selector.mapping.AZ= file-channel-2

agent_foo.sources.avro-AppSrv-source1.selector.mapping.NY= mem-channel-1 file-channel-2

agent_foo.sources.avro-AppSrv-source1.selector.default= mem-channel-1
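The routing rule of the example above (header `State`: CA, AZ, NY, otherwise the default) can be expressed as a small lookup. `MultiplexingSketch` is a hypothetical illustration of the selection logic, not Flume's MultiplexingChannelSelector.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MultiplexingSketch {
    private final String header;
    private final Map<String, List<String>> mapping = new HashMap<>();
    private final List<String> defaults;

    MultiplexingSketch(String header, List<String> defaults) {
        this.header = header;
        this.defaults = defaults;
    }

    // Equivalent of one "selector.mapping.<value> = <channels>" line.
    MultiplexingSketch map(String value, String... channels) {
        mapping.put(value, Arrays.asList(channels));
        return this;
    }

    // Channels an event is written to, chosen by the value of the configured header.
    List<String> select(Map<String, String> eventHeaders) {
        List<String> hit = mapping.get(eventHeaders.get(header));
        return hit != null ? hit : defaults;
    }
}
```

With the mapping from the example, an event with `State = NY` goes to both channels, and an unmapped state falls back to `mem-channel-1`.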

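1.1.3 Extending Flume

Flume ships with many implementations of sources, channels and sinks, but complex requirements may not be covered by them. Fortunately, Flume is designed to be extended: by implementing the interfaces it provides, you can write your own sources, sinks and so on. For complete implementations, refer to the official source code.

1. Source extension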

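import java.nio.charset.StandardCharsets;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.source.AbstractSource;

// Note: newer Flume versions (1.7+) also require getBackOffSleepIncrement()
// and getMaxBackOffSleepInterval() on PollableSource.
public class MySource extends AbstractSource implements Configurable, PollableSource {

  private String myProp;

  @Override
  public void configure(Context context) {
    // Read "myProp" from the agent configuration, falling back to a default
    String myProp = context.getString("myProp", "defaultValue");
    // Process the myProp value (e.g. validation, convert to another type, ...)
    // Store myProp for later retrieval by process() method
    this.myProp = myProp;
  }

  @Override
  public void start() {
    // Initialize the connection to the external client
    super.start();
  }

  @Override
  public void stop() {
    // Disconnect from external client and do any additional cleanup
    // (e.g. releasing resources or nulling-out field values) ..
    super.stop();
  }

  @Override
  public Status process() throws EventDeliveryException {
    Status status;
    try {
      // Receive new data
      Event e = getSomeData();
      // Store the Event into this Source's associated Channel(s);
      // the ChannelProcessor opens and commits the channel transaction itself
      getChannelProcessor().processEvent(e);
      status = Status.READY;
    } catch (Throwable t) {
      // Log exception, handle individual exceptions as needed
      status = Status.BACKOFF;
      // re-throw all Errors
      if (t instanceof Error) {
        throw (Error) t;
      }
    }
    return status;
  }

  // Placeholder for reading from the external system (not part of Flume's API)
  private Event getSomeData() {
    return EventBuilder.withBody(myProp, StandardCharsets.UTF_8);
  }
}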

2. Sink extension

import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

public class MySink extends AbstractSink implements Configurable {

  private String myProp;

  @Override
  public void configure(Context context) {
    String myProp = context.getString("myProp", "defaultValue");
    // Process the myProp value (e.g. validation)
    // Store myProp for later retrieval by process() method
    this.myProp = myProp;
  }

  @Override
  public void start() {
    // Initialize the connection to the external repository (e.g. HDFS) that
    // this Sink will forward Events to ..
    super.start();
  }

  @Override
  public void stop() {
    // Disconnect from the external repository and do any
    // additional cleanup (e.g. releasing resources or nulling-out
    // field values) ..
    super.stop();
  }

  @Override
  public Status process() throws EventDeliveryException {
    Status status;
    // Start transaction
    Channel ch = getChannel();
    Transaction txn = ch.getTransaction();
    txn.begin();
    try {
      // This try clause includes whatever Channel operations you want to do
      Event event = ch.take();
      if (event == null) {
        // Channel is empty: nothing to deliver, ask the sink runner to back off
        txn.commit();
        return Status.BACKOFF;
      }
      // Send the Event to the external repository.
      // storeSomeData(event);
      txn.commit();
      status = Status.READY;
    } catch (Throwable t) {
      // Delivery failed: the event stays in the channel
      txn.rollback();
      // Log exception, handle individual exceptions as needed
      status = Status.BACKOFF;
      // re-throw all Errors
      if (t instanceof Error) {
        throw (Error) t;
      }
    } finally {
      txn.close();
    }
    return status;
  }
}

3. Interceptor extension

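import java.util.ArrayList;
import java.util.List;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

public class MyInterceptor implements Interceptor {

  @Override
  public void initialize() {
    // no-op
  }

  /**
   * Modifies events in-place.
   */
  @Override
  public Event intercept(Event event) {
    // Inspect or modify headers/body here; return null to drop the event
    return event;
  }

  /**
   * Delegates to {@link #intercept(Event)} in a loop, dropping null results.
   * @param events
   * @return the events that were kept
   */
  @Override
  public List<Event> intercept(List<Event> events) {
    List<Event> out = new ArrayList<Event>(events.size());
    for (Event event : events) {
      Event outEvent = intercept(event);
      if (outEvent != null) {
        out.add(outEvent);
      }
    }
    return out;
  }

  @Override
  public void close() {
    // no-op
  }

  /**
   * Builder which builds new instances of MyInterceptor.
   */
  public static class Builder implements Interceptor.Builder {

    @Override
    public void configure(Context context) {
      // read interceptor properties here
    }

    @Override
    public Interceptor build() {
      return new MyInterceptor();
    }
  }
}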

4. RPC clients: Avro

With the Avro client, log data can be sent directly to Flume, provided that the Flume source collecting it is configured as an Avro source. Example:

import java.nio.charset.Charset;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class MyApp {
  public static void main(String[] args) {
    MyRpcClientFacade client = new MyRpcClientFacade();
    // Initialize client with the remote Flume agent's host and port
    // (the port must match the port of the agent's Avro source)
    client.init("host.example.org", 41414);

    // Send 10 events to the remote Flume agent. That agent should be
    // configured to listen with an AvroSource.
    String sampleData = "Hello Flume!";
    for (int i = 0; i < 10; i++) {
      client.sendDataToFlume(sampleData);
    }

    client.cleanUp();
  }
}

class MyRpcClientFacade {
  private RpcClient client;
  private String hostname;
  private int port;

  public void init(String hostname, int port) {
    // Setup the RPC connection
    this.hostname = hostname;
    this.port = port;
    this.client = RpcClientFactory.getDefaultInstance(hostname, port);
    // Use the following method to create a thrift client (instead of the above line):
    // this.client = RpcClientFactory.getThriftInstance(hostname, port);
  }

  public void sendDataToFlume(String data) {
    // Create a Flume Event object that encapsulates the sample data
    Event event = EventBuilder.withBody(data, Charset.forName("UTF-8"));

    // Send the event
    try {
      client.append(event);
    } catch (EventDeliveryException e) {
      // clean up and recreate the client
      client.close();
      client = null;
      client = RpcClientFactory.getDefaultInstance(hostname, port);
      // Use the following method to create a thrift client (instead of the above line):
      // this.client = RpcClientFactory.getThriftInstance(hostname, port);
    }
  }

  public void cleanUp() {
    // Close the RPC connection
    client.close();
  }
}

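1.2 Elasticsearch

1.2.1 Introduction to Elasticsearch

1) What Elasticsearch is

Elasticsearch is a real-time, distributed search and analytics engine that lets you work with large volumes of data at high speed. It can be used for full-text search, structured search and analytics, or any combination of the three.

Elasticsearch is built on Apache Lucene™, a full-text search library that is arguably the most advanced and efficient full-featured open-source search engine framework available today.

Elasticsearch uses Lucene internally. To use it for full-text search, you only need Elasticsearch's unified API, without having to understand how Lucene works underneath.

Elasticsearch is, however, much more than Lucene. Besides full-text search, it can also serve as:

1. a distributed, real-time document store in which every field is indexed and searchable;

2. a distributed search engine with real-time analytics;

3. a system that scales out to hundreds of servers and handles petabytes of structured or unstructured data.

All of these features come packaged in a single server. You can talk to ES through its RESTful API from a client or from any programming language you like.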

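2) Advantages of Elasticsearch

1. Elasticsearch is distributed and needs no additional components. Data is distributed in real time, which is called "push replication".

2. Elasticsearch fully supports Apache Lucene's near-real-time search.

3. Multitenancy needs no special configuration, whereas Solr requires more advanced setup.

4. Elasticsearch uses the concept of a Gateway, which makes full backups simpler.

5. Nodes form a peer-to-peer network. When some nodes fail, other nodes are automatically assigned to take over their work.

3) Elasticsearch use cases

1. Wikipedia uses Elasticsearch for full-text search with keyword highlighting, and for search suggestions such as search-as-you-type and did-you-mean.

2. The Guardian uses Elasticsearch to process visitor logs so that editors get real-time feedback on how the public responds to different articles.

3. Stack Overflow combines full-text search with geolocation and related information to show more-like-this related questions.

4. GitHub uses Elasticsearch to search more than 130 billion lines of code.

5. Goldman Sachs uses it to index 5 TB of data every day, and many other investment banks use it to analyze stock market movements.

Elasticsearch is not only for large enterprises, though. It has also helped many startups, such as DataDog and Klout, extend their features.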

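1.2.2 Elasticsearch configuration

1. Index templates

An index can be created from a predefined template, called an index template. A template contains settings and mappings. Through pattern matching on the index name, several indices can reuse the same template.

1) Define a template: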

curl -XPUT localhost:9200/_template/template_1 -d '
{
    "template" : "te*",
    "settings" : {
        "number_of_shards" : 5
    },
    "mappings" : {
        "type1" : {
            "_source" : { "enabled" : false },
            "properties" : { }
        }
    }
}'

Note: template_1 defined above applies to every new index whose name starts with te.
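The matching rule and the request above can be reproduced from Java. `TemplateSketch` is hypothetical. It handles only a trailing `*` (Elasticsearch patterns are more general) and relies on the JDK 11+ `java.net.http` client.

The request is only built here. Sending it with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` needs a running node. The `Content-Type` header is harmless for Elasticsearch 1.x and required by later versions.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class TemplateSketch {
    // "te*" style matching: a trailing '*' means "name starts with".
    static boolean appliesTo(String pattern, String indexName) {
        if (pattern.endsWith("*")) {
            return indexName.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return indexName.equals(pattern);
    }

    // Builds (but does not send) the same PUT request as the curl command above.
    static HttpRequest putTemplate(String esBaseUrl, String name, String json) {
        return HttpRequest.newBuilder(URI.create(esBaseUrl + "/_template/" + name))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```

Note that `te*` also matches indices such as `test_index-2016-07-28`, so a broad template pattern can unintentionally apply to indices created by other tools.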

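A template can also contain alias definitions; in an alias name, {index} is replaced with the name of the index being created. For example: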

curl -XPUT localhost:9200/_template/template_1 -d '
{
    "template" : "te*",
    "settings" : {
        "number_of_shards" : 1
    },
    "aliases" : {
        "alias1" : {},
        "alias2" : {
            "filter" : {
                "term" : { "user" : "kimchy" }
            },
            "routing" : "kimchy"
        },
        "{index}-alias" : {}
    }
}'

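2) Delete a template

curl -XDELETE localhost:9200/_template/template_1

3) View a template

curl -XGET localhost:9200/_template/template_1

4) Template configuration files

Besides the API, index templates can also be defined in files. The template file must be placed in the config directory of every master node, as config/templates/template_1.json. File-based templates are an Elasticsearch 1.x feature; later versions only support the _template API.

An example template_1.json: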

{
    "template-logstash" : {
        "template" : "logstash*",
        "settings" : {
            "index.number_of_shards" : 5,
            "number_of_replicas" : 1,
            "index" : {
                "store" : {
                    "compress" : {
                        "stored" : true,
                        "tv" : true
                    }
                }
            }
        },
        "mappings" : {
            "_default_" : {
                "dynamic" : "true"
            },
            "loadbalancer" : {
                "_source" : {
                    "compress" : true
                },
                "_ttl" : {
                    "enabled" : true,
                    "default" : "10d"
                },
                "_all" : {
                    "enabled" : false
                },
                "properties" : {
                    "@fields" : {
                        "dynamic" : "true",
                        "properties" : {
                            "client" : { "type" : "string", "index" : "not_analyzed" },
                            "domain" : { "type" : "string", "index" : "not_analyzed" },
                            "oh" : { "type" : "string", "index" : "not_analyzed" },
                            "responsetime" : { "type" : "double" },
                            "size" : { "type" : "long", "index" : "not_analyzed" },
                            "status" : { "type" : "string", "index" : "not_analyzed" },
                            "upstreamtime" : { "type" : "double" },
                            "url" : { "type" : "string", "index" : "not_analyzed" }
                        }
                    },
                    "@source" : { "type" : "string", "index" : "not_analyzed" },
                    "@timestamp" : { "type" : "date", "format" : "dateOptionalTime" },
                    "@type" : { "type" : "string", "index" : "not_analyzed", "store" : "no" }
                }
            }
        }
    }
}

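5) The _source field

The _source field is generated automatically and stores the original document as JSON. _source itself is not indexed, so it cannot be searched. By default, get and search requests return the _source field.

Storing _source costs disk space and I/O, so it can be disabled, for example:

{
    "tweet" : {
        "_source" : { "enabled" : false }
    }
}

With enabled set to false, a search returns only the document IDs by default. Features that need the original document, such as viewing documents in Kibana or the update API, then no longer work.

If _source is enabled and the index grows too large, the following auxiliary settings can help:

compress: whether to compress _source; setting it to true is generally recommended.

"includes" : ["author", "name"],

"excludes" : ["sex"]

By default, _source keeps everything that was sent in the (bulk) index request. includes and excludes restrict, at the field level, which fields are stored.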
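The effect of includes and excludes on the stored document can be pictured with a small filter. `SourceFilterSketch` is hypothetical. It handles only exact top-level field names, whereas Elasticsearch also accepts wildcards and nested paths.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SourceFilterSketch {
    // Keeps a field if includes is empty or lists it, and excludes does not list it.
    static Map<String, Object> filter(Map<String, Object> doc, List<String> includes, List<String> excludes) {
        Map<String, Object> stored = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            boolean included = includes.isEmpty() || includes.contains(e.getKey());
            if (included && !excludes.contains(e.getKey())) {
                stored.put(e.getKey(), e.getValue());
            }
        }
        return stored;
    }
}
```

With the includes and excludes shown above, a document with author, name, sex and age fields would be stored with only author and name.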

2. Environment setup

2.1 Installing Flume

1) Download apache-flume-1.6.0-bin.tar.gz

Official download page: http://flume.apache.org/download.html

2) Install Flume

1. Copy the package to the server, e.g. into /root/yiran.

2. Extract it: tar -xvf apache-flume-1.6.0-bin.tar.gz

3. Create a *.conf file, e.g. example.conf, in /root/yiran/apache-flume-1.6.0-bin/conf. Example content:

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = netcat

a1.sources.r1.bind = localhost

a1.sources.r1.port = 44444

# Describe the sink

#a1.sinks.k1.type = logger

a1.sinks.k1.type = elasticsearch

a1.sinks.k1.hostNames = 127.0.0.1:9300

a1.sinks.k1.indexName = test_index

a1.sinks.k1.indexType = test_type_1

a1.sinks.k1.clusterName = vie61_yanshi

a1.sinks.k1.batchSize = 10

a1.sinks.k1.ttl = 5d

a1.sinks.k1.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchDynamicSerializer

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

4. Configure flume-env.sh in /root/yiran/apache-flume-1.6.0-bin/conf:

mv flume-env.sh.template flume-env.sh

vi flume-env.sh

Set JAVA_HOME (and, optionally, JAVA_OPTS):

JAVA_HOME=/usr/java/jdk1.7.0_71

JAVA_OPTS="-Xms512m -Xmx1024m -Dcom.sun.management.jmxremote"

5. Copy the core jars needed for writing to Elasticsearch from the Elasticsearch distribution into /root/yiran/apache-flume-1.6.0-bin/lib. This step is only needed when the collected logs are written to ES; otherwise skip it.

elasticsearch-1.6.2.jar

lucene-core-4.10.4.jar

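3) Start Flume

Once the five steps above are done, the installation is complete. Start Flume with:

cd /root/yiran/apache-flume-1.6.0-bin

./bin/flume-ng agent --conf conf --conf-file conf/example.conf --name a1 -Dflume.root.logger=INFO,console &

Notes:

- --conf gives the configuration directory.
- --conf-file gives the configuration file to run.
- --name gives the agent name, which must match the name used in the *.conf file (a1 here).
- -Dflume.root.logger sets the log level and output (INFO, to the console).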
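The example.conf agent above uses a netcat source on localhost:44444. That source turns each newline-terminated line into one event and, by default, answers every event with "OK". A quick smoke test from Java therefore only needs a socket.

`NetcatClientSketch` is a hypothetical helper for that check, not part of Flume.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class NetcatClientSketch {
    // Sends one line to a netcat source and returns its reply line ("OK" when acknowledged).
    static String sendLine(String host, int port, String line) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write((line + "\n").getBytes(StandardCharsets.UTF_8)); // one line = one event
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            return in.readLine();
        }
    }
}
```

With the agent running, `NetcatClientSketch.sendLine("localhost", 44444, "hello")` should return "OK", and the event should then appear in the test_index-* indices in Elasticsearch.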

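2.2 Installing Elasticsearch

1) Download elasticsearch-1.6.2.tar.gz

Official download page: https://www.elastic.co/downloads/elasticsearch

2) Install Elasticsearch

1. Copy the package to the server, e.g. into /root/yiran.

2. Extract it: tar -xvf elasticsearch-1.6.2.tar.gz

3. Edit config/elasticsearch.yml under /root/yiran/elasticsearch-1.6.2:

- Set node.name, for example to "test-node", which names this ES node test-node.
- Set cluster.name to vie61_yanshi. It must match the clusterName configured in the Flume Elasticsearch sink.
- Set network.host to the machine's IP address. The hostNames in the Flume Elasticsearch sink must then point to an address Elasticsearch actually listens on.

4. If needed, adjust bin/elasticsearch.in.sh under /root/yiran/elasticsearch-1.6.2 (for example, the heap size).

3) Start Elasticsearch

Once the four steps above are done, the installation is complete. Start Elasticsearch from the bin directory with: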

nohup ./elasticsearch &
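Once the node is up, it is worth confirming that its cluster name matches the clusterName used by the Flume sink. The root endpoint, GET http://<host>:9200/, returns a JSON document containing `cluster_name`.

`EsCheckSketch` below is hypothetical. It uses the JDK 11+ HTTP client, and it uses a regular expression instead of a JSON library to stay dependency-free.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EsCheckSketch {
    private static final Pattern CLUSTER_NAME =
            Pattern.compile("\"cluster_name\"\\s*:\\s*\"([^\"]+)\"");

    // Extracts cluster_name from the root endpoint's JSON; null if absent.
    static String clusterName(String rootJson) {
        Matcher m = CLUSTER_NAME.matcher(rootJson);
        return m.find() ? m.group(1) : null;
    }

    // Calls GET <baseUrl>/ on a running node, e.g. "http://localhost:9200".
    static String fetchClusterName(String baseUrl) throws IOException, InterruptedException {
        HttpResponse<String> resp = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(baseUrl + "/")).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        return clusterName(resp.body());
    }
}
```

If `fetchClusterName("http://localhost:9200")` does not return vie61_yanshi, the Flume sink will not be able to join the cluster.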

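2.3 Installing Kibana

1) Download kibana-4.1.10-linux-x64.tar.gz

Official download page (past releases): https://www.elastic.co/downloads/past-releases

2) Install Kibana

1. Copy the package to the server, e.g. into /root/yiran.

2. Extract it: tar -xvf kibana-4.1.10-linux-x64.tar.gz

3. Edit config/kibana.yml under /root/yiran/kibana-4.1.10-linux-x64:

- Set host to the machine's IP address.
- Set elasticsearch_url to the address of the Elasticsearch instance to query, e.g. http://localhost:9200.

3) Start Kibana

Once the three steps above are done, the installation is complete. Start Kibana from the bin directory with: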

nohup ./kibana &

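3. Example walkthrough

3.1 Example overview

Logs are written with logback. The example collects Tomcat access logs and user behavior logs into Flume, Flume writes the log data into Elasticsearch, and Kibana is then used for statistics and analysis. (The architecture diagram is not reproduced here.)

3.2 Flume configuration

Environment:

1) Logs are written with logback.

2) Two Flume environments are deployed: the local one is named agent1 (IP 192.168.27.73) and the remote one agent3 (IP 192.168.77.113).

3) agent1 collects the log data and forwards it to agent3 over Avro; agent3 then writes the log data into Elasticsearch.

4) source1 collects user behavior logs, and source2 collects Tomcat access logs.

5) For installation, see section 2.1 "Installing Flume"; the *.conf files are described below.

Configuration:

1) The agent3 configuration file is named sink_to_es.conf. Inside the file the agent is still called agent1, so start it with --name agent1. Content:

# Name the agent's components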

agent1.sources = source1 source2

agent1.sinks = sink1 sink2

agent1.channels = channel1 channel2

# source1: Avro source listening on port 44444

agent1.sources.source1.type = avro

agent1.sources.source1.bind =192.168.77.113

agent1.sources.source1.port = 44444

# source2: Avro source listening on port 33333

agent1.sources.source2.type = avro

agent1.sources.source2.bind =192.168.77.113

agent1.sources.source2.port = 33333

# sink1: write to Elasticsearch, index name test_index, index type test_type_1, cluster name vie61_yanshi

agent1.sinks.sink1.type =elasticsearch

agent1.sinks.sink1.hostNames =127.0.0.1:9300

agent1.sinks.sink1.indexName =test_index

agent1.sinks.sink1.indexType =test_type_1

agent1.sinks.sink1.clusterName =vie61_yanshi

agent1.sinks.sink1.batchSize = 10

agent1.sinks.sink1.ttl = 5d

agent1.sinks.sink1.serializer =org.apache.flume.sink.elasticsearch.ElasticSearchDynamicSerializer

# sink2: write to Elasticsearch, index name test_index_tomcat, index type test_type_1, cluster name vie61_yanshi

agent1.sinks.sink2.type =elasticsearch

agent1.sinks.sink2.hostNames =127.0.0.1:9300

agent1.sinks.sink2.indexName =test_index_tomcat

agent1.sinks.sink2.indexType =test_type_1

agent1.sinks.sink2.clusterName =vie61_yanshi

agent1.sinks.sink2.batchSize = 10

agent1.sinks.sink2.ttl = 5d

agent1.sinks.sink2.serializer =org.apache.flume.sink.elasticsearch.ElasticSearchDynamicSerializer

# channel1: memory channel (events are buffered in memory)

agent1.channels.channel1.type =memory

agent1.channels.channel1.capacity =1000

agent1.channels.channel1.transactionCapacity= 100

# channel2: memory channel (events are buffered in memory)

agent1.channels.channel2.type =memory

agent1.channels.channel2.capacity =1000

agent1.channels.channel2.transactionCapacity= 100

# Bind source1 and sink1 to channel1

agent1.sources.source1.channels =channel1

agent1.sinks.sink1.channel = channel1

# Bind source2 and sink2 to channel2

agent1.sources.source2.channels =channel2

agent1.sinks.sink2.channel = channel2

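2) The agent1 configuration file is named sink_to_agent.conf. Two implementations are provided; use either one.

Option 1: read the generated log files and collect their content into Flume. Assume the Tomcat access logs are in /root/yiran/apache-tomcat-8.0.36/logs/tomcat and the user operation logs are in /root/yiran/apache-tomcat-8.0.36/logs/operator. Configuration:

# Name the agent's components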

agent1.sources= source1 source2

agent1.sinks= sink1 sink2

agent1.channels= channel1 channel2

# source1: read the logs in the operator directory; only accept messages matching ^@@.*@@$ and filter out everything else

agent1.sources.source1.type= org.apache.flume.source.SpoolDirectoryTailFileSource

agent1.sources.source1.spoolDir= /root/yiran/apache-tomcat-8.0.36/logs/operator

agent1.sources.source1.fileSuffix= .COMPLETED

agent1.sources.source1.deletePolicy= never

agent1.sources.source1.ignorePattern= ^$

agent1.sources.source1.targetPattern= .*(\\d){4}-(\\d){2}-(\\d){2}.*

agent1.sources.source1.targetFilename= yyyy-MM-dd

agent1.sources.source1.trackerDir= .flumespooltail

agent1.sources.source1.consumeOrder= oldest

agent1.sources.source1.batchSize= 100

agent1.sources.source1.inputCharset= UTF-8

agent1.sources.source1.decodeErrorPolicy= REPLACE

agent1.sources.source1.deserializer= LINE

agent1.sources.source1.interceptors= i1 i2

agent1.sources.source1.interceptors.i1.type= regex_filter

agent1.sources.source1.interceptors.i1.regex= ^@@.*@@$

agent1.sources.source1.interceptors.i2.type= org.apache.flume.interceptor.JsonParseInterceptor$Builder

agent1.sources.source1.interceptors.i2.splitFlag=@@

# source2: read the logs in the tomcat directory; only accept messages matching ^@@.*@@$ and filter out everything else

agent1.sources.source2.type= org.apache.flume.source.SpoolDirectoryTailFileSource

agent1.sources.source2.spoolDir= /root/yiran/apache-tomcat-8.0.36/logs/tomcat

agent1.sources.source2.fileSuffix= .COMPLETED

agent1.sources.source2.deletePolicy= never

agent1.sources.source2.ignorePattern= ^$

agent1.sources.source2.targetPattern= .*(\\d){4}-(\\d){2}-(\\d){2}.*

agent1.sources.source2.targetFilename= yyyy-MM-dd

agent1.sources.source2.trackerDir= .flumespooltail

agent1.sources.source2.consumeOrder= oldest

agent1.sources.source2.batchSize= 100

agent1.sources.source2.inputCharset= UTF-8

agent1.sources.source2.decodeErrorPolicy= REPLACE

agent1.sources.source2.deserializer= LINE

agent1.sources.source2.interceptors= i3 i4

agent1.sources.source2.interceptors.i3.type= regex_filter

agent1.sources.source2.interceptors.i3.regex= ^@@.*@@$

agent1.sources.source2.interceptors.i4.type= org.apache.flume.interceptor.JsonParseInterceptor$Builder

agent1.sources.source2.interceptors.i4.splitFlag=@@

# sink1: send events to source1 on agent3

agent1.sinks.sink1.type= avro

agent1.sinks.sink1.hostname= 192.168.77.113

agent1.sinks.sink1.port= 44444

# sink2: send events to source2 on agent3

agent1.sinks.sink2.type= avro

agent1.sinks.sink2.hostname= 192.168.77.113

agent1.sinks.sink2.port= 33333

# channel1: memory channel (events are buffered in memory)

agent1.channels.channel1.type= memory

agent1.channels.channel1.capacity= 1000

agent1.channels.channel1.transactionCapacity= 100

# channel2: memory channel (events are buffered in memory)

agent1.channels.channel2.type= memory

agent1.channels.channel2.capacity= 1000

agent1.channels.channel2.transactionCapacity= 100

# Bind source1 and sink1 to channel1

agent1.sources.source1.channels= channel1

agent1.sinks.sink1.channel= channel1

# Bind source2 and sink2 to channel2

agent1.sources.source2.channels= channel2

agent1.sinks.sink2.channel= channel2

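Notes:

1) agent1 defines two interceptors.

- The first, of type regex_filter, is built into Flume. It only accepts messages that match the regular expression given in regex, here ^@@.*@@$.
- The second, of type org.apache.flume.interceptor.JsonParseInterceptor$Builder, is a custom interceptor that parses the received JSON content. The delimiter around the content defaults to @@ and can be changed with the splitFlag property.

The implementation is below (see the flume-ng-core-extend project for details):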

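import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.alibaba.fastjson.JSON;       // assumed JSON library (fastjson)
import com.alibaba.fastjson.JSONObject;
import com.google.common.base.Charsets;
import com.google.common.collect.Lists;

public class JsonParseInterceptor implements Interceptor {

  private static final Logger logger = LoggerFactory.getLogger(JsonParseInterceptor.class);

  private final String splitFlag;

  /**
   * Only {@link JsonParseInterceptor.Builder} can build me
   */
  private JsonParseInterceptor(String splitFlag) {
    this.splitFlag = splitFlag;
  }

  @Override
  public void initialize() {
    // no-op
  }

  /**
   * Modifies events in-place: the JSON between the split flags becomes the event headers.
   */
  @Override
  public Event intercept(Event event) {
    String body = new String(event.getBody(), Charsets.UTF_8);
    // Pattern.quote treats the flag literally, so flags such as "|" also work
    String[] bodyArray = body.split(Pattern.quote(splitFlag));
    logger.debug("body content: " + body + " length:" + bodyArray.length);
    if (bodyArray.length > 1) {
      String content = bodyArray[1]; // the body content, in JSON format
      try {
        JSONObject json = JSON.parseObject(content);
        Map<String, String> jsonMap = new HashMap<String, String>();
        for (Map.Entry<String, Object> entry : json.entrySet()) {
          // header values must be strings, so numbers etc. are converted
          jsonMap.put(entry.getKey(), String.valueOf(entry.getValue()));
        }
        event.setHeaders(jsonMap);
        event.setBody(new byte[0]);
        logger.debug("parsed headers: " + jsonMap);
      } catch (Exception e) {
        logger.error("Could not parse json data, ", e);
        return null; // drop events whose payload is not valid JSON
      }
    } else {
      return null; // no delimited payload: drop the event
    }
    return event;
  }

  /**
   * Delegates to {@link #intercept(Event)} in a loop.
   * @param events
   * @return the events that were kept
   */
  @Override
  public List<Event> intercept(List<Event> events) {
    List<Event> out = Lists.newArrayList();
    for (Event event : events) {
      Event outEvent = intercept(event);
      if (outEvent != null) {
        out.add(outEvent);
      }
    }
    return out;
  }

  @Override
  public void close() {
    // no-op
  }

  /**
   * Builder which builds new instances of JsonParseInterceptor.
   */
  public static class Builder implements Interceptor.Builder {

    private String splitFlag = null;

    @Override
    public void configure(Context context) {
      splitFlag = context.getString("splitFlag", Constants.DEFAULT_SPLITFLAG);
    }

    @Override
    public Interceptor build() {
      return new JsonParseInterceptor(splitFlag);
    }

    public static class Constants {
      public static final String DEFAULT_SPLITFLAG = "@@";
    }
  }
}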
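The two interceptors form a small pipeline: the regex filter lets through only lines wrapped in `@@...@@`, and the split on `@@` then leaves the JSON payload at index 1. The string handling is shown below without any Flume or JSON-library dependency.

`SplitFlagDemo` is a hypothetical name, and the sample log line in the usage is made up for illustration.

```java
import java.util.regex.Pattern;

final class SplitFlagDemo {
    private static final Pattern ACCEPT = Pattern.compile("^@@.*@@$");

    // Mirrors a regex_filter configured with regex = ^@@.*@@$ (anchored, so find() acts as a full match).
    static boolean accepted(String line) {
        return ACCEPT.matcher(line).find();
    }

    // Mirrors the interceptor's split: "@@{...}@@".split("@@") gives ["", "{...}"]
    // because trailing empty strings are dropped, so element [1] is the JSON payload.
    static String payload(String line, String splitFlag) {
        String[] parts = line.split(Pattern.quote(splitFlag));
        return parts.length > 1 ? parts[1] : null;
    }
}
```

This is also why the application must write exactly one `@@{json}@@` record per line. A payload that itself contains `@@` would be cut short at the first occurrence.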

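2) agent1 uses a custom source, SpoolDirectoryTailFileSource (see the flume-ng-core-extend project). It combines the exec and spooling directory sources and provides the following:

- monitors the log files in a given directory;

- reads the content of the log files in real time;

- resumes from where it stopped: after a Flume failure, a restart continues reading from the position reached before the failure.

Option 2: extend logback and send the log data to Flume through the Avro client. Configuration:

# Name the agent's components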

agent1.sources = source1 source2

agent1.sinks = sink1 sink2

agent1.channels = channel1 channel2

# source1: Avro source listening on port 44444; only accept messages matching ^@@.*@@$ and filter out everything else

agent1.sources.source1.type = avro

agent1.sources.source1.bind = 192.168.27.73

agent1.sources.source1.port = 44444

agent1.sources.source1.interceptors =i1 i2

agent1.sources.source1.interceptors.i1.type= regex_filter

agent1.sources.source1.interceptors.i1.regex= ^@@.*@@$

agent1.sources.source1.interceptors.i2.type= org.apache.flume.interceptor.JsonParseInterceptor$Builder

# source2: Avro source listening on port 33333; only accept messages matching ^@@.*@@$ and filter out everything else

agent1.sources.source2.type = avro

agent1.sources.source2.bind = 192.168.27.73

agent1.sources.source2.port = 33333

agent1.sources.source2.interceptors =i3 i4

agent1.sources.source2.interceptors.i3.type= regex_filter

agent1.sources.source2.interceptors.i3.regex= ^@@.*@@$

agent1.sources.source2.interceptors.i4.type= org.apache.flume.interceptor.JsonParseInterceptor$Builder

# sink1: send events to source1 on agent3 over Avro

agent1.sinks.sink1.type = avro

agent1.sinks.sink1.hostname =192.168.77.113

agent1.sinks.sink1.port = 44444

# sink2: send events to source2 on agent3 over Avro

agent1.sinks.sink2.type = avro

agent1.sinks.sink2.hostname =192.168.77.113

agent1.sinks.sink2.port = 33333

# channel1: memory channel (events are buffered in memory)

agent1.channels.channel1.type =memory

agent1.channels.channel1.capacity =1000

agent1.channels.channel1.transactionCapacity= 100

# channel2: memory channel (events are buffered in memory)

agent1.channels.channel2.type =memory

agent1.channels.channel2.capacity =1000

agent1.channels.channel2.transactionCapacity= 100

# Bind source1 and sink1 to channel1

agent1.sources.source1.channels =channel1

agent1.sinks.sink1.channel = channel1

# Bind source2 and sink2 to channel2

agent1.sources.source2.channels =channel2

agent1.sinks.sink2.channel = channel2

说明：

1、拦截器的定义见第一种方式里面说明；

2. This approach requires extending logback. The core class is shown below (see the logback-monitor project for the full code):

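package com.logback.core;

import ch.qos.logback.core.OutputStreamAppender;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;

import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Logback appender that forwards each encoded log record to Flume through a Flume RPC client.
// FlumeRpcOutputStream belongs to the logback-monitor project and is not shown here.
public class FlumeRpcAppender<E> extends OutputStreamAppender<E> {

    public static final String CLIENT_APPNAME = "client.appname";
    public static final String POOL_SIZE = "pool.size";

    // Classpath resource with the Flume client settings, set from <configfile> in logback.xml
    protected String configfile;

    private final Properties props = new Properties();
    private RpcClient client;
    private ExecutorService executorService;

    @Override
    public void start() {
        genProps();
        // Creates a default, failover or load-balancing client depending on client.type
        client = RpcClientFactory.getInstance(props);
        Map<String, String> header = configHeader(props);
        int poolSize = 10;
        String strPoolSize = props.getProperty(POOL_SIZE);
        if (strPoolSize != null && !strPoolSize.trim().isEmpty()) {
            poolSize = Integer.parseInt(strPoolSize.trim());
        }
        executorService = Executors.newFixedThreadPool(poolSize);
        setOutputStream(new FlumeRpcOutputStream(client, header, executorService));
        super.start();
    }

    // Loads the properties file named by configfile from the classpath
    private void genProps() {
        if (this.configfile == null) {
            throw new RuntimeException("configfile is null.");
        }
        ClassLoader cl = FlumeRpcAppender.class.getClassLoader();
        try (InputStream in = cl.getResourceAsStream(getConfigfile())) {
            if (in == null) {
                throw new RuntimeException("configfile not found on classpath: " + configfile);
            }
            props.load(in);
        } catch (IOException e) {
            addError("Failed to load " + configfile, e);
        }
    }

    // Moves client.appname out of the client properties and into the event headers
    private Map<String, String> configHeader(Properties props) {
        Map<String, String> header = new HashMap<String, String>();
        header.put(CLIENT_APPNAME, (String) props.remove(CLIENT_APPNAME));
        return header;
    }

    @Override
    public void stop() {
        // Stop the appender (this closes the output stream) before releasing the pool and the client
        super.stop();
        if (executorService != null) {
            executorService.shutdown();
        }
        if (client != null) {
            client.close();
        }
    }

    /**
     * @return the configfile
     */
    public String getConfigfile() {
        return configfile;
    }

    /**
     * @param configfile the configfile to set
     */
    public void setConfigfile(String configfile) {
        this.configfile = configfile;
    }
}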
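The `FlumeRpcOutputStream` used above belongs to the logback-monitor project and is not shown. The following is a hypothetical sketch of how such a stream could work, not its actual code. Logback writes each encoded record into the stream and, with its default `immediateFlush=true`, flushes after each record; the sketch turns the buffered bytes into one event body on `flush()` and sends it on the thread pool. `EventSender` and `AsyncEventOutputStream` are made-up names, and `EventSender` stands in for Flume's `RpcClient` so the example runs without the Flume SDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Stand-in for Flume's RpcClient so that the sketch runs without the Flume SDK.
// With the SDK, send(...) would become client.append(EventBuilder.withBody(body, headers)).
interface EventSender {
    void send(Map<String, String> headers, byte[] body) throws Exception;
}

// Hypothetical sketch in the spirit of FlumeRpcOutputStream: bytes written by the encoder are
// buffered, and each flush() turns the buffer into one event that is sent on the thread pool,
// so logging threads do not wait for the network.
final class AsyncEventOutputStream extends OutputStream {
    private final EventSender sender;
    private final Map<String, String> headers;
    private final ExecutorService executor;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    AsyncEventOutputStream(EventSender sender, Map<String, String> headers, ExecutorService executor) {
        this.sender = sender;
        this.headers = Collections.unmodifiableMap(new HashMap<>(headers));
        this.executor = executor;
    }

    @Override
    public synchronized void write(int b) {
        buffer.write(b);
    }

    @Override
    public synchronized void write(byte[] b) {
        buffer.write(b, 0, b.length);
    }

    @Override
    public synchronized void write(byte[] b, int off, int len) {
        buffer.write(b, off, len);
    }

    @Override
    public synchronized void flush() {
        if (buffer.size() == 0) {
            return; // nothing buffered, nothing to send
        }
        final byte[] body = buffer.toByteArray();
        buffer.reset();
        executor.execute(() -> {
            try {
                sender.send(headers, body);
            } catch (Exception e) {
                // A real appender would report this through logback's status manager (addError).
                System.err.println("Failed to send event to Flume: " + e);
            }
        });
    }

    @Override
    public void close() {
        flush(); // send whatever is still buffered
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> received = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Map<String, String> headers = new HashMap<>();
        headers.put("client.appname", "something");
        AsyncEventOutputStream out = new AsyncEventOutputStream(
                (h, body) -> received.add(h.get("client.appname") + ": " + new String(body, StandardCharsets.UTF_8)),
                headers, pool);
        out.write("@@{\"userName\":\"Wang Wu\"}@@".getBytes(StandardCharsets.UTF_8));
        out.flush();
        out.close();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(received);
    }
}
```

Sending on a pool keeps application threads fast, at the price that events still queued in the pool can be lost if the JVM exits abruptly.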

3. User behavior logs are sent via Avro to port 44444 of source1. To integrate user behavior logs, configure the following:

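1) Add the logback-monitor project to the web application, either as a Maven dependency or by packaging logback-monitor as a jar and placing it in the web application's lib directory;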

2) Place logback.xml and logback-monitor.properties in the web application's classes directory (WEB-INF/classes). Their contents are as follows:

logback.xml:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <statusListener class="ch.qos.logback.core.status.OnConsoleStatusListener" />

  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{56} - %msg%n</pattern>
    </encoder>
  </appender>

  <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <fileNamePattern>/root/yiran/apache-tomcat-8.0.36/logs/operator/operator.%d{yyyy-MM-dd}.log</fileNamePattern>
    </rollingPolicy>
    <encoder>
      <!-- assumed pattern: the original pattern of this appender was lost -->
      <pattern>%msg%n</pattern>
    </encoder>
  </appender>

  <appender name="flumeRpcAppender" class="com.logback.core.FlumeRpcAppender">
    <encoder>
      <!-- assumed pattern: send only the message, which must already be in the @@{...}@@ format -->
      <pattern>%msg</pattern>
    </encoder>
    <configfile>logback-monitor.properties</configfile>
  </appender>

  <root level="INFO">
    <appender-ref ref="FILE"/>
    <appender-ref ref="flumeRpcAppender"/>
  </root>
</configuration>

Notes:

1) Example msg format: @@{"userName":"Wang Wu", "sex": "female","age": "23","createDate":"2016-07-28T23:23:32.999+0800"}@@;

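2) The application must write its log messages in exactly this format; any other message is dropped by the regex_filter interceptor.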
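Because anything that does not match `^@@.*@@$` is dropped, the application has to build its messages in exactly this shape. A minimal sketch of a hypothetical helper (`OperatorLogMessage` is not part of logback-monitor): it escapes JSON strings, including control characters, since a raw newline inside a value would stop `.*` from matching, and it formats the timestamp with an explicit UTC offset like the example above.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical helper: builds a message in the @@{...}@@ format that passes the
// ^@@.*@@$ regex_filter interceptor. Field values must be non-null strings.
final class OperatorLogMessage {

    static String format(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("@@{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append("}@@").toString();
    }

    // ISO-8601 timestamp with an explicit offset, e.g. 2016-07-28T23:23:32.999+0800
    static String timestamp(Date date, TimeZone zone) {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ", Locale.ROOT);
        f.setTimeZone(zone);
        return f.format(date);
    }

    // Minimal JSON string escaping. Control characters are escaped as well: a raw newline
    // would stop ".*" from matching, and the interceptor would silently drop the message.
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("userName", "Wang Wu");
        fields.put("sex", "female");
        fields.put("age", "23");
        fields.put("createDate", timestamp(new Date(), TimeZone.getTimeZone("Asia/Shanghai")));
        System.out.println(format(fields));
    }
}
```

The resulting string is then passed to the logger as the message, for example with SLF4J `logger.info(message)`.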

logback-monitor.properties:

client.appname=something

client.type=DEFAULT_FAILOVER

hosts=h1

hosts.h1=127.0.0.1:44444

pool.size=20

Note: if there are several Flume receivers, configure them as shown below. Setting client.type to default_loadbalance enables load balancing, and default_failover enables failover.

hosts=h1 h2 h3

hosts.h1=127.0.0.1:44444

hosts.h2=127.0.0.1:44444

hosts.h3=127.0.0.1:44444
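How the two client types pick a host can be modelled in a few lines. This is a simplified illustration of the selection order only, not the Flume SDK: it assumes the failover client tries the hosts in the order given by `hosts` and the load-balancing client uses its default round-robin selector. `HostSelectionModel` and the addresses used in it are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Simplified, hypothetical model of host selection for the two Flume client types.
final class HostSelectionModel {
    private final List<String> addresses = new ArrayList<>();
    private int next = 0; // round-robin cursor

    HostSelectionModel(Properties props) {
        // "hosts" holds logical names separated by spaces; each name maps to hosts.<name>=host:port
        for (String name : props.getProperty("hosts", "").trim().split("\\s+")) {
            if (name.isEmpty()) {
                continue;
            }
            String address = props.getProperty("hosts." + name);
            if (address == null) {
                throw new IllegalArgumentException("missing property hosts." + name);
            }
            addresses.add(address.trim());
        }
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("no hosts configured (is the key spelled \"hosts\"?)");
        }
    }

    // default_failover: always prefer the first listed host; move on only if it is unavailable.
    String failoverTarget(List<String> unavailable) {
        for (String address : addresses) {
            if (!unavailable.contains(address)) {
                return address;
            }
        }
        throw new IllegalStateException("all Flume hosts are unavailable");
    }

    // default_loadbalance (round robin): successive calls cycle through the hosts.
    String roundRobinTarget() {
        String address = addresses.get(next);
        next = (next + 1) % addresses.size();
        return address;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(
                "hosts=h1 h2 h3\nhosts.h1=10.0.0.1:44444\nhosts.h2=10.0.0.2:44444\nhosts.h3=10.0.0.3:44444\n"));
        HostSelectionModel model = new HostSelectionModel(props);
        System.out.println(model.failoverTarget(Collections.<String>emptyList())); // 10.0.0.1:44444
        System.out.println(model.failoverTarget(Collections.singletonList("10.0.0.1:44444"))); // 10.0.0.2:44444
        for (int i = 0; i < 4; i++) {
            System.out.println(model.roundRobinTarget()); // .1, .2, .3, then .1 again
        }
    }
}
```

Failover keeps all traffic on one receiver until it fails, while round robin spreads it across all receivers; the choice depends on whether the downstream agents are meant to share the load.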

4. Tomcat access logs are sent via Avro to port 33333 of source2. To integrate Tomcat access logs, configure the following:

1) Create logback-monitor.properties in the D:\server2\apache-tomcat-8.0.36\lib directory with the following content:

client.appname=something

client.type=DEFAULT_FAILOVER

hosts=h1

hosts.h1=127.0.0.1:33333

pool.size=5

2) Create logback-access.xml in the D:\server2\apache-tomcat-8.0.36\conf directory with the following content:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <statusListener class="ch.qos.logback.core.status.OnConsoleStatusListener" />

  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>@@{"createTime":"%t{yyyy-MM-dd'T'HH:mm:ss.SSSZ}","remoteIP":"%h", "status":"%s","contentLength":"%b", "duration":"%D","requestURL":"%r","logType":"tomcatLog"}@@%n</pattern>
    </encoder>
  </appender>

  <appender name="flumeRpcAppender" class="com.logback.core.FlumeRpcAppender">
    <encoder>
      <pattern>@@{"createTime":"%t{yyyy-MM-dd'T'HH:mm:ss.SSSZ}","remoteIP":"%h", "status":"%s","contentLength":"%b", "duration":"%D","requestURL":"%r","logType":"tomcatLog"}@@</pattern>
    </encoder>
    <configfile>logback-monitor.properties</configfile>
  </appender>

  <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <fileNamePattern>/root/yiran/apache-tomcat-8.0.36/logs/tomcat/tomcat_access.%d{yyyy-MM-dd}.log</fileNamePattern>
    </rollingPolicy>
    <encoder>
      <pattern>@@{"createTime":"%t{yyyy-MM-dd'T'HH:mm:ss.SSSZ}","remoteIP":"%h", "status":"%s","contentLength":"%b", "duration":"%D","requestURL":"%r","logType":"tomcatLog"}@@%n</pattern>
    </encoder>
  </appender>

  <appender-ref ref="FILE"/>
  <appender-ref ref="flumeRpcAppender"/>
</configuration>

3) Copy the jar files from the [\部署配置文件\lib] directory (the lib folder of the deployment configuration files) to D:\server2\apache-tomcat-8.0.36\lib;

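4) Edit Tomcat's server.xml and add the following Valve inside the <Engine> or <Host> element (for example just before the closing </Host> tag near the end of the file):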

<Valve className="ch.qos.logback.access.tomcat.LogbackValve" quiet="true"/>

This completes the Flume configuration. Next, start agent1 and agent3 with the following commands:

Start agent1:

cd C:\Users\qxb-810\Desktop\flume_study\apache-flume-1.6.0-bin

bin\flume-ng.cmd agent --conf conf --conf-file conf\sink_to_agent.conf --name agent1

Start agent3:

cd /root/yiran/apache-flume-1.6.0-bin;

./bin/flume-ng agent --conf conf --conf-file conf/sink_to_es.conf --name agent1 &

3.3 Elasticsearch configuration

Prerequisites:

1) Install Elasticsearch as described in section 2.2, "Elasticsearch installation";

2) Create the index templates with curl.

Template for user operation logs:

curl -XPUT localhost:9200/_template/template_flume_log -d '
{
  "template" : "test_index-*",
  "settings" : {
    "number_of_shards" : 5
  },
  "mappings" : {
    "test_type_1" : {
      "_source" : { "enabled" : true },
      "properties" : {
        "createDate" : { "type" : "date", "format" : "date_optional_time" },
        "sex" : { "type" : "string" },
        "userName" : { "type" : "string" },
        "age" : { "type" : "long" }
      }
    }
  }
}'

Template for Tomcat access logs:

curl -XPUT localhost:9200/_template/template_flume_tomcat_log -d '
{
  "template" : "test_index_tomcat-*",
  "settings" : {
    "number_of_shards" : 5
  },
  "mappings" : {
    "test_type_1" : {
      "_source" : { "enabled" : true },
      "properties" : {
        "createTime" : { "type" : "date", "format" : "date_optional_time" },
        "remoteIP" : { "type" : "string" },
        "status" : { "type" : "long" },
        "contentLength" : { "type" : "string" },
        "duration" : { "type" : "long" },
        "requestURL" : { "type" : "string" },
        "logType" : { "type" : "string" }
      }
    }
  }
}'

3.4 Kibana configuration

Prerequisites:

1) Install Kibana as described in section 2.3, "Kibana installation";

2) Open http://192.168.77.113:5601/ and configure the index patterns in Kibana;

4. Common problems

1. Elasticsearch fails to start

Cause: most likely the JDK version is wrong.

Solution: upgrade to a JDK version supported by your Elasticsearch release.

2. When Flume writes to Elasticsearch, the agent reports at startup: "No node available" and "[cluster/nodes/info] request_id [0] timed out"

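Causes: 1) The node settings may be wrong. Check them (for example the cluster name and the host names and transport port that the Elasticsearch sink connects to) and, once they are correct, restart Elasticsearch;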

2) Check that Elasticsearch and the application (here, the Flume agent) run on the same JDK version;

3. Resuming tail after a failure

Record the line number while tail is sending data, and the next time start from the last recorded position, for example:

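agent1.sources.avro-source1.shell = /bin/bash -c

agent1.sources.avro-source1.command = /usr/local/bin/tail -n +$(tail -n1 /home/storm/tmp/n) --max-unchanged-stats=600 -F /home/storm/tmp/id.txt | awk 'ARGIND==1{i=$0;next}{i++; if($0~/文件已截断/)i=0; print i >> "/home/storm/tmp/n"; print $1"---"i}' /home/storm/tmp/n -

The shell property is needed because the command uses a pipe and $(...); ARGIND requires GNU awk; 文件已截断 is the "file truncated" message that tail prints in a Chinese locale.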

Keep the following in mind:

1) When the file is rotated, the checkpoint "pointer" must be updated accordingly;

2) Files must be tracked by file name (which is why tail -F is used rather than tail -f);

3) After Flume crashes, the checkpoint "pointer" must keep counting from its last value rather than restart from zero;

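4) If the file happens to be rotated while Flume is down, data may be lost. The only remedies are to monitor Flume and restart it as quickly as possible, or to add logic that checks the file size and resets the pointer;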

5) Check your tail version and update the coreutils package to the latest release.
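Points 1), 2) and 4) come down to deciding where to resume after a restart. A minimal sketch, assuming the checkpoint stores the byte offset together with the identity of the file being read. `ResumePoint` is a hypothetical helper; `BasicFileAttributes.fileKey()` is the JDK call that exposes the file identity (typically device and inode on Unix, possibly `null` on other platforms).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper: decide where a tail-style reader should resume after a restart,
// given a checkpoint that stores the byte offset and the identity of the file being read.
final class ResumePoint {

    // Returns the offset to resume from in the file currently found at the log path.
    static long resumeOffset(long savedOffset, String savedFileKey, long currentSize, String currentFileKey) {
        if (savedFileKey != null && currentFileKey != null && !savedFileKey.equals(currentFileKey)) {
            // The path now refers to a different file, so the old one was rotated away.
            // Start the new file from 0; lines written to the old file after the checkpoint
            // still have to be read from the rotated file, otherwise they are lost.
            return 0L;
        }
        if (currentSize < savedOffset) {
            // Same file (or identity unknown) but shorter than the checkpoint: it was truncated.
            return 0L;
        }
        return savedOffset;
    }

    // Identity of the file at this path; typically device+inode on Unix, null if unavailable.
    static String fileKey(Path path) throws IOException {
        Object key = Files.readAttributes(path, BasicFileAttributes.class).fileKey();
        return key == null ? null : key.toString();
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("app", ".log");
        Files.write(log, "0123456789\n".getBytes(StandardCharsets.UTF_8));
        String key = fileKey(log);
        long size = Files.size(log);
        System.out.println(resumeOffset(5, key, size, key));              // 5: same file, still growing
        System.out.println(resumeOffset(50, key, size, key));             // 0: shorter than the checkpoint
        System.out.println(resumeOffset(5, "old-inode", size, "new-inode")); // 0: file was rotated
    }
}
```

Checking the file identity as well as the size catches a rotation even when the new file has already grown past the saved offset, which a size check alone would miss.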

4. Flume fails with java.lang.OutOfMemoryError: GC overhead limit exceeded

Solution: by default Flume starts with a maximum heap of only 20 MB, which easily runs out of memory in production, so add JVM options in flume-env.sh:

JAVA_OPTS="-Xms8192m -Xmx8192m -Xss256k -Xmn2g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit"

When starting the agent, always pass the -c conf (--conf conf) option; otherwise the environment variables set in flume-env.sh are not loaded.

5. Time zone and date issues with Elasticsearch and Kibana

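Cause: Elasticsearch supports the date type natively, and in JSON a date is written as a string. A date carries time zone information; if the date string does not state a time zone explicitly, Elasticsearch interprets it as UTC. Elasticsearch itself does not complain, but Kibana converts the stored UTC value to the time zone of the browser it runs in (UTC+8 in China) for display. A timestamp written in local Beijing time without an offset is therefore shown 8 hours later than the actual time, which is the "times are off by 8 hours" problem that Kibana users often run into.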

Solution: when submitting dates to Elasticsearch, send date strings that include the time zone offset, e.g. "2016-07-15T12:58:17.136+0800".
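The 8-hour shift can be reproduced with `java.time`. The sketch parses the same wall-clock time once with the `+0800` offset and once without an offset, read as UTC the way Elasticsearch treats offset-less dates; the two instants differ by exactly 8 hours. `TimezoneDemo` is a hypothetical name.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical demo: the same wall-clock time with and without an explicit UTC offset.
final class TimezoneDemo {
    // Format recommended above, e.g. 2016-07-15T12:58:17.136+0800
    static final DateTimeFormatter WITH_OFFSET = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ");

    // An explicit offset pins the string to exactly one instant.
    static Instant parseWithOffset(String text) {
        return OffsetDateTime.parse(text, WITH_OFFSET).toInstant();
    }

    // Without an offset the string must be read in some zone; Elasticsearch assumes UTC.
    static Instant parseAsUtc(String localText) {
        return LocalDateTime.parse(localText).toInstant(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        Instant explicit = parseWithOffset("2016-07-15T12:58:17.136+0800");
        Instant assumedUtc = parseAsUtc("2016-07-15T12:58:17.136");
        System.out.println(explicit);   // 2016-07-15T04:58:17.136Z
        System.out.println(assumedUtc); // 2016-07-15T12:58:17.136Z
        // Shown in a UTC+8 browser, the offset-less value appears 8 hours later than it happened.
        System.out.println(Duration.between(explicit, assumedUtc).toHours()); // 8
    }
}
```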