Flume: an event collector, typically used for log collection.
Solr: a search engine based on Lucene.
Function: watch the file /var/log/a1.new.log. When new lines are appended, the Flume source picks them up as events, the sink indexes them, and the resulting documents are sent to Solr. You can then search for the new events through Solr almost immediately.
Download:
flume 1.4: http://archive.apache.org/dist/flume/1.4.0/apache-flume-1.4.0-bin.tar.gz
solr 4.3: http://archive.apache.org/dist/lucene/solr/4.3.0/
a1.new.log's format looks like the following:
# cat /var/log/a1.new.log
2014-05-29 10:37:56,777 INFO org.apache.hadoop.http.HttpServer: HttpServer.start() threw a non Bind IOException
2014-05-15 19:06:52,373 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2014-05-29 10:37:56,777 INFO org.apache.hadoop.http.HttpServer: HttpServer.start() threw a non Bind IOException
2014-05-15 19:06:52,373 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
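The grok expression used later in morphline.conf splits each such line into four fields (timestamp, loglevel, classname, msg). The mapping can be previewed with plain shell tools; this is just an illustrative sketch, not part of the setup:

```shell
# Split one sample log line into the four fields the morphline grok
# command will extract. Fields are space-delimited; the trailing colon
# on the class name is stripped.
line='2014-05-15 19:06:52,373 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1'
timestamp=$(echo "$line" | cut -d' ' -f1-2)
loglevel=$(echo "$line" | cut -d' ' -f3)
classname=$(echo "$line" | cut -d' ' -f4 | tr -d ':')
msg=$(echo "$line" | cut -d' ' -f5-)
echo "timestamp=$timestamp loglevel=$loglevel classname=$classname msg=$msg"
```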
Configure solr (server-1941/192.168.100.110)
- extract solr
- cd /usr/lib
- tar zxvf solr-4.3.0.tgz
- configure solr cloud
- cd solr-4.3.0
- cp -r example node1
- cd node1
- vi solr/zoo.cfg and uncomment "clientPort=2181"
- edit solr/collection1/conf/schema.xml
- add following to fields element: <!-- add start because of flume -->
<field name="timestamp" type="string" indexed="true" stored="true"/>
<field name="loglevel" type="string" indexed="true" stored="true"/>
<field name="classname" type="string" indexed="true" stored="true"/>
<field name="msg" type="string" indexed="true" stored="true"/>
<!-- add end because of flume -->
- uncomment the following lines in solr/collection1/conf/solrconfig.xml. Why? Per the SolrCloud wiki (http://wiki.apache.org/solr/SolrCloud): "If you want to use the Near Realtime search support, you will probably want to enable auto soft commits in your solrconfig.xml file before putting it into zookeeper."
<autoSoftCommit>
<maxTime>1000</maxTime>
</autoSoftCommit>
- start solr: java -DzkRun -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -jar start.jar
- browse: http://192.168.100.110:8983/solr/#/
Configure flume (server-1941/192.168.100.110)
1. extract apache-flume-1.4.0-bin.tar.gz
- cd /usr/lib/
- tar zxvf apache-flume-1.4.0-bin.tar.gz
2. edit flume-env.sh
- cp conf/flume-env.sh.template conf/flume-env.sh
- edit conf/flume-env.sh and add the following line:
JAVA_OPTS="-Xms256m -Xmx512m"
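The two steps above can be scripted. A sketch only: /tmp/flume-demo stands in for the real install directory (/usr/lib/apache-flume-1.4.0-bin) so the commands can be tried safely:

```shell
# Create flume-env.sh from its template and append the heap options.
# A scratch directory and an empty stand-in template are used here so the
# commands can be run without a real Flume install.
mkdir -p /tmp/flume-demo/conf
: > /tmp/flume-demo/conf/flume-env.sh.template
cp /tmp/flume-demo/conf/flume-env.sh.template /tmp/flume-demo/conf/flume-env.sh
echo 'JAVA_OPTS="-Xms256m -Xmx512m"' >> /tmp/flume-demo/conf/flume-env.sh
grep JAVA_OPTS /tmp/flume-demo/conf/flume-env.sh
```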
3. edit conf/flume-conf-morphlineSolr.properties
a1.channels = c1
a1.sources = r1
a1.sinks = k1
a1.channels.c1.type = memory
a1.sources.r1.channels = c1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/a1.new.log
a1.sinks.k1.channel = c1
a1.sinks.k1.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
a1.sinks.k1.morphlineFile = /usr/lib/apache-flume-1.4.0-bin/conf/morphline.conf
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100
4. edit conf/morphline.conf
morphlines : [
{
# Name used to identify a morphline. E.g. used if there are multiple
# morphlines in a morphline config file
id : morphline1
# Import all morphline commands in these java packages and their
# subpackages. Other commands that may be present on the classpath are
# not visible to this morphline.
importCommands : ["com.cloudera.**", "org.apache.solr.**"]
commands : [
{
# Parse input attachment and emit a record for each input line
readLine {
charset : UTF-8
}
}
{
grok {
# Consume the output record of the previous command and pipe another
# record downstream.
#
# A grok-dictionary is a config file that contains prefabricated
# regular expressions that can be referred to by name. grok patterns
# specify such a regex name, plus an optional output field name.
# The syntax is %{REGEX_NAME:OUTPUT_FIELD_NAME}
# The input line is expected in the "message" input field.
#dictionaryFiles : [src/test/resources/grok-dictionaries]
dictionaryFiles :[/usr/lib/apache-flume-1.4.0-bin/conf/grok-dictionaries]
expressions : {
#message : """%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:hostname} %{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:msg}"""
#message : """%{TIMESTAMP_LOG:timestamp} %{SYSLOGHOST:hostname} %{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:msg}"""
message : """%{TIMESTAMP_LOG:timestamp} %{LOGLEVEL:loglevel} %{DATA:classname}: %{GREEDYDATA:msg}"""
}
}
}
# Consume the output record of the previous command, convert
# the timestamp, and pipe another record downstream.
#
# convert timestamp field to native Solr timestamp format
# e.g. 2012-09-06T07:14:34Z to 2012-09-06T07:14:34.000Z
{
convertTimestamp {
field : timestamp
inputFormats : ["yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", "yyyy-MM-dd HH:mm:ss,SSS"]
#inputFormats : ["yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", "yyyy-MM-dd'T'HH:mm:ss", "yyyy-MM-dd"]
inputTimezone : America/Los_Angeles
outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
outputTimezone : UTC
}
}
{
generateUUID {
field : id
}
}
# Consume the output record of the previous command, transform it
# and pipe the record downstream.
#
# This command deletes record fields that are unknown to Solr
# schema.xml. Recall that Solr throws an exception on any attempt to
# load a document that contains a field that isn't specified in
# schema.xml.
{
sanitizeUnknownSolrFields {
# Location from which to fetch Solr schema
solrLocator : {
collection : collection1 # Name of solr collection
zkHost : "127.0.0.1:2181/" # ZooKeeper ensemble
}
}
}
# log the record at INFO level to SLF4J
{ logInfo { format : "output record: {}", args : ["@{}"] } }
# load the record into a Solr server or MapReduce Reducer
{
loadSolr {
solrLocator : {
collection : collection1 # Name of solr collection
zkHost : "127.0.0.1:2181/" # ZooKeeper ensemble
}
}
}
]
}
]
5. start flume
- cd /usr/lib/apache-flume-1.4.0-bin/
- ./bin/flume-ng agent --conf conf --conf-file conf/flume-conf-morphlineSolr.properties --name a1 -Dflume.root.logger=INFO,console
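The convertTimestamp settings in morphline.conf above can be sanity-checked from the shell. A sketch using GNU date (the TZ="…" prefix inside -d is GNU-specific); it mirrors the inputTimezone/outputFormat pair used above:

```shell
# Parse a local America/Los_Angeles timestamp and print it in Solr's UTC
# format. date(1) does not carry milliseconds, so the ",373" fraction is
# re-appended literally in the output format for illustration only.
date -u -d 'TZ="America/Los_Angeles" 2014-06-03 10:16:52' +'%Y-%m-%dT%H:%M:%S.373Z'
# prints 2014-06-03T17:16:52.373Z, matching the timestamp Solr returns below
```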
How to test
- curl -g "http://192.168.100.110:8983/solr/collection1/select?q=msg:*hadoop*&wt=xml&indent=true" (quote the URL, otherwise the shell treats "&" as a background operator)
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">4</int><lst name="params"><str name="q">msg:*hadoop</str></lst></lst><result name="response" numFound="0" start="0"></result>
</response>
- append a new line to /var/log/a1.new.log
- echo "2014-06-03 10:16:52,373 INFO org.apache.hadoop.util.ExitUtil: hadoop will shutdown">>/var/log/a1.new.log
- curl -g "http://192.168.100.110:8983/solr/collection1/select?q=msg:*hadoop*&wt=xml&indent=true"
<?xml version="1.0" encoding="UTF-8"?><response><lst name="responseHeader"><int name="status">0</int><int name="QTime">5</int><lst name="params"><str name="q">msg:*hadoop*</str></lst></lst><result name="response" numFound="1" start="0"><doc><str name="id">63566aed-7438-4c8e-8b02-7f6fa0be85b3</str><str name="timestamp">2014-06-03T17:16:52.373Z</str><str name="msg"> hadoop will shutdown</str><long name="_version_">1469853891281551360</long></doc></result></response>
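For quick checks, a single field can be pulled out of the XML response without a full XML parser. A sketch with sed; the sample document is inlined here rather than fetched from Solr:

```shell
# Extract the msg field from a Solr XML response snippet (inlined so the
# command can be tried offline).
response='<doc><str name="id">63566aed-7438-4c8e-8b02-7f6fa0be85b3</str><str name="msg"> hadoop will shutdown</str></doc>'
printf '%s\n' "$response" | sed -n 's/.*<str name="msg">\([^<]*\)<\/str>.*/\1/p'
# prints " hadoop will shutdown" (note the leading space captured by grok)
```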