Flume (1.4) + Solr (4.3) Log Analysis

Flume: an event collector; a typical use case is log collection.
Solr:  a search engine based on Lucene.

Function: watch the file /var/log/a1.new.log. When new lines are appended, the Flume source picks them up as events, and the sink parses and indexes them into Solr. The new events are then searchable through Solr almost immediately.

Download:
flume 1.4: http://archive.apache.org/dist/flume/1.4.0/apache-flume-1.4.0-bin.tar.gz
solr 4.3:  http://archive.apache.org/dist/lucene/solr/4.3.0/
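
For example, both archives can be fetched into /usr/lib (the solr-4.3.0.tgz file name matches the archive listed in the directory above):
  cd /usr/lib
  wget http://archive.apache.org/dist/flume/1.4.0/apache-flume-1.4.0-bin.tar.gz
  wget http://archive.apache.org/dist/lucene/solr/4.3.0/solr-4.3.0.tgz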


a1.new.log is formatted as follows:
# cat  /var/log/a1.new.log
2014-05-29 10:37:56,777 INFO org.apache.hadoop.http.HttpServer: HttpServer.start() threw a non Bind IOException
2014-05-15 19:06:52,373 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
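
Each line carries the four fields that the pipeline below extracts: a timestamp, a log level, a class name, and the message. A quick shell check of that shape (this regex is only an illustration; it mirrors the grok expression configured later):
  # prints the line if it matches the expected "timestamp loglevel classname: msg" shape
  echo "2014-05-15 19:06:52,373 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1" \
    | grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} [A-Z]+ \S+: .+'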


Configure solr  (server-1941/192.168.100.110)
  1. extract solr 
    1.  cd /usr/lib
    2.  tar zxvf solr-4.3.0.tgz
  2. configure solr cloud
    1. cd solr-4.3.0
    2. cp -r example node1
    3. cd node1
    4. vi solr/zoo.cfg and uncomment "clientPort=2181"
    5. edit the config files under solr/collection1/conf
      1. add the following to the <fields> element of schema.xml:
              <!-- add start because of flume -->
              <field name="timestamp" type="string" indexed="true" stored="true"/>
              <field name="loglevel" type="string" indexed="true" stored="true"/>
              <field name="classname" type="string" indexed="true" stored="true"/>
              <field name="msg" type="string" indexed="true" stored="true"/>
              <!-- add end because of flume -->
      2. uncomment the following lines in solrconfig.xml. Why? Per the SolrCloud wiki: "If you want to use the Near Realtime search support, you will probably want to enable auto soft commits in your solrconfig.xml file before putting it into zookeeper." (http://wiki.apache.org/solr/SolrCloud)
        <autoSoftCommit>
          <maxTime>1000</maxTime>
        </autoSoftCommit>
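        After uncommenting, that section of solrconfig.xml (it sits inside the <updateHandler> element) should look roughly like the sketch below. With maxTime at 1000 ms, a soft commit happens within a second of indexing, which is what makes newly appended log lines searchable almost immediately in the test at the end of this article.
        <updateHandler class="solr.DirectUpdateHandler2">
          ...
          <autoSoftCommit>
            <maxTime>1000</maxTime>
          </autoSoftCommit>
        </updateHandler>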
    6. start solr (from the node1 directory):  java -DzkRun  -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -jar start.jar
    7. browse: http://192.168.100.110:8983/solr/#/
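  Optionally, verify that the collection came up before moving on; the standard Solr CoreAdmin STATUS call should list collection1:
    curl "http://192.168.100.110:8983/solr/admin/cores?action=STATUS&wt=json"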

Configure flume  (server-1941/192.168.100.110)

  1. extract flume
    1. cd /usr/lib
    2. tar zxvf apache-flume-1.4.0-bin.tar.gz
  2. edit flume-env.sh
    1. cp conf/flume-env.sh.template conf/flume-env.sh
    2. edit conf/flume-env.sh and add the following line:
       JAVA_OPTS="-Xms256m -Xmx512m"
  3. edit conf/flume-conf-morphlineSolr.properties           
    a1.channels = c1
    a1.sources = r1
    a1.sinks = k1
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 10000
    a1.channels.c1.transactionCapacity = 100
    a1.sources.r1.channels = c1
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /var/log/a1.new.log
    a1.sinks.k1.channel = c1
    a1.sinks.k1.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
    a1.sinks.k1.morphlineFile = /usr/lib/apache-flume-1.4.0-bin/conf/morphline.conf
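
  Note: the exec source with tail -F is simple but gives no delivery guarantee; if the agent restarts, lines can be missed or re-read. If that matters, Flume's spooling directory source is a more reliable alternative. A minimal sketch, assuming completed log files are moved into /var/log/flume-spool (a hypothetical directory):
    a1.sources.r1.type = spooldir
    a1.sources.r1.spoolDir = /var/log/flume-spool
    a1.sources.r1.channels = c1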

  4. edit conf/morphline.conf
    morphlines : [
      {
        # Name used to identify a morphline. E.g. used if there are multiple
        # morphlines in a morphline config file
        id : morphline1

        # Import all morphline commands in these java packages and their
        # subpackages. Other commands that may be present on the classpath are
        # not visible to this morphline.
        importCommands : ["com.cloudera.**", "org.apache.solr.**"]

        commands : [
          {
            # Parse input attachment and emit a record for each input line
            readLine {
              charset : UTF-8
            }
          }

          {
            grok {
              # Consume the output record of the previous command and pipe another
              # record downstream.
              #
              # A grok-dictionary is a config file that contains prefabricated
              # regular expressions that can be referred to by name. grok patterns
              # specify such a regex name, plus an optional output field name.
              # The syntax is %{REGEX_NAME:OUTPUT_FIELD_NAME}
              # The input line is expected in the "message" input field.
              #dictionaryFiles : [src/test/resources/grok-dictionaries]
              dictionaryFiles : [/usr/lib/apache-flume-1.4.0-bin/conf/grok-dictionaries]
              expressions : {
                #message : """%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:hostname} %{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:msg}"""
                #message : """%{TIMESTAMP_LOG:timestamp} %{SYSLOGHOST:hostname} %{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:msg}"""
                message : """%{TIMESTAMP_LOG:timestamp} %{LOGLEVEL:loglevel} %{DATA:classname}: %{GREEDYDATA:msg}"""

              }
            }
          }

          # Consume the output record of the previous command, convert
          # the timestamp, and pipe another record downstream.
          #
          # convert timestamp field to native Solr timestamp format
          # e.g. 2012-09-06T07:14:34Z to 2012-09-06T07:14:34.000Z
          {
            convertTimestamp {
              field : timestamp
              inputFormats : ["yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", "yyyy-MM-dd HH:mm:ss,SSS"]
              #inputFormats : ["yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", "yyyy-MM-dd'T'HH:mm:ss", "yyyy-MM-dd"]
              inputTimezone : America/Los_Angeles
              outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
              outputTimezone : UTC
            }
          }

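          # Generate a unique id for each record; the stock Solr schema.xml
          # declares "id" as the required uniqueKey field, so every document
          # loaded into Solr needs one.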
          {
            generateUUID {
               field : id
            }
          }
          # Consume the output record of the previous command, transform it
          # and pipe the record downstream.
          #
          # This command deletes record fields that are unknown to Solr
          # schema.xml. Recall that Solr throws an exception on any attempt to
          # load a document that contains a field that isn't specified in
          # schema.xml.
          {
            sanitizeUnknownSolrFields {
              # Location from which to fetch Solr schema
              solrLocator : {
                collection : collection1       # Name of solr collection
                zkHost : "127.0.0.1:2181/" # ZooKeeper ensemble
              }
            }
          }

          # log the record at INFO level to SLF4J
          { logInfo { format : "output record: {}", args : ["@{}"] } }

          # load the record into a Solr server or MapReduce Reducer
          {
            loadSolr {
              solrLocator : {
                collection : collection1       # Name of solr collection
                zkHost : "127.0.0.1:2181/" # ZooKeeper ensemble
              }
            }
          }
        ]
      }
    ]
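
  Note: the grok expression above uses TIMESTAMP_LOG, which does not appear among the stock patterns shipped in grok-dictionaries, so it has to be defined there by hand. A minimal entry matching the log4j-style timestamps in a1.new.log (e.g. 2014-05-29 10:37:56,777) could be:
    # appended to /usr/lib/apache-flume-1.4.0-bin/conf/grok-dictionaries
    TIMESTAMP_LOG \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}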
  5. start flume
    1. cd /usr/lib/apache-flume-1.4.0-bin/
    2. ./bin/flume-ng agent --conf conf --conf-file conf/flume-conf-morphlineSolr.properties --name a1 -Dflume.root.logger=INFO,console
  6. How to test
    1.  curl -g "http://192.168.100.110:8983/solr/collection1/select?q=msg:*hadoop*&wt=xml&indent=true"   (quote the URL so the shell does not treat the & characters as background operators)
       <?xml version="1.0" encoding="UTF-8"?>
      <response>
      <lst name="responseHeader"><int name="status">0</int><int name="QTime">4</int><lst name="params"><str name="q">msg:*hadoop</str></lst></lst><result name="response" numFound="0" start="0"></result>
      </response>
    2. append a new line to /var/log/a1.new.log
      1.  echo "2014-06-03 10:16:52,373 INFO org.apache.hadoop.util.ExitUtil: hadoop will shutdown">>/var/log/a1.new.log
    3.  curl -g "http://192.168.100.110:8983/solr/collection1/select?q=msg:*hadoop*&wt=xml&indent=true"
     <?xml version="1.0" encoding="UTF-8"?>
    <response>
    <lst name="responseHeader"><int name="status">0</int><int name="QTime">5</int><lst name="params"><str name="q">msg:*hadoop*</str></lst></lst><result name="response" numFound="1" start="0"><doc><str name="id">63566aed-7438-4c8e-8b02-7f6fa0be85b3</str><str name="timestamp">2014-06-03T17:16:52.373Z</str><str name="msg"> hadoop will shutdown</str><long name="_version_">1469853891281551360</long></doc></result>
    </response>
    Note the timestamp field: the appended line said 10:16:52, but convertTimestamp rewrote it as 2014-06-03T17:16:52.373Z, converting America/Los_Angeles time to UTC as configured in morphline.conf.
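    The other fields defined in schema.xml are searchable the same way; for example, to find all INFO-level events (illustrative query):
      curl -g "http://192.168.100.110:8983/solr/collection1/select?q=loglevel:INFO&wt=xml&indent=true"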



