Sending Data to Flume through a Thrift Source: A Python Implementation | 学步园

Flume now supports a Thrift source. Data is collected by a Thrift service (the same approach scribe takes) and then passed through the configured channel to a sink. The steps are as follows.

Environment: Python 2.7.5 / CDH4.3 Flume 1.3 / Thrift 0.9

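First, we need a Python Flume client module for the Thrift protocol. This module can be generated automatically from Flume's Thrift definition. Start by downloading the Flume tarball for CDH4.3 from Cloudera's website: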

wget http://archive.cloudera.com/cdh4/cdh/4/

After downloading, extract the tarball. The Thrift definition file is in the flume-ng-sdk/src/main/thrift directory; use it to generate the client module:

tar xzvf flume-ng-1.3.0-cdh4.3.0.tar.gz

cd apache-flume-1.3.0-cdh4.3.0-bin/flume-ng-sdk/src/main/thrift

thrift --gen py flume.thrift

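This produces a directory named gen-py in the current directory. Because a name containing a hyphen cannot be imported in Python, rename it to genpy and move it into Python's site-packages directory:

mv gen-py/ /usr/local/lib/python2.7/site-packages/genpy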
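If you regenerate the bindings often, the shell steps above can be scripted. The following is a minimal sketch, assuming the `thrift` compiler is on your `PATH` and that you pass in your own site-packages path; `generate_python_bindings` and `install_as_genpy` are hypothetical helper names, not part of Flume or Thrift.

```python
import os
import shutil
import subprocess


def generate_python_bindings(thrift_file, out_dir):
    """Run the Thrift compiler; it writes a gen-py/ directory under out_dir."""
    # Make sure the output directory exists before invoking the compiler
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    subprocess.check_call(['thrift', '--gen', 'py', '-o', out_dir, thrift_file])
    return os.path.join(out_dir, 'gen-py')


def install_as_genpy(gen_py_dir, site_packages):
    """Move gen-py/ to <site_packages>/genpy so it is importable as genpy."""
    target = os.path.join(site_packages, 'genpy')
    if os.path.exists(target):
        # shutil.move would otherwise nest gen-py inside the existing directory
        shutil.rmtree(target)
    shutil.move(gen_py_dir, target)
    # Python 2 only treats a directory as a package if it has __init__.py
    init_file = os.path.join(target, '__init__.py')
    if not os.path.exists(init_file):
        open(init_file, 'w').close()
    return target
```

For example: `install_as_genpy(generate_python_bindings('flume.thrift', '/tmp/flume-gen'), '/usr/local/lib/python2.7/site-packages')`.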

You can now import the module like this:

[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from genpy import flume
>>> dir(flume)
['__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__']
>>>
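`dir(flume)` shows only dunder names because the generated package's `__init__.py` lists its submodules in `__all__` without importing them; they load on demand (for example `from genpy.flume import ThriftSourceProtocol`). To check that every generated submodule actually imports, a sketch like the following could be used; `import_submodules` is a hypothetical helper, not part of the generated code.

```python
import importlib


def import_submodules(package):
    """Import every submodule named in package.__all__ and return them by name."""
    loaded = {}
    for name in getattr(package, '__all__', []):
        # e.g. 'genpy.flume' + '.' + 'ThriftSourceProtocol'
        loaded[name] = importlib.import_module(package.__name__ + '.' + name)
    return loaded
```

Usage: `from genpy import flume; print(sorted(import_submodules(flume)))`. For the Flume bindings this is expected to cover `ttypes`, `constants` and `ThriftSourceProtocol`; an ImportError here points to an incomplete copy of gen-py.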

Next, we use this module to build a client wrapper. Note that the server side of Flume's Thrift source uses TTupleProtocol, which extends TCompactProtocol:

public final class TTupleProtocol extends TCompactProtocol {...

The Thrift Python library gives us two candidate protocols here, TCompactProtocol and TBinaryProtocol. Clearly we need the former. If you use TBinaryProtocol, the server logs the following error:

18 Jul 2013 18:25:29,447 ERROR [pool-5-thread-4] (org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run:213) - Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got ffffff80
    at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:472)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)

and the client fails with the following traceback:

Traceback (most recent call last):
  File "pyflume.py", line 65, in <module>
    flume_client.send({'a':'hello', 'b':'world'}, 'events under hello world')
  File "pyflume.py", line 53, in send
    self.client.append(event)
  File "/usr/local/lib/python2.7/site-packages/genpy/flume/ThriftSourceProtocol.py", line 49, in append
    return self.recv_append()
  File "/usr/local/lib/python2.7/site-packages/genpy/flume/ThriftSourceProtocol.py", line 60, in recv_append
    (fname, mtype, rseqid) = self._iprot.readMessageBegin()
  File "/usr/local/lib/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py", line 126, in readMessageBegin
    sz = self.readI32()
  File "/usr/local/lib/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py", line 206, in readI32
    buff = self.trans.readAll(4)
  File "/usr/local/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 58, in readAll
    chunk = self.read(sz - have)
  File "/usr/local/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 271, in read
    self.readFrame()
  File "/usr/local/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 275, in readFrame
    buff = self.__trans.readAll(4)
  File "/usr/local/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 58, in readAll
    chunk = self.read(sz - have)
  File "/usr/local/lib/python2.7/site-packages/thrift/transport/TSocket.py", line 118, in read
    message='TSocket read 0 bytes')
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
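The two errors fit together once you look at the first bytes of a message. Every TCompactProtocol message starts with the protocol ID byte 0x82, while a strict TBinaryProtocol message starts with the 4-byte word 0x80010000 | message_type, so its first byte is 0x80. The Java server holds these values as signed bytes and formats them with `Integer.toHexString`, which sign-extends them to `ffffff82` and `ffffff80`. The worker then closes the connection, so the Python client reads 0 bytes. The sketch below (hypothetical helper names) classifies a frame payload, i.e. the bytes after TFramedTransport has stripped its 4-byte length prefix, and reproduces that hex formatting.

```python
from __future__ import print_function

import struct

COMPACT_PROTOCOL_ID = 0x82      # first byte of every TCompactProtocol message
BINARY_VERSION_MASK = 0xFFFF0000
BINARY_VERSION_1 = 0x80010000   # strict TBinaryProtocol header: VERSION_1 | type


def sniff_protocol(payload):
    """Guess which Thrift protocol encoded a message payload (frame body)."""
    data = bytearray(payload)
    if not data:
        return 'unknown'
    if data[0] == COMPACT_PROTOCOL_ID:
        return 'compact'
    if len(data) >= 4:
        (word,) = struct.unpack('!I', bytes(data[:4]))
        if (word & BINARY_VERSION_MASK) == BINARY_VERSION_1:
            return 'binary'
    return 'unknown'


def java_byte_hex(value):
    """Format an unsigned byte the way Integer.toHexString((byte) value) does."""
    signed = value - 256 if value > 127 else value  # Java bytes are signed
    return format(signed & 0xFFFFFFFF, 'x')


if __name__ == '__main__':
    # Leading header bytes of a CALL (type 1) to "append", as each protocol writes them
    compact_msg = bytearray([COMPACT_PROTOCOL_ID, 0x21, 0x00, 0x06]) + bytearray(b'append')
    binary_msg = struct.pack('!I', BINARY_VERSION_1 | 1) + struct.pack('!i', 6) + b'append'
    print(sniff_protocol(compact_msg), sniff_protocol(binary_msg))  # compact binary
    print(java_byte_hex(COMPACT_PROTOCOL_ID), java_byte_hex(bytearray(binary_msg)[0]))  # ffffff82 ffffff80
```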

Here is the complete implementation; feel free to adapt it:

#coding=utf-8
'''
Created on 2013-07-18

@author: Felix
'''
from genpy.flume import ThriftSourceProtocol
from genpy.flume.ttypes import ThriftFlumeEvent
from thrift.transport import TTransport, TSocket
from thrift.protocol import TCompactProtocol


class _Transport(object):
    """Framed socket transport to a Flume Thrift source."""

    def __init__(self, thrift_host, thrift_port, timeout=None, unix_socket=None):
        self.thrift_host = thrift_host
        self.thrift_port = thrift_port
        self.timeout = timeout
        self.unix_socket = unix_socket
        self._socket = TSocket.TSocket(self.thrift_host, self.thrift_port, self.unix_socket)
        # Wrap the socket in a framed transport, as the Thrift source expects
        self._transport_factory = TTransport.TFramedTransportFactory()
        self._transport = self._transport_factory.getTransport(self._socket)

    def connect(self):
        try:
            if self.timeout:
                # TSocket.setTimeout takes milliseconds
                self._socket.setTimeout(self.timeout)
            if not self.is_open():
                # Reopen the existing transport instead of creating a new one:
                # the protocol in FlumeClient keeps a reference to this object.
                self._transport.open()
        except Exception as e:
            print(e)
            self.close()

    def is_open(self):
        return self._transport.isOpen()

    def get_transport(self):
        return self._transport

    def close(self):
        self._transport.close()


class FlumeClient(object):
    """Thin wrapper around the generated ThriftSourceProtocol client."""

    def __init__(self, thrift_host, thrift_port, timeout=None, unix_socket=None):
        self._transObj = _Transport(thrift_host, thrift_port, timeout=timeout, unix_socket=unix_socket)
        # Must be the compact protocol; TBinaryProtocol is rejected by the server
        self._protocol = TCompactProtocol.TCompactProtocol(trans=self._transObj.get_transport())
        self.client = ThriftSourceProtocol.Client(iprot=self._protocol, oprot=self._protocol)
        self._transObj.connect()

    def send(self, event):
        """Send one ThriftFlumeEvent; returns the Status from the source, or None on error."""
        try:
            return self.client.append(event)
        except Exception as e:
            print(e)
        finally:
            # Reopen the connection if the call left it closed
            self._transObj.connect()

    def send_batch(self, events):
        """Send a list of ThriftFlumeEvent objects in a single appendBatch call."""
        try:
            return self.client.appendBatch(events)
        except Exception as e:
            print(e)
        finally:
            self._transObj.connect()

    def close(self):
        self._transObj.close()


if __name__ == '__main__':
    import random

    flume_client = FlumeClient('192.168.1.141', 4141)
    # ThriftFlumeEvent(headers, body)
    event = ThriftFlumeEvent({'a': 'hello', 'b': 'world'}, 'events under hello world2')
    events = [ThriftFlumeEvent({'a': 'hello', 'b': 'world'},
                               'events under hello world%s' % random.randint(0, 1000))
              for _ in range(100)]
    flume_client.send(event)
    flume_client.send_batch(events)
    flume_client.close()

The code above is also available on GitHub: https://github.com/sinolambda/pyflume
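`send_batch` ships the whole list in one `appendBatch` call, so a very large list becomes one very large request. One way to bound the request size is to split the list first. The `chunked` and `send_in_batches` helpers below are hypothetical additions, not part of pyflume, and the default size of 100 is only an example; it is probably wise to keep each batch within what your channel accepts in one transaction, which depends on your agent configuration.

```python
def chunked(items, size):
    """Yield consecutive slices of items, each at most size long."""
    if size < 1:
        raise ValueError('size must be positive')  # raised on first iteration
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_in_batches(client, events, size=100):
    """Call client.send_batch once per slice and collect the return values."""
    return [client.send_batch(batch) for batch in chunked(events, size)]
```

With the client above: `send_in_batches(flume_client, events, size=50)` issues two `appendBatch` calls for the 100 sample events.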
