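I have recently been working on real-time log collection with Flume, and this note records a pitfall I ran into with the MemoryChannel transactionCapacity parameter. transactionCapacity defaults to 100, which I took to mean that the sink on the collecting agent commits a transaction (that is, sends the events on to the next destination) only after it has gathered 100 events. So I lowered transactionCapacity to 10 to see whether delivery would become closer to real time. Instead, the log-collecting agent reported an error at startup: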
16/04/29 09:36:15 ERROR sink.AbstractRpcSink: Rpc Sink avro-sink: Unable to get event from channel memoryChannel. Exception follows.
org.apache.flume.ChannelException: Take list for MemoryTransaction, capacity 10 full, consider committing more frequently, increasing capacity, or increasing thread count
at org.apache.flume.channel.MemoryChannel$MemoryTransaction.doTake(MemoryChannel.java:96)
at org.apache.flume.channel.BasicTransactionSemantics.take(BasicTransactionSemantics.java:113)
at org.apache.flume.channel.BasicChannelSemantics.take(BasicChannelSemantics.java:95)
at org.apache.flume.sink.AbstractRpcSink.process(AbstractRpcSink.java:354)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:745)
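Key point: the sink's batch-size must not be greater than the channel's transactionCapacity. What does the sink's batch-size mean? It is the number of events the sink takes from the channel in one go to send. The send is ultimately performed as a transaction, so the events of a batch are first placed into the transaction's buffer queue (takeList). This is a double-ended queue, which lets the transaction roll back on failure (that is, put the taken events back into the MemoryChannel's queue), and its capacity is the value of transactionCapacity. In my case transactionCapacity was 10 while the Avro sink was still using its batch-size (100 by default unless configured otherwise), so the takeList filled up before the batch was complete and the take failed with the exception shown above. Lowering transactionCapacity therefore also requires lowering the sink's batch-size to the same value or less.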
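
The relationship between the sink's batch size and the take list can be illustrated with a small stand-alone sketch. This is not Flume's source code: the class TakeListSketch, the nested SketchTransaction, and the method tryBatch are hypothetical names made up for illustration, and the sketch only assumes what is described above, namely a bounded double-ended take list whose capacity is transactionCapacity and which is emptied back into the channel queue on rollback.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical, simplified model of a memory channel transaction (not Flume's actual class).
class TakeListSketch {

    static final class SketchTransaction {
        private final Deque<String> channelQueue;           // stands in for the channel's event queue
        private final LinkedBlockingDeque<String> takeList; // bounded by transactionCapacity

        SketchTransaction(Deque<String> channelQueue, int transactionCapacity) {
            this.channelQueue = channelQueue;
            this.takeList = new LinkedBlockingDeque<>(transactionCapacity);
        }

        // Moves one event from the channel queue into the take list; returns null if the queue is empty.
        String take() {
            if (takeList.remainingCapacity() == 0) {
                throw new IllegalStateException(
                        "Take list full, capacity " + takeList.size());
            }
            String event = channelQueue.pollFirst();
            if (event != null) {
                takeList.addLast(event);
            }
            return event;
        }

        // On success the taken events are simply dropped from the take list.
        void commit() {
            takeList.clear();
        }

        // On failure the taken events go back to the head of the queue in their original order.
        void rollback() {
            while (!takeList.isEmpty()) {
                channelQueue.addFirst(takeList.removeLast());
            }
        }
    }

    // Tries to take up to batchSize events in one transaction; returns false if the take list overflowed.
    static boolean tryBatch(Deque<String> queue, int transactionCapacity, int batchSize) {
        SketchTransaction tx = new SketchTransaction(queue, transactionCapacity);
        try {
            for (int i = 0; i < batchSize; i++) {
                if (tx.take() == null) {
                    break; // channel is empty, send what we have
                }
            }
            tx.commit();
            return true;
        } catch (IllegalStateException e) {
            tx.rollback();
            return false;
        }
    }

    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>();
        for (int i = 0; i < 200; i++) {
            queue.addLast("event-" + i);
        }
        // batch size 100 with transactionCapacity 10: the 11th take overflows and everything is rolled back
        System.out.println(tryBatch(queue, 10, 100) + ", events left: " + queue.size());
        // batch size 100 with transactionCapacity 100: the whole batch fits and is committed
        System.out.println(tryBatch(queue, 100, 100) + ", events left: " + queue.size());
    }
}
```

In an agent configuration, the same rule means keeping the sink's batch-size property at or below the memory channel's transactionCapacity property whenever either one is changed.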