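Pulsar supports scaling and migration that are transparent to applications.
For brokers, both upgrading and scaling out are very simple, so they are not covered here. For bookies, however, there are a few things to watch out for.
autorecovery
Disable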
bookkeeper shell autorecovery -disable
Enable
bookkeeper shell autorecovery -enable
Enable autorecovery when migrating bookies: it automatically copies the messages held by a shut-down bookie onto the newly added bookies.
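Since the two flags differ by only a few characters, a small wrapper can make the toggle harder to mistype during a migration. This is only a sketch: the function name `set_autorecovery` and the `BK_BIN` variable are my own, and it assumes the `bookkeeper` launcher is on the PATH.

```shell
# Path to the bookkeeper launcher; override BK_BIN if it is not on PATH (assumption).
BK_BIN="${BK_BIN:-bookkeeper}"

# set_autorecovery on|off
# Hypothetical helper that maps a readable state onto the two commands shown above.
set_autorecovery() {
    case "$1" in
        on)  "$BK_BIN" shell autorecovery -enable ;;
        off) "$BK_BIN" shell autorecovery -disable ;;
        *)   echo "usage: set_autorecovery on|off" >&2; return 2 ;;
    esac
}
```

Calling `set_autorecovery on` before taking the old bookie down, and `set_autorecovery off` for planned restarts, mirrors the two commands above.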
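How to check which ledgers are being copied
Show BookKeeper's list of under-replicated ledgers (this tells you whether all messages from the decommissioned bookies have been fully copied)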
bookkeeper shell listunderreplicated
Show the under-replicated ledgers for one particular bookie (ledgers whose missing replica is that bookie)
bookkeeper shell listunderreplicated -missingreplica 172.16.4.224:3181
Show the metadata of a given ledger ID
bookkeeper shell ledgermetadata -ledgerid 89
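To avoid re-running listunderreplicated by hand while waiting for a drained bookie to be copied, the check can be put in a polling loop. The sketch below rests on some assumptions: the helper names, the `BK_BIN` variable and the default interval are mine, and it assumes the command prints each under-replicated ledger ID as a bare integer on its own line (other lines, such as log output, are ignored).

```shell
# Path to the bookkeeper launcher; override BK_BIN if it is not on PATH (assumption).
BK_BIN="${BK_BIN:-bookkeeper}"

# Print how many under-replicated ledgers are currently listed.
# Extra arguments (e.g. -missingreplica host:port) are passed through.
# Assumption: each ledger ID is printed as a bare integer on its own line.
count_underreplicated() {
    cu_out=$("$BK_BIN" shell listunderreplicated "$@" 2>/dev/null) || return 1
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    printf '%s\n' "$cu_out" | grep -cE '^[0-9]+$' || true
}

# wait_until_replicated [interval_seconds] [max_polls]
# Returns 0 once the list is empty, 1 if max_polls is exhausted,
# 2 if the listing command itself fails.
wait_until_replicated() {
    wur_interval="${1:-30}"
    wur_tries="${2:-120}"
    while [ "$wur_tries" -gt 0 ]; do
        wur_left=$(count_underreplicated) || return 2
        if [ "$wur_left" -eq 0 ]; then
            echo "no under-replicated ledgers left"
            return 0
        fi
        echo "$wur_left ledger(s) still under-replicated"
        wur_tries=$((wur_tries - 1))
        sleep "$wur_interval"
    done
    return 1
}
```

For example, `wait_until_replicated 60 30` polls once a minute for up to half an hour; if a ledger stays in the list, `bookkeeper shell ledgermetadata -ledgerid <id>` shows which bookies its ensemble still points at.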
Problem 1
https://github.com/apache/bookkeeper/issues/2001
I ran into this bug.
The symptom is:
13:34:36.437 [db-storage-cleanup-16-1] WARN org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage - Failed to cleanup db indexes
org.apache.bookkeeper.bookie.Bookie$NoEntryException: Entry -1 not found in 630856964063500820
at org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex.getLastEntryInLedgerInternal(EntryLocationIndex.java:123) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex.removeOffsetFromDeletedLedgers(EntryLocationIndex.java:219) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.lambda$checkpoint$7(SingleDirectoryDbLedgerStorage.java:624) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_181]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
13:35:36.359 [db-storage-cleanup-16-1] INFO org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex - Deleting indexes for ledgers: [32768, 32771, 32774, 32777, 32780, 32783, 32786, 32789, 32792, 32795, 32798, 32801, 32804, 32807, 32810, 32813, 32816, 32819, 32822, 32825, 32828, 32831, 32834, 32837, 32840, 32843, 32846, 32849, 32852, 32855, 32858, 32861, 32864, 32867, 32870, 32873, 32876, 32879
Not resolved yet.
Problem 2
There was also an error about not having enough available bookies:
12:19:53.378 [ReplicationWorker] WARN org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [Bookie:172.16.4.229:3181, Bookie:172.16.4.230:3181, Bookie:172.16.4.222:3181], allBookies [Bookie:172.16.4.222:3181, Bookie:172.16.4.229:3181, Bookie:172.16.4.230:3181].
12:19:53.378 [ReplicationWorker] WARN org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to choose a bookie: excluded [Bookie:172.16.4.229:3181, Bookie:172.16.4.230:3181, Bookie:172.16.4.222:3181], fallback to choose bookie randomly from the cluster.
12:19:53.378 [ReplicationWorker] WARN org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [Bookie:172.16.4.229:3181, Bookie:172.16.4.230:3181, Bookie:172.16.4.222:3181], allBookies [Bookie:172.16.4.229:3181, Bookie:172.16.4.230:3181, Bookie:172.16.4.222:3181].
12:19:53.378 [ReplicationWorker] WARN org.apache.bookkeeper.replication.ReplicationWorker - BKNotEnoughBookiesException while replicating the fragment
org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException: Not enough non-faulty bookies available
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.selectRandomInternal(RackawareEnsemblePlacementPolicyImpl.java:989) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.selectRandom(RackawareEnsemblePlacementPolicyImpl.java:907) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.selectFromNetworkLocation(RackawareEnsemblePlacementPolicyImpl.java:797) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.selectFromNetworkLocation(RackawareEnsemblePlacementPolicy.java:200) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.selectFromNetworkLocation(RackawareEnsemblePlacementPolicyImpl.java:757) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.selectFromNetworkLocation(RackawareEnsemblePlacementPolicy.java:221) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.replaceBookie(RackawareEnsemblePlacementPolicyImpl.java:659) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.replaceBookie(RackawareEnsemblePlacementPolicy.java:114) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.client.BookKeeperAdmin.getReplacementBookiesByIndexes(BookKeeperAdmin.java:997) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.client.BookKeeperAdmin.replicateLedgerFragment(BookKeeperAdmin.java:1045) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.replication.ReplicationWorker.rereplicate(ReplicationWorker.java:296) [org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.replication.ReplicationWorker.rereplicate(ReplicationWorker.java:249) [org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at org.apache.bookkeeper.replication.ReplicationWorker.run(ReplicationWorker.java:210) [org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
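After asking Pulsar expert sijie, I disabled the AutoRecovery feature on the bookies and restarted all of them, and the error was no longer thrown. If you run into the same thing, give this a try. (A note on shutting down bookies: it is best to stop the producers first, otherwise messages may be sent more than once. Version 2.4 is expected to support message transactions, which should solve this problem.)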
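In the "Failed to find 1 bookies" lines above, excludeBookies and allBookies contain the same three addresses, so there is no bookie left for the replication worker to pick. To check this quickly on other log lines, the two lists can be diffed. This is a rough sketch that assumes the log line keeps the exact "excludeBookies [...], allBookies [...]" layout shown above; the function names are my own.

```shell
# bookie_list LABEL LINE
# Print the comma-separated entries of the bracketed list that follows LABEL
# (e.g. "allBookies") in LINE, one per line, sorted and de-duplicated.
bookie_list() {
    printf '%s\n' "$2" \
        | sed -n "s/.*$1 \[\([^]]*\)].*/\1/p" \
        | tr ',' '\n' \
        | sed 's/^ *//; s/ *$//' \
        | sed '/^$/d' \
        | sort -u
}

# candidate_bookies LINE
# Print the bookies listed in allBookies but not in excludeBookies,
# i.e. the ones that could still be chosen as a replacement.
candidate_bookies() {
    cb_all=$(mktemp)
    cb_ex=$(mktemp)
    bookie_list allBookies "$1" > "$cb_all"
    bookie_list excludeBookies "$1" > "$cb_ex"
    # comm -23 keeps the lines that appear only in the first sorted file.
    comm -23 "$cb_all" "$cb_ex"
    rm -f "$cb_all" "$cb_ex"
}
```

Feeding it one of the WARN lines, for example `candidate_bookies "$(grep 'Failed to find' bookie.log | tail -n 1)"` (the log file name here is hypothetical), prints nothing when every known bookie is excluded, which is the situation in the log above.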