Timeouts when building a Solr index from HBase

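This post records some problems I ran into while building a Solr secondary index for data stored in HBase. The traditional, fairly safe approach is to read the HBase data through the Java API and write it into Solr at the same time. The cluster has five servers, and a full table scan over several hundred million rows is still very hard on it.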
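A minimal, dependency-free skeleton of that "read from HBase, write to Solr" loop is sketched below. It assumes the scan results arrive as an `Iterable` and that each batch of documents goes to a sink in one call. In a real job the source would presumably be an HBase `ResultScanner` (which implements `Iterable<Result>`) and the sink a SolrJ `SolrClient.add(...)` call; both are stubbed out here, and the class name `BatchingIndexer` and the batch size are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical skeleton of the indexing job: convert each scanned row to a
// document and hand documents to the sink in fixed-size batches.
final class BatchingIndexer<R, D> {
    private final Function<R, D> toDocument;  // e.g. HBase Result -> Solr document
    private final Consumer<List<D>> sink;     // e.g. one SolrClient.add(...) per batch
    private final int batchSize;

    BatchingIndexer(Function<R, D> toDocument, Consumer<List<D>> sink, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.toDocument = toDocument;
        this.sink = sink;
        this.batchSize = batchSize;
    }

    /** Drains the source, flushing every batchSize documents; returns the row count. */
    long run(Iterable<R> source) {
        List<D> buffer = new ArrayList<>(batchSize);
        long total = 0;
        for (R row : source) {
            buffer.add(toDocument.apply(row));
            total++;
            if (buffer.size() == batchSize) {
                sink.accept(buffer);
                buffer = new ArrayList<>(batchSize);  // sink may keep the old list
            }
        }
        if (!buffer.isEmpty()) {
            sink.accept(buffer);  // flush the final partial batch
        }
        return total;
    }
}
```

Batching keeps the number of Solr round trips low; the scan side is where the trouble described below shows up.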

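I tried filtering on a time value stored in a column, but timeouts occurred every time. The rowkey is built by concatenating several fields that together are unique, separated by "|". To avoid hotspots, the key starts with a hash-based prefix: the last two digits of the hash of the account field.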
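The rowkey layout just described can be sketched as follows. The post does not say which hash function is used or exactly how "the last two digits" are taken, so `String.hashCode()` with `Math.floorMod(..., 100)` is an assumption, as are the class name and the sample field values.

```java
import java.util.List;
import java.util.Locale;
import java.util.StringJoiner;

// Minimal sketch of the salted rowkey: salt|field1|field2|...
// ASSUMPTIONS (not from the post): String.hashCode() as the hash,
// floorMod to take the last two digits, and which field is the account.
final class SaltedRowKey {
    private static final String SEP = "|";

    /** Two-digit salt "00".."99" derived from the account field's hash. */
    static String salt(String account) {
        // floorMod keeps the value in 0..99 even when hashCode() is negative
        return String.format(Locale.ROOT, "%02d", Math.floorMod(account.hashCode(), 100));
    }

    /** Joins the salt and the fields that together identify a row. */
    static String build(String account, List<String> uniqueFields) {
        StringJoiner key = new StringJoiner(SEP);
        key.add(salt(account));
        for (String field : uniqueFields) {
            key.add(field);
        }
        return key.toString();
    }

    public static void main(String[] args) {
        // Hypothetical values, for illustration only
        List<String> fields = List.of("ACCT0001", "20190218", "TXN42");
        System.out.println(build("ACCT0001", fields));
    }
}
```

Because the salt depends only on the account, all rows for one account share a prefix, and the key space is spread over at most 100 prefixes.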
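To keep the data volume manageable, the scan that reads HBase used start and stop keys plus a filter on time. The idea was to cut down the amount of data in each key range, but it backfired and performance actually got worse. The application reported the following error: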
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Wed Mar 06 11:01:28 CST 2019, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60103: Call to pbigdata1/196.1.40.33:21302 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=5267, waitTime=60002, operationTimeout=60000 expired. row '97|017701800122410025|20181015|260500000032|8010002753|22419999|20190218^@' on table 'tbl_accounting_entry' at region=tbl_accounting_entry,94|019801370002199|20190214|M00020010750|0080064230|20010505|20190219,1551715791119.1f23765783d3cbf72861b9c5bdf612b2., hostname=pbigdata1,21302,1551669174213, seqNum=251080

        at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:275)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:240)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:62)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:217)
        at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:330)
        at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:411)
        at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:374)
        at com.hadoop.solr.HbaseSolr_AccountingEntry.run(HbaseSolr_AccountingEntry.java:33)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=60103: Call to pbigdata1/196.1.40.33:21302 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=5267, waitTime=60002, operationTimeout=60000 expired. row '97|017701800122410025|20181015|260500000032|8010002753|22419999|20190218^@' on table 'tbl_accounting_entry' at region=tbl_accounting_entry,94|019801370002199|20190214|M00020010750|0080064230|20010505|20190219,1551715791119.1f23765783d3cbf72861b9c5bdf612b2., hostname=pbigdata1,21302,1551669174213, seqNum=251080
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:176)
        at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
        ... 3 more
Caused by: java.io.IOException: Call to pbigdata1/196.1.40.33:21302 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=5267, waitTime=60002, operationTimeout=60000 expired.
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.wrapException(AbstractRpcClient.java:285)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1281)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:224)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:329)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32741)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:217)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:63)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:217)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:387)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:361)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:138)
        ... 4 more
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=5267, waitTime=60002, operationTimeout=60000 expired.
        at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1255)
        ... 13 more
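The trace shows a single scanner `next()` call exceeding the 60000 ms operation timeout (`callTimeout=60000`). One plausible mechanism, stated as an assumption rather than something verified in this post: each scanner RPC asks the region server for a batch of rows, and a server-side filter that rejects most rows forces the server to read far more rows before it has a batch to return. The back-of-the-envelope model below makes that arithmetic concrete; `caching`, `selectivity` and `microsPerRow` are hypothetical inputs, not measurements from this cluster.

```java
// Rough, hypothetical model of one scanner next() RPC when a server-side
// filter discards most rows. All inputs are assumptions, not measurements.
final class FilterScanEstimate {
    /** Expected rows the region server reads to collect `caching` matching rows. */
    static long rowsExamined(int caching, double selectivity) {
        if (selectivity <= 0 || selectivity > 1) {
            throw new IllegalArgumentException("selectivity must be in (0, 1]");
        }
        return (long) Math.ceil(caching / selectivity);
    }

    /** Estimated server-side time for that one RPC, in milliseconds. */
    static double rpcMillis(int caching, double selectivity, double microsPerRow) {
        return rowsExamined(caching, selectivity) * microsPerRow / 1000.0;
    }

    /** True if the estimate exceeds the client timeout (60000 ms in the trace). */
    static boolean exceeds(int caching, double selectivity, double microsPerRow, long timeoutMillis) {
        return rpcMillis(caching, selectivity, microsPerRow) > timeoutMillis;
    }

    public static void main(String[] args) {
        // Hypothetical: 100 rows per RPC, 1 match per 2^20 rows, 1 microsecond per row
        double selectivity = 1.0 / (1 << 20);
        System.out.println(rowsExamined(100, selectivity));
        System.out.println(exceeds(100, selectivity, 1.0, 60000));
    }
}
```

With no filter (selectivity 1) the same batch needs only 100 row reads, which is consistent with the observation that the unfiltered scan stopped timing out.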

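The HBase server side also logged a warning that the response took too long. Reading data by rowkey should not be that slow, so the problem was eventually traced to the filter in the HBase scan, which caused the timeouts. After removing the filter, the data was read again without any further errors.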
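If a time restriction is still needed once the server-side filter is gone, one option (not something the post describes) is to scan without a filter and drop out-of-window rows on the client. The sketch assumes dates are `yyyyMMdd` strings and takes a hypothetical `dateOf` function that, in a real job, would read the time column from each scanned row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical client-side time window, applied after dropping the
// server-side filter. Dates are assumed to be yyyyMMdd strings.
final class ClientSideTimeWindow {
    /** Inclusive check; equal-length yyyyMMdd strings sort like the dates they encode. */
    static boolean inWindow(String yyyyMMdd, String from, String to) {
        return yyyyMMdd != null
                && yyyyMMdd.length() == 8
                && yyyyMMdd.compareTo(from) >= 0
                && yyyyMMdd.compareTo(to) <= 0;
    }

    /** Keeps the rows whose date, read via dateOf, falls inside [from, to]. */
    static <R> List<R> keep(Iterable<R> rows, Function<R, String> dateOf, String from, String to) {
        List<R> kept = new ArrayList<>();
        for (R row : rows) {
            if (inWindow(dateOf.apply(row), from, to)) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```

The trade-off is that every row in the key range still crosses the network, but each scanner RPC can return as soon as its batch is full instead of searching the region for matches.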

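Still, I think the rowkey design has a problem: the keys are not spread out enough. The range of the leading prefix should be wider, or the hash value could simply be used as the entire rowkey.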
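Both alternatives can be sketched briefly. MD5 and the class name are assumptions (any stable hash would do); the composite key is the "|"-joined string from the current design.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of the two alternatives: a wider hash prefix, or the hash as the
// whole rowkey. MD5 is an assumption; any stable hash would work.
final class WiderRowKey {
    /** Lowercase hex MD5 of the UTF-8 bytes of s. */
    static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(Character.forDigit((b >> 4) & 0xF, 16))
                   .append(Character.forDigit(b & 0xF, 16));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5
            throw new IllegalStateException(e);
        }
    }

    /** Option 1: wider prefix, e.g. 4 hex chars gives 65536 prefixes instead of 100. */
    static String widerSalt(String compositeKey, int prefixHexChars) {
        return md5Hex(compositeKey).substring(0, prefixHexChars) + "|" + compositeKey;
    }

    /** Option 2: the hash is the whole rowkey; the original fields go into columns. */
    static String hashOnly(String compositeKey) {
        return md5Hex(compositeKey);
    }
}
```

Option 2 spreads rows most evenly, but a hash-only rowkey can no longer be range-scanned by account or date; a row is reachable only by recomputing its hash or by a full scan.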
