MongoDB read test

1. HBase already holds a large amount of data (row count = 3849920). Reading all the data from MongoDB again produced the following log:

[azureuser@h3crd-wlan31 ~]$ kubectl logs histclientsdataetl-20161028141055-doize |grep @ETL@
@ETL@ getLatestOnlineTime: 2016-10-27 23:59:59.815
@ETL@ get uplineDate from hbase: 2016-10-27 23:59:59.815
@ETL@ getCurrentPosFromDB: 2016-10-27 23:59:59.815
@ETL@ Start finding at time: 2016-10-28 15:00:17.991, startFindDayIndex=17101
@ETL@ Finding finished at time: 2016-10-28 15:00:21.551
@ETL@ Wrting to Hbase has been finished !
@ETL@readCount: 3849920,putCount: 0,time: 2016-10-28 15:11:48.784
@ETL@ child file name: 1.1.1.1-histclientsdataetl
@ETL@ Change logfile name has finished: 2016-10-28 15:11:48.792
@ETL@ cmd: sh /home/mongo-hive-hbase/Hive_HistClientsInfoAnalysis.sh
@ETL@: exec has finished!
@ETL@ new Executing remote hive shell has finished: 2016-10-28 15:12:49.273
@ETL@ result:

 

Result after changing the code to fetch in batches (10695026 documents, about 4 GB in total):

// Query with an explicit batch size of 50000 documents per batch.
public MongoCursor<Document> find(Bson filter)
{
    FindIterable<Document> findIterable = this.collection.find(filter).batchSize(50000);
    MongoCursor<Document> mongoCursor = findIterable.iterator();

    return mongoCursor;
}

 

[azureuser@h3crd-wlan31 ~]$ kubectl logs histclientsdataetl-xubd1 |grep @ETL@
@ETL@ getLatestOnlineTime: 2016-10-28 23:59:59.982
@ETL@ get uplineDate from hbase: 2016-10-28 23:59:59.982
@ETL@ getCurrentPosFromDB: 2016-10-28 23:59:59.982
@ETL@ Start finding at time: 2016-10-31 00:05:02.858, startFindDayIndex=17102
@ETL@ Finding finished at time: 2016-10-31 00:05:04.154
@ETL@ Wrting to Hbase has been finished !
@ETL@ upDateLatestOnlineTime: 2016-10-30 23:59:59.999
@ETL@readCount: 10695026,putCount: 10695026,time: 2016-10-31 00:48:41.78

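The data volume is about 2.8 times that of the previous run, and the pull took about 3.8 times as long (roughly 43.6 minutes versus 11.5 minutes). The data was read from a MongoDB secondary server.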
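The ratios can be re-derived from the log timestamps. The following is a minimal sketch, not part of the original ETL job: the class name EtlRunStats and its helpers are hypothetical. It assumes the pull is the interval between the "Finding finished" line and the "readCount" line, and that a short fraction such as ".78" means 0.78 s.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

class EtlRunStats {
    // Accepts "uuuu-MM-dd HH:mm:ss" followed by '.' and 1 to 9 fractional digits,
    // so both "15:11:48.784" and "00:48:41.78" parse (".78" is read as 0.78 s).
    private static final DateTimeFormatter LOG_TIME = new DateTimeFormatterBuilder()
            .appendPattern("uuuu-MM-dd HH:mm:ss")
            .appendFraction(ChronoField.NANO_OF_SECOND, 1, 9, true)
            .toFormatter();

    /** Seconds between two log timestamps. */
    static double elapsedSeconds(String start, String end) {
        LocalDateTime s = LocalDateTime.parse(start, LOG_TIME);
        LocalDateTime e = LocalDateTime.parse(end, LOG_TIME);
        return Duration.between(s, e).toMillis() / 1000.0;
    }

    /** Documents read per second over the interval. */
    static double docsPerSecond(long readCount, String start, String end) {
        return readCount / elapsedSeconds(start, end);
    }

    public static void main(String[] args) {
        // Timestamps and counts copied from the two logs above.
        // Run 1 is the run before the batch change, run 2 uses batchSize(50000).
        double t1 = elapsedSeconds("2016-10-28 15:00:21.551", "2016-10-28 15:11:48.784");
        double t2 = elapsedSeconds("2016-10-31 00:05:04.154", "2016-10-31 00:48:41.78");
        System.out.printf("data ratio: %.2f, time ratio: %.2f%n",
                10695026.0 / 3849920, t2 / t1);
        System.out.printf("run 1: %.0f docs/s, run 2: %.0f docs/s%n",
                3849920 / t1, 10695026 / t2);
    }
}
```

Under these assumptions the throughput drops from roughly 5600 to roughly 4090 documents per second. The two runs are not strictly comparable, though: the second log shows putCount: 10695026, so that interval also includes writing to HBase, while the first run reports putCount: 0.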

 

After modifying the code further to double the batch size:

// Same query, with the batch size doubled to 100000 documents per batch.
public MongoCursor<Document> find(Bson filter)
{
    FindIterable<Document> findIterable = this.collection.find(filter).batchSize(100000);
    MongoCursor<Document> mongoCursor = findIterable.iterator();

    return mongoCursor;
}

 

[azureuser@h3crd-wlan31 ~]$ kubectl logs histclientsdataetl-20161031095007-qv4fd |grep @ETL@
@ETL@ getLatestOnlineTime: 2016-10-30 23:59:59.999
@ETL@ get uplineDate from hbase: 2016-10-30 23:59:59.999
@ETL@ getCurrentPosFromDB: 2016-10-30 23:59:59.999
@ETL@ Start finding at time: 2016-10-31 10:26:55.071, startFindDayIndex=17104
@ETL@ Finding finished at time: 2016-10-31 10:26:56.406
@ETL@ Wrting to Hbase has been finished !
@ETL@readCount: 7936993,putCount: 0,time: 2016-10-31 11:04:02.219

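Compared with the first run, the data volume is about 2.06 times as large and the elapsed time about 3.2 times as long (roughly 37 minutes versus 11.5 minutes).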
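One possible reason the larger batch size did not help can be estimated on paper. The MongoDB manual states that a single batch does not exceed the maximum BSON document size (16 MB), so a very large batchSize may be capped by bytes rather than by document count. The sketch below is only a rough estimate under stated assumptions: the class BatchEstimate is hypothetical, "about 4 GB" is taken as exactly 4 GiB of document data, and documents are treated as uniform in size.

```java
class BatchEstimate {
    // Assumed upper bound on one reply batch: 16 MiB.
    static final long MAX_BATCH_BYTES = 16L * 1024 * 1024;

    /** Average document size in bytes (integer division). */
    static long avgDocBytes(long totalBytes, long docCount) {
        return totalBytes / docCount;
    }

    /** Documents per batch once the size cap is applied to the requested batchSize. */
    static long effectiveBatchDocs(long requestedBatchSize, long avgDocBytes) {
        long docsThatFit = MAX_BATCH_BYTES / avgDocBytes;
        return Math.min(requestedBatchSize, docsThatFit);
    }

    /** Number of batches needed to drain the cursor (ceiling division). */
    static long roundTrips(long docCount, long docsPerBatch) {
        return (docCount + docsPerBatch - 1) / docsPerBatch;
    }

    public static void main(String[] args) {
        long docCount = 10_695_026L;               // readCount from the batchSize(50000) run
        long totalBytes = 4L * 1024 * 1024 * 1024; // "about 4 GB", assumed to be 4 GiB
        long avg = avgDocBytes(totalBytes, docCount);
        for (long requested : new long[] {50_000L, 100_000L}) {
            long perBatch = effectiveBatchDocs(requested, avg);
            System.out.printf("batchSize=%d -> about %d docs per batch, %d round trips%n",
                    requested, perBatch, roundTrips(docCount, perBatch));
        }
    }
}
```

With an average of about 400 bytes per document, both 50000 and 100000 land on the same size-capped batch of roughly 42000 documents, which would be consistent with doubling the batch size making little difference. This is an estimate only; the real document sizes and the server version in use were not reported.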

Reposted from: https://www.cnblogs.com/zhengchunhao/p/6008829.html
