Testing whether the cluster can still read and write after one config server goes down

Test architecture:

Test objective:


    Verify whether the cluster can still read and write data after one config server goes down.

Test reason:

    Last week memory was running tight and we planned to add more. While inspecting the servers, one machine was shut down for the check.
    After it went down, the web application started failing: users could not log in, and everything returned errors.

    As I understood it, since the config servers are 3-way redundant, shutting one down should have had no impact, and certainly not to
    the point of blocking logins and breaking the application; at worst the data would become unbalanced.

    Yet when I tried to log in with MongoChef, the database connection indeed failed.


Test results:

    The tests below confirmed my understanding: when one config server goes down, there are only errors about failing to connect to it,
    yet you can still log in to the database, and the insert script (the JS code run after logging in with mongo -port 28000) keeps running normally.


    So why did the web application fail at the time? Their PHP configuration file contained the following:
    $connect = new MongoClient("mongodb://192.168.2.188:28000,192.168.2.132:28000", array("replicaSet" => "myReplSet"));

    There was no handling of MongoConnectionException. Without that error handling, when an operation on an open collection fails,
    the application does not reconnect to the database and simply errors out. If it reconnected at that point, it should be able to find another healthy node to connect to.
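    The missing error handling could be sketched like this (a hypothetical retry wrapper written in JavaScript rather than PHP; `withReconnect`, the seed list, and the simulated failure are all illustrative assumptions, not the application's actual code):

    ```javascript
    // Hypothetical sketch: retry an operation against a seed list of mongos
    // nodes, moving on to the next node when one connection fails.
    function withReconnect(seeds, operation, maxAttempts) {
      let lastError;
      for (let attempt = 0; attempt < maxAttempts; attempt++) {
        const node = seeds[attempt % seeds.length]; // rotate through seed nodes
        try {
          return operation(node); // success: return the result immediately
        } catch (err) {
          lastError = err; // remember the failure and try the next node
        }
      }
      throw lastError; // every attempt failed
    }

    // Simulated flaky cluster: the first node is down, the second one works.
    const seeds = ['192.168.2.188:28000', '192.168.2.132:28000'];
    const result = withReconnect(seeds, (node) => {
      if (node === '192.168.2.188:28000') throw new Error('connection refused');
      return `connected to ${node}`;
    }, 3);
    console.log(result); // connected to 192.168.2.132:28000
    ```

    With a wrapper like this, the first refused connection is swallowed and the operation succeeds against the remaining healthy mongos, which is the behavior the web application was missing.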



Test environment: MongoDB 3.0.3
    2 shards, 3 config servers, 3 mongos routers




set        host:port            stateStr                                                   
------------------------------------------------------------
shard1        192.168.2.188:28011        PRIMARY                   
shard1        192.168.2.132:28011        SECONDARY               
shard1        192.168.2.134:28011        SECONDARY               

shard2        192.168.2.188:28012        SECONDARY               
shard2        192.168.2.132:28012        SECONDARY               
shard2        192.168.2.134:28012        PRIMARY                   
                



--------------------------------------------------------------------
192.168.2.188:28010        config                           
192.168.2.188:28000        mongos                           

192.168.2.132:28010        config                           
192.168.2.132:28000        mongos                           

192.168.2.134:28010        config                           
192.168.2.134:28000        mongos                       

# Spare server: after 134 went down, the hostname t3 (which mapped to 134) was remapped to 135

192.168.2.135:28010        config                           


The hosts file on each server:
[root@localhost bin]# cat /etc/hosts
192.168.2.188 t1
192.168.2.132 t2
192.168.2.134 t3

192.168.2.135 t4



I. Configure the cluster:

    Initialize replica set shard1.
    Connect with mongo to one of its nodes, e.g. the shard1 member on t1, and run:

    > config = {_id: 'shard1', members: [
                               {_id: 0, host:'t1:28011'}]
                }
    
    > rs.initiate(config);

    rs.add({_id: 1, host:'t2:28011'})
    rs.add({_id: 2, host:'t3:28011'})


    Initialize replica set shard2.
    Connect with mongo to one of its nodes, e.g. the shard2 member on t1, and run:

    > config = {_id: 'shard2', members: [
                               {_id: 0, host:'t1:28012'}]
                }
    
    > rs.initiate(config);

    rs.add({_id: 1, host:'t2:28012'})
    rs.add({_id: 2, host:'t3:28012'})

        # The following error came up:
        shard2:PRIMARY> rs.add({_id: 1, host:'t2:28012'})
        {
            "ok" : 0,
            "errmsg" : "Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: t1:28012; the following nodes did not respond affirmatively: t2:28012 failed with Failed attempt to connect to t2:28012; couldn't connect to server t2:28012 (192.168.2.132), connection attempt failed",
            "code" : 74
        }
        # Cause: the firewall was blocking the port
        # Disable the firewall
        [root@t2 ~]# chkconfig iptables off
        [root@t2 ~]# service iptables stop
        iptables: Flushing firewall rules:                         [  OK  ]
        iptables: Setting chains to policy ACCEPT: nat mangle filter [  OK  ]
        iptables: Unloading modules:                               [  OK  ]
        [root@t2 ~]#
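        The "Quorum check failed" message reflects majority voting: a reconfiguration needs floor(n/2)+1 of the n voting members to respond. A small sketch of that arithmetic (my own illustration, not MongoDB source code):

        ```javascript
        // Majority needed among n voting members: floor(n/2) + 1.
        function votesRequired(votingMembers) {
          return Math.floor(votingMembers / 2) + 1;
        }

        // With 2 voting members (t1 plus the node being added), 2 must respond,
        // which matches "required 2 but only ... 1 voting nodes responded" above.
        console.log(votesRequired(2)); // 2
        console.log(votesRequired(3)); // 2
        ```

        So the add could only succeed once t2 became reachable, which is why opening the firewall fixed it.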



II. Configure sharding



    1. Synchronize the clocks across the servers (e.g. with the cron entry below); otherwise the sharded cluster cannot be built:

    */5 * * * * /usr/sbin/ntpdate -u ntp.sjtu.edu.cn


    2. The mongos startup parameters:
    /opt/mongodb3.0.3/bin/mongos --configdb t1:28010,t2:28010,t3:28010 --port 28000  --logpath /opt/mongodb3.0.3/logs/mongos.log --logappend --fork


    3. Connect to one of the mongos processes and switch to the admin database for the following configuration.
     3.1. Connect to mongos and switch to admin:
     ./mongo 192.168.20.11:28000/admin
     >db
     admin
     3.2. Add the shards.
     Command format:
        replicaSetName/host1[:port][,host2[:port],...]
    
    Run the following:
     >use admin;
     >db.runCommand({addshard:"shard1/t1:28011,t2:28011,t3:28011"});
     >db.runCommand({addshard:"shard2/t1:28012,t2:28012,t3:28012"});

    After adding the shards, change the chunksize to 1 MB to make testing easier:
    >use config
     mongos> db.shards.find()
    { "_id" : "shard1", "host" : "shard1/t1:28011,t2:28011,t3:28011" }
    { "_id" : "shard2", "host" : "shard2/t1:28012,t2:28012,t3:28012" }
    mongos> db.settings.find()
    { "_id" : "chunksize", "value" : 64 }
    mongos> db.settings.update({_id:'chunksize'},{$set:{value:1}})
    WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
    mongos> db.settings.find()
    { "_id" : "chunksize", "value" : 1 }
    mongos>

    The sharded-cluster configuration is now complete.


III. Insert data


    db.runCommand({"enablesharding":"test"})
    db.runCommand({shardcollection:"test.customer",key:{_id:1}});

    # Insert data into the collection:
    use test;
    for (var i = 1; i <= 90000; i++){
    db.customer.insert({_id:i,name:' test name test name test name test name test name test name test name test name test name end',
    address:'address test customer al;dkjfa;lsdfjsal;#^end',province:'test customer province',city:'test customer city'});
    }

    This data is used later to check which shard each document lands on.
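    As a rough sanity check on the 1 MB chunksize, the expected number of chunks can be estimated from the data volume (back-of-envelope only; the ~200-byte average document size is my own estimate from the fields above, and the real splitter also accounts for index and metadata overhead):

    ```javascript
    // Back-of-envelope: how many ~1 MB chunks should 90,000 documents produce?
    const docCount = 90000;
    const avgDocBytes = 200;                // assumed average size of one test document
    const chunkSizeBytes = 1 * 1024 * 1024; // chunksize was set to 1 MB above
    const totalBytes = docCount * avgDocBytes;
    const expectedChunks = Math.ceil(totalBytes / chunkSizeBytes);
    console.log(expectedChunks); // 18
    ```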

    The queries below (run against the config database) show that test.customer has been distributed across shard1 and shard2:

    mongos> db.chunks.find()
    { "_id" : "test.customer-_id_MinKey", "lastmod" : Timestamp(2, 1), "lastmodEpoch" : ObjectId("55b99f3ff5ddb9333eda7464"), "ns" : "test.customer", "min" : { "_id" : { "$minKey" : 1 } }, "max" : { "_id" : 2 }, "shard" : "shard2" }
    { "_id" : "test.customer-_id_2.0", "lastmod" : Timestamp(1, 2), "lastmodEpoch" : ObjectId("55b99f3ff5ddb9333eda7464"), "ns" : "test.customer", "min" : { "_id" : 2 }, "max" : { "_id" : 6 }, "shard" : "shard2" }
    { "_id" : "test.customer-_id_6.0", "lastmod" : Timestamp(2, 2), "lastmodEpoch" : ObjectId("55b99f3ff5ddb9333eda7464"), "ns" : "test.customer", "min" : { "_id" : 6 }, "max" : { "_id" : 2120 }, "shard" : "shard1" }
    { "_id" : "test.customer-_id_2120.0", "lastmod" : Timestamp(2, 3), "lastmodEpoch" : ObjectId("55b99f3ff5ddb9333eda7464"), "ns" : "test.customer", "min" : { "_id" : 2120 }, "max" : { "_id" : 4579 }, "shard" : "shard1" }
    { "_id" : "test.customer-_id_4579.0", "lastmod" : Timestamp(2, 5), "lastmodEpoch" : ObjectId("55b99f3ff5ddb9333eda7464"), "ns" : "test.customer", "min" : { "_id" : 4579 }, "max" : { "_id" : 6693 }, "shard" : "shard1" }
    { "_id" : "test.customer-_id_6693.0", "lastmod" : Timestamp(2, 6), "lastmodEpoch" : ObjectId("55b99f3ff5ddb9333eda7464"), "ns" : "test.customer", "min" : { "_id" : 6693 }, "max" : { "_id" : 9371 }, "shard" : "shard1" }
    { "_id" : "test.customer-_id_9371.0", "lastmod" : Timestamp(2, 7), "lastmodEpoch" : ObjectId("55b99f3ff5ddb9333eda7464"), "ns" : "test.customer", "min" : { "_id" : 9371 }, "max" : { "_id" : { "$maxKey" : 1 } }, "shard" : "shard1" }
    mongos> db.collections.find()
    { "_id" : "test.customer", "lastmod" : ISODate("2015-07-30T03:51:28.671Z"), "dropped" : false, "key" : { "_id" : 1 }, "unique" : false, "lastmodEpoch" : ObjectId("55b99f3ff5ddb9333eda7464") }
    mongos>
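    The chunk documents above are range descriptors: mongos routes each _id to the shard that owns the chunk whose [min, max) range contains it. A simplified routing lookup over the ranges shown (illustrative only; the real mongos routing table also handles $minKey/$maxKey markers and chunk versions):

    ```javascript
    // Simplified routing table built from the chunks listed above.
    // -Infinity / Infinity stand in for $minKey / $maxKey.
    const chunks = [
      { min: -Infinity, max: 2,        shard: 'shard2' },
      { min: 2,         max: 6,        shard: 'shard2' },
      { min: 6,         max: 2120,     shard: 'shard1' },
      { min: 2120,      max: 4579,     shard: 'shard1' },
      { min: 4579,      max: 6693,     shard: 'shard1' },
      { min: 6693,      max: 9371,     shard: 'shard1' },
      { min: 9371,      max: Infinity, shard: 'shard1' },
    ];

    // Find the shard owning the chunk whose [min, max) range contains id.
    function routeToShard(id) {
      const chunk = chunks.find((c) => id >= c.min && id < c.max);
      return chunk.shard;
    }

    console.log(routeToShard(1));    // shard2
    console.log(routeToShard(3000)); // shard1
    ```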


IV. Test killing one config server:
    
    4.1

    [mongo@localhost bin]$ ll /opt/mongodb3.0.3/config
    total 0
    [mongo@localhost bin]$ ./configd_start.sh
    about to fork child process, waiting until server is ready for connections.
    forked process: 2638
    child process started successfully, parent exiting
    [mongo@localhost bin]$ ps -ef|grep mongod
    mongo     2455     1 28 10:51 ?        00:04:14 /opt/mongodb3.0.3/bin/mongod --storageEngine wiredTiger -f shard1.conf
    mongo     2518     1  7 10:51 ?        00:01:11 /opt/mongodb3.0.3/bin/mongod --storageEngine wiredTiger -f shard2.conf
    mongo     2638     1  7 10:53 ?        00:00:59 /opt/mongodb3.0.3/bin/mongod --storageEngine wiredTiger -f configsvr.conf
    mongo     2931  2083  8 11:06 pts/0    00:00:00 grep mongod
    [mongo@localhost bin]$ kill -9 2638
    [mongo@localhost bin]$ ps -ef|grep mongod
    mongo     2455     1 27 10:51 ?        00:04:16 /opt/mongodb3.0.3/bin/mongod --storageEngine wiredTiger -f shard1.conf
    mongo     2518     1  7 10:51 ?        00:01:13 /opt/mongodb3.0.3/bin/mongod --storageEngine wiredTiger -f shard2.conf
    mongo     2941  2083  6 11:06 pts/0    00:00:00 grep mongod
    [mongo@localhost bin]$



    # Logging in on t1 works fine
    [mongo@t1 bin]$ ./mongo -port 28000
    MongoDB shell version: 3.0.3
    connecting to: 127.0.0.1:28000/test
    mongos> show dbs;
    admin   (empty)
    config  (empty)
    test    0.002GB
    mongos>
    
    # Logging in on t2 also works
    [mongo@t2 bin]$ ./mongo -port 28000
    MongoDB shell version: 3.0.3
    connecting to: 127.0.0.1:28000/test
    mongos>

    # Likewise, logging in with the MongoChef client confirmed the same behavior.


    
    4.2.
    Change the shard configuration from hostnames to IPs:

        > config = {_id: 'shard1', members: [
                               {_id: 0, host:'192.168.2.188:28011'}]
                }
    
        > rs.initiate(config);

        rs.add({_id: 1, host:'192.168.2.132:28011'})
        rs.add({_id: 2, host:'192.168.2.134:28011'})



        > config = {_id: 'shard2', members: [
                               {_id: 0, host:'192.168.2.188:28012'}]
                }
    
        > rs.initiate(config);

        rs.add({_id: 1, host:'192.168.2.132:28012'})
        rs.add({_id: 2, host:'192.168.2.134:28012'})

        # Change the hostnames in the mongos parameters to IPs as well

        [mongo@t1 bin]$ cat mongos_router.sh
        /opt/mongodb3.0.3/bin/mongos --configdb 192.168.2.188:28010,192.168.2.132:28010,192.168.2.134:28010 --port 28000  --logpath /opt/mongodb3.0.3/logs/mongos.log --logappend --fork
        [mongo@t1 bin]$

    # Add the shards and change the chunksize to 1 MB for easier testing

     >use admin;
     >db.runCommand({addshard:"shard1/192.168.2.188:28011,192.168.2.132:28011,192.168.2.134:28011"});
     >db.runCommand({addshard:"shard2/192.168.2.188:28012,192.168.2.132:28012,192.168.2.134:28012"});

    db.settings.update({_id:'chunksize'},{$set:{value:1}})

    # With everything configured to use IPs, kill the config server.
    # Then try to connect from the same machine: it fails.


    [mongo@localhost bin]$ ps -ef|grep mongod
    mongo     4062     1 10 11:48 ?        00:09:56 /opt/mongodb3.0.3/bin/mongod --storageEngine wiredTiger -f shard1.conf
    mongo     4082     1  8 11:48 ?        00:08:09 /opt/mongodb3.0.3/bin/mongod --storageEngine wiredTiger -f shard2.conf
    mongo     4102     1 11 11:48 ?        00:10:45 /opt/mongodb3.0.3/bin/mongod --storageEngine wiredTiger -f configsvr.conf
    mongo     4121     1  5 11:49 ?        00:04:55 /opt/mongodb3.0.3/bin/mongos --configdb 192.168.2.188:28010,192.168.2.132:28010,192.168.2.134:28010 --logpath /opt/mongodb3.0.3/logs/mongos.log --logappend --fork
    root      9459  9420  0 13:23 pts/1    00:00:00 tail -330f /opt/mongodb3.0.3/logs/shard1.log
    mongo     9471  2083  5 13:23 pts/0    00:00:00 grep mongod
    [mongo@localhost bin]$ kill -9 4102
    [mongo@localhost bin]$ ps -ef|grep mongod
    mongo     4062     1 10 11:48 ?        00:10:20 /opt/mongodb3.0.3/bin/mongod --storageEngine wiredTiger -f shard1.conf
    mongo     4082     1  8 11:48 ?        00:08:18 /opt/mongodb3.0.3/bin/mongod --storageEngine wiredTiger -f shard2.conf
    mongo     4121     1  5 11:49 ?        00:04:59 /opt/mongodb3.0.3/bin/mongos --configdb 192.168.2.188:28010,192.168.2.132:28010,192.168.2.134:28010 --logpath /opt/mongodb3.0.3/logs/mongos.log --logappend --fork
    root      9459  9420  0 13:23 pts/1    00:00:00 tail -330f /opt/mongodb3.0.3/logs/shard1.log
    mongo     9488  2083  5 13:24 pts/0    00:00:00 grep mongod

    [mongo@localhost bin]$ ./mongo -port 28000
    MongoDB shell version: 3.0.3
    connecting to: 127.0.0.1:28000/test
    2015-07-31T13:25:52.166+0800 W NETWORK  Failed to connect to 127.0.0.1:28000, reason: errno:111 Connection refused
    2015-07-31T13:25:53.514+0800 E QUERY    Error: couldn't connect to server 127.0.0.1:28000 (127.0.0.1), connection attempt failed
        at connect (src/mongo/shell/mongo.js:181:14)
        at (connect):1:6 at src/mongo/shell/mongo.js:181
    exception: connect failed
    [mongo@localhost bin]$ ./mongo -port 28000
    MongoDB shell version: 3.0.3
    connecting to: 127.0.0.1:28000/test
    2015-07-31T13:26:30.823+0800 W NETWORK  Failed to connect to 127.0.0.1:28000, reason: errno:111 Connection refused
    2015-07-31T13:26:32.929+0800 E QUERY    Error: couldn't connect to server 127.0.0.1:28000 (127.0.0.1), connection attempt failed
        at connect (src/mongo/shell/mongo.js:181:14)
        at (connect):1:6 at src/mongo/shell/mongo.js:181
    exception: connect failed

    [mongo@localhost bin]$ ps -ef|grep mongod
    mongo     4062     1 12 11:48 ?        00:11:48 /opt/mongodb3.0.3/bin/mongod --storageEngine wiredTiger -f shard1.conf
    mongo     4082     1  8 11:48 ?        00:08:29 /opt/mongodb3.0.3/bin/mongod --storageEngine wiredTiger -f shard2.conf
    mongo     4121     1  5 11:49 ?        00:05:09 /opt/mongodb3.0.3/bin/mongos --configdb 192.168.2.188:28010,192.168.2.132:28010,192.168.2.134:28010 --logpath /opt/mongodb3.0.3/logs/mongos.log --logappend --fork
    root      9459  9420  0 13:23 pts/1    00:00:00 tail -330f /opt/mongodb3.0.3/logs/shard1.log
    mongo     9754  2083  7 13:26 pts/0    00:00:00 grep mongod
    [mongo@localhost bin]$ ./mongo -port 28000
    MongoDB shell version: 3.0.3
    connecting to: 127.0.0.1:28000/test
    2015-07-31T13:28:17.397+0800 W NETWORK  Failed to connect to 127.0.0.1:28000, reason: errno:111 Connection refused
    2015-07-31T13:28:19.161+0800 E QUERY    Error: couldn't connect to server 127.0.0.1:28000 (127.0.0.1), connection attempt failed
        at connect (src/mongo/shell/mongo.js:181:14)
        at (connect):1:6 at src/mongo/shell/mongo.js:181
    exception: connect failed
    [mongo@localhost bin]$

    
    # The other nodes (t2, t1) could still connect without any problem, and testing with MongoChef confirmed this as well.
    Does this mean that using raw IP addresses really is less stable in a server cluster?
    Also, while the config server was down, the looping insert script never stopped; it kept inserting data afterwards.


    [mongo@t2 bin]$ ./mongo -port 28000
    MongoDB shell version: 3.0.3
    connecting to: 127.0.0.1:28000/test
    mongos>
    connecting to: 127.0.0.1:28000/test
    mongos> db.customer.count()
    10024
    mongos> db.customer.count()
    10091
    mongos> db.customer.count()
    10103
    mongos>
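    The three counts sampled above while the config server was down confirm that writes kept landing, since each sample is strictly larger than the last:

    ```javascript
    // Counts of test.customer sampled from t2 while the config server was down.
    const samples = [10024, 10091, 10103];
    // Writes kept succeeding, so each sample is strictly larger than the last.
    const strictlyIncreasing = samples.every((n, i) => i === 0 || n > samples[i - 1]);
    console.log(strictlyIncreasing); // true
    ```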


    This can be seen in the logs below:

        2015-07-31T13:34:55.114+0800 I NETWORK  [LockPinger] trying reconnect to 192.168.2.134:28010 (192.168.2.134) failed
        2015-07-31T13:34:55.328+0800 W NETWORK  [LockPinger] Failed to connect to 192.168.2.134:28010, reason: errno:111 Connection refused
        2015-07-31T13:34:55.383+0800 I WRITE    [conn258] insert test.customer query: { _id: 11119.0, name: " test name test name test name test name test name test name test name test name test name end", address: "address test customer al;dkjfa;lsdfjsal;#^end", province: "test customer province", city: "test customer city" } ninserted:1 keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } } 115ms
        2015-07-31T13:34:55.419+0800 I NETWORK  [LockPinger] reconnect 192.168.2.134:28010 (192.168.2.134) failed failed couldn't connect to server 192.168.2.134:28010 (192.168.2.134), connection attempt failed
        2015-07-31T13:34:55.556+0800 I COMMAND  [conn258] command test.$cmd command: insert { insert: "customer", documents: [ { _id: 11119.0, name: " test name test name test name test name test name test name test name test name test name end", address: "address test customer al;dkjfa;lsdfjsal;#^end", province: "test customer province", city: "test customer city" } ], ordered: true, metadata: { shardName: "shard1", shardVersion: [ Timestamp 4000|1, ObjectId('55bb0603bb39501b3b7c2934') ], session: 0 } } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:140 locks:{ Global: { acquireCount: { w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } } 312ms
        2015-07-31T13:34:55.618+0800 I NETWORK  [LockPinger] scoped connection to 192.168.2.188:28010,192.168.2.132:28010,192.168.2.134:28010 not being returned to the pool
        2015-07-31T13:34:55.714+0800 W SHARDING [LockPinger] distributed lock pinger '192.168.2.188:28010,192.168.2.132:28010,192.168.2.134:28010/t3:28011:1438320143:660128477' detected an exception while pinging. :: caused by :: SyncClusterConnection::update prepare failed:  192.168.2.134:28010 (192.168.2.134) failed:9001 socket exception [CONNECT_ERROR] server [192.168.2.134:28010 (192.168.2.134) failed]
        2015-07-31T13:34:55.773+0800 I COMMAND  [conn258] command test.$cmd command: insert { insert: "customer", documents: [ { _id: 11120.0, name: " test name test name test name test name test name test name test name test name test name end", address: "address test customer al;dkjfa;lsdfjsal;#^end", province: "test customer province", city: "test customer city" } ], ordered: true, metadata: { shardName: "shard1", shardVersion: [ Timestamp 4000|1, ObjectId('55bb0603bb39501b3b7c2934') ], session: 0 } } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:140 locks:{ Global: { acquireCount: { w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } } 105ms
        2015-07-31T13:34:56.034+0800 I WRITE    [conn258] insert test.customer query: { _id: 11121.0, name: " test name test name test name test name test name test name test name test name test name end", address: "address test customer al;dkjfa;lsdfjsal;#^end", province: "test customer province", city: "test customer city" } ninserted:1 keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } } 114ms
        2015-07-31T13:34:56.161+0800 I COMMAND  [conn258] command test.$cmd command: insert { insert: "customer", documents: [ { _id: 11121.0, name: " test name test name test name test name test name test name test name test name test name end", address: "address test customer al;dkjfa;lsdfjsal;#^end", province: "test customer province", city: "test customer city" } ], ordered: true, metadata: { shardName: "shard1", shardVersion: [ Timestamp 4000|1, ObjectId('55bb0603bb39501b3b7c2934') ], session: 0 } } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:140 locks:{ Global: { acquireCount: { w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } } 256ms
        2015-07-31T13:34:56.367+0800 I COMMAND  [conn258] command test.$cmd command: insert { insert: "customer", documents: [ { _id: 11122.0, name: " test name test name test name test name test name test name test name test name test name end", address: "address test customer al;dkjfa;lsdfjsal;#^end", province: "test customer province", city: "test customer city" } ], ordered: true, metadata: { shardName: "shard1", shardVersion: [ Timestamp 4000|1, ObjectId('55bb0603bb39501b3b7c2934') ], session: 0 } } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:140 locks:{ Global: { acquireCount: { w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } } 110ms


        # After restarting the config server, logging in still failed.
        Probably the local mongos had already recorded that it could not reach the config server; after restarting mongos as well, everything worked again.
        That behavior is hard to justify.

        [mongo@localhost bin]$ ./configd_start.sh
        about to fork child process, waiting until server is ready for connections.
        forked process: 12102
        child process started successfully, parent exiting
        [mongo@localhost bin]$ ./mongo -port 28000
        MongoDB shell version: 3.0.3
        connecting to: 127.0.0.1:28000/test
        2015-07-31T13:46:49.324+0800 W NETWORK  Failed to connect to 127.0.0.1:28000, reason: errno:111 Connection refused
        2015-07-31T13:46:50.747+0800 E QUERY    Error: couldn't connect to server 127.0.0.1:28000 (127.0.0.1), connection attempt failed
            at connect (src/mongo/shell/mongo.js:181:14)
            at (connect):1:6 at src/mongo/shell/mongo.js:181
        exception: connect failed
        [mongo@localhost bin]$ ps -ef|grep mongod
        mongo     4062     1 23 11:48 ?        00:27:18 /opt/mongodb3.0.3/bin/mongod --storageEngine wiredTiger -f shard1.conf
        mongo     4082     1  8 11:48 ?        00:10:29 /opt/mongodb3.0.3/bin/mongod --storageEngine wiredTiger -f shard2.conf
        mongo     4121     1  5 11:49 ?        00:06:56 /opt/mongodb3.0.3/bin/mongos --configdb 192.168.2.188:28010,192.168.2.132:28010,192.168.2.134:28010 --logpath /opt/mongodb3.0.3/logs/mongos.log --logappend --fork
        root     11468  9420  0 13:40 pts/1    00:00:01 tail -330f /opt/mongodb3.0.3/logs/shard1.log
        mongo    12102     1 12 13:46 ?        00:00:05 /opt/mongodb3.0.3/bin/mongod --storageEngine wiredTiger -f configsvr.conf
        mongo    12190  2083  3 13:46 pts/0    00:00:00 grep mongod
        [mongo@localhost bin]$ kill -9 4121
        [mongo@localhost bin]$ ./mongos_router.sh
        about to fork child process, waiting until server is ready for connections.
        forked process: 12220
        child process started successfully, parent exiting
        [mongo@localhost bin]$ ps -ef|grep mongod
        mongo     4062     1 23 11:48 ?        00:27:57 /opt/mongodb3.0.3/bin/mongod --storageEngine wiredTiger -f shard1.conf
        mongo     4082     1  9 11:48 ?        00:11:09 /opt/mongodb3.0.3/bin/mongod --storageEngine wiredTiger -f shard2.conf
        root     11468  9420  0 13:40 pts/1    00:00:01 tail -330f /opt/mongodb3.0.3/logs/shard1.log
        mongo    12102     1 19 13:46 ?        00:00:31 /opt/mongodb3.0.3/bin/mongod --storageEngine wiredTiger -f configsvr.conf
        mongo    12220     1  5 13:47 ?        00:00:04 /opt/mongodb3.0.3/bin/mongos --configdb 192.168.2.188:28010,192.168.2.132:28010,192.168.2.134:28010 --logpath /opt/mongodb3.0.3/logs/mongos.log --logappend --fork
        mongo    12299  2083  3 13:48 pts/0    00:00:00 grep mongod
        [mongo@localhost bin]$ ./mongo -port 28000
        MongoDB shell version: 3.0.3

    # Appendix: member priorities can also be adjusted with rs.reconfig:
    cfg = rs.conf()
    cfg.members[0].priority = 2;
    cfg.members[1].priority = 3;
    cfg.members[2].priority = 1;
    rs.reconfig(cfg);
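    The rs.reconfig snippet above raises member 1's priority to 3, so among healthy members the one with the highest priority is preferred as primary in an election. A toy sketch of that preference (my own illustration; real elections also weigh oplog freshness and votes):

    ```javascript
    // Toy sketch: among reachable members, prefer the highest priority as primary.
    const members = [
      { host: 't1:28011', priority: 2, healthy: true },
      { host: 't2:28011', priority: 3, healthy: true },
      { host: 't3:28011', priority: 1, healthy: true },
    ];

    function preferredPrimary(members) {
      return members
        .filter((m) => m.healthy)                                   // drop unreachable members
        .reduce((best, m) => (m.priority > best.priority ? m : best)); // keep the highest priority
    }

    console.log(preferredPrimary(members).host); // t2:28011
    ```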
