MongoDB oplog: An In-Depth Analysis

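MongoDB replication stores write operations in a log, and that log is called the oplog.

By default the oplog is allocated 5% of the free disk space, which is usually a reasonable setting. The size can be changed with mongod --oplogSize (the value is given in megabytes).

The oplog is a capped collection: it has a fixed size, so it can never grow until it fills the disk. That requirement is exactly why MongoDB has capped collections at all (the oplog is actually the reason capped collections were invented).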
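To make the fixed-size behavior concrete, here is a minimal in-memory sketch of how a capped collection drops its oldest entries once the size budget is used up. CappedLog and maxBytes are hypothetical names made up for this illustration, and the JSON string length is only a rough stand-in for the real BSON size; this is not how mongod implements it.

```javascript
// Minimal in-memory model of a capped collection: a fixed size budget,
// insertion order preserved, oldest entries evicted first.
// CappedLog and maxBytes are hypothetical names, not MongoDB APIs.
class CappedLog {
  constructor(maxBytes) {
    this.maxBytes = maxBytes;
    this.usedBytes = 0;
    this.entries = []; // oldest entry first
  }

  append(doc) {
    // Assumption: JSON length approximates the stored size of the document.
    const size = JSON.stringify(doc).length;
    if (size > this.maxBytes) {
      throw new Error("document is larger than the whole log");
    }
    // Evict from the front until the new document fits.
    while (this.usedBytes + size > this.maxBytes) {
      const evicted = this.entries.shift();
      this.usedBytes -= evicted.size;
    }
    this.entries.push({ doc, size });
    this.usedBytes += size;
  }

  docs() {
    return this.entries.map((e) => e.doc);
  }
}

// Usage: a tiny log that only has room for the three newest entries.
const log = new CappedLog(60);
for (let n = 1; n <= 5; n++) {
  log.append({ op: "i", n });
}
console.log(log.docs());
```

The practical consequence for replication is the same as in the sketch: a secondary that falls further behind than the log can hold will no longer find the entries it needs.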

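Location of the oplog

The oplog lives in the local database.

Under a master/slave architecture:

local.oplog.$main

Under replica sets:

local.oplog.rs

Under sharding, the oplog cannot be viewed through mongos; connect to each shard to see it.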

mongos> use local
switched to db local
mongos> show collections
Thu Mar 28 11:37:11 uncaught exception: error: { "$err" : "can't use 'local' database through mongos", "code" : 13644 }
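The lookup rules above fit in a few lines. This is only a sketch: the mode strings ("master-slave", "replica-set", "mongos") are labels invented for this example, not values MongoDB reports.

```javascript
// Deployment mode -> oplog collection inside the local database.
// The mode strings are hypothetical labels chosen for this sketch.
const OPLOG_COLLECTIONS = new Map([
  ["master-slave", "oplog.$main"],
  ["replica-set", "oplog.rs"],
]);

function oplogNamespace(mode) {
  if (mode === "mongos") {
    // mongos refuses access to the local database (error code 13644 above).
    throw new Error("no oplog through mongos; connect to each shard");
  }
  const coll = OPLOG_COLLECTIONS.get(mode);
  if (coll === undefined) {
    throw new Error("unknown mode: " + mode);
  }
  return "local." + coll;
}

console.log(oplogNamespace("replica-set"));
console.log(oplogNamespace("master-slave"));
```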

Format of the oplog

MongoDB 2.0

PRIMARY> db.oplog.rs.findOne()
{
    "ts" : {
        "t" : 1354919611000,
        "i" : 196
    },
    "h" : NumberLong("-8946637877024029255"),
    "op" : "i",
    "ns" : "msg.msgToSend",
    "o" : {
        "_id" : ObjectId("50c26ecae7d64ae0b5f36cfe"),
        ...
    }
}

MongoDB 2.2

PRIMARY> db.oplog.rs.findOne()
{
    "ts" : Timestamp(1364362801000, 8247),
    "h" : NumberLong("8229173295225699173"),
    "v" : 2,
    "op" : "i",
    "ns" : "goods.Simigoods",
    "fromMigrate" : true,
    "o" : {
        "_id" : ObjectId("50b534310eba2018b88ba3b2"),
        ...
    }
}

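Note the field "fromMigrate" : true. I first assumed it marked data carried over from a 2.0 upgrade, but reading the source code showed that this is not the case: fromMigrate means the write came from a chunk migration, that is, chunks moving between shards. See src/mongo/s/d_migrate.cpp for details.

v is OPLOG_VERSION, the version of the oplog format.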
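One place where fromMigrate matters in practice is a program that reads a shard's oplog and only cares about client writes: migration entries are data being moved, not new writes. A minimal sketch of such a filter follows; withoutMigrations is a name invented here, and the sample entries are shortened, made-up documents in the shape shown above.

```javascript
// Keep only the entries that did not originate from a chunk migration.
// withoutMigrations is a hypothetical helper name for this sketch.
function withoutMigrations(entries) {
  return entries.filter((entry) => entry.fromMigrate !== true);
}

// Made-up sample entries in the 2.2 oplog shape.
const sample = [
  { op: "i", ns: "goods.Simigoods", fromMigrate: true, o: { _id: 1 } },
  { op: "i", ns: "goods.Simigoods", o: { _id: 2 } },
  { op: "u", ns: "goods.Simigoods", o2: { _id: 2 }, o: { _id: 2, status: 2 } },
];

console.log(withoutMigrations(sample).length);
```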

An entry from a freshly built replica set looks like this:

PRIMARY> db.version()
2.2.2
PRIMARY> db.oplog.rs.findOne()
{
    "ts" : Timestamp(1364186197000, 58),
    "h" : NumberLong("-7878220425718087654"),
    "v" : 2,
    "op" : "u",
    "ns" : "exaitem_gmsbatchtask.jdgmsbatchtask",
    "o2" : {
        "_id" : "83f09a98-6a41-497b-a988-99ba5399d296"
    },
    "o" : {
        "_id" : "83f09a98-6a41-497b-a988-99ba5399d296",
        "status" : 2,
        "content" : "",
        "type" : 17,
        "business" : "832722",
        "optype" : 2,
        "addDate" : ISODate("2013-03-25T04:36:38.511Z"),
        "modifyDate" : ISODate("2013-03-25T04:36:39.131Z"),
        "source" : 5
    }
}

MongoDB 2.4

{
    "ts" : {
        "t" : 1361948104000,
        "i" : 325
    },
    "h" : NumberLong("-8795977166222676062"),
    "v" : 2,
    "op" : "i",
    "ns" : "test.log",
    "o" : {
        "_id" : ObjectId("51031ca0c86617a8811be893"),
        ...
    }
}

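The formats are largely the same, and 2.4 switched back to the earlier one. In 2.2 the ts field is shown as Timestamp(1364186197000, 58), while in MongoDB 2.0 and MongoDB 2.4 it is shown as { "t" : 1361948104000, "i" : 325 }. Also, if you use the MongoDB 2.4 client (mongo) to look at a 2.2 server, you see the 2.4 format: the display depends only on the version of the mongo shell.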
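Since the same ts can show up in either notation depending on the shell, a program that consumes shell output may want to normalize both into one shape. The sketch below assumes, based on the samples above, that the displayed t value is the Unix time multiplied by 1000; normalizeTs and the { seconds, inc } result shape are names chosen for this example.

```javascript
// Normalize both display forms of ts into { seconds, inc }.
// Assumption: the displayed "t" is Unix seconds * 1000, as in the samples.
function normalizeTs(ts) {
  let t;
  let i;
  if (typeof ts === "string") {
    // 2.2 shell notation, e.g. "Timestamp(1364186197000, 58)"
    const m = /^Timestamp\((\d+),\s*(\d+)\)$/.exec(ts.trim());
    if (m === null) {
      throw new Error("unrecognized ts: " + ts);
    }
    t = Number(m[1]);
    i = Number(m[2]);
  } else {
    // 2.0 / 2.4 shell notation, e.g. { t: 1361948104000, i: 325 }
    ({ t, i } = ts);
  }
  return { seconds: Math.floor(t / 1000), inc: i };
}

console.log(normalizeTs("Timestamp(1364186197000, 58)"));
console.log(normalizeTs({ t: 1361948104000, i: 325 }));
```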

Meaning of the oplog fields

ts: the time this operation occurred.

h: a unique ID for this operation. Each operation will have a different value in this field.

op: the write operation that should be applied to the slave. n indicates a no-op, which is just an informational message.

ns: the database and collection affected by this operation. For a no-op, this field is left blank.

o: the actual document representing the op. For a no-op, this field carries nothing to apply.

For other operations, the o field contains the document to insert, the modifications to apply, or the criteria to remove. Notice that, for an update, there are two o fields (o and o2): o2 gives the update criteria and o gives the modifications (equivalent to update()'s second argument).

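ts: an 8-byte timestamp made up of a 4-byte Unix timestamp plus a 4-byte incrementing counter.

This value matters: when a new primary is elected (for example, after the master goes down), the secondary with the largest ts is chosen as the new primary.

op: a 1-byte operation type; for example, i means insert and d means delete.

ns: the namespace the operation applies to.

o: the document for the operation, i.e. its content (for an update, the fields and values to change).

o2: the criteria of an update; this field exists only on update entries.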
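The ts layout described above can be sketched with BigInt: the Unix timestamp goes in the high 32 bits and the counter in the low 32 bits, so comparing the packed values compares oplog positions. packTs, unpackTs and freshest are names invented for this sketch, the member list is made up, and seconds here means plain Unix seconds (not the millisecond-style value the shell displays). Picking the largest ts is only the rule stated above; a real election weighs more than this one value.

```javascript
// Pack (seconds, counter) into one 64-bit value: the high 32 bits hold the
// Unix timestamp, the low 32 bits the counter, so numeric order follows
// oplog order. All names here are hypothetical.
function packTs(seconds, inc) {
  return (BigInt(seconds) << 32n) | BigInt(inc);
}

function unpackTs(packed) {
  return {
    seconds: Number(packed >> 32n),
    inc: Number(packed & 0xffffffffn),
  };
}

// Return the member whose last applied ts is largest.
function freshest(members) {
  return members.reduce((best, m) =>
    packTs(m.seconds, m.inc) > packTs(best.seconds, best.inc) ? m : best
  );
}

// Made-up secondaries with their last applied ts.
const members = [
  { name: "a", seconds: 1354919611, inc: 196 },
  { name: "b", seconds: 1354919611, inc: 197 },
  { name: "c", seconds: 1354919610, inc: 999 },
];

console.log(freshest(members).name);
console.log(unpackTs(packTs(1364186197, 58)));
```

Note that member c has the largest counter but an older timestamp, so it still loses: the counter only breaks ties within the same second.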

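op takes one of the following values:

"i": insert

"u": update

"d": delete

"c": db cmd

"db": declares the current database (ns is set to the database name + '.')

"n": no-op, an empty operation that is written periodically to keep the oplog up to date.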

Update 2013-07-19: today I found that changing the replica set configuration also produces "n" operations:

{ "ts" : Timestamp(1372320938000, 1), "h" : NumberLong("2050563086860406946"), "v" : 2, "op" : "n", "ns" : "", "o" : { "msg" : "Reconfig set", "version" : 6 } }

{ "ts" : Timestamp(1372319914000, 1), "h" : NumberLong("5828735007195954091"), "v" : 2, "op" : "n", "ns" : "", "o" : { "msg" : "Reconfig set", "version" : 5 } }

{ "ts" : Timestamp(1372318223000, 1), "h" : NumberLong("512600544405470974"), "v" : 2, "op" : "n", "ns" : "", "o" : { "msg" : "Reconfig set", "version" : 4 } }
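The op codes listed above, together with the blank ns on no-op entries, can be turned into a small helper that labels an entry. This is a sketch only: describeEntry and OP_NAMES are names made up here, and the labels simply repeat the list above.

```javascript
// op code -> human-readable label, taken from the list above.
const OP_NAMES = new Map([
  ["i", "insert"],
  ["u", "update"],
  ["d", "delete"],
  ["c", "db cmd"],
  ["db", "declare database"],
  ["n", "no-op"],
]);

// describeEntry is a hypothetical helper, not part of any MongoDB API.
function describeEntry(entry) {
  const name = OP_NAMES.get(entry.op);
  if (name === undefined) {
    throw new Error("unknown op: " + entry.op);
  }
  // No-op entries carry an empty ns, so there is no target to report.
  return entry.ns ? name + " on " + entry.ns : name;
}

console.log(describeEntry({ op: "u", ns: "msg.device" }));
console.log(describeEntry({ op: "n", ns: "", o: { msg: "Reconfig set", version: 6 } }));
```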

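Besides the fields above, there are two boolean fields. One is fromMigrate, mentioned earlier; the other is the field b. Looking closely at the oplog, you will find documents with "b" : true; it is a boolean that appears on delete and update operations (update one or many).

For example: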

{
    "ts" : {
        "t" : 1354923335000,
        "i" : 2
    },
    "h" : NumberLong("563747339476084113"),
    "op" : "u",
    "ns" : "msg.device",
    "o2" : {
        "_id" : ObjectId("509fa1207386d978864c7833")
    },
    "o" : {
        "$set" : {
            "flag" : "1",
            "pin" : "5126d5b23c303",
            "device" : "ceb27de6b9dd8f045130f046a7662630",
            "modified" : ISODate("2012-12-07T23:36:24.628Z")
        }
    }
}

Now that we understand the detailed structure of the oplog, we can write a program based on these principles to synchronize data; see mongosync.
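To show the principle, here is a minimal sketch that replays oplog-shaped entries against an in-memory store. It is not how mongosync works internally; applyEntry and replay are names invented for this example. The sketch assumes that delete entries carry the _id in o, handles only the $set modifier and full-document replacement for updates, and skips an update whose target document is missing.

```javascript
// Apply one oplog-shaped entry to an in-memory store (Map of ns -> Map of _id -> doc).
// All names are hypothetical; only i, u, d and n are handled.
function applyEntry(store, entry) {
  if (entry.op === "n") {
    return; // informational only, nothing to apply
  }
  if (!store.has(entry.ns)) {
    store.set(entry.ns, new Map());
  }
  const coll = store.get(entry.ns);
  switch (entry.op) {
    case "i":
      coll.set(String(entry.o._id), { ...entry.o });
      break;
    case "u": {
      const key = String(entry.o2._id); // o2 selects the document
      const current = coll.get(key);
      if (current === undefined) {
        break; // assumption: ignore updates to documents we never saw
      }
      if ("$set" in entry.o) {
        // Modifier update: merge the $set fields into the document.
        coll.set(key, { ...current, ...entry.o.$set });
      } else {
        // Full-document update, as in the 2.2.2 example above.
        coll.set(key, { ...entry.o });
      }
      break;
    }
    case "d":
      // Assumption: the delete criteria in o is the _id.
      coll.delete(String(entry.o._id));
      break;
    default:
      throw new Error("op not handled in this sketch: " + entry.op);
  }
}

function replay(entries) {
  const store = new Map();
  for (const entry of entries) {
    applyEntry(store, entry);
  }
  return store;
}

// Made-up entries in oplog order.
const result = replay([
  { op: "i", ns: "msg.device", o: { _id: "a1", flag: "0" } },
  { op: "i", ns: "msg.device", o: { _id: "a2", flag: "0" } },
  { op: "u", ns: "msg.device", o2: { _id: "a1" }, o: { $set: { flag: "1" } } },
  { op: "n", ns: "", o: { msg: "Reconfig set", version: 4 } },
  { op: "d", ns: "msg.device", o: { _id: "a2" } },
]);
console.log([...result.get("msg.device").values()]);
```

A real tool would read the entries from local.oplog.rs in ts order and remember the last ts it applied, so that it can resume from that point.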
