How Does the SolrCloud Transaction Log Work?

This content was provided by my colleague Hans Tan. Many thanks to Hans for sharing it.

WHY TRANSACTION LOG?

  • A transaction log records all operations performed on an index between two hard commits.
  • Each hard commit starts a new transaction log, because a hard commit guarantees the durability of all operations performed before it.
  • The transaction log enables the realtime-get feature. In some cases NRT (near real time) search is still not good enough; for example, under concurrent updates we may need to get the latest version of a document even before it is visible to searchers.
  • After a JVM crash or a "kill -9", updates can be recovered by replaying the transaction log.
  • It also allows a peer to ask "give me the list of the last update events you know about".

IMPLEMENTATION

The UpdateLog is initialized when a SolrCore starts up or is reloaded.

  • add(), delete() and deleteByQuery() are called each time such a request comes in, followed by finish().
  • preCommit(), postCommit(), preSoftCommit() and postSoftCommit() are called when a commit or soft commit is requested, or when the IndexWriter is about to be closed.
  • UpdateLog has 4 states:
    • REPLAYING - the core is replaying updates from the log; this replay must finish before the core registers in ZooKeeper.
    • BUFFERING - while the core is recovering from the leader, all incoming requests are buffered to be replayed later. In the BUFFERING state, every command is marked with a flag (FLAG_GAP = 0x10).
    • APPLYING_BUFFERED - once recovery has finished the replication step, the buffered documents are replayed.
    • ACTIVE - the core receives and handles requests normally.
  • UpdateLog has 3 flush strategies:
    • NONE - do nothing.
    • FLUSH - flush only the buffer of the buffered stream, without syncing the underlying file to the device.
    • FSYNC - return only after the data has been written to the device.
  • UpdateLog uses a LinkedList, "logs", to keep recent log files, newest first. Each time postCommit() runs, the previous tlog is added to this list; then, if the logs hold more than 100 records or the list holds more than 10 log files, the oldest log is removed from the list.
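To make the state list concrete, here is a toy Java model of how an update log might tag commands while BUFFERING and then replay them. Only the four state names and FLAG_GAP = 0x10 come from the description above; the class, the ADD code, OPERATION_MASK and the method names are hypothetical, and finding buffered commands by scanning for FLAG_GAP is a simplification for illustration, not a description of Solr's actual replay logic.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the UpdateLog states and the FLAG_GAP marker. Only the state
// names and FLAG_GAP = 0x10 come from the text; ADD, OPERATION_MASK and all
// method names are hypothetical.
class ToyUpdateLogStates {
    enum State { REPLAYING, BUFFERING, APPLYING_BUFFERED, ACTIVE }

    static final int ADD = 0x01;            // hypothetical operation code
    static final int FLAG_GAP = 0x10;       // set on commands logged while BUFFERING
    static final int OPERATION_MASK = 0x0f; // assumed: the low 4 bits hold the operation

    State state = State.ACTIVE;
    final List<Integer> records = new ArrayList<>(); // logged flags, oldest first

    // Recovery from the leader starts: incoming updates are buffered.
    void bufferUpdates() { state = State.BUFFERING; }

    void logAdd() {
        int flags = ADD;
        if (state == State.BUFFERING) flags |= FLAG_GAP; // mark buffered commands
        records.add(flags);
    }

    // After the replication step, replay what was buffered, then become ACTIVE.
    // Simplification: buffered commands are found by their FLAG_GAP bit.
    int applyBufferedUpdates() {
        state = State.APPLYING_BUFFERED;
        int replayed = 0;
        for (int flags : records) {
            if ((flags & FLAG_GAP) != 0 && (flags & OPERATION_MASK) == ADD) {
                replayed++; // a real implementation would re-apply the update here
            }
        }
        state = State.ACTIVE;
        return replayed;
    }
}
```

For example, one add while ACTIVE followed by two adds while BUFFERING logs the flags 0x01, 0x11, 0x11, and applyBufferedUpdates() replays the two flagged commands before the state returns to ACTIVE.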
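The retention rule for the "logs" list can be sketched as follows. This is a minimal sketch under stated assumptions: each closed tlog is reduced to its record count, and the oldest log is dropped when the newer logs alone still hold at least 100 records, or when more than 10 logs are kept. The class and constant names are hypothetical, and the exact removal condition in Solr may differ.

```java
import java.util.LinkedList;

// Hypothetical sketch of the "logs" retention policy: newest tlog first, and the
// oldest tlog dropped once it is no longer needed. Each entry is just the
// record count of one closed tlog. Thresholds follow the text (100 records,
// 10 log files); the exact removal condition is an assumption.
class ToyLogRetention {
    static final int NUM_RECORDS_TO_KEEP = 100;
    static final int MAX_LOGS_TO_KEEP = 10;

    final LinkedList<Integer> logs = new LinkedList<>(); // newest first

    // Called from postCommit() with the record count of the tlog just closed.
    void addOldLog(int numRecords) {
        logs.addFirst(numRecords);
        int total = 0;
        for (int n : logs) total += n;
        while (logs.size() > 1) {
            int oldest = logs.getLast();
            boolean newerLogsSuffice = total - oldest >= NUM_RECORDS_TO_KEEP;
            if (newerLogsSuffice || logs.size() > MAX_LOGS_TO_KEEP) {
                logs.removeLast(); // the oldest log is no longer needed
                total -= oldest;
            } else {
                break;
            }
        }
    }
}
```

For example, after four commits of 40 records each, only the three newest logs (120 records) remain; after eleven commits of 1 record each, the list is capped at 10 logs.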

LOG START PROCESS DIAGRAM

WHAT HAPPENS ON COMMIT()?

1. Switch to a new map for logged updates (the maps are used by RealTimeGetComponent); set preTlog = tlog and tlog = null.

2. Commit the IndexWriter and, if requested, open a new searcher.

3. Add a commit record (the commit flag) to preTlog, and add preTlog to the logs list.
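The three hard-commit steps can be modeled in a few lines. This toy Java sketch only shows how the transaction log is rotated: the names tlog, preTlog and logs follow the text, while representing records as strings and creating a new tlog lazily on the next update are simplifications assumed here. The IndexWriter commit itself is omitted.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Toy model of how a hard commit rotates the transaction log. The names tlog,
// preTlog and logs follow the text; String "records" and the lazy creation of
// a new tlog are simplifications for illustration.
class ToyHardCommit {
    List<String> tlog = null;                                 // log currently being written
    List<String> preTlog = null;                              // log closed by the last commit
    final LinkedList<List<String>> logs = new LinkedList<>(); // recent logs, newest first

    void add(String id) {
        if (tlog == null) tlog = new ArrayList<>(); // first update after a commit opens a new tlog
        tlog.add("ADD " + id);
    }

    void commit() {
        // Step 1: stop writing to the current tlog.
        preTlog = tlog;
        tlog = null;
        // Step 2: commit the IndexWriter (omitted in this toy).
        // Step 3: add a commit record to the old tlog and keep it in "logs".
        if (preTlog != null) {
            preTlog.add("COMMIT");
            logs.addFirst(preTlog);
        }
    }
}
```

After add("1"), add("2"), commit(), add("3"), the logs list holds one closed log [ADD 1, ADD 2, COMMIT] and tlog holds only [ADD 3].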

WHAT HAPPENS ON SOFTCOMMIT()?

1. Switch to a new map for logged updates.

2. Open a new searcher.

3. Clear the old map.
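Why is the old map cleared only in step 3? A toy sketch, assuming a realtime get checks the newest map, then the previous map, then the searcher: between steps 1 and 2 a recent document is in neither the new map nor the open searcher, so the old map must stay readable until the new searcher is open. All names here (ToySoftCommit, prevMap, searcherView, indexed) are hypothetical, and values are stored directly in the maps rather than as references into the tlog.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the soft-commit steps and the realtime-get lookup order.
// Names are hypothetical; values are kept directly in the maps.
class ToySoftCommit {
    Map<String, String> map = new HashMap<>();           // updates since the last (soft) commit
    Map<String, String> prevMap = new HashMap<>();       // updates being made visible by a soft commit
    final Map<String, String> indexed = new HashMap<>(); // stands in for the IndexWriter
    Map<String, String> searcherView = new HashMap<>();  // what the open searcher can see

    void add(String id, String value) {
        indexed.put(id, value);
        map.put(id, value);
    }

    // Realtime get: newest map first, then the previous map, then the searcher.
    String realtimeGet(String id) {
        if (map.containsKey(id)) return map.get(id);
        if (prevMap.containsKey(id)) return prevMap.get(id);
        return searcherView.get(id);
    }

    void softCommit() {
        prevMap = map;                         // 1. new updates go to a fresh map,
        map = new HashMap<>();                 //    the old one stays readable
        searcherView = new HashMap<>(indexed); // 2. open a new searcher
        prevMap.clear();                       // 3. the searcher now covers the old map
    }
}
```

A document added before the soft commit is returned by realtimeGet() from the map before the commit and from the new searcher afterwards; at no point in between does the lookup come up empty.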

TRANSACTION LOG FORMAT FOR SOLRINPUTDOCUMENT

The following code in JavaBinCodec writes a SolrInputDocument to the log file:

public void writeSolrInputDocument(SolrInputDocument sdoc) throws IOException {
  writeTag(SOLRINPUTDOC, sdoc.size()); // SOLRINPUTDOC = 16: tag followed by the number of fields (key-value pairs) in the document
  writeFloat(sdoc.getDocumentBoost()); // document boost
  for (SolrInputField inputField : sdoc.values()) {
    if (inputField.getBoost() != 1.0f) {
      writeFloat(inputField.getBoost()); // field boost, written only if it is not the default 1.0
    }
    writeExternString(inputField.getName()); // field name
    writeVal(inputField.getValue()); // field value
  }
}
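Because a field boost is written only when it differs from 1.0, a reader has to detect whether one is present. The toy codec below mirrors the layout above and assumes the reader recognizes the optional boost by finding a Float where a String field name was expected. Only SOLRINPUTDOC = 16 comes from the code above; the tags T_FLOAT and T_STR, the plain 4-byte field count (JavaBin uses a variable-length int) and the use of writeUTF for strings are hypothetical simplifications.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy codec mirroring the writeSolrInputDocument() layout:
//   SOLRINPUTDOC tag, field count, document boost,
//   then per field: [boost, only if != 1.0f] name value
// T_FLOAT/T_STR are hypothetical tags; the count is a plain 4-byte int.
class ToyDocCodec {
    static final int SOLRINPUTDOC = 16; // value quoted in the text
    static final int T_FLOAT = 1;       // hypothetical
    static final int T_STR = 2;         // hypothetical

    static final class Field {
        final String value;
        final float boost; // 1.0f means "no field boost"
        Field(String value, float boost) { this.value = value; this.boost = boost; }
    }

    static byte[] write(float docBoost, Map<String, Field> fields) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(SOLRINPUTDOC);
            out.writeInt(fields.size());
            out.writeByte(T_FLOAT);
            out.writeFloat(docBoost);
            for (Map.Entry<String, Field> e : fields.entrySet()) {
                if (e.getValue().boost != 1.0f) { // default boosts are not written
                    out.writeByte(T_FLOAT);
                    out.writeFloat(e.getValue().boost);
                }
                out.writeByte(T_STR);
                out.writeUTF(e.getKey());
                out.writeByte(T_STR);
                out.writeUTF(e.getValue().value);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    static Map<String, Field> read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            if (in.readByte() != SOLRINPUTDOC) throw new IOException("not a document");
            int size = in.readInt();
            readVal(in); // document boost (not used by this toy)
            Map<String, Field> fields = new LinkedHashMap<>();
            for (int i = 0; i < size; i++) {
                Object first = readVal(in);
                float boost = 1.0f;
                if (first instanceof Float) { // a Float where a name was expected: field boost
                    boost = (Float) first;
                    first = readVal(in);
                }
                String name = (String) first;
                fields.put(name, new Field((String) readVal(in), boost));
            }
            return fields;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    static Object readVal(DataInputStream in) throws IOException {
        int tag = in.readByte();
        if (tag == T_FLOAT) return in.readFloat();
        if (tag == T_STR) return in.readUTF();
        throw new IOException("unknown tag " + tag);
    }
}
```

Writing a document with an unboosted "id" field and a "title" field boosted to 2.0 and reading it back restores both fields with their original boosts.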

For writeVal(), see the following code:

// If the object's type is known, write it with the type-specific method and return.
// Otherwise, if a resolver was given, let it convert the object and try again.
// If neither works, write the class name and toString() value as a string.
public void writeVal(Object val) throws IOException {
  if (writeKnownType(val)) {
    return;
  } else {
    Object tmpVal = val;
    if (resolver != null) {
      tmpVal = resolver.resolve(val, this);
      if (tmpVal == null) return; // null means the resolver took care of it fully
      if (writeKnownType(tmpVal)) return;
    }
  }

  writeVal(val.getClass().getName() + ':' + val.toString());
}
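The fallback chain in writeVal() can be exercised with a toy writer. In this sketch the Resolver interface, the tiny set of "known" types (null, String, Long, Integer) and collecting values in a list instead of encoding bytes are all hypothetical simplifications; the control flow follows the method above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy version of the writeVal() fallback chain. The Resolver interface and the
// small set of known types are hypothetical; values are collected in a list
// instead of being encoded to bytes.
class ToyWriter {
    interface Resolver {
        // Return a replacement value to write, or null if the resolver wrote it itself.
        Object resolve(Object val, ToyWriter writer);
    }

    final List<Object> written = new ArrayList<>();
    final Resolver resolver;

    ToyWriter(Resolver resolver) { this.resolver = resolver; }

    boolean writeKnownType(Object val) {
        if (val == null || val instanceof String || val instanceof Long || val instanceof Integer) {
            written.add(val);
            return true;
        }
        return false;
    }

    void writeVal(Object val) {
        if (writeKnownType(val)) return;
        if (resolver != null) {
            Object tmpVal = resolver.resolve(val, this);
            if (tmpVal == null) return;          // resolver took care of it fully
            if (writeKnownType(tmpVal)) return;  // resolver produced a known type
        }
        // Last resort: class name + ':' + toString(), written as a String.
        writeVal(val.getClass().getName() + ':' + val.toString());
    }
}
```

Without a resolver, writeVal(new StringBuilder("x")) ends up writing the string "java.lang.StringBuilder:x"; with a resolver that maps objects to a known type, the resolved value is written instead.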

writeKnownType() handles the following known types: primitives, SolrDocumentList, NamedList, Collection, Object[], SolrDocument, SolrInputDocument, Map, Iterator and Iterable.

Let's use the long type as an example:

public void writeLong(long val) throws IOException {
  if ((val & 0xff00000000000000L) == 0) { // highest 8 bits all 0 (0 <= val < 2^56): encode as a small long
    int b = SLONG | ((int) val & 0x0f); // SLONG = 96 (binary 01100000); OR in the lowest 4 bits of val
    if (val >= 0x0f) { // val >= 15
      b |= 0x10; // set the "more bytes follow" flag
      daos.writeByte(b); // writes 0111xxxx, where xxxx are the lowest 4 bits of val
      writeVLong(val >>> 4, daos); // remaining bits as a variable-length long: each byte holds 7 value bits,
                                   // and its highest bit says whether another byte follows
    } else { // val < 15: tag and value fit in a single byte
      daos.writeByte(b);
    }
  } else { // large (or negative) value
    daos.writeByte(LONG); // write the LONG tag first
    daos.writeLong(val); // then the full 8-byte value
  }
}

public long readSmallLong(FastInputStream dis) throws IOException {
  long v = tagByte & 0x0F; // lowest 4 bits come from the tag byte
  if ((tagByte & 0x10) != 0) // flag set: more bytes follow (value >= 15)
    v = (readVLong(dis) << 4) | v;
  return v;
}
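The comments above describe writeVLong() only in words. A minimal stand-in, assuming the least-significant 7-bit group is written first and java.io byte-array streams replace Solr's own stream classes, looks like this:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Sketch of a variable-length long encoding: 7 value bits per byte, least
// significant group first, high bit set on every byte except the last.
// Byte-array streams stand in for Solr's stream types.
class VLong {
    static void writeVLong(long i, ByteArrayOutputStream out) {
        while ((i & ~0x7FL) != 0) {
            out.write((int) ((i & 0x7F) | 0x80)); // low 7 bits, "more follows" bit set
            i >>>= 7;
        }
        out.write((int) i); // last byte, high bit clear
    }

    static long readVLong(ByteArrayInputStream in) {
        int b = in.read();
        long i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.read();
            i |= (long) (b & 0x7F) << shift;
        }
        return i;
    }
}
```

For example, 300 (binary 1 0010 1100) is written as the two bytes 0xAC 0x02: the low 7 bits 0101100 with the continuation bit set, then the remaining value 2.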
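Example: how a value < 15 and a value >= 15 are written

long val = 13; // 00001101

00001101 & 0xff00000000000000L = 0, so val is a small long
b = 01100000 | (00001101 & 00001111) = 01100000 | 00001101 = 01101101
val < 15, so only the tag byte is written:
daos.writeByte(01101101)

long val = 287; // 0001 00011111

val & 0xff00000000000000L = 0, so val is a small long
b = 01100000 | (0001 00011111 & 00001111) = 01100000 | 00001111 = 01101111
val >= 15, so the "more bytes follow" flag is set:
b = b | 0x10 = 01101111 | 00010000 = 01111111
daos.writeByte(01111111)
writeVLong(0001 00011111 >>> 4) = writeVLong(00010001)
00010001 (17) fits in 7 bits, so writeVLong writes the single byte 00010001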
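The two worked examples can be checked mechanically. The sketch below is a simplified copy of writeLong() and readSmallLong() that writes into byte arrays; SLONG = 96 and the 0x10 flag come from the code above, while LONG = 7 is an assumed tag value used only for the large-value branch.

```java
import java.io.ByteArrayOutputStream;

// Simplified copy of writeLong()/readSmallLong() that reproduces the two worked
// examples. SLONG = 96 and the 0x10 flag come from the quoted code; LONG = 7 is
// an assumed tag value, and byte arrays stand in for Solr's streams.
class SmallLongDemo {
    static final int SLONG = 96; // binary 01100000
    static final int LONG = 7;   // assumption: tag for a full 8-byte long

    static byte[] writeLong(long val) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if ((val & 0xff00000000000000L) == 0) {      // small long
            int b = SLONG | ((int) val & 0x0f);      // tag + lowest 4 bits
            if (val >= 0x0f) {
                out.write(b | 0x10);                 // "more bytes follow"
                long rest = val >>> 4;               // writeVLong(val >>> 4)
                while ((rest & ~0x7FL) != 0) {
                    out.write((int) ((rest & 0x7F) | 0x80));
                    rest >>>= 7;
                }
                out.write((int) rest);
            } else {
                out.write(b);                        // single byte
            }
        } else {
            out.write(LONG);                         // large or negative value
            for (int shift = 56; shift >= 0; shift -= 8) {
                out.write((int) (val >>> shift));    // 8 bytes, big-endian
            }
        }
        return out.toByteArray();
    }

    static long readSmallLong(byte[] bytes) {
        int tag = bytes[0] & 0xFF;
        long v = tag & 0x0F;
        if ((tag & 0x10) != 0) {                     // readVLong(), then shift left by 4
            long rest = 0;
            int shift = 0, i = 1, b;
            do {
                b = bytes[i++] & 0xFF;
                rest |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            v = (rest << 4) | v;
        }
        return v;
    }

    static String bits(byte[] bytes) {               // e.g. "01111111 00010001"
        StringBuilder sb = new StringBuilder();
        for (byte x : bytes) {
            String s = Integer.toBinaryString(x & 0xFF);
            if (sb.length() > 0) sb.append(' ');
            sb.append("0".repeat(8 - s.length())).append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(bits(writeLong(13)));           // 01101101
        System.out.println(bits(writeLong(287)));          // 01111111 00010001
        System.out.println(readSmallLong(writeLong(287))); // 287
    }
}
```

So 13 takes one byte and 287 takes two, while any negative value falls into the LONG branch and costs 9 bytes (tag plus 8 value bytes).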
 