This page walks through the major steps of the Solr 4.0 startup process.

SolrDispatchFilter.init(FilterConfig config) first initializes the CoreContainer:
public void init(FilterConfig config) throws ServletException {
    ...
    CoreContainer.Initializer init = createInitializer();
    ...
    this.cores = init.initialize();
    ...
}
CoreContainer.Initializer.initialize() then calls CoreContainer.load():
public CoreContainer initialize() throws IOException,
        ParserConfigurationException, SAXException {
    CoreContainer cores = null;
    String solrHome = SolrResourceLoader.locateSolrHome();
    File fconf = new File(solrHome,
        containerConfigFilename == null ? "solr.xml" : containerConfigFilename);
    cores = new CoreContainer(solrHome);
    if (fconf.exists()) {
        cores.load(solrHome, fconf);
    } else {
        log.info("no solr.xml file found - using default");
        cores.load(solrHome, new InputSource(
            new ByteArrayInputStream(DEF_SOLR_XML.getBytes("UTF-8"))));
        cores.configFile = fconf;
    }
    containerConfigFilename = cores.getConfigFile().getName();
    return cores;
}
CoreContainer.load(solrHome, fconf) calls CoreContainer.load(String dir, InputSource cfgis). This method is the most important part of Solr 4.0's startup: many members of CoreContainer are initialized here, including the Overseer, ZkController, CoreAdminHandler and CollectionsHandler. Let's step into this method:
...
initZooKeeper(zkHost, zkClientTimeout); // this call initializes the zkController
...
coreAdminHandler = new CoreAdminHandler(this);
...
NodeList nodes = (NodeList) cfg.evaluate("solr/cores/core",
    XPathConstants.NODESET); // get core config info from solr.xml
for (int i = 0; i < nodes.getLength(); i++) {
    Node node = nodes.item(i);
    ...
    CoreDescriptor p = new CoreDescriptor(this, name,
        DOMUtil.getAttr(node, "instanceDir", null));
    ...
    // each core is created and initialized here; all important features are built
    SolrCore core = create(p);
    register(name, core, false);
    ...
}
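The loop above is driven by an XPath query over solr.xml. As a rough illustration of that part (hypothetical minimal solr.xml content, standard javax.xml APIs only, not the actual CoreContainer code), this is how the core nodes and their instanceDir attributes can be enumerated:

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class SolrXmlScan {
    // Hypothetical minimal solr.xml; a real one carries more attributes.
    static final String SOLR_XML =
        "<solr><cores>"
        + "<core name=\"collection1\" instanceDir=\"collection1\"/>"
        + "<core name=\"collection2\" instanceDir=\"c2\"/>"
        + "</cores></solr>";

    // Returns "name:instanceDir" for each <core> element, mirroring the
    // solr/cores/core XPath used during CoreContainer.load().
    static List<String> listCores(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate("solr/cores/core", doc, XPathConstants.NODESET);
        List<String> cores = new ArrayList<String>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            cores.add(node.getAttributes().getNamedItem("name").getNodeValue()
                + ":" + node.getAttributes().getNamedItem("instanceDir").getNodeValue());
        }
        return cores;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(listCores(SOLR_XML)); // prints [collection1:collection1, collection2:c2]
    }
}
```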
At this point the core has been created but not yet registered. CoreContainer.register(String name, SolrCore core, boolean returnPrevNotClosed) registers the core with the ZkController; the register(name, core, false) call above does this job. At the same time, it publishes the core's status to the Overseer. register(name, core, false) then calls ZkController.register(String coreName, final CoreDescriptor desc, boolean recoverReloadedCores) to update the core's cloud state, including joining the leader election line:
public String register(String coreName, final CoreDescriptor desc,
        boolean recoverReloadedCores) throws Exception {
    ...
    joinElection(desc);
    ...
    if (!core.isReloaded() && ulog != null) {
        // recover from the transaction log if the core is not a reload
        Future<UpdateLog.RecoveryInfo> recoveryFuture = core.getUpdateHandler()
            .getUpdateLog().recoverFromLog();
        ...
    }
    ...
    boolean didRecovery = checkRecovery(coreName, desc, recoverReloadedCores,
        isLeader, cloudDesc, collection, coreZkNodeName, shardId, leaderProps,
        core, cc);
    if (!didRecovery) {
        publish(desc, ZkStateReader.ACTIVE);
    }
    ...
    zkStateReader.updateCloudState(true);
    return shardId;
}
1. zkController.joinElection(desc) decides whether this core is the leader. If it is, runIamLeaderProcess() is called; otherwise a watcher is set on the candidate in front of this one in the election line. ZkController.joinElection(desc) calls LeaderElector.joinElection(context) as follows:
public int joinElection(ElectionContext context) throws KeeperException,
        InterruptedException, IOException {
    ...
    int seq = getSeq(leaderSeqPath);
    checkIfIamLeader(seq, context, false);
    ...
}
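getSeq() extracts this candidate's sequence number from its election node path; ZooKeeper ephemeral-sequential nodes end with a fixed-width counter such as n_0000000003. A minimal sketch of that parsing (hypothetical helper and sample path, assuming Solr's "-n_" naming convention):

```java
public class SeqParse {
    // Extracts the numeric suffix of an ephemeral-sequential node name,
    // e.g. ".../election/abc-n_0000000003" -> 3.
    static int getSeq(String nodePath) {
        int index = nodePath.lastIndexOf("-n_");
        return Integer.parseInt(nodePath.substring(index + 3));
    }

    public static void main(String[] args) {
        // Hypothetical election node path for illustration.
        String path = "/collections/c1/leader_elect/shard1/election/abc-n_0000000003";
        System.out.println(getSeq(path)); // prints 3
    }
}
```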
Then LeaderElector.checkIfIamLeader(seq, context, false) runs:
/**
 * Check if the candidate with the given n_* sequence number is the leader.
 * If it is, set the leaderId on the leader zk node. If it is not, start
 * watching the candidate that is in line before this one - if it goes down, check
 * if this candidate is the leader again.
 **/
private void checkIfIamLeader(final int seq, final ElectionContext context,
        boolean replacement) throws KeeperException, InterruptedException,
        IOException {
    // get all other numbers...
    final String holdElectionPath = context.electionPath + ELECTION_NODE;
    List<String> seqs = zkClient.getChildren(holdElectionPath, null, true);
    sortSeqs(seqs);
    List<Integer> intSeqs = getSeqs(seqs);
    if (seq <= intSeqs.get(0)) {
        runIamLeaderProcess(context, replacement);
    } else {
        // I am not the leader - watch the node below me
        int i = 1;
        for (; i < intSeqs.size(); i++) {
            int s = intSeqs.get(i);
            if (seq < s) {
                // we found who we come before - watch the guy in front
                break;
            }
        }
        int index = i - 2;
        if (index < 0) {
            log.warn("Our node is no longer in line to be leader");
            return;
        }
        try {
            zkClient.getData(holdElectionPath + "/" + seqs.get(index),
                new Watcher() {
                    @Override
                    public void process(WatchedEvent event) {
                        // am I the next leader?
                        try {
                            checkIfIamLeader(seq, context, true);
                        } catch (InterruptedException e) {
                            // Restore the interrupted status
                            Thread.currentThread().interrupt();
                            log.warn("", e);
                        } catch (IOException e) {
                            log.warn("", e);
                        } catch (Exception e) {
                            log.warn("", e);
                        }
                    }
                }, null, true);
        } catch (KeeperException.SessionExpiredException e) {
            throw e;
        } catch (KeeperException e) {
            // we couldn't set our watch - the node before us may already be down?
            // we need to check if we are the leader again
            checkIfIamLeader(seq, context, true);
        }
    }
}
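The decision in checkIfIamLeader() boils down to: if my sequence number is the smallest in the election line, I am the leader; otherwise I watch the candidate immediately in front of me. A pure-Java sketch of just that decision (hypothetical helper, no ZooKeeper involved; the list is assumed sorted ascending and to include our own sequence number):

```java
import java.util.Arrays;
import java.util.List;

public class ElectionLine {
    // Returns -1 if mySeq is the smallest (we are the leader); otherwise
    // returns the sequence number of the candidate to watch, i.e. the
    // largest sequence number still smaller than ours.
    static int nodeToWatch(List<Integer> sortedSeqs, int mySeq) {
        if (mySeq <= sortedSeqs.get(0)) {
            return -1; // we are the leader
        }
        int watch = sortedSeqs.get(0);
        for (int s : sortedSeqs) {
            if (s >= mySeq) {
                break; // we found who we come before
            }
            watch = s; // last candidate still in front of us
        }
        return watch;
    }

    public static void main(String[] args) {
        List<Integer> seqs = Arrays.asList(0, 1, 2, 3);
        System.out.println(nodeToWatch(seqs, 0)); // prints -1 (leader)
        System.out.println(nodeToWatch(seqs, 2)); // prints 1 (watch predecessor)
    }
}
```

This is why only one watcher fires per failure: each candidate watches exactly one predecessor instead of the leader node itself, avoiding a thundering herd when the leader goes down.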
2. core.getUpdateHandler().getUpdateLog().recoverFromLog() obtains the UpdateLog from DirectUpdateHandler2 and calls its recoverFromLog() method. This starts a new thread that replays the update log belonging to the local machine; in other words, recoverFromLog() primarily recovers from the local transaction log. UpdateLog.recoverFromLog() is shown below:
public Future<RecoveryInfo> recoverFromLog() {
    recoveryInfo = new RecoveryInfo();
    List<TransactionLog> recoverLogs = new ArrayList<TransactionLog>(1);
    for (TransactionLog ll : newestLogsOnStartup) {
        if (!ll.try_incref()) continue;
        try {
            if (ll.endsWithCommit()) {
                ll.decref();
                continue;
            }
        } catch (IOException e) {
            log.error("Error inspecting tlog " + ll);
            ll.decref();
            continue;
        }
        recoverLogs.add(ll);
    }
    if (recoverLogs.isEmpty()) return null;
    ExecutorCompletionService<RecoveryInfo> cs =
        new ExecutorCompletionService<RecoveryInfo>(recoveryExecutor);
    LogReplayer replayer = new LogReplayer(recoverLogs, false);
    versionInfo.blockUpdates();
    try {
        state = State.REPLAYING;
    } finally {
        versionInfo.unblockUpdates();
    }
    // At this point, we are guaranteed that any new updates coming in will see the state as "replaying"
    return cs.submit(replayer, recoveryInfo);
}
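The selection logic above only replays transaction logs that do not end with a commit: a log whose last record is a commit was already fully applied to the index and needs no replay. A stripped-down sketch of that filter (hypothetical TLog stand-in; the real TransactionLog also does reference counting via try_incref/decref, which is omitted here):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TlogFilter {
    // Hypothetical stand-in for TransactionLog: just a name and a flag.
    static class TLog {
        final String name;
        final boolean endsWithCommit;
        TLog(String name, boolean endsWithCommit) {
            this.name = name;
            this.endsWithCommit = endsWithCommit;
        }
    }

    // Mirrors the selection in recoverFromLog(): logs that already end
    // with a commit are fully applied to the index, so they are skipped.
    static List<String> logsNeedingReplay(List<TLog> newestLogsOnStartup) {
        List<String> recover = new ArrayList<String>();
        for (TLog ll : newestLogsOnStartup) {
            if (!ll.endsWithCommit) {
                recover.add(ll.name);
            }
        }
        return recover;
    }

    public static void main(String[] args) {
        List<TLog> logs = Arrays.asList(new TLog("tlog.001", true),
            new TLog("tlog.002", false));
        System.out.println(logsNeedingReplay(logs)); // prints [tlog.002]
    }
}
```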
3. ZkController.checkRecovery(coreName, desc, recoverReloadedCores, isLeader, cloudDesc, collection, coreZkNodeName, shardId, leaderProps, core, cc) performs distributed recovery. This step is skipped if the core is the leader; otherwise it starts a new thread named RecoveryStrategy, which does the actual work:
- On the first attempt, it tries PeerSync.sync(), which recovers recent updates from the leader's update log. If that fails, it falls back to the next step.
- It performs full replication: RecoveryStrategy.replicate(String nodeName, SolrCore core, ZkNodeProps leaderprops, String baseUrl) calls ReplicationHandler.doFetch() to fetch index files from the leader and recover from those files.
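The two-step strategy above can be sketched as a small decision flow (hypothetical method names standing in for the RecoveryStrategy internals; PeerSync success is modeled as a boolean parameter):

```java
public class RecoverySketch {
    // Hypothetical sketch of RecoveryStrategy's decision flow: try a cheap
    // PeerSync from the leader's update log first; if that fails, fall
    // back to full index replication from the leader.
    static String doRecovery(boolean peerSyncSucceeds) {
        if (tryPeerSync(peerSyncSucceeds)) {
            return "recovered-via-peersync";
        }
        return replicateFromLeader();
    }

    static boolean tryPeerSync(boolean succeeds) {
        // Stand-in for PeerSync.sync(): replay recent updates from the leader.
        return succeeds;
    }

    static String replicateFromLeader() {
        // Stand-in for RecoveryStrategy.replicate() -> ReplicationHandler.doFetch():
        // copy the leader's index files wholesale.
        return "recovered-via-replication";
    }

    public static void main(String[] args) {
        System.out.println(doRecovery(true));  // prints recovered-via-peersync
        System.out.println(doRecovery(false)); // prints recovered-via-replication
    }
}
```

PeerSync is preferred because it only transfers the few most recent updates, while doFetch() copies whole index files; the fallback ordering keeps recovery cheap in the common case of a short outage.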