SolrCloud Capability Test

Some intermediate artifacts from SolrCloud testing. Originally recorded on an internal wiki, now shared here.

  • environment

                SolrCloud servers: X.X.X.251, X.X.X.252, X.X.X.253, each with 16 GB RAM and an 8-core 2.57 GHz CPU

                zookeeper servers: X.X.X.22, X.X.X.23, X.X.X.24    

                OS: Linux  x86_64 GNU/Linux

                tools: JMeter 2.6, YourKit 11.0.5

  • config & start service

      ZooKeeper runs with its default parameters and config (zookeeper start)

             1. SolrCloud configuration: refer to (solrcloud start)

           a). For X.X.X.253, edit $SOLRCLOUD_HOME/example/solr/conf/schema.xml to add fields as follows:

<fields>
<!-- Valid attributes for fields:
name: mandatory - the name for the field
type: mandatory - the name of a previously defined type from the
<types> section
indexed: true if this field should be indexed (searchable or sortable)
stored: true if this field should be retrievable
multiValued: true if this field may contain multiple values per document
omitNorms: (expert) set to true to omit the norms associated with
this field (this disables length normalization and index-time
boosting for the field, and saves some memory). Only full-text
fields or fields that need an index-time boost need norms.
Norms are omitted for primitive (non-analyzed) types by default.
termVectors: false set to true to store the term vector for a
given field.
When using MoreLikeThis, fields used for similarity should be
stored for best performance.
termPositions: Store position information with the term vector.
This will increase storage costs.
termOffsets: Store offset information with the term vector. This
will increase storage costs.
required: The field is required. It will throw an error if the
value does not exist
default: a value that should be used if no value is specified
when adding a document.
-->

<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="ts" type="text_general" indexed="true" stored="true"/>
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="age" type="text_general" indexed="true" stored="true"/>
<field name="company" type="text_general" indexed="true" stored="true"/>
<field name="branch" type="text_general" indexed="true" stored="true"/>
<field name="mail" type="text_general" indexed="true" stored="true"/>
<field name="interest" type="text_general" indexed="true" stored="true"/>
<field name="address" type="text_general" indexed="true" stored="true"/>
<field name="text_general" type="text_general" indexed="true" stored="false" multiValued="true" />

</fields>

            b). Change the Jetty thread limit:

             Open $SOLR_HOME/example/etc/jetty.xml and change maxThreads to 10000:

<!-- =========================================================== -->
<!-- Server Thread Pool -->
<!-- =========================================================== -->
<Set name="ThreadPool"><!-- Default queued blocking threadpool -->
<New>
<Set name="minThreads">10</Set>
<Set name="maxThreads">10000</Set>
<Set name="detailedDump">false</Set>
</New></Set>
This thread count affects the load test, so a large value is necessary.

          c). Configure $SOLR_HOME/example/solr/solr.xml on all 3 servers for the cores.

          Change the default lines at the end

<cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" hostPort="${jetty.port:}">
   <core name="collection1" instanceDir="." />
</cores>

           to:

<cores defaultCoreName="collection1" adminPath="/admin/cores" hostPort="${jetty.port:}">
    <core schema="schema.xml" shard="shard1" instanceDir="core1/" name="core1" collection="collection1" conf="solrconfig.xml"/>
    <core schema="schema.xml" shard="shard2" instanceDir="core2/" name="core2" collection="collection1" conf="solrconfig.xml"/>
    <core schema="schema.xml" shard="shard3" instanceDir="core3/" name="core3" collection="collection1" conf="solrconfig.xml"/>
    <core schema="schema.xml" shard="shard4" instanceDir="core4/" name="core4" collection="collection1" conf="solrconfig.xml"/>
    <core schema="schema.xml" shard="shard5" instanceDir="core5/" name="core5" collection="collection1" conf="solrconfig.xml"/>
    <core schema="schema.xml" shard="shard6" instanceDir="core6/" name="core6" collection="collection1" conf="solrconfig.xml"/>
    <core schema="schema.xml" shard="shard7" instanceDir="core7/" name="core7" collection="collection1" conf="solrconfig.xml"/>
    <core schema="schema.xml" shard="shard8" instanceDir="core8/" name="core8" collection="collection1" conf="solrconfig.xml"/>
    <core schema="schema.xml" shard="shard9" instanceDir="core9/" name="core9" collection="collection1" conf="solrconfig.xml"/>
    <core schema="schema.xml" shard="shard10" instanceDir="core10/" name="core10" collection="collection1" conf="solrconfig.xml"/>
</cores>
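The ten core definitions above follow a simple pattern, so the block can also be generated programmatically. A minimal sketch (purely illustrative; in this test the solr.xml was edited by hand, and the class name here is hypothetical):

```java
// Generates the repetitive <cores> block shown above for N shards.
public class CoresXmlGenerator {

    static String generate(int numShards) {
        StringBuilder sb = new StringBuilder();
        sb.append("<cores defaultCoreName=\"collection1\" adminPath=\"/admin/cores\" hostPort=\"${jetty.port:}\">\n");
        for (int i = 1; i <= numShards; i++) {
            // Each core i maps to shard i of collection1, living in core<i>/.
            sb.append("    <core schema=\"schema.xml\" shard=\"shard").append(i)
              .append("\" instanceDir=\"core").append(i)
              .append("/\" name=\"core").append(i)
              .append("\" collection=\"collection1\" conf=\"solrconfig.xml\"/>\n");
        }
        sb.append("</cores>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(generate(10));
    }
}
```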

           2. cd to $SOLR_HOME/example

           3. Start SolrCloud on X.X.X.253 first with the command:

nohup java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=3000 \
  -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false \
  -Xmx10g -Xms10g -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf \
  -Djetty.port=8900 -DzkHost=X.X.X.22:2181,X.X.X.23:2181,X.X.X.24:2181 \
  -jar start.jar >> solr.log 2>&1 &

JVM params:

-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=3000 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false

  These make Solr's JMX interface available. We can use jvisualvm.exe to connect to X.X.X.253 and view the status of Solr.

           4. Start SolrCloud on X.X.X.251 and X.X.X.252 with the command:

nohup java -agentpath:/home/hans/yjp-11.0.7/bin/linux-x86-64/libyjpagent.so=disablestacktelemetry,disableexceptiontelemetry,builtinprobes=none,delay=10000,sessionname=JBoss \
  -Xmx10g -Xms10g \
  -Djetty.port=8900 -DzkHost=X.X.X.22:2181,X.X.X.23:2181,X.X.X.24:2181 -jar start.jar >> solr.log 2>&1 &

JVM param:

-agentpath:/home/hans/yjp-11.0.7/bin/linux-x86-64/libyjpagent.so=disablestacktelemetry,disableexceptiontelemetry,builtinprobes=none,delay=10000,sessionname=JBoss

This allows YourKit to connect to X.X.X.251 and X.X.X.252 for profiling. (How to use YourKit?)

          5. Access the Solr admin tool at http://X.X.X.252:8900/solr/#/~cloud?view=graph to see the following graph:

           So finally we have 10 shards, each with a leader and two replicas, for a total of 30 slices.

  • prepare jmeter script & index data

            1. The index request used to load index data into the SolrCloud cluster looks like:

POST http://X.X.X.252:8900/solr/collection1/update

POST data:
<add><doc>
<field name="id">4al6q90c-8ouj-1255-Sind-201206282rmo</field>
<field name="ts">1340859312956</field>
<field name="name">byan</field>
<field name="age">21</field>
<field name="company">Ciscobyan Systemsbyan, Incbyan</field>
<field name="branch">Cloudbyan Applicationbyan Servicesbyan</field>
<field name="mail">byan@cisco.com</field>
<field name="interest">Have intensive interest in Internet-surfingbyan,singingbyan, writingbyan and readingbyan</field>
<field name="address">abyan, Gatebyan Buidingbyan Streetbyan Provincebyan Contrybyan</field>
</doc></add>

no cookies

Request Headers:
Content-Length: 598
Connection: keep-alive
Content-Type: application/xml
 
 
       The document template for indexing:

<add><doc>
<field name="id">c7${id_counter}</field>
<field name="ts">1342463467567</field>
<field name="name">person</field>
<field name="age">10</field>
<field name="company">Cisco Systems, Inc.</field>
<field name="branch">Cloud Application Services</field>
<field name="mail">person@cisco.com</field>
<field name="interest">Have intensive interest in Internet-surfing,singing, writing and reading.</field>
<field name="address">address,The Golden Gate Bridge,Wall Street.</field>
</doc></add>

     Each doc is about 300 bytes in size.

             The details of this script can be found here.

         2. We plan to index about 50 GB of data for each shard of SolrCloud for this test.
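The JMeter payload template can also be sketched in plain Java. A minimal, illustrative builder (the class name is hypothetical; the counter argument mirrors JMeter's ${id_counter} variable, and the field values are taken from the template above):

```java
// Builds one <add><doc> payload matching the JMeter index-request template.
public class IndexDocBuilder {

    static String buildDoc(long counter) {
        return "<add><doc>"
            + "<field name=\"id\">c7" + counter + "</field>"
            + "<field name=\"ts\">" + System.currentTimeMillis() + "</field>"
            + "<field name=\"name\">person</field>"
            + "<field name=\"age\">10</field>"
            + "<field name=\"company\">Cisco Systems, Inc.</field>"
            + "<field name=\"branch\">Cloud Application Services</field>"
            + "<field name=\"mail\">person@cisco.com</field>"
            + "<field name=\"interest\">Have intensive interest in Internet-surfing,singing, writing and reading.</field>"
            + "<field name=\"address\">address,The Golden Gate Bridge,Wall Street.</field>"
            + "</doc></add>";
    }

    public static void main(String[] args) {
        System.out.println(buildDoc(1));
    }
}
```

Posting such a payload to http://X.X.X.252:8900/solr/collection1/update with Content-Type: application/xml reproduces the request shown earlier; each payload is roughly 300 bytes, matching the figure above.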

  • The First Issue

       We ran 10 JMeter scripts for indexing; about 2 MB of data was indexed per second. After we had about 40 GB of index data across all shards, the old GC generation could not be collected even though it was full. Some of the captured data is posted below:

Solr Admin of X.X.X.252:

  

GC status:

 Heap dump by YourKit:

1. class list and objects

  

2. object

  

3. The biggest objects:

  

  4. The corresponding source code:

the class TokenizerChain

IndexSchema has two members:

private Analyzer analyzer;
private Analyzer queryAnalyzer;

a). The IndexSchema constructor loads the FieldTypePluginLoader, then sets a SolrIndexAnalyzer as the IndexSchema's analyzer via refreshAnalyzers():

public void refreshAnalyzers() {
    analyzer = new SolrIndexAnalyzer();
    queryAnalyzer = new SolrQueryAnalyzer();
}

private class SolrIndexAnalyzer extends AnalyzerWrapper {
    protected final HashMap<String, Analyzer> analyzers;

    SolrIndexAnalyzer() {
        analyzers = analyzerCache();
    }

    protected HashMap<String, Analyzer> analyzerCache() {
        HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
        for (SchemaField f : getFields().values()) {
            Analyzer analyzer = f.getType().getAnalyzer();
            cache.put(f.getName(), analyzer);
        }
        return cache;
    }
    // ...
}

But in AnalyzerWrapper:

public abstract class AnalyzerWrapper extends Analyzer {

    /**
     * Creates a new AnalyzerWrapper. Since the {@link Analyzer.ReuseStrategy} of
     * the wrapped Analyzers are unknown, {@link Analyzer.PerFieldReuseStrategy} is assumed
     */
    protected AnalyzerWrapper() {
        super(new PerFieldReuseStrategy());
    }
    // ...
}
This PerFieldReuseStrategy caches analyzers by field name rather than field type, which wastes resources on redundant copies. It should be changed to cache analyzers by field type.

We see that TokenStreamComponents objects hold most of the memory; there are more than 1000 of them. By reviewing the code, we find that the Lucene Analyzer creates these objects and caches them for reuse.
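The two caching policies can be contrasted with a small standalone sketch. Everything here is a mock (the Analyzer stand-in, the class name, and the four-field schema excerpt are assumptions for illustration, not Solr code), but it shows why keying by field type shares one analyzer across all text_general fields, while keying by field name duplicates it:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative comparison of the two caching policies discussed above.
public class AnalyzerCacheSketch {

    static class Analyzer {} // stand-in for org.apache.lucene.analysis.Analyzer

    // Excerpt of the schema defined earlier: field name -> field type.
    static final Map<String, String> FIELD_TO_TYPE = new HashMap<>();
    static {
        FIELD_TO_TYPE.put("id", "string");
        FIELD_TO_TYPE.put("ts", "text_general");
        FIELD_TO_TYPE.put("name", "text_general");
        FIELD_TO_TYPE.put("company", "text_general");
        // remaining text_general fields omitted for brevity
    }

    // Current policy: one cache entry per field NAME, so every field gets
    // its own analyzer (and its own reusable TokenStreamComponents).
    static Map<String, Analyzer> cacheByFieldName() {
        Map<String, Analyzer> cache = new HashMap<>();
        for (String field : FIELD_TO_TYPE.keySet()) {
            cache.put(field, new Analyzer());
        }
        return cache;
    }

    // Proposed policy: one shared entry per field TYPE, so all fields of the
    // same type reuse a single analyzer.
    static Map<String, Analyzer> cacheByFieldType() {
        Map<String, Analyzer> cache = new HashMap<>();
        for (String type : FIELD_TO_TYPE.values()) {
            cache.computeIfAbsent(type, t -> new Analyzer());
        }
        return cache;
    }

    public static void main(String[] args) {
        System.out.println("by name: " + cacheByFieldName().size()); // one per field
        System.out.println("by type: " + cacheByFieldType().size()); // one per distinct type
    }
}
```

With the full schema (ten fields, two types), keying by type would shrink the cache from ten analyzers to two, and correspondingly fewer cached TokenStreamComponents.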
