Solr Index Creation: Source Code Walkthrough

开源技术之家

Article link: https://blog.csdn.net/luckyboyguo/article/details/22157613

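First, an overview of the source code involved in creating an index.

Source classes:
1. CommonsHttpSolrServer (subclass of SolrServer)
2. SolrServer (abstract class)
3. SolrRequest (base class)
4. AbstractUpdateRequest (abstract class, subclass of SolrRequest)
5. UpdateRequest (subclass of AbstractUpdateRequest)
6. SolrInputDocument (holds the names and values of the fields to index; it arguably belongs first in this list)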

Index creation code:

Query the database, or read any other document source, for the data to index:

    private void updateBook(String sql, String url, String idColumn,
            String timeColumn, BufferedWriter dataFile) throws Exception {
        long start = System.currentTimeMillis();
        SolrUtil solrUtil = new SolrUtil(url); // initialize the Solr client wrapper
        SolrDocument doc = SqlSh.getSolrMaxDoc(solrUtil, idColumn, timeColumn);
        if (doc == null) {
            CommonLogger.getLogger().error("solr no data.");
            return;
        }
        int maxId = Integer.parseInt(doc.get(idColumn).toString());
        long maxTime = Long.parseLong(doc.get(timeColumn).toString()) * 1000;
        Date maxDate = new Date(maxTime);

        DateFormat dateFormat2 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        // Fetch the rows that need to be indexed from the database
        ResultSet rs = stmt_m.executeQuery(String.format(sql,
                dateFormat2.format(maxDate)));
        // Collect the column names that become the index field names
        initColumeMeta(rs.getMetaData());

        // Parse the rows and index them
        parseRs(rs, solrUtil);

        rs.close();

        // Optimize the index
        solrUtil.server.optimize();

        CommonLogger.getLogger().info(
                "update book time:" + (System.currentTimeMillis() - start)
                        / 1000 + "s");
    }

Now let's look at the parseRs method called in the code above.

// A simple method that parses the rows and writes them to the index

    private void parseRs(ResultSet rs, SolrUtil solrUtil) throws Exception {
        Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
        try {
            while (rs.next()) {
                SolrInputDocument doc = new SolrInputDocument();
                for (int i = 0; i < ToolMain.columnNames.length; i++) {
                    // addField sets one field; an extra argument can be passed to set a boost
                    doc.addField(
                            ToolMain.columnNames[i],
                            getColumnValue(
                                    rs.getObject(ToolMain.columnNames[i]),
                                    ToolMain.columnTypes[i]));
                }
                docs.add(doc);
                if (docs.size() >= 1000) {
                    // addDocList both adds the batch to the index and commits it
                    solrUtil.addDocList(docs);
                    docs.clear();
                }
            }
            // Send whatever is left in the final, partial batch
            if (docs.size() > 0) {
                solrUtil.addDocList(docs);
            }
        } finally {
            docs.clear();
        }
    }
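
The batching idea in parseRs (buffer documents, flush every 1000, then flush the remainder) can be isolated from Solr entirely. The following is only a sketch under assumptions: BatchBuffer and BatchDemo are hypothetical names that do not appear in the original code, and the "sink" stands in for whatever actually sends a batch, such as addDocList.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: buffers items and hands them to a sink in fixed-size batches. */
final class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Adds one item and flushes automatically once the batch is full. */
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends the buffered items, if any, and empties the buffer. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // hand over a copy so the sink may keep it
        buffer.clear();
    }
}

final class BatchDemo {
    /** Feeds n integers through a BatchBuffer and records the size of each batch emitted. */
    static List<Integer> batchSizes(int n, int batchSize) {
        List<Integer> sizes = new ArrayList<>();
        BatchBuffer<Integer> buf = new BatchBuffer<>(batchSize, batch -> sizes.add(batch.size()));
        for (int i = 0; i < n; i++) {
            buf.add(i);
        }
        buf.flush(); // do not forget the final partial batch
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(batchSizes(2500, 1000)); // prints [1000, 1000, 500]
    }
}
```

Keeping the final flush() explicit mirrors the `if (docs.size() > 0)` check after the loop in parseRs; without it the last partial batch would never be indexed.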

3. Next, the SolrUtil class, which is mainly a wrapper around CommonsHttpSolrServer

import java.util.Collection;

import log.CommonLogger;

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SolrUtil {
	public CommonsHttpSolrServer server = null;

	public String url = ""; // URL of the Solr service
	public  String shards = "";

	public SolrUtil(String url) {
		this.url = url;
		initSolr();
	}
	public SolrUtil(String url,String shards) {
		this.url = url;
		this.shards=shards;
		initSolr();
	}
	// Initialize the server
	private void initSolr() {
		try {
			server = new CommonsHttpSolrServer(url);
			server.setSoTimeout(60*1000);
			server.setConnectionTimeout(60*1000);
			server.setDefaultMaxConnectionsPerHost(1000);
			server.setMaxTotalConnections(1000);
			server.setFollowRedirects(false);
			server.setAllowCompression(true);
		} catch (Exception e) {
			e.printStackTrace();
			System.exit(-1);
		}
	}
	// Wraps add and commit
	public void addDocList(Collection<SolrInputDocument> docs) {
		try {
			server.add(docs);
			server.commit();
			docs.clear(); // release the batch
		} catch (Exception e) {
			CommonLogger.getLogger().error("addDocList error.", e);
		}
	}
	
	public void deleteDocByQuery(String query) throws Exception { 
		try {
			server.deleteByQuery(query);
			server.commit();
		} catch (Exception e) {
			CommonLogger.getLogger().error("deleteDocByQuery error.", e);
			throw e;
		}
	}
}

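4. Now let's look at Solr's own source code for creating an index

In essence, all the source code does is build a request and return a response.

1. What the SolrInputDocument class used above does

    public class SolrInputDocument implements Map<String,SolrInputField>, Iterable<SolrInputField>, Serializable   // implements the Map and Iterable interfaces and their methods; the class that does the real work is SolrInputField

    public class SolrInputField implements Iterable<Object>, Serializable // the class has only three fields: a String key (the field name), an Object value, and the boost, float boost = 1.0f, which defaults to 1.0f (set this value if you want to weight the field)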
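
To make the "document is a map of fields, each field carries a value and a boost" idea concrete, here is a minimal sketch. It is not the SolrJ implementation: FieldSketch and DocSketch are hypothetical stand-ins, and the rule that the most recent boost wins is a simplifying assumption rather than a statement about how Solr combines boosts.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for SolrInputField: a name, its value(s), and a boost. */
final class FieldSketch {
    final String name;
    final List<Object> values = new ArrayList<>();
    float boost = 1.0f; // default boost, as described in the article

    FieldSketch(String name) {
        this.name = name;
    }
}

/** Hypothetical stand-in for SolrInputDocument: a map from field name to field. */
final class DocSketch {
    private final Map<String, FieldSketch> fields = new LinkedHashMap<>();

    /** Adds a value; a repeated name simply appends another value to the same field. */
    void addField(String name, Object value) {
        fields.computeIfAbsent(name, FieldSketch::new).values.add(value);
    }

    /** Adds a value and sets the field boost (assumption: the last boost wins). */
    void addField(String name, Object value, float boost) {
        addField(name, value);
        fields.get(name).boost = boost;
    }

    List<Object> getValues(String name) {
        FieldSketch f = fields.get(name);
        return f == null ? Collections.<Object>emptyList() : f.values;
    }

    float getBoost(String name) {
        FieldSketch f = fields.get(name);
        return f == null ? 1.0f : f.boost;
    }

    Set<String> fieldNames() {
        return fields.keySet();
    }
}

final class DocSketchDemo {
    public static void main(String[] args) {
        DocSketch doc = new DocSketch();
        doc.addField("id", 1);
        doc.addField("title", "Solr in practice", 2.0f); // weighted field
        doc.addField("title", "second title value");     // same field, second value
        System.out.println(doc.fieldNames() + " titleBoost=" + doc.getBoost("title"));
    }
}
```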

Next, what CommonsHttpSolrServer does when it is invoked (on our side this is the addDocList method in SolrUtil).

2. The add method

public UpdateResponse add(Collection<SolrInputDocument> docs) throws SolrServerException, IOException {
          UpdateRequest req = new UpdateRequest(); // create a request
          req.add(docs); // call UpdateRequest.add to attach the documents to index
          return req.process(this); // the key call: process() returns the response
   }

        // Now the add method of UpdateRequest
        private List<SolrInputDocument> documents = null;
        public UpdateRequest add( final Collection<SolrInputDocument> docs )
        {
            if( documents == null ) {
                  documents = new ArrayList<SolrInputDocument>( docs.size()+1 );
            }
            documents.addAll( docs );
            return this;
        }

3. The commit method, which is defined in the SolrServer class

 public UpdateResponse commit( boolean waitFlush, boolean waitSearcher ) throws SolrServerException, IOException {
        // setAction only fills in a ModifiableSolrParams object, which is used later in
        // the request method of CommonsHttpSolrServer; committing also goes through process()
        return new UpdateRequest().setAction( UpdateRequest.ACTION.COMMIT, waitFlush, waitSearcher ).process( this );
  }

4. Optimizing the index

   public UpdateResponse optimize(boolean waitFlush, boolean waitSearcher, int maxSegments ) throws SolrServerException, IOException {
            // Also calls process(). As with commit, setAction only fills in the ModifiableSolrParams
            // object that the request method of CommonsHttpSolrServer sends along; the operation it
            // asks for is mainly merging and compacting the index, which happens on the Solr server.
            return new UpdateRequest().setAction( UpdateRequest.ACTION.OPTIMIZE, waitFlush, waitSearcher, maxSegments ).process( this );
   }

5. Since everything above ends in the process method, let's look at it

@Override
     public UpdateResponse process( SolrServer server ) throws SolrServerException, IOException
     {
           long startTime = System.currentTimeMillis();
           UpdateResponse res = new UpdateResponse();
           res.setResponse( server.request( this ) ); // the most important line: this calls the request method of CommonsHttpSolrServer
           res.setElapsedTime( System.currentTimeMillis()-startTime );
           return res;
     }
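
The pattern shared by add, commit and optimize (build a request, set parameters on it, then let the request hand itself to the server and time the round trip) can be sketched without any Solr dependency. Everything here is an assumption made for illustration: ServerSketch, ResponseSketch and UpdateRequestSketch are hypothetical stand-ins, the parameter names are modeled on the article's description of setAction, and the "server" is a fake that echoes what it received.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a server: receives a request, returns a raw response. */
interface ServerSketch {
    Map<String, Object> request(UpdateRequestSketch req);
}

/** Hypothetical stand-in for UpdateResponse: the raw body plus the elapsed time. */
final class ResponseSketch {
    final Map<String, Object> body;
    final long elapsedMillis;

    ResponseSketch(Map<String, Object> body, long elapsedMillis) {
        this.body = body;
        this.elapsedMillis = elapsedMillis;
    }
}

/** Hypothetical stand-in for UpdateRequest. */
final class UpdateRequestSketch {
    final String path = "/update";
    final Map<String, String> params = new LinkedHashMap<>();

    /** Only records parameters; nothing is sent until process() is called. */
    UpdateRequestSketch setAction(String action, boolean waitFlush, boolean waitSearcher) {
        params.put(action, "true");
        params.put("waitFlush", String.valueOf(waitFlush));
        params.put("waitSearcher", String.valueOf(waitSearcher));
        return this;
    }

    /** Hands this request to the server and wraps the raw result with timing. */
    ResponseSketch process(ServerSketch server) {
        long start = System.currentTimeMillis();
        Map<String, Object> body = server.request(this);
        return new ResponseSketch(body, System.currentTimeMillis() - start);
    }
}

final class ProcessDemo {
    /** Runs a commit through a fake server that echoes the path and parameters it saw. */
    static ResponseSketch commitAgainstFake() {
        ServerSketch fake = req -> {
            Map<String, Object> out = new LinkedHashMap<>();
            out.put("path", req.path);
            out.put("params", new LinkedHashMap<>(req.params));
            return out;
        };
        return new UpdateRequestSketch().setAction("commit", true, true).process(fake);
    }

    public static void main(String[] args) {
        System.out.println(commitAgainstFake().body);
    }
}
```

The point of the design is that the request object knows nothing about HTTP; swapping the fake for a real transport changes only the ServerSketch implementation.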

6. The final stop is the request method of CommonsHttpSolrServer, the SolrServer subclass. Let's see how this method works.

public NamedList<Object> request(final SolrRequest request, ResponseParser processor) throws SolrServerException, IOException {
    
    HttpMethod method = null;
    InputStream is = null;
    SolrParams params = request.getParams();
    Collection<ContentStream> streams = requestWriter.getContentStreams(request);
    String path = requestWriter.getPath(request);
    
    // Index updates arrive here with the path /update; /select is used for queries
    if( path == null || !path.startsWith( "/" ) ) {
      path = "/select";
    }
    
    ResponseParser parser = request.getResponseParser();
    if( parser == null ) {
      parser = _parser;
    }
    
    // The parser 'wt=' and 'version=' params are used instead of the original params
    ModifiableSolrParams wparams = new ModifiableSolrParams();
    wparams.set( CommonParams.WT, parser.getWriterType() );
    wparams.set( CommonParams.VERSION, parser.getVersion());
    if( params == null ) {
      params = wparams;
    }
    else {
      params = new DefaultSolrParams( wparams, params );
    }
    
    if( _invariantParams != null ) {
      params = new DefaultSolrParams( _invariantParams, params );
    }

    int tries = _maxRetries + 1;
    try {
      while( tries-- > 0 ) {
        // Note: since we aren't do intermittent time keeping
        // ourselves, the potential non-timeout latency could be as
        // much as tries-times (plus scheduling effects) the given
        // timeAllowed.
        try { // Reading the Solr source shows that an UpdateRequest sets its method to POST automatically
          if( SolrRequest.METHOD.GET == request.getMethod() ) {
            if( streams != null ) {
              throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, "GET can't send streams!" );
            }
            method = new GetMethod( _baseURL + path + ClientUtils.toQueryString( params, false ) );
          }
          else if( SolrRequest.METHOD.POST == request.getMethod() ) { // so we go straight to this branch

            String url = _baseURL + path;
            boolean isMultipart = ( streams != null && streams.size() > 1 );

            if (streams == null || isMultipart) {
              PostMethod post = new PostMethod(url); // set up the POST: request headers, body, parameters, and so on
              post.getParams().setContentCharset("UTF-8");
              if (!this.useMultiPartPost && !isMultipart) {
                post.addRequestHeader("Content-Type",
                    "application/x-www-form-urlencoded; charset=UTF-8");
              }

              List<Part> parts = new LinkedList<Part>();
              Iterator<String> iter = params.getParameterNamesIterator();
              while (iter.hasNext()) {
                String p = iter.next();
                String[] vals = params.getParams(p);
                if (vals != null) {
                  for (String v : vals) {
                    if (this.useMultiPartPost || isMultipart) {
                      parts.add(new StringPart(p, v, "UTF-8"));
                    } else {
                      post.addParameter(p, v);
                    }
                  }
                }
              }

              if (isMultipart) {
                int i = 0;
                for (ContentStream content : streams) {
                  final ContentStream c = content;

                  String charSet = null;
                  PartSource source = new PartSource() {
                    public long getLength() {
                      return c.getSize();
                    }
                    public String getFileName() {
                      return c.getName();
                    }
                    public InputStream createInputStream() throws IOException {
                      return c.getStream();
                    }
                  };
                
                  parts.add(new FilePart(c.getName(), source, 
                                         c.getContentType(), charSet));
                }
              }
              if (parts.size() > 0) {
                post.setRequestEntity(new MultipartRequestEntity(parts
                    .toArray(new Part[parts.size()]), post.getParams()));
              }

              method = post;
            }
            // It is has one stream, it is the post body, put the params in the URL
            else {
              String pstr = ClientUtils.toQueryString(params, false);
              PostMethod post = new PostMethod(url + pstr);

              // Single stream as body
              // Using a loop just to get the first one
              final ContentStream[] contentStream = new ContentStream[1];
              for (ContentStream content : streams) {
                contentStream[0] = content;
                break;
              }
              if (contentStream[0] instanceof RequestWriter.LazyContentStream) {
                post.setRequestEntity(new RequestEntity() {
                  public long getContentLength() {
                    return -1;
                  }

                  public String getContentType() {
                    return contentStream[0].getContentType();
                  }

                  public boolean isRepeatable() {
                    return false;
                  }

                  public void writeRequest(OutputStream outputStream) throws IOException {
                    ((RequestWriter.LazyContentStream) contentStream[0]).writeTo(outputStream);
                  }
                }
                );

              } else {
                is = contentStream[0].getStream();
                post.setRequestEntity(new InputStreamRequestEntity(is, contentStream[0].getContentType()));
              }
              method = post;
            }
          }
          else {
            throw new SolrServerException("Unsupported method: "+request.getMethod() );
          }
        }
        catch( NoHttpResponseException r ) {
          // This is generally safe to retry on
          method.releaseConnection();
          method = null;
          if(is != null) {
            is.close();
          }
          // If out of tries then just rethrow (as normal error).
          if( ( tries < 1 ) ) {
            throw r;
          }
          //log.warn( "Caught: " + r + ". Retrying..." );
        }
      }
    }
    catch( IOException ex ) {
      throw new SolrServerException("error reading streams", ex );
    }

    method.setFollowRedirects( _followRedirects );
    method.addRequestHeader( "User-Agent", AGENT );
    if( _allowCompression ) {
      method.setRequestHeader( new Header( "Accept-Encoding", "gzip,deflate" ) );
    }

    try {
      // Execute the method.
      //System.out.println( "EXECUTE:"+method.getURI() );
      // Execute the request, check the status code, then assemble and return the response
      int statusCode = _httpClient.executeMethod(method);
      if (statusCode != HttpStatus.SC_OK) {
        StringBuilder msg = new StringBuilder();
        msg.append( method.getStatusLine().getReasonPhrase() );
        msg.append( "\n\n" );
        msg.append( method.getStatusText() );
        msg.append( "\n\n" );
        msg.append( "request: "+method.getURI() );
        throw new SolrException(statusCode, java.net.URLDecoder.decode(msg.toString(), "UTF-8") );
      }

      // Read the contents
      String charset = "UTF-8";
      if( method instanceof HttpMethodBase ) {
        charset = ((HttpMethodBase)method).getResponseCharSet();
      }
      InputStream respBody = method.getResponseBodyAsStream();
      // Jakarta Commons HTTPClient doesn't handle any
      // compression natively.  Handle gzip or deflate
      // here if applicable.
      if( _allowCompression ) {
        Header contentEncodingHeader = method.getResponseHeader( "Content-Encoding" );
        if( contentEncodingHeader != null ) {
          String contentEncoding = contentEncodingHeader.getValue();
          if( contentEncoding.contains( "gzip" ) ) {
            //log.debug( "wrapping response in GZIPInputStream" );
            respBody = new GZIPInputStream( respBody );
          }
          else if( contentEncoding.contains( "deflate" ) ) {
            //log.debug( "wrapping response in InflaterInputStream" );
            respBody = new InflaterInputStream(respBody);
          }
        }
        else {
          Header contentTypeHeader = method.getResponseHeader( "Content-Type" );
          if( contentTypeHeader != null ) {
            String contentType = contentTypeHeader.getValue();
            if( contentType != null ) {
              if( contentType.startsWith( "application/x-gzip-compressed" ) ) {
                //log.debug( "wrapping response in GZIPInputStream" );
                respBody = new GZIPInputStream( respBody );
              }
              else if ( contentType.startsWith("application/x-deflate") ) {
                //log.debug( "wrapping response in InflaterInputStream" );
                respBody = new InflaterInputStream(respBody);
              }
            }
          }
        }
      }
      return processor.processResponse(respBody, charset);
    }
    catch (HttpException e) {
      throw new SolrServerException( e );
    }
    catch (IOException e) {
      throw new SolrServerException( e );
    }
    finally {
      method.releaseConnection();
      if(is != null) {
        is.close();
      }
    }
  }
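
The last part of request() picks a decompressing stream based on the Content-Encoding response header. That step can be tried in isolation with the JDK alone. This is a sketch under assumptions: ResponseDecoding and its methods are hypothetical names, the payload is compressed in memory instead of coming from a real HTTP response, and the Content-Type fallback from the listing is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

final class ResponseDecoding {
    /** Wraps the body according to the Content-Encoding value, or returns it unchanged. */
    static InputStream wrap(InputStream body, String contentEncoding) throws IOException {
        if (contentEncoding == null) {
            return body;
        }
        if (contentEncoding.contains("gzip")) {
            return new GZIPInputStream(body);
        }
        if (contentEncoding.contains("deflate")) {
            return new InflaterInputStream(body);
        }
        return body;
    }

    /** Demo helper: compresses text in memory the way a server might for this encoding. */
    static byte[] compress(String text, String contentEncoding) {
        try {
            ByteArrayOutputStream raw = new ByteArrayOutputStream();
            OutputStream out;
            if ("gzip".equals(contentEncoding)) {
                out = new GZIPOutputStream(raw);
            } else if ("deflate".equals(contentEncoding)) {
                out = new DeflaterOutputStream(raw);
            } else {
                out = raw; // no compression
            }
            out.write(text.getBytes(StandardCharsets.UTF_8));
            out.close(); // finishes the compressed stream
            return raw.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the payload back to text through the stream chosen by wrap(). */
    static String decode(byte[] payload, String contentEncoding) {
        try (InputStream in = wrap(new ByteArrayInputStream(payload), contentEncoding)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] gz = compress("<response/>", "gzip");
        System.out.println(decode(gz, "gzip"));
    }
}
```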

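To summarize in words:

1. Query the database, read files, or use whatever source you have, and store the data in SolrInputDocument objects. SolrInputDocument keeps its fields in an internal map (the objects actually stored are SolrInputField instances).

2. Initialize CommonsHttpSolrServer with the service URL (the address of the Solr server), the timeouts, the maximum number of connections, and so on (see the SolrUtil class).

3. The add/commit/optimize methods of SolrServer all end up calling the process method of AbstractUpdateRequest, which in turn calls the request method of CommonsHttpSolrServer.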
