Keyword elevation (paid ranking) and exclusion in Solr: source-code walkthrough with worked examples (Part 2)

Latest recommended article published on 2021-03-11 06:12:52

iteye_14612


Category: solr

Article link: https://blog.csdn.net/iteye_14612/article/details/82680013


Let's continue with the source. The previous post left a few methods unexamined. The first is getElevationMap, which is called when the request specifies neither elevateIds nor excludeIds.

  /** Get the elevation map from the data dir, i.e. read the config file from the data directory. */
  Map<String,ElevationObj> getElevationMap(IndexReader reader, SolrCore core) throws Exception {
    
    synchronized (elevationCache) {
      
      // If the file was set up under conf (non-SolrCloud), it is never re-read here, because in that case it was cached under the key null.
      Map<String,ElevationObj> map = elevationCache.get(null);
      if (map != null) return map;
      
      map = elevationCache.get(reader); // Look up by IndexReader: if the reader has changed, the elevation config file is loaded again; otherwise the cached map is reused.
      if (map == null) {
        String f = initArgs.get(CONFIG_FILE);
        if (f == null) {
          throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
              "QueryElevationComponent must specify argument: " + CONFIG_FILE);
        }
        log.info("Loading QueryElevation from data dir: " + f);
        
        Config cfg;
        
        // The config file can be read from ZooKeeper (SolrCloud) or from local disk (standalone Solr)
        ZkController zkController = core.getCoreDescriptor().getCoreContainer().getZkController();
        if (zkController != null) {
          cfg = new Config(core.getResourceLoader(), f, null, null);
        } else {
          InputStream is = VersionedFile.getLatestFile(core.getDataDir(), f); // read from the data directory
          cfg = new Config(core.getResourceLoader(), f, new InputSource(is), null);
        }
        
        map = loadElevationMap(cfg);
        elevationCache.put(reader, map);
      }
      return map;
    }
  }

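This method tells us the following. If the set of documents you elevate changes over time and you run standalone Solr (not SolrCloud), do not put the file in conf, otherwise you have to restart before the file is loaded again. If the elevation config file (elevate.xml in the stock setup) lives in SolrCloud or under data, it is reloaded whenever the IndexReader changes, that is, after every commit.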
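The lookup order described above can be sketched in plain Java. This is a minimal illustration, not Solr's code: the class name `ElevationCacheSketch`, the `pinFromConf` method, and the use of `Object` as a stand-in for `IndexReader` are all assumptions made for the example, and the elevation map is reduced to a `Map<String, String>`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

/** Hypothetical stand-in for the elevationCache lookup; not Solr's actual class. */
class ElevationCacheSketch {
  // Keyed by reader. WeakHashMap, like HashMap, permits a null key.
  private final Map<Object, Map<String, String>> cache = new WeakHashMap<>();
  private final Supplier<Map<String, String>> loader;
  int loads = 0; // how many times the config "file" was parsed

  ElevationCacheSketch(Supplier<Map<String, String>> loader) {
    this.loader = loader;
  }

  /** Simulates loading from conf at startup: the map is pinned under the null key. */
  synchronized void pinFromConf() {
    loads++;
    cache.put(null, loader.get());
  }

  /** Same lookup order as getElevationMap: the null key first, then the reader. */
  synchronized Map<String, String> get(Object reader) {
    Map<String, String> map = cache.get(null);
    if (map != null) return map; // pinned entry wins, so no reload ever happens
    map = cache.get(reader);
    if (map == null) { // unseen reader: parse the file again
      loads++;
      map = loader.get();
      cache.put(reader, map);
    }
    return map;
  }

  public static void main(String[] args) {
    Object reader1 = new Object();
    Object reader2 = new Object(); // a new reader, as after a commit

    ElevationCacheSketch dataDir = new ElevationCacheSketch(HashMap::new);
    dataDir.get(reader1);
    dataDir.get(reader1); // same reader: cache hit
    dataDir.get(reader2); // reader changed: reload
    System.out.println("data dir loads: " + dataDir.loads);

    ElevationCacheSketch confDir = new ElevationCacheSketch(HashMap::new);
    confDir.pinFromConf();
    confDir.get(reader1);
    confDir.get(reader2);
    System.out.println("conf dir loads: " + confDir.loads);
  }
}
```

In the data-directory case the loader runs once per distinct reader, while in the pinned case it runs only at startup, which is why a file under conf needs a restart to take effect.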

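Next, look at the ElevationObj code. This class encapsulates one elevation rule: a text value together with the document ids to elevate and the document ids to exclude.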

  class ElevationObj {
    
    /** The text passed in, i.e. the search term */
    final String text;
    /** The result of analyzing the text above; it may or may not be identical to text */
    final String analyzed;
    /** TermQuery instances built from the excluded ids */
    final TermQuery[] exclude;
    /** Query wrapping the documents to elevate */
    final BooleanQuery include;
    /** Priority of each id, in descending order */
    final Map<BytesRef,Integer> priority;
    /** The ids to elevate */
    final Set<String> ids;
    /** The ids to exclude */
    final Set<String> excludeIds;
    
    // First argument: the query text; second: the ids to elevate; third: the ids to exclude
    ElevationObj(String qstr, List<String> elevate, List<String> exclude) throws IOException {
      
      this.text = qstr;
      this.analyzed = getAnalyzedQuery(this.text); // run the text through the analyzer
      this.ids = new HashSet<>();
      this.excludeIds = new HashSet<>();
      
      this.include = new BooleanQuery();
      this.include.setBoost(0);
      this.priority = new HashMap<>();
      int max = elevate.size() + 5;
      
      // The documents to elevate are wrapped in a single BooleanQuery.
      for (String id : elevate) {
        id = idSchemaFT.readableToIndexed(id); // no conversion happens here (for a plain string id field)
        ids.add(id);
        TermQuery tq = new TermQuery(new Term(idField, id));
        include.add(tq, BooleanClause.Occur.SHOULD);
        this.priority.put(new BytesRef(id), max--);
      }
      
      if (exclude == null || exclude.isEmpty()) {
        this.exclude = null;
      } else {
        this.exclude = new TermQuery[exclude.size()];
        for (int i = 0; i < exclude.size(); i++) {
          String id = idSchemaFT.readableToIndexed(exclude.get(i));
          excludeIds.add(id);
          this.exclude[i] = new TermQuery(new Term(idField, id)); // collect the documents to exclude into an array
        }
      } 
    }
  }

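To summarize the code above: on top of the query we set in the request, it builds several additional queries, some for the documents to elevate and some for the documents to exclude, all of them constructed from ids.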
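The priority assignment in the constructor is easy to overlook, so here is a small standalone sketch of just that loop. It assumes `String` keys instead of `BytesRef` to stay free of Lucene dependencies; the class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the priority loop in the ElevationObj constructor. */
class PrioritySketch {
  /** Assigns descending priorities in list order, starting at size + 5. */
  static Map<String, Integer> assignPriorities(List<String> elevateIds) {
    Map<String, Integer> priority = new LinkedHashMap<>();
    int max = elevateIds.size() + 5;
    for (String id : elevateIds) {
      priority.put(id, max--); // earlier ids in the list get larger values
    }
    return priority;
  }

  public static void main(String[] args) {
    // Three ids: the first gets 8, the second 7, the third 6.
    System.out.println(assignPriorities(Arrays.asList("7", "3", "9")));
  }
}
```

Every elevated id ends up with a priority of at least 6, while non-elevated documents are treated as 0 by the comparator, so elevated documents always sort ahead of the rest.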

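The last method is the most important one. It does the sorting, that is, it orders the documents that were marked for elevation. The ElevationComparatorSource class produces the comparator; we only look at its newComparator method.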

    /** The returned comparator sorts by the priority that was assigned. */
    @Override
    public FieldComparator<Integer> newComparator(String fieldname, final int numHits, int sortPos, boolean reversed)
        throws IOException {
      
      return new SimpleFieldComparator<Integer>() {
        
        /** Ends up holding the priority value for each slot */
        private final int[] values = new int[numHits];
        private int bottomVal;
        private int topVal;
        private PostingsEnum postingsEnum;
        // The (elevated) ids found so far
        private Set<String> seen = new HashSet<>(elevations.ids.size());
        // The actual ordering: compares the entries stored in values
        public int compare(int slot1, int slot2) {
          return values[slot1] - values[slot2]; // values will be small enough that there is no overflow concern
        }
        
        @Override
        public void setBottom(int slot) {
          bottomVal = values[slot];
        }
        
        @Override
        public void setTopValue(Integer value) {
          topVal = value.intValue();
        }
        
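        /**
         * Reads the value for a document: maps the Lucene doc id to the elevated id, then the elevated id to its priority. The value returned is the priority.
         * @param doc   the Lucene doc id
         * @return      the priority of that document
         */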
        private int docVal(int doc) {
          if (ordSet.size() > 0) {
            int slot = ordSet.find(doc);
            if (slot >= 0) { // >= 0 means the doc is in ordSet, i.e. this id was marked for elevation
              BytesRef id = termValues[slot]; // the elevated id
              Integer prio = elevations.priority.get(id); // look up the priority for that id
              return prio == null ? 0 : prio.intValue();
            }
          }
          return 0; // Not elevated: all such docs get 0 and tie here, so the score comparator decides their order.
        }
        
        @Override 
        public int compareBottom(int doc) { // while sorting, a candidate is first compared against bottomVal
          return bottomVal - docVal(doc);
        }
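        // Fills values: translates the Lucene doc id into the priority of the corresponding document id
        @Override
        public void copy(int slot, int doc) {
          values[slot] = docVal(doc); // docVal returns the priority of the elevated id (it plays the same role a FieldCache value would)
        }
        /** Called when switching to the next segment reader: finds the ids that actually exist and adds them to seen, ordSet and termValues. */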
        protected void doSetNextReader(LeafReaderContext context) throws IOException {
          // convert the ids to Lucene doc ids, the ordSet and termValues needs to be the same size as the number of
          // elevation docs we have
          ordSet.clear();
          Fields fields = context.reader().fields();
          if (fields == null) return;
          Terms terms = fields.terms(idField); // like FieldCache, this reads the term dictionary
          if (terms == null) return;
          TermsEnum termsEnum = terms.iterator();
          BytesRefBuilder term = new BytesRefBuilder();
          Bits liveDocs = context.reader().getLiveDocs(); // the docs that have not been deleted
          for (String id : elevations.ids) {
            term.copyChars(id);
            if (seen.contains(id) == false && termsEnum.seekExact(term.get())) {
              postingsEnum = termsEnum.postings(liveDocs, postingsEnum, PostingsEnum.NONE);
              int docId = postingsEnum.nextDoc(); // ids are unique, so there are no duplicates
              if (docId == DocIdSetIterator.NO_MORE_DOCS) continue; // must have been deleted
              termValues[ordSet.put(docId)] = term.toBytesRef(); // Links the Lucene doc id to the elevated id: ordSet.put stores the Lucene doc id and returns its slot, and the elevated id is stored in termValues at that slot.
              seen.add(id);
              assert postingsEnum.nextDoc() == DocIdSetIterator.NO_MORE_DOCS; // ids are unique, so exactly one doc matches and the next call must return NO_MORE_DOCS
            }
          }
        }
      };
    }
  }

This comparator sorts by priority, in other words by the order in which we listed the ids to elevate.
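The effect of the comparator can be reproduced without Lucene. The sketch below is an assumption-laden simplification: the `Hit` class and `rank` method are hypothetical, documents without a priority are treated as 0 (as docVal does), and relevance score is assumed to be the tie-breaker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: priority first (descending), then score (descending). */
class ElevationSortSketch {
  /** A search hit reduced to its document id and relevance score. */
  static final class Hit {
    final String id;
    final float score;
    Hit(String id, float score) { this.id = id; this.score = score; }
  }

  static List<String> rank(List<Hit> hits, Map<String, Integer> priority) {
    // Non-elevated docs have no entry and count as 0, so they tie on priority.
    Comparator<Hit> byPriority =
        Comparator.comparingInt((Hit h) -> priority.getOrDefault(h.id, 0)).reversed();
    Comparator<Hit> byScore =
        Comparator.comparingDouble((Hit h) -> h.score).reversed();
    List<Hit> sorted = new ArrayList<>(hits);
    sorted.sort(byPriority.thenComparing(byScore));
    List<String> ids = new ArrayList<>();
    for (Hit h : sorted) ids.add(h.id);
    return ids;
  }

  public static void main(String[] args) {
    List<Hit> hits = Arrays.asList(
        new Hit("1", 0.5f), new Hit("2", 0.9f), new Hit("3", 0.7f));
    Map<String, Integer> priority = new HashMap<>();
    priority.put("1", 6); // doc 1 is elevated
    System.out.println(rank(hits, priority));        // doc 1 first despite the lowest score
    System.out.println(rank(hits, new HashMap<>())); // no elevation: pure score order
  }
}
```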

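With all of the above understood, we can start using elevation. Whether on SolrCloud or on standalone Solr, relying on a config file becomes tedious as soon as that file has to change, so I recommend passing the ids to elevate and to exclude as request parameters, which is also what I did in my own experiment. The experiment used a standalone Solr with two documents, each having two fields, id and title. The title of the document with id 1 contains hello twice, and the title of the document with id 2 contains hello four times. The df of my select requestHandler is title.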

Opening http://localhost:8080/solr/collection1/select?q=hello&wt=json&indent=true in a browser, the document with id 2 clearly ranks first, because it contains hello four times.

Now use the elevator. The URL becomes http://localhost:8080/solr/collection1/select?q=hello&wt=json&indent=true&enableElevation=on&elevateIds=1 where enableElevation=on turns elevation on and elevateIds=1 elevates the document with id 1. Now id=1 ranks first, and id=2 is still returned.

Using exclusion: http://localhost:8080/solr/collection1/select?q=hello&wt=json&indent=true&enableElevation=on&elevateIds=1&excludeIds=2 adds &excludeIds=2, which excludes document 2. Now only id=1 is returned.

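Adding a sort: http://localhost:8080/solr/collection1/select?q=hello&wt=json&indent=true&sort=id asc&enableElevation=on&forceElevation=on&elevateIds=2 turns on sort with ascending id, which on its own would put id=1 first. Because forceElevation is also set, the elevation ordering is enforced and document 2 is elevated, so id=2 still ranks first.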
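When these requests are issued from code rather than typed into a browser, the space in sort=id asc has to be encoded. The following sketch builds the request URLs with the JDK only; the class and method names are hypothetical, and the base URL is the one used in the experiment above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that assembles a /select URL with elevation parameters. */
class ElevationUrlSketch {
  static String selectUrl(String base, Map<String, String> params) {
    StringBuilder sb = new StringBuilder(base).append("/select");
    char sep = '?';
    try {
      for (Map.Entry<String, String> e : params.entrySet()) {
        sb.append(sep)
          .append(URLEncoder.encode(e.getKey(), "UTF-8"))
          .append('=')
          .append(URLEncoder.encode(e.getValue(), "UTF-8")); // a space becomes '+'
        sep = '&';
      }
    } catch (UnsupportedEncodingException ex) {
      throw new IllegalStateException("UTF-8 is always supported", ex);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
    params.put("q", "hello");
    params.put("wt", "json");
    params.put("sort", "id asc");
    params.put("enableElevation", "on");
    params.put("forceElevation", "on");
    params.put("elevateIds", "2");
    System.out.println(selectUrl("http://localhost:8080/solr/collection1", params));
  }
}
```

Dropping the sort and forceElevation entries and adding excludeIds gives the exclusion request from the earlier example.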

That's it. Elevation and exclusion in Solr should now be reasonably clear.
