Using and Analyzing Guava's Multiset Interface

liuhmmjj

Category: Guava    Tags: java, jdk

Article link: https://blog.csdn.net/u014082714/article/details/52080647


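Guava introduces several new collection types that the JDK lacks but that are very useful. All of them integrate smoothly with the JDK collections, and they implement the interfaces defined by the JDK faithfully.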

Multiset: putting duplicate elements into a collection

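You might object that this conflicts with the contract of the Set interface, whose JavaDoc forbids duplicate elements. In fact, Multiset does not implement java.util.Set; it is closer to a Bag. An ordinary Set looks like [car, ship, bike], whereas a Multiset looks like [car x 2, ship x 6, bike x 3].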
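The Set contract mentioned above is easy to observe with the JDK alone. The following sketch uses a hypothetical class `SetContractDemo` built only on `java.util`, not on Guava. It shows that a `HashSet` rejects a duplicate `add`, so a Set has no way to record "car x 2":

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class: shows why a java.util.Set cannot model "car x 2".
public class SetContractDemo {

    // Adds the same element twice and reports whether the Set accepted the second add.
    public static boolean secondAddAccepted(String element) {
        Set<String> set = new HashSet<>();
        set.add(element);
        return set.add(element); // Set.add returns false if the element is already present
    }

    // Adds all elements and returns the Set's size: duplicates collapse into one entry.
    public static int sizeAfterAdding(String... elements) {
        Set<String> set = new HashSet<>();
        for (String e : elements) {
            set.add(e);
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(secondAddAccepted("car"));                      // false
        System.out.println(sizeAfterAdding("car", "car", "ship", "bike")); // 3
    }
}
```

A Multiset keeps the second "car" instead, and count("car") then returns 2.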

[Figure: UML class diagram of Multiset]

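For comparison, the UML diagram also includes the HashSet inheritance hierarchy. It shows that Multiset is a new kind of collection and not a subclass of AbstractSet (it is not a Set). The Multiset interface extends Collection with methods for handling duplicate elements, such as int count(Object element) and int add(@Nullable E element, int occurrences). Their names are self-explanatory.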

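A useful feature of Multiset is that it tracks how many times each object occurs, so you can use it for counting.

For example, suppose a List contains various strings and you want to count how many times each string appears in it: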

Map<String, Integer> map = new HashMap<String, Integer>();
for (String word : wordList) {
    Integer count = map.get(word);
    map.put(word, (count == null) ? 1 : count + 1);
}
// count word "the" (null if "the" never occurred)
Integer count = map.get("the");
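On Java 8 or later, the Map version can be shortened with `Map.merge`. `Map.getOrDefault` gives the same "absent means 0" behavior that Multiset.count has, whereas plain `map.get` returns null for a word that never occurred. The sketch below is JDK-only. The class name `WordCount` and its methods are illustrative, and `List.of` in `main` requires Java 9+:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative JDK-only word counter (hypothetical class, not Guava).
public class WordCount {

    // Map.merge inserts 1 for a new word, otherwise adds 1 to the existing count.
    public static Map<String, Integer> count(List<String> wordList) {
        Map<String, Integer> map = new HashMap<>();
        for (String word : wordList) {
            map.merge(word, 1, Integer::sum);
        }
        return map;
    }

    // Same "absent means 0" behavior as Multiset.count, instead of a null from map.get.
    public static int countOf(Map<String, Integer> counts, String word) {
        return counts.getOrDefault(word, 0);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(List.of("the", "cat", "the", "hat"));
        System.out.println(countOf(counts, "the")); // 2
        System.out.println(countOf(counts, "dog")); // 0
    }
}
```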

With a Multiset, the same thing looks like this:

HashMultiset<String> multiSet = HashMultiset.create();
multiSet.addAll(wordList);
// count word "the" (0 if "the" never occurred)
int count = multiSet.count("the");

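This removes even the loop. Multiset's method is also called count, which reads much better than calling get on a Map. Multiset further provides setCount for setting an element's number of occurrences directly. You could build similar functionality on top of a Map, but that code would be far less readable.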

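Commonly used classes that implement the Multiset interface:

HashMultiset: elements are stored in a HashMap
LinkedHashMultiset: elements are stored in a LinkedHashMap, so iteration order follows the order in which elements were first added
TreeMultiset: elements are kept in sorted order, as in a TreeMap
EnumMultiset: elements must be of an enum type
ImmutableMultiset: an immutable Multiset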
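The ordering differences between these implementations mirror those of the JDK maps they are named after. The following sketch uses a hypothetical class `BackingOrderDemo`, plain JDK maps and Java 9+ for `List.of`. It counts the same input into a LinkedHashMap, a TreeMap and an EnumMap, showing first-insertion order, sorted order and enum declaration order respectively. It illustrates the orderings only and does not reproduce Guava's internals:

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BackingOrderDemo {

    enum Color { RED, GREEN, BLUE }

    // Counts each item into the caller-supplied map; the map type decides the iteration order.
    public static <K> Map<K, Integer> countInto(Map<K, Integer> target, List<K> items) {
        for (K item : items) {
            target.merge(item, 1, Integer::sum);
        }
        return target;
    }

    public static void main(String[] args) {
        List<String> words = List.of("pear", "apple", "pear", "fig");

        // First-insertion order, like LinkedHashMultiset
        System.out.println(countInto(new LinkedHashMap<String, Integer>(), words)); // {pear=2, apple=1, fig=1}

        // Sorted order, like TreeMultiset
        System.out.println(countInto(new TreeMap<String, Integer>(), words)); // {apple=1, fig=1, pear=2}

        // Enum keys only, iterated in declaration order, like EnumMultiset
        List<Color> colors = List.of(Color.BLUE, Color.RED, Color.BLUE);
        System.out.println(countInto(new EnumMap<Color, Integer>(Color.class), colors)); // {RED=1, BLUE=2}
    }
}
```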

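You may have noticed that Guava collections are constructed through static factory methods such as create or of. Most of these classes have private constructors, often with several parameters, which would be cumbersome for client code to call directly. A static factory can also return an object of a subtype of its declared return type. For generic types it is more concise as well, because the type arguments are inferred from the context (this mattered especially before Java 7 introduced the diamond operator).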
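Here is a minimal sketch of that static-factory style. `SimpleBag` is a hypothetical class made up for illustration, not a Guava type. It has a private constructor, and clients obtain instances only through `create()` and `of(...)`:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class for illustration only (not part of Guava).
public final class SimpleBag<E> {
    private final Map<E, Integer> counts;

    // Private constructor: client code must use the static factories below.
    private SimpleBag(Map<E, Integer> counts) {
        this.counts = counts;
    }

    // Type argument is inferred from the target: SimpleBag<String> bag = SimpleBag.create();
    public static <E> SimpleBag<E> create() {
        return new SimpleBag<E>(new HashMap<E, Integer>());
    }

    // Descriptive name plus varargs: build and fill the bag in one call.
    @SafeVarargs
    public static <E> SimpleBag<E> of(E... elements) {
        SimpleBag<E> bag = create();
        for (E element : elements) {
            bag.add(element);
        }
        return bag;
    }

    public void add(E element) {
        counts.merge(element, 1, Integer::sum);
    }

    // Returns 0 for elements that were never added.
    public int count(Object element) {
        return counts.getOrDefault(element, 0);
    }

    public static void main(String[] args) {
        SimpleBag<String> bag = SimpleBag.of("a", "b", "a");
        System.out.println(bag.count("a")); // 2
        System.out.println(bag.count("z")); // 0
    }
}
```

The JDK uses the same pattern. For example, in OpenJDK `EnumSet.noneOf` returns a package-private subtype (`RegularEnumSet` or `JumboEnumSet`) depending on how many constants the enum has, which a constructor could not do.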

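The main methods defined by the Multiset interface:
  add(E element): adds a single occurrence of the element
  add(E element, int occurrences): adds the given number of occurrences and returns the count before the call
  count(Object element): returns the number of occurrences of the given element (0 if absent)
  remove(Object element): removes a single occurrence; the element's count decreases accordingly
  remove(Object element, int occurrences): removes the given number of occurrences (if fewer exist, all of them are removed)
  elementSet(): returns the distinct elements as a Set
  entrySet(): similar to Map.entrySet(), returns a Set<Multiset.Entry<E>> whose entries support getElement() and getCount()
  setCount(E element, int count): sets the number of occurrences of an element
  setCount(E element, int oldCount, int newCount): sets the count to newCount only if the current count equals oldCount
  retainAll(Collection<?> c): keeps only the elements that appear in the given collection
  removeAll(Collection<?> c): removes all occurrences of every element that appears in the given collection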
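The return values of these methods are easy to overlook. The sketch below is a hypothetical `MiniMultiset` built on a HashMap. It is not Guava's implementation, but it mimics the documented semantics: add and remove return the count before the call, remove never goes below zero, and the three-argument setCount succeeds only when the current count matches oldCount. For simplicity, remove's element parameter is typed E here, whereas Guava's accepts Object:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified multiset (not Guava's implementation) that mimics the
// documented return values of add/remove/setCount.
public class MiniMultiset<E> {
    private final Map<E, Integer> counts = new HashMap<>();
    private int size; // total occurrences across all elements, like Multiset.size()

    // Returns 0 for an element that is not present.
    public int count(Object element) {
        return counts.getOrDefault(element, 0);
    }

    // Sets the count directly and returns the previous count; 0 removes the element.
    public int setCount(E element, int newCount) {
        if (newCount < 0) {
            throw new IllegalArgumentException("count must be >= 0: " + newCount);
        }
        int oldCount = count(element);
        if (newCount == 0) {
            counts.remove(element);
        } else {
            counts.put(element, newCount);
        }
        size += newCount - oldCount;
        return oldCount;
    }

    // Adds occurrences and returns the count before the call.
    public int add(E element, int occurrences) {
        if (occurrences < 0) {
            throw new IllegalArgumentException("occurrences must be >= 0: " + occurrences);
        }
        return setCount(element, count(element) + occurrences);
    }

    // Removes up to `occurrences` copies (never below zero); returns the count before the call.
    public int remove(E element, int occurrences) {
        if (occurrences < 0) {
            throw new IllegalArgumentException("occurrences must be >= 0: " + occurrences);
        }
        int oldCount = count(element);
        setCount(element, Math.max(0, oldCount - occurrences));
        return oldCount;
    }

    // Conditional update: succeeds only if the current count equals oldCount.
    public boolean setCount(E element, int oldCount, int newCount) {
        if (count(element) != oldCount) {
            return false;
        }
        setCount(element, newCount);
        return true;
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        MiniMultiset<String> m = new MiniMultiset<>();
        System.out.println(m.add("b", 5));          // 0 (count before the call)
        System.out.println(m.remove("b", 2));       // 5
        System.out.println(m.count("b"));           // 3
        System.out.println(m.setCount("b", 2, 10)); // false: current count is 3
        System.out.println(m.setCount("b", 3, 10)); // true
        System.out.println(m.size());               // 10
    }
}
```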

Multiset example

import java.util.Iterator;
import java.util.Set;

import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;

public class GuavaTester {

   public static void main(String[] args) {
      //create a multiset collection
      Multiset<String> multiset = HashMultiset.create();
      multiset.add("a");
      multiset.add("b");
      multiset.add("c");
      multiset.add("d");
      multiset.add("a");
      multiset.add("b");
      multiset.add("c");
      multiset.add("b");
      multiset.add("b");
      multiset.add("b");
      //print the occurrence of an element
      System.out.println("Occurrence of 'b' : "+multiset.count("b"));
      //print the total size of the multiset
      System.out.println("Total Size : "+multiset.size());
      //get the distinct elements of the multiset as set
      Set<String> set = multiset.elementSet();
      //display the elements of the set
      System.out.println("Set [");
      for (String s : set) {			
         System.out.println(s);		    
      }
      System.out.println("]");
      //display all the elements of the multiset using iterator
      Iterator<String> iterator  = multiset.iterator();
      System.out.println("MultiSet [");
      while(iterator.hasNext()){
         System.out.println(iterator.next());
      }
      System.out.println("]");		
      //display the distinct elements of the multiset with their occurrence count
      System.out.println("MultiSet [");
      for (Multiset.Entry<String> entry : multiset.entrySet())
      {
         System.out.println("Element: "+entry.getElement() +", Occurrence(s): " + entry.getCount());		    
      }
      System.out.println("]");		

      // remove two occurrences of "b"
      multiset.remove("b", 2);
      // print the occurrence of an element
      System.out.println("Occurrence of 'b' : " + multiset.count("b"));
   }
}

Output (a HashMultiset does not guarantee iteration order, so the element order may differ when you run it):

Occurrence of 'b' : 5
Total Size : 10
Set [
d
b
c
a
]
MultiSet [
d
b
b
b
b
b
c
c
a
a
]
MultiSet [
Element: d, Occurrence(s): 1
Element: b, Occurrence(s): 5
Element: c, Occurrence(s): 2
Element: a, Occurrence(s): 2
]
Occurrence of 'b' : 3

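HashMultiset is the most commonly used implementation. In the Guava version whose source is quoted below, it stores its data in a Map<E, Count>, and add and remove operate on the Count values (Count itself is not thread-safe). The biggest difference between a Multiset and a Map<E, Integer> is what iteration yields. Iterating a Multiset returns every occurrence, i.e. as many elements as the sum of all counts (which equals size()), while a Map yields each key only once. This difference comes from the Multiset's element iterator and from the iterator over its Map.Entry<E, Count> entries. The element iterator is implemented as follows: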

private class MapBasedMultisetIterator implements Iterator<E> {
    final Iterator<Map.Entry<E, Count>> entryIterator;
    Map.Entry<E, Count> currentEntry;
    int occurrencesLeft; // remaining occurrences of the current element; decides when to advance the entry iterator
    boolean canRemove;

    MapBasedMultisetIterator() {
      this.entryIterator = backingMap.entrySet().iterator();
    }

    @Override
    public boolean hasNext() {
      return occurrencesLeft > 0 || entryIterator.hasNext();
    }

    @Override
    public E next() {
      if (occurrencesLeft == 0) { // current element exhausted: advance to the next map entry
        currentEntry = entryIterator.next();
        occurrencesLeft = currentEntry.getValue().get();
      }
      occurrencesLeft--;
      canRemove = true;
      return currentEntry.getKey();
    }

    @Override
    public void remove() {
      checkRemove(canRemove);
      int frequency = currentEntry.getValue().get();
      if (frequency <= 0) {
        throw new ConcurrentModificationException();
      }
      if (currentEntry.getValue().addAndGet(-1) == 0) {
        entryIterator.remove();
      }
      size--;
      canRemove = false;
    }
  }
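Without the Count objects and the removal support, the core of this iterator fits in a few lines. The following read-only sketch uses a hypothetical `ExpandingIterator` over a plain `Map<E, Integer>` and returns each key as many times as its count. It assumes every stored count is positive, as in the backing map, and calling `remove()` on it throws `UnsupportedOperationException` (the default `Iterator.remove`):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, read-only re-creation of the idea behind MapBasedMultisetIterator.
// Assumes every count in the map is positive.
public class ExpandingIterator<E> implements Iterator<E> {
    private final Iterator<Map.Entry<E, Integer>> entryIterator;
    private Map.Entry<E, Integer> currentEntry;
    private int occurrencesLeft; // copies of currentEntry's key still to be returned

    public ExpandingIterator(Map<E, Integer> counts) {
        this.entryIterator = counts.entrySet().iterator();
    }

    @Override
    public boolean hasNext() {
        return occurrencesLeft > 0 || entryIterator.hasNext();
    }

    @Override
    public E next() {
        if (occurrencesLeft == 0) {
            // Current key exhausted: move to the next entry (throws NoSuchElementException at the end).
            currentEntry = entryIterator.next();
            occurrencesLeft = currentEntry.getValue();
        }
        occurrencesLeft--;
        return currentEntry.getKey();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("a", 2);
        counts.put("b", 3);
        List<String> expanded = new ArrayList<>();
        new ExpandingIterator<>(counts).forEachRemaining(expanded::add);
        System.out.println(expanded); // [a, a, b, b, b]
    }
}
```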

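Multiset.Entry<E>, for its part, is backed by the map's entrySet. It delegates to the map entries while adding a getCount() operation, and the entry view supports remove and clear. The code is as follows: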

Iterator<Entry<E>> entryIterator() {
        final Iterator<Map.Entry<E, Count>> backingEntries =
                backingMap.entrySet().iterator(); // delegates to the backing map's entries
        return new Iterator<Multiset.Entry<E>>() {
            Map.Entry<E, Count> toRemove;

            @Override
            public boolean hasNext() {
                return backingEntries.hasNext();
            }

            @Override
            public Multiset.Entry<E> next() {
                final Map.Entry<E, Count> mapEntry = backingEntries.next();
                toRemove = mapEntry;
                return new Multisets.AbstractEntry<E>() {
                    @Override
                    public E getElement() {
                        return mapEntry.getKey();
                    }
                    // getCount: if this entry's Count is null or zero, fall back to backingMap's current Count for the element
                    @Override
                    public int getCount() {
                        Count count = mapEntry.getValue();
                        if (count == null || count.get() == 0) {
                            Count frequency = backingMap.get(getElement());
                            if (frequency != null) {
                                return frequency.get();
                            }
                        }
                        return (count == null) ? 0 : count.get();
                    }
                };
            }

            @Override
            public void remove() {
                checkRemove(toRemove != null); // only the entry returned by the last next() can be removed
                size -= toRemove.getValue().getAndSet(0);
                backingEntries.remove();
                toRemove = null;
            }
        };
    }
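The size bookkeeping in remove() matters because removing one entry drops every occurrence of that element at once. Below is a hypothetical JDK-only sketch of the same idea. The class and method names are illustrative and not Guava's:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical mini bag: element -> count, plus a running total of all occurrences.
public class EntryRemovalDemo {
    private final Map<String, Integer> backingMap = new LinkedHashMap<>();
    private int size;

    // Assumes occurrences > 0.
    public void add(String element, int occurrences) {
        backingMap.merge(element, occurrences, Integer::sum);
        size += occurrences;
    }

    // Removes whole entries through the entry iterator, subtracting each entry's
    // full count from size (compare: size -= toRemove.getValue().getAndSet(0)).
    public void removeEntriesIf(Predicate<String> condition) {
        Iterator<Map.Entry<String, Integer>> it = backingMap.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> entry = it.next();
            if (condition.test(entry.getKey())) {
                size -= entry.getValue();
                it.remove();
            }
        }
    }

    public int size() {
        return size;
    }

    public Map<String, Integer> counts() {
        return backingMap;
    }

    public static void main(String[] args) {
        EntryRemovalDemo bag = new EntryRemovalDemo();
        bag.add("a", 2);
        bag.add("b", 5);
        bag.add("c", 3);
        bag.removeEntriesIf("b"::equals);
        System.out.println(bag.size());   // 5
        System.out.println(bag.counts()); // {a=2, c=3}
    }
}
```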

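Overall, Multiset is a very handy collection with a clever implementation. Each distinct element is stored only once rather than as multiple copies, and the iterator simulates the copies by returning each element as many times as its count.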