Bloom Filter

Latest recommended article published 2021-12-09 11:34:32

坏习惯的叛逆者

Categories: Data Structures, Architecture

Article link: https://blog.csdn.net/wenxueliu/article/details/106965065

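Introduction

The Bloom filter was invented by Burton Bloom in 1970.

Why use a Bloom filter

Suppose there are 1 billion URLs of 64 bytes each. The raw keys alone take 64 GB, and a hash table that resolves collisions with linked lists adds pointer and bucket overhead on top, so it needs at least about 100 GB. A Bloom filter with a bitmap of 10 billion bits (10 bits per URL) needs only about 1.2 GB of memory. The memory advantage is clear. In addition, a Bloom filter is compute-intensive, whereas a hash table is memory-intensive.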
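The arithmetic behind these figures can be checked with a short sketch. The 24 bytes of per-entry chaining overhead, the 8-byte bucket pointers, and the 0.75 load factor are assumptions chosen for illustration; they are not values from the text above, and real hash table implementations differ.

```java
final class MemoryEstimate {
    // Rough size of a chained hash table: keys plus per-entry overhead plus the bucket array.
    // overheadBytes and loadFactor are illustrative assumptions, not measured values.
    static long hashTableBytes(long entries, int keyBytes, int overheadBytes, double loadFactor) {
        long buckets = (long) Math.ceil(entries / loadFactor);
        return entries * (keyBytes + overheadBytes) + buckets * 8L; // 8-byte bucket pointers
    }

    // Size of a bitmap with the given number of bits, rounded up to whole bytes.
    static long bitmapBytes(long bits) {
        return (bits + 7) / 8;
    }

    public static void main(String[] args) {
        long urls = 1_000_000_000L;
        long table = hashTableBytes(urls, 64, 24, 0.75);
        long bitmap = bitmapBytes(10_000_000_000L);
        System.out.printf("hash table: %.1f GB%n", table / 1e9);
        System.out.printf("bitmap:     %.2f GB%n", bitmap / 1e9);
    }
}
```

Under these assumptions the hash table lands just under 100 GB, while the bitmap is 1.25 × 10^9 bytes, which is about 1.2 GiB.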

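Algorithm

A negative answer is always correct; a positive answer may be wrong.

1. Allocate a bit array of N bits, all initially 0.

2. Map the object to k values using k hash functions.

3. Reduce each of the k values to an index into the array (for example, modulo N). To insert, set the bits at those k indexes; to query, check whether all k bits are set.

Time complexity: O(k) per insert or query

See the references for details.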
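The three steps can be sketched as a very small filter. This is a minimal illustration, not a production implementation: the class name `SimpleBloomFilter` and the seeded FNV-1a hash are assumptions made for the example, and a real filter would use a stronger hash such as the Murmur3 hash used by Guava.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

final class SimpleBloomFilter {
    private final BitSet bits; // step 1: the N-bit array
    private final int size;    // N
    private final int k;       // number of hash values per element

    SimpleBloomFilter(int size, int k) {
        this.bits = new BitSet(size);
        this.size = size;
        this.k = k;
    }

    // Steps 2 and 3: a seeded FNV-1a hash (illustrative choice) reduced modulo N.
    private int index(String item, int seed) {
        int h = 0x811C9DC5 + seed * 0x9E3779B9;
        for (byte b : item.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return Math.floorMod(h, size);
    }

    // Insert: set all k bits. Cost is O(k).
    void put(String item) {
        for (int i = 0; i < k; i++) {
            bits.set(index(item, i));
        }
    }

    // Query: false means definitely absent, true means possibly present.
    boolean mightContain(String item) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }
}
```

Anything passed to `put` is always reported by `mightContain`, while a filter that has never been written to reports every element as absent.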

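Key points

Number of buckets (bits) per element
Hash functions: the count (up to the optimum, more functions lower the false-positive rate; beyond it the array fills up faster and the rate rises again) and the algorithm (independent, uniformly distributed, and simple, so that it is cheap to compute)
False positive: an element that is not in the set is wrongly reported as present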
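How the number of hash functions trades off against the false-positive rate can be seen numerically. The sketch below evaluates the standard approximation p = (1 - e^(-kn/m))^k for a fixed ratio of bits per element and scans for the best k. The class and method names are hypothetical, and the choice of 10 bits per element is only an example.

```java
final class FalsePositiveRate {
    // Approximate false-positive probability for b = m/n bits per element and k hash functions.
    static double fpp(double bitsPerElement, int k) {
        return Math.pow(1.0 - Math.exp(-k / bitsPerElement), k);
    }

    // Brute-force search for the k in [1, maxK] with the lowest false-positive probability.
    static int bestK(double bitsPerElement, int maxK) {
        int best = 1;
        for (int k = 2; k <= maxK; k++) {
            if (fpp(bitsPerElement, k) < fpp(bitsPerElement, best)) {
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        for (int k : new int[] {1, 4, 7, 10, 20}) {
            System.out.printf("k=%d p=%.4f%n", k, fpp(10, k));
        }
        System.out.println("best k for 10 bits per element: " + bestK(10, 30));
    }
}
```

With 10 bits per element the rate falls from about 9.5% at k = 1 to under 1% at k = 7, then climbs again as k grows, which is why the count of hash functions has an optimum rather than being "the more the better".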

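Use cases

Testing whether an element is a member of a set

Properties

If the Bloom filter reports that an element is not in the set, it is definitely not in the set
If the Bloom filter reports that an element is in the set, it may still not be in the set (a false positive)

Because it can answer membership queries over very large data sets with very little memory, it is often used for:

1. Cache systems

2. URL deduplication

3. Counting a website's daily unique visitors (UV)

4. Phone number blacklists

5. Spam filtering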

Source code analysis

We take Guava's BloomFilter as an example.

public final class BloomFilter<T> implements Predicate<T>, Serializable {
  /**
   * A strategy to translate T instances, to {@code numHashFunctions} bit indexes.
   *
   * <p>Implementations should be collections of pure functions (i.e. stateless).
   */
  interface Strategy extends java.io.Serializable {

    /**
     * Sets {@code numHashFunctions} bits of the given bit array, by hashing a user element.
     *
     * <p>Returns whether any bits changed as a result of this operation.
     */
    <T> boolean put(
        T object, Funnel<? super T> funnel, int numHashFunctions, LockFreeBitArray bits);

    /**
     * Queries {@code numHashFunctions} bits of the given bit array, by hashing a user element;
     * returns {@code true} if and only if all selected bits are set.
     */
    <T> boolean mightContain(
        T object, Funnel<? super T> funnel, int numHashFunctions, LockFreeBitArray bits);

    /**
     * Identifier used to encode this strategy, when marshalled as part of a BloomFilter. Only
     * values in the [-128, 127] range are valid for the compact serial form. Non-negative values
     * are reserved for enums defined in BloomFilterStrategies; negative values are reserved for any
     * custom, stateful strategy we may define (e.g. any kind of strategy that would depend on user
     * input).
     */
    int ordinal();
  }

  /** The bit set of the BloomFilter (not necessarily power of 2!) */
  private final LockFreeBitArray bits;

  /** Number of hashes per element */
  private final int numHashFunctions;

  /** The funnel to translate Ts to bytes */
  private final Funnel<? super T> funnel;

  /** The strategy we employ to map an element T to {@code numHashFunctions} bit indexes. */
  private final Strategy strategy;

  /** Creates a BloomFilter. */
  private BloomFilter(
      LockFreeBitArray bits, int numHashFunctions, Funnel<? super T> funnel, Strategy strategy) {
    checkArgument(numHashFunctions > 0, "numHashFunctions (%s) must be > 0", numHashFunctions);
    checkArgument(
        numHashFunctions <= 255, "numHashFunctions (%s) must be <= 255", numHashFunctions);
    this.bits = checkNotNull(bits);
    this.numHashFunctions = numHashFunctions;
    this.funnel = checkNotNull(funnel);
    this.strategy = checkNotNull(strategy);
  }

  /**
   * Returns {@code true} if the element <i>might</i> have been put in this Bloom filter, {@code
   * false} if this is <i>definitely</i> not the case.
   */
  public boolean mightContain(T object) {
    return strategy.mightContain(object, funnel, numHashFunctions, bits);
  }

  /**
   * Puts an element into this {@code BloomFilter}. Ensures that subsequent invocations of {@link
   * #mightContain(Object)} with the same element will always return {@code true}.
   *
   * @return true if the Bloom filter's bits changed as a result of this operation. If the bits
   *     changed, this is <i>definitely</i> the first time {@code object} has been added to the
   *     filter. If the bits haven't changed, this <i>might</i> be the first time {@code object} has
   *     been added to the filter. Note that {@code put(t)} always returns the <i>opposite</i>
   *     result to what {@code mightContain(t)} would have returned at the time it is called.
   * @since 12.0 (present in 11.0 with {@code void} return type})
   */
  public boolean put(T object) {
    return strategy.put(object, funnel, numHashFunctions, bits);
  }

  /**
   * Returns the probability that {@linkplain #mightContain(Object)} will erroneously return {@code
   * true} for an object that has not actually been put in the {@code BloomFilter}.
   *
   * <p>Ideally, this number should be close to the {@code fpp} parameter passed in {@linkplain
   * #create(Funnel, int, double)}, or smaller. If it is significantly higher, it is usually the
   * case that too many elements (more than expected) have been put in the {@code BloomFilter},
   * degenerating it.
   *
   * @since 14.0 (since 11.0 as expectedFalsePositiveProbability())
   */
  public double expectedFpp() {
    // You down with FPP? (Yeah you know me!) Who's down with FPP? (Every last homie!)
    return Math.pow((double) bits.bitCount() / bitSize(), numHashFunctions);
  }

  /**
   * Returns an estimate for the total number of distinct elements that have been added to this
   * Bloom filter. This approximation is reasonably accurate if it does not exceed the value of
   * {@code expectedInsertions} that was used when constructing the filter.
   *
   * @since 22.0
   */
  public long approximateElementCount() {
    long bitSize = bits.bitSize();
    long bitCount = bits.bitCount();

    /**
     * Each insertion is expected to reduce the # of clear bits by a factor of
     * `numHashFunctions/bitSize`. So, after n insertions, expected bitCount is `bitSize * (1 - (1 -
     * numHashFunctions/bitSize)^n)`. Solving that for n, and approximating `ln x` as `x - 1` when x
     * is close to 1 (why?), gives the following formula.
     */
    double fractionOfBitsSet = (double) bitCount / bitSize;
    return DoubleMath.roundToLong(
        -Math.log1p(-fractionOfBitsSet) * bitSize / numHashFunctions, RoundingMode.HALF_UP);
  }

  /** Returns the number of bits in the underlying bit array. */
  @VisibleForTesting
  long bitSize() {
    return bits.bitSize();
  }

  @Override
  public boolean equals(@Nullable Object object) {
    if (object == this) {
      return true;
    }
    if (object instanceof BloomFilter) {
      BloomFilter<?> that = (BloomFilter<?>) object;
      return this.numHashFunctions == that.numHashFunctions
          && this.funnel.equals(that.funnel)
          && this.bits.equals(that.bits)
          && this.strategy.equals(that.strategy);
    }
    return false;
  }

  @Override
  public int hashCode() {
    return Objects.hashCode(numHashFunctions, funnel, strategy, bits);
  }

  /**
   * Returns a {@code Collector} expecting the specified number of insertions, and yielding a {@link
   * BloomFilter} with false positive probability 3%.
   *
   * <p>Note that if the {@code Collector} receives significantly more elements than specified, the
   * resulting {@code BloomFilter} will suffer a sharp deterioration of its false positive
   * probability.
   *
   * <p>The constructed {@code BloomFilter} will be serializable if the provided {@code Funnel<T>}
   * is.
   *
   * <p>It is recommended that the funnel be implemented as a Java enum. This has the benefit of
   * ensuring proper serialization and deserialization, which is important since {@link #equals}
   * also relies on object identity of funnels.
   *
   * @param funnel the funnel of T's that the constructed {@code BloomFilter} will use
   * @param expectedInsertions the number of expected insertions to the constructed {@code
   *     BloomFilter}; must be positive
   * @return a {@code Collector} generating a {@code BloomFilter} of the received elements
   * @since 23.0
   */
  public static <T> Collector<T, ?, BloomFilter<T>> toBloomFilter(
      Funnel<? super T> funnel, long expectedInsertions) {
    return toBloomFilter(funnel, expectedInsertions, 0.03);
  }

  /**
   * Returns a {@code Collector} expecting the specified number of insertions, and yielding a {@link
   * BloomFilter} with the specified expected false positive probability.
   *
   * <p>Note that if the {@code Collector} receives significantly more elements than specified, the
   * resulting {@code BloomFilter} will suffer a sharp deterioration of its false positive
   * probability.
   *
   * <p>The constructed {@code BloomFilter} will be serializable if the provided {@code Funnel<T>}
   * is.
   *
   * <p>It is recommended that the funnel be implemented as a Java enum. This has the benefit of
   * ensuring proper serialization and deserialization, which is important since {@link #equals}
   * also relies on object identity of funnels.
   *
   * @param funnel the funnel of T's that the constructed {@code BloomFilter} will use
   * @param expectedInsertions the number of expected insertions to the constructed {@code
   *     BloomFilter}; must be positive
   * @param fpp the desired false positive probability (must be positive and less than 1.0)
   * @return a {@code Collector} generating a {@code BloomFilter} of the received elements
   * @since 23.0
   */
  public static <T> Collector<T, ?, BloomFilter<T>> toBloomFilter(
      Funnel<? super T> funnel, long expectedInsertions, double fpp) {
    checkNotNull(funnel);
    checkArgument(
        expectedInsertions >= 0, "Expected insertions (%s) must be >= 0", expectedInsertions);
    checkArgument(fpp > 0.0, "False positive probability (%s) must be > 0.0", fpp);
    checkArgument(fpp < 1.0, "False positive probability (%s) must be < 1.0", fpp);
    return Collector.of(
        () -> BloomFilter.create(funnel, expectedInsertions, fpp),
        BloomFilter::put,
        (bf1, bf2) -> {
          bf1.putAll(bf2);
          return bf1;
        },
        Collector.Characteristics.UNORDERED,
        Collector.Characteristics.CONCURRENT);
  }

  /**
   * Creates a {@link BloomFilter} with the expected number of insertions and expected false
   * positive probability.
   *
   * <p>Note that overflowing a {@code BloomFilter} with significantly more elements than specified,
   * will result in its saturation, and a sharp deterioration of its false positive probability.
   *
   * <p>The constructed {@code BloomFilter} will be serializable if the provided {@code Funnel<T>}
   * is.
   *
   * <p>It is recommended that the funnel be implemented as a Java enum. This has the benefit of
   * ensuring proper serialization and deserialization, which is important since {@link #equals}
   * also relies on object identity of funnels.
   *
   * @param funnel the funnel of T's that the constructed {@code BloomFilter} will use
   * @param expectedInsertions the number of expected insertions to the constructed {@code
   *     BloomFilter}; must be positive
   * @param fpp the desired false positive probability (must be positive and less than 1.0)
   * @return a {@code BloomFilter}
   */
  public static <T> BloomFilter<T> create(
      Funnel<? super T> funnel, int expectedInsertions, double fpp) {
    return create(funnel, (long) expectedInsertions, fpp);
  }

  /**
   * Creates a {@link BloomFilter} with the expected number of insertions and expected false
   * positive probability.
   *
   * <p>Note that overflowing a {@code BloomFilter} with significantly more elements than specified,
   * will result in its saturation, and a sharp deterioration of its false positive probability.
   *
   * <p>The constructed {@code BloomFilter} will be serializable if the provided {@code Funnel<T>}
   * is.
   *
   * <p>It is recommended that the funnel be implemented as a Java enum. This has the benefit of
   * ensuring proper serialization and deserialization, which is important since {@link #equals}
   * also relies on object identity of funnels.
   *
   * @param funnel the funnel of T's that the constructed {@code BloomFilter} will use
   * @param expectedInsertions the number of expected insertions to the constructed {@code
   *     BloomFilter}; must be positive
   * @param fpp the desired false positive probability (must be positive and less than 1.0)
   * @return a {@code BloomFilter}
   * @since 19.0
   */
  public static <T> BloomFilter<T> create(
      Funnel<? super T> funnel, long expectedInsertions, double fpp) {
    return create(funnel, expectedInsertions, fpp, BloomFilterStrategies.MURMUR128_MITZ_64);
  }

  @VisibleForTesting
  static <T> BloomFilter<T> create(
      Funnel<? super T> funnel, long expectedInsertions, double fpp, Strategy strategy) {
    checkNotNull(funnel);
    checkArgument(
        expectedInsertions >= 0, "Expected insertions (%s) must be >= 0", expectedInsertions);
    checkArgument(fpp > 0.0, "False positive probability (%s) must be > 0.0", fpp);
    checkArgument(fpp < 1.0, "False positive probability (%s) must be < 1.0", fpp);
    checkNotNull(strategy);

    if (expectedInsertions == 0) {
      expectedInsertions = 1;
    }
    /*
     * TODO(user): Put a warning in the javadoc about tiny fpp values, since the resulting size
     * is proportional to -log(p), but there is not much of a point after all, e.g.
     * optimalM(1000, 0.0000000000000001) = 76680 which is less than 10kb. Who cares!
     */
    long numBits = optimalNumOfBits(expectedInsertions, fpp);
    int numHashFunctions = optimalNumOfHashFunctions(expectedInsertions, numBits);
    try {
      return new BloomFilter<T>(new LockFreeBitArray(numBits), numHashFunctions, funnel, strategy);
    } catch (IllegalArgumentException e) {
      throw new IllegalArgumentException("Could not create BloomFilter of " + numBits + " bits", e);
    }
  }

  /**
   * Creates a {@link BloomFilter} with the expected number of insertions and a default expected
   * false positive probability of 3%.
   *
   * <p>Note that overflowing a {@code BloomFilter} with significantly more elements than specified,
   * will result in its saturation, and a sharp deterioration of its false positive probability.
   *
   * <p>The constructed {@code BloomFilter} will be serializable if the provided {@code Funnel<T>}
   * is.
   *
   * <p>It is recommended that the funnel be implemented as a Java enum. This has the benefit of
   * ensuring proper serialization and deserialization, which is important since {@link #equals}
   * also relies on object identity of funnels.
   *
   * @param funnel the funnel of T's that the constructed {@code BloomFilter} will use
   * @param expectedInsertions the number of expected insertions to the constructed {@code
   *     BloomFilter}; must be positive
   * @return a {@code BloomFilter}
   */
  public static <T> BloomFilter<T> create(Funnel<? super T> funnel, int expectedInsertions) {
    return create(funnel, (long) expectedInsertions);
  }

  /**
   * Creates a {@link BloomFilter} with the expected number of insertions and a default expected
   * false positive probability of 3%.
   *
   * <p>Note that overflowing a {@code BloomFilter} with significantly more elements than specified,
   * will result in its saturation, and a sharp deterioration of its false positive probability.
   *
   * <p>The constructed {@code BloomFilter} will be serializable if the provided {@code Funnel<T>}
   * is.
   *
   * <p>It is recommended that the funnel be implemented as a Java enum. This has the benefit of
   * ensuring proper serialization and deserialization, which is important since {@link #equals}
   * also relies on object identity of funnels.
   *
   * @param funnel the funnel of T's that the constructed {@code BloomFilter} will use
   * @param expectedInsertions the number of expected insertions to the constructed {@code
   *     BloomFilter}; must be positive
   * @return a {@code BloomFilter}
   * @since 19.0
   */
  public static <T> BloomFilter<T> create(Funnel<? super T> funnel, long expectedInsertions) {
    return create(funnel, expectedInsertions, 0.03); // FYI, for 3%, we always get 5 hash functions
  }

  // Cheat sheet:
  //
  // m: total bits
  // n: expected insertions
  // b: m/n, bits per insertion
  // p: expected false positive probability
  //
  // 1) Optimal k = b * ln2
  // 2) p = (1 - e ^ (-kn/m))^k
  // 3) For optimal k: p = 2 ^ (-k) ~= 0.6185^b
  // 4) For optimal k: m = -nlnp / ((ln2) ^ 2)

  /**
   * Computes the optimal k (number of hashes per element inserted in Bloom filter), given the
   * expected insertions and total number of bits in the Bloom filter.
   *
   * <p>See http://en.wikipedia.org/wiki/File:Bloom_filter_fp_probability.svg for the formula.
   *
   * @param n expected insertions (must be positive)
   * @param m total number of bits in Bloom filter (must be positive)
   */
  static int optimalNumOfHashFunctions(long n, long m) {
    // (m / n) * log(2), but avoid truncation due to division!
    return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
  }

  /**
   * Computes m (total bits of Bloom filter) which is expected to achieve, for the specified
   * expected insertions, the required false positive probability.
   *
   * <p>See http://en.wikipedia.org/wiki/Bloom_filter#Probability_of_false_positives for the
   * formula.
   *
   * @param n expected insertions (must be positive)
   * @param p false positive rate (must be 0 < p < 1)
   */
  static long optimalNumOfBits(long n, double p) {
    if (p == 0) {
      p = Double.MIN_VALUE;
    }
    return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
  }
}
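To make the sizing and estimation formulas in the class above concrete, here is a self-contained sketch that re-implements them with plain JDK classes and tries them on sample numbers. The class `BloomSizing`, its method names, and the random-bit simulation are assumptions for illustration; only the formulas are taken from the source above.

```java
import java.util.BitSet;
import java.util.Random;

final class BloomSizing {
    // m = -n * ln(p) / (ln 2)^2, as in optimalNumOfBits.
    static long numBits(long n, double p) {
        return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // k = (m / n) * ln 2, rounded, at least 1, as in optimalNumOfHashFunctions.
    static int numHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    // n ~= -(m / k) * ln(1 - bitCount / m), as in approximateElementCount.
    static long estimateCount(long bitCount, long bitSize, int k) {
        double fractionSet = (double) bitCount / bitSize;
        return Math.round(-Math.log1p(-fractionSet) * bitSize / k);
    }

    // (bitCount / m)^k, as in expectedFpp.
    static double currentFpp(long bitCount, long bitSize, int k) {
        return Math.pow((double) bitCount / bitSize, k);
    }

    // Stand-in for real inserts: each element sets k pseudo-random bits.
    static long simulateSetBits(int n, int m, int k, long seed) {
        BitSet bits = new BitSet(m);
        Random rnd = new Random(seed);
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < k; j++) {
                bits.set(rnd.nextInt(m));
            }
        }
        return bits.cardinality();
    }

    public static void main(String[] args) {
        long m = numBits(1_000_000, 0.03);
        System.out.println("bits=" + m + " hashes=" + numHashes(1_000_000, m));
        long set = simulateSetBits(10_000, 100_000, 5, 42L);
        System.out.println("estimated elements=" + estimateCount(set, 100_000, 5));
        System.out.printf("current fpp=%.4f%n", currentFpp(set, 100_000, 5));
    }
}
```

For one million expected insertions at 3% this gives roughly 7.3 million bits and 5 hash functions, which matches the "for 3%, we always get 5 hash functions" remark in the source.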

Filter strategies
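Both strategies in the source that follows derive all k bit indexes from just two hash values, the trick from the Kirsch and Mitzenmacher paper. The sketch below isolates that index computation in the style of MURMUR128_MITZ_64. It takes the two hash values as plain parameters instead of computing Murmur3, and the class and method names are hypothetical.

```java
final class DoubleHashing {
    // Produces k indexes in [0, bitSize) from two 64-bit hashes:
    // hash1, hash1 + hash2, hash1 + 2 * hash2, ... with the sign bit masked off.
    static long[] indexes(long hash1, long hash2, int k, long bitSize) {
        long[] result = new long[k];
        long combined = hash1;
        for (int i = 0; i < k; i++) {
            // AND with Long.MAX_VALUE clears the sign bit so the remainder is non-negative.
            result[i] = (combined & Long.MAX_VALUE) % bitSize;
            combined += hash2; // overflow wraps around, which is fine here
        }
        return result;
    }

    public static void main(String[] args) {
        for (long index : indexes(5L, 3L, 4, 100L)) {
            System.out.println(index);
        }
    }
}
```

The design point is cost: one 128-bit hash is computed per element, and each additional "hash function" is a single addition rather than another pass over the input.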

/**
 * Collections of strategies of generating the k * log(M) bits required for an element to be mapped
 * to a BloomFilter of M bits and k hash functions. These strategies are part of the serialized form
 * of the Bloom filters that use them, thus they must be preserved as is (no updates allowed, only
 * introduction of new versions).
 *
 * <p>Important: the order of the constants cannot change, and they cannot be deleted - we depend on
 * their ordinal for BloomFilter serialization.
 *
 * @author Dimitris Andreou
 * @author Kurt Alfred Kluever
 */
enum BloomFilterStrategies implements BloomFilter.Strategy {
  /**
   * See "Less Hashing, Same Performance: Building a Better Bloom Filter" by Adam Kirsch and Michael
   * Mitzenmacher. The paper argues that this trick doesn't significantly deteriorate the
   * performance of a Bloom filter (yet only needs two 32bit hash functions).
   */
  MURMUR128_MITZ_32() {
    @Override
    public <T> boolean put(
        T object, Funnel<? super T> funnel, int numHashFunctions, LockFreeBitArray bits) {
      long bitSize = bits.bitSize();
      long hash64 = Hashing.murmur3_128().hashObject(object, funnel).asLong();
      int hash1 = (int) hash64;
      int hash2 = (int) (hash64 >>> 32);

      boolean bitsChanged = false;
      for (int i = 1; i <= numHashFunctions; i++) {
        int combinedHash = hash1 + (i * hash2);
        // Flip all the bits if it's negative (guaranteed positive number)
        if (combinedHash < 0) {
          combinedHash = ~combinedHash;
        }
        bitsChanged |= bits.set(combinedHash % bitSize);
      }
      return bitsChanged;
    }

    @Override
    public <T> boolean mightContain(
        T object, Funnel<? super T> funnel, int numHashFunctions, LockFreeBitArray bits) {
      long bitSize = bits.bitSize();
      long hash64 = Hashing.murmur3_128().hashObject(object, funnel).asLong();
      int hash1 = (int) hash64;
      int hash2 = (int) (hash64 >>> 32);

      for (int i = 1; i <= numHashFunctions; i++) {
        int combinedHash = hash1 + (i * hash2);
        // Flip all the bits if it's negative (guaranteed positive number)
        if (combinedHash < 0) {
          combinedHash = ~combinedHash;
        }
        if (!bits.get(combinedHash % bitSize)) {
          return false;
        }
      }
      return true;
    }
  },
  /**
   * This strategy uses all 128 bits of {@link Hashing#murmur3_128} when hashing. It looks different
   * than the implementation in MURMUR128_MITZ_32 because we're avoiding the multiplication in the
   * loop and doing a (much simpler) += hash2. We're also changing the index to a positive number by
   * AND'ing with Long.MAX_VALUE instead of flipping the bits.
   */
  MURMUR128_MITZ_64() {
    @Override
    public <T> boolean put(
        T object, Funnel<? super T> funnel, int numHashFunctions, LockFreeBitArray bits) {
      long bitSize = bits.bitSize();
      byte[] bytes = Hashing.murmur3_128().hashObject(object, funnel).getBytesInternal();
      long hash1 = lowerEight(bytes);
      long hash2 = upperEight(bytes);

      boolean bitsChanged = false;
      long combinedHash = hash1;
      for (int i = 0; i < numHashFunctions; i++) {
        // Make the combined hash positive and indexable
        bitsChanged |= bits.set((combinedHash & Long.MAX_VALUE) % bitSize);
        combinedHash += hash2;
      }
      return bitsChanged;
    }

    @Override
    public <T> boolean mightContain(
        T object, Funnel<? super T> funnel, int numHashFunctions, LockFreeBitArray bits) {
      long bitSize = bits.bitSize();
      byte[] bytes = Hashing.murmur3_128().hashObject(object, funnel).getBytesInternal();
      long hash1 = lowerEight(bytes);
      long hash2 = upperEight(bytes);

      long combinedHash = hash1;
      for (int i = 0; i < numHashFunctions; i++) {
        // Make the combined hash positive and indexable
        if (!bits.get((combinedHash & Long.MAX_VALUE) % bitSize)) {
          return false;
        }
        combinedHash += hash2;
      }
      return true;
    }

    private /* static */ long lowerEight(byte[] bytes) {
      return Longs.fromBytes(
          bytes[7], bytes[6], bytes[5], bytes[4], bytes[3], bytes[2], bytes[1], bytes[0]);
    }

    private /* static */ long upperEight(byte[] bytes) {
      return Longs.fromBytes(
          bytes[15], bytes[14], bytes[13], bytes[12], bytes[11], bytes[10], bytes[9], bytes[8]);
    }
  };

  /**
   * Models a lock-free array of bits.
   *
   * <p>We use this instead of java.util.BitSet because we need access to the array of longs and we
   * need compare-and-swap.
   */
  static final class LockFreeBitArray {
    private static final int LONG_ADDRESSABLE_BITS = 6;
    final AtomicLongArray data;
    private final LongAddable bitCount;

    LockFreeBitArray(long bits) {
      this(new long[Ints.checkedCast(LongMath.divide(bits, 64, RoundingMode.CEILING))]);
    }

    // Used by serialization
    LockFreeBitArray(long[] data) {
      checkArgument(data.length > 0, "data length is zero!");
      this.data = new AtomicLongArray(data);
      this.bitCount = LongAddables.create();
      long bitCount = 0;
      for (long value : data) {
        bitCount += Long.bitCount(value);
      }
      this.bitCount.add(bitCount);
    }

    /** Returns true if the bit changed value. */
    boolean set(long bitIndex) {
      if (get(bitIndex)) {
        return false;
      }

      int longIndex = (int) (bitIndex >>> LONG_ADDRESSABLE_BITS);
      long mask = 1L << bitIndex; // only cares about low 6 bits of bitIndex

      long oldValue;
      long newValue;
      do {
        oldValue = data.get(longIndex);
        newValue = oldValue | mask;
        if (oldValue == newValue) {
          return false;
        }
      } while (!data.compareAndSet(longIndex, oldValue, newValue));

      // We turned the bit on, so increment bitCount.
      bitCount.increment();
      return true;
    }

    boolean get(long bitIndex) {
      return (data.get((int) (bitIndex >>> 6)) & (1L << bitIndex)) != 0;
    }

    /**
     * Careful here: if threads are mutating the atomicLongArray while this method is executing, the
     * final long[] will be a "rolling snapshot" of the state of the bit array. This is usually good
     * enough, but should be kept in mind.
     */
    public static long[] toPlainArray(AtomicLongArray atomicLongArray) {
      long[] array = new long[atomicLongArray.length()];
      for (int i = 0; i < array.length; ++i) {
        array[i] = atomicLongArray.get(i);
      }
      return array;
    }

    /** Number of bits */
    long bitSize() {
      return (long) data.length() * Long.SIZE;
    }

    /**
     * Number of set bits (1s).
     *
     * <p>Note that because of concurrent set calls and uses of atomics, this bitCount is a (very)
     * close *estimate* of the actual number of bits set. It's not possible to do better than an
     * estimate without locking. Note that the number, if not exactly accurate, is *always*
     * underestimating, never overestimating.
     */
    long bitCount() {
      return bitCount.sum();
    }

    LockFreeBitArray copy() {
      return new LockFreeBitArray(toPlainArray(data));
    }

    /**
     * Combines the two BitArrays using bitwise OR.
     *
     * <p>NOTE: Because of the use of atomics, if the other LockFreeBitArray is being mutated while
     * this operation is executing, not all of those new 1's may be set in the final state of this
     * LockFreeBitArray. The ONLY guarantee provided is that all the bits that were set in the other
     * LockFreeBitArray at the start of this method will be set in this LockFreeBitArray at the end
     * of this method.
     */
    void putAll(LockFreeBitArray other) {
      checkArgument(
          data.length() == other.data.length(),
          "BitArrays must be of equal length (%s != %s)",
          data.length(),
          other.data.length());
      for (int i = 0; i < data.length(); i++) {
        long otherLong = other.data.get(i);

        long ourLongOld;
        long ourLongNew;
        boolean changedAnyBits = true;
        do {
          ourLongOld = data.get(i);
          ourLongNew = ourLongOld | otherLong;
          if (ourLongOld == ourLongNew) {
            changedAnyBits = false;
            break;
          }
        } while (!data.compareAndSet(i, ourLongOld, ourLongNew));

        if (changedAnyBits) {
          int bitsAdded = Long.bitCount(ourLongNew) - Long.bitCount(ourLongOld);
          bitCount.add(bitsAdded);
        }
      }
    }

    @Override
    public boolean equals(@Nullable Object o) {
      if (o instanceof LockFreeBitArray) {
        LockFreeBitArray lockFreeBitArray = (LockFreeBitArray) o;
        // TODO(lowasser): avoid allocation here
        return Arrays.equals(toPlainArray(data), toPlainArray(lockFreeBitArray.data));
      }
      return false;
    }

    @Override
    public int hashCode() {
      // TODO(lowasser): avoid allocation here
      return Arrays.hashCode(toPlainArray(data));
    }
  }
}
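The compare-and-swap loop in LockFreeBitArray.set can be exercised in isolation. The following is a reduced sketch rather than Guava's code: it uses the JDK's `LongAdder` where Guava uses its internal `LongAddable`, and the class name `CasBitArray` and the `demo` helper are made up for this example.

```java
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.atomic.LongAdder;

final class CasBitArray {
    private final AtomicLongArray words;
    private final LongAdder setBits = new LongAdder();

    CasBitArray(int bits) {
        words = new AtomicLongArray((bits + 63) / 64);
    }

    // Returns true only for the one caller that actually flips the bit from 0 to 1.
    boolean set(int bitIndex) {
        int word = bitIndex >>> 6;
        long mask = 1L << bitIndex; // a long shift uses only the low 6 bits of bitIndex
        long old;
        do {
            old = words.get(word);
            if ((old & mask) != 0) {
                return false; // someone else already set it
            }
        } while (!words.compareAndSet(word, old, old | mask)); // retry if the word changed
        setBits.increment();
        return true;
    }

    boolean get(int bitIndex) {
        return (words.get(bitIndex >>> 6) & (1L << bitIndex)) != 0;
    }

    long bitCount() {
        return setBits.sum();
    }

    // Four threads race to set the same 1024 bits; returns the final counter value.
    static long demo() {
        CasBitArray array = new CasBitArray(1024);
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1024; i++) {
                    array.set(i);
                }
            });
            threads[t].start();
        }
        try {
            for (Thread thread : threads) {
                thread.join();
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return array.bitCount();
    }
}
```

Because a CAS succeeds for exactly one thread per bit, the counter ends at the number of distinct bits once all writers have finished, even though several threads attempted every bit.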

References

http://llimllib.github.io/bloomfilter-tutorial/
http://pages.cs.wisc.edu/~cao/papers/summary-cache/node8.html
http://www.eecs.harvard.edu/~kirsch/pubs/bbbf/esa06.pdf
Cassandra's BloomFilter
HBase's BloomFilter