死磕Java集合之BitSet源码分析(JDK18)

文章详细介绍了Java中的BitSet类,它是通过long数组实现位图的一种数据结构。BitSet利用位运算高效地进行设置、清除和翻转特定位的操作,并提供了计算位为1的数量的方法。文章通过源码分析解释了其内部存储结构、构造方法以及各种关键方法的工作原理,展示了BitSet在空间效率上的优势。
摘要由CSDN通过智能技术生成

死磕Java集合之BitSet源码分析(JDK18)

简介

因为Java中没有new bit[]这种直接创建一个bit数组的方式,所以Java提供了BitSet来实现位图,BitSet是采用一个long型的数组来实现位图的。

BitSet的首个long型数组表示的是[0, 63]这64个元素。

继承体系

在这里插入图片描述

BitSet实现了Cloneable和Serializable

存储结构

BitSet通过long型的数组来实现位图,一个long型元素可以表示64个元素是否存在(第i位为1,表示64 * n + i已存在)。

源码解析

属性

/*
 * BitSets are packed into arrays of "words."  Currently a word is
 * a long, which consists of 64 bits, requiring 6 address bits.
 * The choice of word size is determined purely by performance concerns.
 */
private static final int ADDRESS_BITS_PER_WORD = 6;
// 1 << 6的大小是64
private static final int BITS_PER_WORD = 1 << ADDRESS_BITS_PER_WORD;
private static final int BIT_INDEX_MASK = BITS_PER_WORD - 1;

/* Used to shift left or right for a partial word mask */
private static final long WORD_MASK = 0xffffffffffffffffL;

/**
 * @serialField bits long[]
 *
 * The bits in this BitSet.  The ith bit is stored in bits[i/64] at
 * bit position i % 64 (where bit position 0 refers to the least
 * significant bit and 63 refers to the most significant bit).
 */
@java.io.Serial
private static final ObjectStreamField[] serialPersistentFields = {
    new ObjectStreamField("bits", long[].class),
};

/**
 * The internal field corresponding to the serialField "bits".
 * BitSet的底层实现是使用long数组作为内部存储结构的,所以BitSet的大小为long类型大小(64位)的整数倍。
 */
private long[] words;

/**
 * The number of words in the logical size of this BitSet.
 */
private transient int wordsInUse = 0;

/**
 * Whether the size of "words" is user-specified.  If so, we assume
 * the user knows what he's doing and try harder to preserve it.
 */
private transient boolean sizeIsSticky = false;

/* use serialVersionUID from JDK 1.0.2 for interoperability */
@java.io.Serial
private static final long serialVersionUID = 7997698588986878753L;

构造方法

/**
 * Creates a new bit set. All bits are initially {@code false}.
 */
public BitSet() {
    // 默认初始化long[]的长度为1
    initWords(BITS_PER_WORD);
    sizeIsSticky = false;
}

/**
 * Creates a bit set whose initial size is large enough to explicitly
 * represent bits with indices in the range {@code 0} through
 * {@code nbits-1}. All bits are initially {@code false}.
 *
 * @param  nbits the initial size of the bit set
 * @throws NegativeArraySizeException if the specified initial size
 *         is negative
 */
public BitSet(int nbits) {
    // nbits can't be negative; size 0 is OK
    if (nbits < 0)
        throw new NegativeArraySizeException("nbits < 0: " + nbits);

    initWords(nbits);
    sizeIsSticky = true;
}

private void initWords(int nbits) {
    words = new long[wordIndex(nbits-1) + 1];
}

/**
 * Creates a bit set using words as the internal representation.
 * The last word (if there is one) must be non-zero.
 */
private BitSet(long[] words) {
    this.words = words;
    this.wordsInUse = words.length;
    checkInvariants();
}

// 相当于bitIndex / 64
private static int wordIndex(int bitIndex) {
    return bitIndex >> ADDRESS_BITS_PER_WORD;
}

set(int bitIndex)

将值bitIndex加入BitSet,即索引为bitIndex的元素比特位置为1

public void set(int bitIndex) {
    if (bitIndex < 0)
        throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);

    // 获取bitIndex所在long数组元素的索引
    int wordIndex = wordIndex(bitIndex);
    expandTo(wordIndex);
    
    // 将对应bitIndex的比特位置为1  
    words[wordIndex] |= (1L << bitIndex); // Restores invariants

    checkInvariants();
}

// 将long数组扩展到可以容纳索引wordIndex
private void expandTo(int wordIndex) {
    // 索引从0开始,所以需要的数组长度为wordIndex+1
    int wordsRequired = wordIndex+1;
    if (wordsInUse < wordsRequired) {
        ensureCapacity(wordsRequired);
        wordsInUse = wordsRequired;
    }
}

// 确保BitSet的容量足够
private void ensureCapacity(int wordsRequired) {
    if (words.length < wordsRequired) {
        // Allocate larger of doubled size or required size
        int request = Math.max(2 * words.length, wordsRequired);
        words = Arrays.copyOf(words, request);
        sizeIsSticky = false;
    }
}

// 新建一个newLength长度的数组,并将原数组的元素复制到新数组中,并返回新数组
public static long[] copyOf(long[] original, int newLength) {
    long[] copy = new long[newLength];
    System.arraycopy(original, 0, copy, 0,
                     Math.min(original.length, newLength));
    return copy;
}

Java中的左移是循环位移,例如1 << 33相当于1 << 1值为2

set(int bitIndex, boolean value)

如果value为真,设置bitIndex位的值为1,否则则置为0

public void set(int bitIndex, boolean value) {
    if (value)
        set(bitIndex);
    else
        clear(bitIndex);
}

clear(int bitIndex)

将bitIndex位的值置为0

public void clear(int bitIndex) {
    if (bitIndex < 0)
        throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);
    // 获取bitIndex所在的long数组的索引
    int wordIndex = wordIndex(bitIndex);
    if (wordIndex >= wordsInUse)
        return;

    // 1L << bitIndex找到对应的比特位,取反后再&,就将对应比特位置为0了
    words[wordIndex] &= ~(1L << bitIndex);

    recalculateWordsInUse();
    checkInvariants();
}

// 重新计算wordsInUse
private void recalculateWordsInUse() {
    // Traverse the bitset until a used word is found
    int i;
    for (i = wordsInUse-1; i >= 0; i--)
        if (words[i] != 0)
            break;

    wordsInUse = i+1; // The new logical size
}

private void checkInvariants() {
    assert(wordsInUse == 0 || words[wordsInUse - 1] != 0);
    assert(wordsInUse >= 0 && wordsInUse <= words.length);
    assert(wordsInUse == words.length || words[wordsInUse] == 0);
}

set(int fromIndex, int toIndex)

将[fromIndex, toIndex)的比特位置为1

public void set(int fromIndex, int toIndex) {
    checkRange(fromIndex, toIndex);

    if (fromIndex == toIndex)
        return;

    // Increase capacity if necessary
    int startWordIndex = wordIndex(fromIndex);
    int endWordIndex   = wordIndex(toIndex - 1);
    expandTo(endWordIndex);

    long firstWordMask = WORD_MASK << fromIndex;
    // 这里的WORD_MASK >>> -toIndex等价于WORD_MASK >>> 64-toIndex
    long lastWordMask  = WORD_MASK >>> -toIndex;
    if (startWordIndex == endWordIndex) {
        // Case 1: One word
        // 第1种情况:fromIndex和toIndex都在同一个word中
        words[startWordIndex] |= (firstWordMask & lastWordMask);
    } else {
        // Case 2: Multiple words
        // Handle first word
        words[startWordIndex] |= firstWordMask;

        // Handle intermediate words, if any
        for (int i = startWordIndex+1; i < endWordIndex; i++)
            words[i] = WORD_MASK;

        // Handle last word (restores invariants)
        words[endWordIndex] |= lastWordMask;
    }

    checkInvariants();
}
  1. 检查参数是否合法;
  2. 分别获取fromIndex和toIndex所在的word的索引;
  3. 如果fromIndex和toIndex在同一个word中,则将[fromIndex, toIndex)的比特位置为1;
  4. 如果不在,则分三步处理:
    1. 处理第1个word;
    2. 处理第2个到最后1个之间的word,全置为1;
    3. 处理最后1个word。

clear(int fromIndex, int toIndex)

将[fromIndex, toIndex)的比特位置为0

public void clear(int fromIndex, int toIndex) {
    checkRange(fromIndex, toIndex);

    if (fromIndex == toIndex)
        return;

    int startWordIndex = wordIndex(fromIndex);
    if (startWordIndex >= wordsInUse)
        return;

    int endWordIndex = wordIndex(toIndex - 1);
    if (endWordIndex >= wordsInUse) {
        toIndex = length();
        endWordIndex = wordsInUse - 1;
    }

    long firstWordMask = WORD_MASK << fromIndex;
    // 这里的WORD_MASK >>> -toIndex等价于WORD_MASK >>> 64-toIndex
    long lastWordMask  = WORD_MASK >>> -toIndex;
    if (startWordIndex == endWordIndex) {
        // Case 1: One word
        // 这里与set(fromIndex,toIndex)类似,只是后面进行了取反操作
        words[startWordIndex] &= ~(firstWordMask & lastWordMask);
    } else {
        // Case 2: Multiple words
        // Handle first word
        words[startWordIndex] &= ~firstWordMask;

        // Handle intermediate words, if any
        for (int i = startWordIndex+1; i < endWordIndex; i++)
            words[i] = 0;

        // Handle last word
        words[endWordIndex] &= ~lastWordMask;
    }

    recalculateWordsInUse();
    checkInvariants();
}

set(int fromIndex, int toIndex, boolean value)

如果value为真,将[fromIndex, toIndex)的比特位置为1,否则置为0

public void set(int fromIndex, int toIndex, boolean value) {
    if (value)
        set(fromIndex, toIndex);
    else
        clear(fromIndex, toIndex);
}

flip(int bitIndex)

翻转bitIndex索引上的比特值

public void flip(int bitIndex) {
    if (bitIndex < 0)
        throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);

    int wordIndex = wordIndex(bitIndex);
    expandTo(wordIndex);

    // 对bitIndex上的比特值做异或操作
    words[wordIndex] ^= (1L << bitIndex);

    recalculateWordsInUse();
    checkInvariants();
}

flip(int fromIndex, int toIndex)

翻转[fromIndex, toIndex)的比特位

public void flip(int fromIndex, int toIndex) {
    checkRange(fromIndex, toIndex);

    if (fromIndex == toIndex)
        return;

    int startWordIndex = wordIndex(fromIndex);
    int endWordIndex   = wordIndex(toIndex - 1);
    expandTo(endWordIndex);

    long firstWordMask = WORD_MASK << fromIndex;
    long lastWordMask  = WORD_MASK >>> -toIndex;
    if (startWordIndex == endWordIndex) {
        // Case 1: One word
        words[startWordIndex] ^= (firstWordMask & lastWordMask);
    } else {
        // Case 2: Multiple words
        // Handle first word
        words[startWordIndex] ^= firstWordMask;

        // Handle intermediate words, if any
        for (int i = startWordIndex+1; i < endWordIndex; i++)
            words[i] ^= WORD_MASK;

        // Handle last word
        words[endWordIndex] ^= lastWordMask;
    }

    recalculateWordsInUse();
    checkInvariants();
}

cardinality()

计算整个BitSet中比特位为1的个数

public int cardinality() {
    int sum = 0;
    for (int i = 0; i < wordsInUse; i++)
        sum += Long.bitCount(words[i]);
    return sum;
}

@IntrinsicCandidate
public static int bitCount(long i) {
    // HD, Figure 5-2
    i = i - ((i >>> 1) & 0x5555555555555555L);
    i = (i & 0x3333333333333333L) + ((i >>> 2) & 0x3333333333333333L);
    i = (i + (i >>> 4)) & 0x0f0f0f0f0f0f0f0fL;
    i = i + (i >>> 8);
    i = i + (i >>> 16);
    i = i + (i >>> 32);
    return (int)i & 0x7f;
}

总结

BitSet中的实现充分利用了位运算,速度很快,因为是位图,所以占用空间也比较小。

下面是boolean数组和BitSet的空间占用对比

public class ClassLayoutTest {
    public static void main(String[] args) {
        boolean[] bits = new boolean[1024];
        System.out.println(ClassLayout.parseInstance(bits).toPrintable());
        BitSet bitSet = new BitSet(1024);
        System.out.println(GraphLayout.parseInstance(bitSet).toPrintable());
    }
}

[Z object internals:
 OFFSET  SIZE      TYPE DESCRIPTION                               VALUE
      0     4           (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4           (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4           (object header)                           05 00 00 f8 (00000101 00000000 00000000 11111000) (-134217723)
     12     4           (object header)                           00 04 00 00 (00000000 00000100 00000000 00000000) (1024)
     16  1024   boolean [Z.<elements>                             N/A
Instance size: 1040 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total

java.util.BitSet@15615099d object externals:
          ADDRESS       SIZE TYPE             PATH                           VALUE
        716ccf4f8         24 java.util.BitSet                                (object)
        716ccf510        144 [J               .words                         [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

可以看出boolean数组,占用的空间是1040 bytes,BitSet占用的空间是24 + 144 = 168 bytes。

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
在C++的STL中实现由一个bitset类模板,其用法如下: std::bitset bs; 也就是说,这个bs只能支持64位以内的位存储和操作;bs一旦定义就不能动态增长了。本资源附件中实现了一个动态Bitset,和标准bitset兼容。 /** @defgroup Bitset Bitset位集类 * @{ */ //根据std::bitset改写,函数意义和std::bitset保持一致 class CORE_API Bitset: public Serializable { typedef typename uint32_t _Ty; static const int _Bitsperword = (CHAR_BIT * sizeof(_Ty)); _Ty * _Array; //最低位放在[0]位置,每位的默认值为0 int _Bits;//最大有效的Bit个数 private: int calculateWords()const; void tidy(_Ty _Wordval = 0); void trim(); _Ty getWord(size_t _Wpos)const; public: //默认构造 Bitset(); //传入最大的位数,每位默认是0 Bitset(int nBits); virtual ~Bitset(); //直接整数转化成二进制,赋值给Bitset,最高低放在[0]位置 Bitset(unsigned long long _Val); //拷贝构造函数 Bitset(const Bitset & b); Bitset(const char * str); Bitset(const std::string & str, size_t _Pos, size_t _Count); public: size_t size()const; //返回设置为1的位数 size_t count() const; bool subscript(size_t _Pos) const; bool get(size_t pos) const; //设置指定位置为0或1,true表示1,false表示0,如果pos大于数组长度,则自动扩展 void set(size_t _Pos, bool _Val = true); //将位数组转换成整数,最低位放在[0]位置 //例如数组中存放的1011,则返回13,而不是返回11 unsigned long long to_ullong() const; bool test(size_t _Pos) const; bool any() const; bool none() const; bool all() const; std::string to_string() const; public: //直接整数转化成二进制,赋值给Bitset,最高位放在[0]位置 Bitset& operator = (const Bitset& b); //直接整数转化成二进制,赋值给Bitset,最高位放在[0]位置 Bitset& operator = (unsigned long long ull); //返回指定位置的值,如果pos大于位数组长度,自动拓展 bool operator [] (const size_t pos); //测试两个Bitset是否相等 bool operator == (const Bitset & b); bool operator != (const Bitset & b); Bitset operator<>(size_t _Pos) const; bool operator > (const Bitset & c)const; bool operator < (const Bitset & c)const; Bitset& operator &=(const Bitset& _Right); Bitset& operator|=(const Bitset& _Right); Bitset& operator^=(const Bitset& _Right); Bitset& operator<>=(size_t _Pos); public: Bitset& flip(size_t _Pos); Bitset& flip(); //将高位与低位互相,如数组存放的是1011,则本函数执行后为1101 Bitset& reverse(); //返回左边n位,构成新的Bitset Bitset left(size_t n) const; //返回右边n位,构成新的Bitset Bitset right(size_t n) const; //判断b包含的位数组是否是本类的位数组的自串,如果是返回开始位置 size_t find (const Bitset & b) const; size_t find(unsigned long long & b) const; size_t find(const char * b) const; size_t find(const std::string & b) const; //判断本类的位数组是否是b的前缀 bool is_prefix(unsigned long long & b) const; bool is_prefix(const char * b) const; bool is_prefix(const std::string & b) const; bool is_prefix(const Bitset & b) const; void clear(); void resize(size_t newSize); void reset(const unsigned char * flags, size_t s); void reset(unsigned long long _Val); void reset(const char * _Str); void reset(const std::string & _Str, size_t _Pos, size_t _Count); //左移动n位,返回新的Bitset //extendBits=false "1101" 左移动2位 "0100"; //extendBits=true "1101" 左移动2位 "110100"; Bitset leftShift(size_t n,bool extendBits=false)const; //右移动n位,返回新的Bitset //extendBits=false "1101" 右移动2位 "0011"; //extendBits=true "1101" 右移动2位 "001101"; Bitset rightShift(size_t n, bool extendBits = false) const; public: virtual uint32_t getByteArraySize(); // returns the size of the required byte array. virtual void loadFromByteArray(const unsigned char * data); // load this object using the byte array. virtual void storeToByteArray(unsigned char ** data, uint32_t& length) ; // store this object in the byte array. };

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值