安卓性能优化一数据结构

刘知意

已于 2023-02-08 13:49:48 修改

阅读量561

点赞数

分类专栏：性能优化文章标签：数据结构 android 哈希算法

于 2023-01-09 16:45:50 首次发布

本文链接：https://blog.csdn.net/liuwushuang233/article/details/128616995

版权

性能优化专栏收录该内容

4 篇文章 0 订阅

订阅专栏

ArrayList与LinkedList

ArrayList查找快，增删慢，内部为数组，连续空间，地址带顺序查找修改快，增加，删除底层为System.copy操作，而copy为循环赋值，末尾添加删除不受影响。
LinkedList增删快，查找慢，内部操作node，是链表，插入删除只操作该节点的头尾指针即可，内存不连续，查找是轮询的方式，使用的for循环耗时操作。查找修改慢

选择方式：数据不进行大量增删，只按顺序排列显示用ArrayList如listview，recycleview；显示的数据包含用户可以进行删除操作，使用LinkedList；

HashMap：1.7之前Android24之前使用的数组保存，数组的构造使用的链表，数组（ArrayList） + 链表，整体为一个数组，数组内每一个元素，为链表，数组和链表共同构成一个节点；1.8之后，除了数组和链表外还有红黑树(二叉树，平衡二叉树)，LinkedHaspMap双向指针。

1.7：

transient Entry[]  table = (Entry<K,V>[]) EMPTY_TABLE;

如何保证K:V唯一对应，查看put（）方法

public V put(K key, V value) {
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key);
    int i = indexFor(hash, table.length);
    for (Entry<K, V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
 
    modCount++;	
    addEntry(hash, key, value, i);
    return null;
}

final int hash(Object k) {
        int h = hashSeed;
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }
        h ^= k.hashCode();
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length-1);
    }

void addEntry(int hash, K key, V value, int bucketIndex) {
    
    	//sizi大于阈值 2倍扩容	
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }
        createEntry(hash, key, value, bucketIndex);
    }

void createEntry(int hash, K key, V value, int bucketIndex) {  
        Entry<K,V> e = table[bucketIndex]; 
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }

put过程：k为object类型，根据k找到int类型的hashcode，装箱过程，table []大小未知，根据hashcode % table的length求模运算，得到范围0-length-1，求模运算等价于位运算，源码中使用的是位运算，更快，往JVM转为字节码时速度更快，这时得到下标index即源码中的i，通过下标i找到要操作的位置，完成k的任务。然后进行 addEntry(hash, key, value, i);调用了createEntry方法，先把下标i记录成e，然后使用HashMapEntry赋值，new一个新的节点，新节点指向e，再把新节点赋值给table[bucketIndex]即头插法，将新节点放到i的位置。

put: k:Object -> hashcode :int -> (hashcode % length) == (h & (length -1))-> :0~length-1 index;

哈希碰撞：得到index的过程是hash运算的位移运算（求模），求模是多对一的过程，多对一的过程，hashcode1 hashcode2 会得到相同的index，于是出现了哈希碰撞（哈希冲突），hashmap提供了解决碰撞的方法：链表法，将新加入的的节点作为下一个节点的next节点。

cpu：所有操作都是位运算，其中最快的就是位置，&或|运算而不是+

查看get（）方法

public V get(Object key) {   
    if (key == null)   
        return getForNullKey();   
    int hash = hash(key.hashCode());   
    for (Entry<K,V> e = table[indexFor(hash, table.length)];   
        e != null;   
        e = e.next) {   
        Object k;   
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))   
            return e.value;   
    }   
    return null;   
}

类似get（）先找到index ，然后轮询table[]这个位置的链表。

扩容问题：

加载因子：final float loadFactor = 0.75；（这个表超过百分之多少开始扩容）

阈值： 0.75f*16(length)=12;element（所有存的元素）>12即扩容，

默认hashmap大小：16，即new Hashmap（）内的table[16]，大小需为2的次幂

扩容的意义：0.6-0.75做加载因子最合适，数学家测试的结果。提升效率，当一个表全部冲突的时候效率最低退化成单链表，增删高效，查找低。避免冲突，长度更大，冲突的可能性更低

addEntry时如果大于阈值2倍扩容，调用resize（）

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable);//用来将原先table的元素全部移到newTable里面
    table = newTable;  //再将newTable赋值给table
    threshold = (int)(newCapacity * loadFactor);//重新计算临界值
}

扩容的时候对hasp表进行转移transfer

void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        //遍历旧表
        for (Entry<K,V> e : table) {
            //将所有的节点进行hash运算
            while(null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }

扩容耗性能，避免扩容，创建时应该评估hash表的大小，（大小/0.75+1），如何保证大小为2的次幂，这就和put时有关。表空时调用inflateTable(threshold);

private void inflateTable(int toSize) {
    // Find a power of 2 >= toSize
    int capacity = roundUpToPowerOf2(toSize);

    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity];
    initHashSeedAsNeeded(capacity);//初始化hashSeed变量
}

private static int roundUpToPowerOf2(int number) {
    // assert number >= 0 : "number must be non-negative";
    return number >= MAXIMUM_CAPACITY
        ? MAXIMUM_CAPACITY
        : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
}

会进行运算，转为最近的2的次幂。hash表真正的初始化是在put的时候，而不是new的时候。

初始化：put时候，防止创建未用，再put时，才真正初始化

大小为2的次幂原因：保证h & (length -1)运算，如 16-1的二进制位:1111,32-1为：11111,如果不是2的次幂，如length为10，length-1为9，二进制为：1001，进行与运算后只有最高位和最低位起作用，2的次幂的话，起作用的值更多，碰撞的可能性更低

例子：

	十进制	二进制（hash值）	与运算
h1	6	0110
length1-1	9	1001	0000
length2-1	15	1111	0110
h2	7	0111
length1-1	9	1001	0001
length2-1	15	1111	0111

hashmap有阈值，25%的内存浪费空间换时间，尤其扩容的时候，如果多1个节点就扩容了两倍。

安卓中出现了SparseArray：使用双数组，一个存key 一个存value

public class SparseArray<E> implements Cloneable {
    private static final Object DELETED = new Object();
    private boolean mGarbage = false;
    private int[] mKeys;
    private Object[] mValues;
    private int mSize;

key为int类型，value对object

public void put(int key, E value) {
    int i = ContainerHelpers.binarySearch(mKeys, mSize, key);
    //原来已经有key，可能是remove后，value存放着DELETED，也可能是存放旧值，那么就替换
    if (i >= 0) {
        mValues[i] = value;
    } else {
        //没有找到，对i取反，得到i= lo（ContainerHelpers.binarySearch）
        i = ~i;
        //如果i小于数组长度，且mValues==DELETED(i对应的Key被延迟删除了)
        if (i < mSize && mValues[i] == DELETED) {
            //直接取代，实现真实删除原键值对
            mKeys[i] = key;
            mValues[i] = value;
            return;
        }
        //数组中可能存在延迟删除元素且当前数组长度满，无法添加
        if (mGarbage && mSize >= mKeys.length) {
            //真实删除，将所有延迟删除的元素从数组中清除；
            gc();
            //清除后重新确定当前key在数组中的目标位置；
            // Search again because indices may have changed.
            i = ~ContainerHelpers.binarySearch(mKeys, mSize, key);
        }
        //不存在垃圾或者当前数组仍然可以继续添加元素，不需要扩容，则将i之后的元素全部后移，数组中仍然存在被DELETED的垃圾key；
        mKeys = GrowingArrayUtils.insert(mKeys, mSize, i, key);
        mValues = GrowingArrayUtils.insert(mValues, mSize, i, value);
        //新元素添加成功，潜在可用元素数量+1
        mSize++;
    }
}

class ContainerHelpers {

// This is Arrays.binarySearch(), but doesn't do any argument validation.
//第一个参数array为keys的数组，第二个为数组中元素个数（与keys的length不一定相等），第三个value为目标的key
static int binarySearch(int[] array, int size, int value) {
    //lo为二分查找的左边界
    int lo = 0;
    //hi为二分查找的右边界
    int hi = size - 1;
    //还没找到，继续查找
    while (lo <= hi) {
        //左边界+右边界处以2，获取到mid 的index
        final int mid = (lo + hi) >>> 1;
        //获取中间元素
        final int midVal = array[mid];
        // 目标key在右部分  。。。。感觉这部分太简单了
        if (midVal < value) {
            lo = mid + 1;
        } else if (midVal > value) {
            hi = mid - 1;
        } else {
            //相等，找到了，返回key对应在array的下标；
            return mid;  // value found
        }
    }
    //没有找到该元素，对lo取反！！！！！很重要
    return ~lo;  // value not present
}

寻找key使用二分查找，找到该插入的index后，后续的元素使用arraycopy。

内存节约，速度不会慢，使用的二分查找，一个一个放for循环放入，乱序二分。越用越快，remove的时候，把移除的下标标记为delet，下次插入到这里，直接放，不要数组位移。空间复用，效率更高。

public void delete(int key) {
    //查找对应key在数组中的下标，如果存在，返回下标，不存在，返回下标的取反；
    int i = ContainerHelpers.binarySearch(mKeys, mSize, key);
    //key存在于mKeys数组中，将元素删除，用DELETED替换原value，起标记作用；
    if (i >= 0) {
        if (mValues[i] != DELETED) {
            mValues[i] = DELETED;
            mGarbage = true;
        }
    }
}
 
/**
 * @hide
 * Removes the mapping from the specified key, if there was any, returning the old value.
 */
public E removeReturnOld(int key) {
    int i = ContainerHelpers.binarySearch(mKeys, mSize, key);
 
    if (i >= 0) {
        if (mValues[i] != DELETED) {
            final E old = (E) mValues[i];
            mValues[i] = DELETED;
            mGarbage = true;
            return old;
        }
    }
    return null;
}
 
/**
 * Alias for {@link #delete(int)}.
 */
public void remove(int key) {
    delete(key);
}

缺点：key只能是int值

ArrayMap：HashMap + SparseArray思想

@Override
    public V put(K key, V value) {
        //当前容量
        final int osize = mSize;
        //key的散列值
        final int hash;
        //key的hash所在的下标
        int index;

        if (key == null) {
            //key为空hash值为0
            hash = 0;
            //找到key的hash值的下标
            index = indexOfNull();
        } else {
            //key的hash值
            hash = mIdentityHashCode ? System.identityHashCode(key) : key.hashCode();
            // 找到key的hash值的下标
            index = indexOf(key, hash);
        }
        if (index >= 0) {
            //当前要添加的元素已经存在，则直接进行替换操作
            index = (index<<1) + 1;
            final V old = (V)mArray[index];
            mArray[index] = value;
            return old;
        }
        //取反得到要添加元素的位置
        index = ~index;
        if (osize >= mHashes.length) {
            //扩容新的容量
            final int n = osize >= (BASE_SIZE*2) ? (osize+(osize>>1))
                    : (osize >= BASE_SIZE ? (BASE_SIZE*2) : BASE_SIZE);

            if (DEBUG) Log.d(TAG, "put: grow from " + mHashes.length + " to " + n);
            //原hash数组
            final int[] ohashes = mHashes;
            //原散列表
            final Object[] oarray = mArray;
            //扩容操作
            allocArrays(n);

            if (CONCURRENT_MODIFICATION_EXCEPTIONS && osize != mSize) {
                throw new ConcurrentModificationException();
            }

            if (mHashes.length > 0) {
                if (DEBUG) Log.d(TAG, "put: copy 0-" + osize + " to 0");
                //将原数组中的拷贝回新数组中
                System.arraycopy(ohashes, 0, mHashes, 0, ohashes.length);
                System.arraycopy(oarray, 0, mArray, 0, oarray.length);
            }
            //回收释放操作
            freeArrays(ohashes, oarray, osize);
        }

        if (index < osize) {
            if (DEBUG) Log.d(TAG, "put: move " + index + "-" + (osize-index)
                    + " to " + (index+1));
            //将index处（含index）及其之后的数据往后移
            System.arraycopy(mHashes, index, mHashes, index + 1, osize - index);
            System.arraycopy(mArray, index << 1, mArray, (index + 1) << 1, (mSize - index) << 1);
        }

        if (CONCURRENT_MODIFICATION_EXCEPTIONS) {
            if (osize != mSize || index >= mHashes.length) {
                throw new ConcurrentModificationException();
            }
        }
        //将数据添加到index处
        mHashes[index] = hash;
        mArray[index<<1] = key;
        mArray[(index<<1)+1] = value;
        mSize++;
        return null;
    }

private void allocArrays(final int size) {
        if (mHashes == EMPTY_IMMUTABLE_INTS) {
            //扩容时如果mHashes 是不可变的，则抛出异常
            throw new UnsupportedOperationException("ArrayMap is immutable");
        }
        if (size == (BASE_SIZE*2)) {
            //如果扩容容量为8（BASE_SIZE=4）
            synchronized (ArrayMap.class) {
                if (mTwiceBaseCache != null) {
                    /**
                      如果当前有容量为8的int缓存可复用数组和容量为16的object缓存可复用数组，则复用这些数组，而不重新new
                     */
                    final Object[] array = mTwiceBaseCache;
                    mArray = array;
                    mTwiceBaseCache = (Object[])array[0];
                    mHashes = (int[])array[1];
                    array[0] = array[1] = null;
                    mTwiceBaseCacheSize--;
                    if (DEBUG) Log.d(TAG, "Retrieving 2x cache " + mHashes
                            + " now have " + mTwiceBaseCacheSize + " entries");
                    return;
                }
            }
        } else if (size == BASE_SIZE) {
             //如果扩容容量为4（BASE_SIZE=4）
            synchronized (ArrayMap.class) {
                if (mBaseCache != null) {
                    /**
                      如果当前有容量为4的int缓存可复用数组和容量为8的object缓存可复用数组，则复用这些数组，而不重新new
                     */
                    final Object[] array = mBaseCache;
                    mArray = array;
                    mBaseCache = (Object[])array[0];
                    mHashes = (int[])array[1];
                    array[0] = array[1] = null;
                    mBaseCacheSize--;
                    if (DEBUG) Log.d(TAG, "Retrieving 1x cache " + mHashes
                            + " now have " + mBaseCacheSize + " entries");
                    return;
                }
            }
        }

        mHashes = new int[size];
        mArray = new Object[size<<1];
    }

    private static void freeArrays(final int[] hashes, final Object[] array, final int size) {
        if (hashes.length == (BASE_SIZE*2)) {
            //如果当前容量为8（BASE_SIZE=4）
            synchronized (ArrayMap.class) {
                if (mTwiceBaseCacheSize < CACHE_SIZE) {
                    //缓存当前数组，并将数组下标为2之后的数据设置为null
                    array[0] = mTwiceBaseCache;
                    array[1] = hashes;
                    for (int i=(size<<1)-1; i>=2; i--) {
                        array[i] = null;
                    }
                    mTwiceBaseCache = array;
                    mTwiceBaseCacheSize++;
                    if (DEBUG) Log.d(TAG, "Storing 2x cache " + array
                            + " now have " + mTwiceBaseCacheSize + " entries");
                }
            }
        } else if (hashes.length == BASE_SIZE) {
            //如果当前容量为4（BASE_SIZE=4）
            synchronized (ArrayMap.class) {
                //缓存当前数组，并将数组下标为2之后的数据设置为null
                if (mBaseCacheSize < CACHE_SIZE) {
                    array[0] = mBaseCache;
                    array[1] = hashes;
                    for (int i=(size<<1)-1; i>=2; i--) {
                        array[i] = null;
                    }
                    mBaseCache = array;
                    mBaseCacheSize++;
                    if (DEBUG) Log.d(TAG, "Storing 1x cache " + array
                            + " now have " + mBaseCacheSize + " entries");
                }
            }
        }
    }

 @Override
  public V get(Object key) {
        //取得key的hashcode所在mHashes的下标,
        final int index = indexOfKey(key);
        /**
        根据mArray的数据存储结构，得知mHashes的 下标*2  (index << 1)便得到其对应元素在mArray的起始下标，第一个是key，第二个是value
        */
        return index >= 0 ? (V)mArray[(index<<1)+1] : null;
  }
public int indexOfKey(Object key) {
        return key == null ? indexOfNull()
                : indexOf(key, mIdentityHashCode ? System.identityHashCode(key) : key.hashCode());
}

int indexOf(Object key, int hash) {
        final int N = mSize;

        // Important fast case: if nothing is in here, nothing to look for.
        if (N == 0) {
            return ~0;
        }
        
        //二分法找到hash所在的下标
        int index = binarySearchHashes(mHashes, N, hash);

        // If the hash code wasn't found, then we have no entry for this key.
        if (index < 0) {
            //没找到，直接返回
            return index;
        }

        // If the key at the returned index matches, that's what we want.
        if (key.equals(mArray[index<<1])) {
            //如果hash下标对应mArray中的key与要找的key相等，直接返回当前下标
            return index;
        }
        //出现冲突处理方案
        //遍历当前index之后元素，找到匹配的key所在的下标
        // Search for a matching key after the index.
        int end;
        for (end = index + 1; end < N && mHashes[end] == hash; end++) {
            if (key.equals(mArray[end << 1])) return end;
        }
        //遍历当前index之前元素，找到匹配的key所在的下标
        // Search for a matching key before the index.
        for (int i = index - 1; i >= 0 && mHashes[i] == hash; i--) {
            if (key.equals(mArray[i << 1])) return i;
        }

        // Key not found -- return negative value indicating where a
        // new entry for this key should go.  We use the end of the
        // hash chain to reduce the number of array entries that will
        // need to be copied when inserting.
        return ~end;
}

遇到hash冲突使用追加的方式，冲突时候index累加的方式。

性能提升：空间和时间的选择问题

刘知意

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
安卓性能优化一数据结构

k为object类型，根据k找到int类型的hashcode，装箱过程，table []大小未知，根据hashcode % table的length求模运算，得到范围0-length-1，求模运算等价于位运算，源码中使用的是位运算，更快，往JVM转为字节码时速度更快，这时得到下标index即源码中的i，通过下标i找到要操作的位置，完成k的任务。扩容耗性能，避免扩容，创建时应该评估hash表的大小，（大小/0.75+1），如何保证大小为2的次幂，这就和put时有关。避免冲突，长度更大，冲突的可能性更低。
复制链接

扫一扫

专栏目录