An interesting problem: is the time complexity of ArrayList's add(int index, E e) lower than O(n)?

GaleZhang

Categories: Java, Algorithms

Article link: https://blog.csdn.net/GaleZhang/article/details/108182420

LeetCode 5497

Find Latest Group of Size M

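You are given an array arr that represents a permutation of the numbers from 1 to n. There is a binary string of length n whose bits are all initially set to 0.

At each step i from 1 to n (assuming both the binary string and arr are 1-indexed), the bit at position arr[i] is set to 1.

You are also given an integer m. Find the latest step at which the binary string contains a group of 1s of length m. A group of 1s is a contiguous substring of 1s that cannot be extended any further to the left or to the right.

Return the latest step at which there exists a group of 1s of length exactly m. If no such step exists, return -1.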

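My first idea was to maintain a list of intervals and traverse the array in reverse. Each time we reach a number, we delete that position (set its bit back to 0), split the interval containing it in two, and check whether either of the two resulting intervals has length m. If so, we return; otherwise we keep going.

Because the list stays sorted, the interval containing a given position can be found by binary search. Ignoring the cost of the split, the time complexity is O(N log N). However, splitting an interval means inserting into the list, and the binary search needs to read the element at an arbitrary index. Neither the array-backed ArrayList nor the linked LinkedList does both in O(1): ArrayList has O(1) random access but O(N) insertion, while LinkedList can insert at a known node in O(1) but needs O(N) for random access. The overall time complexity is therefore O( $N^2$ ).

With N up to $10^5$ , an $N^2$ solution will usually time out on LeetCode, so I looked for another approach.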
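For comparison, one structure that does give logarithmic lookup and logarithmic insertion is a balanced tree. The sketch below is not the approach used in this post; it swaps the interval list for a java.util.TreeSet of deleted positions, under the assumption that two sentinel values (0 and n + 1) are acceptable as boundaries. The class name TreeSetLatestStep is made up for illustration.

```java
import java.util.TreeSet;

class TreeSetLatestStep {
    // Same contract as findLatestStep in this post, but backed by a TreeSet
    // (an assumed alternative, not the author's solution).
    static int findLatestStep(int[] arr, int m) {
        int n = arr.length;
        if (n == m) {
            return n;
        }
        // 1-based positions already turned back to 0, plus sentinels 0 and n + 1.
        TreeSet<Integer> zeros = new TreeSet<>();
        zeros.add(0);
        zeros.add(n + 1);
        for (int i = n - 1; i >= 0; i--) {
            int p = arr[i];
            int lo = zeros.lower(p);   // nearest deleted position to the left
            int hi = zeros.higher(p);  // nearest deleted position to the right
            // Deleting p splits its group into a left part and a right part.
            if (p - lo - 1 == m || hi - p - 1 == m) {
                return i;
            }
            zeros.add(p);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(findLatestStep(new int[]{3, 5, 1, 2, 4}, 1)); // 4
        System.out.println(findLatestStep(new int[]{3, 1, 5, 4, 2}, 2)); // -1
    }
}
```

Each iteration does a constant number of O(log N) tree operations, so the whole loop is O(N log N), at the price of boxing every position into an Integer.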

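Another approach that comes to mind easily is to maintain a state array: each time an element is deleted we mark its position, then scan left and right from that position to measure the lengths of the contiguous undeleted runs on either side. The solution looks like this: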

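public int findLatestStep(int[] arr, int m) {
    int n = arr.length;
    if(n == m) {
        return n;
    }
    // flag[p] == true means position p (0-based) has been deleted
    boolean[] flag = new boolean[n];
    for(int i = n - 1; i >= 0; i--) {
        flag[arr[i] - 1] = true;
        // Scan left to find the length of the undeleted run before the position
        int l = arr[i] - 2;
        while(l >= 0 && !flag[l]) {
            l--;
        }
        if(arr[i] - 2 - l == m) {
            return i;
        }
        // Scan right to find the length of the undeleted run after the position
        int r = arr[i];
        while(r < n && !flag[r]) {
            r++;
        }
        if(r - arr[i] == m) {
            return i;
        }
    }
    return -1;
}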

However, this solution also degrades to O( $N^2$ ) in the worst case, for example on this input:

[1,100000,2,99999,3,99998,4,99997,5,99996,6,99995,7,99994,8,99993,9,99992,10,99991,11,99990,12,99989,13,99988,14,99987,15,99986,16,99985,17,99984,18,99983,19,99982,20,99981,21,99980,22,99979,23,99978,24,99977,25,99976,26,99975,27,99974,28,99973,29,99972,30,99971,31,99970,32,99969,33,99968,34,99967,35,99966,36,99965,37,99964,38,99963,39,99962,40,99961,41,99960,42,99959,43,99958,44,99957,45,99956,46,99955,47,99954,48,99953,49,99952,50,99951,51,99950,52,99949,53,99948,54,99947,55,99946,56,99945,57,99944,58,99943,59,99942,60,99941,61,99940,62,99939,63,99938,64,99937,65,99936,66,99935,67,99934,68,99933,69,99932,70,99931,71,99930,72,99929,73,99928,74,99927,75,99926,76,99925,77,99924,78,99923,79,99922,80,99921,81,99920,82,99919,83,99918,84,99917,85,99916,86,99915,87,99914,88,99913,89,99912,90,99911,91,99910,92,99909,93,99908,94,99907,95,99906,96,99905,97,99904,98,99903,99,99902,100,99901...]
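To see why this input hurts, the following sketch rebuilds the same alternating pattern for any n and counts how many times the two inner scanning loops of the flag-array solution run. The list above is truncated, so continuing the pattern all the way to the middle is an assumption, and the names ZigzagWorstCase, zigzag and countScanSteps are made up for illustration. The early return on m is deliberately left out so that the full scanning cost is counted.

```java
class ZigzagWorstCase {
    // Builds [1, n, 2, n-1, 3, n-2, ...], continuing the pattern shown above.
    static int[] zigzag(int n) {
        int[] arr = new int[n];
        int lo = 1;
        int hi = n;
        for (int i = 0; i < n; i++) {
            arr[i] = (i % 2 == 0) ? lo++ : hi--;
        }
        return arr;
    }

    // Replays the reverse traversal of the flag-array solution and counts the
    // iterations of its left and right scans (no early exit on m).
    static long countScanSteps(int[] arr) {
        int n = arr.length;
        boolean[] flag = new boolean[n];
        long steps = 0;
        for (int i = n - 1; i >= 0; i--) {
            flag[arr[i] - 1] = true;
            for (int l = arr[i] - 2; l >= 0 && !flag[l]; l--) {
                steps++;
            }
            for (int r = arr[i]; r < n && !flag[r]; r++) {
                steps++;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        for (int n : new int[]{1000, 2000, 4000}) {
            System.out.println("n=" + n + " scan steps=" + countScanSteps(zigzag(n)));
        }
    }
}
```

In reverse order the deletions start in the middle and alternate outward, so every deletion has to walk across one whole remaining side. The step count roughly quadruples each time n doubles, which is the quadratic growth described above.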

Sure enough, it timed out.
On a whim, I went back and tried my very first idea:

import java.util.ArrayList;
import java.util.List;

public int findLatestStep(int[] arr, int m) {
    int n = arr.length;
    if(n == m) {
        return n;
    }
    // Sorted list of half-open intervals [start, end) that are still all 1s
    List<int[]> list = new ArrayList<>();
    list.add(new int[]{0, n});
    for(int i = n - 1; i >= 0; i--) {
        int toDel = arr[i] - 1;
        // Binary search for the interval that contains toDel
        int l = 0;
        int r = list.size() - 1;
        while(l < r) {
            int mid = l + (r - l) / 2;
            int[] cur = list.get(mid);
            if(cur[0] <= toDel && cur[1] > toDel) {
                l = mid;
                r = mid;
            } else if(cur[0] > toDel) {
                r = mid;
            } else {
                l = mid + 1;
            }
        }
        if(toDel - list.get(l)[0] == m || list.get(l)[1] - toDel - 1 == m) {
            return i;
        } else {
            // Split [start, end) into [start, toDel) and [toDel + 1, end)
            int temp = list.get(l)[0];
            list.get(l)[0] = toDel + 1;
            list.add(l, new int[]{temp, toDel});
        }
    }
    return -1;
}

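To my surprise, it passed! Does inserting at an arbitrary position in an ArrayList really cost less than O(N)?
With that question in mind, I opened the ArrayList source (JDK 8u251).
First, the source of add(int index, E element):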

    /**
     * Inserts the specified element at the specified position in this
     * list. Shifts the element currently at that position (if any) and
     * any subsequent elements to the right (adds one to their indices).
     *
     * @param index index at which the specified element is to be inserted
     * @param element element to be inserted
     * @throws IndexOutOfBoundsException {@inheritDoc}
     */
    public void add(int index, E element) {
        rangeCheckForAdd(index); // validate index

        ensureCapacityInternal(size + 1);  // Increments modCount!!
        System.arraycopy(elementData, index, elementData, index + 1,
                         size - index);
        elementData[index] = element;
        size++;
    }

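The insertion logic of add() is fairly simple: it first validates index, then checks the capacity and grows the backing array if it is too small.
The core step is shifting every element from index onward one slot to the right with System.arraycopy(), which clearly determines the time complexity of add().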
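The shift can be reproduced on a plain int[] to make it concrete. This is only a sketch of the same idea, not the JDK code: the class name ShiftInsertSketch and the method insertAt are made up, and the doubling growth policy is an assumption chosen for brevity (it is not ArrayList's actual policy).

```java
import java.util.Arrays;

class ShiftInsertSketch {
    // Inserts value at index among the first `size` slots of data and returns
    // the array that now holds size + 1 elements (a larger copy if data was full).
    static int[] insertAt(int[] data, int size, int index, int value) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        // Assumed growth policy for this sketch only: double when full.
        int[] target = (size == data.length)
                ? Arrays.copyOf(data, Math.max(1, data.length * 2))
                : data;
        // Shift elements [index, size) one slot to the right. System.arraycopy
        // is specified to handle overlapping source and destination ranges.
        System.arraycopy(target, index, target, index + 1, size - index);
        target[index] = value;
        return target;
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30, 40, 0, 0}; // 4 used slots, 2 spare
        int[] result = insertAt(data, 4, 1, 15);
        System.out.println(Arrays.toString(result)); // [10, 15, 20, 30, 40, 0]
    }
}
```

The number of elements moved is size - index, so inserting at the front moves everything and inserting at the end moves nothing.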
Let's look at System.arraycopy() itself (as an aside, the System class lives in the java.lang package):

public static native void arraycopy(Object src,  int  srcPos,
                                    Object dest, int destPos,
                                    int length);

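As you can see, arraycopy() is a native method, so its implementation depends on the JVM. Taking OpenJDK on x86 Windows as an example, the core logic eventually comes down to: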

// hotspot/src/os_cpu/windows_x86/vm/copy_windows_x86.inline.hpp
static void pd_conjoint_jlongs_atomic(jlong* from, jlong* to, size_t count) {
#ifdef AMD64
  assert(BytesPerLong == BytesPerOop, "jlongs and oops must be the same size");
  pd_conjoint_oops_atomic((oop*)from, (oop*)to, count);
#else
  // Guarantee use of fild/fistp or xmm regs via some asm code, because compilers won't.
  __asm {
    mov    eax, from;
    mov    edx, to;
    mov    ecx, count;
    cmp    eax, edx;
    jbe    downtest;
    jmp    uptest;
  up:
    fild   qword ptr [eax];
    fistp  qword ptr [edx];
    add    eax, 8;
    add    edx, 8;
  uptest:
    sub    ecx, 1;
    jge    up;
    jmp    done;
  down:
    fild   qword ptr [eax][ecx*8];
    fistp  qword ptr [edx][ecx*8];
  downtest:
    sub    ecx, 1;
    jge    down;
  done:;
  }
#endif // AMD64
}

static void pd_conjoint_oops_atomic(oop* from, oop* to, size_t count) {
  // Do better than this: inline memmove body  NEEDS CLEANUP
  if (from > to) {
    while (count-- > 0) {
      // Copy forwards
      *to++ = *from++;
    }
  } else {
    from += count - 1;
    to   += count - 1;
    while (count-- > 0) {
      // Copy backwards
      *to-- = *from--;
    }
  }
}

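(The above draws on the article "memcpy的疑问".) As you can see, this too copies the elements one by one, so it is still O(N). Then why is the second approach (the interval list) faster?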

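My guess is that one clear advantage of the second approach is its smaller memory footprint, since it only maintains intervals. With less memory to touch, it should incur fewer cache misses and fewer trips to main memory, so a noticeable performance gain is to be expected.

In addition, the first approach (the flag array) scans in two directions for each deletion, left and then right, whereas the second approach makes a single pass when shifting the list, which in theory lowers the constant factor hidden in O(N). I am not sure whether that translates into a real speedup.

So I don't fully know the correct answer myself and can only offer some guesses. In short, though, there is no need to be overly afraid of ArrayList's add(int index, E e): in practice it performs much better than you might expect.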
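If you want to check this on your own machine, a rough timing loop like the one below is a starting point. It is a naive sketch, not a rigorous benchmark: there is no JIT warm-up and no harness such as JMH, so the numbers are only indicative. The class name FrontInsertTiming and the chosen sizes are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

class FrontInsertTiming {
    // Worst case for add(int, E): every insert goes to index 0,
    // so every existing element is shifted each time.
    static List<Integer> frontInsert(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(0, i);
        }
        return list;
    }

    public static void main(String[] args) {
        for (int n : new int[]{10_000, 20_000, 40_000}) {
            long start = System.nanoTime();
            List<Integer> list = frontInsert(n);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("n=" + n + " size=" + list.size()
                    + " took about " + elapsedMs + " ms");
        }
    }
}
```

Doubling n should still roughly quadruple the elapsed time, since the cost remains quadratic overall; what the experiment shows is how small the constant per shifted element is.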
