Time Complexity Analysis of vector push_back

  Part 4 of The C++ Programming Language, covering the standard library, opens its discussion of vector with this sentence:

The STL vector is the default container. Use it unless you have a good reason not to. If your suggested alternative is a list or built-in array, think twice.

  vector is indeed the most commonly used and most fundamental container; unlike the fancier containers, it does not raise many tricky questions. When a vector's capacity is full, the operations are: grow the storage (typically by a factor of two; Introduction to Algorithms shows that geometric growth gives constant amortized cost), copy the elements over, and free the old block. A question that comes up often in interviews is: what is the time complexity of vector's push_back?

References: The Annotated STL Sources (《STL源码剖析》), Hou Jie
http://cs.stackexchange.com/questions/9380/why-is-push-back-in-c-vectors-constant-amortized

  In short, the amortized time complexity is O(1).
  push_back only requests new memory when the size reaches the capacity; it then copies every existing element into the new block, roughly m·n operations for n elements and some constant m. But such a copy happens only once per roughly n cheap insertions, so a run of n push_back calls performs about (m + 1)·n operations in total, which amortizes to (m + 1) per call, i.e. constant time O(1).

The important word here is “amortized”.
Amortized analysis is an analysis technique that examines a sequence of n operations. If the whole sequence runs in T(n) time, then each operation in the sequence runs in T(n)/n. The idea is that while a few operations in the sequence might be costly, they can’t happen often enough to weigh down the program. It’s important to note that this is different from average case analysis over some input distribution or randomized analysis. Amortized analysis establishes a worst-case bound for the performance of an algorithm irrespective of the inputs. It’s most commonly used to analyse data structures, which have a persistent state throughout the program.

One of the most common examples given is the analysis of a stack with a multipop operation that pops k elements. A naive analysis of multipop would say that in the worst case multipop must take O(n) time since it might have to pop off all the elements of the stack. However, if you look at a sequence of operations, you’ll notice that the number of pops cannot exceed the number of pushes. Thus over any sequence of n operations the number of pops can’t exceed O(n), and so multipop runs in O(1) amortized time even though occasionally a single call might take more time.

Now how does this relate to C++ vectors?
Vectors are implemented with arrays so to increase the size of a vector you must reallocate memory and copy the whole array over. Obviously, we wouldn’t want to do this very often. So, if you perform a push_back operation and the vector needs to allocate more space, it will increase the size by a factor m. Now this takes more memory, which you may not use in full, but the next few push_back operations all run in constant time.

Now if we do the amortized analysis of the push_back operation (which I found here) we’ll find that it runs in constant amortized time. Suppose you have n items and your multiplication factor is m. Then the number of reallocations is roughly log_m(n). The i-th reallocation costs time proportional to m^i, about the size of the current array. Thus the total time for n push_backs is ∑_{i=1}^{log_m(n)} m^i ≈ n·m/(m−1), since it’s a geometric series. Divide this by n operations and we get that each operation takes m/(m−1), a constant. Lastly you have to be careful about choosing your factor m. If it’s too close to 1 then this constant gets too large for practical applications, but if m is too large, say 2, then you start wasting a lot of memory. The ideal growth rate varies by application, but some implementations use 1.5.

