C++ STL vector


To be an effective C++ programmer, you need to understand how the STL works.

vector is the most commonly used container in the STL. Below we discuss some details of vector's implementation, with an emphasis on practical use.


A vector is a dynamically allocated array of elements. Once you understand how vector works, learning the other STL components becomes much easier.

#include <iostream>
#include <vector>

using namespace std;

int main() {
    vector<int> v;
    cout << "Before any insertions (default-constructed vector):" << endl;
    cout << "capacity: " << v.capacity() << endl; // 0
    cout << "size: " << v.size() << endl;         // 0

    // The capacities below assume a doubling growth policy (e.g. libstdc++);
    // the exact values are implementation-defined.
    v.push_back(4);
    cout << "capacity: " << v.capacity() << endl; // 1
    cout << "size: " << v.size() << endl;         // 1

    v.push_back(3);
    cout << "capacity: " << v.capacity() << endl; // 2
    cout << "size: " << v.size() << endl;         // 2

    v.push_back(8);
    cout << "capacity: " << v.capacity() << endl; // 4
    cout << "size: " << v.size() << endl;         // 3

    v.push_back(5);
    cout << "capacity: " << v.capacity() << endl; // 4
    cout << "size: " << v.size() << endl;         // 4

    v.push_back(2);
    cout << "capacity: " << v.capacity() << endl; // 8
    cout << "size: " << v.size() << endl;         // 5

    return 0;
}
With a doubling growth policy (as in libstdc++), this prints capacity 0 (size 0) initially, then capacities 1, 2, 4, 4, 8 as the five elements are pushed (sizes 1 through 5).

It is easy to see what happens: we start with an empty vector of integers, and as we keep calling push_back, whenever the vector's allocated space fills up it allocates a new buffer twice the capacity of the old one, copies the old contents into it, and continues from there. Note that the new capacity is double the old capacity, and so on each time the buffer fills.

Concretely, after executing the following statements, memory looks like this:

vector<int> v;
v.push_back(4);
v.push_back(3);
v.push_back(8);
v.push_back(5);
v.push_back(2);
The layout is as follows (the original post illustrates this with a diagram showing three pointers: begin, end, and capacity):


begin points to the start of the buffer; you can obtain an iterator to this position with the member function begin().

end points one past the last of the valid, initialized elements of the vector; you can obtain it with end().

From this it follows that the vector's size (the number of elements stored, not the capacity) is end − begin, i.e. end() − begin().

The capacity pointer points one past the end of the buffer. In the diagram this position is drawn as a dashed box, because it is not a legal part of the buffer. It tracks how much room for growth the vector has in its current buffer. The member function capacity() returns the size of this buffer, i.e. capacity − begin.

Note that at any given time there is exactly one buffer that stores all the data; the one exception is that an empty vector may have no buffer at all.

Each time we call push_back(), adding a new element to the vector, the end pointer will move one step forward.
What happens when our buffer runs out of space to store new elements? A new buffer will be allocated, twice the size [1] of the old buffer, all of our elements will be copied into the new buffer, and then the old buffer will be destroyed. Suppose on our vector from above we now made the following calls:

v.push_back(9);
v.push_back(10);
v.push_back(12);

After these execute, the vector is laid out as follows:



At this point our vector is full, i.e. end == capacity. If we now push_back() into the current buffer again:

v.push_back(1);

the vector will allocate a new buffer of size 16, copy all the elements into it, and free the old buffer:



Of course, when a vector's buffer fills up, the new buffer does not have to be exactly twice the size of the old one; the growth factor is up to the implementation. Put another way:

A vector can grow to any size, constrained only by available memory and addressability.

What is the benefit of this scheme? It makes each push_back run in amortized O(1) time. See the amortized analysis chapter in CLRS for the dynamic table expansion argument.

Next, let's analyze vector's performance.

Many people assume vector's performance is poor because of all this copying, but in fact it is not. Amortized analysis shows that each push_back has an amortized cost of O(1), i.e. it completes in constant time on average.

In actual fact, it’s not nearly as bad as most people intuit. To begin with, let’s think about how many elements the vector will be unnecessarily copying. Suppose you grew a vector to the size of 1024 elements, your vector’s buffer would be full. Now add another element, and the vector has to copy its buffer of 1024 elements over. Additionally, on the way to growing to size 1024, you had to copy as many as 1 + 2 + 4 + 8 + 16 + 32 + 64 + 128 + 256 + 512 = 1023 elements. So, to grow to 1025 elements, the vector has to make as many as 1023 + 1024 = 2047 element copies in overhead. More generally, if you grow your vector to size N, the vector might be making as many as 2*N element copies when it reallocates buffers, and on average will make 1.5*N such element copies. Note that the key to all of this is how vector grows exponentially. If you implemented vector by simply adding, say, 128 elements to its buffer every time it has to resize, instead of doubling its buffer, then the number of copies and performance would be terrible.


Is this really very bad? Actually, almost certainly not. Unless you are storing a bulky object that is expensive to copy in your vector, the copying overhead is likely trivial in comparison to the performance benefits of vector.


Formally, calling push_back() on a vector runs in O(1) amortized time. 


When storing elements in a vector, you can store either the objects themselves or pointers to them. How do you choose which? Generally, the larger and bulkier the object is, the more likely it is you want to store pointers to it, rather than the object itself. Storing a vector<int*> would be very inefficient, since the pointers would be as large or larger than the integers and you'd have to have the overhead of the memory allocations too. But for a large object, like Frogatto's custom_object class, a vector<custom_object*> is probably what we want. Note that to store an object directly, it must be copyable, i.e. have accessible copy constructors and assignment operators.


Note also that if you store a vector of pointers, the vector will not manage the memory pointed to by the pointers. If you want the object's memory to be managed for you, you could use a vector<boost::shared_ptr<particle> > (or std::shared_ptr in modern C++) to have a vector of 'smart pointers' that manage the memory they point to.


Now, let’s move on to some more things you can do with a vector. Let’s look at how we would iterate over all the elements of our vector and sum them up:


int sum = 0;
for(vector<int>::const_iterator i = v.begin(); i != v.end(); ++i) {
  sum += *i;
}
Now what is this ‘vector::const_iterator’ thing? Well, an iterator is a concept the STL introduces, intended to generalize the concept of a pointer. A pointer works well for moving over elements, inspecting them, modifying them, etc, but only if your underlying storage is an array — a flat buffer of data. If, in C, you wanted to iterate over a linked list, for instance, you’d likely have to write a loop that looks something like this:




for(node* ptr = list->begin; ptr != NULL; ptr = ptr->next) { ... }


…and then differently again for a data structure like a deque, and so forth. An iterator is a type that looks and behaves like a pointer, providing either all of a pointer’s operations, or at least a defined subset of them. The idea of an iterator is to allow access of members of a data structure using a uniform syntax, regardless of the data structure’s underlying implementation.


Thus, just think of a vector::const_iterator as behaving exactly like a const int* does.


Another important concept to understand regarding vector is known as iterator invalidation. Remember how when we push_back() on a vector, and it runs out of space, it’ll reallocate the buffer? Think about if you had a pointer to one of the elements within the vector. That pointer would now point to the old buffer, the one that has been destroyed.


In C++ terms, calling push_back() on a vector invalidates all iterators into that vector. Once you call push_back() all iterators you have into the vector are unusable, and the only thing you can legally do with them is reassign them to a new value. Reading or writing to the iterators may cause the program to crash, or a host of other nasty behavior.


This effect can be rather subtle. For instance, we had code something like this in Frogatto to detect collisions between objects:


for(vector<object_ptr>::iterator i = objects_.begin(); i != objects_.end(); ++i) {
  for(vector<object_ptr>::iterator j = objects_.begin(); j != objects_.end(); ++j) {
    if(i != j && objects_collide(*i, *j)) {
      handle_collide_event(*i, *j);
    }
  }
}
This code iterates over every object pair to see if there are collisions. Simple enough, right? Now what if inside an object’s collide event it spawns some new objects, and that spawning adds the new objects to the objects_ vector? Then the objects_ vector’s iterators are all invalidated, including i and j. We are still using them in our loops though! To make matters worse, it only occurs if the new objects happen to trigger a reallocation of the buffer.


Note that you can iterate over a vector using indexes, which a lot of people find easier/simpler than using iterators:




for(vector<int>::size_type i = 0; i < v.size(); ++i) {
  ...use v[i] instead of *i...
}


This has similar performance, but note it’s less general. You can’t use this approach to iterate over most of the other STL containers. In Frogatto, we have a foreach macro we use for most simple iterations.


So far we’ve covered growing a vector using push_back, and iterating over it. Let’s look quickly at the other operations vector supports:


pop_back(): This is the inverse to push_back, taking the last element off the vector
resize(): This lets you change the size of the vector to any size you want.
reserve() : This changes the capacity of the vector. Note that this doesn’t change the vector’s size, it just changes the size of the underlying buffer, to give more room for expansion of the buffer before the buffer has to be resized. Unlike calling resize(), this doesn’t change the behavior of the program, just the performance.
insert()/erase(): These operations allow you to add new elements anywhere in the vector, not just at the back. Note however that these operations take O(N) time.
front()/back(): Convenience operations to look at the first and last element of the vector

Remember that vector represents a contiguous buffer. If you want to erase an element in the middle of the vector, all the elements after the erased element have to be shuffled forward to fill the gap.


As an example, suppose you had a vector of particles, and wanted to remove all of the ones that have expired. Do not do this:


vector<particle>::iterator i = v.begin();
while(i != v.end()) {
  if(particle_expired(*i)) {
    i = v.erase(i);
  } else {
    ++i;
  }
}
Note that this code does do the correct thing. erase() will invalidate the iterator being erased, but it will return a valid iterator to the next element. The loop carefully moves over all particles erasing the expired ones. But the performance of this is potentially terrible.  Suppose we had a million particles in our vector, and they have all expired. We’ll be doing around half a trillion particle copies, just to empty out our vector!


So how should we do this? There is an algorithm supplied in the STL designed to do exactly that, called remove_if. This is how you do it:


vector<particle>::iterator end_valid = remove_if(v.begin(), v.end(), particle_expired);
v.erase(end_valid, v.end());
How does this work? Firstly, all of the STL algorithms operate on iterators not on containers. If they operated on containers, you’d have to have a version for each type of container. So, the remove_if algorithm takes an iterator to the beginning, and then to the end of the sequence we want to operate on. It also takes the function to call on each element to see if it is to be removed.


remove_if efficiently shuffles all elements that should not be removed forward, overwriting elements that should be removed. At the end of the call to remove_if, all the remaining elements are at the front of the sequence. Let's illustrate this with an example: suppose our vector contains particles with the following IDs:




[4, 8, 2, 12, 19, 3, 7, 18]


Now suppose all particles with ID’s lower than 6 are being expired. After the call to remove_if, our vector will now contain this:




[8, 12, 19, 7, 18, ??, ??, ??]


See how everything less than 6 has been removed. Everything over 6 is now at the front of the vector, with the order maintained. However, the size of our vector hasn’t changed — because remove_if only has access to iterators, and iterators can’t change the size of the vector they point into — now at the end of the vector are some ‘garbage’ undefined values.


Fortunately, remove_if provides a convenience way to resize the vector and remove the garbage. It returns an iterator to the end of the valid values. So we use this iterator to remove all the invalid values at the end, with our erase call.


One final operation vector has which I want to talk about is swap(). You can call swap() on two vectors and they will be efficiently swapped. That is, they will simply swap the pointer values they have. This is useful in a variety of situations, for instance to ‘move’ a vector into a new location without the expense of a copy. Also, we have discussed the way a vector grows its buffer. Yet if you call shrinking operations such as resize() with a smaller value or pop_back() or even clear(), a vector never shrinks its buffer. So if you call push_back() a million times on a vector, then call clear(), the vector will still hold a huge buffer of memory. This is probably a reasonable decision, since any shrinking would risk pathological situations where a vector keeps growing and shrinking its buffer. However, it is useful to be able to shrink a vector by hand. Here’s a way you can manually clear a vector, so it has no buffer at all:




{
  vector<int> tmp;
  tmp.swap(v);
} // tmp is destroyed here, taking the swapped buffer with it.


There’s a lot more to understand about vectors, and all their implications. Hopefully this gives a good overview. Unless you have a very good reason not to, vector is generally the best container to store things in in C++. It is efficient and compact, and generally gives the best real-world performance of any container.


[1] Actually, the C++ Standard only requires that a vector grow exponentially, it doesn’t specify the exponent. So a vector could make its new buffer four times the size of its old buffer, for instance, or one-and-a-half times. But all implementations I know of double it each time.



