C Tutorial: A Beginner’s Guide to std::vector, Part 1 - JohnHau/mis GitHub Wiki
https://www.codeguru.com/cplusplus/c-tutorial-a-beginners-guide-to-stdvector-part-1/
This C++ tutorial is meant to help beginning and intermediate C++ programmers better understand the standard template class in the C++ programming language. Interested in learning more about C? Visit the TechRepublic Academy.
Using C++ Vectors The final technical vote of the C++ Standard took place on November 14th, 1997; that was quite a while ago. However, significant parts of the Standard, especially the Standard Library, are still not very popular among many C++ users. A constant reader of CodeGuru’s C++ forums may notice that many questions and answers still imply hand-crafted solutions that could be very elegantly solved by using the Standard Library.
One issue that comes up very often is the use of C-style arrays, with all their problems and drawbacks. People seem to be scared of the standard vector and its brother deque. One reason might be that the Standard Library documentation is mostly pretty elliptic and esoteric.
In this article, I will discuss C++ vectors and try to explain them in a way that is more accessible and understandable. I do not claim that this article is by any means complete; it is meant to give you a start in using vectors in C++ and to help you avoid the most common pitfalls. We will start small and will not try to handle the topic very academically.
Introduction to vector in C++ Vector is a template class that is a perfect replacement for the good old C-style arrays. It allows the same natural syntax that is used with plain arrays but offers a series of services that free the C++ programmer from taking care of the allocated memory and help to operate consistently on the contained objects.
The first step using vector is to include the appropriate header:
#include Note that the header file name does not have any extension; this is true for all of the Standard Library header files. The second thing to know is that all of the Standard Library lives in the namespace std. This means that you have to resolve the names by prepending std:: to them:
std::vector v; // declares a vector of integers For small projects, you can bring the entire namespace std into scope by inserting a using directive on top of your .cpp file:
#include using namespace std; //... vector v; // no need to prepend std:: any more This is okay for small projects, as long as you write the using directive in your .cpp file. Never write a using directive into a header file! This would bloat the entire namespace std into each and every .cpp file that includes that header. For larger projects, it is better to explicitly qualify every name accordingly. I am not a fan of such shortcuts. In this article, I will qualify each name accordingly. I will introduce some typedefs in the examples where appropriate—for better readability.
What is std::vector in C++ What is std::vector v;? It is a template class that will wrap an array of Ts. In this widely used notation, ‘T’ stands for any data type, built-in, or user-defined class. The vector will store the Ts in a contiguous memory area that it will handle for you, and let you access the individual Ts simply by writing v[0], v[1], and so on, exactly like you would do for a C-style array.
Note that for bigger projects it can be tedious to repeatedly write out the explicit type of the vectors. You may use a typedef if you want:
typedef std::vector int_vec_t; // or whatever you // want to name it //... int_vec_t v; Do not use a macro!
#define int_vec_t std::vector ; // very poor style! For the beginning, let’s see what a vector can do for us. Let’s start small and take the example of an array of integers. If you used plain arrays, you had either a static or a dynamic array:
size_t size = 10; int sarray[10]; int *darray = new int[size]; // do something with them: for(int i=0; i<10; ++i){ sarray[i] = i; darray[i] = i; } // don't forget to delete darray when you're done delete [] darray; Let’s do the same thing using a vector:
#include //... size_t size = 10; std::vector<int> array(size); // make room for 10 integers, // and initialize them to 0 // do something with them: for(int i=0; i<size; ++i){ array[i] = i; } // no need to delete anything As you see, vector combines the advantages of both the static and the dynamic array because it takes a non-const size parameter such as the dynamic one and automatically deletes the used memory like the static one.
The standard vector defines the operator [], to allow a “natural” syntax. For the sake of performance, the operator [] does not check whether the index is a valid one. Similar to a C-style array, using an invalid index will mostly buy you an access violation.
In addition to operator [], vector defines the member function at(). This function does the same thing as the operator [], but checks the index. If the index is invalid, it will throw an object of class std::out_of_range.
std::vector<int> array; try{ array.at(1000) = 0; } catch(std::out_of_range o){ std::cout<<o.what()<<std::endl; } Depending on the implementation of the C++ Standard Library you use, the above snippet will print a more or less explicit error message. STLPort prints the word “vector”, the Dinkumware implementation that comes with Visual C++ prints “invalid vector subscript”. Other implementations may print something else.
Note that vector is a standard container. The controlled sequence also can be accessed using iterators. More on iterators later in this article. For now, let’s keep it simple.
Now, what if you don’t know how many elements you will have? If you were using a C-style array to store the elements, you’d either need to implement a logic that allows a developer to grow your array from time to time, or you would allocate an array that is “big enough.” The latter is a poor man’s approach and the former will give you a headache. Not so vector:
#include #include //... std::vector array; char c = 0; while(c != 'x'){ std::cin>>c; array.push_back(c); } In the previous example, push_back() appends one element at a time to the array. This is what we want, but it has a small pitfall. To understand what that is, you have to know that a vector has a so-called ‘controlled sequence’ and a certain amount of allocated storage for that sequence. The controlled sequence is just another name for the array in the guts of the vector. To hold this array, vector will allocate some memory, mostly more than it needs. You can push_back() elements until the allocated memory is exhausted. Then, vector will trigger a reallocation and will grow the allocated memory block. This can mean that it will have to move (that means: copy) the controlled sequence into a larger block. And copying around a large number of elements can slow down your application dramatically. Note that the reallocation is absolutely transparent for you (barring catastrophic failure—out of memory). You need to do nothing; vector will do all what that takes under the hood. Of course, there is something you can do to avoid having vector reallocate the storage too often. Just read on.
In the previous example, we declared the vector using its default constructor. This creates an empty vector. Depending on the implementation of the Standard Library being used, the empty vector might or might not allocate some memory “just in case.” If we want to avoid a too-often reallocation of the vector‘s storage, we can use its reserve() member function:
#include #include //... std::vector array; array.reserve(10); // make room for 10 elements char c = 0; while(c != 'x'){ std::cin>>c; array.push_back(c); } The parameter we pass to reserve() depends on the context, of course. The function reserve() will ensure that we have room for at least 10 elements in this case. If the vector already has room for the required number of elements, reserve() does nothing. In other words, reserve() will grow the allocated storage of the vector, if necessary, but will never shrink it.
As a side note, the following two code snippets are not the same thing:
// snip 1: std::vector v(10); // snip 2: std::vector v; v.reserve(10); The first snippet defines a vector containing 10 integers, and initializes them with their default value (0). If we hadn’t integers but some user-defined class, vector would call the default ctor 10 times and contain 10 readily constructed objects. The second snippet defines an empty vector, and then tells it to make room for 10 integers. The vector will allocate enough memory to hold at least 10 integers, but will not initialize this memory. If we had no integers, but some user-defined class, the second snippet wouldn’t construct any instance of that class.
To find out how many elements would fit in the currently allocated storage of a vector, use the capacity() member function. To find out how many elements are currently contained by the vector, use the size() member function:
#include #include //... std::vector array; int i = 999; // some integer value array.reserve(10); // make room for 10 elements array.push_back(i); std::cout<<array.capacity()<<std::endl; std::cout<<array.size()<<std::endl; This will print
10 1 That means that the number of elements that can be added to a vector without triggering a reallocation always is capacity() – size().
Note that, for the previous example, only 0 is a valid index for array. Yes, we have made room for at least 10 elements with reserve(), but the memory is not initialized. Because int is a built-in type, writing all 10 elements with operator [] would actually work, but we would have a vector that is in an inconsistent state, because size() would still return 1. Moreover, if we tried to access the other elements than the first using array.at(), a
std::out_of_range would be thrown. At a first sight, this may seem inconvenient, but a closer look reveals why this is so: If the vector contained objects of a user-defined class, reserve() wouldn’t call any ctor. Accessing a not-yet-constructed object has undefined results and is a no-no in any case. The important thing to remember is that the role of reserve() is to minimize the number of potential reallocations and that it will not influence the number of elements in the controled sequence. A call to reserve() with a parameter smaller than the current capacity() is benign—it simply does nothing.
The correct way of enlarging the number of contained elements is to call vector‘s member function resize(). The member function resize() has following properties:
If the new size is larger than the old size of the vector, it will preserve all elements already present in the controlled sequence; the rest will be initialized according to the second parameter. If the new size is smaller than the old size, it will preserve only the first new_size elements. The rest is discarded and shouldn’t be used any more—consider these elements invalid. If the new size is larger than capacity(), it will reallocate storage so all new_size elements fit. resize() will never shrink capacity(). Example:
std::vector array; // create an empty vector array.reserve(3); // make room for 3 elements // at this point, capacity() is 3 // and size() is 0 array.push_back(999); // append an element array.resize(5); // resize the vector // at this point, the vector contains // 999, 0, 0, 0, 0 array.push_back(333); // append another element into the vector // at this point, the vector contains // 999, 0, 0, 0, 0, 333 array.reserve(1); // will do nothing, as capacity() > 1 array.resize(3); // at this point, the vector contains // 999, 0, 0 // capacity() remains 6 // size() is 3 array.resize(6, 1); // resize again, fill up with ones // at this point the vector contains // 999, 0, 0, 1, 1, 1 Another way to enlarge the number of controlled elements is to use push_back(). In certain cases, this might be more efficient than calling resize() and then writing the elements. Let’s have a closer look under the hood of vector, by looking at the following example:
class X { public: X():val_(0){} X(int val):val_(val){} int get(){return val_;} void set(int val){val_=val;} private: int val_; }; //.... std::vector ax; // create an empty vector containing // objects of type class X // version 1: ax.resize(10); // resize the controlled sequence for(int i=0; i<10; ++i){ ax[i].set(i); // set each element's value } //... // version 2: ax.reserve(10); // make room for 10 elements for(int i=0; i<10; ++i){ ax.push_back(X(i)); // insert elements using the second ctor } The two versions are equivalent, meaning that they will produce the same result. In both cases, we start with an empty vector. In the first version, we use resize() to grow the size of the controlled sequence to 10 elements. This will not only reallocate the vectors storage, but will also construct a sequence of 10 elements, using the default ctor of X. When resize() is finished, we will have 10 valid objects of type X in our vector, all of them having val_ == 0, because that’s what the default ctor of X does. In a second step, we pick every X in the sequence and use X::set() to change its val_.
In the second version, we call reserve() to make room for 10 elements. The vector will reallocate its storage and do nothing more than that. No element is constructed yet. In a second step, we create 10 objects of type X using the second ctor, thus giving them directly the correct value, and push_back() them into the vector.
Which method is more efficient? That probably also depends on the implementation of the Standard Library, but the second version is likely to be slightly more efficient because it doesn’t call X::set() for each element.
Now that we have seen how to declare a vector and how to fill it up, let’s see how we can operate on it. We will start with an analogy to C-style arrays and will progressively discover other possibilities, that are better or safer.
There are two ways of accessing a C-style array: either by using the subscript operator, or by using pointers. Also, passing a C-style array to a function means passing a pointer to the first element. Can we do the same thing with a vector? The answer is yes. Let’s take a small example:
#include
double mean(double *array, size_t n) { double m=0; for(size_t i=0; i<n; ++i){ m += array[i]; } return m/n; }
int main() { double a[] = {1, 2, 3, 4, 5}; std::cout<<mean(a, 5)<<std::endl; // will print 3 return 0; } When we say mean(a, 5), the first parameter actually is the address of the first element in the array &a[0]. We know that a vector is required to keep its elements in a contiguous block of memory, in order. That means that we can pass the address of the first element of a vector to the function mean() and it will work:
int main() { std::vector a; a.push_back(1); a.push_back(2); a.push_back(3); a.push_back(4); a.push_back(5); std::cout<<mean(&a[0], 5)<<std::endl; // will print 3 return 0; } That’s nice, but it’s still not quite the same. We were able to directly initialize the C-style array, but we had to push_back() the elements into the vector. Can we do better? Well, yes. We cannot directly use an initializer list for the vector, but we can use an intermediary array:
double p[] = {1, 2, 3, 4, 5}; std::vector a(p, p+5); Here we use another constructor provided by vector. It takes two parameters: a pointer to the first element of a C-style array and a pointer to one past the last element of that array. It will initialize the vector with a copy of each element in the array. Two things are important to note: The array is copied and it does not somehow go into the possession of the newly created vector, and the range we supply is from the first element to one past the last element in the array.
Understanding the second point is crucial when working with vectors or any other standard containers. The controlled sequence is always expressed in terms of [first, one-past-last)—not only for ctors, but also for every function that operates on a range of elements.
When taking the address of elements contained in a vector, there is something you have to watch out for: an internal reallocation of the vector will invalidate the pointers you hold to its elements.
std::vector v(5); int *pi = &v[3]; v.push_back(999); // <-- may trigger a reallocation *pi = 333; // <-- probably an error, pi isn't valid any more In the previous example, we take the address of the fourth element of the vector and store it in pi. Then we push_back() another element to the end of the vector. Then we try to use pi. Boom! The reason is that push_back() may trigger a reallocation of v‘s internal storage if this is not large enough to hold the additional element, too. pi will then point to a memory address that has just been deleted, and using it has undefined results. The bad news is that the vector might or might not reallocate the internal storage—you can’t tell on the general case. The solution is either not to use pointers that might have been invalidated, or to make sure that the vector won’t reallocate. The latter means to use reserve() wisely in order to have the vector handle memory (re)allocation at defined times.
From the member functions we have seen so far, only push_back() and resize() can invalidate pointers into the vector. There are other member functions that invalidate pointers; we will discuss them later in this tutorial.
Note that both the subscript operator and the member function at() never invalidate pointers into the vector.
Speaking of pointers into the vector, we can introduce a standard concept at this point: iterators. Iterators are the way the Standard Library models a common interface for all containers—vector, list, set, deque, and so on. The reason is that operations that are “natural” for one container (like subscripting for vector) do not make sense for other containers. The Standard Library needs a common way of applying algorithms like iterating, finding, sorting to all containers—thus the concept of iterators.
An iterator is a handle to a contained element. You can find an exact definition in your favorite textbook, if you want. The internal representation of an iterator is irrelevant at this point. Important is that if you have an iterator, you can dereference it to obtain the element it “points” to (for vector the most natural implementation of an iterator is indeed a plain vanilla pointer—but don’t count on this). Let’s get a grip on iterators with a small example:
#include #include
int main() { std::vector a; std::vector::const_iterator i; a.push_back(1); a.push_back(2); a.push_back(3); a.push_back(4); a.push_back(5); for(i=a.begin(); i!=a.end(); ++i){ std::cout<<(*i)<<std::endl; } return 0; } Let’s take this small program step by step:
std::vector::const_iterator i; This declares a const iterator i for a vector. We are using a const iterator because we do not intend to modify the contents of the vector.
...i=a.begin();... The member function begin() returns an iterator that “points” to the first element in the sequence.
...i!=a.end();... The member function end() returns an iterator that “points” to one-past-the-last-element in the sequence. Note that dereferencing the iterator returned by end() is illegal and has undefined results.
...++i You can advance from one element to the next by incrementing the iterator.
Note that the same program, but using pointers instead of iterators, leads to a very similar construct:
#include #include
int main() { std::vector a; const double *p; a.push_back(1); a.push_back(2); a.push_back(3); a.push_back(4); a.push_back(5); for(p=&a[0]; p!=&a[0]+5; ++p){ std::cout<<(*p)<<std::endl; } return 0; } So, if we can use pointers to basically achieve the same thing in the same way, why bother with iterators at all? The answer is that we have to use iterators if we want to apply some standard algorithm, like sorting, to the vector. The Standard Library does not implement the algorithms as member functions of the various containers, but as free template functions that can operate on many containers.
The combination of standard containers in general (and vector in particular) and standard algorithms, is a very powerful tool; unfortunately, much too often neglected by programmers. By using it you can avoid large portions of hand crafted, error-prone code, and it enables you to write compact, portable, and maintainable programs.
Let’s have a look at the member functions vector provides:
C++ Vector Functions: Constructors A complete set of C++ constructors, C++ destructor, and copy operator is provided. Let’s have a look at them on the example of a vector of standard strings:
typedef std::vector<std::string> str_vec_t; str_vec_t v1; // create an empty vector str_vec_t v2(10); // 10 copies of empty strings str_vec_t v3(10, "hello"); // 10 copies of the string // "hello" str_vec_t v4(v3); // copy ctor
std::list<std::string> sl; // create a list of strings
// and populate it
sl.push_back("cat");
sl.push_back("dog");
sl.push_back("mouse");
str_vec_t v5(sl.begin(), sl.end()); // a copy of the range in // another container // (here, a list)
v1 = v5; // will copy all elements // from v5 to v1 The assign() function in C++ The assign() function will reinitialize the vector. We can pass either a valid element range using the [first, last) iterators or we can specify the number of elements to be created and the element value.
v1.assign(sl.begin(), sl.end()); // copies the list into // the vector v1.assign(3, "hello"); // initializes the vector // with 3 strings "hello" The assignment completely changes the elements of the vector. The old elements (if any) are discarded and the size of the vector is set to the number of elements assigned. Of course, assign() may trigger an internal reallocation.
Stack operations in C++ We have seen the function push_back(). It appends an element to the end of the controlled sequence. There is a counterpart function, pop_back(), that removes the last element in the controlled sequence. The removed element becomes invalid, and size() is decremented. Note that pop_back() does not return the value of the popped element. You have to peek it before you pop it. The reason why this is so is exception-safe. Popping on an empty vector is an error and has undefined results.
std::vector v; v.push_back(999); v.pop_back(); Note that pop_back() does not shrink the capacity().
Predefined iterators in C++ We have seen the iterators begin() and end(). They point to the first, respectively, to one-past-the-last element in the controlled sequence. There also are rbegin() and rend() which point to the first, respectively, to the one-past-the-last element of the reverse sequence. Note that both rbegin() and rend() return the type reverse_iterator (or const_reverse_iterator for their const versions)—which is not the same as iterator, (respectively const_iterator). To obtain a “normal” iterator from a reverse iterator, use reverse_iterator‘s base() member function:
std::vector v; v.push_back(999); std::vector::reverse_iterator r = v.rbegin(); std::vector::iterator i = r.base(); // will point to the last // element in the sequence Element Access in C++ We have seen the subscript operator [] that provides unchecked access and the member function at(), which will throw an object of type std::out_of_range if the index passed is invalid. Two other member functions exist, front() and back(), which return a reference to the first, respectively the last element in the controlled sequence. Note that they do not return iterators!
std::vector v; v.push_back(999); // fill up the vector //... // following statements are equivalent: int i = v.front(); int i = v[0]; int i = v.at(0); int i = *(v.begin()); // following statements are equivalent: int j = v.back(); int j = v[v.size()-1]; int j = v.at(v.size()-1); int j = *(v.end()-1); Note that we cannot write *(–v.end()) because v.end() is not a l-value.
List Operations in C++ A few operations provided by vector are actually native for list. They are provided by the most containers and deal with inserting and erasing elements in the middle of the controlled sequence. Let’s demonstrate them by some examples:
#include #include int main() { std::vector q; q.push_back(10); q.push_back(11); q.push_back(12);
std::vector<int> v;
for(int i=0; i<5; ++i){
v.push_back(i);
}
// v contains 0 1 2 3 4
std::vector<int>::iterator it = v.begin() + 1;
// insert 33 before the second element:
it = v.insert(it, 33);
// v contains 0 33 1 2 3 4
// it points to the inserted element
//insert the contents of q before the second element:
v.insert(it, q.begin(), q.end());
// v contains 0 10 11 12 33 1 2 3 4
// iterator 'it' is invalid
it = v.begin() + 3;
// it points to the fourth element of v
// insert three time -1 before the fourth element:
v.insert(it, 3, -1);
// v contains 0 10 11 -1 -1 -1 12 33 1 2 3 4
// iterator 'it' is invalid
// erase the fifth element of v
it = v.begin() + 4;
v.erase(it);
// v contains 0 10 11 -1 -1 12 33 1 2 3 4
// iterator 'it' is invalid
// erase the second to the fifth element:
it = v.begin() + 1;
v.erase(it, it + 4);
// v contains 0 12 33 1 2 3 4
// iterator 'it' is invalid
// clear all of v's elements
v.clear();
return 0;
} Note that both insert() and erase() may invalidate any iterators you might hold. The first version of insert() returns an iterator that points to the inserted element. The other two versions return void. Inserting elements may trigger a reallocation. In this case, all iterators in the container become invalid. If no reallocation occurs (for example, by a call to reserve() prior to inserting), only iterators printing between the insertion point and the end of the sequence become invalid.
Erasing elements never triggers a reallocation, nor does it influence the capacity(). However, all iterators that point between the first element erased and the end of the sequence become invalid.
Calling clear() removes all elements from the controlled sequence. The memory allocated is not freed, however. All iterators become invalid, of course.
Note that both insert() and erase() are not very efficient for vectors. They are expected to perform in amortized linear time, O(n)+. If your application often uses insertion and erasure, vector probably isn’t the best choice of a container for you.
Comparison Operations in C++ You can compare the contents of two vectors on an element-by-element basis using the operators ==, != and <. Two vectors are equal if both have the same size() and the elements are correspondingly equal. Note that the capacity() of two equal vectors need not to be the same. The operator < orders the vector‘s lexicographically.
std::vector v1, v2; //... if(v1 == v2) ... Swapping Contents in Vectors Using C++ Sometimes, it is practical to be able to swap() the contents of two vectors. A common application is forcing a vector to release the memory it holds. We have seen that erasing the elements or clearing the vector doesn’t influence its capacity() (in other words, the memory allocated). We need to do a small trick:
std::vector v; //... v.clear(); v.swap(std::vector(v)); Normally (see below), vectors simply swap their guts. In the previous example, we create a temporary vector by using the copy ctor and swap() its contents with v. The temporary object will receive the entire memory held by v and v will receive the memory held by the temporary object—which is likely to allocate nothing on creation. The temporarily created vector is destroyed at the end of the above statement, and all the memory formally held by v is freed.
The vector class template has a second, default template parameter:
template<class T, class A = allocator > class vector ... The allocator is a class that supplies the functions used by the container to allocate and deallocate memory for its elements. In this tutorial, we assumed that we have a default allocator and we will continue to assume this. For the sake of completeness, note that swap() will perform in constant time (simply swap the guts of the two vectors) if both allocators are the same. This is in most cases so.
In this first tutorial, we have scratched the surface of the Standard Library and met std::vector. In the next tutorial, we will have a look at more advanced topics related to vector, respectively applying standard algorithms to it, and will discuss design decisions, such as the question of when to store objects and when to store pointers to objects in the vector. We also will introduce a close relative of the vector, that is, std::deque.