这是我第一次使用映射,我意识到插入元素的方法有很多种。您可以使用emplace(),操作符[]或insert(),加上使用value_type或make_pair等变量。虽然有很多关于他们所有人的信息和关于特定案例的问题,但我仍然不能理解大局。 我的两个问题是:
它们各自的优势是什么? 是否有必要在标准中增加放置位置?在没有它之前,有什么事情是不可能的吗?
这是我第一次使用映射,我意识到插入元素的方法有很多种。您可以使用emplace(),操作符[]或insert(),加上使用value_type或make_pair等变量。虽然有很多关于他们所有人的信息和关于特定案例的问题,但我仍然不能理解大局。 我的两个问题是:
它们各自的优势是什么? 是否有必要在标准中增加放置位置?在没有它之前,有什么事情是不可能的吗?
放置:利用右值引用来使用您已经创建的实际对象。这意味着没有复制或移动构造函数被调用,这对大型对象很好!O (log (N))的时间。
Insert:具有标准左值引用和右值引用的重载,以及指向要插入元素列表的迭代器,以及关于元素所属位置的“提示”。使用“hint”迭代器可以将插入所花费的时间降低到常数时间,否则就是O(log(N))时间。
操作符[]:检查对象是否存在,如果存在,则修改对该对象的引用,否则使用提供的键和值对这两个对象调用make_pair,然后执行与insert函数相同的工作。这是O(log(N))次。
make_pair:只做一对。
There was no "need" for adding emplace to the standard. In c++11 I believe the && type of reference was added. This removed the necessity for move semantics, and allowed optimization of some specific type of memory management. In particular, the rvalue reference. The overloaded insert(value_type &&) operator does not take advantage of the in_place semantics, and is therefore much less efficient. While it provides the capability of dealing with rvalue references, it ignores their key purpose, which is in place construction of objects.
除了优化机会和更简单的语法之外,插入和插入之间的一个重要区别是,后者允许显式转换。(这适用于整个标准库,而不仅仅是地图。)
下面是一个例子:
#include <vector>
struct foo
{
explicit foo(int);
};
int main()
{
std::vector<foo> v;
v.emplace(v.end(), 10); // Works
//v.insert(v.end(), 10); // Error, not explicit
v.insert(v.end(), foo(10)); // Also works
}
诚然,这是一个非常具体的细节,但是当您处理用户定义的转换链时,有必要记住这一点。
在映射的特殊情况下,旧的选项只有两个:操作符[]和插入(不同风格的插入)。我将开始解释这些。
操作符[]是一个查找或添加操作符。它将尝试在映射中找到具有给定键的元素,如果它存在,它将返回对存储值的引用。如果没有,它将创建一个新元素插入到默认初始化的位置,并返回对该元素的引用。
insert函数(在单元素类型中)接受一个value_type (std::pair<const Key,Value>),它使用键(第一个成员)并尝试插入它。因为std::map不允许重复,如果有一个现有的元素,它将不会插入任何东西。
两者之间的第一个区别是操作符[]需要能够构造一个默认初始化值,因此它不适用于不能默认初始化的值类型。两者之间的第二个区别是,当已经有一个具有给定键的元素时会发生什么。insert函数不会修改映射的状态,而是返回一个指向元素的迭代器(并返回一个false,表示它没有被插入)。
// assume m is std::map<int,int> already has an element with key 5 and value 0
m[5] = 10; // postcondition: m[5] == 10
m.insert(std::make_pair(5,15)); // m[5] is still 10
在insert的情况下,实参是一个value_type对象,可以通过不同的方式创建。你可以直接用适当的类型构造它,或者传递任何可以构造value_type的对象,这就是std::make_pair发挥作用的地方,因为它允许简单地创建std::pair对象,尽管它可能不是你想要的…
以下调用的净效果是类似的:
K t; V u;
std::map<K,V> m; // std::map<K,V>::value_type is std::pair<const K,V>
m.insert( std::pair<const K,V>(t,u) ); // 1
m.insert( std::map<K,V>::value_type(t,u) ); // 2
m.insert( std::make_pair(t,u) ); // 3
但它们并不是完全一样的……[1]和[2]是等价的。在这两种情况下,代码都会创建一个相同类型的临时对象(std::pair<const K,V>)并将其传递给insert函数。insert函数将在二叉搜索树中创建适当的节点,然后将参数中的value_type部分复制到该节点。使用value_type的好处是,value_type总是匹配value_type,你不能把std::pair参数的类型输入错误!
差值是[3]。函数std::make_pair是一个模板函数,它将创建一个std::pair。签名为:
template <typename T, typename U>
std::pair<T,U> make_pair(T const & t, U const & u );
I have intentionally not provided the template arguments to std::make_pair, as that is the common usage. And the implication is that the template arguments are deduced from the call, in this case to be T==K,U==V, so the call to std::make_pair will return a std::pair<K,V> (note the missing const). The signature requires value_type that is close but not the same as the returned value from the call to std::make_pair. Because it is close enough it will create a temporary of the correct type and copy initialize it. That will in turn be copied to the node, creating a total of two copies.
这可以通过提供模板参数来修复:
m.insert( std::make_pair<const K,V>(t,u) ); // 4
但这仍然很容易出错,就像显式地键入情况[1]中的类型一样。
到目前为止,我们有不同的调用insert的方法,需要在外部创建value_type并将该对象复制到容器中。或者,如果类型是默认可构造和可赋值的(故意只关注m[k]=v),则可以使用operator[],并且它需要一个对象的默认初始化,并将值复制到该对象中。
在c++ 11中,有了可变模板和完美转发,就有了一种通过放置(在适当位置创建)的方式向容器中添加元素的新方法。不同容器中的emplace函数做的基本相同的事情:该函数不是获取要复制到容器中的源,而是接受将转发到容器中存储的对象的构造函数的参数。
m.emplace(t,u); // 5
在[5]中,std::pair<const K, V>没有被创建并传递给emplace,而是将对t和u对象的引用传递给emplace,并将它们转发给数据结构内部value_type子对象的构造函数。在这种情况下,std::pair<const K,V>根本不会被复制,这是emplace相对于c++ 03的优势。与插入的情况一样,它不会覆盖映射中的值。
一个有趣的问题是,我没有想过如何在地图上实现放置,这在一般情况下并不是一个简单的问题。
假设你正在将Foo对象添加到set<Foo>对象中,Foo(int)是一个构造函数,并且Foo具有复制/移动构造函数。那么主要的“大局”区别在于:
emplace(0) - which calls set::emplace(int && my_args) - forwards the given arguments (i.e. the int 0) to a Foo constructor somewhere in the definition of the set::emplace method (e.g. there is a call like Foo(0) somewhere in that method's code). Importantly: NO Foo copy or move constructor is called. insert(0) - which calls set::insert(Foo && value) - first (1) creates the Foo object Foo(0) (because 0 has type int but insert needs an object of type Foo as input) that is used as the method's argument (i.e. as value). And then (2) this Foo object (in value) is used as an argument to a Foo (copy or move) constructor somewhere in the definition of the set::insert method.
下面的代码通过跟踪每个构造函数调用并在它们发生时告诉您有关它们的信息,展示了insert()与emplace()的不同之处的“大局思想”。将输出与代码进行比较可以明确insert()和emplace()之间的区别。
代码很简单,但有点长,所以为了节省时间,我建议您先阅读摘要,然后快速浏览一下代码(这应该足以理解代码及其输出)。
代码摘要:
The Foo class: uses static int foo_counter to keep track of the total number of Foo objects that have been constructed (or moved, copied, etc.) thus far. Each Foo object stores the (unique) value of foo_counter at the time of its creation in its local variable val. The unique object with val == 8 is called "foo8" or "Foo 8" (ditto for 1, 2, etc.). Every constructor/destructor call prints info about the call (e.g. calling Foo(11) will output "Foo(int) with val: 11"). main()'s body: insert()s and emplace()s Foo objects into an unordered_map<Foo,int> object umap with calls such as "umap.emplace(Foo(11), 0);" and "umap.insert({12, 0})" (0 is just some arbitrary int, it can be any value). Every line of code is printed to cout before it is executed.
Code
#include <iostream>
#include <unordered_map>
#include <utility>
using namespace std;
//Foo simply outputs what constructor is called with what value.
struct Foo {
static int foo_counter; //Track how many Foo objects have been created.
int val; //This Foo object was the val-th Foo object to be created.
Foo() { val = foo_counter++;
cout << "Foo() with val: " << val << '\n';
}
Foo(int value) : val(value) { foo_counter++;
cout << "Foo(int) with val: " << val << '\n';
}
Foo(Foo& f2) { val = foo_counter++;
cout << "Foo(Foo &) with val: " << val
<< " \tcreated from: \t" << f2.val << '\n';
}
Foo(const Foo& f2) { val = foo_counter++;
cout << "Foo(const Foo &) with val: " << val
<< " \tcreated from: \t" << f2.val << '\n';
}
Foo(Foo&& f2) { val = foo_counter++;
cout << "Foo(Foo&&) moving: " << f2.val
<< " \tand changing it to:\t" << val << '\n';
}
~Foo() { cout << "~Foo() destroying: " << val << '\n'; }
Foo& operator=(const Foo& rhs) {
cout << "Foo& operator=(const Foo& rhs) with rhs.val: " << rhs.val
<< " \tcalled with lhs.val = \t" << val
<< " \tChanging lhs.val to: \t" << rhs.val << '\n';
val = rhs.val; return *this;
}
bool operator==(const Foo &rhs) const { return val == rhs.val; }
bool operator<(const Foo &rhs) const { return val < rhs.val; }
};
int Foo::foo_counter = 0;
//Create a hash function for Foo in order to use Foo with unordered_map
template<> struct std::hash<Foo> {
size_t operator()(const Foo &f) const { return hash<int>{}(f.val); }
};
int main() {
unordered_map<Foo, int> umap;
int d; //Some int that will be umap's value. It is not important.
//Print the statement to be executed and then execute it.
cout << "\nFoo foo0, foo1, foo2, foo3;\n";
Foo foo0, foo1, foo2, foo3;
cout << "\numap.insert(pair<Foo, int>(foo0, d))\n";
umap.insert(pair<Foo, int>(foo0, d));
//Side note: equivalent to: umap.insert(make_pair(foo0, d));
cout << "\numap.insert(move(pair<Foo, int>(foo1, d)))\n";
umap.insert(move(pair<Foo, int>(foo1, d)));
//Side note: equiv. to: umap.insert(make_pair(foo1, d));
cout << "\npair<Foo, int> pair(foo2, d)\n";
pair<Foo, int> pair(foo2, d);
cout << "\numap.insert(pair)\n";
umap.insert(pair);
cout << "\numap.emplace(foo3, d)\n";
umap.emplace(foo3, d);
cout << "\numap.emplace(11, d)\n";
umap.emplace(11, d);
cout << "\numap.insert({12, d})\n";
umap.insert({12, d});
cout.flush();
}
输出
Foo foo0, foo1, foo2, foo3;
Foo() with val: 0
Foo() with val: 1
Foo() with val: 2
Foo() with val: 3
umap.insert(pair<Foo, int>(foo0, d))
Foo(Foo &) with val: 4 created from: 0
Foo(Foo&&) moving: 4 and changing it to: 5
~Foo() destroying: 4
umap.insert(move(pair<Foo, int>(foo1, d)))
Foo(Foo &) with val: 6 created from: 1
Foo(Foo&&) moving: 6 and changing it to: 7
~Foo() destroying: 6
pair<Foo, int> pair(foo2, d)
Foo(Foo &) with val: 8 created from: 2
umap.insert(pair)
Foo(const Foo &) with val: 9 created from: 8
umap.emplace(foo3, d)
Foo(Foo &) with val: 10 created from: 3
umap.emplace(11, d)
Foo(int) with val: 11
umap.insert({12, d})
Foo(int) with val: 12
Foo(const Foo &) with val: 13 created from: 12
~Foo() destroying: 12
~Foo() destroying: 8
~Foo() destroying: 3
~Foo() destroying: 2
~Foo() destroying: 1
~Foo() destroying: 0
~Foo() destroying: 13
~Foo() destroying: 11
~Foo() destroying: 5
~Foo() destroying: 10
~Foo() destroying: 7
~Foo() destroying: 9
大局
insert()和emplace()的主要区别是:
Whereas using insert() almost† always requires the construction or pre-existence of some Foo object in main()'s scope (followed by a copy or move), if using emplace() then any call to a Foo constructor is done entirely internally in the unordered_map (i.e. inside the scope of the emplace() method's definition). The argument(s) for the key that you pass to emplace() are directly forwarded to a Foo constructor call within unordered_map::emplace()'s definition (optional additional details: where this newly constructed object is immediately incorporated into one of unordered_map's member variables so that no destructor is called when execution leaves emplace() and no move or copy constructors are called).
†上面的“almost always”中出现“almost”的原因是因为insert()的一次重载实际上相当于emplace()。如cppreference.com页面所述,重载模板<class P> pair<iterator, bool> insert(p&&value)(该页上insert()的重载(2))等效于emplace(forward<P>(value))。由于我们对差异感兴趣,我将忽略这个过载,不再提及这个特殊的技术细节。
逐步执行代码
现在我将详细介绍代码及其输出。
首先,请注意,unordered_map总是在内部将Foo对象(而不是Foo *s)存储为键,当unordered_map被销毁时,这些键都将被销毁。在这里,unordered_map的内部键是foos 13、11、5、10、7和9。
So technically, our unordered_map actually stores pair<const Foo, int> objects, which in turn store the Foo objects. But to understand the "big picture idea" of how emplace() differs from insert() (see highlighted box above), it's okay to temporarily imagine this pair object as being entirely passive. Once you understand this "big picture idea," it's important to then back up and understand how the use of this intermediary pair object by unordered_map introduces subtle, but important, technicalities.
insert()ing each of foo0, foo1, and foo2 required 2 calls to one of Foo's copy/move constructors and 2 calls to Foo's destructor (as I now describe): insert()ing each of foo0 and foo1 created a temporary object (foo4 and foo6, respectively) whose destructor was then immediately called after the insertion completed. In addition, the unordered_map's internal Foos (which are foos 5 and 7) also had their destructors called when the unordered_map was destroyed once execution reached the end of main(). To insert() foo2, we instead first explicitly created a non-temporary pair object (called pair), which called Foo's copy constructor on foo2 (creating foo8 as an internal member of pair). We then insert()ed this pair, which resulted in unordered_map calling the copy constructor again (on foo8) to create its own internal copy (foo9). As with foos 0 and 1, the end result was two destructor calls for this insert()ion with the only difference being that foo8's destructor was called only when we reached the end of main() rather than being called immediately after insert() finished. emplace()ing foo3 resulted in only 1 copy/move constructor call (creating foo10 internally in the unordered_map ) and only 1 call to Foo's destructor. The reason why calling umap.emplace(foo3, d) called Foo's non-const copy constructor is the following: Since we're using emplace(), the compiler knows that foo3 (a non-const Foo object) is meant to be an argument to some Foo constructor. In this case, the most fitting Foo constructor is the non-const copy constructor Foo(Foo& f2). This is why umap.emplace(foo3, d) called a copy constructor while umap.emplace(11, d) did not. For foo11, we directly passed the integer 11 to emplace(11, d) so that unordered_map would call the Foo(int) constructor while execution is within its emplace() method. Unlike in (2) and (3), we didn't even need some pre-exiting foo object to do this. Importantly, notice that only 1 call to a Foo constructor occurred (which created foo11). We then directly passed the integer 12 to insert({12, d}). Unlike with emplace(11, d) (which recall resulted in only 1 call to a Foo constructor), this call to insert({12, d}) resulted in two calls to Foo's constructor (creating foo12 and foo13).
后记
从这里往哪里走?
a.玩转上面的源代码,并研究网上找到的insert()(例如这里)和emplace()(例如这里)的文档。如果您正在使用eclipse或NetBeans等IDE,那么您可以很容易地让IDE告诉您正在调用insert()或emplace()的哪个重载(在eclipse中,只需将鼠标光标稳定地停留在函数调用上一秒钟)。下面是一些可以尝试的代码:
cout << "\numap.insert({{" << Foo::foo_counter << ", d}})\n";
umap.insert({{Foo::foo_counter, d}});
//but umap.emplace({{Foo::foo_counter, d}}); results in a compile error!
cout << "\numap.insert(pair<const Foo, int>({" << Foo::foo_counter << ", d}))\n";
umap.insert(pair<const Foo, int>({Foo::foo_counter, d}));
//The above uses Foo(int) and then Foo(const Foo &), as expected. but the
// below call uses Foo(int) and the move constructor Foo(Foo&&).
//Do you see why?
cout << "\numap.insert(pair<Foo, int>({" << Foo::foo_counter << ", d}))\n";
umap.insert(pair<Foo, int>({Foo::foo_counter, d}));
//Not only that, but even more interesting is how the call below uses all
// three of Foo(int) and the Foo(Foo&&) move and Foo(const Foo &) copy
// constructors, despite the below call's only difference from the call above
// being the additional { }.
cout << "\numap.insert({pair<Foo, int>({" << Foo::foo_counter << ", d})})\n";
umap.insert({pair<Foo, int>({Foo::foo_counter, d})});
//Pay close attention to the subtle difference in the effects of the next
// two calls.
int cur_foo_counter = Foo::foo_counter;
cout << "\numap.insert({{cur_foo_counter, d}, {cur_foo_counter+1, d}}) where "
<< "cur_foo_counter = " << cur_foo_counter << "\n";
umap.insert({{cur_foo_counter, d}, {cur_foo_counter+1, d}});
cout << "\numap.insert({{Foo::foo_counter, d}, {Foo::foo_counter+1, d}}) where "
<< "Foo::foo_counter = " << Foo::foo_counter << "\n";
umap.insert({{Foo::foo_counter, d}, {Foo::foo_counter+1, d}});
//umap.insert(initializer_list<pair<Foo, int>>({{Foo::foo_counter, d}}));
//The call below works fine, but the commented out line above gives a
// compiler error. It's instructive to find out why. The two calls
// differ by a "const".
cout << "\numap.insert(initializer_list<pair<const Foo, int>>({{" << Foo::foo_counter << ", d}}))\n";
umap.insert(initializer_list<pair<const Foo, int>>({{Foo::foo_counter, d}}));
您很快就会看到unordered_map最终使用了对构造函数的哪个重载(参见参考资料),这对复制、移动、创建和/或销毁对象的数量以及发生这些情况的时间有重要影响。
b.看看当你使用其他容器类(例如set或unordered_multiset)而不是unordered_map时会发生什么。
c.现在使用Goo对象(只是一个重命名的Foo副本)而不是int作为unordered_map中的范围类型(即使用unordered_map<Foo, Goo>而不是unordered_map<Foo, int>),并查看调用了多少Goo构造函数以及调用了哪些Goo构造函数。(剧透:确实有影响,但不是很戏剧化。)
就功能或输出而言,它们都是相同的。
对于这两个大内存,对象放置是内存优化,不使用复制构造函数
简单详细的解释 https://medium.com/@sandywits/all-about-emplace-in-c-71fd15e06e44
还有一个没有在其他答案中讨论的附加问题,它适用于std::map以及std::unordered_map, std::set和std::unordered_set:
Insert使用键对象,这意味着如果键已经存在于容器中,则不需要分配节点。 Emplace需要首先构造键,这通常需要在每次调用它时分配一个节点。
从这个角度来看,如果键已经存在于容器中,那么放置可能比插入效率低。(这对于多线程应用程序来说可能很重要,例如在线程本地字典中,需要同步分配。)
Live demo: https://godbolt.org/z/ornYcTqW9. Note that with libstdc++, emplace allocates 10 times, while insert only once. With libc++, there is only one allocation with emplace as well; it seems there is some optimization that copies/moves keys*. I got the same outcome with Microsoft STL, so it actually seems that there is some missing optimization in libstdc++. However, the whole problem may not be related only to standard containers. For instance, concurrent_unordered_map from Intel/oneAPI TBB behaves the same as libstdc++ in this regard.
*注意,此优化不能应用于密钥既不可复制又不可移动的情况。在这个现场演示中,即使使用了emplace和libc++: https://godbolt.org/z/1b6b331qf,我们也有10个分配。(当然,对于不可复制和不可移动的键,我们不能使用insert或try_emplace,因此没有其他选项。)