HashSet vs. List性能

很明显，泛型HashSet<T>类的搜索性能要高于泛型List<T>类。只需将基于哈希的键与List<T>类中的线性方法进行比较。

然而，计算哈希键本身可能需要一些CPU周期，因此对于少量的项，线性搜索可以成为HashSet<T>的真正替代方法。

我的问题是:盈亏平衡在哪里?

为了简化场景(公平起见)，让我们假设List<T>类使用元素的Equals()方法来标识一个项。

当前回答

比较两种表现不同的结构的性能本质上是没有意义的。使用传达意图的结构。即使你说List<T>不会有重复，迭代顺序也无关紧要，使其与HashSet<T>相当，但使用List<T>仍然是一个糟糕的选择，因为它的容错能力相对较低。

也就是说，我将检查性能的其他方面，

+------------+--------+-------------+-----------+----------+----------+-----------+
| Collection | Random | Containment | Insertion | Addition |  Removal | Memory    |
|            | access |             |           |          |          |           |
+------------+--------+-------------+-----------+----------+----------+-----------+
| List<T>    | O(1)   | O(n)        | O(n)      | O(1)*    | O(n)     | Lesser    |
| HashSet<T> | O(n)   | O(1)        | n/a       | O(1)     | O(1)     | Greater** |
+------------+--------+-------------+-----------+----------+----------+-----------+

尽管在这两种情况下加法都是O(1)，但在HashSet中它会相对较慢，因为它涉及到在存储哈希代码之前预计算哈希代码的成本。 HashSet优越的可伸缩性有内存成本。每个条目连同它的哈希代码一起存储为一个新对象。这篇文章可能会给你一个想法。

2014-05-30 07:45:00

其他回答

您可以使用HybridDictionary自动检测断点，并接受空值，使其本质上与HashSet相同。

2011-10-23 16:55:51

您没有考虑到的一个因素是GetHashcode()函数的健壮性。有了完美的哈希函数，HashSet显然会有更好的搜索性能。但是随着哈希函数的减少，HashSet搜索时间也会减少。

2008-09-29 22:56:17

这取决于你在哈希什么。如果你的键是整数，在HashSet更快之前，你可能不需要很多项。如果你在一个字符串上输入键，那么它会更慢，这取决于输入的字符串。

你肯定可以很容易地建立一个基准吗?

2008-09-29 21:31:29

很多人都说，一旦你达到了速度实际上是一个问题的大小，HashSet<T>将总是击败List<T>，但这取决于你在做什么。

假设你有一个List<T>，其中平均只有5个项目。在大量的循环中，如果每个循环添加或删除一个项目，那么使用List<T>可能会更好。

我在我的机器上做了一个测试，它必须非常非常小才能从List< t>中获得优势。对于一个短字符串列表，在大小为5之后，对于大小为20之后的对象，优势就消失了。

1 item LIST strs time: 617ms
1 item HASHSET strs time: 1332ms

2 item LIST strs time: 781ms
2 item HASHSET strs time: 1354ms

3 item LIST strs time: 950ms
3 item HASHSET strs time: 1405ms

4 item LIST strs time: 1126ms
4 item HASHSET strs time: 1441ms

5 item LIST strs time: 1370ms
5 item HASHSET strs time: 1452ms

6 item LIST strs time: 1481ms
6 item HASHSET strs time: 1418ms

7 item LIST strs time: 1581ms
7 item HASHSET strs time: 1464ms

8 item LIST strs time: 1726ms
8 item HASHSET strs time: 1398ms

9 item LIST strs time: 1901ms
9 item HASHSET strs time: 1433ms

1 item LIST objs time: 614ms
1 item HASHSET objs time: 1993ms

4 item LIST objs time: 837ms
4 item HASHSET objs time: 1914ms

7 item LIST objs time: 1070ms
7 item HASHSET objs time: 1900ms

10 item LIST objs time: 1267ms
10 item HASHSET objs time: 1904ms

13 item LIST objs time: 1494ms
13 item HASHSET objs time: 1893ms

16 item LIST objs time: 1695ms
16 item HASHSET objs time: 1879ms

19 item LIST objs time: 1902ms
19 item HASHSET objs time: 1950ms

22 item LIST objs time: 2136ms
22 item HASHSET objs time: 1893ms

25 item LIST objs time: 2357ms
25 item HASHSET objs time: 1826ms

28 item LIST objs time: 2555ms
28 item HASHSET objs time: 1865ms

31 item LIST objs time: 2755ms
31 item HASHSET objs time: 1963ms

34 item LIST objs time: 3025ms
34 item HASHSET objs time: 1874ms

37 item LIST objs time: 3195ms
37 item HASHSET objs time: 1958ms

40 item LIST objs time: 3401ms
40 item HASHSET objs time: 1855ms

43 item LIST objs time: 3618ms
43 item HASHSET objs time: 1869ms

46 item LIST objs time: 3883ms
46 item HASHSET objs time: 2046ms

49 item LIST objs time: 4218ms
49 item HASHSET objs time: 1873ms

下面是以图表形式显示的数据:

代码如下:

static void Main(string[] args)
{
    int times = 10000000;

    for (int listSize = 1; listSize < 10; listSize++)
    {
        List<string> list = new List<string>();
        HashSet<string> hashset = new HashSet<string>();

        for (int i = 0; i < listSize; i++)
        {
            list.Add("string" + i.ToString());
            hashset.Add("string" + i.ToString());
        }

        Stopwatch timer = new Stopwatch();
        timer.Start();
        for (int i = 0; i < times; i++)
        {
            list.Remove("string0");
            list.Add("string0");
        }
        timer.Stop();
        Console.WriteLine(listSize.ToString() + " item LIST strs time: " + timer.ElapsedMilliseconds.ToString() + "ms");

        timer = new Stopwatch();
        timer.Start();
        for (int i = 0; i < times; i++)
        {
            hashset.Remove("string0");
            hashset.Add("string0");
        }
        timer.Stop();
        Console.WriteLine(listSize.ToString() + " item HASHSET strs time: " + timer.ElapsedMilliseconds.ToString() + "ms");
        Console.WriteLine();
    }

    for (int listSize = 1; listSize < 50; listSize+=3)
    {
        List<object> list = new List<object>();
        HashSet<object> hashset = new HashSet<object>();

        for (int i = 0; i < listSize; i++)
        {
            list.Add(new object());
            hashset.Add(new object());
        }

        object objToAddRem = list[0];
        
        Stopwatch timer = new Stopwatch();
        timer.Start();
        for (int i = 0; i < times; i++)
        {
            list.Remove(objToAddRem);
            list.Add(objToAddRem);
        }
        timer.Stop();
        Console.WriteLine(listSize.ToString() + " item LIST objs time: " + timer.ElapsedMilliseconds.ToString() + "ms");

        timer = new Stopwatch();
        timer.Start();
        for (int i = 0; i < times; i++)
        {
            hashset.Remove(objToAddRem);
            hashset.Add(objToAddRem);
        }
        timer.Stop();
        Console.WriteLine(listSize.ToString() + " item HASHSET objs time: " + timer.ElapsedMilliseconds.ToString() + "ms");
        Console.WriteLine();
    }

    Console.ReadLine();
}

2012-05-26 01:35:31

You're looking at this wrong. Yes a linear search of a List will beat a HashSet for a small number of items. But the performance difference usually doesn't matter for collections that small. It's generally the large collections you have to worry about, and that's where you think in terms of Big-O. However, if you've measured a real bottleneck on HashSet performance, then you can try to create a hybrid List/HashSet, but you'll do that by conducting lots of empirical performance tests - not asking questions on SO.

2010-05-06 03:43:35

HashSet vs. List性能

推荐文章

最新文章

标签