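I often run into the case where I want to evaluate a query right where I declare it. This is usually because I need to iterate over it multiple times and it is expensive to compute. For example: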

string raw = "...";
var lines = (from l in raw.Split('\n')
             let ll = l.Trim()
             where !string.IsNullOrEmpty(ll)
             select ll).ToList();

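This works fine. But if I am not going to modify the result, then I might as well call ToArray() instead of ToList().

I wonder, however, whether ToArray() is implemented by first calling ToList() and is therefore less memory efficient than just calling ToList().

Am I crazy? Should I just call ToArray(), safe and secure in the knowledge that the memory won't be allocated twice?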


Current answer

A very late answer, but I think it will be helpful for people arriving from Google.

They both suck when they are created using LINQ. They both run the same kind of code to resize a buffer when necessary. ToArray internally uses a helper class to convert the IEnumerable<> to an array, starting with an array of 4 elements. If that is not enough, it doubles the size by allocating a new array twice as large as the current one and copying the current contents into it. At the end it allocates a new array with exactly as many elements as your query produced. If your query returns 129 elements, the buffer grows through 4, 8, 16, 32, 64, 128 and 256 elements, which means 6 reallocate-and-copy steps to reach a 256-element array, and then another array of 129 elements is allocated to return. So much for memory efficiency.

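ToList does the same thing, but it skips the final allocation, since you may still add items in the future. List does not care whether it was created from a LINQ query or manually.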
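The growth pattern described above can be counted with a small model. This is only a sketch: `SimulateGrowth` is a hypothetical helper written for this illustration, it assumes a starting capacity of 4 and pure doubling as the answer describes, and it does not call into the real LINQ implementation (which differs between runtime versions). It uses C# 9 top-level statements.

```csharp
using System;

// Simplified model of a grow-by-doubling buffer (assumption: start at 4, double each time).
// It only counts buffer allocations; it is NOT the actual LINQ code.
static (int Allocations, int Capacity) SimulateGrowth(int itemCount, int initialCapacity = 4)
{
    if (itemCount == 0) return (0, 0);

    int capacity = initialCapacity;
    int allocations = 1;            // the initial buffer
    while (capacity < itemCount)
    {
        capacity *= 2;              // allocate a buffer twice as large, copy the old one into it
        allocations++;
    }
    return (allocations, capacity);
}

var (allocs, cap) = SimulateGrowth(129);

// 7 buffers in total: the initial one plus 6 reallocate-and-copy steps.
Console.WriteLine($"buffers allocated: {allocs}, final capacity: {cap}");
// prints "buffers allocated: 7, final capacity: 256"
```

Under this model ToArray would then allocate one more array of exactly 129 elements, while ToList would keep the 256-slot buffer as its backing store.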

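List is better on memory but worse on CPU, since List is a general-purpose solution: every operation requires a range check in addition to .NET's internal range check for arrays.

So if you are going to iterate over your result set many times, an array is a good choice, since it means fewer range checks than a list, and compilers generally optimize arrays for sequential access.

List's initial allocation can be better if you specify the capacity parameter when you create it. In that case it allocates the array only once, assuming you know the result size. LINQ's ToList has no overload that accepts a capacity, so we have to write our own extension method that creates a list with the given capacity and then uses List<>.AddRange.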
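A minimal sketch of such an extension method follows. The name `ToListWithCapacity` and the class `EnumerableCapacityExtensions` are made up for this example; they are not part of LINQ. The sketch assumes the caller knows (or can estimate) the result size. If the estimate is too low, the list simply grows as usual.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The Where filter hides the length from LINQ, but we know it yields 500 items.
var source = Enumerable.Range(0, 1000).Where(i => i % 2 == 0);

List<int> evens = source.ToListWithCapacity(500);
Console.WriteLine($"Count = {evens.Count}, Capacity = {evens.Capacity}");
// prints "Count = 500, Capacity = 500"

public static class EnumerableCapacityExtensions
{
    // Hypothetical helper: pre-sizes the backing array, then fills it via AddRange.
    public static List<T> ToListWithCapacity<T>(this IEnumerable<T> source, int capacity)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        var list = new List<T>(capacity); // single allocation if capacity is large enough
        list.AddRange(source);
        return list;
    }
}
```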

To round off this answer, I have to add the following:

In the end, you can use either ToArray or ToList; the performance will not be very different (see the answer of @EMP). You are using C#. If you need performance, do not worry about writing high-performance code, but worry about not writing badly performing code. Always target x64 for high-performance code. AFAIK, the x64 JIT is based on the C++ compiler and does some funny things like tail recursion optimizations. With 4.5 you can also enjoy profile-guided optimization and multi-core JIT. Finally, you can use the async/await pattern to process it quicker.

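Other answers

ToListAsync<T>() is preferred.

In Entity Framework 6 both methods eventually call the same internal method, but ToArrayAsync<T>() calls list.ToArray() at the end, which is implemented as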

T[] array = new T[_size];
Array.Copy(_items, 0, array, 0, _size);
return array;

So ToArrayAsync<T>() has some overhead, and ToListAsync<T>() is therefore preferred.
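The extra allocation and copy in List<T>.ToArray() can be observed without Entity Framework. The following is a small illustrative sketch (variable names are arbitrary): the returned array is a separate copy, so writing to it leaves the list untouched.

```csharp
using System;
using System.Collections.Generic;

var list = new List<int> { 1, 2, 3 };

// ToArray allocates a new array of exactly list.Count elements and copies into it.
int[] copy = list.ToArray();
copy[0] = 99;

Console.WriteLine(list[0]); // prints 1: the list's own backing array was not modified
Console.WriteLine(copy[0]); // prints 99
```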

For anyone interested in using this result in another LINQ to SQL query, such as

from q in context.MyTable
where myListOrArray.Contains(q.someID)
select q;

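then the generated SQL is the same whether you use a List or an Array for myListOrArray. Now I know some may ask why you would enumerate before this statement at all, but there is a difference between the SQL generated from an IQueryable and from a List or Array.

(seven years later...)

A couple of other (good) answers have concentrated on the microscopic performance differences that will occur.

This post is just a supplement to mention the semantic difference that exists between the IEnumerator<T> produced by an array (T[]) and the one returned by a List<T>.

This is best illustrated with an example: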

IList<int> source = Enumerable.Range(1, 10).ToArray();  // try changing to .ToList()

foreach (var x in source)
{
  if (x == 5)
    source[8] *= 100;
  Console.WriteLine(x);
}

The above code will run without any exception and produces the output:

1
2
3
4
5
6
7
8
900
10

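This shows that the IEnumerator<int> returned by an int[] does not keep track of whether the array has been modified since the enumerator was created.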

Note that I declared the local variable source as an IList<int>. In that way I make sure the C# compiler does not optimize the foreach statement into something equivalent to a for (var idx = 0; idx < source.Length; idx++) { /* ... */ } loop. This is something the C# compiler might do if I used var source = ...; instead. In my current version of the .NET Framework the actual enumerator used here is the non-public reference type System.SZArrayHelper+SZGenericArrayEnumerator`1[System.Int32], but of course this is an implementation detail.

Now, if I change .ToArray() to .ToList(), I get only:

1
2
3
4
5

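followed by a System.InvalidOperationException blow-up saying:

Collection was modified; enumeration operation may not execute.

The underlying enumerator in this case is the public mutable value type System.Collections.Generic.List`1+Enumerator[System.Int32] (boxed inside an IEnumerator<int> in this case, because I use IList<int>).

In conclusion, the enumerator produced by a List<T> keeps track of whether the list changes during enumeration, while the enumerator produced by T[] does not. So consider this difference when choosing between .ToList() and .ToArray().

People often add one extra .ToArray() or .ToList() to work around a collection that keeps track of whether it was modified during the lifetime of an enumerator.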
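That workaround can be sketched as follows. This is a minimal illustration with made-up data: the loop enumerates a snapshot created by ToArray(), so removing items from the original list does not invalidate the enumerator being used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 1, 2, 3, 4, 5, 6 };

// Enumerating 'numbers' directly while removing from it would throw
// InvalidOperationException. The snapshot is a separate array, so the
// original list can be modified freely inside the loop.
foreach (int n in numbers.ToArray())
{
    if (n % 2 == 0)
        numbers.Remove(n);
}

Console.WriteLine(string.Join(", ", numbers)); // prints "1, 3, 5"
```

The price is one extra allocation and copy of the whole collection, so this is a convenience rather than a performance technique.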

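(If anybody wants to know how List<> keeps track of whether the collection was modified, there is a private field _version in this class which is changed every time the List<> is updated. It would actually be possible to change this behavior of List<> by simply removing the line that increments _version in the set accessor of the indexer public T this[int index], just like what was done inside Dictionary<,> recently, as described in another answer.)

ToList() is usually preferred if you use it on an IEnumerable<T> (from an ORM, for instance). If the length of the sequence is not known at the beginning, ToArray() creates a dynamic-length collection like List and then converts it to an array, which takes extra time.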

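Edit 2: (this is a correction of the original answer)

Using BenchmarkDotNet, we can confirm through performance measurements that the accepted answer is actually correct: ToList is faster in the general case, because it does not need to trim the empty space from the allocated buffer. ToArray may perform an extra allocation and copy operation so that the buffer is sized exactly to the number of elements.

To confirm this, the following benchmark was used.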

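using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
[ShortRunJob]
public class Benchmarks
{
    [Params(0, 1, 6, 10, 42, 100, 1337, 10000)]
    public int Count { get; set; }

    // The Where filter hides the sequence length, so neither method can pre-size its buffer.
    public IEnumerable<int> Items => Enumerable.Range(0, Count).Where(i => i > 0);

    [Benchmark(Baseline = true)]
    public int[] ToArray() => Items.ToArray();

    [Benchmark]
    public List<int> ToList() => Items.ToList();
}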

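The results confirm that ToList is about 10% - 15% faster for the mid-sized inputs (6, 10 and 42 elements). For the other counts the two methods are within about 6% of each other, and ToList generally allocates as much or more memory, noticeably so at 1337 and 10000 elements.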

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
Intel Core i9-10885H CPU 2.40GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=6.0.302
  [Host]     : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT
  DefaultJob : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT


|  Method | Count |         Mean |      Error |     StdDev | Ratio | RatioSD |   Gen 0 |  Gen 1 | Allocated |
|-------- |------ |-------------:|-----------:|-----------:|------:|--------:|--------:|-------:|----------:|
| ToArray |     0 |     29.73 ns |   0.546 ns |   0.536 ns |  1.00 |    0.00 |  0.0067 |      - |      56 B |
|  ToList |     0 |     31.51 ns |   0.485 ns |   0.405 ns |  1.06 |    0.02 |  0.0105 |      - |      88 B |
|         |       |              |            |            |       |         |         |        |           |
| ToArray |     1 |     37.36 ns |   0.314 ns |   0.294 ns |  1.00 |    0.00 |  0.0114 |      - |      96 B |
|  ToList |     1 |     36.75 ns |   0.605 ns |   0.537 ns |  0.98 |    0.01 |  0.0153 |      - |     128 B |
|         |       |              |            |            |       |         |         |        |           |
| ToArray |     6 |    100.05 ns |   1.522 ns |   1.349 ns |  1.00 |    0.00 |  0.0286 |      - |     240 B |
|  ToList |     6 |     85.16 ns |   0.808 ns |   0.756 ns |  0.85 |    0.01 |  0.0267 |      - |     224 B |
|         |       |              |            |            |       |         |         |        |           |
| ToArray |    10 |    137.20 ns |   2.766 ns |   2.452 ns |  1.00 |    0.00 |  0.0372 |      - |     312 B |
|  ToList |    10 |    123.05 ns |   2.198 ns |   1.949 ns |  0.90 |    0.01 |  0.0372 |      - |     312 B |
|         |       |              |            |            |       |         |         |        |           |
| ToArray |    42 |    398.25 ns |   6.583 ns |   5.836 ns |  1.00 |    0.00 |  0.0877 |      - |     736 B |
|  ToList |    42 |    352.04 ns |   4.976 ns |   4.411 ns |  0.88 |    0.02 |  0.0887 |      - |     744 B |
|         |       |              |            |            |       |         |         |        |           |
| ToArray |   100 |    730.80 ns |   6.501 ns |   6.081 ns |  1.00 |    0.00 |  0.1488 |      - |   1,248 B |
|  ToList |   100 |    705.49 ns |   9.947 ns |   9.305 ns |  0.97 |    0.01 |  0.1526 |      - |   1,280 B |
|         |       |              |            |            |       |         |         |        |           |
| ToArray |  1337 |  8,023.57 ns | 147.388 ns | 137.867 ns |  1.00 |    0.00 |  1.6785 | 0.0458 |  14,056 B |
|  ToList |  1337 |  7,980.27 ns | 138.469 ns | 122.749 ns |  1.00 |    0.02 |  1.9989 | 0.1221 |  16,736 B |
|         |       |              |            |            |       |         |         |        |           |
| ToArray | 10000 | 57,037.19 ns | 510.492 ns | 452.538 ns |  1.00 |    0.00 | 12.6343 | 1.7700 | 106,280 B |
|  ToList | 10000 | 57,728.15 ns | 583.353 ns | 517.127 ns |  1.01 |    0.01 | 15.5640 | 5.1270 | 131,496 B |

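For reference, below is the original answer, which unfortunately benchmarks only a very specific case that avoids the intermediate resize and copy operations.

Original answer:

It's 2020 now and everyone is using .NET Core 3.1, so I decided to run some benchmarks with BenchmarkDotNet.

TL;DR: ToArray() is better performance-wise and conveys intent better if you are not planning to mutate the collection.

EDIT: as seen from the comments, these benchmarks may not be indicative, because Enumerable.Range(...) returns an IEnumerable<T> that carries information about the size of the sequence, which ToArray() subsequently uses in an optimization to pre-allocate an array of the correct size. Consider testing performance manually for your exact scenario.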
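One way to check whether a given sequence exposes its size cheaply is Enumerable.TryGetNonEnumeratedCount, which is available from .NET 6 onward (so not on the .NET Core 3.1 runtime used for the benchmark in this answer). The sketch below is only an illustration of the point made in the edit note. Whether a particular LINQ operator reports a count is an implementation detail of the runtime, so treat the expected behaviour described afterwards as an assumption to verify on your own target.

```csharp
using System;
using System.Linq;

// A plain Range knows its own length; adding a Where filter hides it.
var sized = Enumerable.Range(0, 100);
var filtered = Enumerable.Range(0, 100).Where(i => i > 0);

bool sizedKnown = sized.TryGetNonEnumeratedCount(out int sizedCount);
bool filteredKnown = filtered.TryGetNonEnumeratedCount(out _);

Console.WriteLine($"Range:         count known = {sizedKnown}, count = {sizedCount}");
Console.WriteLine($"Range + Where: count known = {filteredKnown}");
```

On .NET 6 the first call is expected to succeed with a count of 100 and the second to return false, which matches the observation that a benchmark over a bare Enumerable.Range(...) lets ToArray() skip the intermediate resizing.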


    [MemoryDiagnoser]
    public class Benchmarks
    {
        [Params(0, 1, 6, 10, 39, 100, 666, 1000, 1337, 10000)]
        public int Count { get; set; }
    
        public IEnumerable<int> Items => Enumerable.Range(0, Count);
    
        [Benchmark(Description = "ToArray()", Baseline = true)]
        public int[] ToArray() => Items.ToArray();
    
        [Benchmark(Description = "ToList()")]
        public List<int> ToList() => Items.ToList();
    
        public static void Main() => BenchmarkRunner.Run<Benchmarks>();
    }

The results are:


    BenchmarkDotNet=v0.12.0, OS=Windows 10.0.14393.3443 (1607/AnniversaryUpdate/Redstone1)
    Intel Core i5-4460 CPU 3.20GHz (Haswell), 1 CPU, 4 logical and 4 physical cores
    Frequency=3124994 Hz, Resolution=320.0006 ns, Timer=TSC
    .NET Core SDK=3.1.100
      [Host]     : .NET Core 3.1.0 (CoreCLR 4.700.19.56402, CoreFX 4.700.19.56404), X64 RyuJIT
      DefaultJob : .NET Core 3.1.0 (CoreCLR 4.700.19.56402, CoreFX 4.700.19.56404), X64 RyuJIT
    
    
    |    Method | Count |          Mean |       Error |      StdDev |        Median | Ratio | RatioSD |   Gen 0 | Gen 1 | Gen 2 | Allocated |
    |---------- |------ |--------------:|------------:|------------:|--------------:|------:|--------:|--------:|------:|------:|----------:|
    | ToArray() |     0 |      7.357 ns |   0.2096 ns |   0.1960 ns |      7.323 ns |  1.00 |    0.00 |       - |     - |     - |         - |
    |  ToList() |     0 |     13.174 ns |   0.2094 ns |   0.1958 ns |     13.084 ns |  1.79 |    0.05 |  0.0102 |     - |     - |      32 B |
    |           |       |               |             |             |               |       |         |         |       |       |           |
    | ToArray() |     1 |     23.917 ns |   0.4999 ns |   0.4676 ns |     23.954 ns |  1.00 |    0.00 |  0.0229 |     - |     - |      72 B |
    |  ToList() |     1 |     33.867 ns |   0.7350 ns |   0.6876 ns |     34.013 ns |  1.42 |    0.04 |  0.0331 |     - |     - |     104 B |
    |           |       |               |             |             |               |       |         |         |       |       |           |
    | ToArray() |     6 |     28.242 ns |   0.5071 ns |   0.4234 ns |     28.196 ns |  1.00 |    0.00 |  0.0280 |     - |     - |      88 B |
    |  ToList() |     6 |     43.516 ns |   0.9448 ns |   1.1949 ns |     42.896 ns |  1.56 |    0.06 |  0.0382 |     - |     - |     120 B |
    |           |       |               |             |             |               |       |         |         |       |       |           |
    | ToArray() |    10 |     31.636 ns |   0.5408 ns |   0.4516 ns |     31.657 ns |  1.00 |    0.00 |  0.0331 |     - |     - |     104 B |
    |  ToList() |    10 |     53.870 ns |   1.2988 ns |   2.2403 ns |     53.415 ns |  1.77 |    0.07 |  0.0433 |     - |     - |     136 B |
    |           |       |               |             |             |               |       |         |         |       |       |           |
    | ToArray() |    39 |     58.896 ns |   0.9441 ns |   0.8369 ns |     58.548 ns |  1.00 |    0.00 |  0.0713 |     - |     - |     224 B |
    |  ToList() |    39 |    138.054 ns |   2.8185 ns |   3.2458 ns |    138.937 ns |  2.35 |    0.08 |  0.0815 |     - |     - |     256 B |
    |           |       |               |             |             |               |       |         |         |       |       |           |
    | ToArray() |   100 |    119.167 ns |   1.6195 ns |   1.4357 ns |    119.120 ns |  1.00 |    0.00 |  0.1478 |     - |     - |     464 B |
    |  ToList() |   100 |    274.053 ns |   5.1073 ns |   4.7774 ns |    272.242 ns |  2.30 |    0.06 |  0.1578 |     - |     - |     496 B |
    |           |       |               |             |             |               |       |         |         |       |       |           |
    | ToArray() |   666 |    569.920 ns |  11.4496 ns |  11.2450 ns |    571.647 ns |  1.00 |    0.00 |  0.8688 |     - |     - |    2728 B |
    |  ToList() |   666 |  1,621.752 ns |  17.1176 ns |  16.0118 ns |  1,623.566 ns |  2.85 |    0.05 |  0.8793 |     - |     - |    2760 B |
    |           |       |               |             |             |               |       |         |         |       |       |           |
    | ToArray() |  1000 |    796.705 ns |  16.7091 ns |  19.8910 ns |    796.610 ns |  1.00 |    0.00 |  1.2951 |     - |     - |    4064 B |
    |  ToList() |  1000 |  2,453.110 ns |  48.1121 ns |  65.8563 ns |  2,460.190 ns |  3.09 |    0.10 |  1.3046 |     - |     - |    4096 B |
    |           |       |               |             |             |               |       |         |         |       |       |           |
    | ToArray() |  1337 |  1,057.983 ns |  20.9810 ns |  41.4145 ns |  1,041.028 ns |  1.00 |    0.00 |  1.7223 |     - |     - |    5416 B |
    |  ToList() |  1337 |  3,217.550 ns |  62.3777 ns |  61.2633 ns |  3,203.928 ns |  2.98 |    0.13 |  1.7357 |     - |     - |    5448 B |
    |           |       |               |             |             |               |       |         |         |       |       |           |
    | ToArray() | 10000 |  7,309.844 ns | 160.0343 ns | 141.8662 ns |  7,279.387 ns |  1.00 |    0.00 | 12.6572 |     - |     - |   40064 B |
    |  ToList() | 10000 | 23,858.032 ns | 389.6592 ns | 364.4874 ns | 23,759.001 ns |  3.26 |    0.08 | 12.6343 |     - |     - |   40096 B |
    
    // * Hints *
    Outliers
      Benchmarks.ToArray(): Default -> 2 outliers were removed (35.20 ns, 35.29 ns)
      Benchmarks.ToArray(): Default -> 2 outliers were removed (38.51 ns, 38.88 ns)
      Benchmarks.ToList(): Default  -> 1 outlier  was  removed (64.69 ns)
      Benchmarks.ToArray(): Default -> 1 outlier  was  removed (67.02 ns)
      Benchmarks.ToArray(): Default -> 1 outlier  was  removed (130.08 ns)
      Benchmarks.ToArray(): Default -> 1 outlier  was  detected (541.82 ns)
      Benchmarks.ToArray(): Default -> 1 outlier  was  removed (7.82 us)
    
    // * Legends *
      Count     : Value of the 'Count' parameter
      Mean      : Arithmetic mean of all measurements
      Error     : Half of 99.9% confidence interval
      StdDev    : Standard deviation of all measurements
      Median    : Value separating the higher half of all measurements (50th percentile)
      Ratio     : Mean of the ratio distribution ([Current]/[Baseline])
      RatioSD   : Standard deviation of the ratio distribution ([Current]/[Baseline])
      Gen 0     : GC Generation 0 collects per 1000 operations
      Gen 1     : GC Generation 1 collects per 1000 operations
      Gen 2     : GC Generation 2 collects per 1000 operations
      Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
      1 ns      : 1 Nanosecond (0.000000001 sec)