Is it better to call ToList() or ToArray() in LINQ queries?

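I often run into the case where I want to evaluate a query right where I declare it. This is usually because I need to iterate over it multiple times and it is expensive to compute. For example: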

string raw = "...";
var lines = (from l in raw.Split('\n')
             let ll = l.Trim()
             where !string.IsNullOrEmpty(ll)
             select ll).ToList();

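This works fine. But if I am not going to modify the result, I might as well call ToArray() instead of ToList().

I wonder, however, whether ToArray() is implemented by first calling ToList(), which would make it less memory efficient than just calling ToList().

Am I crazy? Should I just call ToArray(), safe in the knowledge that the memory won't be allocated twice?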

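Current answer

Memory will always be allocated twice, or something close to that. Since an array cannot be resized, both methods use some mechanism to gather the data in a growing collection. (Well, the List is itself a growing collection.)

List uses an array as internal storage and doubles the capacity when needed. This means that on average 2/3 of the items have been reallocated at least once, half of those at least twice, half of those at least three times, and so on. Each item has therefore been reallocated about 1.3 times on average (2/3 × (1 + 1/2 + 1/4 + ...) = 4/3), which is not much overhead.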
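The growth behaviour can be observed from outside by watching List<T>.Capacity while adding items. The following is only a rough sketch: ListGrowthDemo and CountCopies are hypothetical names, and the exact growth policy is an implementation detail, so the ratio it prints depends on the runtime and on how close the item count is to a capacity boundary.

```csharp
using System;
using System.Collections.Generic;

public static class ListGrowthDemo
{
    // Hypothetical helper: adds `count` items to an empty List<int> and
    // returns how many element copies the internal reallocations caused.
    public static long CountCopies(int count)
    {
        var list = new List<int>();
        long copies = 0;
        int lastCapacity = list.Capacity;

        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                // The backing array was replaced, so every item that was
                // already in the list (all but the new one) was copied.
                copies += list.Count - 1;
                lastCapacity = list.Capacity;
            }
        }
        return copies;
    }

    public static void Main()
    {
        foreach (int n in new[] { 100, 1000, 10000 })
        {
            long copies = CountCopies(n);
            Console.WriteLine($"{n} items: {copies} copies, {(double)copies / n:F2} per item");
        }
    }
}
```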

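Remember also that if you are collecting strings, for example, the collection itself only contains references to the strings; the strings themselves are not reallocated.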

2009-07-09 19:57:49

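Other answers

One option is to add your own extension method that returns a read-only ICollection<T>. This can be better than using ToList or ToArray when you want neither the indexing properties of an array/list nor the ability to add to or remove from a list.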

public static class EnumerableExtension
{
    /// <summary>
    /// Causes immediate evaluation of the linq but only if required.
    /// As it returns a readonly ICollection, is better than using ToList or ToArray
    /// when you do not want to use the indexing properties of an IList, or add to the collection.
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="enumerable"></param>
    /// <returns>Readonly collection</returns>
    public static ICollection<T> Evaluate<T>(this IEnumerable<T> enumerable)
    {
        //if it's already a readonly collection, use it
        var collection = enumerable as ICollection<T>;
        if ((collection != null) && collection.IsReadOnly)
        {
            return collection;
        }
        //or make a new collection
        return enumerable.ToList().AsReadOnly();
    }
}

Unit tests:

[TestClass]
public sealed class EvaluateLinqTests
{
    [TestMethod]
    public void EvalTest()
    {
        var list = new List<int> {1, 2, 3};
        var linqResult = list.Select(i => i);
        var linqResultEvaluated = list.Select(i => i).Evaluate();
        list.Clear();
        Assert.AreEqual(0, linqResult.Count());
        //even though we have cleared the underlying list, the evaluated list does not change
        Assert.AreEqual(3, linqResultEvaluated.Count());
    }

    [TestMethod]
    public void DoesNotSaveCreatingListWhenHasListTest()
    {
        var list = new List<int> {1, 2, 3};
        var linqResultEvaluated = list.Evaluate();
        //list is not readonly, so we expect a new list
        Assert.AreNotSame(list, linqResultEvaluated);
    }

    [TestMethod]
    public void SavesCreatingListWhenHasReadonlyListTest()
    {
        var list = new List<int> {1, 2, 3}.AsReadOnly();
        var linqResultEvaluated = list.Evaluate();
        //list is readonly, so we don't expect a new list
        Assert.AreSame(list, linqResultEvaluated);
    }

    [TestMethod]
    public void SavesCreatingListWhenHasArrayTest()
    {
        var list = new[] {1, 2, 3};
        var linqResultEvaluated = list.Evaluate();
        //arrays are readonly (wrt ICollection<T> interface), so we don't expect a new object
        Assert.AreSame(list, linqResultEvaluated);
    }

    [TestMethod]
    [ExpectedException(typeof (NotSupportedException))]
    public void CantAddToResultTest()
    {
        var list = new List<int> {1, 2, 3};
        var linqResultEvaluated = list.Evaluate();
        Assert.AreNotSame(list, linqResultEvaluated);
        linqResultEvaluated.Add(4);
    }

    [TestMethod]
    [ExpectedException(typeof (NotSupportedException))]
    public void CantRemoveFromResultTest()
    {
        var list = new List<int> {1, 2, 3};
        var linqResultEvaluated = list.Evaluate();
        Assert.AreNotSame(list, linqResultEvaluated);
        linqResultEvaluated.Remove(1);
    }
}

2012-11-28 09:37:05

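(Seven years later...)

Several other (good) answers have concentrated on the microscopic performance differences that will occur.

This post is just a supplement to mention the semantic difference between the IEnumerator<T> produced by an array (T[]) and the one returned by a List<T>.

It is best illustrated by example: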

IList<int> source = Enumerable.Range(1, 10).ToArray();  // try changing to .ToList()

foreach (var x in source)
{
  if (x == 5)
    source[8] *= 100;
  Console.WriteLine(x);
}

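The above code runs without any exception and produces the output:

1
2
3
4
5
6
7
8
900
10

This shows that the IEnumerator<int> returned by an int[] does not keep track of whether the array has been modified since the enumerator was created.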

Note that I declared the local variable source as an IList<int>. That way I make sure the C# compiler does not optimize the foreach statement into something equivalent to a for (var idx = 0; idx < source.Length; idx++) { /* ... */ } loop. This is something the C# compiler might do if I used var source = ...; instead. In my current version of the .NET Framework the actual enumerator used here is a non-public reference type System.SZArrayHelper+SZGenericArrayEnumerator`1[System.Int32], but of course this is an implementation detail.

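Now, if I change .ToArray() into .ToList(), I get only:

1
2
3
4
5

followed by a System.InvalidOperationException saying:

Collection was modified; enumeration operation may not execute.

The underlying enumerator in this case is the public mutable value type System.Collections.Generic.List`1+Enumerator[System.Int32] (boxed inside an IEnumerator<int> box in this case because I use IList<int>).

In conclusion, the enumerator produced by a List<T> keeps track of whether the list changes during enumeration, while the enumerator produced by T[] does not. So consider this difference when choosing between .ToList() and .ToArray().

People often add one extra .ToArray() or .ToList() to work around a collection that keeps track of whether it was modified during the lifetime of an enumerator.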
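A minimal sketch of that workaround follows. SnapshotDemo and RemoveEvens are hypothetical names used only for illustration; the extra ToList() call creates a copy, so the loop enumerates the copy while Remove() changes the original list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotDemo
{
    // Hypothetical helper: removes even numbers from `numbers` while
    // enumerating a snapshot of it.
    public static List<int> RemoveEvens(List<int> numbers)
    {
        // Without the ToList() call, the first Remove() would make the
        // next loop step throw InvalidOperationException.
        foreach (int n in numbers.ToList())
        {
            if (n % 2 == 0)
                numbers.Remove(n);
        }
        return numbers;
    }

    public static void Main()
    {
        var result = RemoveEvens(new List<int> { 1, 2, 3, 4, 5, 6 });
        Console.WriteLine(string.Join(", ", result)); // prints "1, 3, 5"
    }
}
```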

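(If anybody wants to know how the List<> keeps track of whether the collection was modified, there is a private field _version in this class which is changed every time the List<> is updated. It would actually be possible to change this behavior of List<> by simply removing the line that increments _version in the set accessor of the indexer public T this[int index], just like was done inside Dictionary<,> recently, as described in another answer.)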
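The version-counter idea can be sketched with a small collection of its own. This is not the actual List<T> source: VersionedBuffer<T> and VersionDemo are hypothetical types, and the sketch only assumes the general scheme described above (a counter bumped on every write and compared by the enumerator).

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical fixed-size collection, for illustration only.
public sealed class VersionedBuffer<T> : IEnumerable<T>
{
    private readonly T[] _items;
    private int _version; // incremented on every write through the indexer

    public VersionedBuffer(params T[] items) { _items = (T[])items.Clone(); }

    public T this[int index]
    {
        get { return _items[index]; }
        set { _items[index] = value; _version++; }
    }

    public IEnumerator<T> GetEnumerator()
    {
        int snapshot = _version; // version seen when enumeration starts
        for (int i = 0; i < _items.Length; i++)
        {
            // A differing version means someone wrote to the buffer
            // after this enumeration began.
            if (snapshot != _version)
                throw new InvalidOperationException("Collection was modified.");
            yield return _items[i];
        }
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class VersionDemo
{
    // Returns true if writing during enumeration is detected.
    public static bool ThrowsWhenModified()
    {
        var buffer = new VersionedBuffer<int>(1, 2, 3);
        try
        {
            foreach (int x in buffer)
                buffer[0] = x * 10; // write during enumeration
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ThrowsWhenModified());
    }
}
```

Dropping the _version++ in the indexer's setter would make this buffer behave like the array case, where in-place writes go unnoticed by the enumerator.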

2016-12-20 16:03:00

A very late answer, but I think it will be helpful for Googlers.

Both perform poorly when created from a LINQ query. They both use the same code to resize a buffer when necessary. ToArray internally uses a class that converts an IEnumerable<> to an array by first allocating an array of 4 elements. If that is not enough, it doubles the size by creating a new array twice as large as the current one and copying the current array into it. At the end it allocates a new array sized to the exact count of your items. If your query returns 129 elements, ToArray will make 6 allocations and memory copy operations to create a 256-element array, and then another array of 129 elements to return. So much for memory efficiency.

ToList does the same thing, but it skips the last allocation, since you may add items in the future. List does not care whether it was created from a LINQ query or manually.
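The buffer strategy described in this answer can be sketched as follows. This is an illustration of the description, not the framework's code: BufferGrowthSketch and ToArrayByDoubling are hypothetical names, and the initial size of 4 is taken from the answer rather than from any documented guarantee.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BufferGrowthSketch
{
    // Hypothetical helper: copies `source` into an array by doubling a
    // buffer, and reports how many arrays were allocated along the way.
    public static T[] ToArrayByDoubling<T>(IEnumerable<T> source, out int allocations)
    {
        T[] buffer = new T[4]; // assumed initial size, per the answer
        allocations = 1;
        int count = 0;

        foreach (T item in source)
        {
            if (count == buffer.Length)
            {
                // Buffer is full: allocate one twice as large and copy.
                T[] bigger = new T[buffer.Length * 2];
                Array.Copy(buffer, bigger, count);
                buffer = bigger;
                allocations++;
            }
            buffer[count++] = item;
        }

        // Final step that a list can skip: trim to the exact item count.
        T[] result = new T[count];
        Array.Copy(buffer, result, count);
        allocations++;
        return result;
    }

    public static void Main()
    {
        int allocations;
        int[] result = ToArrayByDoubling(Enumerable.Range(1, 129), out allocations);
        Console.WriteLine($"{result.Length} items, {allocations} arrays allocated");
    }
}
```

For 129 items the sketch counts eight arrays: the initial 4-element buffer, the six doublings up to 256 that the answer mentions, and the final 129-element result.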

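List is better on memory but worse on CPU, since List is a general-purpose solution: every operation requires a range check in addition to the array range check that .NET performs internally.

So if you are going to iterate over your result set many times, an array is a good choice, because it means fewer range checks than a list, and compilers generally optimize sequential access to arrays.

List's initial allocation can be better if you specify the capacity parameter when you create it. In that case it allocates the array only once, assuming you know the result size. LINQ's ToList has no overload that accepts a capacity, so we have to write an extension method that creates a list with a given capacity and then uses List<>.AddRange.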
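Such an extension method might look like the sketch below. ToListWithCapacity is a hypothetical name (LINQ ships nothing like it), and the approach only pays off when the caller really knows the result size in advance.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableCapacityExtensions
{
    // Hypothetical helper: materializes `source` into a list whose
    // backing array is pre-sized to `capacity`.
    public static List<T> ToListWithCapacity<T>(this IEnumerable<T> source, int capacity)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        var list = new List<T>(capacity); // single up-front allocation
        list.AddRange(source);            // grows only if capacity was too small
        return list;
    }
}

public static class CapacityDemo
{
    public static void Main()
    {
        // The Where() hides the count, so ToList() alone could not pre-size.
        var items = Enumerable.Range(1, 129).Where(x => x > 0).ToListWithCapacity(129);
        Console.WriteLine($"Count = {items.Count}, Capacity = {items.Capacity}");
    }
}
```

If the guess is too low the list simply grows as usual, so a wrong capacity costs performance but not correctness.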

To complete this answer, I have to add the following:

In the end, you can use either ToArray or ToList; performance will not be very different (see the answer of @EMP). You are using C#. If you need performance, do not worry about writing high-performance code; worry about not writing badly performing code. Always target x64 for high-performance code. AFAIK, the x64 JIT is based on the C++ compiler and does some funny things like tail recursion optimizations. With .NET 4.5 you can also enjoy profile-guided optimization and multi-core JIT. Finally, you can use the async/await pattern to process it more quickly.

2013-10-08 15:11:24

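I know this is an old post, but after having the same question and doing some research, I found something interesting that might be worth sharing.

First, I agree with @mquander and his answer. Performance-wise, the two are identical.

However, I have been using Reflector to look at the methods of the System.Linq.Enumerable extension class, and I noticed a very common optimization. Whenever possible, the IEnumerable<T> source is cast to IList<T> or ICollection<T> to optimize the method. For example, look at ElementAt(int).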
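The pattern can be sketched roughly like this. It is a hand-written illustration of the idea, not the decompiled framework code; FastPathSketch and ElementAtSketch are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FastPathSketch
{
    // Hypothetical re-implementation of the idea behind ElementAt(int).
    public static T ElementAtSketch<T>(IEnumerable<T> source, int index)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (index < 0) throw new ArgumentOutOfRangeException(nameof(index));

        // Fast path: lists and arrays expose an indexer, so no enumeration.
        var list = source as IList<T>;
        if (list != null)
            return list[index];

        // Slow path: walk the sequence until the requested position.
        foreach (T item in source)
        {
            if (index-- == 0)
                return item;
        }
        throw new ArgumentOutOfRangeException(nameof(index));
    }

    public static void Main()
    {
        int[] array = { 10, 20, 30 };                       // takes the fast path
        var lazy = Enumerable.Range(0, 5).Select(x => x * 2); // takes the slow path
        Console.WriteLine(ElementAtSketch(array, 1));
        Console.WriteLine(ElementAtSketch(lazy, 3));
    }
}
```

This is one reason a materialized ToList() or ToArray() result can behave better than the lazy query when it is passed to further LINQ operators.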

Interestingly, Microsoft chose to optimize only for IList<T>, not for IList. It looks like Microsoft prefers to use the IList<T> interface.

2010-07-12 19:55:40
