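Is it better to call ToList() or ToArray() in LINQ queries?

I often run into the case where I want to evaluate a query right where I declare it. This is usually because I need to iterate over it multiple times and it is expensive to compute. For example: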

string raw = "...";
var lines = (from l in raw.Split('\n')
             let ll = l.Trim()
             where !string.IsNullOrEmpty(ll)
             select ll).ToList();

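This works fine. But if I am not going to modify the result, then I might as well call ToArray() instead of ToList().

I wonder, however, whether ToArray() is implemented by first calling ToList() and is therefore less memory efficient than just calling ToList().

Am I crazy? Should I just call ToArray(), safe and secure in the knowledge that the memory won't be allocated twice?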

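Current answer

The performance difference will be insignificant, since List<T> is implemented as a dynamically sized array. Calling either ToArray() (which uses an internal Buffer<T> class to grow the array) or ToList() (which calls the List<T>(IEnumerable<T>) constructor) ends up being a matter of putting the items into an array and growing the array until it fits them all.

If you want concrete confirmation of this, check out the implementation of the methods in question in Reflector; you'll see they boil down to almost identical code.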

2009-07-09 19:33:54

Other answers

A very late answer, but I think it will be helpful for googlers.

Both perform poorly when created from a LINQ query. They use the same logic to resize the buffer when necessary. ToArray internally uses a class to convert the IEnumerable<> to an array, starting with an array of 4 elements. If that is not enough, it doubles the size by creating a new array twice the size of the current one and copying the current array into it. At the end it allocates a new array with exactly the count of your items. If your query returns 129 elements, ToArray will perform 6 grow-and-copy operations (to 8, 16, 32, 64, 128 and 256 elements) to end up with a 256-element array, and then allocate another array of 129 to return. So much for memory efficiency.

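ToList does the same thing, but it skips the final allocation, since you may add items later. List does not care whether it was created from a LINQ query or manually.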
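The growth pattern described above can be modelled in a few lines. This is only a sketch of the strategy as the answer describes it (start at 4, double when full); GrowthSketch and BufferCapacities are hypothetical names, not framework code, and newer .NET versions may grow their buffers differently.

```csharp
using System;
using System.Collections.Generic;

public static class GrowthSketch
{
    // Returns every buffer capacity that would be allocated while collecting
    // `count` items, ASSUMING a start capacity of 4 that doubles when full.
    // This models the description in the answer, not actual framework source.
    public static List<int> BufferCapacities(int count)
    {
        var capacities = new List<int>();
        if (count <= 0)
            return capacities;

        int capacity = 4;
        capacities.Add(capacity);
        while (capacity < count)
        {
            capacity *= 2;            // allocate a new array twice as large
            capacities.Add(capacity); // the old contents are copied into it
        }
        return capacities;
    }
}
```

For 129 items this yields the capacities 4, 8, 16, 32, 64, 128 and 256: one initial allocation plus six grow-and-copy steps. Under this model ToArray would then add one more allocation of exactly 129 elements, while ToList would keep the 256-element buffer.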

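On creation, List is better on memory but worse on CPU: because List is a general-purpose solution, every operation performs its own range checks in addition to .NET's internal range checks for arrays.

So if you are going to iterate over your result set many times, arrays are a good choice, since they involve fewer range checks than lists and compilers generally optimize sequential access to arrays.

The initial allocation of a List can be better if you specify the capacity parameter when you create it. In that case it allocates the array only once, assuming you know the result size. LINQ's ToList has no overload that accepts a capacity, so we have to write our own extension method that creates a list with the given capacity and then uses List<>.AddRange.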
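A minimal sketch of such an extension method follows. The name ToListWithCapacity and its parameter are assumptions made for illustration; nothing with this name exists in LINQ itself.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerableCapacityExtensions
{
    // Hypothetical helper (not part of LINQ): creates the list with the
    // caller's expected size so the backing array is allocated once,
    // then fills it with List<T>.AddRange.
    public static List<T> ToListWithCapacity<T>(this IEnumerable<T> source, int capacity)
    {
        if (source == null)
            throw new ArgumentNullException(nameof(source));

        var list = new List<T>(capacity); // throws if capacity is negative
        list.AddRange(source);            // grows as usual if capacity was too small
        return list;
    }
}
```

It would be called as query.ToListWithCapacity(expectedCount). If the estimate is too low, the list simply falls back to its normal growth behaviour, so a wrong guess costs performance rather than correctness.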

To finish this answer, I have to add the following:

In the end, you can use either ToArray or ToList; performance will not be very different (see the answer of @EMP). You are using C#. If you need performance, do not worry about writing high-performance code; worry about not writing badly performing code. Always target x64 for high-performance code. AFAIK, the x64 JIT is based on the C++ compiler and does some funny things like tail recursion optimizations. With 4.5 you can also enjoy profile-guided optimization and multi-core JIT. Finally, you can use the async/await pattern to process it quicker.

2013-10-08 15:11:24

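You should base your decision between ToList and ToArray on what the ideal design choice is. If you want a collection that can only be iterated and accessed by index, choose ToArray. If you want the additional ability to add to and remove from the collection later without much hassle, use ToList (not that you cannot add to an array, but it is usually not the right tool for that).

If performance matters, you should also consider what would be faster to operate on. Realistically, you won't call ToList or ToArray a million times, but you might work on the resulting collection a million times. In that respect [] is better, since List<> is a [] with some overhead. See this thread for some efficiency comparisons: List<int> or int[]

In my own tests a while ago, I found ToArray to be faster, though I'm not sure how skewed those tests were. The performance difference is so insignificant, however, that it only becomes noticeable if you run these queries in a loop millions of times.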

2012-12-07 10:42:03

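(Seven years later...)

A couple of other (good) answers have concentrated on the microscopic performance differences that will occur.

This post is just a supplement to mention the semantic difference between the IEnumerator<T> produced by an array (T[]) and the one returned by a List<T>.

It is best illustrated by example: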

IList<int> source = Enumerable.Range(1, 10).ToArray();  // try changing to .ToList()

foreach (var x in source)
{
  if (x == 5)
    source[8] *= 100;
  Console.WriteLine(x);
}

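The above code will run with no exception and produces the output:

1
2
3
4
5
6
7
8
900
10

This shows that the IEnumerator<int> returned by an int[] does not keep track of whether the array has been modified since the enumerator was created.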

Note that I declared the local variable source as an IList<int>. That way I make sure the C# compiler does not optimize the foreach statement into something equivalent to a for (var idx = 0; idx < source.Length; idx++) { /* ... */ } loop. This is something the C# compiler might do if I used var source = ...; instead. In my current version of the .NET Framework the actual enumerator used here is a non-public reference type, System.SZArrayHelper+SZGenericArrayEnumerator`1[System.Int32], but of course this is an implementation detail.

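Now, if I change .ToArray() to .ToList(), I get only:

1
2
3
4
5

followed by a System.InvalidOperationException blow-up saying:

Collection was modified; enumeration operation may not execute.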

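The underlying enumerator in this case is the public mutable value type System.Collections.Generic.List`1+Enumerator[System.Int32] (boxed inside an IEnumerator<int> in this case, because I use IList<int>).

In conclusion, the enumerator produced by a List<T> keeps track of whether the list changes during enumeration, while the enumerator produced by T[] does not. So consider this difference when choosing between .ToList() and .ToArray().

People often add an extra .ToArray() or .ToList() to work around a collection that keeps track of whether it was modified during the lifetime of an enumerator.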
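A small sketch of that workaround, under the assumption that the goal is to remove items from a List<int> while looping over it. SnapshotSketch and RemoveEvens are hypothetical names used only for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SnapshotSketch
{
    // Removes even numbers from `numbers` while looping. The loop enumerates
    // the copy made by ToList(), so modifying the original list does not
    // invalidate the enumerator and no InvalidOperationException is thrown.
    public static void RemoveEvens(List<int> numbers)
    {
        foreach (var n in numbers.ToList())
        {
            if (n % 2 == 0)
                numbers.Remove(n);
        }
    }
}
```

Dropping the .ToList() call here makes the loop throw on the first MoveNext after a removal. For this particular task numbers.RemoveAll(n => n % 2 == 0) is the more direct tool; the loop is only meant to show the snapshot pattern.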

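(If anybody wants to know how List<> keeps track of whether the collection was modified: the class has a private field _version that is changed every time the List<> is updated. It would actually be possible to change this behavior of List<> by simply removing the line that increments _version in the set accessor of the indexer public T this[int index], just as was done recently inside Dictionary<,>, as described in another answer.)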
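The version-counter idea can be sketched with a toy collection. VersionedBag is a hypothetical class written for this illustration and is not how List<> is actually implemented; in particular, this iterator captures the version on the first MoveNext rather than when the enumerator is created.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy collection illustrating version tracking.
public sealed class VersionedBag
{
    private readonly int[] _items;
    private int _version; // bumped on every write through the indexer

    public VersionedBag(int[] items)
    {
        _items = (int[])items.Clone();
    }

    public int this[int index]
    {
        get { return _items[index]; }
        set { _items[index] = value; _version++; }
    }

    public IEnumerable<int> Enumerate()
    {
        int startVersion = _version;
        for (int i = 0; i < _items.Length; i++)
        {
            yield return _items[i];
            // Runs when the caller asks for the next item: fail if the
            // collection was written to since enumeration started.
            if (startVersion != _version)
                throw new InvalidOperationException(
                    "Collection was modified; enumeration operation may not execute.");
        }
    }
}
```

Removing the _version++ statement from the set accessor would make writes through the indexer invisible to the enumerator, which is the kind of change the paragraph above describes.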

2016-12-20 16:03:00

I find the other benchmarks people have done here lacking, so here is my crack at it. Let me know if you find something wrong with my methodology.

/* This is a benchmarking template I use in LINQPad when I want to do a
 * quick performance test. Just give it a couple of actions to test and
 * it will give you a pretty good idea of how long they take compared
 * to one another. It's not perfect: You can expect a 3% error margin
 * under ideal circumstances. But if you're not going to improve
 * performance by more than 3%, you probably don't care anyway.*/
void Main()
{
    // Enter setup code here
    var values = Enumerable.Range(1, 100000)
        .Select(i => i.ToString())
        .ToArray()
        .Select(i => i);
    values.GetType().Dump();
    var actions = new[]
    {
        new TimedAction("ToList", () =>
        {
            values.ToList();
        }),
        new TimedAction("ToArray", () =>
        {
            values.ToArray();
        }),
        new TimedAction("Control", () =>
        {
            foreach (var element in values)
            {
                // do nothing
            }
        }),
        // Add tests as desired
    };
    const int TimesToRun = 1000; // Tweak this as necessary
    TimeActions(TimesToRun, actions);
}


#region timer helper methods
// Define other methods and classes here
public void TimeActions(int iterations, params TimedAction[] actions)
{
    Stopwatch s = new Stopwatch();
    int length = actions.Length;
    var results = new ActionResult[actions.Length];
    // Perform the actions in their initial order.
    for (int i = 0; i < length; i++)
    {
        var action = actions[i];
        var result = results[i] = new ActionResult { Message = action.Message };
        // Do a dry run to get things ramped up/cached
        result.DryRun1 = s.Time(action.Action, 10);
        result.FullRun1 = s.Time(action.Action, iterations);
    }
    // Perform the actions in reverse order.
    for (int i = length - 1; i >= 0; i--)
    {
        var action = actions[i];
        var result = results[i];
        // Do a dry run to get things ramped up/cached
        result.DryRun2 = s.Time(action.Action, 10);
        result.FullRun2 = s.Time(action.Action, iterations);
    }
    results.Dump();
}

public class ActionResult
{
    public string Message { get; set; }
    public double DryRun1 { get; set; }
    public double DryRun2 { get; set; }
    public double FullRun1 { get; set; }
    public double FullRun2 { get; set; }
}

public class TimedAction
{
    public TimedAction(string message, Action action)
    {
        Message = message;
        Action = action;
    }
    public string Message { get; private set; }
    public Action Action { get; private set; }
}

public static class StopwatchExtensions
{
    public static double Time(this Stopwatch sw, Action action, int iterations)
    {
        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds;
    }
}
#endregion

You can download the LINQPad script here.

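Results:

Tweaking the code above, you'll discover that:

The difference is less significant when dealing with smaller arrays. The difference is less significant when dealing with ints rather than strings. Using large structs instead of strings generally takes a lot more time, but doesn't really change the ratio much.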

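This agrees with the conclusions of the top-voted answers:

You're unlikely to notice a performance difference unless your code is frequently producing many large lists of data. (There was only a 200 ms difference when creating 1000 lists of 100K strings each.) ToList() consistently runs faster, and is the better choice if you're not planning to hold on to the results for a long time.

Update

@JonHanna pointed out that, depending on the implementation of Select, a ToList() or ToArray() implementation can predict the size of the resulting collection ahead of time. Replacing .Select(i => i) in the code above with Where(i => true) currently yields very similar results, and is more likely to do so regardless of the .NET implementation.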

2017-09-08 19:16:54
