我在一次面试中被问到这个问题。它们都是O(nlogn),但大多数人使用快速排序而不是归并排序。为什么呢?


当前回答

快速排序并不比归并排序好。对于O(n²)(很少发生的最坏情况),快速排序可能比归并排序的O(nlogn)慢得多。快速排序的开销更小,所以对于小n和速度较慢的计算机,它会更好。但是今天的计算机是如此之快,以至于合并排序的额外开销可以忽略不计,并且在大多数情况下,非常慢的快速排序的风险远远超过合并排序的微不足道的开销。

此外,归并排序将具有相同键的项按原始顺序保留,这是一个有用的属性。

其他回答

正如许多人所注意到的,快速排序的平均情况性能要比归并排序快。但这只适用于假设按需访问任何内存段的时间为常数的情况。

在RAM中,这种假设通常不太坏(由于缓存的存在,这种假设并不总是正确的,但也不太坏)。然而,如果你的数据结构足够大,可以存储在磁盘上,那么快速排序就会因为磁盘平均每秒进行200次随机查找而被扼杀。但是,同样的磁盘在按顺序每秒读取或写入兆字节的数据方面没有任何问题。这正是归并排序所做的。

因此,如果数据必须在磁盘上排序,你真的,真的想使用归并排序的一些变体。(通常你快速排序子列表,然后开始将它们合并到某个大小阈值以上。)

Furthermore if you have to do anything with datasets of that size, think hard about how to avoid seeks to disk. For instance this is why it is standard advice that you drop indexes before doing large data loads in databases, and then rebuild the index later. Maintaining the index during the load means constantly seeking to disk. By contrast if you drop the indexes, then the database can rebuild the index by first sorting the information to be dealt with (using a mergesort of course!) and then loading it into a BTREE datastructure for the index. (BTREEs are naturally kept in order, so you can load one from a sorted dataset with few seeks to disk.)

在许多情况下,了解如何避免磁盘寻道使我将数据处理工作花费数小时而不是数天或数周。

亩! 快速排序并不比归并排序更好,它非常适合于不同类型的应用。

归并排序是值得考虑的,如果速度是本质,糟糕的最差情况性能不能容忍,并且有额外的空间可用

你说他们«他们都是O(nlogn)[…]»。这是错误的。«快速排序使用大约n^2/2比较在最坏的情况下。

然而,根据我的经验,最重要的属性是在使用带有命令式范式的编程语言进行排序时,可以轻松实现顺序访问。

1 Sedgewick,算法

在所有条件相同的情况下,我希望大多数人使用最方便的方法,这往往是qsort(3)。除此之外,快速排序在数组上非常快,就像归并排序是列表的常用选择一样。

我想知道的是为什么基数排序和桶排序这么少见。它们是O(n)至少在链表上是这样的它所需要的只是将键转换为序数的方法。(字符串和浮动工作得很好。)

我认为原因与计算机科学的教学方式有关。我甚至不得不向我的讲师演示算法分析,它确实有可能比O(nlog (n))更快地排序。(他证明了比较排序不能比O(nlog (n))快,这是正确的)

在其他新闻中,浮点数可以按整数排序,但之后必须将负数反转。

编辑: 实际上,这里有一种更糟糕的将浮点数作为整数排序的方法:http://www.stereopsis.com/radix.html。注意,不管你实际使用什么排序算法,比特翻转技巧都可以使用……

One of the reason is more philosophical. Quicksort is Top->Down philosophy. With n elements to sort, there are n! possibilities. With 2 partitions of m & n-m which are mutually exclusive, the number of possibilities go down in several orders of magnitude. m! * (n-m)! is smaller by several orders than n! alone. imagine 5! vs 3! *2!. 5! has 10 times more possibilities than 2 partitions of 2 & 3 each . and extrapolate to 1 million factorial vs 900K!*100K! vs. So instead of worrying about establishing any order within a range or a partition,just establish order at a broader level in partitions and reduce the possibilities within a partition. Any order established earlier within a range will be disturbed later if the partitions themselves are not mutually exclusive.

任何自下而上的排序方法,如归并排序或堆排序,就像工人或雇员的方法一样,人们很早就开始在微观层面进行比较。但是,一旦在它们之间发现了一个元素,这个顺序就必然会丢失。这些方法非常稳定和可预测,但要做一定量的额外工作。

Quick Sort is like Managerial approach where one is not initially concerned about any order , only about meeting a broad criterion with No regard for order. Then the partitions are narrowed until you get a sorted set. The real challenge in Quicksort is in finding a partition or criterion in the dark when you know nothing about the elements to sort. That is why we either need to spend some effort to find a median value or pick 1 at random or some arbitrary "Managerial" approach . To find a perfect median can take significant amount of effort and leads to a stupid bottom up approach again. So Quicksort says just a pick a random pivot and hope that it will be somewhere in the middle or do some work to find median of 3 , 5 or something more to find a better median but do not plan to be perfect & don't waste any time in initially ordering. That seems to do well if you are lucky or sometimes degrades to n^2 when you don't get a median but just take a chance. Any way data is random. right. So I agree more with the top ->down logical approach of quicksort & it turns out that the chance it takes about pivot selection & comparisons that it saves earlier seems to work better more times than any meticulous & thorough stable bottom ->up approach like merge sort. But

快速排序和合并排序的小增加。

它还可以依赖于排序项的类型。如果访问项、交换和比较不是简单的操作,就像比较平面内存中的整数一样,那么归并排序可能是更可取的算法。

例如,我们在远程服务器上使用网络协议对项目进行排序。

而且,在像“链表”这样的自定义容器中,也没有快速排序的好处。 1. 对链表进行归并排序,不需要额外的内存。 2. 快速排序中对元素的访问不是顺序的(在内存中)