我相信有一种方法可以找到长度为n的O(n)无序数组中第k大的元素。也可能是期望O(n)之类的。我们该怎么做呢?


当前回答

还有一种算法,比快速选择算法性能更好。它叫做弗洛伊德-铆钉(FR)算法。

原文:https://doi.org/10.1145/360680.360694

下载版本:http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.309.7108&rep=rep1&type=pdf

维基百科文章https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm

我尝试在c++中实现快速选择和FR算法。我还将它们与标准c++库实现std::nth_element(基本上是quickselect和heapselect的introselect混合)进行了比较。结果是快速选择和nth_element的平均运行,而FR算法的平均运行约。速度是它们的两倍。

我用于FR算法的示例代码:

template <typename T>
T FRselect(std::vector<T>& data, const size_t& n)
{
    if (n == 0)
        return *(std::min_element(data.begin(), data.end()));
    else if (n == data.size() - 1)
        return *(std::max_element(data.begin(), data.end()));
    else
        return _FRselect(data, 0, data.size() - 1, n);
}

template <typename T>
T _FRselect(std::vector<T>& data, const size_t& left, const size_t& right, const size_t& n)
{
    size_t leftIdx = left;
    size_t rightIdx = right;

    while (rightIdx > leftIdx)
    {
        if (rightIdx - leftIdx > 600)
        {
            size_t range = rightIdx - leftIdx + 1;
            long long i = n - (long long)leftIdx + 1;
            long long z = log(range);
            long long s = 0.5 * exp(2 * z / 3);
            long long sd = 0.5 * sqrt(z * s * (range - s) / range) * sgn(i - (long long)range / 2);

            size_t newLeft = fmax(leftIdx, n - i * s / range + sd);
            size_t newRight = fmin(rightIdx, n + (range - i) * s / range + sd);

            _FRselect(data, newLeft, newRight, n);
        }
        T t = data[n];
        size_t i = leftIdx;
        size_t j = rightIdx;
        // arrange pivot and right index
        std::swap(data[leftIdx], data[n]);
        if (data[rightIdx] > t)
            std::swap(data[rightIdx], data[leftIdx]);

        while (i < j)
        {
            std::swap(data[i], data[j]);
            ++i; --j;
            while (data[i] < t) ++i;
            while (data[j] > t) --j;
        }

        if (data[leftIdx] == t)
            std::swap(data[leftIdx], data[j]);
        else
        {
            ++j;
            std::swap(data[j], data[rightIdx]);
        }
        // adjust left and right towards the boundaries of the subset
        // containing the (k - left + 1)th smallest element
        if (j <= n)
            leftIdx = j + 1;
        if (n <= j)
            rightIdx = j - 1;
    }

    return data[leftIdx];
}

template <typename T>
int sgn(T val) {
    return (T(0) < val) - (val < T(0));
}

其他回答

如果你想要一个真正的O(n)算法,而不是O(kn)或类似的算法,那么你应该使用快速选择(它基本上是快速排序,你会丢弃你不感兴趣的分区)。我的教授写了一篇很棒的文章,包括运行时分析:(参考)

QuickSelect算法可以快速找到包含n个元素的无序数组中的第k个最小元素。这是一个随机算法,所以我们计算最坏情况下的预期运行时间。

这是算法。

QuickSelect(A, k)
  let r be chosen uniformly at random in the range 1 to length(A)
  let pivot = A[r]
  let A1, A2 be new arrays
  # split into a pile A1 of small elements and A2 of big elements
  for i = 1 to n
    if A[i] < pivot then
      append A[i] to A1
    else if A[i] > pivot then
      append A[i] to A2
    else
      # do nothing
  end for
  if k <= length(A1):
    # it's in the pile of small elements
    return QuickSelect(A1, k)
  else if k > length(A) - length(A2)
    # it's in the pile of big elements
    return QuickSelect(A2, k - (length(A) - length(A2))
  else
    # it's equal to the pivot
    return pivot

这个算法的运行时间是多少?如果对手为我们抛硬币,我们可能会发现主元总是最大的元素,k总是1,给出的运行时间为

T(n) = Theta(n) + T(n-1) = Theta(n2)

但如果选择确实是随机的,则预期运行时间由

T(n) <= Theta(n) + (1/n) ∑i=1 to nT(max(i, n-i-1))

我们做了一个不完全合理的假设递归总是落在A1或A2中较大的那个。

让我们猜测对于某个a T(n) <= an,然后我们得到

T(n) 
 <= cn + (1/n) ∑i=1 to nT(max(i-1, n-i))
 = cn + (1/n) ∑i=1 to floor(n/2) T(n-i) + (1/n) ∑i=floor(n/2)+1 to n T(i)
 <= cn + 2 (1/n) ∑i=floor(n/2) to n T(i)
 <= cn + 2 (1/n) ∑i=floor(n/2) to n ai

现在我们要用加号右边这个可怕的和来吸收左边的cn。如果我们将其限定为2(1/n)∑i=n/2到n an,我们大致得到2(1/n)(n/2)an = an。但是这个太大了,没有多余的空间来挤进一个cn。让我们用等差级数公式展开和:

i=floor(n/2) to n i  
 = ∑i=1 to n i - ∑i=1 to floor(n/2) i  
 = n(n+1)/2 - floor(n/2)(floor(n/2)+1)/2  
 <= n2/2 - (n/4)2/2  
 = (15/32)n2

我们利用n“足够大”的优势,用更干净(更小)的n/4替换丑陋的地板(n/2)因子。现在我们可以继续

cn + 2 (1/n) ∑i=floor(n/2) to n ai,
 <= cn + (2a/n) (15/32) n2
 = n (c + (15/16)a)
 <= an

提供了> 16c。

得到T(n) = O(n)显然是(n)所以我们得到T(n) = (n)

我想提出一个答案

如果我们取前k个元素并将它们排序成一个k个值的链表

对于每一个其他的值,即使在最坏的情况下如果我们对剩下的n-k个值进行插入排序即使在最坏的情况下,比较的数量也将是k*(n-k)对于前k个要排序的值让它是k*(k-1)所以结果是(nk-k)也就是o(n)

干杯

我实现了在n个未排序元素中寻找第k个最小值的动态规划,特别是竞赛方法。执行时间为O(n + klog(n))。所使用的机制在维基百科关于选择算法的页面上被列为方法之一(如上面的帖子之一所示)。你可以阅读算法,也可以在我的博客页面“查找k个最小值”上找到代码(java)。此外,逻辑可以对列表进行部分排序——在O(klog(n))时间内返回第一个K min(或max)。

虽然代码提供了第k个最小值的结果,但可以使用类似的逻辑来查找O(klog(n))中的第k个最大值,忽略创建比赛树的前期工作。

这是一个Javascript实现。

如果您释放了不能修改数组的约束,则可以使用两个索引来标识“当前分区”(经典快速排序样式- http://www.nczonline.net/blog/2012/11/27/computer-science-in-javascript-quicksort/)来防止使用额外的内存。

function kthMax(a, k){
    var size = a.length;

    var pivot = a[ parseInt(Math.random()*size) ]; //Another choice could have been (size / 2) 

    //Create an array with all element lower than the pivot and an array with all element higher than the pivot
    var i, lowerArray = [], upperArray = [];
    for (i = 0; i  < size; i++){
        var current = a[i];

        if (current < pivot) {
            lowerArray.push(current);
        } else if (current > pivot) {
            upperArray.push(current);
        }
    }

    //Which one should I continue with?
    if(k <= upperArray.length) {
        //Upper
        return kthMax(upperArray, k);
    } else {
        var newK = k - (size - lowerArray.length);

        if (newK > 0) {
            ///Lower
            return kthMax(lowerArray, newK);
        } else {
            //None ... it's the current pivot!
            return pivot;
        }   
    }
}  

如果你想测试它的表现,你可以使用这个变量:

    function kthMax (a, k, logging) {
         var comparisonCount = 0; //Number of comparison that the algorithm uses
         var memoryCount = 0;     //Number of integers in memory that the algorithm uses
         var _log = logging;

         if(k < 0 || k >= a.length) {
            if (_log) console.log ("k is out of range"); 
            return false;
         }      

         function _kthmax(a, k){
             var size = a.length;
             var pivot = a[parseInt(Math.random()*size)];
             if(_log) console.log("Inputs:", a,  "size="+size, "k="+k, "pivot="+pivot);

             // This should never happen. Just a nice check in this exercise
             // if you are playing with the code to avoid never ending recursion            
             if(typeof pivot === "undefined") {
                 if (_log) console.log ("Ops..."); 
                 return false;
             }

             var i, lowerArray = [], upperArray = [];
             for (i = 0; i  < size; i++){
                 var current = a[i];
                 if (current < pivot) {
                     comparisonCount += 1;
                     memoryCount++;
                     lowerArray.push(current);
                 } else if (current > pivot) {
                     comparisonCount += 2;
                     memoryCount++;
                     upperArray.push(current);
                 }
             }
             if(_log) console.log("Pivoting:",lowerArray, "*"+pivot+"*", upperArray);

             if(k <= upperArray.length) {
                 comparisonCount += 1;
                 return _kthmax(upperArray, k);
             } else if (k > size - lowerArray.length) {
                 comparisonCount += 2;
                 return _kthmax(lowerArray, k - (size - lowerArray.length));
             } else {
                 comparisonCount += 2;
                 return pivot;
             }
     /* 
      * BTW, this is the logic for kthMin if we want to implement that... ;-)
      * 

             if(k <= lowerArray.length) {
                 return kthMin(lowerArray, k);
             } else if (k > size - upperArray.length) {
                 return kthMin(upperArray, k - (size - upperArray.length));
             } else 
                 return pivot;
     */            
         }

         var result = _kthmax(a, k);
         return {result: result, iterations: comparisonCount, memory: memoryCount};
     }

剩下的代码只是创建一些游乐场:

    function getRandomArray (n){
        var ar = [];
        for (var i = 0, l = n; i < l; i++) {
            ar.push(Math.round(Math.random() * l))
        }

        return ar;
    }

    //Create a random array of 50 numbers
    var ar = getRandomArray (50);   

现在给你做几次测试。 因为Math.random()每次都会产生不同的结果:

    kthMax(ar, 2, true);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 34, true);
    kthMax(ar, 34);
    kthMax(ar, 34);
    kthMax(ar, 34);
    kthMax(ar, 34);
    kthMax(ar, 34);

如果你测试它几次,你甚至可以看到经验的迭代次数,平均来说,O(n) ~=常数* n, k的值不会影响算法。

虽然不是很确定O(n)复杂度,但肯定在O(n)和nLog(n)之间。也肯定更接近于O(n)而不是nLog(n)函数是用Java编写的

public int quickSelect(ArrayList<Integer>list, int nthSmallest){
    //Choose random number in range of 0 to array length
    Random random =  new Random();
    //This will give random number which is not greater than length - 1
    int pivotIndex = random.nextInt(list.size() - 1); 

    int pivot = list.get(pivotIndex);

    ArrayList<Integer> smallerNumberList = new ArrayList<Integer>();
    ArrayList<Integer> greaterNumberList = new ArrayList<Integer>();

    //Split list into two. 
    //Value smaller than pivot should go to smallerNumberList
    //Value greater than pivot should go to greaterNumberList
    //Do nothing for value which is equal to pivot
    for(int i=0; i<list.size(); i++){
        if(list.get(i)<pivot){
            smallerNumberList.add(list.get(i));
        }
        else if(list.get(i)>pivot){
            greaterNumberList.add(list.get(i));
        }
        else{
            //Do nothing
        }
    }

    //If smallerNumberList size is greater than nthSmallest value, nthSmallest number must be in this list 
    if(nthSmallest < smallerNumberList.size()){
        return quickSelect(smallerNumberList, nthSmallest);
    }
    //If nthSmallest is greater than [ list.size() - greaterNumberList.size() ], nthSmallest number must be in this list
    //The step is bit tricky. If confusing, please see the above loop once again for clarification.
    else if(nthSmallest > (list.size() - greaterNumberList.size())){
        //nthSmallest will have to be changed here. [ list.size() - greaterNumberList.size() ] elements are already in 
        //smallerNumberList
        nthSmallest = nthSmallest - (list.size() - greaterNumberList.size());
        return quickSelect(greaterNumberList,nthSmallest);
    }
    else{
        return pivot;
    }
}