我相信有一种方法可以找到长度为n的O(n)无序数组中第k大的元素。也可能是期望O(n)之类的。我们该怎么做呢?


当前回答

虽然不是很确定O(n)复杂度,但肯定在O(n)和nLog(n)之间。也肯定更接近于O(n)而不是nLog(n)函数是用Java编写的

public int quickSelect(ArrayList<Integer>list, int nthSmallest){
    //Choose random number in range of 0 to array length
    Random random =  new Random();
    //This will give random number which is not greater than length - 1
    int pivotIndex = random.nextInt(list.size() - 1); 

    int pivot = list.get(pivotIndex);

    ArrayList<Integer> smallerNumberList = new ArrayList<Integer>();
    ArrayList<Integer> greaterNumberList = new ArrayList<Integer>();

    //Split list into two. 
    //Value smaller than pivot should go to smallerNumberList
    //Value greater than pivot should go to greaterNumberList
    //Do nothing for value which is equal to pivot
    for(int i=0; i<list.size(); i++){
        if(list.get(i)<pivot){
            smallerNumberList.add(list.get(i));
        }
        else if(list.get(i)>pivot){
            greaterNumberList.add(list.get(i));
        }
        else{
            //Do nothing
        }
    }

    //If smallerNumberList size is greater than nthSmallest value, nthSmallest number must be in this list 
    if(nthSmallest < smallerNumberList.size()){
        return quickSelect(smallerNumberList, nthSmallest);
    }
    //If nthSmallest is greater than [ list.size() - greaterNumberList.size() ], nthSmallest number must be in this list
    //The step is bit tricky. If confusing, please see the above loop once again for clarification.
    else if(nthSmallest > (list.size() - greaterNumberList.size())){
        //nthSmallest will have to be changed here. [ list.size() - greaterNumberList.size() ] elements are already in 
        //smallerNumberList
        nthSmallest = nthSmallest - (list.size() - greaterNumberList.size());
        return quickSelect(greaterNumberList,nthSmallest);
    }
    else{
        return pivot;
    }
}

其他回答

下面是一个随机化快速选择的c++实现。这个想法是随机选择一个主元。为了实现随机分区,我们使用一个随机函数rand()来生成l和r之间的索引,将随机生成索引处的元素与最后一个元素交换,最后调用以最后一个元素为枢轴的标准分区过程。

#include<iostream>
#include<climits>
#include<cstdlib>
using namespace std;

int randomPartition(int arr[], int l, int r);

// This function returns k'th smallest element in arr[l..r] using
// QuickSort based method.  ASSUMPTION: ALL ELEMENTS IN ARR[] ARE DISTINCT
int kthSmallest(int arr[], int l, int r, int k)
{
    // If k is smaller than number of elements in array
    if (k > 0 && k <= r - l + 1)
    {
        // Partition the array around a random element and
        // get position of pivot element in sorted array
        int pos = randomPartition(arr, l, r);

        // If position is same as k
        if (pos-l == k-1)
            return arr[pos];
        if (pos-l > k-1)  // If position is more, recur for left subarray
            return kthSmallest(arr, l, pos-1, k);

        // Else recur for right subarray
        return kthSmallest(arr, pos+1, r, k-pos+l-1);
    }

    // If k is more than number of elements in array
    return INT_MAX;
}

void swap(int *a, int *b)
{
    int temp = *a;
    *a = *b;
    *b = temp;
}

// Standard partition process of QuickSort().  It considers the last
// element as pivot and moves all smaller element to left of it and
// greater elements to right. This function is used by randomPartition()
int partition(int arr[], int l, int r)
{
    int x = arr[r], i = l;
    for (int j = l; j <= r - 1; j++)
    {
        if (arr[j] <= x) //arr[i] is bigger than arr[j] so swap them
        {
            swap(&arr[i], &arr[j]);
            i++;
        }
    }
    swap(&arr[i], &arr[r]); // swap the pivot
    return i;
}

// Picks a random pivot element between l and r and partitions
// arr[l..r] around the randomly picked element using partition()
int randomPartition(int arr[], int l, int r)
{
    int n = r-l+1;
    int pivot = rand() % n;
    swap(&arr[l + pivot], &arr[r]);
    return partition(arr, l, r);
}

// Driver program to test above methods
int main()
{
    int arr[] = {12, 3, 5, 7, 4, 19, 26};
    int n = sizeof(arr)/sizeof(arr[0]), k = 3;
    cout << "K'th smallest element is " << kthSmallest(arr, 0, n-1, k);
    return 0;
}

上述解的最坏情况时间复杂度仍为O(n2)。在最坏的情况下,随机函数可能总是选择一个角元素。上述随机化QuickSelect的期望时间复杂度为Θ(n)

我会这样做:

initialize empty doubly linked list l
for each element e in array
    if e larger than head(l)
        make e the new head of l
        if size(l) > k
            remove last element from l

the last element of l should now be the kth largest element

您可以简单地存储指向链表中第一个和最后一个元素的指针。它们只在更新列表时更改。

更新:

initialize empty sorted tree l
for each element e in array
    if e between head(l) and tail(l)
        insert e into l // O(log k)
        if size(l) > k
            remove last element from l

the last element of l should now be the kth largest element

你可以在O(n)个时间和常数空间中找到第k个最小的元素。如果我们认为数组只用于整数。

方法是对数组值的范围进行二分搜索。如果min_value和max_value都在整数范围内,我们可以对该范围进行二分搜索。 我们可以写一个比较器函数,它会告诉我们是否有任何值是第k个最小值或小于第k个最小值或大于第k个最小值。 进行二分搜索,直到找到第k小的数

这是它的代码

类解决方案:

def _iskthsmallest(self, A, val, k):
    less_count, equal_count = 0, 0
    for i in range(len(A)):
        if A[i] == val: equal_count += 1
        if A[i] < val: less_count += 1

    if less_count >= k: return 1
    if less_count + equal_count < k: return -1
    return 0

def kthsmallest_binary(self, A, min_val, max_val, k):
    if min_val == max_val:
        return min_val
    mid = (min_val + max_val)/2
    iskthsmallest = self._iskthsmallest(A, mid, k)
    if iskthsmallest == 0: return mid
    if iskthsmallest > 0: return self.kthsmallest_binary(A, min_val, mid, k)
    return self.kthsmallest_binary(A, mid+1, max_val, k)

# @param A : tuple of integers
# @param B : integer
# @return an integer
def kthsmallest(self, A, k):
    if not A: return 0
    if k > len(A): return 0
    min_val, max_val = min(A), max(A)
    return self.kthsmallest_binary(A, min_val, max_val, k)

我实现了在n个未排序元素中寻找第k个最小值的动态规划,特别是竞赛方法。执行时间为O(n + klog(n))。所使用的机制在维基百科关于选择算法的页面上被列为方法之一(如上面的帖子之一所示)。你可以阅读算法,也可以在我的博客页面“查找k个最小值”上找到代码(java)。此外,逻辑可以对列表进行部分排序——在O(klog(n))时间内返回第一个K min(或max)。

虽然代码提供了第k个最小值的结果,但可以使用类似的逻辑来查找O(klog(n))中的第k个最大值,忽略创建比赛树的前期工作。

还有Wirth的选择算法,它的实现比QuickSelect简单。Wirth的选择算法比QuickSelect慢,但经过一些改进,它变得更快。

更详细地说。使用Vladimir Zabrodsky的MODIFIND优化和3中位数的枢轴选择,并注意算法划分部分的最后步骤,我提出了以下算法(想象一下,命名为“LefSelect”):

#define F_SWAP(a,b) { float temp=(a);(a)=(b);(b)=temp; }

# Note: The code needs more than 2 elements to work
float lefselect(float a[], const int n, const int k) {
    int l=0, m = n-1, i=l, j=m;
    float x;

    while (l<m) {
        if( a[k] < a[i] ) F_SWAP(a[i],a[k]);
        if( a[j] < a[i] ) F_SWAP(a[i],a[j]);
        if( a[j] < a[k] ) F_SWAP(a[k],a[j]);

        x=a[k];
        while (j>k & i<k) {
            do i++; while (a[i]<x);
            do j--; while (a[j]>x);

            F_SWAP(a[i],a[j]);
        }
        i++; j--;

        if (j<k) {
            while (a[i]<x) i++;
            l=i; j=m;
        }
        if (k<i) {
            while (x<a[j]) j--;
            m=j; i=l;
        }
    }
    return a[k];
}

在我这里做的基准测试中,LefSelect比QuickSelect快20-30%。