我相信有一种方法可以找到长度为n的O(n)无序数组中第k大的元素。也可能是期望O(n)之类的。我们该怎么做呢?
当前回答
虽然不是很确定O(n)复杂度,但肯定在O(n)和nLog(n)之间。也肯定更接近于O(n)而不是nLog(n)函数是用Java编写的
public int quickSelect(ArrayList<Integer>list, int nthSmallest){
//Choose random number in range of 0 to array length
Random random = new Random();
//This will give random number which is not greater than length - 1
int pivotIndex = random.nextInt(list.size() - 1);
int pivot = list.get(pivotIndex);
ArrayList<Integer> smallerNumberList = new ArrayList<Integer>();
ArrayList<Integer> greaterNumberList = new ArrayList<Integer>();
//Split list into two.
//Value smaller than pivot should go to smallerNumberList
//Value greater than pivot should go to greaterNumberList
//Do nothing for value which is equal to pivot
for(int i=0; i<list.size(); i++){
if(list.get(i)<pivot){
smallerNumberList.add(list.get(i));
}
else if(list.get(i)>pivot){
greaterNumberList.add(list.get(i));
}
else{
//Do nothing
}
}
//If smallerNumberList size is greater than nthSmallest value, nthSmallest number must be in this list
if(nthSmallest < smallerNumberList.size()){
return quickSelect(smallerNumberList, nthSmallest);
}
//If nthSmallest is greater than [ list.size() - greaterNumberList.size() ], nthSmallest number must be in this list
//The step is bit tricky. If confusing, please see the above loop once again for clarification.
else if(nthSmallest > (list.size() - greaterNumberList.size())){
//nthSmallest will have to be changed here. [ list.size() - greaterNumberList.size() ] elements are already in
//smallerNumberList
nthSmallest = nthSmallest - (list.size() - greaterNumberList.size());
return quickSelect(greaterNumberList,nthSmallest);
}
else{
return pivot;
}
}
其他回答
下面是eladv建议的算法的实现(我也把随机pivot的实现放在这里):
public class Median {
public static void main(String[] s) {
int[] test = {4,18,20,3,7,13,5,8,2,1,15,17,25,30,16};
System.out.println(selectK(test,8));
/*
int n = 100000000;
int[] test = new int[n];
for(int i=0; i<test.length; i++)
test[i] = (int)(Math.random()*test.length);
long start = System.currentTimeMillis();
random_selectK(test, test.length/2);
long end = System.currentTimeMillis();
System.out.println(end - start);
*/
}
public static int random_selectK(int[] a, int k) {
if(a.length <= 1)
return a[0];
int r = (int)(Math.random() * a.length);
int p = a[r];
int small = 0, equal = 0, big = 0;
for(int i=0; i<a.length; i++) {
if(a[i] < p) small++;
else if(a[i] == p) equal++;
else if(a[i] > p) big++;
}
if(k <= small) {
int[] temp = new int[small];
for(int i=0, j=0; i<a.length; i++)
if(a[i] < p)
temp[j++] = a[i];
return random_selectK(temp, k);
}
else if (k <= small+equal)
return p;
else {
int[] temp = new int[big];
for(int i=0, j=0; i<a.length; i++)
if(a[i] > p)
temp[j++] = a[i];
return random_selectK(temp,k-small-equal);
}
}
public static int selectK(int[] a, int k) {
if(a.length <= 5) {
Arrays.sort(a);
return a[k-1];
}
int p = median_of_medians(a);
int small = 0, equal = 0, big = 0;
for(int i=0; i<a.length; i++) {
if(a[i] < p) small++;
else if(a[i] == p) equal++;
else if(a[i] > p) big++;
}
if(k <= small) {
int[] temp = new int[small];
for(int i=0, j=0; i<a.length; i++)
if(a[i] < p)
temp[j++] = a[i];
return selectK(temp, k);
}
else if (k <= small+equal)
return p;
else {
int[] temp = new int[big];
for(int i=0, j=0; i<a.length; i++)
if(a[i] > p)
temp[j++] = a[i];
return selectK(temp,k-small-equal);
}
}
private static int median_of_medians(int[] a) {
int[] b = new int[a.length/5];
int[] temp = new int[5];
for(int i=0; i<b.length; i++) {
for(int j=0; j<5; j++)
temp[j] = a[5*i + j];
Arrays.sort(temp);
b[i] = temp[2];
}
return selectK(b, b.length/2 + 1);
}
}
你可以在O(n)个时间和常数空间中找到第k个最小的元素。如果我们认为数组只用于整数。
方法是对数组值的范围进行二分搜索。如果min_value和max_value都在整数范围内,我们可以对该范围进行二分搜索。 我们可以写一个比较器函数,它会告诉我们是否有任何值是第k个最小值或小于第k个最小值或大于第k个最小值。 进行二分搜索,直到找到第k小的数
这是它的代码
类解决方案:
def _iskthsmallest(self, A, val, k):
less_count, equal_count = 0, 0
for i in range(len(A)):
if A[i] == val: equal_count += 1
if A[i] < val: less_count += 1
if less_count >= k: return 1
if less_count + equal_count < k: return -1
return 0
def kthsmallest_binary(self, A, min_val, max_val, k):
if min_val == max_val:
return min_val
mid = (min_val + max_val)/2
iskthsmallest = self._iskthsmallest(A, mid, k)
if iskthsmallest == 0: return mid
if iskthsmallest > 0: return self.kthsmallest_binary(A, min_val, mid, k)
return self.kthsmallest_binary(A, mid+1, max_val, k)
# @param A : tuple of integers
# @param B : integer
# @return an integer
def kthsmallest(self, A, k):
if not A: return 0
if k > len(A): return 0
min_val, max_val = min(A), max(A)
return self.kthsmallest_binary(A, min_val, max_val, k)
如果你想要一个真正的O(n)算法,而不是O(kn)或类似的算法,那么你应该使用快速选择(它基本上是快速排序,你会丢弃你不感兴趣的分区)。我的教授写了一篇很棒的文章,包括运行时分析:(参考)
QuickSelect算法可以快速找到包含n个元素的无序数组中的第k个最小元素。这是一个随机算法,所以我们计算最坏情况下的预期运行时间。
这是算法。
QuickSelect(A, k)
let r be chosen uniformly at random in the range 1 to length(A)
let pivot = A[r]
let A1, A2 be new arrays
# split into a pile A1 of small elements and A2 of big elements
for i = 1 to n
if A[i] < pivot then
append A[i] to A1
else if A[i] > pivot then
append A[i] to A2
else
# do nothing
end for
if k <= length(A1):
# it's in the pile of small elements
return QuickSelect(A1, k)
else if k > length(A) - length(A2)
# it's in the pile of big elements
return QuickSelect(A2, k - (length(A) - length(A2))
else
# it's equal to the pivot
return pivot
这个算法的运行时间是多少?如果对手为我们抛硬币,我们可能会发现主元总是最大的元素,k总是1,给出的运行时间为
T(n) = Theta(n) + T(n-1) = Theta(n2)
但如果选择确实是随机的,则预期运行时间由
T(n) <= Theta(n) + (1/n) ∑i=1 to nT(max(i, n-i-1))
我们做了一个不完全合理的假设递归总是落在A1或A2中较大的那个。
让我们猜测对于某个a T(n) <= an,然后我们得到
T(n)
<= cn + (1/n) ∑i=1 to nT(max(i-1, n-i))
= cn + (1/n) ∑i=1 to floor(n/2) T(n-i) + (1/n) ∑i=floor(n/2)+1 to n T(i)
<= cn + 2 (1/n) ∑i=floor(n/2) to n T(i)
<= cn + 2 (1/n) ∑i=floor(n/2) to n ai
现在我们要用加号右边这个可怕的和来吸收左边的cn。如果我们将其限定为2(1/n)∑i=n/2到n an,我们大致得到2(1/n)(n/2)an = an。但是这个太大了,没有多余的空间来挤进一个cn。让我们用等差级数公式展开和:
∑i=floor(n/2) to n i
= ∑i=1 to n i - ∑i=1 to floor(n/2) i
= n(n+1)/2 - floor(n/2)(floor(n/2)+1)/2
<= n2/2 - (n/4)2/2
= (15/32)n2
我们利用n“足够大”的优势,用更干净(更小)的n/4替换丑陋的地板(n/2)因子。现在我们可以继续
cn + 2 (1/n) ∑i=floor(n/2) to n ai,
<= cn + (2a/n) (15/32) n2
= n (c + (15/16)a)
<= an
提供了> 16c。
得到T(n) = O(n)显然是(n)所以我们得到T(n) = (n)
转到这个链接的结尾:...........
http://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array-set-3-worst-case-linear-time/
下面是完整实现的链接,其中相当广泛地解释了在无序算法中查找第k个元素的算法是如何工作的。基本思想是像快速排序一样对数组进行分区。但为了避免极端情况(例如每一步都选择最小的元素作为主元,使算法运行时间退化为O(n^2)),采用特殊的主元选择,称为中位数的中位数算法。在最坏情况和平均情况下,整个解在O(n)时间内运行。
这里是全文的链接(它是关于寻找第k个最小的元素,但寻找第k个最大的元素的原理是相同的):
在无序数组中寻找第k个最小元素