Write a program to find the 100 largest numbers out of an array of 1 billion numbers

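I recently attended an interview where I was asked to "write a program to find the 100 largest numbers out of an array of 1 billion numbers."

I was only able to give a brute-force solution, which was to sort the array in O(n log n) time and take the last 100 numbers.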

Arrays.sort(array);

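The interviewer was looking for a better time complexity. I tried a couple of other solutions but failed to answer him. Is there a solution with better time complexity?

Current answer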

I would find out who had the time to put a billion numbers into an array and fire him. Must work for the government. At least if you had a linked list, you could insert a number into the middle without moving half a billion to make room. Even better, a B-tree allows for a binary search: each comparison eliminates half of your total. A hash algorithm would allow you to populate the data structure like a checkerboard, but that is not so good for sparse data. As it is, your best bet is to have a solution array of 100 integers and keep track of the lowest number in it, so you can replace that number when you come across a higher one in the original array. You would have to look at every element in the original array, assuming it is not sorted to begin with.
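A minimal Java sketch of the "solution array" idea described above, under a few assumptions: the class name `TopKLinearScan` and the method `largest` are made up for illustration, the input is an `int[]`, and `k` is no larger than the array length. It rescans the small array for its new minimum after every replacement, which costs O(k) per replacement.

```java
import java.util.Arrays;

// Hypothetical helper, not from the original answer.
class TopKLinearScan {
    /** Returns the k largest values of data in ascending order (assumes 0 < k <= data.length). */
    static int[] largest(int[] data, int k) {
        // Start with the first k values as the candidate set.
        int[] top = Arrays.copyOf(data, k);
        int minIdx = indexOfMin(top);
        for (int i = k; i < data.length; i++) {
            if (data[i] > top[minIdx]) {
                top[minIdx] = data[i];     // replace the current lowest candidate
                minIdx = indexOfMin(top);  // O(k) rescan to find the new lowest
            }
        }
        Arrays.sort(top);                  // sort only the k survivors for display
        return top;
    }

    private static int indexOfMin(int[] a) {
        int idx = 0;
        for (int i = 1; i < a.length; i++) {
            if (a[i] < a[idx]) idx = i;
        }
        return idx;
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 9, 3, 7, 8, 2};
        System.out.println(Arrays.toString(largest(data, 3))); // prints [7, 8, 9]
    }
}
```

For the interview question, `k` would be 100 and `data` the billion-element array; every element is still inspected exactly once.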

2013-10-09 15:11:46

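Other answers

Managing a separate list is extra work, and you have to move things around the whole list every time you find another replacement. Just sort it and take the top 100.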

2013-10-09 16:32:56

Time ~ O(100 * N)
Space ~ O(100 + N)

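Create an empty list with 100 empty slots
For every number in the input list:
    If the number is smaller than the first one, skip
    Otherwise replace the first one with this number
    Then push the number along by adjacent swaps until it is smaller than the next one
Return the list

Note: if log(input-list.size) + c < 100, then the optimal way is to sort the input list and then split off the first 100 items.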
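The steps above can be sketched in Java roughly as follows. This is one possible reading of the pseudocode, not the answerer's own code: the names `TopKInsertion` and `largest` are invented here, and instead of starting from empty slots the buffer is seeded with the first `k` input values, sorted ascending (an assumption made to keep the sketch short).

```java
import java.util.Arrays;

// Hypothetical helper, not from the original answer.
class TopKInsertion {
    /** Returns the k largest values in ascending order (assumes 0 < k <= data.length). */
    static int[] largest(int[] data, int k) {
        // Assumption: seed the buffer with the first k values instead of empty slots.
        int[] top = Arrays.copyOf(data, k);
        Arrays.sort(top);                    // top[0] is always the smallest kept value
        for (int i = k; i < data.length; i++) {
            if (data[i] < top[0]) continue;  // smaller than the first one: skip
            top[0] = data[i];                // otherwise replace the first one
            // Push the new value right by adjacent swaps until the buffer is sorted again.
            for (int j = 0; j + 1 < k && top[j] > top[j + 1]; j++) {
                int tmp = top[j];
                top[j] = top[j + 1];
                top[j + 1] = tmp;
            }
        }
        return top;
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 9, 3, 7, 8, 2};
        System.out.println(Arrays.toString(largest(data, 3))); // prints [7, 8, 9]
    }
}
```

Each accepted number can travel across the whole buffer, which is where the O(100 * N) worst case comes from.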

2013-10-09 06:19:07

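Complexity is O(N)

First create an array of 100 ints, initialize the first element of this array to the first element of the N values, and keep track of the index of the current element with another variable, CurrentBig.

Iterate through the N values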

if (N[i] > M[CurrentBig]) {
    M[CurrentBig] = N[i];  // overwrite the current value with the newly found larger number
    CurrentBig++;          // go to the next position in the M array
    CurrentBig %= 100;     // modulo arithmetic saves you from using lists/hashes etc.
    M[CurrentBig] = N[i];  // pick up the current value again to use it for the next iteration of the N array
}

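When done, print the M array starting from CurrentBig, 100 times, modulo 100 :-) For the student: make sure that the last line of the code does not overwrite valid data right before the code exits.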

2013-10-09 08:42:24

This question can be answered with N log(100) complexity (instead of N log N) with just one line of C++ code.

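 #include <algorithm>
 #include <functional>
 #include <vector>
 std::vector<int> myvector = ...; // Define your 1 billion numbers (int assumed for concreteness).
 // std::greater orders descending, so the 100 largest land at the front.
 std::partial_sort(myvector.begin(), myvector.begin() + 100, myvector.end(), std::greater<int>());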

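The final answer is a vector whose first 100 elements are guaranteed to be the 100 largest numbers of your array, while the remaining elements are in unspecified order.

The C++ STL (standard library) is quite handy for this kind of problem.

Note: I am not saying that this is the optimal solution, but it would have saved your interview.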

2013-10-27 15:12:26

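The simple solution would be to use a priority queue: add the first 100 numbers to the queue and keep track of the smallest number in the queue, then iterate through the other billion numbers, and each time we find one that is larger than the smallest number in the priority queue, we remove the smallest number, add the new number, and again keep track of the smallest number in the queue.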
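As a rough illustration of the priority-queue approach, here is a small Java sketch built on `java.util.PriorityQueue`, which is a min-heap under natural ordering, so `peek()` returns the smallest number kept so far. The class name `TopKHeap` and the method `largest` are hypothetical, and the input is assumed to be an `int[]` with `k` at least 1.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical helper, not from the original answer.
class TopKHeap {
    /** Returns up to k largest values of data in ascending order (assumes k >= 1). */
    static int[] largest(int[] data, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(k); // min-heap: peek() is the smallest
        for (int x : data) {
            if (heap.size() < k) {
                heap.add(x);              // still filling the first k slots
            } else if (x > heap.peek()) {
                heap.poll();              // remove the smallest kept number
                heap.add(x);              // add the new, larger number
            }
        }
        int[] result = new int[heap.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = heap.poll();      // polling a min-heap yields ascending order
        }
        return result;
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 9, 3, 7, 8, 2};
        System.out.println(Arrays.toString(largest(data, 3))); // prints [7, 8, 9]
    }
}
```

Each replacement costs O(log k), so the worst case (every element replaces the minimum) is O(n log k) with k = 100.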

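If the numbers were in random order, this would work beautifully, because as we iterate through a billion random numbers, it would be very rare that the next number is among the 100 largest so far. But the numbers might not be random. If the array was already sorted in ascending order, then we would always insert an element into the priority queue.

So we first pick, say, 100,000 random numbers from the array. To avoid random access, which might be slow, we add, say, 400 random groups of 250 consecutive numbers each. With that random selection, we can be quite sure that very few of the remaining numbers are in the top 100, so the execution time will be very close to that of a simple loop comparing a billion numbers to some maximum value.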
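One way the sampling idea might look in Java is sketched below. Everything here is an assumption for illustration: the names `TopKSampledHeap`, `largest`, and `offerRange`, the choice to split the array into fixed-size blocks and pick random blocks as the "groups", and the `seed` parameter for repeatability. The answer itself does not say how to avoid counting a sampled number twice; this sketch marks sampled blocks and skips them in the main scan.

```java
import java.util.PriorityQueue;
import java.util.Random;

// Hypothetical helper, not from the original answer.
class TopKSampledHeap {
    /** Returns up to k largest values, ascending (assumes non-empty data, k >= 1, groupLen >= 1). */
    static int[] largest(int[] data, int k, int groups, int groupLen, long seed) {
        int blocks = (data.length + groupLen - 1) / groupLen;
        boolean[] sampled = new boolean[blocks];
        PriorityQueue<Integer> heap = new PriorityQueue<>(k); // min-heap of kept values
        Random rng = new Random(seed);

        // Phase 1: seed the heap from randomly chosen blocks of consecutive numbers.
        for (int g = 0; g < groups; g++) {
            int b = rng.nextInt(blocks);
            if (sampled[b]) continue;     // block already used; simply sample one fewer
            sampled[b] = true;
            offerRange(heap, data, b, groupLen, k);
        }
        // Phase 2: scan every block that was not sampled.
        for (int b = 0; b < blocks; b++) {
            if (!sampled[b]) offerRange(heap, data, b, groupLen, k);
        }

        int[] result = new int[heap.size()];
        for (int i = 0; i < result.length; i++) result[i] = heap.poll();
        return result;
    }

    // Offers the values of block b to the heap, keeping at most k values.
    private static void offerRange(PriorityQueue<Integer> heap, int[] data,
                                   int b, int groupLen, int k) {
        int from = b * groupLen;
        int to = (int) Math.min((long) from + groupLen, data.length);
        for (int i = from; i < to; i++) {
            if (heap.size() < k) {
                heap.add(data[i]);
            } else if (data[i] > heap.peek()) {
                heap.poll();
                heap.add(data[i]);
            }
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) data[i] = i; // ascending: worst case for the plain scan
        int[] top = largest(data, 5, 4, 50, 42L);
        System.out.println(java.util.Arrays.toString(top)); // prints [995, 996, 997, 998, 999]
    }
}
```

With the numbers from the answer, the call would use `k = 100`, `groups = 400`, and `groupLen = 250`. The result is the same whichever blocks are sampled; only the number of heap updates changes.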

2016-04-04 18:42:33
