编写一个程序，从一个包含10亿个数字的数组中找出100个最大的数字

最近我参加了一个面试，面试官要求我“编写一个程序，从一个包含10亿个数字的数组中找出100个最大的数字”。

我只能给出一个蛮力解决方案，即以O(nlogn)时间复杂度对数组进行排序，并取最后100个数字。

Arrays.sort(array);

面试官正在寻找一个更好的时间复杂度，我尝试了几个其他的解决方案，但都没有回答他。有没有更好的时间复杂度解决方案?

当前回答

取十亿个数字中的前一百个，然后排序。现在只需遍历十亿，如果源数大于100中最小的数，则按排序顺序插入。你得到的结果更接近于O(n)除以集合的大小。

2013-10-07 14:56:15

其他回答

你可以遍历这些数字，需要O(n)

只要发现一个大于当前最小值的值，就将新值添加到一个大小为100的循环队列中。

循环队列的最小值就是新的比较值。继续往队列中添加。如果已满，则从队列中提取最小值。

2013-10-07 14:45:37

如果在面试中被问到这个问题，面试官可能想看你解决问题的过程，而不仅仅是你的算法知识。

The description is quite general so maybe you can ask him the range or meaning of these numbers to make the problem clear. Doing this may impress an interviewer. If, for example, these numbers stands for people's age then it's a much easier problem. With a reasonable assumption that nobody alive is older than 200, you can use an integer array of size 200 (maybe 201) to count the number of people with the same age in just one iteration. Here the index means the age. After this it's a piece of cake to find 100 largest numbers. By the way this algorithm is called counting sort.

无论如何，让问题更具体、更清楚对你在面试中是有好处的。

2013-10-08 18:04:09

另一个O(n)算法-

该算法通过消元法找到最大的100个

考虑所有的百万数字的二进制表示。从最重要的位开始。确定MSB是否为1可以通过布尔运算与适当的数字相乘来完成。如果百万个数字中有超过100个1，就去掉其他带0的数字。现在剩下的数从下一个最有效的位开始。计算排除后剩余数字的数量，只要这个数字大于100，就继续进行。

主要的布尔运算可以在图形处理器上并行完成

2013-10-09 12:40:14

从十亿个数字中找到前100个最好使用包含100个元素的最小堆。

首先用遇到的前100个数字对最小堆进行质数。Min-heap将前100个数字中最小的存储在根(顶部)。

现在，当你继续计算其他数字时，只将它们与根数(100中最小的数)进行比较。

如果遇到的新数字大于最小堆的根，则将根替换为该数字，否则忽略它。

作为在最小堆中插入新数字的一部分，堆中最小的数字将移到顶部(根)。

一旦我们遍历了所有的数字，我们将得到最小堆中最大的100个数字。

2017-11-24 10:55:00

我知道这可能会被埋没，但这是我对一个基MSD的变化的想法。

伪代码:

//billion is the array of 1 billion numbers
int[] billion = getMyBillionNumbers();
//this assumes these are 32-bit integers and we are using hex digits
int[][] mynums = int[8][16];

for number in billion
    putInTop100Array(number)

function putInTop100Array(number){
    //basically if we got past all the digits successfully
    if(number == null)
        return true;
    msdIdx = getMsdIdx(number);
    msd = getMsd(number);
    //check if the idx above where we are is already full
    if(mynums[msdIdx][msd+1] > 99) {
        return false;
    } else if(putInTop100Array(removeMSD(number)){
        mynums[msdIdx][msd]++;
        //we've found 100 digits here, no need to keep looking below where we are
        if(mynums[msdIdx][msd] > 99){
           for(int i = 0; i < mds; i++){
              //making it 101 just so we can tell the difference
              //between numbers where we actually found 101, and 
              //where we just set it
              mynums[msdIdx][i] = 101;
           }
        }
        return true;
    }
    return false;
}

函数getMsdIdx(int num)将返回最高位(非零)的下标。函数getMsd(int num)将返回最高位。函数removeMSD(int num)将从一个数字中删除最有效的数字并返回该数字(如果删除最有效的数字后什么都没有留下，则返回null)。

完成后，剩下的就是遍历mynums以获取前100位数字。这大概是这样的:

int[] nums = int[100];
int idx = 0;
for(int i = 7; i >= 0; i--){
    int timesAdded = 0;
    for(int j = 16; j >=0 && timesAdded < 100; j--){
        for(int k = mynums[i][j]; k > 0; k--){
            nums[idx] += j;
            timesAdded++;
            idx++;
        }
    }
}

我需要注意的是，尽管上面的图看起来时间复杂度很高，但实际上它只有O(7*100)左右。

快速解释一下这是为了做什么: 从本质上讲，这个系统试图基于数字中数字的索引和数字的值来使用2d数组中的每个数字。它使用这些值作为索引来跟踪数组中插入了多少数值。当达到100时，它会关闭所有“较低的分支”。

这个算法的时间大概是O(十亿*log(16)*7)+O(100)。我可能是错的。此外，这很可能需要调试，因为它有点复杂，我只是把它写在我的头上。

编辑:没有解释的反对票是没有帮助的。如果你认为这个答案不正确，请留下评论。我很确定，StackOverflow甚至告诉你这样做，当你向下投票。

2013-10-08 23:53:16

编写一个程序，从一个包含10亿个数字的数组中找出100个最大的数字

推荐文章

最新文章

标签