有没有什么情况下你更喜欢O(log n)时间复杂度而不是O(1)时间复杂度?还是O(n)到O(log n)

你能举个例子吗?


当前回答

假设您正在嵌入式系统上实现一个黑名单,其中0到1,000,000之间的数字可能被列入黑名单。这就给你留下了两个选择:

使用1,000,000位的bitset 使用黑名单整数的排序数组,并使用二进制搜索来访问它们

对bitset的访问将保证常量访问。从时间复杂度来看,它是最优的。从理论和实践的角度来看(它是O(1),常量开销极低)。

不过,你可能更喜欢第二种解决方案。特别是如果您希望黑名单整数的数量非常小,因为这样内存效率更高。

即使您不为内存稀缺的嵌入式系统开发,我也可以将任意限制从1,000,000增加到1,000,000,000,000,并提出相同的论点。那么bitset将需要大约125G的内存。保证最坏情况复杂度为O(1)可能无法说服您的老板为您提供如此强大的服务器。

在这里,我强烈倾向于二叉搜索(O(log n))或二叉树(O(log n))而不是O(1)位集。在实践中,最坏情况复杂度为O(n)的哈希表可能会击败所有这些算法。

其他回答

我在这里的回答是,在随机矩阵的所有行的快速随机加权选择是一个例子,当m不是太大时,复杂度为O(m)的算法比复杂度为O(log(m))的算法更快。

我很惊讶没有人提到内存绑定应用程序。

可能存在一种算法具有较少的浮点运算,这要么是因为它的复杂性(即O(1) < O(log n)),要么是因为复杂度前面的常数更小(即2n2 < 6n2)。无论如何,如果较低的FLOP算法的内存限制更大,您可能仍然更喜欢具有更多FLOP的算法。

我所说的“内存受限”是指您经常访问的数据经常超出缓存。为了获取这些数据,在对其执行操作之前,必须将内存从实际内存空间拉到缓存中。这个抓取步骤通常非常慢——比您的操作本身慢得多。

因此,如果你的算法需要更多的操作(但这些操作是在已经在缓存中的数据上执行的[因此不需要读取]),它仍然会在实际的walltime方面以更少的操作(必须在缓存外的数据上执行[因此需要读取])胜过你的算法。

对于安全应用程序来说,这经常是这样的情况,我们希望设计算法缓慢的问题,以阻止某人过快地获得问题的答案。

这里有几个我能想到的例子。

Password hashing is sometimes made arbitrarily slow in order to make it harder to guess passwords by brute-force. This Information Security post has a bullet point about it (and much more). Bit Coin uses a controllably slow problem for a network of computers to solve in order to "mine" coins. This allows the currency to be mined at a controlled rate by the collective system. Asymmetric ciphers (like RSA) are designed to make decryption without the keys intentionally slow in order to prevent someone else without the private key to crack the encryption. The algorithms are designed to be cracked in hopefully O(2^n) time where n is the bit-length of the key (this is brute force).

在CS的其他地方,快速排序在最坏的情况下是O(n²),但在一般情况下是O(n*log(n))。因此,在分析算法效率时,“大O”分析有时并不是您唯一关心的事情。

选择大O复杂度高的算法而不是大O复杂度低的算法的原因有很多:

most of the time, lower big-O complexity is harder to achieve and requires skilled implementation, a lot of knowledge and a lot of testing. big-O hides the details about a constant: algorithm that performs in 10^5 is better from big-O point of view than 1/10^5 * log(n) (O(1) vs O(log(n)), but for most reasonable n the first one will perform better. For example the best complexity for matrix multiplication is O(n^2.373) but the constant is so high that no (to my knowledge) computational libraries use it. big-O makes sense when you calculate over something big. If you need to sort array of three numbers, it matters really little whether you use O(n*log(n)) or O(n^2) algorithm. sometimes the advantage of the lowercase time complexity can be really negligible. For example there is a data structure tango tree which gives a O(log log N) time complexity to find an item, but there is also a binary tree which finds the same in O(log n). Even for huge numbers of n = 10^20 the difference is negligible. time complexity is not everything. Imagine an algorithm that runs in O(n^2) and requires O(n^2) memory. It might be preferable over O(n^3) time and O(1) space when the n is not really big. The problem is that you can wait for a long time, but highly doubt you can find a RAM big enough to use it with your algorithm parallelization is a good feature in our distributed world. There are algorithms that are easily parallelizable, and there are some that do not parallelize at all. Sometimes it makes sense to run an algorithm on 1000 commodity machines with a higher complexity than using one machine with a slightly better complexity. in some places (security) a complexity can be a requirement. No one wants to have a hash algorithm that can hash blazingly fast (because then other people can bruteforce you way faster) although this is not related to switch of complexity, but some of the security functions should be written in a manner to prevent timing attack. They mostly stay in the same complexity class, but are modified in a way that it always takes worse case to do something. One example is comparing that strings are equal. In most applications it makes sense to break fast if the first bytes are different, but in security you will still wait for the very end to tell the bad news. somebody patented the lower-complexity algorithm and it is more economical for a company to use higher complexity than to pay money. some algorithms adapt well to particular situations. Insertion sort, for example, has an average time-complexity of O(n^2), worse than quicksort or mergesort, but as an online algorithm it can efficiently sort a list of values as they are received (as user input) where most other algorithms can only efficiently operate on a complete list of values.

Yes.

在实际情况下,我们运行了一些使用短字符串和长字符串键进行表查找的测试。

我们使用了std::map, std::unordered_map和一个哈希,该哈希最多对字符串长度进行10次采样(我们的键倾向于guidlike,所以这是体面的),以及一个哈希,对每个字符进行采样(理论上减少了冲突),一个未排序的向量,其中我们进行==比较,以及(如果我没记错的话)一个未排序的向量,其中我们还存储了一个哈希,首先比较哈希,然后比较字符。

这些算法的范围从O(1) (unordered_map)到O(n)(线性搜索)。

对于中等大小的N,通常O(N)优于O(1)。我们怀疑这是因为基于节点的容器需要我们的计算机在内存中跳跃更多,而基于线性的容器则不需要。

O(lgn)存在于两者之间。我不记得是怎么回事了。

性能差异并不大,在更大的数据集上,基于哈希的表现要好得多。所以我们坚持使用基于哈希的无序映射。

实际上,对于合理的n大小,O(lgn)为O(1)。如果你的计算机在你的表中只有40亿的空间,那么O(lgn)的上界是32。(lg(2^32)=32)(在计算机科学中,lg是log based 2的简称)。

在实践中,lg(n)算法比O(1)算法慢,不是因为对数增长因子,而是因为lg(n)部分通常意味着算法有一定程度的复杂性,并且这种复杂性比lg(n)项中的任何“增长”都增加了更大的常数因子。

然而,复杂的O(1)算法(如哈希映射)很容易具有类似或更大的常数因子。