有没有什么情况下你更喜欢O(log n)时间复杂度而不是O(1)时间复杂度?还是O(n)到O(log n)
你能举个例子吗?
有没有什么情况下你更喜欢O(log n)时间复杂度而不是O(1)时间复杂度?还是O(n)到O(log n)
你能举个例子吗?
当前回答
假设您正在嵌入式系统上实现一个黑名单,其中0到1,000,000之间的数字可能被列入黑名单。这就给你留下了两个选择:
使用1,000,000位的bitset 使用黑名单整数的排序数组,并使用二进制搜索来访问它们
对bitset的访问将保证常量访问。从时间复杂度来看,它是最优的。从理论和实践的角度来看(它是O(1),常量开销极低)。
不过,你可能更喜欢第二种解决方案。特别是如果您希望黑名单整数的数量非常小,因为这样内存效率更高。
即使您不为内存稀缺的嵌入式系统开发,我也可以将任意限制从1,000,000增加到1,000,000,000,000,并提出相同的论点。那么bitset将需要大约125G的内存。保证最坏情况复杂度为O(1)可能无法说服您的老板为您提供如此强大的服务器。
在这里,我强烈倾向于二叉搜索(O(log n))或二叉树(O(log n))而不是O(1)位集。在实践中,最坏情况复杂度为O(n)的哈希表可能会击败所有这些算法。
其他回答
有很多很好的答案,其中一些提到了常量因素,输入大小和内存限制,以及许多其他原因,复杂性只是一个理论指导原则,而不是最终决定现实世界是否适合给定的目的或速度。
这里有一个简单而具体的例子来说明这些想法。假设我们想要找出一个数组是否有重复的元素。简单的二次型方法是编写一个嵌套循环:
const hasDuplicate = arr => { 对于(设I = 0;I < arrr .length;我+ +){ For(令j = I + 1;J < arrr .length;j + +) { If (arr[i] === arr[j]) { 返回true; } } } 返回错误; }; console.log(hasDuplicate([1,2,3,4])); console.log(hasDuplicate([1,2,4,4]));
但这可以通过创建一组数据结构(即删除重复项),然后将其大小与数组的长度进行比较,在线性时间内完成:
const hasDuplicate = arr => new Set(arr)。== arrr .length; console.log(hasDuplicate([1,2,3,4])); console.log(hasDuplicate([1,2,4,4]));
大O告诉我们,从时间复杂性的角度来看,新的Set方法将更好地扩展。
然而,事实证明,“天真的”二次元方法有很多大O不能解释的:
没有额外的内存占用 没有堆内存分配(没有新的) 临时Set没有垃圾收集 早期的救助;在已知副本可能位于数组前面的情况下,不需要检查多个元素。
如果我们的用例是在有限的小数组上,我们有一个资源受限的环境和/或其他已知的常见情况属性,允许我们通过基准测试建立嵌套循环在特定工作负载上更快,这可能是一个好主意。
另一方面,也许可以预先创建一次集合并重复使用,在所有查找中摊销其开销成本。
这不可避免地导致可维护性/可读性/优雅性和其他“软”成本。在这种情况下,新的Set()方法可能更具可读性,但通常(如果不是更多的话)要获得更好的复杂性需要付出巨大的工程成本。
创建和维护持久的、有状态的Set结构可能会带来bug、内存/缓存压力、代码复杂性和所有其他设计权衡方式。最优地协商这些权衡是软件工程的一个重要部分,而时间复杂性只是帮助指导这个过程的一个因素。
我还没有看到其他一些例子:
In real-time environments, for example resource-constrained embedded systems, sometimes complexity sacrifices are made (typically related to caches and memory or scheduling) to avoid incurring occasional worst-case penalties that can't be tolerated because they might cause jitter. Also in embedded programming, the size of the code itself can cause cache pressure, impacting memory performance. If an algorithm has worse complexity but will result in massive code size savings, that might be a reason to choose it over an algorithm that's theoretically better. In most implementations of recursive linearithmic algorithms like quicksort, when the array is small enough, a quadratic sorting algorithm like insertion sort is often called because the overhead of recursive function calls on increasingly tiny arrays tends to outweigh the cost of nested loops. Insertion sort is also fast on mostly-sorted arrays as the inner loop won't run much. This answer discusses this in an older version of Chrome's V8 engine before they moved to Timsort.
选择大O复杂度高的算法而不是大O复杂度低的算法的原因有很多:
most of the time, lower big-O complexity is harder to achieve and requires skilled implementation, a lot of knowledge and a lot of testing. big-O hides the details about a constant: algorithm that performs in 10^5 is better from big-O point of view than 1/10^5 * log(n) (O(1) vs O(log(n)), but for most reasonable n the first one will perform better. For example the best complexity for matrix multiplication is O(n^2.373) but the constant is so high that no (to my knowledge) computational libraries use it. big-O makes sense when you calculate over something big. If you need to sort array of three numbers, it matters really little whether you use O(n*log(n)) or O(n^2) algorithm. sometimes the advantage of the lowercase time complexity can be really negligible. For example there is a data structure tango tree which gives a O(log log N) time complexity to find an item, but there is also a binary tree which finds the same in O(log n). Even for huge numbers of n = 10^20 the difference is negligible. time complexity is not everything. Imagine an algorithm that runs in O(n^2) and requires O(n^2) memory. It might be preferable over O(n^3) time and O(1) space when the n is not really big. The problem is that you can wait for a long time, but highly doubt you can find a RAM big enough to use it with your algorithm parallelization is a good feature in our distributed world. There are algorithms that are easily parallelizable, and there are some that do not parallelize at all. Sometimes it makes sense to run an algorithm on 1000 commodity machines with a higher complexity than using one machine with a slightly better complexity. in some places (security) a complexity can be a requirement. No one wants to have a hash algorithm that can hash blazingly fast (because then other people can bruteforce you way faster) although this is not related to switch of complexity, but some of the security functions should be written in a manner to prevent timing attack. They mostly stay in the same complexity class, but are modified in a way that it always takes worse case to do something. One example is comparing that strings are equal. In most applications it makes sense to break fast if the first bytes are different, but in security you will still wait for the very end to tell the bad news. somebody patented the lower-complexity algorithm and it is more economical for a company to use higher complexity than to pay money. some algorithms adapt well to particular situations. Insertion sort, for example, has an average time-complexity of O(n^2), worse than quicksort or mergesort, but as an online algorithm it can efficiently sort a list of values as they are received (as user input) where most other algorithms can only efficiently operate on a complete list of values.
A more general question is if there are situations where one would prefer an O(f(n)) algorithm to an O(g(n)) algorithm even though g(n) << f(n) as n tends to infinity. As others have already mentioned, the answer is clearly "yes" in the case where f(n) = log(n) and g(n) = 1. It is sometimes yes even in the case that f(n) is polynomial but g(n) is exponential. A famous and important example is that of the Simplex Algorithm for solving linear programming problems. In the 1970s it was shown to be O(2^n). Thus, its worse-case behavior is infeasible. But -- its average case behavior is extremely good, even for practical problems with tens of thousands of variables and constraints. In the 1980s, polynomial time algorithms (such a Karmarkar's interior-point algorithm) for linear programming were discovered, but 30 years later the simplex algorithm still seems to be the algorithm of choice (except for certain very large problems). This is for the obvious reason that average-case behavior is often more important than worse-case behavior, but also for a more subtle reason that the simplex algorithm is in some sense more informative (e.g. sensitivity information is easier to extract).
当n很小时,O(1)总是很慢。
给已经好的答案锦上添花。一个实际的例子是postgres数据库中的哈希索引和b树索引。
哈希索引形成一个哈希表索引来访问磁盘上的数据,而btree顾名思义使用的是btree数据结构。
大O时间是O(1) vs O(logN)
目前不鼓励在postgres中使用哈希索引,因为在现实生活中,特别是在数据库系统中,实现无冲突的哈希是非常困难的(可能导致O(N)最坏情况的复杂性),正因为如此,使它们具有崩溃安全性就更加困难了(在postgres中称为提前写日志- WAL)。
在这种情况下进行这种权衡,因为O(logN)对于索引来说已经足够好了,而实现O(1)非常困难,而且时间差并不重要。