I recently came across a data structure known as a skip list. It seems to have very similar behavior to a binary search tree.
Why would you ever want to use a skip list over a binary search tree?
Current answer
From the Wikipedia article you quoted:
Θ(n) operations, which force us to visit every node in ascending order (such as printing the entire list) provide the opportunity to perform a behind-the-scenes derandomization of the level structure of the skip-list in an optimal way, bringing the skip list to O(log n) search time. [...] A skip list, upon which we have not recently performed [any such] Θ(n) operations, does not provide the same absolute worst-case performance guarantees as more traditional balanced tree data structures, because it is always possible (though with very low probability) that the coin-flips used to build the skip list will produce a badly balanced structure
EDIT: So it's a trade-off: skip lists use less memory, at the risk that they may degenerate into a badly balanced structure.
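To make the coin-flip mechanism behind that risk concrete, here is a minimal sketch of the level-generation step most skip-list implementations use; `MAX_LEVEL` and `P` are assumed tuning parameters, not taken from any particular library:

```java
import java.util.concurrent.ThreadLocalRandom;

public class SkipListLevels {
    static final int MAX_LEVEL = 32;   // assumed cap on tower height
    static final double P = 0.5;       // the classic "coin flip" probability

    // Each inserted node gets a tower height drawn from a geometric
    // distribution: roughly 50% of nodes stay at level 1, 25% reach level 2,
    // and so on. Nothing rebalances afterwards, which is why an unlucky run
    // of coin flips can leave the structure poorly balanced.
    static int randomLevel() {
        int level = 1;
        while (level < MAX_LEVEL && ThreadLocalRandom.current().nextDouble() < P) {
            level++;
        }
        return level;
    }
}
```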
Other answers
Also, in addition to the answers already given (ease of implementation combined with performance comparable to a balanced tree), I find that implementing ordered traversal (forwards and backwards) is far simpler, because a skip list effectively contains a sorted linked list in its implementation.
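As a rough illustration of that point (the `Node` fields and the sentinel convention here are my own assumptions, not any specific implementation), ordered traversal is just a walk along the bottom-level list:

```java
// The bottom level of a skip list is an ordinary sorted linked list, so
// forward (and, in the doubly linked variant, backward) in-order traversal
// is a plain list walk: no stack or parent pointers as a BST would need.
public class SkipListTraversal {
    static class Node {
        final int key;
        Node next;  // level-0 successor
        Node prev;  // level-0 predecessor, if the implementation keeps one
        Node(int key) { this.key = key; }
    }

    // 'head' and 'tail' are assumed sentinel nodes bracketing the bottom level.
    static void printAscending(Node head, Node tail) {
        for (Node n = head.next; n != tail; n = n.next) {
            System.out.println(n.key);
        }
    }

    static void printDescending(Node head, Node tail) {
        for (Node n = tail.prev; n != head; n = n.prev) {
            System.out.println(n.key);
        }
    }
}
```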
Skip lists do have the advantage of lock striping. However, the run time depends on how the level of a new node is decided, which is usually done with Random(). On a dictionary of 56,000 words, the skip list took more time than a splay tree, and the splay tree took more time than a hash table; the first two could not match the hash table's run time. Also, the hash table's array can be lock-striped for concurrent access as well.
Skip lists and similar ordered lists are used when locality of reference is needed, for example finding the flights immediately before and after a given date in an application.
An in-memory binary-search splay tree is great and is also used more frequently.
Skip list vs splay tree vs hash table run times for the dictionary lookup op
Skip lists are implemented using lists.
Lock-free solutions exist for singly and doubly linked lists, but there are no lock-free solutions that directly use only CAS for any O(log n) data structure.
You can, however, use CAS-based lists to create skip lists.
(Note that MCAS, which is built using CAS, permits arbitrary data structures, and a proof-of-concept red-black tree has been created using MCAS.)
So, odd as they are, they turn out to be very useful :-)
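To sketch what "CAS-based lists as a building block" might look like, here is an insert-only sorted linked list whose links are updated with a single CAS per splice; the class and method names are illustrative, and deletion (which needs Harris-style marked pointers) is deliberately omitted:

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the CAS-based list building block underlying lock-free skip
// lists: an insert-only sorted singly linked list. Real lock-free lists
// also need marked pointers for deletion; that is left out to keep the
// single-CAS splice visible.
public class LockFreeSortedList {
    static final class Node {
        final int key;
        final AtomicReference<Node> next;
        Node(int key, Node next) {
            this.key = key;
            this.next = new AtomicReference<>(next);
        }
    }

    private final Node head = new Node(Integer.MIN_VALUE, null); // sentinel

    public boolean insert(int key) {
        while (true) {
            // Find pred/curr such that pred.key < key <= curr.key.
            Node pred = head;
            Node curr = pred.next.get();
            while (curr != null && curr.key < key) {
                pred = curr;
                curr = curr.next.get();
            }
            if (curr != null && curr.key == key) {
                return false;              // already present
            }
            Node node = new Node(key, curr);
            // One CAS splices the node in; retry if a concurrent insert
            // changed pred.next in the meantime.
            if (pred.next.compareAndSet(curr, node)) {
                return true;
            }
        }
    }
}
```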
First, you cannot fairly compare a randomized data structure with one that gives you worst-case guarantees.
A skip list is equivalent to a randomly balanced binary search tree (RBST) in the way explained in more detail in Dean and Jones' "Exploring the Duality Between Skip Lists and Binary Search Trees".
The other way around, you can also have deterministic skip lists which guarantee worst-case performance; cf. Munro et al.
Contrary to what some claim above, you can have implementations of binary search trees (BST) that work well in concurrent programming. A potential problem with the concurrency-focused BSTs is that you can't easily get the same hard guarantees about balancing as you would from a red-black (RB) tree. (But "standard", i.e. randomized, skip lists don't give you these guarantees either.) There's a trade-off between maintaining balancing at all times and good (and easy to program) concurrent access, so relaxed RB trees are usually used when good concurrency is desired. The relaxation consists in not re-balancing the tree right away. For a somewhat dated (1998) survey see Hanke's "The Performance of Concurrent Red-Black Tree Algorithms" [ps.gz].
One of the more recent improvements is the so-called chromatic tree (basically you have some weight such that black would be 1 and red would be 0, but you also allow values in between). And how does a chromatic tree fare against skip lists? Let's see what Brown et al.'s "A General Technique for Non-blocking Trees" (2014) has to say:
with 128 threads, our algorithm outperforms Java's non-blocking skiplist, Bronson et al.'s lock-based AVL tree by 63% to 224%, and an RBT that uses software transactional memory (STM) by 13 to 134 times
EDIT to add: Pugh's lock-based skip list, which was benchmarked in Fraser and Harris (2007) "Concurrent Programming Without Locks" as coming close to their own lock-free version (a point amply insisted upon in the top answer here), is also tweaked for good concurrent operation, cf. Pugh's "Concurrent Maintenance of Skip Lists", although in a rather mild way. Nevertheless, a newer (2009) paper, "A Simple Optimistic Skiplist Algorithm" by Herlihy et al., which proposes a supposedly simpler (than Pugh's) lock-based implementation of concurrent skip lists, criticized Pugh for not providing a proof of correctness convincing enough for them. Leaving aside this (maybe too pedantic) qualm, Herlihy et al. show that their simpler lock-based implementation of a skip list actually fails to scale as well as the JDK's lock-free implementation thereof, but only under high contention (50% inserts, 50% deletes and 0% lookups)... which Fraser and Harris didn't test at all; Fraser and Harris tested only 75% lookups, 12.5% inserts and 12.5% deletes (on a skip list with ~500K elements). The simpler implementation of Herlihy et al. also comes close to the lock-free solution from the JDK in the low-contention case that they tested (70% lookups, 20% inserts, 10% deletes); they actually beat the lock-free solution for this scenario when they made their skip list big enough, i.e. going from 200K to 2M elements, so that the probability of contention on any lock became negligible. It would have been nice if Herlihy et al. had gotten over their hangup about Pugh's proof and tested his implementation too, but alas they didn't do that.
EDIT 2: I found the motherlode of all benchmarks, Gramoli's (2015) "More Than You Ever Wanted to Know about Synchronization. Synchrobench, Measuring the Impact of the Synchronization on Concurrent Algorithms". Here's an excerpted figure relevant to this question.
"Algo.4" is a precursor (older, 2011 version) of Brown et al.'s mentioned above. (I don't know how much better or worse the 2014 version is). "Algo.26" is Herlihy's mentioned above; as you can see it gets trashed on updates, and much worse on the Intel CPUs used here than on the Sun CPUs from the original paper. "Algo.28" is ConcurrentSkipListMap from the JDK; it doesn't do as well as one might have hoped compared to other CAS-based skip list implementations. The winners under high-contention are "Algo.2" a lock-based algorithm (!!) described by Crain et al. in "A Contention-Friendly Binary Search Tree" and "Algo.30" is the "rotating skiplist" from "Logarithmic data structures for multicores". "Algo.29" is the "No hot spot non-blocking skip list". Be advised that Gramoli is a co-author to all three of these winner-algorithm papers. "Algo.27" is the C++ implementation of Fraser's skip list.
Gramoli's conclusion is that it's much easier to mess up a CAS-based concurrent tree implementation than it is to mess up a similar skip list. And judging from the figures, it's hard to disagree. His explanation for this fact is:
The difficulty in designing a tree that is lock-free stems from the difficulty of modifying multiple references atomically. Skip lists consist of towers linked to each other through successor pointers and in which each node points to the node immediately below it. They are often considered similar to trees because each node has a successor in the successor tower and below it, however, a major distinction is that the downward pointer is generally immutable hence simplifying the atomic modification of a node. This distinction is probably the reason why skip lists outperform trees under heavy contention as observed in Figure [above].
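A tiny sketch of the structural point being made here (field names are my own, purely illustrative): in a skip-list tower node, only the successor link ever needs an atomic update, while the downward link can stay immutable:

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustration of Gramoli's observation: in a skip-list tower, only the
// successor pointer is ever retargeted, so it is the only link that needs
// CAS; the downward pointer can be declared final once the tower is built.
final class TowerNode<K> {
    final K key;
    final TowerNode<K> down;                    // immutable downward link
    final AtomicReference<TowerNode<K>> next;   // the one mutable, CAS-updated link

    TowerNode(K key, TowerNode<K> down, TowerNode<K> next) {
        this.key = key;
        this.down = down;
        this.next = new AtomicReference<>(next);
    }
}
```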
Overcoming this difficulty was a key concern in Brown et al.'s more recent work. They have a whole separate (2013) paper, "Pragmatic Primitives for Non-blocking Data Structures", on building multi-record LL/SC compound "primitives", which they call LLX/SCX, themselves implemented using (machine-level) CAS. Brown et al. used this LLX/SCX building block in their 2014 (but not in their 2011) concurrent tree implementation.
I think it's perhaps also worth summarizing here the fundamental ideas of the "no hot spot"/contention-friendly (CF) skip list. It adapts an essential idea from the relaxed RB trees (and similar concurrency-friendly data structures): the towers are no longer built up immediately upon insertion, but delayed until there's less contention. Conversely, the deletion of a tall tower can create a lot of contention; this was observed as far back as Pugh's 1990 concurrent skip-list paper, which is why Pugh introduced pointer reversal on deletion (a tidbit that Wikipedia's page on skip lists still doesn't mention to this day, alas). The CF skip list takes this a step further and delays deleting the upper levels of a tall tower. Both kinds of delayed operations in CF skip lists are carried out by a separate (CAS-based) garbage-collector-like thread, which its authors call the "adapting thread".
The Synchrobench code (including all the algorithms tested) is available at https://github.com/gramoli/synchrobench. The latest Brown et al. implementation (not included in the above) is available at http://www.cs.toronto.edu/~tabrown/chromatic/ConcurrentChromaticTreeMap.java. Does anyone have a 32+ core machine available? My point is that you can run these yourselves.
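For readers who just want to try the JDK structure referenced repeatedly above ("Algo.28"), here is a small usage example of java.util.concurrent.ConcurrentSkipListMap; the flight-lookup data echoes the locality-of-reference example earlier and is of course made up:

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class SkipListMapDemo {
    public static void main(String[] args) {
        // The JDK's lock-free skip-list map: a sorted, concurrent map, so it
        // supports the ordered/range queries a hash map cannot.
        ConcurrentNavigableMap<Integer, String> flights = new ConcurrentSkipListMap<>();
        flights.put(20240101, "AA100");
        flights.put(20240105, "BA200");
        flights.put(20240110, "LH300");

        // "Flights next and previous to a date" style queries:
        System.out.println(flights.ceilingEntry(20240103)); // 20240105=BA200
        System.out.println(flights.floorEntry(20240103));   // 20240101=AA100
        System.out.println(flights.headMap(20240110));      // {20240101=AA100, 20240105=BA200}
    }
}
```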