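I recently came across a data structure known as a skip list. It seems to behave very similarly to a binary search tree.
Why would you ever want to use a skip list over a binary search tree?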
From the Wikipedia article you cited:
Θ(n) operations, which force us to visit every node in ascending order (such as printing the entire list) provide the opportunity to perform a behind-the-scenes derandomization of the level structure of the skip-list in an optimal way, bringing the skip list to O(log n) search time. [...] A skip list, upon which we have not recently performed [any such] Θ(n) operations, does not provide the same absolute worst-case performance guarantees as more traditional balanced tree data structures, because it is always possible (though with very low probability) that the coin-flips used to build the skip list will produce a badly balanced structure
EDIT: so it's a trade-off: skip lists use less memory, at the risk that they might degenerate into an unbalanced structure.
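To make the "derandomization" in the quote a bit more concrete, here is a minimal sketch of what an optimal level assignment could look like when every node is visited in ascending order. The function name `balanced_levels` and the exact rule (height from the trailing zero bits of the node's position) are my own illustration, not something taken from the Wikipedia article.

```python
def balanced_levels(n):
    """Return a tower height for each of n nodes visited in ascending order.

    Node i (1-indexed) gets height 1 + (number of trailing zero bits of i),
    so every 2nd node reaches level 2, every 4th node level 3, and so on.
    This is an assumed, illustrative rule for a perfectly balanced layout.
    """
    levels = []
    for i in range(1, n + 1):
        height = 1
        while i % 2 == 0:
            height += 1
            i //= 2
        levels.append(height)
    return levels


print(balanced_levels(8))  # [1, 2, 1, 3, 1, 2, 1, 4]
```

With heights laid out like this, a search halves the remaining range at each level, which is where the O(log n) bound comes from.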
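Skip lists are more amenable to concurrent access and modification. Herb Sutter wrote an article about data structures in concurrent environments; it has more in-depth information.
The most frequently used implementation of a binary search tree is a red-black tree. The concurrency problems come in when the tree is modified, because it often needs to be rebalanced. A rebalance operation can affect large portions of the tree, which would require a mutex lock on many of the tree nodes. Inserting a node into a skip list is far more localized: only the nodes directly linked to the affected node need to be locked.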
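To illustrate how localized a skip list insertion is, here is a minimal single-threaded sketch. The class and method names (`SkipList`, `_predecessors`, the `touched` return value) are hypothetical and only there for illustration; a concurrent version would lock, or CAS on, exactly the predecessor nodes that this version reports.

```python
import random


class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height  # next[i] is the successor at level i


class SkipList:
    MAX_HEIGHT = 16

    def __init__(self, seed=None):
        self.head = Node(None, self.MAX_HEIGHT)  # sentinel, holds no key
        self.rng = random.Random(seed)

    def _random_height(self):
        # Flip a fair coin until tails: height h has probability 2**-h.
        height = 1
        while height < self.MAX_HEIGHT and self.rng.random() < 0.5:
            height += 1
        return height

    def _predecessors(self, key):
        # preds[i] is the last node at level i whose key is < key.
        preds = [self.head] * self.MAX_HEIGHT
        node = self.head
        for level in reversed(range(self.MAX_HEIGHT)):
            while node.next[level] is not None and node.next[level].key < key:
                node = node.next[level]
            preds[level] = node
        return preds

    def contains(self, key):
        candidate = self._predecessors(key)[0].next[0]
        return candidate is not None and candidate.key == key

    def insert(self, key):
        """Insert key; return the nodes whose pointers were rewritten."""
        preds = self._predecessors(key)
        new = Node(key, self._random_height())
        touched = []
        for level in range(len(new.next)):
            # Only the predecessor at each level of the new tower changes.
            new.next[level] = preds[level].next[level]
            preds[level].next[level] = new
            touched.append(preds[level])
        return touched


sl = SkipList(seed=42)
for key in [50, 10, 30, 20, 40]:
    touched = sl.insert(key)
    print(key, "rewrote", len(touched), "pointer(s)")
```

The number of rewritten pointers equals the height of the new tower, which is 2 on average with a fair coin, and no other node in the list is modified.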
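Update from Jon Harrop's comments
I read Fraser and Harris's latest paper, "Concurrent Programming Without Locks". It is really good stuff if you're interested in lock-free data structures. The paper focuses on transactional memory and a theoretical operation, multiword compare-and-swap (MCAS). Both are simulated in software, as no hardware supports them yet. I'm fairly impressed that they were able to build MCAS in software at all.
I didn't find the transactional memory material particularly compelling, as it requires a garbage collector. Software transactional memory is also plagued with performance issues. However, I'd be very excited if hardware transactional memory ever became common. In the end it's still research and won't be of use for production code for another decade or so.
In section 8.2 they compare the performance of several concurrent tree implementations. I'll summarize their findings. It's worth downloading the PDF, as it has some very informative graphs on pages 50, 53, and 54.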
- Locking skip lists are insanely fast. They scale incredibly well with the number of concurrent accesses. This is what makes skip lists special; other lock-based data structures tend to croak under pressure.
- Lock-free skip lists are consistently faster than locking skip lists, but only barely.
- Transactional skip lists are consistently 2-3 times slower than the locking and non-locking versions.
- Locking red-black trees croak under concurrent access. Their performance degrades linearly with each new concurrent user. Of the two known locking red-black tree implementations, one essentially has a global lock during tree rebalancing. The other uses fancy (and complicated) lock escalation but still doesn't significantly outperform the global lock version.
- Lock-free red-black trees don't exist (no longer true, see Update).
- Transactional red-black trees are comparable with transactional skip lists. That was very surprising and very promising. Transactional memory, though slower, is far easier to write. It can be as easy as a quick search and replace on the non-concurrent version.
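Update: here is a paper about lock-free trees: "Lock-Free Red-Black Trees Using CAS". I haven't looked into it deeply, but on the surface it seems solid.
In practice I've found that B-tree performance on my projects has worked out to be better than that of skip lists. Skip lists do seem easier to understand, but implementing a B-tree is not that hard.
The one advantage I know of is that some clever people have worked out how to implement a lock-free concurrent skip list that uses only atomic operations. For example, Java 6 contains the ConcurrentSkipListMap class, and you can read its source code if you are so inclined.
But it's not too hard to write a concurrent B-tree variant either (I've seen it done by someone else). If you preemptively split and merge nodes "just in case" as you walk down the tree, then you won't have to worry about deadlocks and only ever need to hold a lock on two levels of the tree at a time. The synchronization overhead will be a bit higher, but the B-tree is probably faster.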
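As a rough illustration of the "split on the way down" idea, here is a single-threaded sketch of top-down B-tree insertion. The class names and the minimum-degree parameter `t` are my own assumptions, node merging on deletion is left out, and there are no actual locks here; the comments only mark where a concurrent variant would hold them.

```python
import bisect


class BTreeNode:
    def __init__(self, leaf=True):
        self.keys = []
        self.children = []
        self.leaf = leaf


class BTree:
    def __init__(self, t=2):
        self.t = t  # minimum degree: a node holds at most 2*t - 1 keys
        self.root = BTreeNode()

    def _is_full(self, node):
        return len(node.keys) == 2 * self.t - 1

    def _split_child(self, parent, index):
        # Split the full child parent.children[index] around its median key.
        t = self.t
        child = parent.children[index]
        right = BTreeNode(leaf=child.leaf)
        median = child.keys[t - 1]
        right.keys = child.keys[t:]
        child.keys = child.keys[:t - 1]
        if not child.leaf:
            right.children = child.children[t:]
            child.children = child.children[:t]
        parent.keys.insert(index, median)
        parent.children.insert(index + 1, right)

    def insert(self, key):
        if self._is_full(self.root):
            # Grow upward so the node we stand on is never full.
            old_root = self.root
            self.root = BTreeNode(leaf=False)
            self.root.children.append(old_root)
            self._split_child(self.root, 0)
        node = self.root
        while not node.leaf:
            i = bisect.bisect_right(node.keys, key)
            if self._is_full(node.children[i]):
                # Split "just in case": only node and this child are touched,
                # so a concurrent variant would lock just these two levels.
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        bisect.insort(node.keys, key)

    def in_order(self):
        def walk(node):
            if node.leaf:
                yield from node.keys
                return
            for i, k in enumerate(node.keys):
                yield from walk(node.children[i])
                yield k
            yield from walk(node.children[-1])
        return list(walk(self.root))


tree = BTree(t=2)
for k in [8, 3, 5, 1, 9, 2, 7]:
    tree.insert(k)
print(tree.in_order())  # [1, 2, 3, 5, 7, 8, 9]
```

Because a full child is always split before descending into it, a split never propagates back up the tree, which is why nothing above the current parent ever needs to be revisited.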
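Skip lists are implemented using lists.
Lock-free solutions exist for singly and doubly linked lists, but there are no lock-free solutions that directly use only CAS for any O(log n) data structure.
You can, however, use CAS-based lists to create skip lists.
(Note that MCAS, which is built using CAS, permits arbitrary data structures, and a proof-of-concept red-black tree has been created using MCAS.)
So, odd as they are, they turn out to be very useful :-)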
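Here is a minimal sketch of what a CAS-based sorted list insertion looks like. Python has no hardware compare-and-swap, so the `AtomicRef` class below is a stand-in that emulates one with a lock; that class, and every other name in the block, is an assumption made purely for illustration. Only insertion is shown, since deletion needs extra machinery (such as marking nodes) that is out of scope here.

```python
import threading


class AtomicRef:
    """Stand-in for a hardware CAS cell; the lock only emulates atomicity."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


class Node:
    def __init__(self, key, successor=None):
        self.key = key
        self.next = AtomicRef(successor)


def insert(head, key):
    """Insert key into the sorted list behind the sentinel head."""
    while True:
        pred = head
        curr = pred.next.get()
        while curr is not None and curr.key < key:
            pred, curr = curr, curr.next.get()
        node = Node(key, curr)
        # Publish the node only if pred still points at curr; otherwise
        # another thread changed this link, so search again and retry.
        if pred.next.compare_and_set(curr, node):
            return


def to_list(head):
    out, curr = [], head.next.get()
    while curr is not None:
        out.append(curr.key)
        curr = curr.next.get()
    return out


head = Node(None)  # sentinel, its key is never compared
for k in (5, 1, 3):
    insert(head, k)
print(to_list(head))  # [1, 3, 5]
```

A skip list built this way would repeat the same single-pointer CAS at each level of the new tower, starting from the bottom level.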
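Skip lists do have the advantage of lock striping. But the run time depends on how the level of a new node is decided, which is usually done using Random(). On a dictionary of 56,000 words, a skip list took more time than a splay tree, and the tree took more time than a hash table. Neither of the first two could match the hash table's run time. Also, the array of a hash table can be lock-striped for concurrent access too.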
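For readers who haven't seen lock striping applied to a hash table, here is a minimal sketch under some simplifying assumptions: a fixed number of buckets, no resizing, and a class name (`StripedHashMap`) that I made up for the example. Each lock guards a fixed subset of the buckets, so threads working on different stripes don't block each other.

```python
import threading


class StripedHashMap:
    def __init__(self, buckets=64, stripes=8):
        self._buckets = [[] for _ in range(buckets)]
        self._locks = [threading.Lock() for _ in range(stripes)]

    def _locate(self, key):
        index = hash(key) % len(self._buckets)
        # Bucket i is always guarded by lock i % stripes.
        return self._buckets[index], self._locks[index % len(self._locks)]

    def put(self, key, value):
        bucket, lock = self._locate(key)
        with lock:
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    bucket[i] = (key, value)
                    return
            bucket.append((key, value))

    def get(self, key, default=None):
        bucket, lock = self._locate(key)
        with lock:
            for k, v in bucket:
                if k == key:
                    return v
        return default


m = StripedHashMap()
m.put("skip", 1)
print(m.get("skip"))  # 1
```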
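Skip lists and similar ordered lists are used when locality of reference is needed. For example: finding the flights just before and just after a given date in an application.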
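The flight example is really a neighbour query on an ordered collection, which a hash table can't answer without scanning every key. Here is a small sketch using a plain sorted list and the standard `bisect` module in place of a skip list; the departure dates and the `neighbours` helper are made up for the example.

```python
import bisect
from datetime import date

# Hypothetical departure dates, kept sorted.
departures = [date(2024, 3, 1), date(2024, 3, 9), date(2024, 3, 15), date(2024, 4, 2)]


def neighbours(sorted_dates, when):
    """Return (latest date strictly before when, earliest date strictly after)."""
    left = bisect.bisect_left(sorted_dates, when)
    right = bisect.bisect_right(sorted_dates, when)
    before = sorted_dates[left - 1] if left > 0 else None
    after = sorted_dates[right] if right < len(sorted_dates) else None
    return before, after


print(neighbours(departures, date(2024, 3, 10)))
# (datetime.date(2024, 3, 9), datetime.date(2024, 3, 15))
```

Any ordered structure (skip list, balanced tree, sorted array) supports this kind of query in O(log n).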
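An in-memory binary search splay tree is great and more frequently used.
Skip List vs. Splay Tree vs. Hash Table: run time on dictionary find operations
First, you cannot fairly compare a randomized data structure with one that gives you worst-case guarantees.
A skip list is equivalent to a randomly balanced binary search tree (RBST), in the way that is explained in more detail in Dean and Jones' "Exploring the Duality Between Skip Lists and Binary Search Trees".
The other way around, you can also have deterministic skip lists that guarantee worst-case performance, cf. Munro et al.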
Contrary to what some claim above, you can have implementations of binary search trees (BST) that work well in concurrent programming. A potential problem with the concurrency-focused BSTs is that you can't easily get the same hard guarantees about balancing as you would from a red-black (RB) tree. (But "standard", i.e. randomized, skip lists don't give you these guarantees either.) There's a trade-off between maintaining balance at all times and good (and easy to program) concurrent access, so relaxed RB trees are usually used when good concurrency is desired. The relaxation consists in not rebalancing the tree right away. For a somewhat dated (1998) survey see Hanke's "The Performance of Concurrent Red-Black Tree Algorithms" [ps.gz].
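One of the more recent improvements is the so-called chromatic tree (basically you have some weight such that black would be 1 and red would be 0, but you also allow values in between). And how do chromatic trees fare against skip lists? Let's see what Brown et al., "A General Technique for Non-blocking Trees" (2014), have to say:
with 128 threads, our algorithm outperforms Java's non-blocking skiplist [...], the lock-based AVL tree of Bronson et al. by 63% to 224%, and a RBT that uses software transactional memory (STM) by 13 to 134 times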
EDIT to add: Pugh's lock-based skip list, which was benchmarked in Fraser and Harris (2007) "Concurrent Programming Without Locks" as coming close to their own lock-free version (a point amply insisted upon in the top answer here), is also tweaked for good concurrent operation, cf. Pugh's "Concurrent Maintenance of Skip Lists", although in a rather mild way. Nevertheless one newer/2009 paper "A Simple Optimistic skip-list Algorithm" by Herlihy et al., which proposes a supposedly simpler (than Pugh's) lock-based implementation of concurrent skip lists, criticized Pugh for not providing a proof of correctness convincing enough for them. Leaving aside this (maybe too pedantic) qualm, Herlihy et al. show that their simpler lock-based implementation of a skip list actually fails to scale as well as the JDK's lock-free implementation thereof, but only for high contention (50% inserts, 50% deletes and 0% lookups)... which Fraser and Harris didn't test at all; Fraser and Harris only tested 75% lookups, 12.5% inserts and 12.5% deletes (on skip list with ~500K elements). The simpler implementation of Herlihy et al. also comes close to the lock-free solution from the JDK in the case of low contention that they tested (70% lookups, 20% inserts, 10% deletes); they actually beat the lock-free solution for this scenario when they made their skip list big enough, i.e. going from 200K to 2M elements, so that the probability of contention on any lock became negligible. It would have been nice if Herlihy et al. had gotten over their hangup over Pugh's proof and tested his implementation too, but alas they didn't do that.
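EDIT2: I found a (2015 published) mother lode of all benchmarks: Gramoli's "More Than You Ever Wanted to Know about Synchronization. Synchrobench, Measuring the Impact of the Synchronization on Concurrent Algorithms". Here's an excerpted image relevant to this question.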
"Algo.4" is a precursor (older, 2011 version) of Brown et al.'s mentioned above. (I don't know how much better or worse the 2014 version is). "Algo.26" is Herlihy's mentioned above; as you can see it gets trashed on updates, and much worse on the Intel CPUs used here than on the Sun CPUs from the original paper. "Algo.28" is ConcurrentSkipListMap from the JDK; it doesn't do as well as one might have hoped compared to other CAS-based skip list implementations. The winners under high-contention are "Algo.2" a lock-based algorithm (!!) described by Crain et al. in "A Contention-Friendly Binary Search Tree" and "Algo.30" is the "rotating skiplist" from "Logarithmic data structures for multicores". "Algo.29" is the "No hot spot non-blocking skip list". Be advised that Gramoli is a co-author to all three of these winner-algorithm papers. "Algo.27" is the C++ implementation of Fraser's skip list.
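Gramoli's conclusion is that it's much easier to screw up a CAS-based concurrent tree implementation than it is to screw up a similar skip list. Based on the figures, it's hard to disagree. His explanation for this fact is: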
The difficulty in designing a tree that is lock-free stems from the difficulty of modifying multiple references atomically. Skip lists consist of towers linked to each other through successor pointers and in which each node points to the node immediately below it. They are often considered similar to trees because each node has a successor in the successor tower and below it, however, a major distinction is that the downward pointer is generally immutable hence simplifying the atomic modification of a node. This distinction is probably the reason why skip lists outperform trees under heavy contention as observed in Figure [above].
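Overcoming this difficulty was a key concern in Brown et al.'s recent work. They have a whole separate (2013) paper, "Pragmatic Primitives for Non-blocking Data Structures", on building multi-record LL/SC compound "primitives", which they call LLX/SCX, themselves implemented using (machine-level) CAS. Brown et al. used this LLX/SCX building block in their 2014 (but not in their 2011) concurrent tree implementation.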
I think it's perhaps also worth summarizing here the fundamental ideas of the "no hot spot"/contention-friendly (CF) skip list. It adapts an essential idea from the relaxed RB trees (and similar concurrency-friendly data structures): the towers are no longer built up immediately upon insertion, but delayed until there's less contention. Conversely, the deletion of a tall tower can create many contentions; this was observed as far back as Pugh's 1990 concurrent skip-list paper, which is why Pugh introduced pointer reversal on deletion (a tidbit that Wikipedia's page on skip lists still doesn't mention to this day, alas). The CF skip list takes this a step further and delays deleting the upper levels of a tall tower. Both kinds of delayed operations in CF skip lists are carried out by a (CAS based) separate garbage-collector-like thread, which its authors call the "adapting thread".
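The Synchrobench code (including all algorithms tested) is available at https://github.com/gramoli/synchrobench. The latest Brown et al. implementation (not included in the above) is available at http://www.cs.toronto.edu/~tabrown/chromatic/ConcurrentChromaticTreeMap.java. Does anyone have a machine with 32+ cores available? My point is that you can run these yourselves.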