I think both are doing the same job; how do you decide which one to use for synchronization?
The Theory
In theory, when a thread tries to lock a mutex and does not succeed, because the mutex is already locked, it will go to sleep, immediately allowing another thread to run. It will continue to sleep until being woken up, which will be the case once the mutex is unlocked by whatever thread was holding the lock before. When a thread tries to lock a spinlock and does not succeed, it will continuously re-try locking it until it finally succeeds; thus it will not allow another thread to take its place (however, the operating system will forcefully switch to another thread once the current thread's CPU runtime quantum has been exceeded, of course).
The Problem
The problem with mutexes is that putting threads to sleep and waking them up again are both rather expensive operations: they need quite a lot of CPU instructions and thus also take some time. If the mutex was only locked for a very short amount of time, the time spent putting a thread to sleep and waking it up again might exceed the time the thread actually slept by far, and it might even exceed the time the thread would have wasted by constantly polling a spinlock. On the other hand, polling a spinlock constantly wastes CPU time, and if the lock is held for a longer amount of time, this wastes a lot more CPU time than it would have cost to put the thread to sleep instead.
The Solution
Using spinlocks on a single-core/single-CPU system usually makes no sense, since as long as the spinlock polling is blocking the only available CPU core, no other thread can run, and since no other thread can run, the lock won't be unlocked either. In other words, a spinlock wastes only CPU time on those systems for no real benefit. If the thread had been put to sleep instead, another thread could have run at once, possibly unlocking the lock and then allowing the first thread to continue processing once it woke up again.
On a multi-core/multi-CPU system, where plenty of locks are held for a very short amount of time only, the time wasted constantly putting threads to sleep and waking them up again might decrease runtime performance noticeably. When using spinlocks instead, threads get the chance to take advantage of their full runtime quantum (always only blocking for a very short time period, but then immediately continuing their work), leading to much higher processing throughput.
The Practice
Since very often programmers cannot know in advance whether a mutex or a spinlock will be better (e.g. because the number of CPU cores of the target architecture is unknown), nor can operating systems know whether a certain piece of code has been optimized for single-core or multi-core environments, most systems don't strictly distinguish between mutexes and spinlocks. In fact, most modern operating systems have hybrid mutexes and hybrid spinlocks. What does that actually mean?
A hybrid mutex behaves like a spinlock at first on a multi-core system. If a thread cannot lock the mutex, it won't be put to sleep immediately, since the mutex might get unlocked pretty soon; instead, the mutex will first behave exactly like a spinlock. Only if the lock has still not been obtained after a certain amount of time (or number of retries, or any other measuring factor) is the thread really put to sleep. If the same code runs on a system with only a single core, the mutex will not spin, though, as, see above, that would not be beneficial.
A hybrid spinlock behaves like a normal spinlock at first, but to avoid wasting too much CPU time, it may have a back-off strategy. It will usually not put the thread to sleep (since you don't want that to happen when using a spinlock), but it may decide to stop the thread, either immediately or after a certain amount of time (this is called "yielding"), and allow another thread to run, thus increasing the chances that the spinlock gets unlocked (you still pay the cost of a thread switch, but not the cost of putting a thread to sleep and waking it up again).
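To make the spin-then-sleep idea concrete, here is a minimal sketch built on pthreads; `hybrid_lock` and `SPIN_LIMIT` are illustrative names, and real hybrid mutexes (such as glibc's adaptive mutex type) implement the spinning inside the mutex itself rather than in a wrapper:

```c
#include <pthread.h>

#define SPIN_LIMIT 1000  /* arbitrary retry budget before giving up and sleeping */

void hybrid_lock(pthread_mutex_t *m)
{
    /* Phase 1: behave like a spinlock for a bounded number of attempts. */
    for (int i = 0; i < SPIN_LIMIT; i++) {
        if (pthread_mutex_trylock(m) == 0)
            return;  /* acquired the lock while spinning */
    }
    /* Phase 2: stop spinning and let the OS put this thread to sleep. */
    pthread_mutex_lock(m);
}
```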
Summary
If in doubt, use mutexes. They are usually the better choice, and most modern systems will allow them to spin for a very short amount of time if that seems beneficial. Using spinlocks can sometimes improve performance, but only under certain conditions, and the fact that you are in doubt rather tells me that you are not currently working on any project where a spinlock might be beneficial. You might consider using your own "lock object" that can use either a spinlock or a mutex internally (e.g. this behavior could be configurable when creating such an object; see the sketch below). Initially use mutexes everywhere, and if you think that using a spinlock somewhere might really help, give it a try and compare the results (e.g. using a profiler), but be sure to test both cases, a single-core and a multi-core system, before you jump to conclusions (and possibly different operating systems, if your code will be cross-platform).
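A minimal sketch of such a configurable lock object, assuming pthreads; the names `lock_t`, `lock_init`, `lock_acquire`, and `lock_release` are hypothetical:

```c
#include <pthread.h>

typedef struct {
    int use_spin;                  /* chosen once, when the lock is created */
    union {
        pthread_mutex_t    mutex;
        pthread_spinlock_t spin;
    } impl;
} lock_t;

void lock_init(lock_t *l, int use_spin)
{
    l->use_spin = use_spin;
    if (use_spin)
        pthread_spin_init(&l->impl.spin, PTHREAD_PROCESS_PRIVATE);
    else
        pthread_mutex_init(&l->impl.mutex, NULL);
}

void lock_acquire(lock_t *l)
{
    if (l->use_spin)
        pthread_spin_lock(&l->impl.spin);
    else
        pthread_mutex_lock(&l->impl.mutex);
}

void lock_release(lock_t *l)
{
    if (l->use_spin)
        pthread_spin_unlock(&l->impl.spin);
    else
        pthread_mutex_unlock(&l->impl.mutex);
}
```

This keeps the choice in one place, so switching a hot path from mutex to spinlock (or back) after profiling is a one-line change at the creation site.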
Update: iOS Warning
Actually not iOS-specific, but iOS is the platform where most developers may face this problem: if your system has a thread scheduler that does not guarantee that every thread, no matter how low its priority may be, will eventually get a chance to run, then spinlocks can lead to permanent deadlocks. The iOS scheduler distinguishes different classes of threads, and threads of a lower class will only run if no thread of a higher class wants to run as well. There is no back-off strategy for this, so if you permanently have high-class threads available, low-class threads will never get any CPU time and thus never any chance to perform any work.
The problem appears as follows: your code obtains a spinlock in a low-priority-class thread, and while it is in the middle of that lock, its time quantum is exceeded and it stops running. The only way this spinlock can be released again is if that low-priority thread gets CPU time again, but this is not guaranteed to happen. You may have a couple of high-priority threads that constantly want to run, and the task scheduler will always prioritize those. One of them may run across the spinlock and try to obtain it, which isn't possible of course, and the system will make it yield. The problem is: a thread that has yielded is immediately available for running again! Since it has a higher priority than the thread holding the lock, the thread holding the lock has no chance to get CPU runtime; either some other thread will get runtime, or the thread that just yielded.
Why does this problem not occur with mutexes? When the high-priority thread cannot obtain the mutex, it won't yield; it may spin a bit, but will eventually be sent to sleep. A sleeping thread is not available for running until it is woken up by an event, e.g. the mutex it has been waiting for being unlocked. Apple is aware of this problem and has deprecated OSSpinLock as a result. The new lock is called os_unfair_lock. This lock avoids the situation mentioned above, as it is aware of the different thread priority classes. If you are sure that using spinlocks is a good idea in your iOS project, use that one (see the sketch below). Stay away from OSSpinLock, and under no circumstances implement your own spinlocks on iOS! If in doubt, use a mutex. macOS is not affected by this issue, as it has a different thread scheduler that won't allow any thread (even low-priority ones) to "run dry" on CPU time; still, the same situation can arise there and will then lead to very poor performance, which is why OSSpinLock is deprecated on macOS as well.
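Basic os_unfair_lock usage looks like this; the protected counter is just an illustrative example:

```c
#include <os/lock.h>

static os_unfair_lock counter_lock = OS_UNFAIR_LOCK_INIT;
static int shared_counter = 0;

void increment_counter(void)
{
    /* The lock records its owner, so the kernel can resolve the
       priority inversion described above instead of letting a
       high-priority waiter starve the low-priority owner. */
    os_unfair_lock_lock(&counter_lock);
    shared_counter++;  /* critical section */
    os_unfair_lock_unlock(&counter_lock);
}
```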
Continuing Mecki's suggestion, the article "pthread mutex vs pthread spinlock" on Alexander Sandler's blog, Alex on Linux, shows how to implement the spinlock and mutex variants with an #ifdef to test the behavior, roughly as in the sketch below.
But be sure to make the final call based on your own observations; understand that the cited example is an isolated case, and your project's requirements and environment may be completely different.
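A sketch of that #ifdef technique; the macro names are illustrative, and the idea is to compile the same program twice (e.g. once with -DUSE_SPINLOCK) and compare timings:

```c
#include <pthread.h>

#ifdef USE_SPINLOCK
static pthread_spinlock_t lock;
#define LOCK_INIT() pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE)
#define LOCK()      pthread_spin_lock(&lock)
#define UNLOCK()    pthread_spin_unlock(&lock)
#else
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
#define LOCK_INIT() ((void)0)
#define LOCK()      pthread_mutex_lock(&lock)
#define UNLOCK()    pthread_mutex_unlock(&lock)
#endif

/* Worker threads then wrap the critical section in LOCK()/UNLOCK(),
   and the two builds can be compared with time(1) or a profiler on
   both single-core and multi-core machines. */
```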
Also note that in certain environments and under certain conditions (such as on Windows when running at IRQL >= DISPATCH_LEVEL), you cannot use a mutex but must use a spinlock instead. On Unix, the same applies.
Here is the equivalent question on the competing StackExchange Unix site: https://unix.stackexchange.com/questions/5107/why-are-spin-locks-good-choices-in-linux-kernel-design-instead-of-something-more
Information on scheduling on Windows systems: http://download.microsoft.com/download/e/b/a/eba1050f-a31d-436b-9281-92cdfeae4b45/IRQL_thread.doc
The spinlock and mutex synchronization mechanisms are both very common nowadays.
Let's consider the spinlock first.
Basically, it is a busy-wait operation, which means we have to wait until the specified lock is released before we can proceed with the next operation. Conceptually simple, but implementing it is not: for example, what should we do if the lock has not been released yet but the thread holding it has been swapped out and put to sleep? And how do two threads synchronize on the lock itself when they request access at the same time? A minimal sketch of the core mechanism follows.
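Here is a minimal busy-wait sketch using C11 atomics (the type and function names are illustrative); the atomic test-and-set is what settles simultaneous requests, since exactly one thread observes the flag as previously clear:

```c
#include <stdatomic.h>

typedef struct { atomic_flag held; } spinlock_sketch_t;
/* initialize with: spinlock_sketch_t l = { ATOMIC_FLAG_INIT }; */

void spin_lock_sketch(spinlock_sketch_t *l)
{
    /* Atomically set the flag; if it was already set, somebody else
       holds the lock, so keep retrying (busy-wait, never sleeps). */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
}

void spin_unlock_sketch(spinlock_sketch_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```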
Usually, the most intuitive idea is to handle synchronization through a variable that protects the critical section. The concept of a mutex is similar, but they are still different. The key point is CPU utilization: a spinlock burns CPU time while waiting, so we can summarize the differences between the two as follows:
In a homogeneous multi-core environment, use a spinlock if the time to be spent in the critical section is small, since we save the context-switch time. (The single-core case matters less, since some systems implement spinlocks that switch out in the middle.)
On Windows, using a spinlock raises the thread's IRQL to DISPATCH_LEVEL, which may not be allowed in some cases, so in those cases we have to use a mutex (APC_LEVEL) instead; see the kernel-mode sketch below.
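For reference, the standard Windows kernel-mode pattern looks roughly like this (a sketch; `g_Lock` and the function name are illustrative):

```c
#include <ntddk.h>

static KSPIN_LOCK g_Lock;  /* initialize once with KeInitializeSpinLock(&g_Lock) */

void ExampleCriticalSection(void)
{
    KIRQL oldIrql;
    /* Acquiring the spin lock raises the IRQL to DISPATCH_LEVEL, so
       the code inside must not wait or touch pageable memory. */
    KeAcquireSpinLock(&g_Lock, &oldIrql);
    /* ... critical section ... */
    KeReleaseSpinLock(&g_Lock, oldIrql);  /* restores the previous IRQL */
}
```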
Mecki's answer is quite accurate. However, on a single processor it may make sense to use a spinlock when the task is waiting for the lock to be given by an interrupt service routine. The interrupt transfers control to the ISR, which readies the resource for use by the waiting task. Before handing control back to the interrupted task, it will finish by releasing the lock. The spinning task will then find the spinlock available and proceed.
"Using spinlocks on a single-core/single-CPU system usually makes no sense, since as long as the spinlock polling is blocking the only available CPU core, no other thread can run, and since no other thread can run, the lock won't be unlocked either. In other words, a spinlock wastes only CPU time on those systems for no real benefit."
This is wrong. Using spinlocks on uniprocessor systems does not waste CPU cycles, because once a process takes a spin lock, preemption is disabled; consequently, there can be no one else spinning! It just doesn't make any sense to use one there! That is why spinlocks on uniprocessor systems are replaced by preempt_disable by the kernel at compile time, roughly as sketched below!
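A rough sketch (not the actual Linux kernel source) of that compile-time substitution: on a non-SMP build there is no other CPU to spin against, so disabling preemption alone makes the critical section exclusive:

```c
/* Hypothetical simplification of the uniprocessor case. */
#ifndef CONFIG_SMP
#define spin_lock(lock)   preempt_disable()  /* no spinning: nobody else to spin against */
#define spin_unlock(lock) preempt_enable()
#endif
```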
The rule for using spinlocks is simple: use a spinlock if and only if the real time the lock is held is bounded and sufficiently small.
Note that usually user-implemented spinlocks do not satisfy this requirement, because they do not disable interrupts. Unless preemption is disabled, being preempted while holding a spinlock violates the bounded-time requirement.
"Sufficiently small" is a judgement call and depends on context.
Exception: some kernel programming must use a spinlock even when the time is not bounded. In particular, if a CPU has no work to do, it has no choice but to spin until more work turns up.
Special danger: in low-level programming, take great care when multiple interrupt priorities exist (usually there is at least one non-maskable interrupt). Here, higher-priority pre-emptions can run even if interrupts at the thread's priority are disabled (such as priority hardware services, often related to virtual memory management). Provided a strict priority separation is maintained, the condition of bounded real time must be relaxed and replaced with bounded system time at that priority level. Note that in this case not only can the lock holder be pre-empted, but the spinner can also be interrupted; this is generally not a problem, because there is nothing you can do about it.
Spinlocks can actually perform very poorly on NUMA machines. The issue is easy to understand and very hard to fix (without switching to a mutex, that is). Consider a spinlock that lives in DRAM "near" core A, and threads on A and B contending for that lock. Assume that B is remote from this DRAM. As we all know, this means that memory accesses by A will be 5x or so faster than memory accesses by B, because B's accesses will need to traverse the bus of the NUMA chip, while A's accesses are local and hence avoid that bus traversal.
In practical terms, A's spin logic will run 5x or more faster than B's. Yes, they contend, and B does disrupt A, but the impact is asymmetric: when A wins the next race to access the lock, it gets local loads and stores, and hence will spin at a much higher instruction rate. B, meanwhile, is doing slow remote loads and stores, so B spins in slow motion.
The upshot, as we observed in our work on Derecho, is that we get a massively unfair spinlock. A is strongly favored over B, and it can take B a very long time to get the lock.
How would you observe this? In our case, we use LibFabrics, and that library has a few threads that get scattered over multiple cores. Within the LibFabric logic, A and B spin to lock and then check a completion queue associated with RDMA hardware. So the effect is that A gets to check this queue 5x more often than B. In cases where an action by B is needed (the completed operation at the head of that queue is owned by B), A effectively starves B of access, slowing down LibFabrics in an extreme way that snowballed to greatly impact our Derecho code. We've seen cases where A's access is so strongly favored that B might wait as long as 10ms for the lock, even though under uncontended conditions B would grab this lock in 0.2us. So the effect can be quite extreme.
The conclusion? Don't even consider using spinlocks on a NUMA system, because your threads may (1) be on different NUMA cores and (2) have different locality relative to the DRAM in which the spinlock is allocated; you will see massive performance problems! And (3) when using a third-party library with multiple threads, keep in mind that its authors may not have read this discussion and may get this wrong!