我试图理解多处理相对于线程的优势。我知道多处理绕过了全局解释器锁,但是还有什么其他的优势,线程不能做同样的事情吗?


当前回答

正如问题中提到的,Python中的多处理是实现真正并行的唯一方法。多线程无法实现这一点,因为GIL阻止线程并行运行。

As a consequence, threading may not always be useful in Python, and in fact, may even result in worse performance depending on what you are trying to achieve. For example, if you are performing a CPU-bound task such as decompressing gzip files or 3D-rendering (anything CPU intensive) then threading may actually hinder your performance rather than help. In such a case, you would want to use Multiprocessing as only this method actually runs in parallel and will help distribute the weight of the task at hand. There could be some overhead to this since Multiprocessing involves copying the memory of a script into each subprocess which may cause issues for larger-sized applications.

然而,当您的任务是io绑定时,多线程就变得有用了。例如,如果您的大部分任务涉及等待api调用,那么您将使用多线程,因为为什么不在等待时在另一个线程中启动另一个请求,而不是让您的CPU无所事事。

博士TL;

多线程是并发的,用于io绑定的任务 Multiprocessing实现了真正的并行,用于cpu受限的任务

其他回答

正如问题中提到的,Python中的多处理是实现真正并行的唯一方法。多线程无法实现这一点,因为GIL阻止线程并行运行。

As a consequence, threading may not always be useful in Python, and in fact, may even result in worse performance depending on what you are trying to achieve. For example, if you are performing a CPU-bound task such as decompressing gzip files or 3D-rendering (anything CPU intensive) then threading may actually hinder your performance rather than help. In such a case, you would want to use Multiprocessing as only this method actually runs in parallel and will help distribute the weight of the task at hand. There could be some overhead to this since Multiprocessing involves copying the memory of a script into each subprocess which may cause issues for larger-sized applications.

然而,当您的任务是io绑定时,多线程就变得有用了。例如,如果您的大部分任务涉及等待api调用,那么您将使用多线程,因为为什么不在等待时在另一个线程中启动另一个请求,而不是让您的CPU无所事事。

博士TL;

多线程是并发的,用于io绑定的任务 Multiprocessing实现了真正的并行,用于cpu受限的任务

As I learnd in university most of the answers above are right. In PRACTISE on different platforms (always using python) spawning multiple threads ends up like spawning one process. The difference is the multiple cores share the load instead of only 1 core processing everything at 100%. So if you spawn for example 10 threads on a 4 core pc, you will end up getting only the 25% of the cpus power!! And if u spawn 10 processes u will end up with the cpu processing at 100% (if u dont have other limitations). Im not a expert in all the new technologies. Im answering with own real experience background

以下是我想到的一些优点和缺点。

多处理

Pros

独立的内存空间 代码通常很简单 利用多个cpu和核 避免了cPython的GIL限制 消除了对同步原语的大部分需求,除非您使用共享内存(相反,它更像是IPC的通信模型) 子进程是可中断/可杀死的 Python多处理模块包含有用的抽象,其接口类似于线程。线程 必须使用cPython进行cpu绑定处理

Cons

IPC有点复杂,开销更大(通信模型vs.共享内存/对象) 更大的内存占用

线程

Pros

轻量级——低内存占用 共享内存-使访问状态从另一个上下文更容易 允许您轻松地创建响应式ui 正确释放GIL的cPython C扩展模块将并行运行 对于I/ o约束应用程序来说是一个很好的选择

Cons

cPython -服从GIL 不是可中断/ killable 如果不遵循命令队列/消息泵模型(使用queue模块),则必须手动使用同步原语(需要对锁定的粒度进行决策) 代码通常更难理解和正确编写——竞争条件的可能性急剧增加

进程可能有多个线程。这些线程可以共享内存,并且是进程中的执行单元。

进程运行在CPU上,因此线程驻留在每个进程之下。进程是独立运行的独立实体。如果您想在每个进程之间共享数据或状态,您可以使用内存存储工具,如缓存(redis, memcache),文件或数据库。

Threading's job is to enable applications to be responsive. Suppose you have a database connection and you need to respond to user input. Without threading, if the database connection is busy the application will not be able to respond to the user. By splitting off the database connection into a separate thread you can make the application more responsive. Also because both threads are in the same process, they can access the same data structures - good performance, plus a flexible software design.

注意,由于GIL,应用程序实际上并没有同时做两件事,但我们所做的是将数据库上的资源锁放在一个单独的线程中,这样CPU时间就可以在它和用户交互之间切换。CPU时间在线程之间分配。

Multiprocessing is for times when you really do want more than one thing to be done at any given time. Suppose your application needs to connect to 6 databases and perform a complex matrix transformation on each dataset. Putting each job in a separate thread might help a little because when one connection is idle another one could get some CPU time, but the processing would not be done in parallel because the GIL means that you're only ever using the resources of one CPU. By putting each job in a Multiprocessing process, each can run on it's own CPU and run at full efficiency.