我试图理解多处理相对于线程的优势。我知道多处理绕过了全局解释器锁,但是还有什么其他的优势,线程不能做同样的事情吗?
当前回答
Threading's job is to enable applications to be responsive. Suppose you have a database connection and you need to respond to user input. Without threading, if the database connection is busy the application will not be able to respond to the user. By splitting off the database connection into a separate thread you can make the application more responsive. Also because both threads are in the same process, they can access the same data structures - good performance, plus a flexible software design.
注意,由于GIL,应用程序实际上并没有同时做两件事,但我们所做的是将数据库上的资源锁放在一个单独的线程中,这样CPU时间就可以在它和用户交互之间切换。CPU时间在线程之间分配。
Multiprocessing is for times when you really do want more than one thing to be done at any given time. Suppose your application needs to connect to 6 databases and perform a complex matrix transformation on each dataset. Putting each job in a separate thread might help a little because when one connection is idle another one could get some CPU time, but the processing would not be done in parallel because the GIL means that you're only ever using the resources of one CPU. By putting each job in a Multiprocessing process, each can run on it's own CPU and run at full efficiency.
其他回答
线程共享相同的内存空间,以确保两个线程不共享相同的内存位置,因此必须采取特殊的预防措施。CPython解释器使用一种称为GIL的机制来处理这个问题,或全局解释器锁
什么是GIL(我只是想澄清GIL,上面重复了一次)?
在CPython中,全局解释器锁(GIL)是一个互斥锁,用于保护对Python对象的访问,防止多个线程同时执行Python字节码。这个锁是必要的,主要是因为CPython的内存管理不是线程安全的。
对于主要问题,我们可以使用用例,如何进行比较?
1-线程的用例:在GUI程序中,线程可以用来使应用程序具有响应性。例如,在文本编辑程序中,一个线程可以负责记录用户输入,另一个线程可以负责显示文本,第三个线程可以进行拼写检查,等等。在这里,程序必须等待用户交互。这是最大的瓶颈。线程的另一个用例是受IO限制或受网络限制的程序,例如web scraper。
2- Multiprocessing的用例:当程序是CPU密集型的,并且不需要做任何IO或用户交互的情况下,Multiprocessing优于线程。
要了解更多详细信息,请访问此链接和链接,或者您需要深入了解线程访问这里,多处理访问这里
正如问题中提到的,Python中的多处理是实现真正并行的唯一方法。多线程无法实现这一点,因为GIL阻止线程并行运行。
As a consequence, threading may not always be useful in Python, and in fact, may even result in worse performance depending on what you are trying to achieve. For example, if you are performing a CPU-bound task such as decompressing gzip files or 3D-rendering (anything CPU intensive) then threading may actually hinder your performance rather than help. In such a case, you would want to use Multiprocessing as only this method actually runs in parallel and will help distribute the weight of the task at hand. There could be some overhead to this since Multiprocessing involves copying the memory of a script into each subprocess which may cause issues for larger-sized applications.
然而,当您的任务是io绑定时,多线程就变得有用了。例如,如果您的大部分任务涉及等待api调用,那么您将使用多线程,因为为什么不在等待时在另一个线程中启动另一个请求,而不是让您的CPU无所事事。
博士TL;
多线程是并发的,用于io绑定的任务 Multiprocessing实现了真正的并行,用于cpu受限的任务
As I learnd in university most of the answers above are right. In PRACTISE on different platforms (always using python) spawning multiple threads ends up like spawning one process. The difference is the multiple cores share the load instead of only 1 core processing everything at 100%. So if you spawn for example 10 threads on a 4 core pc, you will end up getting only the 25% of the cpus power!! And if u spawn 10 processes u will end up with the cpu processing at 100% (if u dont have other limitations). Im not a expert in all the new technologies. Im answering with own real experience background
关键的优势是隔离。进程崩溃不会导致其他进程崩溃,而线程崩溃可能会对其他线程造成严重破坏。
另一件没有提到的事情是,它取决于你使用的是什么操作系统。在Windows中,进程是昂贵的,所以线程在Windows中会更好,但在unix中,进程比它们的Windows变体更快,所以在unix中使用进程要安全得多,而且生成速度快。
推荐文章
- 将Pandas或Numpy Nan替换为None以用于MysqlDB
- 使用pandas对同一列进行多个聚合
- 使用Python解析HTML
- django MultiValueDictKeyError错误,我如何处理它
- 如何在for循环期间修改列表条目?
- 我如何在Django中创建一个鼻涕虫?
- 没有名为'django.core.urlresolvers'的模块
- 蟒蛇导出环境文件
- Django - makemigrations -未检测到任何更改
- SQLAlchemy:引擎、连接和会话差异
- 在Python Pandas中删除多个列中的所有重复行
- 更改pandas DataFrame中的特定列名
- 将Pandas多索引转换为列
- 熊猫在每组中获得最高的n个记录
- 熊猫数据帧得到每组的第一行