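A long time ago I wrote a web spider that I multithreaded so that concurrent requests could happen at the same time. That was in my Python youth, before I knew about the GIL and the woes it creates for multithreaded code (i.e., most of the time things just end up serialized!)...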

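I'd like to rework this code to make it more robust and perform better. There are basically two ways I could do this: I could use the new multiprocessing module in 2.6+, or I could go for some kind of reactor/event-based model. I would rather do the latter, since it's simpler and less error-prone.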

So the question is which framework would best suit my needs. Here is a list of the options I know about so far:

Twisted: The granddaddy of Python reactor frameworks. Seems complex and a bit bloated, however. Steep learning curve for a small task.

Eventlet: From the guys at lindenlab. Greenlet-based framework that's geared towards these kinds of tasks. I had a look at the code though and it's not too pretty: non-PEP8 compliant, scattered with prints (why do people do this in a framework!?), and the API seems a little inconsistent.

PyEv: Immature; there doesn't seem to be anyone using it right now, though it is based on libevent so it's got a solid backend.

asyncore: From the stdlib. Über low-level; seems like a lot of legwork is involved just to get something off the ground.

tornado: Though this is a server-oriented product designed to serve dynamic websites, it does feature an async HTTP client and a simple ioloop. Looks like it could get the job done, but that's not what it was intended for. [edit: doesn't run on Windows unfortunately, which counts it out for me - it's a requirement for me to support this lame platform]

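Is there anything I have missed? Surely there must be a library out there that hits the sweet spot of a simplified async networking library!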

[edit: big thanks to intgr for his pointer to this page. If you scroll to the bottom you will see there is a really nice list of projects that aim to tackle this task in one way or another. It seems that things have indeed moved on since the inception of Twisted: people now seem to favour a coroutine-based solution rather than a traditional reactor / callback-oriented one. The benefit of this approach is clearer, more direct code: I've certainly found in the past, especially when working with boost.asio in C++, that callback-based code can lead to designs that are hard to follow and relatively obscure to the untrained eye. Using coroutines allows you to write code that looks a little more synchronous at least. I guess now my task is to work out which one of these many libraries I like the look of and give it a go! Glad I asked now...]
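To make the callback-versus-coroutine contrast concrete, here is a minimal sketch. It uses asyncio, which is not one of the libraries discussed in this thread; it is swapped in only because it ships with current Python. `fake_fetch` is a hypothetical stand-in for a real HTTP request, and both functions assume the URLs are unique.

```python
import asyncio


async def fake_fetch(url):
    # Hypothetical stand-in for a network request: wait briefly, return a fake body.
    await asyncio.sleep(0.01)
    return "<html>%s</html>" % url


async def crawl_with_callbacks(urls):
    # Callback style: each result is delivered to a separate handler, and we
    # have to track by hand when the last one has arrived.
    loop = asyncio.get_running_loop()
    finished = loop.create_future()
    sizes = {}

    def make_handler(url):
        def handle(task):
            sizes[url] = len(task.result())
            if len(sizes) == len(urls):  # assumes unique URLs
                finished.set_result(sizes)
        return handle

    tasks = []  # keep references so the tasks are not garbage collected
    for url in urls:
        task = loop.create_task(fake_fetch(url))
        task.add_done_callback(make_handler(url))
        tasks.append(task)
    if not urls:
        finished.set_result(sizes)
    return await finished


async def crawl_with_coroutines(urls):
    # Coroutine style: the same work reads top to bottom, like synchronous code.
    bodies = await asyncio.gather(*(fake_fetch(url) for url in urls))
    return {url: len(body) for url, body in zip(urls, bodies)}


if __name__ == "__main__":
    demo_urls = ["http://example.com/a", "http://example.com/b"]
    print(asyncio.run(crawl_with_callbacks(demo_urls)))
    print(asyncio.run(crawl_with_coroutines(demo_urls)))
```

Both versions run the fetches concurrently and return the same mapping; the difference is only in how much bookkeeping the caller has to write.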

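[edit: perhaps of interest to anyone who follows or stumbles on this question, or cares about this topic in any sense: I found a really great write-up of the current state of the available tools for this job]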


Current answer

gevent is eventlet cleaned up.

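API-wise, it follows the same conventions as the standard library (in particular, the threading and multiprocessing modules) where it makes sense. So you have familiar things like Queue and Event to work with.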
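For reference, the standard-library conventions in question look roughly like the sketch below, which uses only `queue.Queue` and `threading.Event`. The worker body is a placeholder rather than a real download, and the function names are made up for illustration. gevent ships `gevent.queue.Queue` and `gevent.event.Event`; how closely they mirror every method used here is an assumption that should be checked against the gevent release in use.

```python
import queue
import threading


def worker(tasks, results, stop):
    # Pull URLs until the stop event is set; the timeout lets us re-check it.
    while not stop.is_set():
        try:
            url = tasks.get(timeout=0.05)
        except queue.Empty:
            continue
        results.put((url, len(url)))  # placeholder "work": record the URL length
        tasks.task_done()


def run_workers(urls, n_workers=3):
    tasks, results = queue.Queue(), queue.Queue()
    stop = threading.Event()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, stop))
        for _ in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for url in urls:
        tasks.put(url)
    tasks.join()   # blocks until task_done() has been called for every item
    stop.set()     # tell the idle workers to exit
    for thread in threads:
        thread.join()
    collected = {}
    while not results.empty():
        url, size = results.get()
        collected[url] = size
    return collected


if __name__ == "__main__":
    print(run_workers(["http://example.com/a", "http://example.com/bb"]))
```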

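It only supports libevent (update: libev since 1.0) as the reactor implementation but takes full advantage of it, featuring a fast WSGI server based on libevent-http and resolving DNS queries through libevent-dns, as opposed to using a thread pool like most other libraries do. (update: since 1.0, c-ares is used to make async DNS queries; a thread pool is also an option.)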

Like eventlet, it makes callbacks and Deferreds unnecessary by using greenlets.

Check out the examples: concurrent download of multiple URLs, long polling webchat.

Other answers

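Kamaelia hasn't been mentioned yet. Its concurrency model is based on wiring together components, with message passing between inboxes and outboxes. Here's a brief overview.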
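The inbox/outbox idea can be illustrated in plain Python. The sketch below is not Kamaelia's API; every class and function name in it is hypothetical. It only shows the general shape: generator-based components that read from an inbox, write to an outbox, and are linked together by a tiny round-robin scheduler.

```python
from collections import deque


class Component:
    """Hypothetical base class: one inbox, one outbox, a generator body."""

    def __init__(self):
        self.inbox = deque()
        self.outbox = deque()

    def main(self):
        raise NotImplementedError


class Producer(Component):
    """Emits one item per scheduler step."""

    def __init__(self, items):
        super().__init__()
        self.items = list(items)

    def main(self):
        for item in self.items:
            self.outbox.append(item)
            yield


class Upper(Component):
    """Upper-cases every message it finds in its inbox."""

    def main(self):
        while True:
            while self.inbox:
                self.outbox.append(self.inbox.popleft().upper())
            yield


def run_pipeline(components, steps):
    # Step each component once per round, then move messages from each
    # component's outbox to the inbox of the next component in the chain.
    bodies = [component.main() for component in components]
    for _ in range(steps):
        for body in bodies:
            next(body, None)  # finished generators are simply skipped
        for src, dst in zip(components, components[1:]):
            while src.outbox:
                dst.inbox.append(src.outbox.popleft())
    return list(components[-1].outbox)


if __name__ == "__main__":
    print(run_pipeline([Producer(["a", "b"]), Upper()], steps=4))
```

The components never call each other directly; they only see their own queues, which is what makes them easy to rewire.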

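There's a good book on the subject: "Twisted Network Programming Essentials" by Abe Fettig. The examples show how to write very Pythonic code and, to me personally, do not come across as based on a bloated framework. Look at the solutions in the book; if they aren't clean, then I don't know what clean means.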

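My only concern is the same one I have with other frameworks, like Ruby. I worry: does it scale up? I would hate to commit a client to a framework that is going to have scalability problems.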

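None of these solutions will avoid the fact that the GIL prevents CPU parallelism; they are just better ways of getting the IO parallelism that you already have with threads. If you think you can do better IO, then by all means pursue one of these, but if your bottleneck is in processing the results, nothing here will help except the multiprocessing module.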
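A quick way to see the point about threads already providing IO parallelism is the sketch below. `fake_download` is a hypothetical stand-in that sleeps instead of reading a socket; like real blocking IO, `time.sleep` releases the GIL while it waits. `concurrent.futures` is used for brevity and is an assumption about the Python version available (it is not in 2.6).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url, delay=0.1):
    # Stand-in for a blocking socket read; the GIL is released while sleeping.
    time.sleep(delay)
    return url, delay


def download_all(urls, max_workers=8):
    # Run the blocking calls on a thread pool and time the whole batch.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fake_download, urls))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    demo_urls = ["http://example.com/%d" % i for i in range(5)]
    results, elapsed = download_all(demo_urls)
    print("%d downloads in %.2fs" % (len(results), elapsed))
```

The waits overlap, so the batch should take roughly one delay rather than five. If the expensive part is parsing or otherwise crunching the downloaded data, the same module's `ProcessPoolExecutor` (built on multiprocessing) is the place to look instead.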

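You may also try Syncless. It's coroutine-based (so it's similar to Concurrence, Eventlet and gevent). It implements drop-in non-blocking replacements for socket.socket, socket.gethostbyname (etc.), ssl.SSLSocket, time.sleep and select.select. It's fast. It needs Stackless Python and libevent. It contains a mandatory Python extension written in C (Pyrex/Cython).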