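A long time ago I wrote a web spider, which I multithreaded so that requests could run concurrently. That was in my Python youth, before I knew about the GIL and the trouble it causes for multithreaded code (i.e., most of the time things just end up serialized!)...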

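I'd like to rework this code to make it more robust and better performing. There are basically two ways I could do this: I could use the new multiprocessing module in 2.6+, or I could go for some kind of reactor / event-based model. I would rather do the latter, since it is simpler and less error-prone.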

So the question is which framework would best suit my needs. Here are the options I know about so far:

Twisted: The granddaddy of Python reactor frameworks. Seems complex and a bit bloated, however. Steep learning curve for a small task.
Eventlet: From the guys at lindenlab. A greenlet-based framework that's geared towards these kinds of tasks. I had a look at the code though and it's not too pretty: non-PEP 8 compliant, scattered with prints (why do people do this in a framework!?), and the API seems a little inconsistent.
PyEv: Immature, and nobody seems to be using it right now, though it is based on libevent so it's got a solid backend.
asyncore: From the stdlib. Über low-level; seems like a lot of legwork is involved just to get something off the ground.
tornado: Though this is a server-oriented product designed to serve dynamic websites, it does feature an async HTTP client and a simple ioloop. Looks like it could get the job done, but that is not what it was intended for. [edit: doesn't run on Windows unfortunately, which counts it out for me - it's a requirement for me to support this lame platform]

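Is there anything I have missed? Surely there must be a library out there that hits the sweet spot of a simplified async networking library!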

[edit: big thanks to intgr for his pointer to this page. If you scroll to the bottom you will see there is a really nice list of projects that aim to tackle this task in one way or another. It seems that things have indeed moved on since the inception of Twisted: people now seem to favour a coroutine-based solution rather than a traditional reactor / callback-oriented one. The benefit of this approach is clearer, more direct code: I've certainly found in the past, especially when working with boost.asio in C++, that callback-based code can lead to designs that are hard to follow and relatively obscure to the untrained eye. Using coroutines lets you write code that at least looks a little more synchronous. I guess now my task is to work out which one of these many libraries I like the look of and give it a go! Glad I asked now...]
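To illustrate what "looks a little more synchronous" means, here is a minimal sketch of the coroutine style. It uses asyncio from the modern standard library, which postdates this question and is not one of the libraries discussed here; `fetch`, `crawl` and `run` are hypothetical names, and `asyncio.sleep` stands in for a real non-blocking HTTP request.

```python
import asyncio


async def fetch(url, delay=0.1):
    """Pretend to download `url`; the sleep stands in for network IO."""
    await asyncio.sleep(delay)  # yields to the event loop while "waiting"
    return "body of %s" % url


async def crawl(urls):
    """Start all fetches at once and wait for every one of them."""
    # gather() returns the results in the same order as the inputs.
    return await asyncio.gather(*(fetch(u) for u in urls))


def run(urls):
    """Synchronous entry point that drives the event loop."""
    return asyncio.run(crawl(urls))
```

Each coroutine reads top to bottom like blocking code, with no callbacks to chain, yet all the requests wait concurrently on a single thread.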

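[edit: perhaps of interest to anyone who followed or stumbled on this question, or who cares about this topic in any sense: I found a really great writeup of the current state of the tools available for this job]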


Current answer

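You are welcome to have a look at PyWorks, which takes a quite different approach. It lets object instances run in their own thread and makes function calls to those objects asynchronous.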

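Just make a class inherit from Task instead of object and it becomes asynchronous: every method call goes through a proxy. Return values (if you need them) are Future proxies.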

res = obj.method(args)
# code continues here without waiting for method to finish
do_something_else()
print("Result = %d" % res)  # Code will block here if res is not calculated yet

PyWorks can be found at http://bitbucket.org/raindog/pyworks
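For readers without PyWorks at hand, the general idea (an object whose method calls run on its own thread and hand back futures) can be sketched with the standard library alone. This is not the PyWorks API: `ActiveObject` and `Adder` are hypothetical names, and unlike the Future proxies described above, a `concurrent.futures.Future` has to be asked for its value explicitly with `.result()`.

```python
from concurrent.futures import ThreadPoolExecutor


class ActiveObject:
    """Hypothetical wrapper (not PyWorks' Task): runs the wrapped
    object's methods on one dedicated worker thread."""

    def __init__(self, target):
        self._target = target
        # A single worker means calls on this object run one at a time.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def __getattr__(self, name):
        # Only reached for names not defined on the wrapper itself.
        method = getattr(self._target, name)

        def proxy(*args, **kwargs):
            # Queue the call and return a Future immediately.
            return self._executor.submit(method, *args, **kwargs)

        return proxy

    def close(self):
        """Wait for queued calls to finish and stop the worker thread."""
        self._executor.shutdown(wait=True)


class Adder:
    def add(self, a, b):
        return a + b


obj = ActiveObject(Adder())
res = obj.add(2, 3)   # returns at once; the work happens on the worker thread
value = res.result()  # blocks here if the result is not ready yet
obj.close()
```

Because each object owns exactly one worker thread, its methods never run concurrently with each other, which avoids locking inside the object at the cost of serializing calls to it.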

Other answers

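There is a good book on this subject: "Twisted Network Programming Essentials" by Abe Fettig. The examples show how to write very Pythonic code, and to me personally they do not look like they are built on a bloated framework. Look at the solutions in the book; if they aren't clean, then I don't know what clean means.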

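My only doubt is the same one I have about other frameworks, such as Ruby's. I worry: does it scale up? I would hate to commit a client to a framework that is going to have scalability problems.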

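None of these solutions gets around the fact that the GIL prevents CPU parallelism: they are just a better way of getting the IO parallelism that threads already give you. If you think you can do IO better, by all means pursue one of them, but if your bottleneck is in processing the results, nothing here will help you except the multiprocessing module.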
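A minimal sketch of the first half of that claim, assuming a blocking call that releases the GIL while it waits: `time.sleep` plays the part of a network request here, and `fake_fetch`, `fetch_all` and `timed` are hypothetical helper names. It uses `concurrent.futures`, which arrived in the standard library after this question was asked.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url, delay=0.2):
    """Stand-in for a blocking network call.

    time.sleep releases the GIL, as real blocking socket IO does,
    so other threads can run while this one waits.
    """
    time.sleep(delay)
    return url, len(url)


def fetch_all(urls, workers=4):
    """Run the blocking calls on a small pool of threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the inputs.
        return list(pool.map(fake_fetch, urls))


def timed(fn, *args):
    """Return (result, elapsed seconds) for one call of fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

With four URLs and four workers, the waits overlap, so the total time is close to one delay rather than four. Replacing the sleep with a pure-Python computation would lose that overlap, because only one thread can execute Python bytecode at a time; that is the case where separate processes are needed.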

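wizzer is a tiny asynchronous socket framework that uses pyev. It is very fast, mostly because of pyev. It tries to provide a similar interface, with a few slight changes.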

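You may also want to try Syncless. It is coroutine-based (so it is similar to Concurrence, Eventlet and gevent). It implements drop-in non-blocking replacements for socket.socket, socket.gethostbyname (etc.), ssl.SSLSocket, time.sleep and select.select. It is fast. It needs Stackless Python and libevent. It contains a mandatory Python extension written in C (Pyrex/Cython).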
