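A long time ago I wrote a web spider that I multithreaded so that requests could run concurrently. That was in my Python youth, before I knew about the GIL and the trouble it causes for multithreaded code (i.e., most of the time everything ends up serialized anyway!)...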

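I'd like to rework this code to make it more robust and better performing. There are basically two ways I could do this: I could use the new multiprocessing module in 2.6+, or I could go for some kind of reactor / event-based model. I would rather do the latter, since it is simpler and less error-prone.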

So the question is which framework would be best suited to my needs. Here are the options I know about so far:

Twisted: The granddaddy of Python reactor frameworks. It seems complex and a bit bloated, however, with a steep learning curve for a small task.

Eventlet: From the guys at lindenlab. A greenlet-based framework that's geared towards these kinds of tasks. I had a look at the code though and it's not too pretty: non-PEP8 compliant, scattered with prints (why do people do this in a framework!?), and the API seems a little inconsistent.

PyEv: Immature, and there doesn't seem to be anyone using it right now, though it is based on libevent so it's got a solid backend.

asyncore: From the stdlib. Über low-level; seems like a lot of legwork is involved just to get something off the ground.

tornado: Though this is a server-oriented product designed to serve dynamic websites, it does feature an async HTTP client and a simple ioloop. Looks like it could get the job done, but that's not what it was intended for. [edit: doesn't run on Windows unfortunately, which counts it out for me - it's a requirement for me to support this lame platform]

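Is there anything I have missed? Surely there must be a library out there that hits the sweet spot of a simplified async networking library!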

[edit: big thanks to intgr for his pointer to this page. If you scroll to the bottom you will see there is a really nice list of projects that aim to tackle this task in one way or another. It seems that things have indeed moved on since the inception of Twisted: people now seem to favour a co-routine based solution rather than a traditional reactor / callback oriented one. The benefit of this approach is clearer, more direct code: I've certainly found in the past, especially when working with boost.asio in C++, that callback-based code can lead to designs that are hard to follow and relatively obscure to the untrained eye. Using co-routines allows you to write code that looks a little more synchronous at least. I guess now my task is to work out which one of these many libraries I like the look of and give it a go! Glad I asked now...]
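
To make the contrast between the two styles concrete, here is a minimal sketch of the same two-step task (fetch a page, then fetch its first link) written both ways. Everything in it is hypothetical: `fake_fetch`, the `PAGES` dictionary, and the `start` / `run_loop` helpers stand in for a real HTTP client and event loop, and no network I/O takes place.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> list of links found on that page.
PAGES = {"/": ["/a", "/b"], "/a": ["/a1"], "/b": []}

pending = deque()  # callbacks waiting to be run by the toy event loop


def fake_fetch(url, callback):
    """Pretend to start a request; the result is delivered later via callback."""
    pending.append(lambda: callback(PAGES[url]))


def run_loop():
    """Toy event loop: run queued callbacks until none are left."""
    while pending:
        pending.popleft()()


# Callback style: each step is nested inside the previous one.
def crawl_with_callbacks(results):
    def on_root(links):
        def on_first(sub_links):
            results.append((links, sub_links))
        fake_fetch(links[0], on_first)
    fake_fetch("/", on_root)


# Coroutine style: the same logic reads top to bottom.
def crawl_coroutine(results):
    links = yield "/"            # "wait" for the root page
    sub_links = yield links[0]   # "wait" for the first linked page
    results.append((links, sub_links))


def start(coro):
    """Drive a generator: fetch each yielded URL and send the result back in."""
    def step(value=None):
        try:
            url = coro.send(value)
        except StopIteration:
            return
        fake_fetch(url, step)
    step()


callback_results, coroutine_results = [], []
crawl_with_callbacks(callback_results)
start(crawl_coroutine(coroutine_results))
run_loop()
print(callback_results == coroutine_results)
```

Both versions do the same work on the same toy loop; the only difference is that the coroutine version keeps its state in local variables instead of nested closures.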

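[edit: perhaps of interest to anyone who followed or stumbled on this question, or who cares about this topic in any sense: I found a really great writeup of the current state of the available tools for this job]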


Current answer

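Whizzer is a tiny asynchronous socket framework that uses pyev. It's very fast, primarily because of pyev. It tries to provide a similar interface, with some minor changes.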

Other answers

If you just want a simplified, lightweight HTTP request library, I've found Unirest to be really good.

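You could also try Syncless. It is coroutine-based (so it's similar to Concurrence, Eventlet and gevent). It implements drop-in non-blocking replacements for socket.socket, socket.gethostbyname (etc.), ssl.SSLSocket, time.sleep and select.select. It's fast. It needs Stackless Python and libevent. It contains a mandatory Python extension written in C (Pyrex/Cython).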

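I liked the Concurrence Python module, which relies on either Stackless Python microthreads or Greenlets for lightweight threading. All blocking network I/O is transparently made asynchronous through a single libevent loop, so it should be nearly as efficient as a real asynchronous server.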

I suppose it's similar to Eventlet in this respect.

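The downside is that its API is quite different from Python's socket/threading modules; you need to rewrite a fair bit of your application (or write a compatibility shim layer).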
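
The idea of a single loop servicing many sockets can be sketched with the standard library alone. This is not the API of the module described above, nor libevent: it uses the `selectors` module and `socket.socketpair()` as stand-ins, and `serve_all` is a hypothetical helper name. It also assumes each short message arrives in a single `recv` call, which is fine for a local demo but not something to rely on over a real network.

```python
import selectors
import socket


def serve_all(messages):
    """Send each message over its own socket pair and collect them in one loop."""
    sel = selectors.DefaultSelector()
    writers = []
    for msg in messages:
        reader, writer = socket.socketpair()
        reader.setblocking(False)
        # Register the reading end; the loop below is told when data is ready.
        sel.register(reader, selectors.EVENT_READ)
        writer.sendall(msg)
        writers.append(writer)

    received = []
    # One loop, one thread: handle whichever sockets are readable right now.
    while sel.get_map():
        for key, _events in sel.select(timeout=1):
            received.append(key.fileobj.recv(4096))
            sel.unregister(key.fileobj)
            key.fileobj.close()

    for writer in writers:
        writer.close()
    sel.close()
    return received


# Arrival order is not guaranteed, so sort before printing.
print(sorted(serve_all([b"one", b"two", b"three"])))
```

Coroutine libraries hide this loop behind blocking-looking calls; the sketch only shows the multiplexing that happens underneath.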

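Edit: It seems that there's also cogen, which is similar, but uses Python 2.5's enhanced generators for its coroutines instead of Greenlets. This makes it more portable than Concurrence and other alternatives. Network I/O is done directly with epoll/kqueue/iocp.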
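
For a feel of how generators can act as cooperatively scheduled tasks, here is a minimal round-robin sketch. It is not cogen's API; `worker` and `run_round_robin` are hypothetical names. It uses a bare `yield` as the switch point only. Passing results back into a task with `send()`, which is what the enhanced generators added, is left out for brevity.

```python
from collections import deque


def worker(name, steps, log):
    """A task that hands control back to the scheduler after each step."""
    for i in range(steps):
        log.append("%s:%d" % (name, i))
        yield  # cooperative switch point


def run_round_robin(tasks):
    """Resume each generator in turn until all of them have finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; drop it
        queue.append(task)  # not finished yet; schedule it again


log = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

In a real framework the scheduler would resume a task when its socket becomes ready rather than simply taking turns.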


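There is a good book on the subject: "Twisted Network Programming Essentials" by Abe Fettig. The examples show how to write very Pythonic code and, to me personally, do not look like they are based on a bloated framework. Look at the solutions in the book; if they aren't clean, then I don't know what clean means.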

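My only concern is the same one I have with other frameworks, like Ruby: does it scale up? I would hate to commit a client to a framework that has scalability problems.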