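A long time ago I wrote a web spider that I multithreaded so that requests could run concurrently. That was in my Python youth, before I knew about the GIL and the woes it creates for multithreaded code (i.e., most of the time things just end up serialized!)...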

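I'd like to rework this code to make it more robust and perform better. There are basically two ways I could do this: I could use the new multiprocessing module in 2.6+, or I could go for a reactor / event-based model of some sort. I would rather do the latter since it's far simpler and less error-prone.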

So the question is which framework would be best suited to my needs. The following is a list of the options I know about so far:

Twisted: The granddaddy of Python reactor frameworks. Seems complex and a bit bloated, however. Steep learning curve for a small task.

Eventlet: From the guys at lindenlab. Greenlet-based framework that's geared towards these kinds of tasks. I had a look at the code though and it's not too pretty: non-pep8 compliant, scattered with prints (why do people do this in a framework!?), API seems a little inconsistent.

PyEv: Immature, doesn't seem to be anyone using it right now, though it is based on libevent so it's got a solid backend.

asyncore: From the stdlib. Über low-level, seems like a lot of legwork involved just to get something off the ground.

tornado: Though this is a server-oriented product designed to serve dynamic websites, it does feature an async HTTP client and a simple ioloop. Looks like it could get the job done, but that's not what it was intended for. [edit: doesn't run on Windows unfortunately, which counts it out for me - it's a requirement for me to support this lame platform]

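Is there anything I have missed at all? Surely there must be a library out there that hits my sweet spot of a simplified async networking library!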

[edit: big thanks to intgr for his pointer to this page. If you scroll to the bottom you will see there is a really nice list of projects that aim to tackle this task in one way or another. It seems that things have indeed moved on since the inception of Twisted: people now seem to favour a coroutine-based solution rather than a traditional reactor / callback-oriented one. The benefit of this approach is clearer, more direct code: I've certainly found in the past, especially when working with boost.asio in C++, that callback-based code can lead to designs that are hard to follow and relatively obscure to the untrained eye. Using coroutines allows you to write code that looks a little more synchronous, at least. I guess now my task is to work out which one of these many libraries I like the look of and give it a go! Glad I asked now...]
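To make the "looks a little more synchronous" point concrete, here is a minimal sketch. It assumes a modern Python 3 and uses the stdlib asyncio module, which postdates this question and is not one of the libraries listed above. The `fetch` function is a hypothetical stand-in that sleeps instead of making a real HTTP request.

```python
import asyncio


async def fetch(url, delay):
    # Hypothetical stand-in for a non-blocking HTTP request:
    # awaiting the sleep yields control to the event loop.
    await asyncio.sleep(delay)
    return "body of " + url


async def crawl(urls):
    # Reads top to bottom like synchronous code, with no callbacks,
    # yet all the simulated requests overlap in a single thread.
    pending = [fetch(url, 0.05) for url in urls]
    return await asyncio.gather(*pending)


def run_crawl(urls):
    # Convenience wrapper: start an event loop and run the crawl to completion.
    return asyncio.run(crawl(urls))
```

Calling `run_crawl(["a", "b"])` returns the results in the same order as the input URLs, since `asyncio.gather` preserves argument order.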

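[edit: perhaps of interest to anyone who followed or stumbled on this question, or who cares about this topic in any sense: I found a really great writeup of the current state of the available tools for this job]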


Current answer

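You could also try Syncless. It is coroutine-based (so it's similar to Concurrence, Eventlet and gevent). It implements drop-in non-blocking replacements for socket.socket, socket.gethostbyname (etc.), ssl.SSLSocket, time.sleep and select.select. It's fast, too. It needs Stackless Python and libevent. It contains a mandatory Python extension written in C (Pyrex/Cython).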

Other answers

If you just want a simplified, lightweight HTTP request library, then I find Unirest really good.

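None of these solutions gets around the fact that the GIL prevents CPU parallelism: they are just better ways of getting the IO parallelism that you already have with threads. If you think you can do better IO, by all means pursue one of these, but if your bottleneck is in processing the results, nothing here will help except the multiprocessing module.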
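A minimal sketch of the IO half of that claim, assuming a modern Python 3 with concurrent.futures (which did not exist in the 2.6 era). `fake_download` is a hypothetical stand-in that sleeps instead of touching the network; a blocking sleep releases the GIL the same way a blocking socket read does.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url):
    # Hypothetical stand-in for a blocking HTTP request.
    # time.sleep releases the GIL, as a blocking socket read would.
    time.sleep(0.2)
    return url, 200


def timed_crawl(urls, workers):
    # Run all downloads on a thread pool and report the wall-clock time.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fake_download, urls))
    return results, time.perf_counter() - start


urls = ["http://example.com/page%d" % i for i in range(5)]
results, elapsed = timed_crawl(urls, workers=5)
print("fetched %d pages in %.2fs" % (len(results), elapsed))
```

With five workers the five waits overlap, so the total should be close to one wait rather than five. If `fake_download` did CPU-bound parsing in pure Python instead of waiting, the threads would take turns holding the GIL and the speedup would disappear; that is the case where a process pool is the usual answer.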

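I can confirm that syncless is good. It can use libev (a newer, cleaner and better-performing alternative to libevent). Some time ago it didn't have as much support as libevent, but development has moved on since then and syncless is very useful.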

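Twisted is complex, you're right about that. Twisted is not bloated.

If you take a look here: http://twistedmatrix.com/trac/browser/trunk/twisted you'll find an organized, comprehensive, and very well tested suite of internet protocols, as well as helper code for writing and deploying very sophisticated network applications. I wouldn't confuse bloat with comprehensiveness.

It's well known that the Twisted documentation isn't the most user-friendly at first glance, and I believe this turns away an unfortunate number of people. But Twisted is amazing (IMHO) if you put in the time. I did and it proved to be worth it, and I'd recommend that others try the same.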

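I liked the Concurrence Python module, which relies on either Stackless Python microthreads or Greenlets for lightweight threading. All blocking network I/O is transparently made asynchronous through a single libevent loop, so it should be nearly as efficient as a real asynchronous server.

I suppose it's similar to Eventlet in this way.

The downside is that its API is quite different from Python's socket/threading modules; you need to rewrite a fair bit of your application (or write a compatibility shim layer).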

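Edit: It seems that there's also cogen, which is similar to Concurrence but uses Python 2.5's enhanced generators for its coroutines instead of Greenlets. This makes it more portable than Concurrence and other alternatives. Network I/O is done directly with epoll/kqueue/iocp.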
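For anyone unfamiliar with the generator-based approach, here is a rough sketch of the general idea. This is not cogen's API: `fetcher` and `run` are hypothetical names, and the scheduler fakes the reply instead of doing real I/O. Each task yields a request, and the scheduler resumes it later with `send()`, which is the feature Python 2.5's enhanced generators added.

```python
from collections import deque


def fetcher(url, log):
    # Yield the request to the scheduler; execution pauses here and
    # resumes with whatever value the scheduler sends back.
    body = yield url
    log.append((url, body))


def run(tasks):
    # Round-robin scheduler: each queue entry pairs a task with the
    # value it should be resumed with (None starts a fresh generator).
    queue = deque((task, None) for task in tasks)
    while queue:
        task, value = queue.popleft()
        try:
            request = task.send(value)
        except StopIteration:
            continue  # the task finished; drop it
        # A real framework would wait on epoll/kqueue/iocp here.
        # This sketch fabricates the reply immediately.
        queue.append((task, "fake body of " + request))


log = []
run([fetcher("u1", log), fetcher("u2", log)])
```

Both tasks issue their request before either one receives a reply, which is the interleaving that lets a single thread keep many requests in flight.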