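A while back I wrote a web spider that used multiple threads so that concurrent requests could happen at the same time. That was in my Python youth, in the days before I knew about the GIL and the associated woes it creates for multithreaded code (i.e., most of the time stuff just ends up serialized!)...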

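I'd like to rework this code to make it more robust and better performing. There are basically two ways I could do this: I could use the new multiprocessing module in 2.6+, or I could go for a reactor/event-based model of some sort. I would rather do the latter, since it is simpler and less error-prone.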

So the question is which framework would be best suited to my needs. Here are the options I know about so far:

Twisted: The granddaddy of Python reactor frameworks. Seems complex and a bit bloated, however. Steep learning curve for a small task.

Eventlet: From the guys at lindenlab. Greenlet-based framework that's geared towards these kinds of tasks. I had a look at the code, though, and it's not too pretty: not PEP 8 compliant, scattered with prints (why do people do this in a framework!?), and the API seems a little inconsistent.

PyEv: Immature, and nobody seems to be using it right now, though it is based on libevent so it has a solid backend.

asyncore: From the stdlib. Über low-level; seems like a lot of legwork just to get something off the ground.

tornado: Though this is a server-oriented product designed to serve dynamic websites, it does feature an async HTTP client and a simple ioloop. Looks like it could get the job done, but that is not what it was intended for. [edit: doesn't run on Windows unfortunately, which counts it out for me - it's a requirement for me to support this lame platform]

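Is there anything I have missed? Surely there must be a library out there that hits the sweet spot of a simplified async networking library!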

[edit: big thanks to intgr for his pointer to this page. If you scroll to the bottom you will see there is a really nice list of projects that aim to tackle this task in one way or another. It seems that things have indeed moved on since the inception of Twisted: people now seem to favour a coroutine-based solution rather than a traditional reactor/callback-oriented one. The benefit of this approach is clearer, more direct code: I've certainly found in the past, especially when working with boost.asio in C++, that callback-based code can lead to designs that are hard to follow and relatively obscure to the untrained eye. Using coroutines allows you to write code that looks a little more synchronous, at least. I guess now my task is to work out which one of these many libraries I like the look of and give it a go! Glad I asked now...]
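To make the callback-versus-coroutine contrast concrete, here is a minimal sketch. It uses the standard library's asyncio, which is a stand-in chosen for illustration and not one of the libraries discussed in this thread. The URLs, the helper names, and the "body of ..." strings are all hypothetical, and short timers take the place of real network requests.

```python
import asyncio


def fetch_with_callback(loop, url, on_done):
    """Callback style: the result is handed to a separate function later."""
    # call_later stands in for a network round trip; no real I/O happens.
    loop.call_later(0.01, on_done, f"body of {url}")


async def fetch(url):
    """Coroutine style: the wait is written inline, like synchronous code."""
    await asyncio.sleep(0.01)  # stand-in for a network round trip
    return f"body of {url}"


async def crawl_with_callbacks(urls):
    loop = asyncio.get_running_loop()
    results = []
    finished = loop.create_future()

    def on_done(body):
        # Control flow is split: bookkeeping lives here, not in the caller.
        results.append(body)
        if len(results) == len(urls):
            finished.set_result(None)

    for url in urls:
        fetch_with_callback(loop, url, on_done)
    await finished
    return results


async def crawl_with_coroutines(urls):
    # All fetches run concurrently; results come back in input order.
    return await asyncio.gather(*(fetch(u) for u in urls))


URLS = ["http://example.com/a", "http://example.com/b"]  # hypothetical URLs
callback_results = asyncio.run(crawl_with_callbacks(URLS))
coroutine_results = asyncio.run(crawl_with_coroutines(URLS))
```

Both versions issue the requests concurrently. The difference is that the coroutine version keeps the logic in one place, while the callback version has to track completion by hand.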

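[edit: perhaps of interest to anyone who followed or stumbled on this question, or who cares about this topic in any sense: I found a really great writeup of the current state of the available tools for this job]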


Current answer

gevent is eventlet cleaned up.

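API-wise, it follows the same conventions as the standard library (in particular, the threading and multiprocessing modules) where it makes sense. So you have familiar things like Queue and Event to work with.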

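It only supports libevent (update: libev since 1.0) as its reactor implementation, but it takes full advantage of it, featuring a fast WSGI server based on libevent-http and resolving DNS queries through libevent-dns rather than using a thread pool as most other libraries do. (Update: since 1.0, c-ares is used for asynchronous DNS queries; a thread pool is also an option.)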

Like eventlet, it makes callbacks and Deferreds unnecessary by using greenlets.

Check out the examples: concurrent download of multiple URLs, long-polling web chat.

Other answers

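None of these solutions will avoid the fact that the GIL prevents CPU parallelism: they are just a better way of getting I/O parallelism, which you already have with threads. If you think you can do better I/O, by all means pursue one of these, but if your bottleneck is in processing the results, nothing here will help except the multiprocessing module.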
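As a rough illustration of the point that threads already give you I/O parallelism, here is a minimal sketch using the standard library's concurrent.futures. The fake_download function, the URLs, and the 200 status are hypothetical; time.sleep stands in for a blocking HTTP request, and like real blocking I/O it releases the GIL while waiting.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url):
    """Stand-in for a blocking HTTP request (hypothetical helper)."""
    time.sleep(0.2)  # the GIL is released while the thread waits
    return url, 200  # hypothetical status code


URLS = [f"http://example.com/page{i}" for i in range(5)]  # hypothetical URLs

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(URLS)) as pool:
    # map preserves input order even though the waits overlap.
    results = list(pool.map(fake_download, URLS))
elapsed = time.perf_counter() - start

# The five 0.2 s waits overlap, so the total is well under the 1.0 s
# that a sequential loop would need.
print(f"{len(results)} downloads in {elapsed:.2f}s")
```

If the per-page work were CPU-bound (parsing, say) rather than waiting on the network, the threads would take turns holding the GIL and the speedup would vanish. That is the case where a process pool, such as multiprocessing.Pool or concurrent.futures.ProcessPoolExecutor, is the tool to reach for.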

Nicholas Piël wrote a very interesting comparison of these frameworks on his blog: it's well worth a read!

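Kamaelia hasn't been mentioned yet. Its concurrency model is based on wiring together components, with message passing between inboxes and outboxes. Here's a brief overview.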
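To give a feel for the inbox/outbox idea, here is a toy sketch in plain Python. It is not Kamaelia's actual API: the Component, Upper, and Exclaim classes and the run_pipeline function are all hypothetical names made up for illustration.

```python
from collections import deque


class Component:
    """A toy component with one inbox and one outbox (not Kamaelia's API)."""

    def __init__(self):
        self.inbox = deque()
        self.outbox = deque()

    def step(self):
        """Process at most one pending message; return True if work was done."""
        if not self.inbox:
            return False
        self.outbox.append(self.handle(self.inbox.popleft()))
        return True

    def handle(self, message):
        raise NotImplementedError


class Upper(Component):
    def handle(self, message):
        return message.upper()


class Exclaim(Component):
    def handle(self, message):
        return message + "!"


def run_pipeline(components, messages):
    """Wire each outbox to the next inbox and run until nothing is pending."""
    components[0].inbox.extend(messages)
    busy = True
    while busy:
        busy = False
        for i, comp in enumerate(components):
            if comp.step():
                busy = True
            # Deliver this component's output to the next component's inbox.
            if i + 1 < len(components):
                nxt = components[i + 1]
                while comp.outbox:
                    nxt.inbox.append(comp.outbox.popleft())
    return list(components[-1].outbox)


output = run_pipeline([Upper(), Exclaim()], ["hello", "world"])
print(output)  # ['HELLO!', 'WORLD!']
```

The components never call each other directly; they only read from an inbox and write to an outbox, and the wiring decides where messages go. That decoupling is the part of the model this sketch is meant to show.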

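I can confirm that syncless is good. It can use libev (the newer, cleaner, and better-performing version of libevent). Some time ago it didn't have as much support as libevent, but development has progressed since then and syncless is now very useful.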

I've started to use Twisted for some things. Its beauty is almost that it's "bloated": there are connectors for just about any of the main protocols out there. You can have a Jabber bot that will take commands and post to an IRC server, email them to someone, run a command, read from an NNTP server, and monitor a web page for changes. The bad news is that it can do all of that, which can make things overly complex for simple tasks like the one the OP described. The advantage of Python, though, is that you only include what you need. So while the download may be 20 MB, you may only include 2 MB of libraries (which is still a lot). My biggest complaint with Twisted is that although they include examples, for anything beyond a basic TCP server you're on your own.

While not a Python solution, I've seen node.js gain a lot more traction lately. In fact, I've considered using it for smaller projects, but I just cringe when I hear JavaScript :)