很久以前,我写了一个web-spider,我用多线程使并发请求同时发生。那是在我年轻时使用Python的时候,那时我还不知道GIL以及它给多线程代码带来的麻烦(IE,大多数时候这些东西最终都被序列化了!)……

我想重做这段代码,使其更健壮,性能更好。我基本上有两种方法可以做到这一点:我可以使用2.6+中新的多处理模块,或者我可以使用某种基于反应堆/事件的模型。我宁愿选择后者,因为它更简单,更不容易出错。

所以这个问题涉及到什么样的框架最适合我的需求。以下是我目前所知道的一些选择:

Twisted: The granddaddy of Python reactor frameworks: seems complex and a bit bloated however. Steep learning curve for a small task. Eventlet: From the guys at lindenlab. Greenlet based framework that's geared towards these kinds of tasks. I had a look at the code though and it's not too pretty: non-pep8 compliant, scattered with prints (why do people do this in a framework!?), API seems a little inconsistent. PyEv: Immature, doesn't seem to be anyone using it right now though it is based on libevent so it's got a solid backend. asyncore: From the stdlib: über low-level, seems like a lot of legwork involved just to get something off the ground. tornado: Though this is a server oriented product designed to server dynamic websites it does feature an async HTTP client and a simple ioloop. Looks like it could get the job done but not what it was intended for. [edit: doesn't run on Windows unfortunately, which counts it out for me - its a requirement for me to support this lame platform]

我还遗漏了什么吗?当然,一定有一个库在那里,适合一个简化异步网络库的甜蜜点!

[edit: big thanks to intgr for his pointer to this page. If you scroll to the bottom you will see there is a really nice list of projects that aim to tackle this task in one way or another. It seems actually that things have indeed moved on since the inception of Twisted: people now seem to favour a co-routine based solution rather than a traditional reactor / callback oriented one. The benefits of this approach are clearer more direct code: I've certainly found in the past, especially when working with boost.asio in C++ that callback based code can lead to designs that can be hard-to-follow and are relatively obscure to the untrained eye. Using co-routines allows you to write code that looks a little more synchronous at least. I guess now my task is to work out which one of these many libraries I like the look of and give it a go! Glad I asked now...]

[编辑:可能对任何关注或无意中发现这个问题或在任何意义上关心这个主题的人都感兴趣:我发现了一篇关于这项工作可用工具的当前状态的非常棒的文章]


当前回答

卡马利亚还没有被提到。它的并发模型基于在收件箱和发件箱之间通过消息传递将组件连接在一起。下面是一个简要的概述。

其他回答

我确认syncless的优点。它可以使用libev (libevent更新、更干净、性能更好的版本)。一段时间以前,它没有像libevent那样多的支持,但现在开发过程更先进,syncless非常有用。

wizzer是一个使用pyev的小型异步套接字框架。它非常快,主要是因为pyev。它试图提供一个类似的界面,只是做了一些细微的改变。

欢迎您来看看PyWorks,它采用了一种完全不同的方法。它允许对象实例在它们自己的线程中运行,并对该对象进行异步函数调用。

只要让一个类继承自任务而不是对象,它是异步的,所有的方法调用都是代理。返回值(如果需要的话)是Future代理。

res = obj.method( args )
# code continues here without waiting for method to finish
do_something_else( )
print "Result = %d" % res # Code will block here, if res not calculated yet

PyWorks可以在http://bitbucket.org/raindog/pyworks上找到

卡马利亚还没有被提到。它的并发模型基于在收件箱和发件箱之间通过消息传递将组件连接在一起。下面是一个简要的概述。

I've started to use twisted for some things. The beauty of it almost is because it's "bloated." There are connectors for just about any of the main protocols out there. You can have a jabber bot that will take commands and post to an irc server, email them to someone, run a command, read from an NNTP server, and monitor a web page for changes. The bad news is it can do all of that and can make things overly complex for simple tasks like the OP explained. The advantage of python though is you only include what you need. So while the download may be 20mb, you may only include 2mb of libraries (which is still a lot). My biggest complaint with twisted is although they include examples, anything beyond a basic tcp server you're on your own.

虽然不是python解决方案,但我看到node.js最近获得了更多的吸引力。事实上,我考虑过在较小的项目中使用它,但当我听到javascript时,我只是畏缩:)