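A long time ago I wrote a web spider that I multithreaded so that concurrent requests could happen at the same time. That was in my Python youth, before I knew about the GIL and the woes it creates for multithreaded code (i.e., most of the time things just end up serialized!)...

I'd like to rework this code to make it more robust and perform better. There are basically two ways I could do this: I could use the new multiprocessing module in 2.6+, or I could go for a reactor/event-based model of some sort. I would rather do the latter, since it's simpler and less error-prone.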
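For readers unfamiliar with the term, here is a minimal sketch of what a reactor/event-based model looks like, written against the standard library's selectors module rather than any of the frameworks discussed in this thread. The names make_handler and run_demo are made up for illustration, and local socket pairs stand in for real HTTP connections; this is an assumption-laden toy, not how any particular framework implements its loop.

```python
import selectors
import socket


def make_handler(sel, results):
    """Build a read callback that stores the payload and stops watching the socket."""
    def on_readable(sock):
        results.append(sock.recv(1024).decode("ascii"))
        sel.unregister(sock)
        sock.close()
    return on_readable


def run_demo(messages):
    """Send each message over a local socket pair and collect them in one event loop."""
    sel = selectors.DefaultSelector()
    results = []
    handler = make_handler(sel, results)
    writers = []
    for text in messages:
        # socketpair() stands in for a connection to a remote server (assumption)
        reader, writer = socket.socketpair()
        reader.setblocking(False)
        sel.register(reader, selectors.EVENT_READ, data=handler)
        writer.sendall(text.encode("ascii"))
        writers.append(writer)

    # The reactor loop: wait until some socket is ready, then call its callback
    while sel.get_map():
        events = sel.select(timeout=1.0)
        if not events:
            break  # nothing became ready in time; give up rather than spin
        for key, _mask in events:
            key.data(key.fileobj)

    for writer in writers:
        writer.close()
    sel.close()
    return sorted(results)
```

A single thread services every socket, so there is no locking to get wrong; the price is that each handler must return quickly and never block.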

So the question is which framework would be best suited to my needs. Here are the options I know about so far:

Twisted: The granddaddy of Python reactor frameworks. Seems complex and a bit bloated, however. Steep learning curve for a small task.
Eventlet: From the guys at lindenlab. Greenlet-based framework that's geared towards these kinds of tasks. I had a look at the code though and it's not too pretty: non-PEP 8 compliant, scattered with prints (why do people do this in a framework!?), and the API seems a little inconsistent.
PyEv: Immature, and there doesn't seem to be anyone using it right now, though it is based on libevent so it's got a solid backend.
asyncore: From the stdlib. Über low-level; seems like a lot of legwork is involved just to get something off the ground.
tornado: Though this is a server-oriented product designed to serve dynamic websites, it does feature an async HTTP client and a simple ioloop. Looks like it could get the job done, but that's not what it was intended for. [edit: doesn't run on Windows unfortunately, which counts it out for me - it's a requirement for me to support this lame platform]

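Is there anything I have missed? Surely there must be a library out there that hits the sweet spot of a simplified async networking library!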

[edit: big thanks to intgr for his pointer to this page. If you scroll to the bottom you will see a really nice list of projects that aim to tackle this task in one way or another. It seems that things have indeed moved on since the inception of Twisted: people now seem to favour a coroutine-based solution rather than a traditional reactor/callback-oriented one. The benefit of this approach is clearer, more direct code: I've certainly found in the past, especially when working with boost.asio in C++, that callback-based code can lead to designs that are hard to follow and relatively obscure to the untrained eye. Using coroutines allows you to write code that looks a little more synchronous, at least. I guess my task now is to work out which of these many libraries I like the look of and give it a go! Glad I asked now...]
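To make the "looks a little more synchronous" point concrete, here is a rough sketch using asyncio from the standard library. Note that asyncio postdates this discussion and is not one of the libraries on the list; it is used here only because it needs no extra install. The fetch function is a stand-in that sleeps instead of doing real HTTP, and the URLs are hypothetical.

```python
import asyncio


async def fetch(url, delay):
    """Stand-in for an HTTP request: waits `delay` seconds, then returns a fake body."""
    await asyncio.sleep(delay)
    return "body of " + url


async def crawl(urls):
    """Run every fetch concurrently; the code still reads top to bottom."""
    pending = [fetch(url, 0.01) for url in urls]
    # gather() returns the results in the same order as its arguments
    return await asyncio.gather(*pending)


def crawl_sync(urls):
    """Convenience wrapper for calling the coroutine from ordinary code."""
    return asyncio.run(crawl(urls))
```

There are no callbacks to chain here: each `await` marks a point where the event loop may switch to another fetch, but the control flow stays in one function.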

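[edit: perhaps of interest to anybody who followed or stumbled on this question, or who cares about this topic in any sense: I found a really great write-up of the current state of the available tools for this job]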


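Current answer

Twisted is complex, you're right about that. Twisted is not bloated.

If you take a look here: http://twistedmatrix.com/trac/browser/trunk/twisted, you'll find an organized, comprehensive, and well-tested suite of internet protocols, as well as helper code for writing and deploying very sophisticated network applications. I wouldn't confuse bloat with comprehensiveness.

It's well known that the Twisted documentation isn't the most user-friendly at first glance, and I believe this turns away an unfortunate number of people. But Twisted really is great if you put in the time. I did, and it proved to be worth it, and I'd recommend the same to others.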

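Other answers

I liked the concurrence Python module, which relies on either Stackless Python microthreads or Greenlets for lightweight threading. All blocking network I/O is transparently made asynchronous through a single libevent loop, so it should be nearly as efficient as a real asynchronous server.

I suppose it's similar to Eventlet in this way.

The downside is that its API is quite different from Python's socket/threading modules; you need to rewrite a fair bit of your application (or write a compatibility shim layer).

Edit: It seems that there's also cogen, which is similar, but uses Python 2.5's enhanced generators for its coroutines instead of Greenlets. This makes it more portable than concurrence and other alternatives. Network I/O is done directly with epoll/kqueue/iocp.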
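As a rough idea of what "enhanced generators as coroutines" means, here is a toy round-robin scheduler built on plain `yield` and `send()`. This is not cogen's API; worker, run_round_robin, and the fake response strings are all invented for the illustration, and no real network I/O takes place.

```python
from collections import deque


def worker(name, urls, log):
    """A generator-based coroutine: it yields a request and is resumed with the reply."""
    for url in urls:
        body = yield url  # the value passed to send() appears here
        log.append("%s got %s" % (name, body))


def run_round_robin(coroutines):
    """Resume each coroutine in turn, answering its request with a fake response."""
    pending = deque()
    for coro in coroutines:
        try:
            pending.append((coro, next(coro)))  # run up to the first yield
        except StopIteration:
            pass  # the coroutine had nothing to request
    while pending:
        coro, request = pending.popleft()
        try:
            # A real framework would wait on epoll/kqueue/iocp before resuming
            next_request = coro.send("fake body of " + request)
        except StopIteration:
            continue  # this coroutine is finished
        pending.append((coro, next_request))


def demo():
    log = []
    run_round_robin([worker("a", ["u1", "u2"], log), worker("b", ["u3"], log)])
    return log
```

Because this needs nothing beyond the language itself (no C extension like greenlet, no special interpreter like Stackless), it runs anywhere Python 2.5 or later does, which is the portability argument made above.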

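Nicholas Piël wrote a very interesting comparison of these frameworks on his blog: it's well worth a read!

I can confirm that syncless is good. It can use libev (a newer, cleaner, better-performing alternative to libevent). Some time ago it didn't have as much support as libevent, but development has moved on since then and it is now very useful.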

I've started to use Twisted for some things. Its beauty is almost that it is "bloated." There are connectors for just about any of the main protocols out there. You can have a Jabber bot that will take commands and post to an IRC server, email them to someone, run a command, read from an NNTP server, and monitor a web page for changes. The bad news is that it can do all of that, and it can make things overly complex for simple tasks like the one the OP described. The advantage of Python, though, is that you only include what you need. So while the download may be 20 MB, you may only include 2 MB of libraries (which is still a lot). My biggest complaint with Twisted is that although they include examples, for anything beyond a basic TCP server you're on your own.

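While not a Python solution, I've seen node.js gain more traction lately. In fact, I've considered using it for smaller projects, but I just cringe when I hear JavaScript :)

You are welcome to have a look at PyWorks, which takes quite a different approach. It lets object instances run in their own thread and makes function calls to those objects asynchronous.

Just let a class inherit from Task instead of object and it is asynchronous: all method calls are proxied. Return values (if you need them) are Future proxies.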

res = obj.method(args)
# Execution continues here without waiting for method() to finish
do_something_else()
# Blocks here if res has not been calculated yet
print("Result = %d" % res)

PyWorks can be found at http://bitbucket.org/raindog/pyworks
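If the Future idea is new to you, the same "call returns at once, block only when you read the value" pattern can be sketched with concurrent.futures from the standard library. To be clear, this is not the PyWorks API: slow_square and demo are hypothetical names, and unlike the transparent proxy described above, this version makes you call result() explicitly.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    """Stand-in for a slow method call (assumption: 50 ms of 'work')."""
    time.sleep(0.05)
    return x * x


def demo():
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_square, 7)  # returns immediately
        other = "did other work"  # runs while slow_square is still busy
        value = future.result()  # blocks here if the result is not ready yet
    return other, value
```

The appeal of a proxy-style Future is that the calling code does not change shape; the explicit version above trades that convenience for making the blocking point visible.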