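A long time ago I wrote a web spider and made it multithreaded so that requests could run concurrently. That was in my Python youth, before I knew about the GIL and the woes it causes for multithreaded code (i.e., most of the time the work just ends up serialized!)...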

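I'd like to rework this code to make it more robust and perform better. There are basically two ways I could do this: I could use the new multiprocessing module in 2.6+, or I could go for some kind of reactor/event-based model. I'd rather do the latter, since it's simpler and less error-prone.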
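For readers unfamiliar with the term, here is a minimal sketch of the reactor idea: one thread waits on many sockets and dispatches a callback whenever one becomes readable. It assumes Python 3 and the standard library's `selectors` module, which postdates this question and is not one of the frameworks discussed here. The names `run_reactor`, `on_readable` and `demo` are made up for illustration, and local socket pairs stand in for real network connections.

```python
import selectors
import socket


def run_reactor(readers):
    """Read from several sockets in one thread until all of them close."""
    sel = selectors.DefaultSelector()
    received = {}

    def on_readable(sock, name):
        # Called by the loop only when recv() will not block.
        data = sock.recv(1024)
        if data:
            received[name] = received.get(name, b"") + data
        else:
            # Empty read means the peer closed the connection.
            sel.unregister(sock)
            sock.close()

    for name, sock in readers.items():
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, (on_readable, name))

    # The event loop: wait for readiness, then dispatch the stored callback.
    while sel.get_map():
        for key, _events in sel.select():
            callback, name = key.data
            callback(key.fileobj, name)

    sel.close()
    return received


def demo():
    # Hypothetical stand-in for network peers: each writer sends one
    # message and then closes its end of a local socket pair.
    readers = {}
    for name in ("a", "b"):
        reader, writer = socket.socketpair()
        writer.sendall(("hello from " + name).encode())
        writer.close()
        readers[name] = reader
    return run_reactor(readers)
```

Calling `demo()` collects both messages without starting a second thread. The frameworks in question wrap this kind of loop in friendlier abstractions.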

So the question is which framework would best suit my needs. Here are the options I know about so far:

Twisted: the granddaddy of Python reactor frameworks. It seems complex and a bit bloated, however, with a steep learning curve for a small task.

Eventlet: from the guys at lindenlab. A greenlet-based framework that's geared towards these kinds of tasks. I had a look at the code, though, and it's not too pretty: non-PEP 8 compliant, scattered with prints (why do people do this in a framework!?), and the API seems a little inconsistent.

PyEv: immature; there doesn't seem to be anyone using it right now, though it is based on libevent so it's got a solid backend.

asyncore: from the stdlib. Über low-level; seems like a lot of legwork involved just to get something off the ground.

tornado: though this is a server-oriented product designed to serve dynamic websites, it does feature an async HTTP client and a simple ioloop. Looks like it could get the job done, but that's not what it was intended for. [edit: doesn't run on Windows unfortunately, which counts it out for me; it's a requirement for me to support this lame platform]

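Is there anything I've missed? Surely there must be a library out there that hits the sweet spot of a simplified async networking library!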

[edit: big thanks to intgr for his pointer to this page. If you scroll to the bottom you will see there is a really nice list of projects that aim to tackle this task in one way or another. It seems that things have indeed moved on since the inception of Twisted: people now seem to favour a co-routine based solution rather than a traditional reactor / callback oriented one. The benefit of this approach is clearer, more direct code: I've certainly found in the past, especially when working with boost.asio in C++, that callback-based code can lead to designs that are hard to follow and relatively obscure to the untrained eye. Using co-routines allows you to write code that looks a little more synchronous, at least. I guess now my task is to work out which one of these many libraries I like the look of and give it a go! Glad I asked now...]
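To illustrate the "looks a little more synchronous" point, here is a small sketch using the standard library's `asyncio`. That module postdates this question and stands in for the co-routine libraries on that list, so treat it as an assumption rather than a recommendation from the thread. The `fetch` function is a hypothetical placeholder that sleeps instead of doing real HTTP, and the URLs are made up.

```python
import asyncio


async def fetch(url):
    # Hypothetical stand-in for a network request: yields to the event
    # loop for a moment, then returns a fake response body.
    await asyncio.sleep(0.01)
    return "<html>%s</html>" % url


async def fetch_page_and_next(url):
    # Two dependent requests written top to bottom, with no callbacks:
    # each await suspends this co-routine and lets others run.
    first = await fetch(url)
    second = await fetch(url + "/next")
    return [first, second]


async def crawl(urls):
    # Run one co-routine per URL concurrently on a single thread.
    results = await asyncio.gather(*(fetch_page_and_next(u) for u in urls))
    return dict(zip(urls, results))


def crawl_blocking(urls):
    # Convenience wrapper for calling from ordinary synchronous code.
    return asyncio.run(crawl(urls))
```

With callbacks, `fetch_page_and_next` would be split into a chain of handler functions. Here the control flow stays in one place, while the requests for different URLs still overlap.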

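[edit: this may be of interest to anyone who is following this question, has stumbled upon it, or cares about the topic in any sense: I found a really great writeup of the current state of the available tools for this job]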


Current answer

I've started to use Twisted for some things. The beauty of it is almost that it's "bloated." There are connectors for just about any of the main protocols out there. You can have a Jabber bot that will take commands and post to an IRC server, email them to someone, run a command, read from an NNTP server, and monitor a web page for changes. The bad news is that it can do all of that, which can make things overly complex for simple tasks like the one the OP described. The advantage of Python, though, is that you only include what you need. So while the download may be 20 MB, you may only include 2 MB of libraries (which is still a lot). My biggest complaint with Twisted is that although they include examples, for anything beyond a basic TCP server you're on your own.

While not a Python solution, I've seen node.js gain a lot more traction lately. In fact, I've considered using it for smaller projects, but I just cringe when I hear JavaScript :)

Other answers

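There is a good book on the subject: "Twisted Network Programming Essentials", by Abe Fettig. The examples show how to write very Pythonic code and, to me personally, do not look like they are built on a bloated framework. Look at the solutions in the book; if they aren't clean, then I don't know what clean means.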

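My only concern is the same one I have with other frameworks, such as Ruby: will it scale up? I would hate to commit a client to a framework that has scalability problems.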

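I liked the concurrence Python module, which relies on either Stackless Python microthreads or Greenlets for lightweight threading. All blocking network I/O is transparently made asynchronous through a single libevent loop, so it should be nearly as efficient as a real asynchronous server.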

I suppose it's similar to Eventlet in this respect.

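The downside is that its API is quite different from Python's socket/threading modules; you need to rewrite a fair bit of your application (or write a compatibility shim layer).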

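Edit: it seems there's also cogen, which is similar but uses Python 2.5's enhanced generators for its coroutines instead of Greenlets. This makes it more portable than concurrence and other alternatives. Network I/O is done directly with epoll/kqueue/iocp.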
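As a rough illustration of how plain generators can serve as coroutines, here is a toy round-robin scheduler. It is not cogen's API; the names `worker` and `run_all` are made up, and a real library would resume a task when its socket is ready rather than simply taking turns.

```python
from collections import deque


def worker(name, steps, log):
    # A generator used as a coroutine: each bare `yield` hands control
    # back to the scheduler, much like waiting on I/O would.
    for i in range(steps):
        log.append((name, i))
        yield


def run_all(tasks):
    """Round-robin scheduler: resume each task in turn until all finish."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task up to its next yield
        except StopIteration:
            continue            # task finished; do not reschedule it
        ready.append(task)      # still alive; put it at the back of the queue
```

Running `run_all([worker("a", 2, log), worker("b", 1, log)])` with an empty list `log` interleaves the two workers without any threads or C extensions, which is where the portability comes from.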

Twisted is complex, you're right about that. Twisted is not bloated.

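If you take a look here: http://twistedmatrix.com/trac/browser/trunk/twisted you'll find an organized, comprehensive, and well-tested suite of internet protocols, as well as helper code for writing and deploying very sophisticated network applications. I wouldn't confuse bloat with comprehensiveness.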

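It's well known that the Twisted documentation isn't the most user-friendly at first glance, and I believe this turns a lot of people away. But Twisted really is great if you put in the time. I did, and it proved to be worth it; I'd recommend that others try the same.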

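wizzer is a small asynchronous socket framework that uses pyev. It's very fast, mostly because of pyev. It tries to provide a similar interface, with a few slight changes.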

Nicholas Piël wrote a very interesting comparison of these frameworks on his blog: it's well worth a read!