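A (long) while ago I wrote a web spider that I multithreaded so that requests could run concurrently. That was in my Python youth, in the days before I knew about the GIL and the woes it creates for multithreaded code (i.e., most of the time stuff just ends up serialized!)...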

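I'd like to rework this code to make it more robust and perform better. There are basically two ways I could do this: I could use the new multiprocessing module in 2.6+, or I could go for a reactor / event-based model of some sort. I would rather do the latter since it's far simpler and less error-prone.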
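To make "reactor / event-based model" concrete, here is a minimal sketch of the idea using only the stdlib selectors module (Python 3). It is not taken from any of the frameworks discussed here; run_reactor and demo are hypothetical names, and local socket pairs stand in for real network connections so nothing touches the network.

```python
import selectors
import socket


def run_reactor(readers):
    """Tiny single-threaded reactor: wait for readable sockets, dispatch a callback."""
    sel = selectors.DefaultSelector()
    received = {}

    def on_readable(sock, name):
        # Callback: runs only when the selector reports data is waiting.
        received[name] = sock.recv(1024)
        sel.unregister(sock)

    for name, sock in readers.items():
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, data=name)

    # Event loop: keep going until every socket has been handled.
    while sel.get_map():
        events = sel.select(timeout=1.0)
        if not events:
            break  # nothing arrived within the timeout; give up
        for key, _mask in events:
            on_readable(key.fileobj, key.data)
    sel.close()
    return received


def demo():
    """Feed three local socket pairs through the reactor (no real network)."""
    readers, writers = {}, []
    for name in ("a", "b", "c"):
        left, right = socket.socketpair()
        left.sendall(("hello from " + name).encode())
        writers.append(left)
        readers[name] = right
    try:
        return run_reactor(readers)
    finally:
        for sock in writers + list(readers.values()):
            sock.close()


if __name__ == "__main__":
    print(demo())
```

Everything runs in one thread, so there is no locking and the GIL is not a factor; the cost is that all the work has to be expressed as callbacks.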

So the question is which framework would be best suited to my needs. The following is a list of the options I know about so far:

Twisted: the granddaddy of Python reactor frameworks. It seems complex and a bit bloated, however, with a steep learning curve for a small task.
Eventlet: from the guys at lindenlab. A greenlet-based framework that's geared towards these kinds of tasks. I had a look at the code though and it's not too pretty: non-PEP8 compliant, scattered with prints (why do people do this in a framework!?), and the API seems a little inconsistent.
PyEv: immature, and there doesn't seem to be anyone using it right now, though it is based on libevent so it's got a solid backend.
asyncore: from the stdlib. Über low-level; seems like a lot of legwork involved just to get something off the ground.
tornado: though this is a server-oriented product designed to serve dynamic websites, it does feature an async HTTP client and a simple ioloop. Looks like it could get the job done, but that's not what it was intended for. [edit: doesn't run on Windows unfortunately, which counts it out for me; it's a requirement for me to support this lame platform]

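Is there anything I have missed? Surely there must be a library out there that hits the sweet spot of a simplified async networking library!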

[edit: big thanks to intgr for his pointer to this page. If you scroll to the bottom you will see there is a really nice list of projects that aim to tackle this task in one way or another. It actually seems that things have moved on since the inception of Twisted: people now seem to favour a coroutine-based solution rather than a traditional reactor / callback-oriented one. The benefit of this approach is clearer, more direct code: I've certainly found in the past, especially when working with boost.asio in C++, that callback-based code can lead to designs that are hard to follow and relatively obscure to the untrained eye. Using coroutines allows you to write code that looks a little more synchronous, at least. I guess now my task is to work out which one of these many libraries I like the look of and give it a go! Glad I asked now...]
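As a rough illustration of why coroutine-style code reads "a little more synchronous", here is a minimal sketch using asyncio from the stdlib of later Python 3 versions. That is an assumption on my part: asyncio is not one of the libraries listed above and is used only as a stand-in. fetch and crawl are hypothetical names, the URLs are placeholders, and the request is simulated with a sleep so no network traffic occurs.

```python
import asyncio


async def fetch(url, delay):
    # Stand-in for a non-blocking HTTP request: it only sleeps, no network I/O.
    await asyncio.sleep(delay)
    return "body of " + url


async def crawl(urls):
    # Reads top to bottom like synchronous code, yet the fetches overlap in time.
    bodies = await asyncio.gather(*(fetch(url, 0.01) for url in urls))
    return dict(zip(urls, bodies))


if __name__ == "__main__":
    pages = asyncio.run(crawl(["http://example.com/a", "http://example.com/b"]))
    for url, body in pages.items():
        print(url, "->", body)
```

The same logic in callback style would need one function to start each request and another to collect each result, which is where designs tend to get hard to follow.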

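[edit: perhaps of interest to anyone who followed or stumbled on this question, or who cares about this topic in any sense: I found a really great writeup of the current state of the available tools for this job]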


Current answer

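I wouldn't go as far as to call Twisted bloated, but it is difficult to wrap your head around. I avoided really settling in to study it for a long time, as I always wanted something a little easier for "small tasks".

However, now that I have worked with it some more, I have to say that having all of the batteries included is very nice.

All the other async libraries I've worked with ended up being less mature than they appear. Twisted's event loop is solid.

I'm not quite sure how to solve the steep Twisted learning curve. It might help if someone would fork it and clean a few things up, like removing all of the backwards-compatibility cruft and the dead projects. But such is the life of mature software, I guess.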

Other answers

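Kamaelia hasn't been mentioned yet. Its concurrency model is based on wiring together components, with message passing between inboxes and outboxes. Here's a brief overview.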
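The inbox/outbox idea can be sketched in a few lines of plain Python. To be clear, this is not Kamaelia's API: Component, Upper, Exclaim and run_pipeline are hypothetical names, and a real scheduler would interleave the components rather than run them in a single pass.

```python
from collections import deque


class Component:
    """A unit of work with one inbox and one outbox (hypothetical, not Kamaelia's API)."""

    def __init__(self):
        self.inbox = deque()
        self.outbox = deque()

    def step(self):
        """Consume pending inbox messages; subclasses override this."""
        raise NotImplementedError


class Upper(Component):
    def step(self):
        while self.inbox:
            self.outbox.append(self.inbox.popleft().upper())


class Exclaim(Component):
    def step(self):
        while self.inbox:
            self.outbox.append(self.inbox.popleft() + "!")


def run_pipeline(components, messages):
    """Wire each outbox to the next component's inbox and run one pass."""
    components[0].inbox.extend(messages)
    for current, following in zip(components, components[1:]):
        current.step()
        following.inbox.extend(current.outbox)  # deliver messages downstream
        current.outbox.clear()
    components[-1].step()
    return list(components[-1].outbox)


if __name__ == "__main__":
    print(run_pipeline([Upper(), Exclaim()], ["spam", "eggs"]))
```

Components never call each other directly; they only read from an inbox and write to an outbox, which is what makes them easy to rewire.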

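Twisted is complex, you're right about that. Twisted is not bloated.

If you take a look here: http://twistedmatrix.com/trac/browser/trunk/twisted, you'll find an organized, comprehensive, and very well tested suite of internet protocols, as well as helper code for writing and deploying very sophisticated network applications. I wouldn't confuse bloat with comprehensiveness.

It's well known that the Twisted documentation isn't the most user-friendly at first glance, and I believe this turns away an unfortunate number of people. But Twisted is amazing if you put in the time. I did, it proved to be worth it, and I'd recommend that others try the same.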

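You may also want to try Syncless. It is coroutine-based (so it's similar to Concurrence, Eventlet and gevent). It implements drop-in non-blocking replacements for socket.socket, socket.gethostbyname (etc.), ssl.SSLSocket, time.sleep and select.select. It's fast. It needs Stackless Python and libevent. It contains a mandatory Python extension written in C (Pyrex/Cython).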

If you just want a simplified, lightweight HTTP request library, then I find Unirest really good.

Nicholas Piël made a very interesting comparison of such frameworks on his blog: it's well worth a read!