我尝试了python请求库文档中提供的示例。

使用async.map(rs),我获得了响应代码,但我想获得所请求的每个页面的内容。例如,这是行不通的:

out = async.map(rs)
print out[0].content

当前回答

不幸的是,据我所知,请求库不具备执行异步请求的能力。您可以在请求周围包装async/await语法,但这将使底层请求的同步性不会降低。如果您想要真正的异步请求,则必须使用其他提供异步请求的工具。其中一个解决方案是aiohttp (Python 3.5.3+)。根据我在Python 3.7 async/await语法中使用它的经验,它工作得很好。下面我写了执行n个web请求的三个实现

使用Python请求库的纯同步请求(sync_requests_get_all) 同步请求(async_requests_get_all),使用Python 3.7中包装的Python请求库async/await语法和asyncio 一个真正的异步实现(async_aiohttp_get_all), Python aiohttp库封装在Python 3.7 async/await语法和asyncio中

"""
Tested in Python 3.5.10
"""

import time
import asyncio
import requests
import aiohttp

from asgiref import sync

def timed(func):
    """
    records approximate durations of function calls
    """
    def wrapper(*args, **kwargs):
        start = time.time()
        print('{name:<30} started'.format(name=func.__name__))
        result = func(*args, **kwargs)
        duration = "{name:<30} finished in {elapsed:.2f} seconds".format(
            name=func.__name__, elapsed=time.time() - start
        )
        print(duration)
        timed.durations.append(duration)
        return result
    return wrapper

timed.durations = []


@timed
def sync_requests_get_all(urls):
    """
    performs synchronous get requests
    """
    # use session to reduce network overhead
    session = requests.Session()
    return [session.get(url).json() for url in urls]


@timed
def async_requests_get_all(urls):
    """
    asynchronous wrapper around synchronous requests
    """
    session = requests.Session()
    # wrap requests.get into an async function
    def get(url):
        return session.get(url).json()
    async_get = sync.sync_to_async(get)

    async def get_all(urls):
        return await asyncio.gather(*[
            async_get(url) for url in urls
        ])
    # call get_all as a sync function to be used in a sync context
    return sync.async_to_sync(get_all)(urls)

@timed
def async_aiohttp_get_all(urls):
    """
    performs asynchronous get requests
    """
    async def get_all(urls):
        async with aiohttp.ClientSession() as session:
            async def fetch(url):
                async with session.get(url) as response:
                    return await response.json()
            return await asyncio.gather(*[
                fetch(url) for url in urls
            ])
    # call get_all as a sync function to be used in a sync context
    return sync.async_to_sync(get_all)(urls)


if __name__ == '__main__':
    # this endpoint takes ~3 seconds to respond,
    # so a purely synchronous implementation should take
    # little more than 30 seconds and a purely asynchronous
    # implementation should take little more than 3 seconds.
    urls = ['https://postman-echo.com/delay/3']*10

    async_aiohttp_get_all(urls)
    async_requests_get_all(urls)
    sync_requests_get_all(urls)
    print('----------------------')
    [print(duration) for duration in timed.durations]

在我的机器上,这是输出:

async_aiohttp_get_all          started
async_aiohttp_get_all          finished in 3.20 seconds
async_requests_get_all         started
async_requests_get_all         finished in 30.61 seconds
sync_requests_get_all          started
sync_requests_get_all          finished in 30.59 seconds
----------------------
async_aiohttp_get_all          finished in 3.20 seconds
async_requests_get_all         finished in 30.61 seconds
sync_requests_get_all          finished in 30.59 seconds

其他回答

from threading import Thread

threads=list()

for requestURI in requests:
    t = Thread(target=self.openURL, args=(requestURI,))
    t.start()
    threads.append(t)

for thread in threads:
    thread.join()

...

def openURL(self, requestURI):
    o = urllib2.urlopen(requestURI, timeout = 600)
    o...

你可以使用httpx。

import httpx

async def get_async(url):
    async with httpx.AsyncClient() as client:
        return await client.get(url)

urls = ["http://google.com", "http://wikipedia.org"]

# Note that you need an async context to use `await`.
await asyncio.gather(*map(get_async, urls))

如果你想要一个函数式语法,gamla库将其包装到get_async中。

然后你就可以


await gamla.map(gamla.get_async(10))(["http://google.com", "http://wikipedia.org"])

10是超时时间,单位是秒。

(声明:我是作者)

我赞同上述使用HTTPX的建议,但我经常以不同的方式使用它,所以我补充了我的答案。

我个人使用asyncio.run(在Python 3.7中引入)而不是asyncio。收集,也更喜欢aiostream方法,它可以与asyncio和httpx结合使用。

就像我刚刚发布的这个例子一样,这种风格对于异步处理一组url很有帮助,尽管(常见的)错误发生了。我特别喜欢这种风格如何阐明响应处理发生在哪里,以及如何简化错误处理(我发现异步调用倾向于提供更多的错误处理)。

发布一个简单的异步发出一堆请求的例子更容易,但通常您还想处理响应内容(用它计算一些东西,可能引用您请求的URL要处理的原始对象)。

这种方法的核心是:

async with httpx.AsyncClient(timeout=timeout) as session:
    ws = stream.repeat(session)
    xs = stream.zip(ws, stream.iterate(urls))
    ys = stream.starmap(xs, fetch, ordered=False, task_limit=20)
    process = partial(process_thing, things=things, pbar=pbar, verbose=verbose)
    zs = stream.map(ys, process)
    return await zs

地点:

Process_thing是一个异步响应内容处理函数 things是输入列表(URL字符串的URL生成器来自于此),例如对象/字典列表 Pbar是一个进度条(例如tqdm.tqdm)[可选但有用]

所有这些都放在一个async_fetch_urlset异步函数中,然后通过调用一个名为fetch_things的同步“顶级”函数来运行,该函数运行协程[这是async函数返回的内容]并管理事件循环:

def fetch_things(urls, things, pbar=None, verbose=False):
    return asyncio.run(async_fetch_urlset(urls, things, pbar, verbose))

由于作为输入传递的列表(这里是things)可以就地修改,因此可以有效地获得返回的输出(就像我们从同步函数调用中习惯的那样)

Async现在是一个独立的模块:grequests。

请看这里:https://github.com/kennethreitz/grequests

还有:通过Python发送多个HTTP请求的理想方法?

安装:

$ pip install grequests

用法:

建立一个堆栈:

import grequests

urls = [
    'http://www.heroku.com',
    'http://tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://kennethreitz.com'
]

rs = (grequests.get(u) for u in urls)

发送堆栈

grequests.map(rs)

结果如下所示

[<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>]

grequest似乎没有设置并发请求的限制,即当多个请求被发送到同一个服务器时。

我已经使用python请求异步调用github的gist API有一段时间了。

举个例子,请看下面的代码:

https://github.com/davidthewatson/flasgist/blob/master/views.py#L60-72

这种风格的python可能不是最清晰的例子,但我可以向您保证代码是有效的。如果这让你感到困惑,请告诉我,我会记录下来。