我尝试了python请求库文档中提供的示例。

使用async.map(rs),我获得了响应代码,但我想获得所请求的每个页面的内容。例如,这是行不通的:

out = async.map(rs)
print out[0].content

当前回答

上面的答案都没有帮助我,因为他们假设你有一个预定义的请求列表,而在我的情况下,我需要能够侦听请求和异步响应(类似于它在nodejs中的工作方式)。

def handle_finished_request(r, **kwargs):
    print(r)


# while True:
def main():
    while True:
        address = listen_to_new_msg()  # based on your server

        # schedule async requests and run 'handle_finished_request' on response
        req = grequests.get(address, timeout=1, hooks=dict(response=handle_finished_request))
        job = grequests.send(req)  # does not block! for more info see https://stackoverflow.com/a/16016635/10577976


main()

handle_finished_request回调函数将在收到响应时被调用。注意:由于某些原因,超时(或无响应)在这里不会触发错误

这个简单的循环可以触发异步请求,类似于它在nodejs服务器中的工作方式

其他回答

我知道这已经关闭了一段时间,但我认为推广另一种基于请求库的异步解决方案可能是有用的。

list_of_requests = ['http://moop.com', 'http://doop.com', ...]

from simple_requests import Requests
for response in Requests().swarm(list_of_requests):
    print response.content

文档在这里:http://pythonhosted.org/simple-requests/

我测试了两个请求——未来请求和请求请求。Grequests速度更快,但会带来猴子补丁和依赖关系的其他问题。请求-期货比请求慢几倍。我决定编写自己的请求,并简单地将请求包装到ThreadPoolExecutor中,它几乎和grequest一样快,但没有外部依赖。

import requests
import concurrent.futures

def get_urls():
    return ["url1","url2"]

def load_url(url, timeout):
    return requests.get(url, timeout = timeout)

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:

    future_to_url = {executor.submit(load_url, url, 10): url for url in     get_urls()}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            resp_err = resp_err + 1
        else:
            resp_ok = resp_ok + 1

Note

下面的答案不适用于v0.13.0+请求。在写完这个问题之后,异步功能被移到了请求中。但是,您可以用下面的请求替换请求,它应该可以工作。

我保留这个答案,以反映最初的问题,即使用请求< v0.13.0。


异步完成多个任务。异步映射你必须:

为每个对象(任务)定义一个函数 将该函数作为事件钩子添加到请求中 调用异步。映射到所有请求/操作的列表上

例子:

from requests import async
# If using requests > v0.13.0, use
# from grequests import async

urls = [
    'http://python-requests.org',
    'http://httpbin.org',
    'http://python-guide.org',
    'http://kennethreitz.com'
]

# A simple task to do to each response object
def do_something(response):
    print response.url

# A list to hold our things to do via async
async_list = []

for u in urls:
    # The "hooks = {..." part is where you define what you want to do
    # 
    # Note the lack of parentheses following do_something, this is
    # because the response will be used as the first argument automatically
    action_item = async.get(u, hooks = {'response' : do_something})

    # Add the task to our list of things to do via async
    async_list.append(action_item)

# Do our list of things to do via async
async.map(async_list)

你可以使用httpx。

import httpx

async def get_async(url):
    async with httpx.AsyncClient() as client:
        return await client.get(url)

urls = ["http://google.com", "http://wikipedia.org"]

# Note that you need an async context to use `await`.
await asyncio.gather(*map(get_async, urls))

如果你想要一个函数式语法,gamla库将其包装到get_async中。

然后你就可以


await gamla.map(gamla.get_async(10))(["http://google.com", "http://wikipedia.org"])

10是超时时间,单位是秒。

(声明:我是作者)

不幸的是,据我所知,请求库不具备执行异步请求的能力。您可以在请求周围包装async/await语法,但这将使底层请求的同步性不会降低。如果您想要真正的异步请求,则必须使用其他提供异步请求的工具。其中一个解决方案是aiohttp (Python 3.5.3+)。根据我在Python 3.7 async/await语法中使用它的经验,它工作得很好。下面我写了执行n个web请求的三个实现

使用Python请求库的纯同步请求(sync_requests_get_all) 同步请求(async_requests_get_all),使用Python 3.7中包装的Python请求库async/await语法和asyncio 一个真正的异步实现(async_aiohttp_get_all), Python aiohttp库封装在Python 3.7 async/await语法和asyncio中

"""
Tested in Python 3.5.10
"""

import time
import asyncio
import requests
import aiohttp

from asgiref import sync

def timed(func):
    """
    records approximate durations of function calls
    """
    def wrapper(*args, **kwargs):
        start = time.time()
        print('{name:<30} started'.format(name=func.__name__))
        result = func(*args, **kwargs)
        duration = "{name:<30} finished in {elapsed:.2f} seconds".format(
            name=func.__name__, elapsed=time.time() - start
        )
        print(duration)
        timed.durations.append(duration)
        return result
    return wrapper

timed.durations = []


@timed
def sync_requests_get_all(urls):
    """
    performs synchronous get requests
    """
    # use session to reduce network overhead
    session = requests.Session()
    return [session.get(url).json() for url in urls]


@timed
def async_requests_get_all(urls):
    """
    asynchronous wrapper around synchronous requests
    """
    session = requests.Session()
    # wrap requests.get into an async function
    def get(url):
        return session.get(url).json()
    async_get = sync.sync_to_async(get)

    async def get_all(urls):
        return await asyncio.gather(*[
            async_get(url) for url in urls
        ])
    # call get_all as a sync function to be used in a sync context
    return sync.async_to_sync(get_all)(urls)

@timed
def async_aiohttp_get_all(urls):
    """
    performs asynchronous get requests
    """
    async def get_all(urls):
        async with aiohttp.ClientSession() as session:
            async def fetch(url):
                async with session.get(url) as response:
                    return await response.json()
            return await asyncio.gather(*[
                fetch(url) for url in urls
            ])
    # call get_all as a sync function to be used in a sync context
    return sync.async_to_sync(get_all)(urls)


if __name__ == '__main__':
    # this endpoint takes ~3 seconds to respond,
    # so a purely synchronous implementation should take
    # little more than 30 seconds and a purely asynchronous
    # implementation should take little more than 3 seconds.
    urls = ['https://postman-echo.com/delay/3']*10

    async_aiohttp_get_all(urls)
    async_requests_get_all(urls)
    sync_requests_get_all(urls)
    print('----------------------')
    [print(duration) for duration in timed.durations]

在我的机器上,这是输出:

async_aiohttp_get_all          started
async_aiohttp_get_all          finished in 3.20 seconds
async_requests_get_all         started
async_requests_get_all         finished in 30.61 seconds
sync_requests_get_all          started
sync_requests_get_all          finished in 30.59 seconds
----------------------
async_aiohttp_get_all          finished in 3.20 seconds
async_requests_get_all         finished in 30.61 seconds
sync_requests_get_all          finished in 30.59 seconds