我尝试了python请求库文档中提供的示例。

使用async.map(rs),我获得了响应代码,但我想获得所请求的每个页面的内容。例如,这是行不通的:

out = async.map(rs)
print out[0].content

当前回答

我测试了两个请求——未来请求和请求请求。Grequests速度更快,但会带来猴子补丁和依赖关系的其他问题。请求-期货比请求慢几倍。我决定编写自己的请求,并简单地将请求包装到ThreadPoolExecutor中,它几乎和grequest一样快,但没有外部依赖。

import requests
import concurrent.futures

def get_urls():
    return ["url1","url2"]

def load_url(url, timeout):
    return requests.get(url, timeout = timeout)

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:

    future_to_url = {executor.submit(load_url, url, 10): url for url in     get_urls()}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            resp_err = resp_err + 1
        else:
            resp_ok = resp_ok + 1

其他回答

我知道这已经关闭了一段时间,但我认为推广另一种基于请求库的异步解决方案可能是有用的。

list_of_requests = ['http://moop.com', 'http://doop.com', ...]

from simple_requests import Requests
for response in Requests().swarm(list_of_requests):
    print response.content

文档在这里:http://pythonhosted.org/simple-requests/

上面的答案都没有帮助我,因为他们假设你有一个预定义的请求列表,而在我的情况下,我需要能够侦听请求和异步响应(类似于它在nodejs中的工作方式)。

def handle_finished_request(r, **kwargs):
    print(r)


# while True:
def main():
    while True:
        address = listen_to_new_msg()  # based on your server

        # schedule async requests and run 'handle_finished_request' on response
        req = grequests.get(address, timeout=1, hooks=dict(response=handle_finished_request))
        job = grequests.send(req)  # does not block! for more info see https://stackoverflow.com/a/16016635/10577976


main()

handle_finished_request回调函数将在收到响应时被调用。注意:由于某些原因,超时(或无响应)在这里不会触发错误

这个简单的循环可以触发异步请求,类似于它在nodejs服务器中的工作方式

我测试了两个请求——未来请求和请求请求。Grequests速度更快,但会带来猴子补丁和依赖关系的其他问题。请求-期货比请求慢几倍。我决定编写自己的请求,并简单地将请求包装到ThreadPoolExecutor中,它几乎和grequest一样快,但没有外部依赖。

import requests
import concurrent.futures

def get_urls():
    return ["url1","url2"]

def load_url(url, timeout):
    return requests.get(url, timeout = timeout)

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:

    future_to_url = {executor.submit(load_url, url, 10): url for url in     get_urls()}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            resp_err = resp_err + 1
        else:
            resp_ok = resp_ok + 1

你可以使用httpx。

import httpx

async def get_async(url):
    async with httpx.AsyncClient() as client:
        return await client.get(url)

urls = ["http://google.com", "http://wikipedia.org"]

# Note that you need an async context to use `await`.
await asyncio.gather(*map(get_async, urls))

如果你想要一个函数式语法,gamla库将其包装到get_async中。

然后你就可以


await gamla.map(gamla.get_async(10))(["http://google.com", "http://wikipedia.org"])

10是超时时间,单位是秒。

(声明:我是作者)

声明:下面的代码为每个函数创建了不同的线程。

这对于某些情况可能是有用的,因为它使用起来更简单。但要知道,它不是异步的,但使用多线程会给人一种异步的错觉,尽管decorator建议这样做。

可以使用以下装饰器在函数执行完成后给出回调,回调必须处理函数返回的数据。

请注意,在函数被修饰后,它将返回一个Future对象。

import asyncio

## Decorator implementation of async runner !!
def run_async(callback, loop=None):
    if loop is None:
        loop = asyncio.get_event_loop()

    def inner(func):
        def wrapper(*args, **kwargs):
            def __exec():
                out = func(*args, **kwargs)
                callback(out)
                return out

            return loop.run_in_executor(None, __exec)

        return wrapper

    return inner

实现示例:

urls = ["https://google.com", "https://facebook.com", "https://apple.com", "https://netflix.com"]
loaded_urls = []  # OPTIONAL, used for showing realtime, which urls are loaded !!


def _callback(resp):
    print(resp.url)
    print(resp)
    loaded_urls.append((resp.url, resp))  # OPTIONAL, used for showing realtime, which urls are loaded !!


# Must provide a callback function, callback func will be executed after the func completes execution
# Callback function will accept the value returned by the function.
@run_async(_callback)
def get(url):
    return requests.get(url)


for url in urls:
    get(url)

如果你想看到实时加载的url,你可以在最后添加以下代码:

while True:
    print(loaded_urls)
    if len(loaded_urls) == len(urls):
        break