python请求超时。获得完整的响应

我正在收集网站列表上的统计数据，为了简单起见，我正在使用请求。这是我的代码:

data=[]
websites=['http://google.com', 'http://bbc.co.uk']
for w in websites:
    r= requests.get(w, verify=False)
    data.append( (r.url, len(r.content), r.elapsed.total_seconds(), str([(l.status_code, l.url) for l in r.history]), str(r.headers.items()), str(r.cookies.items())) )

现在，我想要请求。10秒后进入超时，这样循环就不会卡住。

这个问题以前也很有趣，但没有一个答案是干净的。

我听说可能不使用请求是一个好主意，但我应该如何得到请求提供的好东西(元组中的那些)。

当前回答

尽管问题是关于请求的，但我发现使用pycurl CURLOPT_TIMEOUT或CURLOPT_TIMEOUT_MS很容易做到这一点。

不需要线程或信号:

import pycurl
import StringIO

url = 'http://www.example.com/example.zip'
timeout_ms = 1000
raw = StringIO.StringIO()
c = pycurl.Curl()
c.setopt(pycurl.TIMEOUT_MS, timeout_ms)  # total timeout in milliseconds
c.setopt(pycurl.WRITEFUNCTION, raw.write)
c.setopt(pycurl.NOSIGNAL, 1)
c.setopt(pycurl.URL, url)
c.setopt(pycurl.HTTPGET, 1)
try:
    c.perform()
except pycurl.error:
    traceback.print_exc() # error generated on timeout
    pass # or just pass if you don't want to print the error

2016-10-28 09:46:11

其他回答

设置超时参数:

r = requests.get(w, verify=False, timeout=10) # 10 seconds

2.25.1版的更改

如果读取之间的连接或延迟超过10秒，上面的代码将导致对requests.get()的调用超时。见:https://requests.readthedocs.io/en/stable/user/advanced/超时

2014-02-23 08:58:04

我相信你可以使用多处理，而不依赖于第三方包:

import multiprocessing
import requests

def call_with_timeout(func, args, kwargs, timeout):
    manager = multiprocessing.Manager()
    return_dict = manager.dict()

    # define a wrapper of `return_dict` to store the result.
    def function(return_dict):
        return_dict['value'] = func(*args, **kwargs)

    p = multiprocessing.Process(target=function, args=(return_dict,))
    p.start()

    # Force a max. `timeout` or wait for the process to finish
    p.join(timeout)

    # If thread is still active, it didn't finish: raise TimeoutError
    if p.is_alive():
        p.terminate()
        p.join()
        raise TimeoutError
    else:
        return return_dict['value']

call_with_timeout(requests.get, args=(url,), kwargs={'timeout': 10}, timeout=60)

传递给kwargs的超时是从服务器获取任何响应的超时，参数timeout是获取完整响应的超时。

2015-10-28 19:24:21

如果你使用选项stream=True，你可以这样做:

r = requests.get(
    'http://url_to_large_file',
    timeout=1,  # relevant only for underlying socket
    stream=True)

with open('/tmp/out_file.txt'), 'wb') as f:
    start_time = time.time()
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)
        if time.time() - start_time > 8:
            raise Exception('Request took longer than 8s')

该解决方案不需要信号或多处理。

2018-04-24 13:17:41

我想到了一个更直接的解决方案，虽然很难看，但能解决真正的问题。它是这样的:

resp = requests.get(some_url, stream=True)
resp.raw._fp.fp._sock.settimeout(read_timeout)
# This will load the entire response even though stream is set
content = resp.content

你可以在这里阅读完整的解释

2015-12-03 18:29:58

最大的问题是，如果无法建立连接，请求包会等待太长时间，并阻塞程序的其余部分。

有几种方法来解决这个问题，但当我寻找类似请求的联机程序时，我找不到任何东西。这就是为什么我为请求构建了一个名为reqto(“请求超时”)的包装器，它支持来自请求的所有标准方法的适当超时。

pip install reqto

语法与请求相同

import reqto

response = reqto.get(f'https://pypi.org/pypi/reqto/json',timeout=1)
# Will raise an exception on Timeout
print(response)

此外，还可以设置自定义超时函数

def custom_function(parameter):
    print(parameter)


response = reqto.get(f'https://pypi.org/pypi/reqto/json',timeout=5,timeout_function=custom_function,timeout_args="Timeout custom function called")
#Will call timeout_function instead of raising an exception on Timeout
print(response)

重要的注意事项是导入行

import reqto

由于monkey_patch在后台运行，需要比所有其他导入更早地导入请求，线程等。

2022-03-18 04:01:40

python请求超时。获得完整的响应

推荐文章

最新文章

标签