I'm collecting statistics on a list of websites, and for simplicity I'm using requests. Here is my code:

import requests

data = []
websites=['http://google.com', 'http://bbc.co.uk']
for w in websites:
    r= requests.get(w, verify=False)
    data.append( (r.url, len(r.content), r.elapsed.total_seconds(), str([(l.status_code, l.url) for l in r.history]), str(r.headers.items()), str(r.cookies.items())) )
 

Now I want requests.get to time out after 10 seconds so that the loop doesn't get stuck.

This question has been of interest before too, but none of the answers are clean.

I've heard that maybe not using requests is a good idea, but then how would I get the nice things requests provides (the ones in the tuple)?


Current answer

Most of the other answers are incorrect

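Despite all the answers, I believe this thread still lacks a proper solution, and no existing answer presents a reasonable way to do something that should be simple and obvious.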

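Let's start by saying that, as of 2022, there is still absolutely no way to do this properly with requests alone. It is a conscious design decision by the library's developers.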

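Solutions utilizing the timeout parameter simply do not accomplish what they intend to do. The fact that they "seem" to work at first glance is purely accidental: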

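The timeout parameter has absolutely nothing to do with the total execution time of the request. It merely controls the maximum amount of time that may pass before the underlying socket receives any data. With an example timeout of 5 seconds, the server can just as well send 1 byte of data every 4 seconds, and that is perfectly fine as far as the timeout is concerned, but it won't help you very much.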
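To make the arithmetic concrete, here is a small pure-Python model (no networking involved) of a timeout that is applied to each wait separately, as described above. The function name `simulate_per_read_timeout` and the byte schedule are hypothetical and only illustrate the idea; this is not how requests is implemented internally.

```python
from typing import Iterable, Tuple


def simulate_per_read_timeout(
    gaps: Iterable[float], read_timeout: float
) -> Tuple[bool, float]:
    """Model a socket read timeout that is checked for each wait separately.

    `gaps` are the seconds the server waits before sending each successive
    chunk. Returns (timed_out, elapsed_seconds). The timeout fires only if a
    *single* gap exceeds `read_timeout`; otherwise the total keeps growing.
    """
    elapsed = 0.0
    for gap in gaps:
        if gap > read_timeout:
            # The client gives up after waiting read_timeout for this one chunk.
            return True, elapsed + read_timeout
        elapsed += gap
    return False, elapsed


# A server that sends 1 byte every 4 seconds, 100 times, under a 5-second timeout:
timed_out, elapsed = simulate_per_read_timeout([4.0] * 100, read_timeout=5.0)
print(timed_out, elapsed)  # → False 400.0
```

No single wait ever exceeds 5 seconds, so in this model the request runs for 400 seconds without the timeout ever firing.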

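Answers using stream and iter_content are slightly better, but they still don't cover everything in the request. You won't actually receive anything from iter_content until after the response headers have been sent, and that falls under the same issue: even if you use 1 byte as the chunk size for iter_content, reading the full response headers could take a totally arbitrary amount of time, and you may never actually get to the point where you read any response body from iter_content.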
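For illustration, this is roughly what the stream-based approach amounts to: checking a wall-clock deadline between chunks. A minimal sketch, assuming the chunks come from any iterable such as `res.iter_content(1)` on a `stream=True` response; `read_until_deadline`, `DeadlineExceeded` and the `trickle` generator (a simulated slow server) are hypothetical names. As explained above, it cannot help while the headers are still arriving, because `requests.get` has not returned yet at that point, and the deadline is only checked when a chunk arrives.

```python
import time
from typing import Iterable, Iterator


class DeadlineExceeded(Exception):
    """Raised when reading the body takes longer than the allowed total time."""


def read_until_deadline(chunks: Iterable[bytes], total_seconds: float) -> bytes:
    """Concatenate chunks, giving up once `total_seconds` have passed.

    The check happens only between chunks, so a single slow chunk can
    overshoot the deadline (by up to the socket read timeout, with requests).
    """
    deadline = time.monotonic() + total_seconds
    parts = []
    for chunk in chunks:
        parts.append(chunk)
        if time.monotonic() > deadline:
            raise DeadlineExceeded(f"body not finished within {total_seconds}s")
    return b"".join(parts)


def trickle(n: int, delay: float) -> Iterator[bytes]:
    """Simulated slow server body: yields one byte every `delay` seconds."""
    for _ in range(n):
        time.sleep(delay)
        yield b"a"


print(read_until_deadline([b"ab", b"cd"], total_seconds=1.0))  # → b'abcd'
```

With requests it would be used as `read_until_deadline(res.iter_content(1024), 10)` inside the `with requests.get(..., stream=True) as res:` block, which still leaves the header phase unbounded.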

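Here are some examples that completely break both the timeout and the stream-based approaches. Try them all. They all hang indefinitely, no matter which method you use.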

server.py

import socket
import time

server = socket.socket()

server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, True)
server.bind(('127.0.0.1', 8080))

server.listen()

while True:
    try:
        sock, addr = server.accept()
        print('Connection from', addr)
        sock.send(b'HTTP/1.1 200 OK\r\n')

        # Send some garbage headers very slowly but steadily.
        # Never actually complete the response.

        while True:
            sock.send(b'a')
            time.sleep(1)
    except OSError:
        # The client disconnected (e.g. BrokenPipeError); wait for the next one.
        pass

demo1.py

import requests

requests.get('http://localhost:8080')

demo2.py

import requests

requests.get('http://localhost:8080', timeout=5)

demo3.py

import requests

requests.get('http://localhost:8080', timeout=(5, 5))

demo4.py

import requests

with requests.get('http://localhost:8080', timeout=(5, 5), stream=True) as res:
    for chunk in res.iter_content(1):
        break

The correct solution

My approach utilizes Python's sys.settrace function. It is dead simple. You do not need to use any external libraries or turn your code upside down. Unlike most other answers, this actually guarantees that the code executes within a specified time. Be aware that you still need to specify the timeout parameter, as settrace only concerns Python code. Actual socket reads are external syscalls, which are not covered by settrace but are covered by the timeout parameter. Because of this, the exact time limit is not TOTAL_TIMEOUT but a value explained in the comments below.

import requests
import sys
import time

# This function serves as a "hook" that executes for each Python statement
# down the road. There may be some performance penalty, but as downloading
# a webpage is mostly I/O bound, it's not going to be significant.

def trace_function(frame, event, arg):
    if time.time() - start > TOTAL_TIMEOUT:
        raise Exception('Timed out!') # Use whatever exception you consider appropriate.

    return trace_function

# The following code will terminate at most after TOTAL_TIMEOUT + the highest
# value specified in `timeout` parameter of `requests.get`.
# In this case 10 + 6 = 16 seconds.
# For most cases though, it's gonna terminate no later than TOTAL_TIMEOUT.

TOTAL_TIMEOUT = 10

start = time.time()

sys.settrace(trace_function)

try:
    res = requests.get('http://localhost:8080', timeout=(3, 6)) # Use whatever timeout values you consider appropriate.
finally:
    sys.settrace(None) # Remove the time constraint and continue normally.

# Do something with the response

Condensed

import requests, sys, time

TOTAL_TIMEOUT = 10

def trace_function(frame, event, arg):
    if time.time() - start > TOTAL_TIMEOUT:
        raise Exception('Timed out!')

    return trace_function

start = time.time()
sys.settrace(trace_function)

try:
    res = requests.get('http://localhost:8080', timeout=(3, 6))
finally:
    sys.settrace(None)

That's it!
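If you need this in more than one place, for example inside the question's loop, the same sys.settrace trick can be wrapped in a function. A sketch under these assumptions: `call_with_deadline` and `TotalTimeout` are hypothetical names; it uses `time.monotonic()` instead of `time.time()` so clock adjustments don't matter; it restores whatever trace function was active before; and, like the code above, it only affects the current thread and relies on nothing in between swallowing the exception. A plain function is used rather than a `with` block because `sys.settrace` does not trace the frame that installs it, so the cleanup in `finally` always runs untraced.

```python
import sys
import time


class TotalTimeout(Exception):
    """Raised inside the traced code once the total time budget is used up."""


def call_with_deadline(seconds, func, /, *args, **kwargs):
    """Call func(*args, **kwargs), aborting its Python code after `seconds`.

    The check runs on every trace event (call, line, return, ...) in frames
    entered from `func`. Blocking C calls such as a socket read are not
    interrupted, so they still need their own timeout (e.g. requests' timeout).
    """
    deadline = time.monotonic() + seconds
    previous = sys.gettrace()  # e.g. a debugger's tracer; restored afterwards

    def tracer(frame, event, arg):
        if time.monotonic() > deadline:
            raise TotalTimeout(f"exceeded {seconds}s total")
        return tracer  # also receive line events inside this frame

    sys.settrace(tracer)
    try:
        return func(*args, **kwargs)
    finally:
        sys.settrace(previous)


def spin_forever():
    # Hypothetical stand-in for a response that trickles in forever.
    while True:
        time.sleep(0.01)
```

In the question's loop the call would become something like `r = call_with_deadline(10, requests.get, w, verify=False, timeout=(3, 6))`, catching `TotalTimeout` alongside `requests.exceptions.RequestException` so that one slow site doesn't stop the whole loop.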

Other answers

To create a timeout, you can use signals.

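The best way to solve this case is probably to:

1. Set an exception as the handler for the alarm signal.
2. Schedule the alarm signal with a ten-second delay.
3. Call the function inside a try-except-finally block.
4. If the function times out, the except block is reached.
5. In the finally block, cancel the alarm so it isn't signaled later.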

Here is some example code:

import signal
from time import sleep

class TimeoutException(Exception):
    """ Simple Exception to be called on timeouts. """
    pass

def _timeout(signum, frame):
    """ Raise a TimeoutException.

    This is intended for use as a signal handler.
    The signum and frame arguments passed to this are ignored.

    """
    # Raise TimeoutException with system default timeout message
    raise TimeoutException()

# Set the handler for the SIGALRM signal:
signal.signal(signal.SIGALRM, _timeout)
# Send the SIGALRM signal in 10 seconds:
signal.alarm(10)

try:    
    # Do our code:
    print('This will take 11 seconds...')
    sleep(11)
    print('done!')
except TimeoutException:
    print('It timed out!')
finally:
    # Abort the sending of the SIGALRM signal:
    signal.alarm(0)

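Some caveats:

- It's not thread-safe: signals are always delivered to the main thread, so you can't put this in any other thread.
- There is a slight delay between the scheduling of the signal and the execution of the actual code. This means the example would time out even if it only slept for ten seconds.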

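However, it's all in the standard Python library! Apart from the sleep import, it's only one import. If you are going to use timeouts in many places, you can easily put the TimeoutException, _timeout and the signaling in a function and just call that. Or you can make a decorator and put it on functions; see the answer linked below.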
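A decorator along those lines might look like the sketch below. It is not the code from the linked answer, and the name `timeout_after` is hypothetical. It shares the limits of the example above: `signal.SIGALRM` and `signal.alarm` exist only on Unix, it works only in the main thread, and `alarm` takes whole seconds. Unlike the example above, it puts the previous SIGALRM handler back when the call finishes.

```python
import functools
import signal
import time


class TimeoutException(Exception):
    """Simple exception raised when the decorated call runs too long."""


def timeout_after(seconds: int):
    """Decorator: raise TimeoutException if the call takes longer than `seconds`."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            def _timeout(signum, frame):
                raise TimeoutException(f"{func.__name__} timed out after {seconds}s")

            old_handler = signal.signal(signal.SIGALRM, _timeout)
            signal.alarm(seconds)  # schedule SIGALRM; whole seconds only
            try:
                return func(*args, **kwargs)
            finally:
                signal.alarm(0)  # cancel any pending alarm
                # Put the previous handler back (None means it was set outside Python).
                signal.signal(
                    signal.SIGALRM,
                    signal.SIG_DFL if old_handler is None else old_handler,
                )

        return wrapper

    return decorator


@timeout_after(1)
def slow():
    time.sleep(3)  # interrupted after about 1 second
    return "finished"
```

Applied to the question, `timeout_after(10)(requests.get)(w)` would bound each request, provided the loop runs in the main thread.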

You can also set it up as a "context manager", so that you can use it with the with statement:

import signal
class Timeout():
    """ Timeout for use with the `with` statement. """

    class TimeoutException(Exception):
        """ Simple Exception to be called on timeouts. """
        pass

    @staticmethod
    def _timeout(signum, frame):
        """ Raise a TimeoutException.

        This is intended for use as a signal handler.
        The signum and frame arguments passed to this are ignored.

        """
        raise Timeout.TimeoutException()

    def __init__(self, timeout=10):
        self.timeout = timeout
        signal.signal(signal.SIGALRM, Timeout._timeout)

    def __enter__(self):
        signal.alarm(self.timeout)

    def __exit__(self, exc_type, exc_value, traceback):
        signal.alarm(0)
        return exc_type is Timeout.TimeoutException

# Demonstration:
from time import sleep

print('This is going to take maximum 10 seconds...')
with Timeout(10):
    sleep(15)
    print('No timeout?')
print('Done')

One possible downside of this context manager approach is that you can't know whether the code actually timed out.
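One way around that downside is to record the outcome on the context manager itself and return it from `__enter__`. A sketch, assuming the same Unix-only, main-thread-only SIGALRM mechanism; `TrackedTimeout` and its `timed_out` attribute are hypothetical names, not part of the code above.

```python
import signal
import time


class _Alarm(Exception):
    """Internal exception used to unwind the with-block on SIGALRM."""


class TrackedTimeout:
    """`with TrackedTimeout(n) as t:` stops the block after n seconds; see t.timed_out."""

    def __init__(self, seconds: int = 10):
        self.seconds = seconds
        self.timed_out = False
        self._old_handler = None

    @staticmethod
    def _handler(signum, frame):
        raise _Alarm()

    def __enter__(self):
        self._old_handler = signal.signal(signal.SIGALRM, TrackedTimeout._handler)
        signal.alarm(self.seconds)
        return self  # lets the caller inspect timed_out afterwards

    def __exit__(self, exc_type, exc_value, traceback):
        signal.alarm(0)
        old = self._old_handler
        signal.signal(signal.SIGALRM, signal.SIG_DFL if old is None else old)
        self.timed_out = exc_type is _Alarm
        return self.timed_out  # swallow only our own timeout exception


with TrackedTimeout(1) as t:
    time.sleep(3)
print("timed out:", t.timed_out)  # → timed out: True
```

Any other exception raised inside the block still propagates, because `__exit__` returns True only for the internal `_Alarm`.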

Sources and recommended reading:

- The documentation on signals
- This answer on timeouts by @David Narayan. He has organized the above code as a decorator.

Try this request with timeout and error handling:

import requests
try: 
    url = "http://google.com"
    r = requests.get(url, timeout=10)
except requests.exceptions.Timeout as e: 
    print(e)

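The connect timeout is the number of seconds Requests will wait for your client to establish a connection to a remote machine (corresponding to the connect() call on the socket). It's a good practice to set connect timeouts to slightly larger than a multiple of 3, which is the default TCP packet retransmission window.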

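Once your client has connected to the server and sent the HTTP request, the read timeout starts. It is the number of seconds the client will wait for the server to send a response. (Specifically, it's the number of seconds that the client will wait between bytes sent from the server. In 99.9% of cases, this is the time before the server sends the first byte.)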

If you specify a single value for the timeout, that value is applied to both the connect timeout and the read timeout, like this:

r = requests.get('https://github.com', timeout=5)

Specify a tuple if you would like to set the connect and read values separately:

r = requests.get('https://github.com', timeout=(3.05, 27))

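If the remote server is very slow, you can tell Requests to wait forever for a response by passing None as the timeout value, and then go get a cup of coffee.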

r = requests.get('https://github.com', timeout=None)

https://docs.python-requests.org/en/latest/user/advanced/#timeouts

I'm using requests 2.2.1, and eventlet didn't work for me. Instead I was able to use a gevent timeout, since gevent is already used in my service for gunicorn.

import gevent
import gevent.monkey
gevent.monkey.patch_all(subprocess=True)

import requests  # imported after patching so its sockets cooperate with gevent

url = 'http://google.com'  # example URL
try:
    with gevent.Timeout(5):
        ret = requests.get(url)
        print(ret.status_code, ret.content)
except gevent.timeout.Timeout as e:
    print("timeout: {}".format(e))

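Note that gevent.timeout.Timeout is not caught by general exception handling (it derives from BaseException rather than Exception). So either explicitly catch gevent.timeout.Timeout, or pass in a different exception to be raised instead, like so: with gevent.Timeout(5, requests.exceptions.Timeout): although no message is passed along when this exception is raised.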