Most other answers are incorrect
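Despite all the answers, I believe this thread still lacks a proper solution, and none of the existing answers presents a reasonable way to do something that should be simple and obvious.
Let's start by saying that, as of 2022, there is still absolutely no way to do this properly with requests alone. It is a conscious design decision by the library's developers.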
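Solutions that rely on the timeout parameter simply do not accomplish what they intend to. The fact that they "seem" to work at first glance is purely accidental:
The timeout parameter has absolutely nothing to do with the total execution time of the request. Whether you pass a single number or a (connect, read) tuple, it only limits how long the client waits to establish the connection and, after that, how long the underlying socket may wait for the next piece of data (the maximum gap between bytes). Take a timeout of 5 seconds: the server can just as well send 1 byte of data every 4 seconds, which is perfectly fine as far as the timeout is concerned, but does not help you much.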
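The per-read behaviour is easy to reproduce with nothing but the standard library. The sketch below uses a raw socket pair as a stand-in for what requests does under the hood (it ultimately sets a timeout on the underlying socket); the helper names `trickle` and `read_until_eof` are hypothetical. A sender trickles ten bytes, one every 0.1 s, to a reader whose socket timeout is 0.5 s. No single `recv()` waits longer than the timeout, so the read never times out, even though the whole transfer takes roughly twice as long as the timeout.

```python
import socket
import threading
import time


def trickle(sock, n_bytes, interval):
    # Send one byte at a time, pausing well under the reader's timeout.
    for _ in range(n_bytes):
        sock.sendall(b'a')
        time.sleep(interval)
    sock.close()  # EOF lets the reader's loop finish


def read_until_eof(sock, per_read_timeout):
    # settimeout() bounds each individual recv() call, not the whole loop.
    sock.settimeout(per_read_timeout)
    data = b''
    while True:
        chunk = sock.recv(1)
        if not chunk:
            return data
        data += chunk


reader, writer = socket.socketpair()
sender = threading.Thread(target=trickle, args=(writer, 10, 0.1))
start = time.monotonic()
sender.start()
received = read_until_eof(reader, per_read_timeout=0.5)
elapsed = time.monotonic() - start
sender.join()
reader.close()
print(f'received {len(received)} bytes in {elapsed:.1f} s with a 0.5 s timeout')
```

Stretch the sender to thousands of bytes and the transfer takes as long as you like without ever tripping the timeout.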
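Answers using stream and iter_content are slightly better, but they still do not cover everything in a request. You do not actually receive anything from iter_content until after the response headers have been sent, and that falls under the same issue: even if you use 1 byte as the chunk size for iter_content, reading the full response headers can take an arbitrarily long time, and you may never actually get to the point where iter_content yields any of the response body.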
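The same thing happens one layer up, while the response headers are being parsed. This sketch uses the standard library's `http.client`, which urllib3 (and therefore requests) builds on, against a throwaway local server (`slow_header_server` is a hypothetical helper) that dribbles out one header byte every 0.1 s before finishing the response. With a 0.5 s timeout, `getresponse()` still waits about a second for the header block, and only after that can any body byte be read.

```python
import http.client
import socket
import threading
import time


def slow_header_server(listener):
    # Serve exactly one request, sending part of the header block very slowly.
    conn, _ = listener.accept()
    with conn:
        conn.recv(65536)  # read and ignore the request
        conn.sendall(b'HTTP/1.1 200 OK\r\nX-Slow: ')
        for _ in range(10):
            conn.sendall(b'a')
            time.sleep(0.1)
        conn.sendall(b'\r\nContent-Length: 2\r\n\r\nok')


listener = socket.socket()
listener.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
listener.listen()
port = listener.getsockname()[1]
server = threading.Thread(target=slow_header_server, args=(listener,))
server.start()

client = http.client.HTTPConnection('127.0.0.1', port, timeout=0.5)
start = time.monotonic()
client.request('GET', '/')
response = client.getresponse()  # returns only once the whole header block is in
header_time = time.monotonic() - start
body = response.read()
client.close()
server.join()
listener.close()
print(f'headers took {header_time:.1f} s despite timeout=0.5; body={body!r}')
```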
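Here are some examples that completely break both the timeout and the stream-based approaches. Try them all: no matter which method you use, they all hang indefinitely.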
server.py
```python
import socket
import time

server = socket.socket()
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, True)
server.bind(('127.0.0.1', 8080))
server.listen()

while True:
    sock, addr = server.accept()
    print('Connection from', addr)
    try:
        sock.send(b'HTTP/1.1 200 OK\r\n')
        # Send some garbage headers very slowly but steadily.
        # Never actually complete the response.
        while True:
            sock.send(b'a')
            time.sleep(1)
    except OSError:
        # The client gave up (broken pipe, reset, ...); wait for the next one.
        pass
    finally:
        sock.close()
```
demo1.py
```python
import requests
requests.get('http://localhost:8080')
```
demo2.py
```python
import requests
requests.get('http://localhost:8080', timeout=5)
```
demo3.py
```python
import requests
requests.get('http://localhost:8080', timeout=(5, 5))
```
demo4.py
```python
import requests
with requests.get('http://localhost:8080', timeout=(5, 5), stream=True) as res:
    for chunk in res.iter_content(1):
        break
```
Proper solution
My approach uses Python's sys.settrace function. It is dead simple: you do not need any external libraries, and you do not have to turn your code upside down. Unlike most other answers, it actually guarantees an upper bound on how long the code runs. Be aware that you still need to specify the timeout parameter, because settrace only sees Python code. The actual socket reads are blocking system calls that settrace cannot interrupt, but they are covered by the timeout parameter. Because of this, the exact time limit is not TOTAL_TIMEOUT but the value explained in the comments below. Also note that sys.settrace is thread-specific: it only affects the thread that calls it.
```python
import requests
import sys
import time

# This function serves as a "hook" that executes for each Python statement
# down the road. There may be some performance penalty, but as downloading
# a webpage is mostly I/O bound, it's not going to be significant.
def trace_function(frame, event, arg):
    # time.monotonic() is unaffected by system clock changes, unlike time.time().
    if time.monotonic() - start > TOTAL_TIMEOUT:
        raise Exception('Timed out!')  # Use whatever exception you consider appropriate.
    return trace_function

# The following code will terminate at most after TOTAL_TIMEOUT + the highest
# value specified in the `timeout` parameter of `requests.get`.
# In this case 10 + 6 = 16 seconds.
# In most cases, though, it will terminate no later than TOTAL_TIMEOUT.
TOTAL_TIMEOUT = 10

start = time.monotonic()
sys.settrace(trace_function)
try:
    # Use whatever timeout values you consider appropriate.
    res = requests.get('http://localhost:8080', timeout=(3, 6))
finally:
    sys.settrace(None)  # Remove the time constraint and continue normally.

# Do something with the response
```
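To see the mechanism in isolation, here is a minimal sketch that wraps the same trace-hook idea in a reusable helper and exercises it on pure-Python code, so it runs without network access. The names `run_with_deadline`, `Deadline`, `spin_forever` and `sleep_then_return` are hypothetical, not part of any library. The second job illustrates the caveat above: a blocking C-level call such as `time.sleep` is not interrupted, and the deadline only fires at the next Python line after it returns, which is why the socket-level `timeout` parameter is still needed.

```python
import sys
import time


class Deadline(Exception):
    """Raised by the trace hook once the time budget is exhausted (hypothetical name)."""


def run_with_deadline(func, seconds):
    """Call func(), raising Deadline from inside it once `seconds` have elapsed.

    Only Python-level execution is checked; a blocking C call runs to completion first.
    """
    deadline = time.monotonic() + seconds

    def tracer(frame, event, arg):
        if time.monotonic() > deadline:
            raise Deadline(f'exceeded {seconds} s')
        return tracer  # also receive 'line' events inside every traced frame

    previous = sys.gettrace()  # restore any existing tracer (e.g. a debugger) afterwards
    sys.settrace(tracer)
    try:
        return func()
    finally:
        sys.settrace(previous)


def spin_forever():
    # Pure Python loop: every iteration produces 'line' events for the tracer.
    count = 0
    while True:
        count += 1


def sleep_then_return():
    time.sleep(0.5)  # C-level call: the tracer cannot interrupt it
    return 'done'    # the deadline is only noticed when this line runs


for job in (spin_forever, sleep_then_return):
    t0 = time.monotonic()
    try:
        run_with_deadline(job, 0.2)
    except Deadline as exc:
        print(f'{job.__name__}: {exc} after {time.monotonic() - t0:.1f} s')
```

With a 0.2 s budget, the first job should stop after roughly 0.2 s, while the second only stops after its 0.5 s sleep has finished: the same overrun that the `TOTAL_TIMEOUT + timeout` bound in the comments accounts for.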
Condensed
```python
import requests, sys, time

TOTAL_TIMEOUT = 10

def trace_function(frame, event, arg):
    if time.monotonic() - start > TOTAL_TIMEOUT:
        raise Exception('Timed out!')
    return trace_function

start = time.monotonic()
sys.settrace(trace_function)
try:
    res = requests.get('http://localhost:8080', timeout=(3, 6))
finally:
    sys.settrace(None)
```
That's it!