我正在收集网站列表上的统计数据,为了简单起见,我正在使用请求。这是我的代码:
data=[]
websites=['http://google.com', 'http://bbc.co.uk']
for w in websites:
r= requests.get(w, verify=False)
data.append( (r.url, len(r.content), r.elapsed.total_seconds(), str([(l.status_code, l.url) for l in r.history]), str(r.headers.items()), str(r.cookies.items())) )
现在,我想要请求。10秒后进入超时,这样循环就不会卡住。
这个问题以前也很有趣,但没有一个答案是干净的。
我听说可能不使用请求是一个好主意,但我应该如何得到请求提供的好东西(元组中的那些)。
尽管问题是关于请求的,但我发现使用pycurl CURLOPT_TIMEOUT或CURLOPT_TIMEOUT_MS很容易做到这一点。
不需要线程或信号:
import pycurl
import StringIO
url = 'http://www.example.com/example.zip'
timeout_ms = 1000
raw = StringIO.StringIO()
c = pycurl.Curl()
c.setopt(pycurl.TIMEOUT_MS, timeout_ms) # total timeout in milliseconds
c.setopt(pycurl.WRITEFUNCTION, raw.write)
c.setopt(pycurl.NOSIGNAL, 1)
c.setopt(pycurl.URL, url)
c.setopt(pycurl.HTTPGET, 1)
try:
c.perform()
except pycurl.error:
traceback.print_exc() # error generated on timeout
pass # or just pass if you don't want to print the error
尽管问题是关于请求的,但我发现使用pycurl CURLOPT_TIMEOUT或CURLOPT_TIMEOUT_MS很容易做到这一点。
不需要线程或信号:
import pycurl
import StringIO
url = 'http://www.example.com/example.zip'
timeout_ms = 1000
raw = StringIO.StringIO()
c = pycurl.Curl()
c.setopt(pycurl.TIMEOUT_MS, timeout_ms) # total timeout in milliseconds
c.setopt(pycurl.WRITEFUNCTION, raw.write)
c.setopt(pycurl.NOSIGNAL, 1)
c.setopt(pycurl.URL, url)
c.setopt(pycurl.HTTPGET, 1)
try:
c.perform()
except pycurl.error:
traceback.print_exc() # error generated on timeout
pass # or just pass if you don't want to print the error
设置stream=True并使用r.iter_content(1024)。是的,eventlet。我就是不喜欢超时。
try:
start = time()
timeout = 5
with get(config['source']['online'], stream=True, timeout=timeout) as r:
r.raise_for_status()
content = bytes()
content_gen = r.iter_content(1024)
while True:
if time()-start > timeout:
raise TimeoutError('Time out! ({} seconds)'.format(timeout))
try:
content += next(content_gen)
except StopIteration:
break
data = content.decode().split('\n')
if len(data) in [0, 1]:
raise ValueError('Bad requests data')
except (exceptions.RequestException, ValueError, IndexError, KeyboardInterrupt,
TimeoutError) as e:
print(e)
with open(config['source']['local']) as f:
data = [line.strip() for line in f.readlines()]
讨论在这里https://redd.it/80kp1h