I am trying to get the contents of App Store > Business:

import requests
from lxml import html

page = requests.get("https://itunes.apple.com/in/genre/ios-business/id6000?mt=8")
tree = html.fromstring(page.text)

flist = []
plist = []
for i in range(0, 100):
    app = tree.xpath("//div[@class='column first']/ul/li/a/@href")
    ap = app[0]
    page1 = requests.get(ap)

When I try range(0, 2), it works, but when I set the range to 100, it shows this error:

Traceback (most recent call last):
  File "/home/preetham/Desktop/eg.py", line 17, in <module>
    page1 = requests.get(ap)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 383, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 486, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 378, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='itunes.apple.com', port=443): Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8 (Caused by <class 'socket.gaierror'>: [Errno -2] Name or service not known)
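Separately from the network error, the loop in the question re-runs the XPath query on every iteration and always requests app[0], so it fetches the same app page 100 times. Below is a minimal sketch that visits each link once instead. It assumes `links` is the list returned by that XPath query, evaluated once outside the loop. The `fetch_pages` helper, the one-second pause and the 10-second timeout are illustrative choices, not values from the question. A failed request is recorded and the loop moves on, so one bad URL does not abort the whole run.

```python
import time

import requests


def fetch_pages(links, delay=1.0, timeout=10):
    """Hypothetical helper: fetch each link once, pausing between requests.

    Returns (pages, failures): pages maps URL -> HTML text,
    failures maps URL -> the exception that was raised.
    """
    pages = {}
    failures = {}
    # One Session reuses pooled connections to the same host.
    with requests.Session() as session:
        for i, url in enumerate(links):
            if i > 0:
                time.sleep(delay)  # pause between requests instead of hammering the server
            try:
                response = session.get(url, timeout=timeout)
                response.raise_for_status()
            except requests.exceptions.RequestException as exc:
                # ConnectionError (e.g. a failed DNS lookup), timeouts and HTTP errors end up here.
                failures[url] = exc
                continue
            pages[url] = response.text
    return pages, failures


# Usage, with `tree` built as in the question:
# links = tree.xpath("//div[@class='column first']/ul/li/a/@href")
# pages, failures = fetch_pages(links[:100])
```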

Current answer

Specifying a proxy solved this for me in a corporate environment.

page = requests.get("http://www.google.com:80", proxies={"http": "http://111.233.225.166:1234"})

The full error was:

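requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.google.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError(': Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))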
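The `proxies` mapping is keyed by the scheme of the URL being requested. That means `{"http": ...}` only applies to `http://` URLs, and an `https://` URL, such as the App Store page in the question, also needs an `"https"` entry. The sketch below assumes a hypothetical proxy address `http://proxy.example.com:8080`; substitute your company's proxy. It also passes an explicit `timeout`, so a stalled connection like the WinError 10060 above fails after a bounded time.

```python
import requests

# Hypothetical corporate proxy address; replace with your own host and port.
PROXY_URL = "http://proxy.example.com:8080"

# Keys are the schemes of the URLs being requested; values are the proxy to route them through.
PROXIES = {
    "http": PROXY_URL,
    "https": PROXY_URL,
}


def get_via_proxy(url, proxies=PROXIES, timeout=10):
    """GET url through the given proxies, giving up after `timeout` seconds."""
    return requests.get(url, proxies=proxies, timeout=timeout)


# response = get_via_proxy("https://itunes.apple.com/in/genre/ios-business/id6000?mt=8")
```

requests also reads the `HTTP_PROXY` and `HTTPS_PROXY` environment variables by default. In a corporate setup, setting those can be simpler than changing code.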

Other answers

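Even after installing pyopenssl and trying various Python versions, I could not get it to work on Windows (while it worked fine on macOS). So I switched to urllib, which works on Python 3.6 (from python.org) and 3.7 (Anaconda):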

from urllib.request import urlopen

response = urlopen("http://pythonscraping.com/pages/page1.html")
contents = response.read()
print(contents)
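Below is a slightly more defensive sketch of the same urllib approach. It adds a timeout and catches `urllib.error.URLError`, which wraps DNS and connection failures such as the `gaierror` in the question. The `fetch` name and the 10-second timeout are illustrative assumptions.

```python
from urllib.error import URLError
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Return the response body as bytes, or None if the URL cannot be opened."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except URLError as exc:
        # HTTPError is a subclass of URLError, so HTTP 4xx/5xx responses are caught here too.
        print(f"Could not fetch {url}: {exc.reason}")
        return None


# contents = fetch("http://pythonscraping.com/pages/page1.html")
# if contents is not None:
#     print(contents)
```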

I had a similar problem, but the following code worked for me:

import requests

url = "<some REST url>"  # placeholder: the REST endpoint you are calling
page = requests.get(url, verify=False)

verify=False disables SSL certificate verification. You can wrap the call in try/except as usual.
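Below is a sketch of the same call wrapped in the try/except the answer mentions. Two points are worth stating. First, `verify` also accepts a path to a CA bundle, which keeps verification on while trusting, for example, a corporate root certificate; the path below is hypothetical. Second, with `verify=False`, urllib3 emits an `InsecureRequestWarning` on every request, which the sketch silences explicitly. With verification off, the server's identity is not checked.

```python
import requests
import urllib3

# Silence the InsecureRequestWarning that urllib3 emits for unverified HTTPS requests.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# Hypothetical CA bundle path; pass verify=CA_BUNDLE to keep verification enabled.
CA_BUNDLE = "/path/to/corporate-ca.pem"


def get_page(url, verify=False, timeout=10):
    """GET url, returning the Response, or None on SSL, connection or HTTP errors."""
    try:
        response = requests.get(url, verify=verify, timeout=timeout)
        response.raise_for_status()
        return response
    except requests.exceptions.SSLError as exc:
        # SSLError subclasses ConnectionError, so it must be caught first to be reported separately.
        print(f"SSL problem with {url}: {exc}")
    except requests.exceptions.RequestException as exc:
        print(f"Request to {url} failed: {exc}")
    return None


# page = get_page(url)                    # verification disabled
# page = get_page(url, verify=CA_BUNDLE)  # verification against a custom CA bundle
```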

To add my own experience, I got an error with

r = requests.get(download_url)

when trying to download the file specified in the URL.

The error was:

HTTPSConnectionPool(host, port=443): Max retries exceeded with url (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))

I fixed it by passing verify=False to the call, as follows:

r = requests.get(download_url + filename, verify=False)
with open(filename, 'wb') as f:
    f.write(r.content)
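Below is a sketch of the same download with two additions. It checks the HTTP status, so an error page is not silently saved as the file. It also streams the body to disk in chunks, so a large file is never held in memory all at once. `download_url` and `filename` come from the answer. The `download` helper, the chunk size and the timeout are illustrative assumptions, and `verify=False` in the usage line only mirrors the answer.

```python
import os

import requests


def download(url, filename, verify=True, chunk_size=64 * 1024, timeout=30):
    """Hypothetical helper: stream url to filename; return True on success, False on error."""
    written = False
    try:
        with requests.get(url, stream=True, verify=verify, timeout=timeout) as response:
            response.raise_for_status()  # don't save a 404/500 error page as the file
            with open(filename, "wb") as fh:
                written = True
                for chunk in response.iter_content(chunk_size=chunk_size):
                    fh.write(chunk)
    except requests.exceptions.RequestException as exc:
        print(f"Download of {url} failed: {exc}")
        if written:
            os.remove(filename)  # discard the partially written file
        return False
    return True


# ok = download(download_url + filename, filename, verify=False)
```

The file is only opened after a successful response. A request that fails early therefore leaves any existing file with that name untouched.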

Just use the retry support that requests already provides:

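import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

url = "https://itunes.apple.com/in/genre/ios-business/id6000?mt=8"

session = requests.Session()
# Retry failed connection attempts up to 3 times, waiting longer between attempts.
retry = Retry(connect=3, backoff_factor=0.5)
adapter = HTTPAdapter(max_retries=retry)
# Use the retrying adapter for both plain HTTP and HTTPS URLs.
session.mount('http://', adapter)
session.mount('https://', adapter)

session.get(url)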

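This will GET the URL and retry up to 3 times when a connection error occurs (which would otherwise raise requests.exceptions.ConnectionError). backoff_factor adds a growing delay between attempts, so that you do not immediately fail again when the server enforces a periodic request quota.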

Take a look at urllib3.util.retry.Retry; it has many options that simplify retrying.
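As an example of those options, the sketch below also retries on HTTP status codes typical of a quota or rate limit (429) or a temporarily unavailable server (500, 502, 503, 504). The specific numbers and status codes are assumptions to tune for your case. `total`, `connect`, `backoff_factor` and `status_forcelist` are standard `Retry` parameters. By default, `Retry` also honours a `Retry-After` header sent with a 429 or 503 response.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total=5, backoff_factor=0.5, statuses=(429, 500, 502, 503, 504)):
    """Return a Session that retries on connection errors and on the given status codes."""
    retry = Retry(
        total=total,                    # overall cap across all kinds of retries
        connect=3,                      # at most 3 retries for connection errors
        backoff_factor=backoff_factor,  # growing sleep between attempts
        status_forcelist=statuses,      # also retry when the response has one of these codes
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


session = make_retrying_session()
# response = session.get(url, timeout=10)
```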