我试图获得应用商店>业务的内容:

import requests
from lxml import html

page = requests.get("https://itunes.apple.com/in/genre/ios-business/id6000?mt=8")
tree = html.fromstring(page.text)

flist = []
plist = []
for i in range(0, 100):
    app = tree.xpath("//div[@class='column first']/ul/li/a/@href")
    ap = app[0]
    page1 = requests.get(ap)

当我尝试(0,2)的范围,它工作,但当我把范围在100,它显示这个错误:

Traceback (most recent call last):
  File "/home/preetham/Desktop/eg.py", line 17, in <module>
    page1 = requests.get(ap)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 383, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 486, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 378, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='itunes.apple.com', port=443): Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8 (Caused by <class 'socket.gaierror'>: [Errno -2] Name or service not known)

当前回答

为这个请求添加报头。

headers={
'Referer': 'https://itunes.apple.com',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36'
}

requests.get(ap, headers=headers)

其他回答

我正在用Gauge编写一个测试,我也遇到了这个错误,这是因为我试图在没有激活VPN的情况下请求内部URL。

为这个请求添加报头。

headers={
'Referer': 'https://itunes.apple.com',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36'
}

requests.get(ap, headers=headers)

实现异常处理总是好的。它不仅有助于避免脚本意外退出,还有助于记录错误和信息通知。当使用Python请求时,我更喜欢捕获这样的异常:

    try:
        res = requests.get(adress,timeout=30)
    except requests.ConnectionError as e:
        print("OOPS!! Connection Error. Make sure you are connected to Internet. Technical Details given below.\n")
        print(str(e))            
        renewIPadress()
        continue
    except requests.Timeout as e:
        print("OOPS!! Timeout Error")
        print(str(e))
        renewIPadress()
        continue
    except requests.RequestException as e:
        print("OOPS!! General Error")
        print(str(e))
        renewIPadress()
        continue
    except KeyboardInterrupt:
        print("Someone closed the program")

这里的renewIPadress()是一个用户定义函数,它可以在IP地址被阻塞时更改IP地址。你可以不用这个函数。

在我的例子中,我在python脚本中部署了一些docker容器,然后调用其中一个部署的服务。当我在调用服务之前添加一些延迟时,错误被修复。我认为它需要时间来准备接受连接。

from time import sleep
#deploy containers
#get URL of the container
sleep(5)
response = requests.get(url,verify=False)
print(response.json())

在公司环境中指定代理为我解决了这个问题。

page = requests.get("http://www.google.com:80", proxies={"http": "http://111.233.225.166:1234"})

完整的错误是:

requests.exceptions.ConnectionError: httpconnectionpool (host='www.google.com', port=80): Max retries exceeded with url: /(由NewConnectionError(': Failed to establish a new connection: [WinError 10060]连接尝试失败,因为被连接的一方在一段时间后没有正确响应,或已建立的连接失败,因为连接的主机未能响应'))