I am trying to get the content of App Store > Business:

import requests
from lxml import html

page = requests.get("https://itunes.apple.com/in/genre/ios-business/id6000?mt=8")
tree = html.fromstring(page.text)

flist = []
plist = []
for i in range(0, 100):
    app = tree.xpath("//div[@class='column first']/ul/li/a/@href")
    ap = app[0]
    page1 = requests.get(ap)

When I try range(0, 2) it works, but when I set the range to 100 it shows this error:

Traceback (most recent call last):
  File "/home/preetham/Desktop/eg.py", line 17, in <module>
    page1 = requests.get(ap)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 383, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 486, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 378, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='itunes.apple.com', port=443): Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8 (Caused by <class 'socket.gaierror'>: [Errno -2] Name or service not known)

Current answer

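Implementing exception handling is always a good idea. It not only helps keep the script from exiting unexpectedly, it also helps with logging errors and sending notifications. When using Python requests, I prefer to catch exceptions like this: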

    # This fragment is meant to run inside a loop over the URLs, hence the `continue`.
    try:
        res = requests.get(adress, timeout=30)
    except requests.ConnectionError as e:
        print("OOPS!! Connection Error. Make sure you are connected to Internet. Technical Details given below.\n")
        print(str(e))
        renewIPadress()
        continue
    except requests.Timeout as e:
        print("OOPS!! Timeout Error")
        print(str(e))
        renewIPadress()
        continue
    except requests.RequestException as e:
        # Base class of all requests exceptions, so this catches whatever is left
        print("OOPS!! General Error")
        print(str(e))
        renewIPadress()
        continue
    except KeyboardInterrupt:
        print("Someone closed the program")

Here renewIPadress() is a user-defined function that changes the IP address when it gets blocked. You can do without this function.

Other answers

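What happens here is that the iTunes server refuses your connection (you are sending too many requests from the same IP address in a short period of time):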

Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8

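The error trace is misleading; it should read something like "No connection could be made because the target machine actively refused it".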

There is an issue about this in the Python requests library on GitHub.

To overcome this issue (not so much an issue as a misleading debug trace), you should catch connection-related exceptions like this:

try:
    page1 = requests.get(ap)
except requests.exceptions.ConnectionError:
    # No response was received, so there is no status code to read
    page1 = None
    print("Connection refused")

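Another way to overcome this problem is to leave a sufficient time gap between the requests you send to the server. This can be done with the sleep(timeinsec) function in Python (don't forget to import sleep):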

from time import sleep
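As a rough sketch of the spacing idea above, the helper below pauses between consecutive requests. The names `fetch_all`, `delay`, `fetch`, and `pause` are made up for illustration and are not part of requests; in real use you would pass `requests.get` as `fetch`.

```python
from time import sleep


def fetch_all(urls, fetch, delay=2.0, pause=sleep):
    """Call fetch(url) for each URL, pausing `delay` seconds between calls."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            pause(delay)  # wait before every request except the first
        results.append(fetch(url))
    return results
```

With the question's code this might be called as `pages = fetch_all(app, requests.get, delay=2.0)`. The two-second default is only a guess; a suitable delay depends on the server.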

All in all, requests is an awesome Python library. I hope this solves your problem.

Just do this,

Paste the following code in place of page = requests.get(url):

import time

import requests

page = ''
while page == '':
    try:
        page = requests.get(url)
        break
    except requests.exceptions.ConnectionError:
        # Catch only connection errors so Ctrl+C can still stop the loop
        print("Connection refused by the server..")
        print("Let me sleep for 5 seconds")
        print("ZZzzzz...")
        time.sleep(5)
        print("Was a nice sleep, now let me continue...")

You're welcome :)
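The loop above retries forever if the server never comes back. A bounded variant with growing delays could look like the sketch below. `get_with_retries` and all of its parameters are hypothetical names chosen for this example, not something requests provides.

```python
import time


def get_with_retries(fetch, url, attempts=5, base_delay=1.0,
                     retry_on=(OSError,), pause=time.sleep):
    """Call fetch(url), retrying on the given exceptions with growing delays."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            pause(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... by default
```

A call might look like `page = get_with_retries(requests.get, url, retry_on=(requests.exceptions.ConnectionError,))`. As far as I know the requests exceptions derive from `IOError`, so the `OSError` default should also catch them, but naming the exception explicitly is clearer.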

First I ran the run.py file, then I ran the unit_test.py file, and it worked for me.

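My situation is rather special. I tried the answers above and none of them worked. Then it occurred to me: could it be related to my network proxy? You see, I am in mainland China, and without a proxy I cannot access sites like Google. So I turned off the network proxy and the problem was solved.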
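To check whether a proxy is in play before turning anything off, one option is to print the proxy settings Python detects. This is only a diagnostic sketch using the standard library; `detected_proxies` is a name invented here, and it is an assumption that your HTTP client reads the same settings.

```python
import urllib.request


def detected_proxies():
    """Return the proxy settings Python finds in the environment or system."""
    return urllib.request.getproxies()

# Example use:
#   for scheme, proxy in detected_proxies().items():
#       print(scheme, "->", proxy)
```

An empty result suggests that no proxy is configured through environment variables such as `https_proxy`, so the cause is probably elsewhere.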

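Even after installing pyopenssl and trying various Python versions, I could not get it to work on Windows (while it worked fine on Mac), so I switched to urllib, which works on Python 3.6 (from python.org) and 3.7 (Anaconda):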

from urllib.request import urlopen

# Fetch the page with the standard library instead of requests
html = urlopen("http://pythonscraping.com/pages/page1.html")
contents = html.read()  # raw bytes of the response body
print(contents)
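The urllib approach above can fail with the same kind of connection errors, so it may be worth guarding it too. The sketch below assumes a helper named `read_page` (hypothetical, as is its `opener` parameter, which exists only so the function can be tried without a network).

```python
from urllib.error import URLError
from urllib.request import urlopen


def read_page(url, opener=urlopen, timeout=30):
    """Return the response body as bytes, or None if the URL cannot be fetched."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.read()
    except URLError as e:
        # Covers DNS failures, refused connections and HTTP error statuses
        print("Could not fetch", url, "-", e.reason)
        return None
```

For example, `contents = read_page("http://pythonscraping.com/pages/page1.html")` would leave `contents` as `None` instead of raising when the host cannot be reached.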