我想刮取无限滚动实现的页面的所有数据。下面的python代码可以工作。

for i in range(100):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)

这意味着每当我向下滚动到底部时,我都需要等待5秒,这通常足以让页面完成加载新生成的内容。但是,这可能并不省时。页面可能在5秒内完成新内容的加载。如何在每次向下滚动时检测页面是否完成了新内容的加载?如果我能检测到这一点,一旦我知道页面完成加载,我就可以再次向下滚动以查看更多内容。这样更节省时间。


当前回答

webdriver将在默认情况下通过.get()方法等待页面加载。

正如你可能正在寻找一些特定的元素@user227215所说的,你应该使用WebDriverWait来等待位于你页面中的元素:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

browser = webdriver.Firefox()
browser.get("url")
delay = 3 # seconds
try:
    myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
    print "Page is ready!"
except TimeoutException:
    print "Loading took too much time!"

我用它来检查提醒。您可以使用任何其他类型方法来查找定位器。

编辑1:

I should mention that the webdriver will wait for a page to load by default. It does not wait for loading inside frames or for ajax requests. It means when you use .get('url'), your browser will wait until the page is completely loaded and then go to the next command in the code. But when you are posting an ajax request, webdriver does not wait and it's your responsibility to wait an appropriate amount of time for the page or a part of page to load; so there is a module named expected_conditions.

其他回答

试图将find_element_by_id传递给构造函数的presence_of_element_locate(如已接受的答案所示)会引发NoSuchElementException异常。我不得不在fragles的评论中使用语法:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get('url')
timeout = 5
try:
    element_present = EC.presence_of_element_located((By.ID, 'element_id'))
    WebDriverWait(driver, timeout).until(element_present)
except TimeoutException:
    print "Timed out waiting for page to load"

这与文档中的示例相匹配。这里是By文档的链接。

Selenium无法检测页面是否完全加载,但javascript可以。我建议你试试这个。

from selenium.webdriver.support.ui import WebDriverWait
WebDriverWait(driver, 100).until(lambda driver: driver.execute_script('return document.readyState') == 'complete')

这将执行javascript代码,而不是使用python,因为javascript可以检测页面何时完全加载,它将显示“完成”。这个代码的意思是在100秒内,继续尝试这个文档。readyState直到complete显示。

nono = driver.current_url
driver.find_element(By.XPATH,"//button[@value='Send']").click()
  while driver.current_url == nono:
      pass
print("page loaded.")

从硒/ webdriver /支持/ wait.py

driver = ...
from selenium.webdriver.support.wait import WebDriverWait
element = WebDriverWait(driver, 10).until(
    lambda x: x.find_element_by_id("someId"))

把WebDriverWait放在While循环中并捕获异常如何?

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

browser = webdriver.Firefox()
browser.get("url")
delay = 3 # seconds
while True:
    try:
        WebDriverWait(browser, delay).until(EC.presence_of_element_located(browser.find_element_by_id('IdOfMyElement')))
        print "Page is ready!"
        break # it will break from the loop once the specific element will be present. 
    except TimeoutException:
        print "Loading took too much time!-Try again"