Wait for page load in Selenium WebDriver for Python

I want to scrape all the data from a page that is implemented with infinite scroll. The following Python code works.

for i in range(100):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)

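This means that every time I scroll down to the bottom, I have to wait 5 seconds, which is generally enough for the page to finish loading the newly generated content. But this may not be time efficient: the page may finish loading the new content well within 5 seconds. How can I detect whether the page has finished loading the new content each time I scroll down? If I can detect that, I can scroll down again to see more content as soon as I know the page has finished loading. That would be more time efficient.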

Current answer

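I struggled a bit to get this working, because it did not behave as I expected. Anyone who is still struggling to get it working may want to check this.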

I want to wait for an element to be present on the web page before proceeding with my operations.

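We can use WebDriverWait(driver, 10, 1).until(), but the catch is that until() expects a function, which it calls every 1 second (the poll frequency) until the supplied timeout (10 seconds in our case) expires. So keeping it as below worked for me.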

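wait_for_element = WebDriverWait(driver, 10, 1)  # 10 s timeout, polled every 1 s
element_found = wait_for_element.until(lambda x: x.find_element(By.CLASS_NAME, "MY_ELEMENT_CLASS_NAME").is_displayed())  # By is selenium.webdriver.common.by.By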

Here is what until() does behind the scenes:

def until(self, method, message=''):
        """Calls the method provided with the driver as an argument until the \
        return value is not False."""
        screen = None
        stacktrace = None

        end_time = time.time() + self._timeout
        while True:
            try:
                value = method(self._driver)
                if value:
                    return value
            except self._ignored_exceptions as exc:
                screen = getattr(exc, 'screen', None)
                stacktrace = getattr(exc, 'stacktrace', None)
            time.sleep(self._poll)
            if time.time() > end_time:
                break
        raise TimeoutException(message, screen, stacktrace)
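
To make the polling behavior easier to experiment with, here is a minimal standalone sketch of the same loop that does not depend on Selenium. The names poll_until and PollTimeout are hypothetical and are not part of Selenium's API. Unlike WebDriverWait, which ignores NoSuchElementException by default, this sketch ignores no exceptions unless you pass some in.

```python
import time


class PollTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def poll_until(condition, subject, timeout=10.0, poll=1.0, ignored=()):
    """Call condition(subject) every `poll` seconds until it returns a
    truthy value, or raise PollTimeout after roughly `timeout` seconds."""
    end_time = time.monotonic() + timeout
    while True:
        try:
            value = condition(subject)
            if value:
                return value  # hand back whatever truthy value was produced
        except ignored:
            pass  # an ignored exception counts as "not ready yet"
        time.sleep(poll)
        if time.monotonic() > end_time:
            break
    raise PollTimeout(f"condition not met within {timeout} seconds")
```

As with until(), the condition is always tried at least once, and a truthy return value is passed straight back to the caller, which is why a lambda returning is_displayed() works as a condition.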

2021-09-06 07:05:37

Other answers

You can do that simply with this function:

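def page_is_loading(driver):
    # Despite its name, this blocks until the page has finished loading
    # and then returns True. (A "yield" here would turn the function into
    # a generator, and the caller's "while not ..." check would never wait.)
    while True:
        x = driver.execute_script("return document.readyState")
        if x == "complete":
            return True
        time.sleep(0.1)  # poll again shortly; requires "import time"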

When you want to do something after the page has finished loading, you can use:

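# Options is assumed to be a FirefoxOptions instance created earlier.
# executable_path is the Selenium 3 style; Selenium 4 expects a Service object instead.
Driver = webdriver.Firefox(options=Options, executable_path='geckodriver.exe')
Driver.get("https://www.google.com/")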

while not page_is_loading(Driver):
    continue

Driver.execute_script("alert('page is loaded')")

2020-07-10 08:23:18

Selenium cannot detect whether the page is fully loaded, but JavaScript can. I suggest you try this.

from selenium.webdriver.support.ui import WebDriverWait
WebDriverWait(driver, 100).until(lambda driver: driver.execute_script('return document.readyState') == 'complete')

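This executes JavaScript rather than Python, because JavaScript can tell when the page is fully loaded: document.readyState will then be "complete". The code means: for up to 100 seconds, keep checking document.readyState until it equals "complete".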

2022-07-19 10:28:38

By default, the webdriver waits for a page to load when you call the .get() method.

As @user227215 said, you are probably looking for a specific element, so you should use WebDriverWait to wait for an element located in your page:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

browser = webdriver.Firefox()
browser.get("url")
delay = 3 # seconds
try:
    myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
    print("Page is ready!")
except TimeoutException:
    print("Loading took too much time!")

I have used it for checking alerts. You can use any other type of method to find the locator.

EDIT 1:

I should mention that the webdriver will wait for a page to load by default. It does not wait for loading inside frames or for ajax requests. It means when you use .get('url'), your browser will wait until the page is completely loaded and then go to the next command in the code. But when you are posting an ajax request, webdriver does not wait and it's your responsibility to wait an appropriate amount of time for the page or a part of page to load; so there is a module named expected_conditions.
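
Besides the ready-made conditions in expected_conditions, until() accepts any callable that takes the driver, so a custom condition can target the infinite-scroll case directly. The following is only a sketch: scroll_height_exceeds is a hypothetical helper, not part of Selenium, and it assumes that newly loaded content makes document.body.scrollHeight grow.

```python
def scroll_height_exceeds(old_height):
    """Build a wait condition that is truthy once the page is taller
    than old_height (hypothetical helper, not a Selenium API)."""
    def condition(driver):
        new_height = driver.execute_script("return document.body.scrollHeight")
        # Return the new height (truthy) on success, False to keep polling.
        return new_height if new_height > old_height else False
    return condition


# Possible usage with a real driver (assumes selenium is installed):
#   from selenium.webdriver.support.ui import WebDriverWait
#   old = driver.execute_script("return document.body.scrollHeight")
#   driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
#   new = WebDriverWait(driver, 5).until(scroll_height_exceeds(old))
```

If nothing new loads within the timeout, until() raises TimeoutException, which can serve as the signal that the end of the page has been reached.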

2014-10-25 21:44:05

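Alternatively, instead of scrolling down 100 times, you can check whether the DOM is no longer being modified (this applies when the bottom of the page is lazily loaded via AJAX):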

import logging
import time

def scrollDown(driver, value):
    driver.execute_script("window.scrollBy(0,"+str(value)+")")

# Scroll down the page
def scrollDownAllTheWay(driver):
    old_page = driver.page_source
    while True:
        logging.debug("Scrolling loop")
        for i in range(2):
            scrollDown(driver, 500)
            time.sleep(2)
        new_page = driver.page_source
        if new_page != old_page:
            old_page = new_page
        else:
            break
    return True
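
A variation on the same idea, sketched here under stated assumptions: instead of comparing the whole page source after a fixed sleep, poll document.body.scrollHeight and continue as soon as it grows. The function name scroll_until_no_new_content and its parameters are hypothetical, and the sketch assumes new content always increases the page height.

```python
import time

SCROLL_TO_BOTTOM = "window.scrollTo(0, document.body.scrollHeight);"
GET_HEIGHT = "return document.body.scrollHeight"


def scroll_until_no_new_content(driver, timeout=5.0, poll=0.25, max_rounds=100):
    """Scroll to the bottom repeatedly and return how many scrolls
    loaded new content (hypothetical helper)."""
    loaded = 0
    height = driver.execute_script(GET_HEIGHT)
    for _ in range(max_rounds):
        driver.execute_script(SCROLL_TO_BOTTOM)
        deadline = time.monotonic() + timeout
        new_height = height
        # Poll instead of sleeping a fixed time: move on once the page grows.
        while time.monotonic() < deadline:
            new_height = driver.execute_script(GET_HEIGHT)
            if new_height > height:
                break
            time.sleep(poll)
        if new_height <= height:
            return loaded  # nothing new arrived within the timeout
        height = new_height
        loaded += 1
    return loaded
```

The timeout only costs its full duration on the final round, when there is nothing left to load; every earlier round returns as soon as the height changes.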

2017-07-09 16:18:52

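Have you tried driver.implicitly_wait? It is like a setting for the driver, so you only call it once per session, and it basically tells the driver to wait up to the given amount of time for each command to become executable.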

driver = webdriver.Chrome()
driver.implicitly_wait(10)

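So if you set a wait time of 10 seconds, it will execute the command as soon as possible, waiting up to 10 seconds before it gives up. I have used this in similar scroll-down scenarios, so I don't see why it wouldn't work in your case. Hope this is helpful.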

To be able to fix this answer, I had to add new text. Be sure to use a lowercase "w" in implicitly_wait.

2018-05-13 04:36:48
