I am making a scraper for the web.archive.org site. The scraper should open the link below and:

  • scroll down the page
  • scrape the information
  • go to the next page
  • repeat.

The problem is that I can't make the code click the next-page link repeatedly until there is nothing left to click. The current code clicks it once, but after reaching page 2 it never goes on to page 3.

Here is the minimal code that represents the pagination:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import time

URL = "https://web.archive.org/web/20230203123249/https://www.coinpeople.com/forum/64-new-member-information-and-welcome33/"

driver = webdriver.Chrome()
driver.get(URL)

time.sleep(5)

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

time.sleep(2)

while True:
    try:
        next_page = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.XPATH, '//*[@id="elPagination_30a5535a7a65933469c1ef7e81dc96e0_806040338"]/li[9]/a'))
        )
        next_page.click()
    except TimeoutException:
        print("No more pages to click.")
        break


driver.quit()

I have tried the following:

next_page = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, f'("a[rel='next']")')))

and

driver.execute_script("document.querySelector('[rel=next]').click();")
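
Of these two attempts, the first is not valid Python as written: the single quote before `next` ends the f-string early, so the line fails to parse before Selenium ever receives a selector. The sketch below checks that claim with the built-in `compile` and shows a plain-string selector that a CSS locator could take instead. The names `attempt_source`, `attempt_parses` and `fixed_selector` are illustrative and not part of the original post.

```python
# The selector expression from the first attempt, kept verbatim as source text.
attempt_source = """f'("a[rel='next']")'"""

try:
    compile(attempt_source, "<attempt>", "eval")
    attempt_parses = True
except SyntaxError:
    # The quote before `next` closes the f-string early, leaving stray tokens.
    attempt_parses = False

# A plain string: double quotes outside, single quotes inside, no f-prefix
# and no extra parentheses.
fixed_selector = "a[rel='next']"

print(attempt_parses)
print(fixed_selector)
```

Whether `a[rel='next']` matches anything on the archived page is a separate question that this check does not answer.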

The problem is that the XPath changes on every page. Here are the XPaths for the first three pages:

//*[@id="elPagination_30a5535a7a65933469c1ef7e81dc96e0_806040338"]/li[9]/a
//*[@id="elPagination_f4be6b25f47a268e848f4596d5b1e3a6_66137219"]/li[10]/a
//*[@id="elPagination_728dd0ae3583cbafa454d08121ab9841_298843258"]/li[11]/a

What can I do so the program goes through all pages?
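
Comparing the three XPaths above, only the generated `id` and the `li` index differ from page to page, so a locator that depends on neither should stay stable. The sketch below illustrates the idea offline with the standard library's `xml.etree.ElementTree`. The markup is hypothetical and heavily simplified: the ids are the ones quoted above, but the list items, the `href` values, and the presence of `rel="next"` on the link are assumptions (the latter suggested by the attempts in the question).

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination markup for three consecutive pages. Each page has a
# different id and a different number of <li> items before the "next" link.
PAGES = [
    '<ul id="elPagination_30a5535a7a65933469c1ef7e81dc96e0_806040338">'
    '<li><a href="?page=1">1</a></li>'
    '<li><a rel="next" href="?page=2">Next</a></li></ul>',
    '<ul id="elPagination_f4be6b25f47a268e848f4596d5b1e3a6_66137219">'
    '<li><a rel="prev" href="?page=1">Prev</a></li>'
    '<li><a href="?page=2">2</a></li>'
    '<li><a rel="next" href="?page=3">Next</a></li></ul>',
    '<ul id="elPagination_728dd0ae3583cbafa454d08121ab9841_298843258">'
    '<li><a rel="prev" href="?page=2">Prev</a></li>'
    '<li><a href="?page=2">2</a></li>'
    '<li><a href="?page=3">3</a></li>'
    '<li><a rel="next" href="?page=4">Next</a></li></ul>',
]


def find_next_href(markup):
    """Return the href of the first <a rel="next">, or None if there is none."""
    root = ET.fromstring(markup)
    # Match on the rel attribute only: no id, no positional li index.
    link = root.find(".//a[@rel='next']")
    return None if link is None else link.get("href")


next_hrefs = [find_next_href(page) for page in PAGES]
print(next_hrefs)
```

In Selenium, the same attribute-based idea would be expressed as a locator such as the XPath `//a[@rel="next"]` or the CSS selector `a[rel='next']`, again assuming the live markup carries that attribute.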

Asked Nov 19, 2024 at 10:32 by NewUser

1 Answer

This worked in the end:

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
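
The line above covers only the scroll step. As a minimal sketch of how a full loop might look, the function below scrolls, scrapes, and then follows the next-page link until none is found. It rests on assumptions not confirmed by the post: that the next-page link carries `rel="next"`, and that following its `href` with `driver.get` is an acceptable replacement for clicking. The names `crawl_pages`, `scrape_page`, `NEXT_LINK` and `max_pages` are hypothetical. The driver is used only through `execute_script`, `find_elements`, and `get`, so any Selenium WebDriver instance should fit.

```python
# Locator as a plain tuple; "css selector" is the value of By.CSS_SELECTOR.
# Assumption: the next-page link has rel="next".
NEXT_LINK = ("css selector", "a[rel='next']")


def crawl_pages(driver, scrape_page, max_pages=500):
    """Scroll and scrape the current page, follow rel="next", and repeat.

    Returns the number of pages visited. max_pages is a safety cap.
    """
    visited = 0
    while visited < max_pages:
        # Scroll to the bottom, as in the answer above.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrape_page(driver)
        visited += 1

        # find_elements returns an empty list instead of raising when
        # nothing matches, so the last page needs no exception handling.
        links = driver.find_elements(*NEXT_LINK)
        href = links[0].get_attribute("href") if links else None
        if not href:
            print("No more pages.")
            break

        # Navigating by URL avoids holding on to an element from the old page.
        driver.get(href)
    return visited


# Usage with a real browser (not run here):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get(URL)
#   crawl_pages(driver, lambda d: print(d.current_url))
#   driver.quit()
```

Looking the link up again on every iteration is the point of the design: nothing is carried over from the previous page, so the changing `id` and `li` index no longer matter.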

Article tags: python; How to handle dynamic pagination in Selenium where XPath changes for each page; Stack Overflow