python - How to handle dynamic pagination in Selenium where XPath changes for each page?
I am making a scraper for a web.archive.org page. The scraper should open the following link and:
- scroll down the page
- scrape the information
- go to the next page
- repeat.
The problem is that I can't make the code click "next page" repeatedly until there is nothing left to click. The current code clicks it once, but when it gets to page 2 it doesn't go on to page 3.
Here is the minimal code that represents the pagination:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import time

URL = "https://web.archive.org/web/20230203123249/https://www.coinpeople.com/forum/64-new-member-information-and-welcome33/"

driver = webdriver.Chrome()
driver.get(URL)
time.sleep(5)

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(2)

while True:
    try:
        next_page = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.XPATH, '//*[@id="elPagination_30a5535a7a65933469c1ef7e81dc96e0_806040338"]/li[9]/a'))
        )
        next_page.click()
    except TimeoutException:
        print("No more pages to click.")
        break

driver.quit()
I have tried the following:
next_page = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a[rel='next']")))
and
driver.execute_script("document.querySelector('[rel=next]').click();")
The problem is that the XPath changes on every page. For example, here are the XPaths of the "next" link for the first 3 pages:
//*[@id="elPagination_30a5535a7a65933469c1ef7e81dc96e0_806040338"]/li[9]/a
//*[@id="elPagination_f4be6b25f47a268e848f4596d5b1e3a6_66137219"]/li[10]/a
//*[@id="elPagination_728dd0ae3583cbafa454d08121ab9841_298843258"]/li[11]/a
What can I do so the program goes through all pages?
Asked and edited Nov 19, 2024 at 10:32 by NewUser · 1 Answer
This worked in the end:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")