Countering Browser Automation Detection: The Low-Level Implementation of Modifying the navigator.webdriver Property

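I. Background: Locked Out by Automation Detection?

When scraping data with a browser automation tool such as Selenium or Playwright, you will often run into the problem of being detected, especially on sites with strict anti-scraping policies such as Amazon. One common detection mechanism is to check the navigator.webdriver property from JavaScript: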

console.log(navigator.webdriver); // true: the browser is being driven by an automation tool

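This article therefore takes a closer look at how to modify that property at a low level inside the browser and, combined with a proxy, cookies, and a User-Agent string, builds a detection-resistant crawler that can collect product information from Amazon.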
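As a minimal sketch of the override itself, the JavaScript patch can be kept as a plain Python string and handed to the browser later. The helper name `build_webdriver_patch` and the `configurable: true` flag are illustrative assumptions, not part of the original article.

```python
import json


def build_webdriver_patch(value=None):
    """Return JavaScript that redefines navigator.webdriver to report `value`.

    `None` maps to JavaScript `undefined`; any other value is JSON-encoded.
    """
    js_value = "undefined" if value is None else json.dumps(value)
    return (
        "Object.defineProperty(navigator, 'webdriver', "
        "{get: () => " + js_value + ", configurable: true});"
    )


# The default patch makes the property read as undefined.
PATCH = build_webdriver_patch()
print(PATCH)
```

With Selenium on a Chromium browser, a string like this is typically registered through `driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {"source": PATCH})` so that it runs before any page script on each navigation.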

II. Environment Setup

1. Install dependencies

pip install undetected-chromedriver selenium requests

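We use undetected-chromedriver in place of the stock Selenium driver. It has several anti-detection measures built in, which makes it better suited to the anti-scraping defenses of large sites.

2. Crawler proxy credentials (replace with your own account details)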

# Proxy configuration: 16yun crawler proxy, www.16yun
proxy_host = "proxy.16yun"
proxy_port = "8010"
proxy_user = "16YUN"
proxy_pass = "16IP"
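The four values above are usually combined into a single proxy URL. The sketch below shows one way to do that with percent-encoded credentials, plus the mapping shape that the `requests` library (installed earlier) accepts for its `proxies` argument. The helper names `build_proxy_url` and `build_requests_proxies` are hypothetical.

```python
from urllib.parse import quote


def build_proxy_url(host, port, user, password, scheme="http"):
    """Build scheme://user:pass@host:port with percent-encoded credentials."""
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"{scheme}://{auth}@{host}:{port}"


def build_requests_proxies(proxy_url):
    """Return a mapping in the shape `requests` accepts for `proxies=`."""
    return {"http": proxy_url, "https": proxy_url}


# Placeholder credentials from the configuration above.
proxy_url = build_proxy_url("proxy.16yun", "8010", "16YUN", "16IP")
print(proxy_url)  # http://16YUN:16IP@proxy.16yun:8010
```

Encoding the credentials matters when a password contains characters such as `@` or `:`, which would otherwise be read as URL delimiters.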

III. Core Steps

✅ Step 1: Configure a stealth browser and hide webdriver

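import time

import undetected_chromedriver as uc
from selenium.webdriver.common.by import By


def create_stealth_driver(proxy_host, proxy_port, proxy_user, proxy_pass, user_agent, cookies):
    # uc.ChromeOptions is the options class shipped with undetected-chromedriver
    options = uc.ChromeOptions()
    options.add_argument(f"user-agent={user_agent}")
    options.add_argument("--disable-blink-features=AutomationControlled")

    # Route traffic through the crawler proxy.
    # Note: Chrome generally does not honor credentials embedded in --proxy-server,
    # so an authenticated proxy may also need IP whitelisting or a helper extension.
    options.add_argument(f"--proxy-server=http://{proxy_user}:{proxy_pass}@{proxy_host}:{proxy_port}")

    # Optional: run headless (leave disabled while debugging)
    # options.add_argument("--headless")

    # Create the driver
    driver = uc.Chrome(options=options)

    # Override navigator.webdriver (the core step). Registering the script through
    # CDP makes it run before page scripts on every new document; a plain
    # execute_script call would only patch the page that is currently loaded.
    driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument",
        {"source": "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"},
    )

    # Cookies can only be set for the domain that is currently open, so load the site first.
    # Assumption: the URL was lost from the original listing; the article targets Amazon.
    driver.get("https://www.amazon.com")
    for cookie in cookies:
        driver.add_cookie(cookie)

    return driver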
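The `cookies` argument is a list of dictionaries with at least `name` and `value` keys, which is the shape Selenium's `add_cookie` expects. When the cookies are copied from a browser as a single `Cookie` header string, a small converter helps. This is a sketch only: `parse_cookie_header` is a hypothetical helper, and the sample cookie names are made up.

```python
def parse_cookie_header(header, domain=None):
    """Convert 'a=1; b=2' into [{'name': 'a', 'value': '1'}, ...].

    Fragments without an '=' or without a name are skipped.
    """
    cookies = []
    for part in header.split(";"):
        # Split on the first '=' only, since values may contain '=' themselves.
        name, sep, value = part.strip().partition("=")
        if not sep or not name:
            continue
        cookie = {"name": name, "value": value}
        if domain:
            cookie["domain"] = domain
        cookies.append(cookie)
    return cookies


# Made-up header string for illustration.
print(parse_cookie_header("sid=abc123; lang=en_US"))
```

The resulting list can be passed as the `cookies` argument of `create_stealth_driver`.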

✅ Step 2: Simulate a keyword search and collect the results

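from urllib.parse import quote_plus

from selenium.common.exceptions import NoSuchElementException


def scrape_amazon(keyword):
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    cookies = []  # copy a set from a real browser session, or obtain one by logging in

    driver = create_stealth_driver(proxy_host, proxy_port, proxy_user, proxy_pass, user_agent, cookies)
    try:
        # Assumption: the search URL was lost from the original listing;
        # this uses Amazon's usual search path with a URL-encoded keyword.
        driver.get(f"https://www.amazon.com/s?k={quote_plus(keyword)}")
        time.sleep(3)  # crude wait for the results to render

        products = driver.find_elements(By.XPATH, "//div[@data-component-type='s-search-result']")

        for product in products[:10]:  # first 10 results only, as an example
            try:
                title = product.find_element(By.TAG_NAME, "h2").text
                price_whole = product.find_element(By.CLASS_NAME, "a-price-whole").text
                price_frac = product.find_element(By.CLASS_NAME, "a-price-fraction").text
                price = f"{price_whole}.{price_frac}"
                reviews = product.find_element(By.XPATH, ".//span[@class='a-size-base']").text
            except NoSuchElementException:
                # Skip result cards that lack a title, price, or review element
                continue

            print(f"Title: {title}")
            print(f"Price: ${price}")
            print(f"Reviews: {reviews}")
            print("=" * 30)
    finally:
        # Always close the browser, even if scraping fails midway
        driver.quit()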
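The price is assembled from two separate text fragments, so it stays a string. If a numeric value is needed, the two parts can be cleaned and combined as sketched below. This assumes US-style formatting, where the whole part may carry thousands separators or a trailing dot; `combine_price` is a hypothetical helper.

```python
import re
from decimal import Decimal


def combine_price(whole_text, fraction_text):
    """Combine the whole and fractional price fragments into a Decimal.

    All non-digit characters are stripped from both parts, which assumes
    commas are thousands separators (US-style formatting).
    """
    whole = re.sub(r"[^0-9]", "", whole_text)
    fraction = re.sub(r"[^0-9]", "", fraction_text)
    if not whole:
        raise ValueError(f"no digits in whole part: {whole_text!r}")
    return Decimal(f"{whole}.{fraction or '0'}")


print(combine_price("1,299.", "99"))  # 1299.99
```

`Decimal` is used rather than `float` so that prices keep their exact two-decimal representation.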

IV. Putting It All Together

# Put the proxy settings and the two functions above in the same file, then call them here
if __name__ == "__main__":
    keyword = "wireless earbuds"
    scrape_amazon(keyword)

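V. Common Errors

Error message	Cause	Fix
`selenium.common.exceptions.WebDriverException`	Driver and browser versions do not match	Let `undetected-chromedriver` manage the driver version automatically
Page element not found	The page has not finished loading	Add `time.sleep()` or use WebDriverWait
"Too many requests" message	The IP has been blocked	Switch to another proxy IP; use a high-quality, high-anonymity proxy
Cannot set a cookie	No page is open, or it has not finished loading	Visit the target page first, then add the cookie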
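For the "element not found" case, the idea behind an explicit wait is to poll for a condition until it holds or a deadline passes, instead of sleeping for a fixed time. The following is a library-free sketch of that polling loop. `wait_until` is a hypothetical helper, not Selenium's API; Selenium's own `WebDriverWait` plays this role in real code.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)
```

With a live driver, the condition could be a lambda that calls `driver.find_elements(...)` for the search-result cards: an empty list is falsy, so the loop keeps polling until at least one card has rendered.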

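VI. Summary and Next Steps

Using Amazon as the example, this article showed how low-level JavaScript techniques can counter automation detection. The key points are:

Use undetected-chromedriver instead of stock Selenium;
Modify the navigator.webdriver property to hide traces of automation;
Combine a proxy, a User-Agent string, and cookies to build a credible browsing environment;
Wait for the page to load, then extract structured data with precise XPath queries.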

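Copyright notice: This article, "Countering Browser Automation Detection: The Low-Level Implementation of Modifying the navigator.webdriver Property", was contributed by a community member and reflects only the author's views. To reprint it, contact the author and cite the source: http://it.en369.cn/jiaocheng/1747681063a2203203.html. This site only provides information storage, holds no ownership of the content, and assumes no related legal liability. Content found to involve plagiarism, infringement, or violations of laws or regulations will be removed once verified.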
