Simulating WeChat browser requests in Python: a Python crawler that uses Selenium to simulate browser behavior - 369IT编程


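A few days ago a WeChat reader asked me a crawler question. While scraping the images under the trending feed on the Baidu Tieba home page, he never got the full set: the crawler returned fewer images than the home page shows. He had already worked out the likely cause, namely that the later images are loaded dynamically. His question was how to scrape those dynamically loaded images.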

Analysis

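His code is fairly simple and follows these steps: use the BeautifulSoup library, request the Baidu Tieba home page, parse out the img tags under the element whose id is new_list, and finally save the images those img tags point to.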

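import requests
from bs4 import BeautifulSoup

# Send a desktop Chrome User-Agent with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'
}

# Fetch the static HTML of the Tieba home page and parse it with lxml
data = requests.get("https://tieba.baidu.com/index.html", headers=headers)
html = BeautifulSoup(data.text, 'lxml')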
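The remaining steps described above (finding the img tags under new_list and saving the images) could look roughly like the sketch below. It is an illustration, not the reader's actual code: the function names `extract_image_urls` and `save_images`, the `images` output directory, the `.jpg` file extension, and the `tieba.baidu.com` host are all assumptions, and `html.parser` is used here only to avoid the extra lxml dependency.

```python
import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Assumed home page URL; the host is a guess at what the article intends.
BASE_URL = "https://tieba.baidu.com/index.html"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
}


def extract_image_urls(html_text, base_url=BASE_URL):
    """Return absolute URLs of every <img> under the element with id="new_list"."""
    soup = BeautifulSoup(html_text, "html.parser")
    container = soup.find(id="new_list")
    if container is None:
        return []
    urls = []
    for img in container.find_all("img"):
        src = img.get("src")
        if src:  # skip <img> tags that have no src attribute
            urls.append(urljoin(base_url, src))
    return urls


def save_images(urls, out_dir="images"):
    """Download each URL into out_dir (hypothetical naming: 0.jpg, 1.jpg, ...)."""
    os.makedirs(out_dir, exist_ok=True)
    for i, url in enumerate(urls):
        resp = requests.get(url, headers=HEADERS, timeout=10)
        resp.raise_for_status()
        with open(os.path.join(out_dir, f"{i}.jpg"), "wb") as f:
            f.write(resp.content)


if __name__ == "__main__":
    page = requests.get(BASE_URL, headers=HEADERS, timeout=10)
    save_images(extract_image_urls(page.text))
```

Because this only sees the HTML returned by the initial request, it can only find images that are present in that first response, which matches the incomplete result the reader reported.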

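As mentioned above, some of the images are loaded dynamically, so the first thing to work out is how that loading happens. Open the Baidu Tieba home page in a browser and scroll down: when the scrollbar reaches the bottom, it visibly shrinks and jumps back up a little. That is exactly what it looks like when DOM elements are appended to the HTML document dynamically. Dynamic data loading almost always comes down to Ajax requests, and Ajax is essentially an XMLHttpRequest (XHR for short). In Chrome, XHR requests can be monitored in the Network panel of the developer tools.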
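Since the extra images only appear after scrolling, one way to collect them is to let Selenium drive a real browser and scroll to the bottom until the page stops growing. The following is a minimal sketch under stated assumptions, not a definitive implementation: the helper name `scroll_until_stable`, the pause length, the round limit, the `tieba.baidu.com` host, and the `#new_list img` selector (taken from the id mentioned earlier) are all illustrative, and the live page may behave differently.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=20):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scrolls after which the page grew.
    """
    grown = 0
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the XHR response time to arrive and render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was appended, so stop scrolling
        grown += 1
        last_height = new_height
    return grown


def main():
    # Imported here so the helper above can be used without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumes Chrome and a matching driver are available
    try:
        driver.get("https://tieba.baidu.com/index.html")  # assumed URL
        scroll_until_stable(driver)
        # Selector follows the id named in the article; the page may have changed.
        for img in driver.find_elements(By.CSS_SELECTOR, "#new_list img"):
            print(img.get_attribute("src"))
    finally:
        driver.quit()


if __name__ == "__main__":
    main()
```

A fixed sleep is the simplest way to wait for the new content; on a slow connection an explicit wait on the element count would be more robust.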

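The XHR requests issued right after the home page opens. The requests here, with respect to what is to be scraped, are all ...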


