Distributed Crawler (5): Scraping Weibo Data


aikunjiao3421


Tags: crawler, python, javascript, ViewUI

Original link: http://www.cnblogs.com/bigdata-stone/p/9861479.html


I. Scraping Data with Selenium + PhantomJS

1. Logging in: the most important step is to set the User-Agent; otherwise the login page will not redirect.

from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# User-Agent string to present to the site (placeholder value)
user_agent = (
    "Mozilla/5.0()"
)
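The snippet above only defines the User-Agent string. Below is a minimal sketch of how it might be attached to a PhantomJS session, assuming Selenium 3.x (PhantomJS support was removed in Selenium 4). The helper names `build_phantomjs_capabilities` and `make_driver` are hypothetical and do not come from the original article. `phantomjs.page.settings.userAgent` is the capability key PhantomJS reads for its User-Agent.

```python
def build_phantomjs_capabilities(user_agent, base=None):
    """Return a capabilities dict with the PhantomJS User-Agent overridden.

    `base` is copied rather than modified, so a shared default such as
    DesiredCapabilities.PHANTOMJS is left untouched.
    """
    caps = dict(base or {"browserName": "phantomjs"})
    caps["phantomjs.page.settings.userAgent"] = user_agent
    return caps


def make_driver(user_agent):
    """Start PhantomJS with a custom User-Agent (assumes Selenium 3.x)."""
    # Imported here so the helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    caps = build_phantomjs_capabilities(user_agent, DesiredCapabilities.PHANTOMJS)
    return webdriver.PhantomJS(desired_capabilities=caps)
```

Calling `make_driver(user_agent)` would then return a driver whose requests carry the chosen User-Agent, provided the `phantomjs` binary is on the PATH.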

2. Entering the username and password:

<input id="loginname"
type="text"
class="W input" maxlength="128"
autocomplete="off"
action-data="text=........"
name="username"
node-type="username" 
tabindex="1">

(1) JavaScript is needed to interact with the content of the Weibo page.

The relevant JavaScript code:

    document.getElementById('loginname').value='abc'

    document.getElementsByName('password')[0].value='abc'

Alternatively, the values can be passed in with the send_keys method that Selenium provides:

    driver.find_element_by_id('loginname').send_keys(username)

    driver.find_element_by_name('password').send_keys(password)
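The JavaScript route can also be driven from Python. As a rough sketch, the two statements above could be run through the driver's `execute_script` method, with the credentials passed as script arguments instead of being pasted into the JavaScript string. The names `FILL_LOGIN_JS` and `fill_login_form` are hypothetical, chosen only for illustration.

```python
# The same two statements as above. arguments[0] and arguments[1] are
# filled in by Selenium from the extra parameters given to execute_script.
FILL_LOGIN_JS = (
    "document.getElementById('loginname').value = arguments[0];"
    "document.getElementsByName('password')[0].value = arguments[1];"
)


def fill_login_form(driver, username, password):
    """Set the username and password fields through JavaScript."""
    driver.execute_script(FILL_LOGIN_JS, username, password)
```

Passing the values as arguments avoids quoting problems when a password contains quotes or backslashes.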

II. Weibo API Analysis

III. Scraping by Calling the Weibo API Directly

IV. Forms and Login

Reposted from: https://www.cnblogs.com/bigdata-stone/p/9861479.html
