Python scraping with selenium and bs4: using selenium + bs4 to scrape 'bullish' or 'bearish' stock news from Xuangubao (选股宝)...

Latest recommended article published on 2024-05-18 19:39:35

郴江郑明兰


Article tags: python scraping, selenium and bs4

Article link: https://blog.csdn.net/weixin_29495899/article/details/114446349


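I. Preface. (1) I personally like to see the result first and the details afterwards, so here is the result (see figure):

(2) The information is scraped from Xuangubao, https://xuangubao.cn/ (I set the scraper to load 20 pages; only a few entries are listed below):

(3) This project mainly uses Python with:

Selenium, to simulate browser behavior;

BeautifulSoup4, to parse the page.

(4) Runtime environment / tooling:

Selenium 3.12.0

BeautifulSoup 4.6

II. Hands-on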

    import re

    from bs4 import BeautifulSoup
    from selenium.webdriver import Firefox
    from selenium.webdriver.firefox.options import Options


    def gethtml(url):
        options = Options()
        options.add_argument('-headless')  # run Firefox without a visible window
        driver = Firefox(options=options)  # Firefox; the options must be passed in for headless mode to take effect
        driver.get(url)
        for _ in range(0, 20):  # dynamically load 20 pages of data
            # simulate a mouse click on "点击加载更多" ("click to load more")
            driver.find_element_by_xpath("//span[@class='home-news-footer-loadmore']").click()
        page = driver.page_source  # grab the page source
        driver.quit()  # close the browser
        return page

This simulates clicking "点击加载更多" ("click to load more"), and the number of clicks is set to 20. O(∩_∩)O haha~
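The click loop in gethtml assumes the button is always present and that new content arrives instantly. A slightly more defensive variant might look like the sketch below. The helper name `click_load_more` and its `times` and `pause` parameters are hypothetical, and the one-second pause is only a guess at how long the page needs. It relies on Selenium 3's plural lookup `find_elements_by_xpath`, which returns an empty list instead of raising when nothing matches.

```python
import time

# XPath of the "click to load more" button, taken from the article's code.
LOAD_MORE_XPATH = "//span[@class='home-news-footer-loadmore']"


def click_load_more(driver, times=20, pause=1.0):
    """Click the load-more button up to `times` times.

    `driver` is any object exposing Selenium 3's find_elements_by_xpath.
    Returns the number of clicks actually performed.
    """
    clicks = 0
    for _ in range(times):
        buttons = driver.find_elements_by_xpath(LOAD_MORE_XPATH)
        if not buttons:
            break  # button not found: stop instead of raising
        buttons[0].click()
        clicks += 1
        time.sleep(pause)  # assumed delay to let the new items render
    return clicks
```

Inside gethtml this would be called as `click_load_more(driver)` in place of the `for` loop.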

(3) Information extraction:

    def getinfor(html_str, str_type, str_char):
        soup = BeautifulSoup(html_str, 'html.parser')
        bu = soup.find_all(class_=str_type)  # tags that directly hold the 'bullish' or 'bearish' label
        for date in bu:
            # continue only if the block containing the label also lists stocks
            bu_name = date.parent.parent.find_all(class_="stock-group-item")
            if not bu_name == []:
                print()
                date_ = date.parent.parent.parent.parent.parent  # enclosing news item
                date_month = date_.find(class_="news-item-timeline-date-month")  # month
                print(date_month.string, end='/')
                date_day = date_.find(class_="news-item-timeline-date-day")  # day
                print(date_day.string, end='日/')  # '日' is the Chinese suffix for "day"
                date_time = date_.find(class_=re.compile("news-item-timeline-time .*")).get_text()  # time
                date_time_ = re.compile(r'[0-9]{1,2}:[0-9]{1,2}').search(''.join(date_time))
                print(date_time_.group(), end='/')
                print(str_char, end=' ')
                for a in bu_name:
                    stock_name = a.find(class_="stock-group-item-name")  # stock name
                    print(stock_name.string, end='[')
                    stock_name = a.find(class_="stock-group-item-rate")  # rate of change
                    print(stock_name.string, end='] ')
                print()
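The time lookup in getinfor calls `.group()` directly on the search result, which raises AttributeError if the text contains no `H:MM`-style time. A small sketch of the same regex with a guard follows. The helper name `extract_time` is hypothetical; the pattern is the one used in the article.

```python
import re

# Same pattern as in getinfor: one or two digits, a colon, one or two digits.
TIME_RE = re.compile(r'[0-9]{1,2}:[0-9]{1,2}')


def extract_time(text):
    """Return the first time-like substring in `text`, or None if there is none."""
    match = TIME_RE.search(text)
    return match.group() if match else None
```

Callers can then skip or flag items where `extract_time` returns None rather than crashing mid-scrape.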

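Walkthrough: 1) First locate the 'bullish' (or 'bearish') label through the class attribute of its tag: class="bullish-and-bear bullish" (for bearish, class="bullish-and-bear bear").

2) Continue only if the item actually lists stocks (such as '焦作万方'), because some items have none.

3) date_ = date.parent.parent.parent.parent.parent climbs up to the overall enclosing tag; inside it, the find() method can locate the tags that hold the required information.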
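The three steps above can be tried offline on a small HTML fragment. The markup below is a made-up stand-in that only mimics the nesting depth and class names used in the article; the real Xuangubao page may differ. The helper names `ancestor` and `stocks_for` are likewise hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the label sits 2 levels below the block that holds the
# stocks and 5 levels below the <li> that holds the date.
SAMPLE_HTML = """
<ul>
  <li>
    <span class="news-item-timeline-date-day">18</span>
    <div><div><div>
      <div><span class="bullish-and-bear bullish">利好</span></div>
      <div class="stock-group-item">
        <span class="stock-group-item-name">焦作万方</span>
        <span class="stock-group-item-rate">+1.23%</span>
      </div>
    </div></div></div>
  </li>
  <li>
    <span class="news-item-timeline-date-day">17</span>
    <div><div><div>
      <div><span class="bullish-and-bear bullish">利好</span></div>
    </div></div></div>
  </li>
</ul>
"""


def ancestor(tag, levels):
    """Climb `levels` parents up from `tag` (same as chaining .parent)."""
    for _ in range(levels):
        tag = tag.parent
    return tag


def stocks_for(html, label_class):
    """Collect (day, stock name, rate) for every labelled item that lists stocks."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for label in soup.find_all(class_=label_class):        # step 1: locate the label
        items = ancestor(label, 2).find_all(class_="stock-group-item")
        if not items:                                       # step 2: skip items without stocks
            continue
        container = ancestor(label, 5)                      # step 3: climb to the enclosing tag
        day = container.find(class_="news-item-timeline-date-day").string
        for item in items:
            name = item.find(class_="stock-group-item-name").string
            rate = item.find(class_="stock-group-item-rate").string
            rows.append((day, name, rate))
    return rows
```

Returning tuples instead of printing makes it easier to check the parent counts against a saved copy of the page before running the full scrape.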

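(4) Calling the main method:

    def main1():
        stock_list_url = 'https://xuangubao.cn'
        stock_info_url = 'https://gupiao.baidu.com/stock/'  # defined but never used
        stock_url = gethtml(stock_list_url)  # despite the name, this holds the page HTML
        getinfor(stock_url, 'bullish-and-bear bullish', ' 利好：')  # printed label means "bullish:"
        getinfor(stock_url, 'bullish-and-bear bear', ' 利空：')  # printed label means "bearish:"


    if __name__ == '__main__':
        main1()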

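III. Summary. (1) I now have a basic understanding of web scraping with Python.

(2) I got to know the relevant libraries, especially while installing them: it really is not as simple as running pip install *** and being done.

(3) I was introduced to PyCharm and learned some of its basic usage.

(4) This was my first scraper, written mainly because my teacher asked for it [dark face]. There is still a lot to learn; I simply scraped some information without a specific goal. Everyone is welcome to exchange ideas, and please point out any problems you find.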
