A Complete Beginner Scrapes Eastmoney Research Reports with Python and Analyzes the Text with a Machine-Learning SVM Model (Part 4)

蔬菜味的牛牛

Published 2019-05-31 23:00:50

Category: Python Crawlers and Machine Learning

Article link: https://blog.csdn.net/qq_43826034/article/details/90724205

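This article shows how to use basic Python, regular expressions, and the selenium library to scrape research report links and report content from Eastmoney, and to combine the results into a single table. Later installments use a machine-learning SVM model to analyze the text.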

Having covered crawler basics, encoding, regular expressions, and the selenium library, we can now start the actual scraping.

I. Scraping the Report Links
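
The script in this section locates elements with XPath expressions such as //div[@id='dt_1']//li[@class='date']. As a minimal offline sketch of how that pattern selects nodes, the example below runs the same expression against a small hand-written document. The markup is hypothetical (the real page is rendered by JavaScript and is more complex), and the standard library's xml.etree.ElementTree is used here as a stand-in for the browser's XPath engine; it supports only a subset of XPath and needs a leading "." on the expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup (an assumption for illustration only).
SAMPLE = """
<html>
  <body>
    <div id="dt_1">
      <ul>
        <li class="date">2019-05-30</li>
        <li class="title">Report A</li>
      </ul>
      <ul>
        <li class="date">2019-05-29</li>
        <li class="title">Report B</li>
      </ul>
    </div>
    <div id="other">
      <ul><li class="date">1999-01-01</li></ul>
    </div>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE.strip())

# "//" matches at any depth; [@id='dt_1'] and [@class='date'] filter by attribute.
# Only <li class="date"> elements inside <div id="dt_1"> are selected.
matches = root.findall(".//div[@id='dt_1']//li[@class='date']")

# Equivalent of reading .text from each element that selenium returns.
dates = [li.text for li in matches]
print(dates)  # ['2019-05-30', '2019-05-29']
```

The date under the div with id "other" is not returned, which is why the script anchors every expression on the dt_1 container before descending to the list items.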

# -*- coding:utf-8 -*-
import time
from selenium import webdriver  # selenium needs environment setup first (e.g. geckodriver for Firefox)
import pandas as pd
date=[]
rating=[]
rating_change=[]
institution_name=[]
report_url=[]
driver=webdriver.Firefox()  # drive a real browser to visit the page
driver.get("http://data.eastmoney.com/report/465yb_1.html#pageAnchor")  # dynamically rendered page
def scrapy():
    for i in range(1,2):
        def get_data():
            date1=driver.find_elements_by_xpath("//div[@id='dt_1']//li[@class='date']")
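            # Locate elements with XPath syntax: tag[@attribute="value"]
            # A leading // means anything may come before it (match at any depth)
            # Note: find_elements_by_xpath was removed in Selenium 4.3;
            # newer versions use driver.find_elements(By.XPATH, ...)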
            for i in date1:
                date2=i.text  # get the element's text content
                date.append(date2)
        time.sleep(2)  # pause briefly, otherwise the site may block us
        def get_institution():
            institution_name1=driver.find_elements_by_xpath("//div[@id='dt_1']//li[@cl

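The script above collects each field into its own list (date, rating, rating_change, institution_name, report_url) so that they can later be combined into one table. The rest of the original script is not shown here, so the following is only a sketch of that combining step under stated assumptions: the sample values are made up, the helper build_rows is hypothetical, and the standard library's csv module stands in for the pandas table the article builds.

```python
import csv
import io

# Made-up sample values standing in for what the crawler would collect.
columns = {
    "date": ["2019-05-30", "2019-05-29"],
    "rating": ["Buy", "Hold"],
    "rating_change": ["Maintained", "Downgraded"],
    "institution_name": ["Broker A", "Broker B"],
    "report_url": ["http://example.com/report/1", "http://example.com/report/2"],
}


def build_rows(columns):
    """Turn a dict of parallel lists into a list of row dicts (hypothetical helper)."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) != 1:
        # Unequal lengths mean some field was missed on at least one row,
        # so the lists no longer line up by position.
        raise ValueError("columns have different lengths")
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


rows = build_rows(columns)

# Write the table as CSV into an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(columns))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Checking the list lengths before zipping matters because the fields are scraped separately: if one selector misses an entry, every later row would silently shift. With pandas, passing the same dict of equal-length lists to pd.DataFrame produces the equivalent table.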