Basic Applications of pandas, Part 2

Latest recommended article published on 2024-09-02 13:41:54

赖东东不错学长

Tags: pandas, automation, excel, python

Article link: https://blog.csdn.net/tz021010/article/details/141656705

Preface

This post upgrades the solution to Problem 4 from the previous article.

I. Problem 4 Description

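Load the data:
- Load data from 沪深300成份.xlsx, which contains information on the constituent stocks of the CSI 300 (HS300) index.
- Load data from A股新闻查询结果.xlsx, which contains the A-share news records filtered out earlier.
Match the data:
- Use the isin() method to match the ShortName column of A股新闻查询结果.xlsx against the 证券名称 (security name) column of 沪深300成份.xlsx.
- The goal of this step is to find which securities mentioned in the A-share news are CSI 300 constituents.
Compute statistics:
- Count how many times each security appears in the matched results.
- Create a new DataFrame containing each matched security's name (ShortName) and its number of occurrences (MatchCount).
Save the results:
- Save the matched news and the match counts to separate worksheets of a newly created Excel file.
- The new Excel file is named 沪深300成份匹配结果_with统计.xlsx.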

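The Problem 4 code implements this logic. Its output is an Excel file with two worksheets: Matched News, which holds all matched news rows, and Match Counts, which holds the number of matches for each security.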
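The whole Problem 4 pipeline can be sketched as one function that works on in-memory DataFrames, which makes the matching and counting easy to try without the Excel files. This is a minimal sketch, not the article's own code: the function name match_and_count and the sample company names are hypothetical, and it assumes the ShortName and 证券名称 columns described above.

```python
import pandas as pd


def match_and_count(a_share_df: pd.DataFrame, hs300_df: pd.DataFrame):
    """Hypothetical helper: return (matched news rows, per-security counts)."""
    # Keep only news rows whose ShortName appears in the HS300 name column
    matched = a_share_df[a_share_df["ShortName"].isin(hs300_df["证券名称"])]
    # Count matched rows per security, then rename the two resulting columns
    counts = matched["ShortName"].value_counts().reset_index()
    counts.columns = ["ShortName", "MatchCount"]
    return matched, counts


# Hypothetical sample data standing in for the two Excel files
hs300_df = pd.DataFrame({"证券名称": ["甲公司", "乙公司"]})
a_share_df = pd.DataFrame({
    "NewsID": [1, 2, 3, 4],
    "ShortName": ["甲公司", "丙公司", "甲公司", "乙公司"],
})

matched, counts = match_and_count(a_share_df, hs300_df)
print(matched)
print(counts)
```

Keeping the logic in a function separates it from file I/O, so the Excel reading and writing in the steps can wrap around it unchanged.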

II. Steps

1. Step 1: load the data

This code reads two Excel files: 沪深300成份.xlsx and A股新闻查询结果.xlsx.

import pandas as pd
# Problem 4: load the two Excel files
hs300_df = pd.read_excel('沪深300成份.xlsx')
a_share_df = pd.read_excel('A股新闻查询结果.xlsx')
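If either file uses a different header, the later ShortName or 证券名称 lookups fail with a bare KeyError. A small check right after loading gives a clearer message. This is a minimal sketch: require_columns is a hypothetical helper, and the DataFrames below are made-up stand-ins for the loaded files.

```python
import pandas as pd


def require_columns(df: pd.DataFrame, columns, label: str) -> None:
    """Hypothetical helper: raise a readable KeyError if columns are missing."""
    missing = [c for c in columns if c not in df.columns]
    if missing:
        raise KeyError(f"{label} is missing columns: {missing}")


# Hypothetical stand-ins for the DataFrames loaded in Step 1
hs300_df = pd.DataFrame({"证券名称": ["甲公司"]})
a_share_df = pd.DataFrame({"Title": ["示例新闻"], "ShortName": ["甲公司"]})

# Both checks pass silently because the expected columns exist
require_columns(hs300_df, ["证券名称"], "沪深300成份.xlsx")
require_columns(a_share_df, ["ShortName"], "A股新闻查询结果.xlsx")
```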

2. Step 2: match the securities

Using the 证券名称 column of 沪深300成份.xlsx, this code selects the matching rows from A股新闻查询结果.xlsx.

# Match the securities in the HS300 component list with those in the A-share news results
matched_results = a_share_df[a_share_df['ShortName'].isin(hs300_df['证券名称'])]
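isin() compares names exactly, so a name that differs only by stray leading or trailing spaces in one file will not match. If that is a concern (an assumption about the data, not something the original code handles), a sketch like the following strips whitespace on both sides before matching and lists the news names that found no match, which helps spot naming differences. The sample names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the second HS300 name has a trailing space,
# which makes an exact isin() comparison miss it
hs300_df = pd.DataFrame({"证券名称": ["甲公司", "乙公司 "]})
a_share_df = pd.DataFrame({"ShortName": ["甲公司", "乙公司", "丙公司"]})

# Normalize both sides by stripping leading/trailing whitespace
hs300_names = set(hs300_df["证券名称"].astype(str).str.strip())
short_names = a_share_df["ShortName"].astype(str).str.strip()

# Boolean mask: True where the news row refers to an HS300 constituent
mask = short_names.isin(hs300_names)
matched_results = a_share_df[mask]

# Names that did not match, useful for checking naming differences
unmatched_names = sorted(a_share_df.loc[~mask, "ShortName"].unique())
print(unmatched_names)
```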

3. Step 3: count the matches

This code counts the matched rows for each security and builds a new DataFrame that records each security's match count.

# Count the number of matched results for each security
match_counts = matched_results['ShortName'].value_counts().reset_index()
match_counts.columns = ['ShortName', 'MatchCount']
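The column names produced by value_counts().reset_index() depend on the pandas version: from pandas 2.0 they are ShortName and count, while earlier versions produced index and ShortName. The positional rename above works in both cases because the security names always come first. A sketch of an alternative that names both columns explicitly (the sample data is hypothetical):

```python
import pandas as pd

# Hypothetical matched rows
matched_results = pd.DataFrame({"ShortName": ["甲公司", "乙公司", "甲公司"]})

# Name the index and the count column explicitly, so the result does not
# depend on the column names reset_index() produces in a given pandas version
match_counts = (
    matched_results["ShortName"]
    .value_counts()
    .rename_axis("ShortName")
    .reset_index(name="MatchCount")
)
print(match_counts)
```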

4. Step 4: save the results

This code saves the matched results and the match counts to a new Excel file named 沪深300成份匹配结果_with统计.xlsx, writing each dataset to its own worksheet.

# Save the matched results and match counts to a new Excel file
output_path = '沪深300成份匹配结果_with统计.xlsx'
with pd.ExcelWriter(output_path) as writer:
    matched_results.to_excel(writer, sheet_name='Matched News', index=False)
    match_counts.to_excel(writer, sheet_name='Match Counts', index=False)

5. Step 5: print the output path

Finally, the code prints the path of the output file.

# Display the output file path
print(output_path)

Summary

Complete code (read it together with the previous article):

import pandas as pd
# # Problem 1
# df1 = pd.read_csv('News_Security.csv')  # if the data is in CSV format
# a_stock_news = df1[df1['SecurityType'] == 'A股']
# selected_columns = a_stock_news[['DeclareDate', 'Title', 'NewsID','ShortName']]
# excel_filename = 'A股新闻查询结果.xlsx'
# selected_columns.to_excel(excel_filename, index=False, engine='openpyxl')
# print(f'查询结果已保存到 {excel_filename}')
# # Problem 2
# df2 = pd.read_csv('News_NewsInfo1.csv',on_bad_lines='skip')
# eastmoney_count = df2[df2['NewsSource'].str.contains("东方财富网", na=False)].shape[0]
# results_df = pd.DataFrame({
#     'Result': [eastmoney_count]  # store the result in a DataFrame
# })
# excel_filename = '东方财富网.xlsx'
# results_df.to_excel(excel_filename, index=False, engine='openpyxl')
#
# print(eastmoney_count)
# # Problem 3
# df3 = pd.read_csv('TRD.csv')
# df4= pd.read_csv('AF_Actual.csv')
# combined_df = pd.concat([df3, df4], ignore_index=True)
# combined_df.to_csv('日度收益与分析师预测的每股收益与市盈率.csv', index=False)
# # Problem 4
# Load the two Excel files
hs300_df = pd.read_excel('沪深300成份.xlsx')
a_share_df = pd.read_excel('A股新闻查询结果.xlsx')

# Match the securities in the HS300 component list with those in the A-share news results
matched_results = a_share_df[a_share_df['ShortName'].isin(hs300_df['证券名称'])]

# Count the number of matched results for each security
match_counts = matched_results['ShortName'].value_counts().reset_index()
match_counts.columns = ['ShortName', 'MatchCount']

# Save the matched results and match counts to a new Excel file
output_path = '沪深300成份匹配结果_with统计.xlsx'
with pd.ExcelWriter(output_path) as writer:
    matched_results.to_excel(writer, sheet_name='Matched News', index=False)
    match_counts.to_excel(writer, sheet_name='Match Counts', index=False)

# Display the output file path
print(output_path)

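The work.py file contains solutions to all four problems. Problems 1 to 3 are commented out and do not run. Problem 4 is the active part: it reads two Excel files, matches and counts the securities, and saves the results to a new Excel file.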
