Common Pandas Operations

Latest recommended article published on 2024-08-03 15:50:30

hkss

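Category: Python. Tags: dataframe, data processing

Copyright notice: This is an original article by the author, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/wys578/article/details/90510769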


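# Summary of commonly used Pandas features:

View table info: df.shape

Create / read: pd.DataFrame([{}, {}]), pd.read_sql()

Query: df.where(), df.query()

Boolean logic: '&' (and), '|' (or), .isin(). Python's own and/or keywords do not work element-wise on a Series, so wrap each condition in parentheses and combine them with & and |.

Replace: replace(), fillna()

Select: loc[], iloc[]

Merge: merge()

Shift: shift()

Delete / deduplicate: .drop_duplicates()

Data conversion: tolist(), to_dict()

Set index: set_index(), reindex()

See also the official pandas DataFrame documentation.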

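# Number of rows and columns

df.shape  # returns a (rows, columns) tuple

# Check whether the DataFrame is empty

df.empty  # True when the DataFrame has no elements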

# Converting between numpy and pandas

## DataFrame to array

arr = df.values  # df.to_numpy() is the currently recommended spelling

## Array to DataFrame

import pandas as pd

df = pd.DataFrame(arr)
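A minimal round-trip sketch of the conversion described above. The column names and values are made up for illustration; the point is that the array carries no labels, so they have to be supplied again when rebuilding the DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not from the original article
df = pd.DataFrame({"open": [1.0, 2.0], "close": [1.5, 2.5]})

# DataFrame -> ndarray: the index and column labels are dropped
arr = df.to_numpy()  # same result as df.values here

# ndarray -> DataFrame: pass the labels again, otherwise columns become 0, 1, ...
restored = pd.DataFrame(arr, columns=df.columns)
```

Without the `columns=` argument the data would still round-trip, but the column names would be lost.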

# df.shape[0] # number of rows

# Creating a DataFrame

# DataFrame --> list of dicts

minute_net_inflow_info_dic_list = minute_net_inflow_info_df.to_dict('records')
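As a sketch of both directions (list of dicts to DataFrame and back), assuming a couple of made-up records; the field names here are illustrative only.

```python
import pandas as pd

# Hypothetical records; each dict becomes one row
records = [
    {"code": "600000", "net_inflow": 12.5},
    {"code": "000001", "net_inflow": -3.0},
]

# list of dicts -> DataFrame (keys become column names)
df = pd.DataFrame(records)

# DataFrame -> list of dicts, one dict per row
round_trip = df.to_dict("records")
```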

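# Setting the index

# base_info_df = base_info_df.set_index('code', drop=True) # use the 'code' column as the index

# base_info_df = base_info_df.reindex(index=['code']) # reindex aligns rows to the given labels; it does not turn a column into the index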
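The two calls above do different things, which a small sketch may make clearer. The data below is invented for illustration: `set_index` promotes a column to the index, while `reindex` selects and reorders rows by labels that already exist in the index and fills unknown labels with NaN.

```python
import pandas as pd

# Hypothetical data for illustration
base_info_df = pd.DataFrame(
    {"code": ["600000", "000001", "688001"], "name": ["A", "B", "C"]}
)

# Promote the 'code' column to the index (drop=True removes it from the columns)
indexed = base_info_df.set_index("code", drop=True)

# Conform the rows to a given list of labels; '999999' does not exist,
# so that row is filled with NaN
subset = indexed.reindex(index=["688001", "600000", "999999"])
```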

# Selecting data

df.iloc[:2, :].to_dict(orient='records') # take the first two rows and all columns, then convert to a list of dicts

data.loc[['Ohio', 'Utah'], ['two', 'four']] # select by row labels and column labels
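A self-contained sketch of the two selections above. The frame is an assumption built to match the labels used in the snippet ('Ohio', 'Utah', 'two', 'four'); the values are just 0 to 15.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with labelled rows and columns
data = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Position-based: first two rows, all columns, as a list of dicts
first_two = data.iloc[:2, :].to_dict(orient="records")

# Label-based: chosen rows and columns by name
by_label = data.loc[["Ohio", "Utah"], ["two", "four"]]
```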

### Conditional queries


# Conditional query that updates a given column

new_df.loc[(new_df['code'].map(lambda d: d[:3])).isin(['688']), ['category']] = 14906

# Conditional delete

df.drop(df[(df.score < 50) & (df.score > 20)].index, inplace=True) # drop rows with 20 < score < 50
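The update and delete patterns above both rest on a boolean mask. Here is a small sketch with invented data (the `name` and `level` columns are assumptions) showing the mask built with `&`, a conditional update through `.loc`, and the same delete expressed with `drop` and with `query`.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame(
    {
        "name": ["a", "b", "c", "d"],
        "score": [10, 30, 45, 80],
        "level": ["normal"] * 4,
    }
)

# Each condition needs its own parentheses; combine with & (and) or | (or)
mask = (df["score"] > 20) & (df["score"] < 50)

# Conditional update: change 'level' only on the matching rows
df.loc[mask, "level"] = "mid"

# Conditional delete: drop the matching rows by their index labels
kept = df.drop(df[mask].index)

# Equivalent selection of the surviving rows with query()
same = df.query("score <= 20 or score >= 50")
```

Inside `query()` the plain words `and`/`or` are allowed, which is why the summary lists them next to `&` and `|`.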

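Data extraction: https://blog.csdn.net/qq_41797451/article/details/80542060

inc_max_stock = new_df[new_df['rise_fall_rate'] == new_df['rise_fall_rate'].max()]['name'].values[0] # name of the stock with the largest rise_fall_rate

# Extracting rows by index position

new_index_list = list(set(index_list)) # deduplicate

bad_rec = bad_rec.iloc[new_index_list, :] # extract the qualifying rows (by position)

# Filtering on any column

bad_rec.loc[bad_rec['column_name'].isin(some_values)] # rows whose column_name value is in some_values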

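## Conditional filtering

__del_codes = list(_del_codes)

_df_old_adjust_del = _df_old_adjust[~_df_old_adjust.index.isin(__del_codes)] # drop rows whose index is in __del_codes

new_df = base_info_df[base_info_df['code'].isin(str_code_list)] # filter on the 'code' column

new_df = base_info_df.loc[str_code_list, :] # same selection by label; requires 'code' to be the index

new_df = new_df[~(new_df['code'].map(lambda d: d[:3])).isin(['688'])] # filter out STAR Market stocks (codes starting with 688)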
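The filters above can be tried on a toy frame. This is a sketch with made-up codes and names; `str.startswith` is used here as an alternative to slicing the first three characters with `map`.

```python
import pandas as pd

# Hypothetical data for illustration
base_info_df = pd.DataFrame(
    {
        "code": ["600000", "688001", "000001", "688111"],
        "name": ["A", "B", "C", "D"],
    }
)
str_code_list = ["600000", "688001"]

# Keep rows whose 'code' is in the list
selected = base_info_df[base_info_df["code"].isin(str_code_list)]

# Same rows via label lookup, once 'code' is the index
by_index = base_info_df.set_index("code").loc[str_code_list, :]

# ~ negates the mask: keep rows whose code does NOT start with '688'
non_688 = base_info_df[~base_info_df["code"].str.startswith("688")]
```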

## Saving to Excel

new_user_info_df.to_excel(os.path.join(REPORT_PATH, '{}.xlsx'.format(now))) # requires import os; REPORT_PATH and now are defined elsewhere

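## Grouping

grouped = df.groupby(by=['code', 'category'])

pd.options.display.max_columns = None # show all columns when printing

grouplist = []

# Fix is the author's own helper class (not shown in this post)
for code, group in grouped:
    fixs = Fix()
    b = group.apply(lambda x: fixs.func(x), axis=1) # apply fixs.func to each row
    b['amount'] = b['amount'].cumsum() # running total within the group
    b['volume'] = b['volume'].cumsum()
    grouplist.append(b)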
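When the per-row `Fix` step is not needed, the per-group running totals in the loop above can be computed without looping. This is a sketch under that assumption, using invented data; it is not the author's method, just the built-in `groupby(...).cumsum()`.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame(
    {
        "code": ["A", "A", "B", "A", "B"],
        "category": [1, 1, 1, 1, 1],
        "amount": [10, 20, 5, 30, 5],
        "volume": [1, 2, 3, 4, 5],
    }
)

# Running totals computed separately inside each (code, category) group;
# the result is aligned with the original row order
cum = df.groupby(["code", "category"])[["amount", "volume"]].cumsum()
df["amount_cum"] = cum["amount"]
df["volume_cum"] = cum["volume"]
```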

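df.dropna(axis=0, how='any', inplace=True) # drop rows that contain any null value; how='all' drops a row only when every value is null

basic_info = basic_info.where(basic_info.notna(), None) # convert NaN to None; numeric columns may need .astype(object) first for None to be kept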

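# Deduplication

df.drop_duplicates(["trader"], keep="last", inplace=True) # keep the last row for each trader

df = df.drop_duplicates(subset=['code', 'industry'], keep=False) # drop every row whose (code, industry) pair is duplicated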
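The `keep` argument is what distinguishes the two calls above. A short sketch with made-up rows: `keep="last"` retains one row per key, while `keep=False` discards every row whose key appears more than once.

```python
import pandas as pd

# Hypothetical data for illustration; trader 't1' appears twice
df = pd.DataFrame({"trader": ["t1", "t2", "t1", "t3"], "qty": [1, 2, 3, 4]})

# Keep only the last occurrence of each trader
last = df.drop_duplicates(["trader"], keep="last")

# Drop all rows that have any duplicate on 'trader'
unique_only = df.drop_duplicates(subset=["trader"], keep=False)
```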

df['hello'] = df['hello'].replace({np.nan:None}) # requires import numpy as np

On value replacement:

# Replace NaN with 0

__tmp_df['cash'] = __tmp_df['cash'].fillna(0)

# Replace specific values with NaN

df[df.isin(['0000-00-00', '0000-00-00 00:00:00', '', 0.0])] = np.nan

df = df.where(df.notna(), None) # replace NaN with None

df.replace('target value', 'result value', inplace=True)
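A compact sketch of the replacements above on invented data. `DataFrame.mask` is used here as a non-in-place alternative to assigning `np.nan` through a boolean frame; the column names and placeholder strings are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame(
    {
        "cash": [1.0, np.nan, 3.0],
        "date": ["2019-05-24", "0000-00-00", ""],
    }
)

# NaN -> 0 in one column
df["cash"] = df["cash"].fillna(0)

# Placeholder values -> NaN across the whole frame
df = df.mask(df.isin(["0000-00-00", ""]))

# One specific value -> another value
df = df.replace("2019-05-24", "2019-05-27")
```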

timing.index = pd.to_datetime(timing['date']) # use the parsed dates as the index

df = df[['befor_date', 'date', 'status', 'ups', 'close', 'open']].dropna()

df = df.rename(columns={'befor_date': 'befor_day', 'status': 'niubear'}) # rename columns

# Rolling-window operations

df['column_name'] = df['balance'].rolling(window=ma).mean() # rolling mean over a window of ma rows
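A small sketch of the rolling mean above, together with `shift`, which the summary lists but the post does not demonstrate. The data and the window size `ma = 2` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"balance": [10.0, 20.0, 30.0, 40.0]})
ma = 2  # window size (assumed)

# Mean of the current row and the previous ma - 1 rows;
# the first ma - 1 results are NaN because the window is incomplete
df["balance_ma"] = df["balance"].rolling(window=ma).mean()

# shift(1) moves values down one row, giving the previous row's balance
df["prev_balance"] = df["balance"].shift(1)
```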

# Rounding

df['s'].round(4) # round to 4 decimal places

# Sorting

df.sort_values("xxx", inplace=True) # sort_values is a DataFrame method, not a pd function

# Merging pandas DataFrames (append, merge, concat; DataFrame.append was removed in pandas 2.0, use pd.concat)

_count_res_df = pd.merge(_code_name_res_df, _count_res_df, on='code') # inner join on 'code' by default
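To illustrate the difference between joining and stacking, here is a sketch with two invented frames. `pd.merge` joins columns on a key (inner by default, `how="left"` keeps every left row), while `pd.concat` stacks rows.

```python
import pandas as pd

# Hypothetical data for illustration
names = pd.DataFrame(
    {"code": ["600000", "000001", "688001"], "name": ["A", "B", "C"]}
)
counts = pd.DataFrame(
    {"code": ["000001", "600000", "300001"], "count": [3, 5, 7]}
)

# Inner join: only codes present in both frames survive
inner = pd.merge(names, counts, on="code")

# Left join: every row of names is kept; missing counts become NaN
left = pd.merge(names, counts, on="code", how="left")

# Row-wise stacking of frames with the same columns
stacked = pd.concat([names, names], ignore_index=True)
```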

# Show full output when printing a DataFrame

pd.set_option('display.max_colwidth', 500)

pd.set_option('display.max_rows', None)
