Common pandas syntax

OSurer

Last modified on 2022-11-06 12:58:55


First published on 2020-11-06 16:58:33

Article link: https://blog.csdn.net/wq_ocean_/article/details/109535145


Create a DataFrame from an ndarray:

df = pd.DataFrame(para, columns=['interval_hour', 'interval_sec', 'pre', 'tem',
                                 'WIN_S_Avg_10mi', 'WIN_D_Avg_10mi', 'RHU',
                                 'pwv', 'ztd', 'PRE_1h'])
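As a minimal sketch of the same call, the snippet below builds a DataFrame from a small made-up array. The array values and the two column names are placeholders standing in for `para` and its ten columns.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 array standing in for `para`
para = np.array([[0, 1.5], [1, 2.5], [2, 3.5]])

# One column name per array column; the count must match para.shape[1]
df = pd.DataFrame(para, columns=['interval_hour', 'pwv'])
print(df)
```

The number of names passed to `columns` has to equal the number of columns in the array, otherwise pandas raises a ValueError.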

Merge DataFrames on column names:

# Merge on a shared column
df_merge = pd.merge(df_none_PRE_1h, df_arc, on="interval_hour", how="left")
# Merge on key columns with different names
df_merge = pd.merge(ChinaMetList, igraList, left_on='id', right_on='wmo', how="left")
# Drop epochs (rows) that found no match
df_merge = df_merge.dropna(axis=0, how='any')

# Merge on multiple keys
df_merge = pd.merge(ztd_panda, ztd_qxj, on=['site', 'date'], how="left")
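The following sketch shows what the left join plus `dropna` combination does on two tiny made-up tables. The table contents and the names `left`, `right`, `merged` and `matched` are illustrative only.

```python
import pandas as pd

# Hypothetical station tables; 'id' and 'wmo' hold the same kind of key
left = pd.DataFrame({'id': [1, 2, 3], 'name': ['a', 'b', 'c']})
right = pd.DataFrame({'wmo': [1, 3], 'alt': [10.0, 30.0]})

# A left join keeps every row of `left`; rows without a partner get NaN
merged = pd.merge(left, right, left_on='id', right_on='wmo', how='left')

# Dropping rows that contain any NaN leaves only the matched stations
matched = merged.dropna(axis=0, how='any')
print(matched)
```

If only matched rows are wanted in the first place, `how='inner'` gives the same rows in one step, provided the inputs themselves contain no NaN.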

Drop rows with NaN in specified columns

merge3 = merge3.dropna(subset=['id_cmonoc']).reset_index(drop=True)
df = df.dropna(subset=['pressure', 'temperature', 'dewpoint'], how='any').reset_index(drop=True)
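A small sketch with made-up values shows that only the columns listed in `subset` are checked, and that `reset_index(drop=True)` renumbers the surviving rows from 0.

```python
import numpy as np
import pandas as pd

# Hypothetical sounding data; 'note' is not part of the subset check
df = pd.DataFrame({'pressure': [np.nan, 1000.0, 850.0],
                   'temperature': [20.0, np.nan, 5.0],
                   'note': [None, 'x', None]})

# Only rows with NaN in 'pressure' or 'temperature' are removed
clean = df.dropna(subset=['pressure', 'temperature'], how='any').reset_index(drop=True)
print(clean)
```

The missing value in `note` does not cause the last row to be dropped, because `note` is not in `subset`.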

Get the column names as a list

header = df.columns.tolist()

Convert a DataFrame to a NumPy array

array = np.array(df)  # df.to_numpy() is the equivalent DataFrame method

Deep copy

df_copy = df.copy(deep=True)

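Create an empty DataFrame and append rows from lists

igraSites = pd.DataFrame(columns=header)
igraSites.loc[len(igraSites)] = site     # site is a list

# loc selects a row by index label; because the default index starts at 0, len(igraSites) is the label of the next new row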
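The pattern can be tried on a small example. The header names and the two `site` lists below are invented for illustration.

```python
import pandas as pd

header = ['site', 'lat', 'lon']          # hypothetical column names
igraSites = pd.DataFrame(columns=header)

# Each list must have one value per column, in column order
for site in [['AAA', 10.0, 20.0], ['BBB', 30.0, 40.0]]:
    igraSites.loc[len(igraSites)] = site  # label 0, then label 1

print(igraSites)
```

Appending row by row is convenient for a handful of rows. For many rows it is usually faster to collect the lists first and build the DataFrame once with `pd.DataFrame(rows, columns=header)`.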

Change column dtypes

igraSites[['lat', 'lon', 'alt']] = igraSites[['lat', 'lon', 'alt']].astype('float')

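Find the row with the minimum value in a column (use idxmax for the maximum)

minIndex = igraSites['distance'].idxmin()  # index label of the minimum
minRow = igraSites.loc[minIndex]    # fetch that row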
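A short sketch with made-up distances illustrates both `idxmin` and its counterpart `idxmax`.

```python
import pandas as pd

# Hypothetical distances from a target point to three stations
igraSites = pd.DataFrame({'site': ['A', 'B', 'C'],
                          'distance': [12.5, 3.2, 7.8]})

minIndex = igraSites['distance'].idxmin()   # index label of the smallest distance
minRow = igraSites.loc[minIndex]            # the nearest station as a Series
maxRow = igraSites.loc[igraSites['distance'].idxmax()]  # the farthest station
print(minRow['site'], maxRow['site'])
```

Both methods return an index label, not a position, so the result is meant to be used with `loc`.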

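Read a txt file

df = pd.read_table(path, sep='\t', header=None)
# sep is the field delimiter; \t is the tab character that separates the columns within a line (it is not the newline at the end of a line)
# header=None means the first row is not used as the header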

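Quickly import a whitespace-delimited txt file

import pandas as pd
header = list(range(10))
# names supplies the column names; sep=r'\s+' splits on any run of whitespace
# (it replaces delim_whitespace=True, which is deprecated in recent pandas releases)
df = pd.read_table('cmonocSits.txt', header=None, names=header, sep=r'\s+')
# If there are too many columns to know how many names are needed, inspect the
# columns first as shown below, then pass names; the number of names must match
# the number of columns in the file
out = pd.read_table(fpath, header=None, sep=r'\s+')
print(out.columns)


header = ['site', 'mjd', 'total year', 'year_doy_time', 'ztd', 'nepo', 'flag', 'nobs']
out = pd.read_table(fpath, header=None, names=header, dtype={'year_doy_time': str}, skiprows=3, sep=r'\s+')
out['year_doy_time'] = pd.to_datetime(out['year_doy_time'], format='%Y.%j%H%M')

# dtype sets the type of the imported columns
# skiprows is the number of leading rows to skip
# pd.to_datetime parses the time strings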
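To see these options work together without a file on disk, the sketch below reads from an in-memory string. The file layout, the station names and all values are made up, and only four of the columns are used.

```python
import io
import pandas as pd

# Hypothetical file contents: three header lines, then whitespace-separated records
text = """# header line 1
# header line 2
# header line 3
ABCD 58880.5 2020.0321230 2401.3
EFGH 58880.5 2020.0321300 2398.7
"""

header = ['site', 'mjd', 'year_doy_time', 'ztd']
out = pd.read_table(io.StringIO(text), header=None, names=header,
                    dtype={'year_doy_time': str},  # keep the time field as text
                    skiprows=3, sep=r'\s+')

# %Y = year, %j = day of year, %H%M = hour and minute
out['year_doy_time'] = pd.to_datetime(out['year_doy_time'], format='%Y.%j%H%M')
print(out)
```

Reading the time field as `str` matters here: parsed as a float, `2020.0321300` would lose its trailing zeros and no longer match the format string.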

Read and write csv/txt/xlsx

# csv
df.to_csv('igra2-station-list.csv', index=False)
igraSites = pd.read_csv('igra2-station-list.csv')

# txt (tab-separated, without index or header row)
df_cma_fillter.to_csv('df_cma.txt', sep='\t', index=False, header=False)

# xlsx (read_excel has no index parameter, so it is not passed here)
df = pd.read_excel(fpath, header=None, names=header, skiprows=headnum)
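A quick round trip through an in-memory buffer shows the effect of `index=False`. The buffer stands in for a file on disk and the table contents are made up.

```python
import io
import pandas as pd

df = pd.DataFrame({'id': [1, 2], 'name': ['a', 'b']})

buffer = io.StringIO()               # stands in for a csv file
df.to_csv(buffer, index=False)       # do not write the index as an extra column
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```

Without `index=False`, the index would be written as an unnamed first column and would come back as `Unnamed: 0` when the file is read again.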

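Iterate over rows or columns

# Iterate over rows
for index, row in df.iterrows():
    print(row["lat_x"], row["lat_y"])

# Iterate over columns (items() replaces iteritems(), which was removed in pandas 2.0)
for index, item in commonSet.items():
    print(item)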
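The two loops can be checked on a small made-up table. The names `row_pairs` and `col_names` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'lat_x': [1.0, 2.0], 'lat_y': [1.5, 2.5]})

# iterrows yields (index label, row as a Series)
row_pairs = [(row['lat_x'], row['lat_y']) for index, row in df.iterrows()]

# items yields (column name, column as a Series)
col_names = [name for name, column in df.items()]

print(row_pairs, col_names)
```

Row iteration is slow on large tables, so vectorized column operations are usually preferable when they can express the same computation.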

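Add/delete columns

# Add a single column
df['c1'] = None
# Add multiple columns
df[['c1','c2']] == None  # raises KeyError: "['c1','c2'] not in index"
# Use concat
pd.concat([df, pd.DataFrame(columns=['c1','c2'])])  # this can only append the new columns at the end; it cannot control their position
# Use reindex to reorder columns and add new ones (list the existing columns too, or they are dropped)
df = df.reindex(columns=df.columns.tolist() + ['c1', 'c2'])

# Delete a column
del curr_data['date']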
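As a minimal sketch, the snippet below adds two empty columns with `reindex` and then removes one again with `drop`, an alternative to `del`. The column names are placeholders.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2]})

# reindex returns a new frame; columns that did not exist are filled with NaN
df = df.reindex(columns=df.columns.tolist() + ['c1', 'c2'])

# drop also returns a new frame and can remove several columns at once
df = df.drop(columns=['c2'])
print(df)
```

Passing only `['c1', 'c2']` to `reindex` would discard column `a`, which is why the existing column names are included in the list.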

Rename columns

# Assign columns directly; the number of names must match the number of existing columns
date.columns = ['Year', 'Month', 'Day', 'Hour']

# Rename selected columns only
df = df.rename(columns={"old_name": "new_name"})

Assign values to columns by index label

df_merge.loc[index, ['sitename', 'lat_z', 'lon_z', 'distance']] = minIndex, minLat, minLon, minDis

Reset the index to start from 0

df.index = range(len(df))

Use a column as the index

out = out.set_index('column_name')

Swap the row and column indexes (turn a Series into a one-row DataFrame)

curr_df = inf.to_frame().stack().unstack(level=0)
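The chain above can be hard to read, so here is a small sketch of what it appears to do, assuming `inf` is a Series. The Series contents and its name `info` are made up. Transposing with `.T` is shown as a shorter way to get a one-row DataFrame.

```python
import pandas as pd

# Hypothetical Series holding one record
inf = pd.Series({'site': 'A', 'lat': 30.5}, name='info')

# to_frame: one column named 'info'; stack/unstack moves the labels to the columns
curr_df = inf.to_frame().stack().unstack(level=0)

# Transposing the one-column frame also gives a single row labelled 'info'
transposed = inf.to_frame().T
print(curr_df)
print(transposed)
```

The column order of the two results may differ, since `unstack` can sort the labels.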

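Duplicate a row N times

# DataFrame.append was removed in pandas 2.0; use pd.concat instead
curr_df = pd.concat([curr_df, pd.DataFrame([row] * 5)])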
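A sketch of duplicating a row with `pd.concat`, which works on current pandas versions where `DataFrame.append` is no longer available. The table contents are made up, and `row` is assumed to be a Series taken from the frame.

```python
import pandas as pd

curr_df = pd.DataFrame({'site': ['A', 'B'], 'ztd': [2.4, 2.3]})
row = curr_df.loc[0]                      # the row to duplicate, as a Series

# Build a frame of three copies and append it; ignore_index renumbers from 0
copies = pd.DataFrame([row] * 3)
curr_df = pd.concat([curr_df, copies], ignore_index=True)
print(curr_df)
```

Without `ignore_index=True`, the copies keep the index label of the original row, which leaves duplicate labels in the index.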

Create a datetime column

curr_df['date'] = pd.to_datetime(curr_df[['year','month','day','hour']])
# The columns of curr_df holding year, month, day and hour must be named 'year', 'month', 'day', 'hour' (lowercase or mixed case)
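A minimal sketch with made-up values shows how the component columns are assembled into timestamps.

```python
import pandas as pd

curr_df = pd.DataFrame({'year': [2020, 2020],
                        'month': [1, 12],
                        'day': [5, 31],
                        'hour': [0, 18]})

# to_datetime recognises the component columns by their names
curr_df['date'] = pd.to_datetime(curr_df[['year', 'month', 'day', 'hour']])
print(curr_df['date'])
```

At least `year`, `month` and `day` are required; `hour` and finer units are optional.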

Convert a string column to datetime

date['date'] = pd.to_datetime(date['date'], format="%Y%m%d_%H%M%S")
More format codes: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
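For example, with a made-up timestamp string in that layout:

```python
import pandas as pd

date = pd.DataFrame({'date': ['20201106_165833', '20210101_000000']})

# %Y%m%d is the date part, the underscore is literal, %H%M%S is the time part
date['date'] = pd.to_datetime(date['date'], format='%Y%m%d_%H%M%S')
print(date['date'])
```

Giving an explicit `format` avoids guessing and makes pandas raise an error if a string does not match the expected layout.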

Shift times by a fixed offset

curr_df['date'] = curr_df['date_bj'] - pd.Timedelta(8, unit='h')
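The column name `date_bj` suggests Beijing time (UTC+8) being shifted to UTC, though that is an assumption. A sketch with made-up timestamps:

```python
import pandas as pd

curr_df = pd.DataFrame({
    'date_bj': pd.to_datetime(['2020-01-01 08:00', '2020-01-01 03:00'])
})

# Subtracting a Timedelta shifts every timestamp in the column by 8 hours
curr_df['date'] = curr_df['date_bj'] - pd.Timedelta(8, unit='h')
print(curr_df)
```

The subtraction crosses date boundaries correctly, so 03:00 on 1 January becomes 19:00 on 31 December.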

Binarize values by a condition

match['obs'] = (match['precipitation'] >= typeThreshold).astype('int')
# or, with .loc (avoids chained assignment)
match.loc[match['precipitation'] >= typeThreshold, 'obs'] = 1
match.loc[match['precipitation'] < typeThreshold, 'obs'] = 0
# or, with mask
match['obs'] = match['precipitation'].mask(match['precipitation'] >= typeThreshold, 1)
match['obs'] = match['obs'].mask(match['precipitation'] < typeThreshold, 0)
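A small sketch of the comparison approach, with `np.where` shown as one more equivalent form. The threshold, the precipitation values and the column name `obs_where` are made up.

```python
import numpy as np
import pandas as pd

typeThreshold = 0.1                       # hypothetical rain/no-rain threshold
match = pd.DataFrame({'precipitation': [0.0, 0.1, 2.5]})

# The comparison gives booleans; astype('int') turns them into 0 and 1
match['obs'] = (match['precipitation'] >= typeThreshold).astype('int')

# np.where picks 1 where the condition holds and 0 elsewhere
match['obs_where'] = np.where(match['precipitation'] >= typeThreshold, 1, 0)
print(match)
```

Writing the result to a separate column keeps the raw values intact, so the threshold can be changed and the flag recomputed later.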

Fill NaN

df = df.fillna(0)

Reverse a single column

coor.latitude = coor.latitude.values[::-1]
coor.longitude = coor.longitude.values[::-1]
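The `.values` part matters here. The sketch below, using made-up coordinates, contrasts it with assigning a reversed Series, which pandas realigns by index so that nothing changes.

```python
import pandas as pd

coor = pd.DataFrame({'latitude': [10, 20, 30], 'longitude': [1, 2, 3]})

# .values gives a plain NumPy array, so the reversed order is kept on assignment
coor['latitude'] = coor['latitude'].values[::-1]

# A reversed Series still carries its index labels and is aligned back on assignment
coor['longitude'] = coor['longitude'][::-1]
print(coor)
```

Only `latitude` ends up reversed. `longitude` is unchanged because each value returns to the row with its original index label.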

Replace all commas with empty strings

df = df.replace(r',', '', regex=True)
df['name'] = df['name'].replace(r',', '', regex=True)
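A common use is cleaning thousands separators before converting text to numbers. A sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({'name': ['a,b', 'c'], 'value': ['1,000', '20']})

# regex=True replaces the comma inside each string, not only whole-cell matches
df = df.replace(r',', '', regex=True)

# With the commas gone, the strings can be converted to integers
df['value'] = df['value'].astype(int)
print(df)
```

Without `regex=True`, `replace` only matches cells whose entire value equals `','`, so embedded commas would be left in place.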
