Chapter 4: Reshaping
import numpy as np
import pandas as pd
df = pd.read_csv('data/table.csv')
df.head()
  School Class    ID Gender   Address  Height  Weight  Math Physics
0    S_1   C_1  1101      M  street_1     173      63  34.0      A+
1    S_1   C_1  1102      F  street_2     192      73  32.5      B+
2    S_1   C_1  1103      M  street_2     186      82  87.2      B+
3    S_1   C_1  1104      F  street_2     167      81  80.4      B-
4    S_1   C_1  1105      F  street_4     159      64  84.8      B+
I. Pivot Tables
1. pivot
Normally, data in a DataFrame is stored in stacked (long) form. In the Gender column above, for example, the two categories are stacked within one column; the pivot function can turn the values of such a column into new columns:
df.pivot(index='ID', columns='Gender', values='Height').head()
df.pivot(index='ID', columns='Class', values='Physics').head()

Class C_1  C_2  C_3  C_4
ID
1101   A+  NaN  NaN  NaN
1102   B+  NaN  NaN  NaN
1103   B+  NaN  NaN  NaN
1104   B-  NaN  NaN  NaN
1105   B+  NaN  NaN  NaN
However, the pivot function is quite limited. Besides offering few options, it does not allow duplicate (index, columns) pairs among the values: a statement that produces such duplicates raises an error.
For this reason, the more powerful pivot_table function is usually preferred.
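To make the duplicate-pair limitation concrete, here is a minimal sketch on a toy frame (an assumption, not the chapter's data/table.csv). pivot raises a ValueError when an (index, columns) pair repeats, while pivot_table resolves the conflict by aggregating (mean by default):

```python
import pandas as pd

# Toy frame with a duplicated (index, columns) pair:
# School S_1 appears twice for Gender M.
toy = pd.DataFrame({'School': ['S_1', 'S_1', 'S_2'],
                    'Gender': ['M', 'M', 'F'],
                    'Height': [173, 186, 167]})
try:
    toy.pivot(index='School', columns='Gender', values='Height')
    raised = False
except ValueError as e:
    raised = True
    print('pivot failed:', e)

# pivot_table aggregates the duplicates instead of failing
agg = toy.pivot_table(index='School', columns='Gender', values='Height')
print(agg)
```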
2. pivot_table
First, reproduce the operation above:
df.pivot_table(index='ID', columns='Gender', values='Height').head()

Gender      F      M
ID
1101      NaN  173.0
1102    192.0    NaN
1103      NaN  186.0
1104    167.0    NaN
1105    159.0    NaN
df.pivot_table(index='School', columns='Gender', values='Height')

Gender           F           M
School
S_1     173.125000  178.714286
S_2     173.727273  172.000000
With more functionality comes a cost: pivot_table is naturally slower than the plain pivot function:
%timeit df.pivot(index='ID', columns='Gender', values='Height')
%timeit pd.pivot_table(df, index='ID', columns='Gender', values='Height')
%timeit pd.crosstab(index=df['ID'], columns=df['Gender'])
1.67 ms ± 124 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
6.73 ms ± 91.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
8.21 ms ± 213 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
pandas provides many options here; the commonly used parameters are introduced below:
① aggfunc: the aggregation applied within each group. Any aggregation function can be passed; the default is 'mean'.
pd.pivot_table(df, index='School', columns='Gender', values='Height', aggfunc=['mean', 'sum']).head()
pd.pivot_table(df, index='School', columns='Class', values='Height', aggfunc=['mean', 'sum', 'std']).head()

         mean                        sum                          std
Class     C_1    C_2    C_3    C_4   C_1    C_2    C_3    C_4     C_1        C_2        C_3        C_4
School
S_1     175.4  170.6  181.2    NaN  877.0  853.0  906.0    NaN  13.538833  11.523888  13.386560        NaN
S_2     164.2  180.0  173.8  173.8  821.0  900.0  869.0  869.0   7.395945  16.000000  14.342245  17.326281
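aggfunc is not limited to the names of built-in aggregations; any callable that reduces a group to a scalar also works. A minimal sketch on a toy frame (an assumption, not the chapter's dataset):

```python
import pandas as pd

toy = pd.DataFrame({'group': ['a', 'a', 'a', 'b', 'b', 'b'],
                    'kind':  ['x', 'x', 'y', 'x', 'y', 'y'],
                    'val':   [1, 5, 2, 3, 4, 10]})
# The spread (max - min) inside each cell, passed as a lambda
spread = pd.pivot_table(toy, index='group', columns='kind', values='val',
                        aggfunc=lambda s: s.max() - s.min())
print(spread)
```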
② margins: adds marginal (subtotal) rows and columns.
pd.pivot_table(df, index='School', columns='Gender', values='Height', aggfunc=['mean', 'sum'], margins=True).head()
pd.pivot_table(df, index='School', columns='Gender', values='Height', aggfunc=['mean', 'sum'], margins=True, margins_name='margins').head()

               mean                                sum
Gender            F           M     margins          F     M  margins
School
S_1      173.125000  178.714286  175.733333      1385  1251     2636
S_2      173.727273  172.000000  172.950000      1911  1548     3459
margins  173.473684  174.937500  174.142857      3296  2799     6095
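The marginal row and column apply the same aggfunc across each whole row, each whole column, and all data. A minimal sketch on toy data (an assumption):

```python
import pandas as pd

toy = pd.DataFrame({'g': ['a', 'a', 'b'], 'k': ['x', 'y', 'x'], 'v': [1, 2, 3]})
t = pd.pivot_table(toy, index='g', columns='k', values='v',
                   aggfunc='sum', margins=True, margins_name='Total')
print(t)  # the bottom-right cell holds the grand total
```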
③ The index, columns, and values can all be multi-level:
pd.pivot_table(df, index=['School', 'Class'], columns=['Gender', 'Address'], values=['Height', 'Math'])

(output: a 7 × 24 DataFrame with a (School, Class) row index and a three-level column index of value (Height/Math), Gender, and Address; most cells are NaN and the display is truncated)
3. crosstab (cross tabulation)
A cross tabulation is a special kind of pivot table, typically used for grouped frequency counts. For example, suppose we want to count frequencies grouped by street and gender:
df.head()

  School Class    ID Gender   Address  Height  Weight  Math Physics
0    S_1   C_1  1101      M  street_1     173      63  34.0      A+
1    S_1   C_1  1102      F  street_2     192      73  32.5      B+
2    S_1   C_1  1103      M  street_2     186      82  87.2      B+
3    S_1   C_1  1104      F  street_2     167      81  80.4      B-
4    S_1   C_1  1105      F  street_4     159      64  84.8      B+
pd.crosstab(index=df['Address'], columns=df['Gender'])
pd.crosstab(index=df['School'], columns=df['Class'])

Class   C_1  C_2  C_3  C_4
School
S_1       5    5    5    0
S_2       5    5    5    5
crosstab is also quite powerful (though it does not yet support multi-level grouping). Some important parameters:
① values and aggfunc: aggregate some data within each group; these two parameters must be passed together.
pd.crosstab(index=df['School'], columns=df['Class'], values=df['Math'], aggfunc='mean', normalize='index')

Class        C_1       C_2       C_3       C_4
School
S_1     0.333508  0.336227  0.330266  0.000000
S_2     0.245823  0.263622  0.264713  0.225842
pd.crosstab(index=df['Address'], columns=df['Gender'],
            values=np.random.randint(1, 20, df.shape[0]), aggfunc='min')

Gender     F   M
Address
street_1  13  12
street_2   5   6
street_4   5   2
street_5   3  14
street_6   2   6
street_7   1   4
② Besides the margins parameter, crosstab also introduces a normalize parameter, which accepts 'all', 'index', or 'columns'.
pd.crosstab(index=df['Address'], columns=df['Gender'], normalize='all', margins=True)

Gender           F         M       All
Address
street_1  0.028571  0.057143  0.085714
street_2  0.114286  0.057143  0.171429
street_4  0.085714  0.142857  0.228571
street_5  0.085714  0.085714  0.171429
street_6  0.142857  0.028571  0.171429
street_7  0.085714  0.085714  0.171429
All       0.542857  0.457143  1.000000
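The three normalize options divide by different totals: 'all' by the grand total, 'index' by each row total, 'columns' by each column total. A minimal sketch on toy data (an assumption, not the chapter's dataset):

```python
import pandas as pd

toy = pd.DataFrame({'addr': ['s1', 's1', 's2', 's2', 's2'],
                    'sex':  ['F', 'M', 'F', 'F', 'M']})
# Row-wise proportions: each row of the table now sums to 1.0
by_row = pd.crosstab(toy['addr'], toy['sex'], normalize='index')
print(by_row)
```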
II. Other Reshaping Methods
1. melt
The melt function can be regarded as the inverse of pivot: it compresses unstacked data back into stacked form, turning a "wide" DataFrame into a "long" one.
df_m = df[['ID', 'Gender', 'Math']]
df_m.head()

     ID Gender  Math
0  1101      M  34.0
1  1102      F  32.5
2  1103      M  87.2
3  1104      F  80.4
4  1105      F  84.8
df.pivot(index='ID', columns='Gender', values='Math').head()

Gender     F     M
ID
1101     NaN  34.0
1102    32.5   NaN
1103     NaN  87.2
1104    80.4   NaN
1105    84.8   NaN
In melt, id_vars gives the columns to keep as identifiers, and value_vars gives the set of columns to stack:
pivoted = df.pivot(index='ID', columns='Gender', values='Math')
result = pivoted.reset_index().melt(id_vars=['ID'], value_vars=['F', 'M'], value_name='Math') \
                .dropna().set_index('ID').sort_index().head()
result

     Gender  Math
ID
1101      M  34.0
1102      F  32.5
1103      M  87.2
1104      F  80.4
1105      F  84.8
pivoted = df.pivot(index='ID', columns='Gender', values='Math')
result = pivoted.reset_index().melt(id_vars=['ID'], value_vars=['F', 'M'], value_name='Math') \
                .dropna().set_index('ID').sort_index()
result.equals(df_m.set_index('ID'))
True
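One detail worth noting: if value_vars is omitted, melt stacks every column not listed in id_vars, so you need not spell out the categories; var_name names the new key column. A sketch on a toy wide frame (an assumption):

```python
import pandas as pd

wide = pd.DataFrame({'ID': [1, 2], 'F': [10.0, None], 'M': [None, 20.0]})
# No value_vars: both non-id columns ('F' and 'M') are melted
tall = wide.melt(id_vars='ID', var_name='Gender', value_name='Math').dropna()
print(tall)
```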
2. Stacking and Unstacking
(1) stack: the most basic reshaping function, with only two parameters: level and dropna.
df_s = pd.pivot_table(df, index=['Class', 'ID'], columns='Gender', values=['Height', 'Weight'])
df_s.groupby('Class').head(2)

           Height         Weight
Gender          F      M       F     M
Class ID
C_1   1101    NaN  173.0     NaN  63.0
      1102  192.0    NaN    73.0   NaN
C_2   1201    NaN  188.0     NaN  68.0
      1202  176.0    NaN    94.0   NaN
C_3   1301    NaN  161.0     NaN  68.0
      1302  175.0    NaN    57.0   NaN
C_4   2401  192.0    NaN    62.0   NaN
      2402    NaN  166.0     NaN  82.0
df_stacked = df_s.stack()
df_stacked.groupby('Class').head(2)

                   Height  Weight
Class ID   Gender
C_1   1101 M        173.0    63.0
      1102 F        192.0    73.0
C_2   1201 M        188.0    68.0
      1202 F        176.0    94.0
C_3   1301 M        161.0    68.0
      1302 F        175.0    57.0
C_4   2401 F        192.0    62.0
      2402 M        166.0    82.0
stack can be thought of as moving a horizontal (column) index down to the vertical (row) index, so it is similar in spirit to melt. The level parameter specifies which column-index level to move (or which levels, passed as a list).
df_stacked = df_s.stack(1)
df_stacked.groupby('Class').head(2)

                   Height  Weight
Class ID   Gender
C_1   1101 M        173.0    63.0
      1102 F        192.0    73.0
C_2   1201 M        188.0    68.0
      1202 F        176.0    94.0
C_3   1301 M        161.0    68.0
      1302 F        175.0    57.0
C_4   2401 F        192.0    62.0
      2402 M        166.0    82.0
df_stacked = df_s.stack(0)
df_stacked.groupby('Class').head(2)

Gender                  F      M
Class ID
C_1   1101 Height     NaN  173.0
           Weight     NaN   63.0
C_2   1201 Height     NaN  188.0
           Weight     NaN   68.0
C_3   1301 Height     NaN  161.0
           Weight     NaN   68.0
C_4   2401 Height   192.0    NaN
           Weight    62.0    NaN
(2) unstack: the inverse of stack, functionally similar to pivot_table.
df_stacked.head()

Gender                  F      M
Class ID
C_1   1101 Height     NaN  173.0
           Weight     NaN   63.0
      1102 Height   192.0    NaN
           Weight    73.0    NaN
      1103 Height     NaN  186.0
result = df_stacked.unstack().swaplevel(1, 0, axis=1).sort_index(axis=1)
result
result.groupby('Class').head(2)

           Height         Weight
Gender          F      M       F     M
Class ID
C_1   1101    NaN  173.0     NaN  63.0
      1102  192.0    NaN    73.0   NaN
C_2   1201    NaN  188.0     NaN  68.0
      1202  176.0    NaN    94.0   NaN
C_3   1301    NaN  161.0     NaN  68.0
      1302  175.0    NaN    57.0   NaN
C_4   2401  192.0    NaN    62.0   NaN
      2402    NaN  166.0     NaN  82.0
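Like stack, unstack accepts a level argument (by position or by name); by default it moves the innermost row level to the columns. A minimal sketch on a toy MultiIndex Series (an assumption, not the chapter's data):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4],
              index=pd.MultiIndex.from_product([['a', 'b'], ['x', 'y']],
                                               names=['outer', 'inner']))
print(s.unstack())          # default: 'inner' becomes the columns
print(s.unstack('outer'))   # or select the level by name
```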
III. Dummy Variables and Factorization
1. Dummy Variables
This section mainly introduces the get_dummies function, whose main purpose is one-hot encoding:
df_d = df[['Class', 'Gender', 'Weight']]
df_d.head()

  Class Gender  Weight
0   C_1      M      63
1   C_1      F      73
2   C_1      M      82
3   C_1      F      81
4   C_1      F      64
Now suppose we want to convert the first two columns into dummy variables and append the numeric Weight column:
pd.get_dummies(df_d[['Class', 'Gender']]).join(df_d['Weight']).head()

   Class_C_1  Class_C_2  Class_C_3  Class_C_4  Gender_F  Gender_M  Weight
0          1          0          0          0         0         1      63
1          1          0          0          0         1         0      73
2          1          0          0          0         0         1      82
3          1          0          0          0         1         0      81
4          1          0          0          0         1         0      64
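get_dummies has a few options worth knowing: prefix renames the generated columns, and drop_first drops one category per variable (useful in regression to avoid collinearity). A sketch on a toy frame (an assumption, not the chapter's data):

```python
import pandas as pd

toy = pd.DataFrame({'Class': ['C_1', 'C_2', 'C_1'],
                    'Gender': ['M', 'F', 'F']})
# prefix can be a dict mapping column -> new prefix;
# drop_first removes the first category of each variable
d = pd.get_dummies(toy, prefix={'Class': 'cls', 'Gender': 'sex'},
                   drop_first=True)
print(d.columns.tolist())
```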
2. The factorize method
This method performs integer (natural-number) encoding, with missing values encoded as -1; the sort parameter controls whether codes are assigned after sorting the unique values.
codes, uniques = pd.factorize(['b', None, 'a', 'c', 'b'], sort=True)
display(codes)
display(uniques)
array([ 1, -1, 0, 2, 1], dtype=int64)
array(['a', 'b', 'c'], dtype=object)
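For comparison, without sort=True the codes follow the order of first appearance (a minimal sketch):

```python
import pandas as pd

# 'b' is seen first, so it gets code 0; None still maps to -1
codes, uniques = pd.factorize(['b', None, 'a', 'c', 'b'])
print(codes)    # [ 0 -1  1  2  0]
print(uniques)  # ['b' 'a' 'c']
```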
IV. Questions and Exercises
1. Questions
[Question 1] Many reshaping functions were introduced above, such as melt / crosstab / pivot / pivot_table / stack / unstack. Summarize the typical use of each.
melt & stack: compress unstacked data into stacked form, turning a "wide" DataFrame into a "long" one. pivot & pivot_table & unstack: expand the values of a column into new columns. crosstab: a special kind of pivot table, typically used for grouped frequency counts.
[Question 2] What is the relationship between the reshaping functions and multi-level indexes? Which reshaping functions change the number of index levels, and how?
[Question 3] Give an example of a dummy-variable method beyond what was covered above.
pd.get_dummies(df_d[['Class', 'Gender']]) — in general, this kind of encoding is useful whenever categorical features need to be converted into numeric features.
[Question 4] After using stack, does immediately calling unstack always reproduce the original table exactly?
Not necessarily; it depends on the level parameters used in the stack and unstack calls.
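One concrete failure case (a toy sketch, an assumption): with multi-level columns, stack moves one column level down, and unstack puts the row level back as the innermost column level; cells that never existed come back as NaN, so shape and dtype can change:

```python
import pandas as pd

toy = pd.DataFrame([[1, 2], [3, 4]], index=['r1', 'r2'],
                   columns=pd.MultiIndex.from_tuples([('A', 'x'), ('B', 'y')]))
rt = toy.stack().unstack()   # stack the innermost column level, then undo
print(toy.shape, rt.shape)   # (2, 2) vs (2, 4): new NaN cells appeared
print(toy.equals(rt))        # False
```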
[Question 5] Three functions were involved in the pivot-table section. Use each of them to accomplish the same (self-chosen) task and compare which is fastest.
%timeit df.pivot(index='ID', columns='Gender', values='Height')
%timeit pd.pivot_table(df, index='ID', columns='Gender', values='Height')
%timeit pd.crosstab(index=df['ID'], columns=df['Gender'])
1.67 ms ± 124 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
6.73 ms ± 91.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
8.21 ms ± 213 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
[Question 6] Since melt provides the same functionality as stack, why was stack designed at all?
With stack you do not need to write out the specific categories contained in the column, so it is useful when the categories are unknown or very numerous.
2. Exercises
[Exercise 1] Continue with the drug dataset from the previous chapter:
pd.read_csv('data/Drugs.csv').head()

   YYYY State           COUNTY  SubstanceName  DrugReports
0  2010    VA         ACCOMACK   Propoxyphene            1
1  2010    OH            ADAMS       Morphine            9
2  2010    PA            ADAMS      Methadone            2
3  2010    VA  ALEXANDRIA CITY         Heroin            5
4  2010    PA        ALLEGHENY  Hydromorphone            5
(a) Transform the table into the following shape: each row should show, for each drug in each county, the change from 2010 to 2017, with the first three columns sorted:
(figure: target layout, picture/drug_pic.png)
df = pd.read_csv('data/Drugs.csv', index_col=['State', 'COUNTY', 'SubstanceName']).sort_index()
df.head()

                                  YYYY  DrugReports
State COUNTY SubstanceName
KY    ADAIR  Buprenorphine        2011            3
             Buprenorphine        2012            5
             Buprenorphine        2013            4
             Buprenorphine        2014           27
             Buprenorphine        2015            5
result = pd.pivot_table(df, index=['State', 'COUNTY', 'SubstanceName'], columns='YYYY',
                        values='DrugReports', fill_value='-').reset_index().rename_axis(columns={'YYYY': ''})
result.head()

  State COUNTY  SubstanceName 2010 2011 2012 2013 2014 2015 2016 2017
0    KY  ADAIR  Buprenorphine    -    3    5    4   27    5    7   10
1    KY  ADAIR        Codeine    -    -    1    -    -    -    -    1
2    KY  ADAIR       Fentanyl    -    -    1    -    -    -    -    -
3    KY  ADAIR         Heroin    -    -    1    2    -    1    -    2
4    KY  ADAIR    Hydrocodone    6    9   10   10    9    7   11    3
Reference answer:
df = pd.read_csv('data/Drugs.csv', index_col=['State', 'COUNTY']).sort_index()
df.head()

              YYYY  SubstanceName  DrugReports
State COUNTY
KY    ADAIR   2010      Methadone            1
      ADAIR   2010    Hydrocodone            6
      ADAIR   2011      Oxycodone            4
      ADAIR   2011  Buprenorphine            3
      ADAIR   2011       Morphine            2
result = pd.pivot_table(df, index=['State', 'COUNTY', 'SubstanceName'],
                        columns='YYYY',
                        values='DrugReports', fill_value='-').reset_index().rename_axis(columns={'YYYY': ''})
result.head()

  State COUNTY  SubstanceName 2010 2011 2012 2013 2014 2015 2016 2017
0    KY  ADAIR  Buprenorphine    -    3    5    4   27    5    7   10
1    KY  ADAIR        Codeine    -    -    1    -    -    -    -    1
2    KY  ADAIR       Fentanyl    -    -    1    -    -    -    -    -
3    KY  ADAIR         Heroin    -    -    1    2    -    1    -    2
4    KY  ADAIR    Hydrocodone    6    9   10   10    9    7   11    3
(b) Restore the result of (a) back to the original table, and use the equals function to verify that the restored table matches the original (returns True).
pivoted = result.rename_axis(columns={'': 'YYYY'})
melted = pivoted.melt(id_vars=['State', 'COUNTY', 'SubstanceName'],
                      value_vars=[2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017],
                      value_name='DrugReports').set_index(['State', 'COUNTY', 'SubstanceName']).dropna().sort_index()
melted = melted.query('DrugReports != "-"')
melted.head()
melted['DrugReports'] = melted['DrugReports'].astype(np.int64)
melted['YYYY'] = melted['YYYY'].astype(np.int64)
type(melted['DrugReports'][0])
type(melted['YYYY'][0])
type(df['YYYY'][0])
melted.equals(df)
Reference answer:
result_melted = result.melt(id_vars=result.columns[:3], value_vars=result.columns[-8:],
                            var_name='YYYY', value_name='DrugReports').query('DrugReports != "-"')
result2 = result_melted.sort_values(by=['State', 'COUNTY', 'YYYY',
                                        'SubstanceName']).reset_index().drop(columns='index')
cols = list(result2.columns)
a, b = cols.index('SubstanceName'), cols.index('YYYY')
cols[b], cols[a] = cols[a], cols[b]
result2 = result2[cols].astype({'DrugReports': 'int', 'YYYY': 'int'})
result2.head()
  State COUNTY  YYYY  SubstanceName  DrugReports
0    KY  ADAIR  2010    Hydrocodone            6
1    KY  ADAIR  2010      Methadone            1
2    KY  ADAIR  2011  Buprenorphine            3
3    KY  ADAIR  2011    Hydrocodone            9
4    KY  ADAIR  2011       Morphine            2
df_tidy = df.reset_index().sort_values(by=result2.columns[:4].tolist()).reset_index().drop(columns='index')
df_tidy.head()

  State COUNTY  YYYY  SubstanceName  DrugReports
0    KY  ADAIR  2010    Hydrocodone            6
1    KY  ADAIR  2010      Methadone            1
2    KY  ADAIR  2011  Buprenorphine            3
3    KY  ADAIR  2011    Hydrocodone            9
4    KY  ADAIR  2011       Morphine            2
df_tidy.equals(result2)
False
[Exercise 2] Given a dataset about earthquakes in a certain region, solve the following problems:
pd.read_csv('data/Earthquake.csv').head()

         日期         时间     维度     经度        方向   距离    深度   烈度
0  2003.05.20  12:17:44 AM  39.04  40.38        west  0.1  10.0  0.0
1  2007.08.01  12:03:08 AM  40.79  30.09        west  0.1   5.2  4.0
2  1978.05.07  12:41:37 AM  38.58  27.61  south_west  0.1   0.0  0.0
3  1997.03.22  12:31:45 AM  39.47  36.44  south_west  0.1  10.0  0.0
4  2000.04.02  12:57:38 AM  40.80  30.24  south_west  0.1   7.0  0.0
(a) Transform the table into the following shape: expand the direction (方向) column into new columns, and stack the distance (距离), depth (深度), and intensity (烈度) attributes:
(figure: target layout, picture/earthquake_pic.png)
Reference answer:
df = pd.read_csv('data/Earthquake.csv')
df = df.sort_values(by=df.columns.tolist()[:3]).sort_index(axis=1).reset_index().drop(columns='index')
df.head()

           方向          日期         时间    深度   烈度    经度    维度   距离
0  south_east  1912.08.09  12:29:00 AM  16.0  6.7  27.2  40.6  4.3
1  south_west  1912.08.10  12:23:00 AM  15.0  6.0  27.1  40.6  2.0
2  south_west  1912.08.10  12:30:00 AM  15.0  5.2  27.1  40.6  2.0
3  south_east  1912.08.11  12:19:04 AM  30.0  4.9  27.2  40.6  4.3
4  south_west  1912.08.11  12:20:00 AM  15.0  4.5  27.1  40.6  2.0
result = pd.pivot_table(df, index=['日期', '时间', '维度', '经度'],
                        columns='方向',
                        values=['烈度', '深度', '距离'], fill_value='-').stack(level=0).rename_axis(index={None: '地震参数'})
result.head(6)

方向                                       east north north_east north_west south south_east south_west west
日期         时间          维度   经度   地震参数
1912.08.09 12:29:00 AM  40.6  27.2  深度      -     -          -          -     -         16          -    -
                                    烈度      -     -          -          -     -        6.7          -    -
                                    距离      -     -          -          -     -        4.3          -    -
1912.08.10 12:23:00 AM  40.6  27.1  深度      -     -          -          -     -          -         15    -
                                    烈度      -     -          -          -     -          -          6    -
                                    距离      -     -          -          -     -          -          2    -
(b) Restore the result of (a) back to the original table, and use the equals function to verify that the restored table matches the original (returns True).
result = result.rename_axis(index={'地震参数': None})
df_result = result.unstack().stack(0)[(result.unstack().stack(0) != '-').any(1)].reset_index()
df_result = df_result.sort_index(axis=1).astype({'深度': 'float64', '烈度': 'float64', '距离': 'float64'})
df_result.head()
df_result = result.unstack().stack(0)[(~(result.unstack().stack(0) == '-')).any(1)].reset_index()
df_result.columns.name = None
df_result = df_result.sort_index(axis=1).astype({'深度': 'float64', '烈度': 'float64', '距离': 'float64'})
df_result.head()
           方向          日期         时间    深度   烈度    经度    维度   距离
0  south_east  1912.08.09  12:29:00 AM  16.0  6.7  27.2  40.6  4.3
1  south_west  1912.08.10  12:23:00 AM  15.0  6.0  27.1  40.6  2.0
2  south_west  1912.08.10  12:30:00 AM  15.0  5.2  27.1  40.6  2.0
3  south_east  1912.08.11  12:19:04 AM  30.0  4.9  27.2  40.6  4.3
4  south_west  1912.08.11  12:20:00 AM  15.0  4.5  27.1  40.6  2.0
df_result.astype({'深度': 'float64', '烈度': 'float64', '距离': 'float64'}, copy=False).dtypes
方向 object
日期 object
时间 object
深度 float64
烈度 float64
经度 float64
维度 float64
距离 float64
dtype: object
df.equals(df_result)
True