to_numeric: compared with the astype method, to_numeric is better suited to converting non-numeric data to a numeric type.
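import pandas as pd
import seaborn as sns
tips = sns.load_dataset('tips')
t = tips.head(10).copy()  # take a subset; copy() avoids SettingWithCopyWarning
t['total_bill'] = t['total_bill'].astype(object)  # let the column hold strings without a dtype warning
t.loc[[1, 4, 7], 'total_bill'] = 'missing'  # overwrite three values with 'missing'
print(t)
print(t.dtypes)  # inspect the column dtypes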
Output:
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1     missing  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4     missing  3.61  Female     No  Sun  Dinner     4
5       25.29  4.71    Male     No  Sun  Dinner     4
6        8.77  2.00    Male     No  Sun  Dinner     2
7     missing  3.12    Male     No  Sun  Dinner     4
8       15.04  1.96    Male     No  Sun  Dinner     2
9       14.78  3.23    Male     No  Sun  Dinner     2
------------------------------------
total_bill object
tip float64
sex category
smoker category
day category
time category
size int64
dtype: object
t['total_bill'].astype(float)  # converting the string column to float at this point raises:
ValueError: could not convert string to float: 'missing'
pd.to_numeric(t['total_bill'])  # with the default settings, to_numeric raises as well:
ValueError: Unable to parse string "missing" at position 1
For more on astype, see astype转换数据类型_我就是一个小怪兽的博客-CSDN博客
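The errors parameter:
to_numeric has an errors parameter that determines what happens when the function meets a value it cannot convert to a number. The default is raise, meaning an unconvertible value raises an error.
errors accepts three values:
(1) raise: the default; raises an error when a value cannot be converted.
(2) coerce: replaces any value that cannot be converted with NaN (a missing value).
(3) ignore: gives up on the conversion when a value cannot be converted and returns the input unchanged (does nothing). pandas 2.2 deprecated this option.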
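As a minimal sketch of the difference between raise and coerce, the snippet below runs both on a small, made-up Series (the name raw and its values are assumptions for illustration, not part of the tips dataset). ignore is left out because pandas 2.2 deprecated it.

```python
import pandas as pd

# Hypothetical sample data: two parseable strings and one that is not.
raw = pd.Series(["16.99", "missing", "21.01"])

# errors='raise' (the default): the unparseable value raises ValueError.
try:
    pd.to_numeric(raw)
    raised = False
except ValueError as exc:
    raised = True
    print("raise ->", exc)

# errors='coerce': the unparseable value becomes NaN, so the result is float64.
coerced = pd.to_numeric(raw, errors="coerce")
print(coerced.dtype)
print(coerced.isna().tolist())
```

With coerce the call always succeeds, so it is worth checking afterwards how many values turned into NaN.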
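import pandas as pd
import seaborn as sns
tips = sns.load_dataset('tips')
t = tips.head(10).copy()  # copy() avoids SettingWithCopyWarning
t['total_bill'] = t['total_bill'].astype(object)  # let the column hold strings without a dtype warning
t.loc[[1, 4, 7], 'total_bill'] = 'missing'
t['total_bill'] = pd.to_numeric(t['total_bill'], errors='coerce')  # set errors to coerce
print(t)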
Output:
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1         NaN  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4         NaN  3.61  Female     No  Sun  Dinner     4
5       25.29  4.71    Male     No  Sun  Dinner     4
6        8.77  2.00    Male     No  Sun  Dinner     2
7         NaN  3.12    Male     No  Sun  Dinner     4
8       15.04  1.96    Male     No  Sun  Dinner     2
9       14.78  3.23    Male     No  Sun  Dinner     2
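Because coerce silently turns bad values into NaN, it can help to see which original values failed to convert. The following is one possible sketch, not a built-in pandas feature: it compares the column before and after conversion. The names raw, converted, and failed are hypothetical, as is the sample data.

```python
import pandas as pd

# Hypothetical column mixing numbers, a bad string, and a genuinely missing value.
raw = pd.Series([16.99, "missing", 21.01, None], dtype=object)

converted = pd.to_numeric(raw, errors="coerce")

# A value failed to convert if it is NaN afterwards but was not missing before.
failed = converted.isna() & raw.notna()

print(raw[failed])  # the original values that could not be parsed
```

The genuinely missing entry (None) is not flagged, since it was already missing before the conversion.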
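Downcasting with to_numeric: the downcast parameter
After converting a column to a numeric type, downcast lets to_numeric shrink it to the smallest numeric dtype that can hold the values. The default is None; the other accepted values are integer, signed, unsigned, and float.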
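To illustrate the downcast options, here is a small sketch on made-up Series (the names ints and floats and their values are assumptions for illustration). It only prints the resulting dtypes.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])      # small non-negative integers
floats = pd.Series([1.5, 2.5])   # float64 by default

# 'integer' (or 'signed') picks the smallest signed integer dtype that fits.
as_signed = pd.to_numeric(ints, downcast="integer")
# 'unsigned' picks the smallest unsigned integer dtype that fits.
as_unsigned = pd.to_numeric(ints, downcast="unsigned")
# 'float' picks the smallest float dtype, but never goes below float32.
as_float = pd.to_numeric(floats, downcast="float")

print(as_signed.dtype)    # int8
print(as_unsigned.dtype)  # uint8
print(as_float.dtype)     # float32
```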
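import pandas as pd
import seaborn as sns
tips = sns.load_dataset('tips')
t = tips.head(10).copy()  # copy() avoids SettingWithCopyWarning
t['total_bill'] = t['total_bill'].astype(object)  # let the column hold strings without a dtype warning
t.loc[[1, 4, 7], 'total_bill'] = 'missing'
t['total_bill'] = pd.to_numeric(t['total_bill'], errors='coerce')  # without downcast
print(t.dtypes)
print('--------' * 6)
t['total_bill'] = pd.to_numeric(t['total_bill'], errors='coerce', downcast='float')  # with downcast
print(t.dtypes)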
As the output shows, with downcast specified the dtype of total_bill changes from float64 to float32, which uses less memory.
Output:
total_bill float64
tip float64
sex category
smoker category
day category
time category
size int64
dtype: object
------------------------------------------------
total_bill float32
tip float64
sex category
smoker category
day category
time category
size int64
dtype: object
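The memory saving can be checked directly. The sketch below uses a made-up Series of 1000 floats (the names values and smaller are hypothetical) and compares Series.memory_usage before and after downcasting. Note that float32 keeps only about 7 significant digits, so this is a trade of precision for space.

```python
import pandas as pd

# Hypothetical data: 1000 float64 values between 0 and 50.
values = pd.Series([i * 0.05 for i in range(1000)])
smaller = pd.to_numeric(values, downcast="float")

# index=False counts only the bytes used by the values themselves.
before = values.memory_usage(index=False)
after = smaller.memory_usage(index=False)

print(values.dtype, before)   # float64 8000
print(smaller.dtype, after)   # float32 4000
```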