Pandas: Changing Data Types

Lstao_

Article link: https://blog.csdn.net/m0_47191576/article/details/106437999

Changing column data types in Pandas

Specifying dtype when creating a DataFrame
Converting the types of multiple DataFrame columns or a single Series

Specifying dtype when creating a DataFrame

After importing data, always call DataFrame.info() to check the column data types before operating on the data.

import numpy as np
import pandas as pd
data={'name':['小王','小李','小陈','小小'],'scores':[97.0,88.0,76.0,65.0],
      'level':["A","B","C","D"],'rank':['1','2','3','4']}
df=pd.DataFrame(data)
df.info()

 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   name    4 non-null      object 
 1   scores  4 non-null      float64
 2   level   4 non-null      object 
 3   rank    4 non-null      object
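Besides info(), the same check can be done programmatically. The following is a minimal sketch using the sample data above; the variable names numeric_cols and other_cols are only illustrative.

```python
import pandas as pd

data = {
    "name": ["小王", "小李", "小陈", "小小"],
    "scores": [97.0, 88.0, 76.0, 65.0],
    "level": ["A", "B", "C", "D"],
    "rank": ["1", "2", "3", "4"],
}
df = pd.DataFrame(data)

# dtypes is a Series mapping each column name to its dtype
print(df.dtypes)

# Columns pandas already treats as numeric
numeric_cols = df.select_dtypes(include="number").columns.tolist()
# Everything else, i.e. candidates for conversion
other_cols = df.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols, other_cols)
```

Note that rank shows up among the non-numeric columns because its values are strings.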

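Specify the type with the dtype parameter when creating the DataFrame. The output below comes from the pandas version in use when this was written, which left non-convertible columns as object; newer releases may raise an error instead of skipping them:

df=pd.DataFrame(data,dtype='int64') # Example 1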

#   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   name    4 non-null      object
 1   scores  4 non-null      int64 
 2   level   4 non-null      object
 3   rank    4 non-null      int64 
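The constructor's dtype argument applies one type to the whole frame. As a sketch of an alternative that does not depend on how unconvertible columns are handled, the frame can be built first and only the chosen columns converted afterwards (the choice of columns here is an assumption for illustration):

```python
import pandas as pd

data = {
    "name": ["小王", "小李", "小陈", "小小"],
    "scores": [97.0, 88.0, 76.0, 65.0],
    "level": ["A", "B", "C", "D"],
    "rank": ["1", "2", "3", "4"],
}

# Build the frame with inferred types, then convert only the numeric-looking columns
df = pd.DataFrame(data).astype({"scores": "int64", "rank": "int64"})
print(df.dtypes)
```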

df=pd.read_csv('file.csv',dtype=str) # Example 2: if you pass int or float here instead, an error is raised when a column cannot be converted to a numeric type
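read_csv also accepts a dict for dtype, so each column can get its own type. A minimal sketch follows; the CSV text is made up and read from memory with io.StringIO in place of 'file.csv'.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'file.csv'
csv_text = "name,scores,rank\nA,97.0,1\nB,88.0,2\n"

# Read every column as a string, as in Example 2
df_str = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Or give each column its own dtype with a dict
df_typed = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"name": str, "scores": "float64", "rank": "int64"},
)
print(df_str.dtypes)
print(df_typed.dtypes)
```

Reading everything as strings first is a safe default when the file may contain dirty values; the conversion functions in the next section can then be applied column by column.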

Converting the types of multiple DataFrame columns or a single Series

1.to_numeric()

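Converts its argument to a numeric type.

The returned dtype is float64 or int64 by default, depending on the data supplied. Use the downcast parameter to obtain other dtypes.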
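As a small sketch of the downcast parameter (the sample values are chosen for illustration and fit in the smallest types):

```python
import pandas as pd

ints = pd.Series(["1", 2, "3"])
floats = pd.Series(["1.5", 2, "3.25"])

default = pd.to_numeric(ints)                        # int64
small = pd.to_numeric(ints, downcast="integer")      # smallest signed integer type that fits
unsigned = pd.to_numeric(ints, downcast="unsigned")  # smallest unsigned integer type that fits
single = pd.to_numeric(floats, downcast="float")     # float32 at the smallest

print(default.dtype, small.dtype, unsigned.dtype, single.dtype)
```

Downcasting only happens when the values can be represented in the smaller type, so the result depends on the data.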

>>>s=pd.Series(["8", 6, 7.5, 3, "0.9"])
>>>s
0      8
1      6
2    7.5
3      3
4    0.9
dtype: object

>>>pd.to_numeric(s)
0    8.0
1    6.0
2    7.5
3    3.0
4    0.9
dtype: float64
# The result is float here; if every value is an integer or an integer-like string, to_numeric returns an int type instead

df["a"] = pd.to_numeric(df["a"])
# Convert a single column of a DataFrame

Note: to_numeric() returns a new Series, so you need to assign the result to a variable or back to a column.

>>>s=pd.Series(["8", 6, "7", 3.0, "dada"])
>>>s
0       8
1       6
2       7
3       3
4    dada
dtype: object

>>>s=pd.to_numeric(s)
ValueError: Unable to parse string "dada" at position 4
# 'dada' is not a numeric string, so to_numeric cannot parse it and raises an error

So what should we do when some values cannot be converted to numbers, as in the example above?

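We can use the errors parameter of to_numeric(). With errors='coerce', values that cannot be converted are forced to NaN; with errors='ignore', the conversion is skipped when an invalid value is found. The default is errors='raise', which raises an exception on invalid input. Note that recent pandas releases (2.2 and later) deprecate errors='ignore'.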

>>>s=pd.Series([8, 6, 7, 3.3, "pandas"])
>>>s
0         8
1         6
2         7
3       3.3
4    pandas
dtype: object

>>>pd.to_numeric(s,errors='coerce')
0    8.0
1    6.0
2    7.0
3    3.3
4    NaN
dtype: float64

>>>pd.to_numeric(s,errors='ignore')
0         8
1         6
2         7
3       3.3
4    pandas
dtype: object
# the original Series is returned untouched
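Since errors='coerce' silently turns bad values into NaN, it can help to look at which values were affected before deciding what to do with them. A minimal sketch, with the names failed and cleaned chosen only for illustration:

```python
import pandas as pd

s = pd.Series([8, 6, 7, 3.3, "pandas"])
converted = pd.to_numeric(s, errors="coerce")

# Values that were present before conversion but are NaN afterwards failed to parse
failed = s[converted.isna() & s.notna()]
print(failed)

# Decide explicitly how to handle them, for example by filling with 0
cleaned = converted.fillna(0)
print(cleaned)
```

Filling with 0 is only one option; dropping the rows with dropna() or fixing the source values may be more appropriate depending on the data.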

Note that to_numeric() only accepts one-dimensional input such as a single column. To convert several columns of a DataFrame, use the apply() method:

>>>data={'name':['小王','小李','小陈','小小'],'scores':[97.0,88.0,76.0,65.0],
      'level':["A","B","C","D"],'rank':['1','2','3','4']}
>>>df=pd.DataFrame(data) 
>>>df=df.apply(pd.to_numeric,errors='ignore')
>>>df.info()
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   name    4 non-null      object 
 1   scores  4 non-null      float64
 2   level   4 non-null      object 
 3   rank    4 non-null      int64

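This attempts the conversion on every column and leaves the columns that cannot be converted unchanged, which is very useful when you don't know in advance which columns are convertible.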

We can also specify which columns to convert:

>>>df[['scores','rank']]=df[['scores','rank']].apply(pd.to_numeric,errors='ignore')
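Because errors='ignore' is deprecated in recent pandas releases, the same "convert what you can" behavior can be sketched with an explicit try/except. The helper to_numeric_or_keep below is hypothetical, not part of pandas.

```python
import pandas as pd

def to_numeric_or_keep(column):
    """Return the column as numeric if every value parses, else return it unchanged."""
    try:
        return pd.to_numeric(column)
    except (ValueError, TypeError):
        return column

data = {
    "name": ["小王", "小李", "小陈", "小小"],
    "scores": [97.0, 88.0, 76.0, 65.0],
    "level": ["A", "B", "C", "D"],
    "rank": ["1", "2", "3", "4"],
}
df = pd.DataFrame(data)

# apply() calls the helper once per column
out = df.apply(to_numeric_or_keep)
print(out.dtypes)
```

The helper can be applied to a subset in the same way as above, for example df[['scores','rank']].apply(to_numeric_or_keep).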

2.astype()

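The to_numeric() function above can only convert to numeric types and returns float64 or int64 by default. With astype() you can specify the exact target type.
The errors parameter:
raise: allow exceptions to be raised
ignore: suppress exceptions; on error, the original object is returned
The default is raise.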

>>>df=df.astype(int,errors='ignore')  # int maps to the platform integer, so the result may be int64 rather than the int32 shown below
>>>df.dtypes

name      object
scores     int32
level     object
rank       int32
dtype: object

>>>df[['scores','rank']]=df[['scores','rank']].astype('float64')
>>>df.dtypes
# Convert the selected columns to the same type
name       object
scores    float64
level      object
rank      float64
dtype: object

>>>df=df.astype({'scores':'int32','rank':'float32'})
>>>df.dtypes
# Use a dict to give each column its own type
name       object
scores      int32
level      object
rank      float32
dtype: object
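One case astype() does not handle directly is an integer column with missing values, because NaN has no int64 representation. A hedged sketch of one way around this, combining to_numeric() with the nullable "Int64" dtype (the sample column is made up):

```python
import pandas as pd

df = pd.DataFrame({"rank": ["1", "2", "x", "4"]})

# 'x' cannot be parsed, so it becomes NaN and the result is float64
numeric = pd.to_numeric(df["rank"], errors="coerce")

# numeric.astype("int64") would raise here because of the NaN;
# the nullable integer dtype keeps the missing value as <NA> instead
df["rank"] = numeric.astype("Int64")
print(df["rank"])
```

Note the capital "I" in "Int64": this is the pandas nullable integer type, distinct from the NumPy "int64".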

3.infer_objects()

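It attempts to infer a better dtype for object columns.
It performs a soft conversion of object-dtyped columns and leaves non-object and unconvertible columns unchanged. The inference rules are the same as those used during normal Series/DataFrame construction.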

The example below creates one column of integers and one column of integer strings:

>>> df = pd.DataFrame({'a': [7, 1, 5], 'b': ['3','2','1']}, dtype='object')
>>> df.dtypes

a    object
b    object
dtype: object

>>> df = df.infer_objects()
>>> df.dtypes

a     int64
b    object
dtype: object

infer_objects() converts column a to int64. Column b stays unchanged because its values are strings; it can be converted with astype().
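Putting the two steps together, a minimal sketch using the same frame as above:

```python
import pandas as pd

df = pd.DataFrame({"a": [7, 1, 5], "b": ["3", "2", "1"]}, dtype="object")

# Soft conversion: a holds real integers and becomes int64, b holds strings and is not made numeric
df = df.infer_objects()

# Explicit conversion for the string column
df["b"] = df["b"].astype("int64")
print(df.dtypes)
```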

Reference: Change data type of columns in Pandas
