Selecting columns by dtype in pandas
The select_dtypes() method selects a subset of columns based on their dtype.
import numpy as np
import pandas as pd
df = pd.DataFrame({'string': list('abc'),
                   'int64': list(range(1, 4)),
                   'uint8': np.arange(3, 6).astype('u1'),
                   'float64': np.arange(4.0, 7.0),
                   'bool1': [True, False, True],
                   'bool2': [False, True, False],
                   'dates': pd.date_range('now', periods=3),
                   'category': pd.Series(list("ABC")).astype('category')})
df['tdeltas'] = df.dates.diff()
df['uint64'] = np.arange(3, 6).astype('u8')
df['other_dates'] = pd.date_range('20130101', periods=3)
df['tz_aware_dates'] = pd.date_range('20130101', periods=3, tz='US/Eastern')
df
|   | string | int64 | uint8 | float64 | bool1 | bool2 | dates | category | tdeltas | uint64 | other_dates | tz_aware_dates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | a | 1 | 3 | 4.0 | True | False | 2019-12-01 22:00:58.958571 | A | NaT | 3 | 2013-01-01 | 2013-01-01 00:00:00-05:00 |
| 1 | b | 2 | 4 | 5.0 | False | True | 2019-12-02 22:00:58.958571 | B | 1 days | 4 | 2013-01-02 | 2013-01-02 00:00:00-05:00 |
| 2 | c | 3 | 5 | 6.0 | True | False | 2019-12-03 22:00:58.958571 | C | 1 days | 5 | 2013-01-03 | 2013-01-03 00:00:00-05:00 |
df.dtypes
string object
int64 int64
uint8 uint8
float64 float64
bool1 bool
bool2 bool
dates datetime64[ns]
category category
tdeltas timedelta64[ns]
uint64 uint64
other_dates datetime64[ns]
tz_aware_dates datetime64[ns, US/Eastern]
dtype: object
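select_dtypes() has two parameters, include and exclude: include lists the dtypes whose columns should be kept, and exclude lists the dtypes whose columns should be dropped.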
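As a minimal sketch of how the two parameters behave, the snippet below uses a hypothetical two-column frame (`demo`, with made-up column names `flag` and `value`). It also shows two constraints that, as far as I know, pandas enforces: at least one of the two arguments must be given, and the same dtype may not appear in both.

```python
import pandas as pd

# Hypothetical frame, used only for this illustration.
demo = pd.DataFrame({"flag": [True, False], "value": [1.5, 2.5]})

# include keeps the matching columns; exclude drops them.
only_bool = demo.select_dtypes(include=["bool"])
no_bool = demo.select_dtypes(exclude=["bool"])

# Calling the method with neither argument is rejected.
try:
    demo.select_dtypes()
    empty_call_error = None
except ValueError as exc:
    empty_call_error = exc

# The same dtype cannot be listed in both include and exclude.
try:
    demo.select_dtypes(include=["bool"], exclude=["bool"])
    overlap_error = None
except ValueError as exc:
    overlap_error = exc

print(list(only_bool.columns), list(no_bool.columns))
print(empty_call_error)
print(overlap_error)
```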
df.select_dtypes(include=[bool])
|   | bool1 | bool2 |
|---|---|---|
| 0 | True | False |
| 1 | False | True |
| 2 | True | False |
df.select_dtypes(include=['bool'])
|   | bool1 | bool2 |
|---|---|---|
| 0 | True | False |
| 1 | False | True |
| 2 | True | False |
df.select_dtypes(include=['number', 'bool'], exclude=['unsignedinteger'])
|   | int64 | float64 | bool1 | bool2 | tdeltas |
|---|---|---|---|---|---|
| 0 | 1 | 4.0 | True | False | NaT |
| 1 | 2 | 5.0 | False | True | 1 days |
| 2 | 3 | 6.0 | True | False | 1 days |
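The combined call can be reproduced on a smaller frame. The sketch below is only an illustration: the frame `demo` and its single-letter column names are hypothetical, and it assumes that the string aliases and the corresponding NumPy abstract types (`np.number`, `np.bool_`, `np.unsignedinteger`) are interchangeable here.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one column per dtype family.
demo = pd.DataFrame({
    "i": np.array([1, 2, 3], dtype="int64"),
    "u": np.array([3, 4, 5], dtype="uint8"),
    "f": np.array([4.0, 5.0, 6.0], dtype="float64"),
    "b": [True, False, True],
    "s": pd.Series(["a", "b", "c"], dtype=object),
})

# Keep numeric and boolean columns, then drop the unsigned integers.
by_name = demo.select_dtypes(include=["number", "bool"],
                             exclude=["unsignedinteger"])

# The same selection written with NumPy types instead of strings.
by_type = demo.select_dtypes(include=[np.number, np.bool_],
                             exclude=[np.unsignedinteger])

print(list(by_name.columns))
print(list(by_type.columns))
```

The column order of the result follows the order in the original frame, not the order of the dtypes in include.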
To select string columns, you must use the object dtype:
df.select_dtypes(include=['object'])
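A small sketch of the same idea on a hypothetical frame follows. It assumes the string column is stored with the object dtype, which the example forces explicitly with `dtype=object` so that it does not depend on how a given pandas version infers string data.

```python
import pandas as pd

# Hypothetical frame; the string column is stored as object on purpose.
demo = pd.DataFrame({
    "name": pd.Series(["a", "b", "c"], dtype=object),
    "count": [1, 2, 3],
})

# Only the object-dtype column is returned.
text_cols = demo.select_dtypes(include=["object"])
print(list(text_cols.columns))
```

Note that object columns can hold any Python object, so this selection is not limited to strings.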
To see all the child dtypes of a generic dtype such as numpy.number, you can define a function that returns a tree of child dtypes:
def subdtypes(dtype):
    # Recursively collect the subclasses of a NumPy scalar type.
    subs = dtype.__subclasses__()
    if not subs:
        return dtype
    return [dtype, [subdtypes(dt) for dt in subs]]
subdtypes(np.generic)
[numpy.generic,
[[numpy.number,
[[numpy.integer,
[[numpy.signedinteger,
[numpy.int8,
numpy.int16,
numpy.int32,
numpy.int32,
numpy.int64,
numpy.timedelta64]],
[numpy.unsignedinteger,
[numpy.uint8,
numpy.uint16,
numpy.uint32,
numpy.uint32,
numpy.uint64]]]],
[numpy.inexact,
[[numpy.floating,
[numpy.float16, numpy.float32, numpy.float64, numpy.float64]],
[numpy.complexfloating,
[numpy.complex64, numpy.complex128, numpy.complex128]]]]]],
[numpy.flexible,
[[numpy.character, [numpy.bytes_, numpy.str_]],
[numpy.void, [numpy.record]]]],
numpy.bool_,
numpy.datetime64,
numpy.object_]]
subdtypes(np.number)
[numpy.number,
[[numpy.integer,
[[numpy.signedinteger,
[numpy.int8,
numpy.int16,
numpy.int32,
numpy.int32,
numpy.int64,
numpy.timedelta64]],
[numpy.unsignedinteger,
[numpy.uint8, numpy.uint16, numpy.uint32, numpy.uint32, numpy.uint64]]]],
[numpy.inexact,
[[numpy.floating,
[numpy.float16, numpy.float32, numpy.float64, numpy.float64]],
[numpy.complexfloating,
[numpy.complex64, numpy.complex128, numpy.complex128]]]]]]
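When only a yes/no answer is needed rather than the whole tree, `np.issubdtype` can test whether a concrete dtype falls under a generic one. This is a complementary sketch, not part of the function above; the variable names are made up for illustration.

```python
import numpy as np

# uint8 sits under unsignedinteger, which sits under number.
uint8_is_unsigned = np.issubdtype(np.uint8, np.unsignedinteger)
uint8_is_number = np.issubdtype(np.uint8, np.number)

# float64 is inexact, not an integer type.
float64_is_inexact = np.issubdtype(np.float64, np.inexact)
float64_is_integer = np.issubdtype(np.float64, np.integer)

# bool_ is a direct child of generic, so it is not a number.
bool_is_number = np.issubdtype(np.bool_, np.number)

print(uint8_is_unsigned, uint8_is_number)
print(float64_is_inexact, float64_is_integer)
print(bool_is_number)
```

This matches the trees shown above, where numpy.bool_ appears next to numpy.number rather than beneath it, which is why 'bool' has to be listed separately from 'number' in include.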