python dtype o,什么是dtype('O')?

I have a dataframe in pandas and I'm trying to figure out what the types of its values are. I am unsure what the type is of column 'Test'. However, when I run myFrame['Test'].dtype, I get;

dtype('O')

What does this mean?

解决方案

When you see dtype('O') inside dataframe this means Pandas string.

What is dtype?

Something that belongs to pandas or numpy, or both, or something else? If we examine pandas code:

df = pd.DataFrame({'float': [1.0],

'int': [1],

'datetime': [pd.Timestamp('20180310')],

'string': ['foo']})

print(df)

print(df['float'].dtype,df['int'].dtype,df['datetime'].dtype,df['string'].dtype)

df['string'].dtype

It will output like this:

float int datetime string

0 1.0 1 2018-03-10 foo

---

float64 int64 datetime64[ns] object

---

dtype('O')

You can interpret the last as Pandas dtype('O') or Pandas object which is Python type string, and this corresponds to Numpy string_, or unicode_ types.

Pandas dtype Python type NumPy type Usage

object str string_, unicode_ Text

Like Don Quixote is on ass, Pandas is on Numpy and Numpy understand the underlying architecture of your system and uses the class numpy.dtype for that.

Data type object is an instance of numpy.dtype class that understand the data type more precise including:

Type of the data (integer, float, Python object, etc.)

Size of the data (how many bytes is in e.g. the integer)

Byte order of the data (little-endian or big-endian)

If the data type is structured, an aggregate of other data types, (e.g., describing an array item consisting of an integer and a float)

What are the names of the "fields" of the structure

What is the data-type of each field

Which part of the memory block each field takes

If the data type is a sub-array, what is its shape and data type

In the context of this question dtype belongs to both pands and numpy and in particular dtype('O') means we expect the string.

Here is some code for testing with explanation:

If we have the dataset as dictionary

import pandas as pd

import numpy as np

from pandas import Timestamp

data={'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}, 'date': {0: Timestamp('2018-12-12 00:00:00'), 1: Timestamp('2018-12-12 00:00:00'), 2: Timestamp('2018-12-12 00:00:00'), 3: Timestamp('2018-12-12 00:00:00'), 4: Timestamp('2018-12-12 00:00:00')}, 'role': {0: 'Support', 1: 'Marketing', 2: 'Business Development', 3: 'Sales', 4: 'Engineering'}, 'num': {0: 123, 1: 234, 2: 345, 3: 456, 4: 567}, 'fnum': {0: 3.14, 1: 2.14, 2: -0.14, 3: 41.3, 4: 3.14}}

df = pd.DataFrame.from_dict(data) #now we have a dataframe

print(df)

print(df.dtypes)

The last lines will examine the dataframe and note the output:

id date role num fnum

0 1 2018-12-12 Support 123 3.14

1 2 2018-12-12 Marketing 234 2.14

2 3 2018-12-12 Business Development 345 -0.14

3 4 2018-12-12 Sales 456 41.30

4 5 2018-12-12 Engineering 567 3.14

id int64

date datetime64[ns]

role object

num int64

fnum float64

dtype: object

All kind of different dtypes

df.iloc[1,:] = np.nan

df.iloc[2,:] = None

But if we try to set np.nan or None this will not affect the original column dtype. The output will be like this:

print(df)

print(df.dtypes)

id date role num fnum

0 1.0 2018-12-12 Support 123.0 3.14

1 NaN NaT NaN NaN NaN

2 NaN NaT None NaN NaN

3 4.0 2018-12-12 Sales 456.0 41.30

4 5.0 2018-12-12 Engineering 567.0 3.14

id float64

date datetime64[ns]

role object

num float64

fnum float64

dtype: object

So np.nan or None will not change the columns dtype, unless we set the all column rows to np.nan or None. In that case column will become float64 or object respectively.

You may try also setting single rows:

df.iloc[3,:] = 0 # will convert datetime to object only

df.iloc[4,:] = '' # will convert all columns to object

And to note here, if we set string inside a non string column it will become string or object dtype.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值