Creating and Inserting pandas Data: Creating Categorical Data in Pandas

罗兹

Article link: https://blog.csdn.net/weixin_36006862/article/details/113010770

16. Creating Categorical Data in Pandas

The previous chapter introduced the basic meaning of Categorical Data. This chapter explains in more detail how to create and use this data type.

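It is worth restating how Categorical Data differs from categories. Categorical Data consists of two parts, categories and codes. categories is the finite set of unique categories. codes are the integers that are actually stored, one per value, and each code gives that value's position in categories.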
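The following is a minimal sketch of this relationship. It assumes a recent pandas version on Python 3 and reuses sample values from this chapter. Looking up each code in categories rebuilds the original values.

```python
import pandas as pd

# Sample values reused from this chapter
cat = pd.Categorical(["apple", "pearl", "orange", "apple", "orange"])

# categories: the finite set of unique values (sorted, since no order was given)
print(list(cat.categories))   # ['apple', 'orange', 'pearl']

# codes: one small integer per element, indexing into categories
print(cat.codes.tolist())     # [0, 2, 1, 0, 1]
print(cat.codes.dtype)        # int8 (enough for up to 127 categories)

# Looking every code up in categories rebuilds the original values
rebuilt = [cat.categories[c] for c in cat.codes]
print(rebuilt)                # ['apple', 'pearl', 'orange', 'apple', 'orange']
```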

16.1 Creating Categorical Data

Pandas offers several ways to create Categorical Data. You can convert a column of an existing DataFrame to Categorical data, or you can create Categorical data directly. Some functions also return Categorical data as their result.
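Here is a sketch of two of these routes. It assumes a recent pandas on Python 3, and the bin edges and the labels "low", "mid" and "high" are made up for illustration. When pd.cut receives a plain list, it returns a Categorical. A Series can also be created with dtype="category" directly.

```python
import pandas as pd

prices = [5.20, 3.50, 7.30, 5.00, 7.50, 7.30, 5.20, 3.70, 7.30]

# pd.cut on a plain list returns a Categorical (on a Series it returns a Series)
bins = pd.cut(prices, bins=[0, 4, 6, 8], labels=["low", "mid", "high"])
print(type(bins).__name__)   # Categorical
print(list(bins))

# A Series can be created with the category dtype from the start
s = pd.Series(["apple", "pearl", "orange"], dtype="category")
print(s.dtype)               # category
```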

1). Create with astype('category'). This converts a column of a DataFrame directly to Categorical Data.

import pandas as pd

idx = [1, 2, 3, 5, 6, 7, 9, 4, 8]
name = ["apple", "pearl", "orange", "apple", "orange", "orange", "apple", "pearl", "orange"]
price = [5.20, 3.50, 7.30, 5.00, 7.50, 7.30, 5.20, 3.70, 7.30]

# N repeats the sample data N times (N = 1 keeps the original 9 rows)
N = 1
df = pd.DataFrame({"fruit": name * N, "price": price * N}, index=idx * N)

# Convert the fruit column from ordinary strings to the category dtype
df["fruit"] = df["fruit"].astype("category")
print(df, "\n")

print("df.price.values\n", df.price.values, "\n")
print("df.fruit.values\n", df.fruit.values, "\n")

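This is the example from the previous chapter. It converts the fruit column of the DataFrame df from an ordinary (object-dtype) Series to Categorical data, that is, to the dtype category.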

    fruit  price
1   apple    5.2
2   pearl    3.5
3  orange    7.3
5   apple    5.0
6  orange    7.5
7  orange    7.3
9   apple    5.2
4   pearl    3.7
8  orange    7.3

df.price.values
[5.2 3.5 7.3 5. 7.5 7.3 5.2 3.7 7.3]

df.fruit.values
[apple, pearl, orange, apple, orange, orange, apple, pearl, orange]
Categories (3, object): [apple, orange, pearl]
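The following sketch checks the result of the conversion. It assumes a recent pandas on Python 3 and rebuilds the same df. It inspects the column dtypes and reads the categories and codes through the Series .cat accessor.

```python
import pandas as pd

idx = [1, 2, 3, 5, 6, 7, 9, 4, 8]
name = ["apple", "pearl", "orange", "apple", "orange", "orange", "apple", "pearl", "orange"]
price = [5.20, 3.50, 7.30, 5.00, 7.50, 7.30, 5.20, 3.70, 7.30]
df = pd.DataFrame({"fruit": name, "price": price}, index=idx)
df["fruit"] = df["fruit"].astype("category")

# fruit is now "category"; price stays float64
print(df.dtypes)

# The .cat accessor exposes categories and codes of a categorical Series
print(list(df["fruit"].cat.categories))   # ['apple', 'orange', 'pearl']
print(df["fruit"].cat.codes.tolist())     # [0, 2, 1, 0, 1, 1, 0, 2, 1]

# .values on a categorical column returns a pandas Categorical, not a NumPy array
print(type(df["fruit"].values).__name__)  # Categorical
```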

2). Create Categorical data directly with pandas.Categorical

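import pandas as pd

val = ["apple", "pearl", "orange", "apple", "orange"]
# Build Categorical data directly from a Python list
cat = pd.Categorical(val)
print("type is", type(cat))
print("*" * 20)
print("categorical data:\n", cat)
print("*" * 20)
# The distinct categories (sorted) and the integer code stored for each value
print(cat.categories)
print(cat.codes)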

Output:

type is
********************
categorical data:
[apple, pearl, orange, apple, orange]
Categories (3, object): [apple, orange, pearl]
********************
Index([u'apple', u'orange', u'pearl'], dtype='object')
[0 2 1 0 1]

val is a Python list, whereas cat is Categorical data. cat has two attributes. categories holds the distinct categories, and codes holds the integer code stored for each value.
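pd.Categorical also accepts an explicit categories list. The following is a minimal sketch on a recent pandas with Python 3. "kiwi" is a hypothetical value added here because it is not among the given categories. The categories keep the given order, and the value outside them becomes NaN with the reserved code -1.

```python
import pandas as pd

val = ["apple", "pearl", "orange", "apple", "kiwi"]

# Explicit categories, in a chosen (unsorted) order; "kiwi" is not listed
cat = pd.Categorical(val, categories=["pearl", "orange", "apple"])
print(cat)
print(list(cat.categories))  # ['pearl', 'orange', 'apple']
print(cat.codes.tolist())    # [2, 0, 1, 2, -1]  (-1 marks the missing "kiwi")
```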

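3). Create Categorical Data from categories and codes. categories must be unique and finite. codes are not arbitrary: they must be integers that index into categories, from 0 to len(categories) - 1, and -1 is reserved for a missing value.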

import pandas as pd

val = ["apple", "pearl", "orange", "apple", "orange"]
cat = pd.Categorical(val)
print("type is", type(cat))
print("*" * 20)
print("categorical data:\n", cat)
print("*" * 20)
print(cat.categories)
print(cat.codes)
print("*" * 20)

# A list of integers used below both as positions (take) and as codes (from_codes)
codes = [0, 1, 0, 2, 1, 0, 2, 0]
print("create categorical data:")
# take() selects elements of cat by position
print(cat.take(codes))
print(pd.Categorical.take(cat, codes))
# from_codes() maps each code to cat.categories[code]
print(pd.Categorical.from_codes(codes, cat.categories))

Output:

type is
********************
categorical data:
[apple, pearl, orange, apple, orange]
Categories (3, object): [apple, orange, pearl]
********************
Index([u'apple', u'orange', u'pearl'], dtype='object')
[0 2 1 0 1]
********************
create categorical data:
[apple, pearl, apple, orange, pearl, apple, orange, apple]
Categories (3, object): [apple, orange, pearl]
[apple, pearl, apple, orange, pearl, apple, orange, apple]
Categories (3, object): [apple, orange, pearl]
[apple, orange, apple, pearl, orange, apple, pearl, apple]
Categories (3, object): [apple, orange, pearl]

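In the program, cat is Categorical data created from the list val, and it has the attributes categories and codes. The remaining lines combine cat with the integer list codes to build new Categorical data in three ways.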

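Calling take on a Categorical instance: cat.take(codes) returns the elements of cat at the positions listed in codes. take indexes by position in cat, not by category code. For example, position 1 of cat is pearl, whereas category code 1 stands for orange.

print(cat.take(codes))

Selected data: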

[apple, pearl, apple, orange, pearl, apple, orange, apple]

Categories (3, object): [apple, orange, pearl]

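Calling take through the pd.Categorical class: here take receives two arguments, the pd.Categorical instance cat and the list of positions. This is equivalent to cat.take(codes).

print(pd.Categorical.take(cat, codes))

Result: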

[apple, pearl, apple, orange, pearl, apple, orange, apple]

Categories (3, object): [apple, orange, pearl]

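Creating Categorical data with from_codes: from_codes is a class method of pd.Categorical, although it can also be called through an instance such as cat. It takes the integer codes and the categories, and it maps each code to the category at that index.

print(pd.Categorical.from_codes(codes, cat.categories))

Result: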

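[apple, orange, apple, pearl, orange, apple, pearl, apple]

Categories (3, object): [apple, orange, pearl]

This result differs from the two take results above. from_codes reads 1 as the category orange and 2 as pearl. take reads the same numbers as positions in cat, and positions 1 and 2 hold pearl and orange.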
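You can check the difference between take and from_codes directly. The sketch below assumes a recent pandas on Python 3 and reuses cat and codes from the program above. It also shows that from_codes accepts -1 as a missing value but rejects any other code outside 0..len(categories)-1.

```python
import pandas as pd

cat = pd.Categorical(["apple", "pearl", "orange", "apple", "orange"])
codes = [0, 1, 0, 2, 1, 0, 2, 0]

# take(): each integer is a position in cat (position 1 is "pearl")
by_position = cat.take(codes)
# from_codes(): each integer indexes into categories ['apple', 'orange', 'pearl']
by_code = pd.Categorical.from_codes(codes, categories=cat.categories)

print(list(by_position))  # ['apple', 'pearl', 'apple', 'orange', 'pearl', 'apple', 'orange', 'apple']
print(list(by_code))      # ['apple', 'orange', 'apple', 'pearl', 'orange', 'apple', 'pearl', 'apple']

# -1 is the only code allowed outside 0..len(categories)-1; it means "missing"
with_missing = pd.Categorical.from_codes([0, -1, 2], categories=cat.categories)
print(with_missing.isna().tolist())  # [False, True, False]

# Any other out-of-range code is rejected with a ValueError
try:
    pd.Categorical.from_codes([3], categories=cat.categories)
except ValueError as err:
    print("ValueError:", err)
```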

16.2 Inserting Categorical Data into a DataFrame

Categorical data created with pandas.Categorical can be inserted into a DataFrame as a new column.

import pandas as pd

idx = [1, 2, 3, 5, 6, 7, 9, 4, 8]
fruit = ["apple", "pearl", "orange", "apple", "orange", "orange", "apple", "pearl", "orange"]
price = [5.20, 3.50, 7.30, 5.00, 7.50, 7.30, 5.20, 3.70, 7.30]

df = pd.DataFrame({"price": price}, index=idx)
print(df)

# Build Categorical data and insert it as a new column
# (an array-like is assigned by position, so its length must equal len(df))
cat = pd.Categorical(fruit)
df["fruit"] = cat
print(df)
print(cat.codes)
print(cat.categories)

Output:

   price
1    5.2
2    3.5
3    7.3
5    5.0
6    7.5
7    7.3
9    5.2
4    3.7
8    7.3

   price   fruit
1    5.2   apple
2    3.5   pearl
3    7.3  orange
5    5.0   apple
6    7.5  orange
7    7.3  orange
9    5.2   apple
4    3.7   pearl
8    7.3  orange

[0 2 1 0 1 1 0 2 1]
Index([u'apple', u'orange', u'pearl'], dtype='object')

You can also create the DataFrame with an ordinary column first and then convert that column with astype('category').
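The following sketch compares the two routes on the data from this section. It assumes a recent pandas on Python 3, and df1 and df2 are illustrative names. Both routes produce a category column, and the resulting frames compare equal.

```python
import pandas as pd

idx = [1, 2, 3, 5, 6, 7, 9, 4, 8]
fruit = ["apple", "pearl", "orange", "apple", "orange", "orange", "apple", "pearl", "orange"]
price = [5.20, 3.50, 7.30, 5.00, 7.50, 7.30, 5.20, 3.70, 7.30]

# Route 1: assign a pd.Categorical (assigned by position, not aligned on the index)
df1 = pd.DataFrame({"price": price}, index=idx)
df1["fruit"] = pd.Categorical(fruit)

# Route 2: add the column as plain strings, then convert with astype("category")
df2 = pd.DataFrame({"price": price}, index=idx)
df2["fruit"] = fruit
df2["fruit"] = df2["fruit"].astype("category")

print(df1["fruit"].dtype, df2["fruit"].dtype)  # category category
print(df1.equals(df2))                         # True
```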
