Data Structures in Pandas: Series and DataFrame (数据酷客 Study Notes)

你今天学习了嘛

Tags: python

Article link: https://blog.csdn.net/arrogantt/article/details/105755426

Creating a Series object

● The basic form is:
pd.Series(data, index=index)
● data can be one of many types: a list, an ndarray, a Python dict, or a scalar value

Creating from a list

import pandas as pd
a = pd.Series([150,62,31,20])
print(a)
print(a.values)
print(a.index)

0 150
1 62
2 31
3 20
dtype: int64
[150 62 31 20]
RangeIndex(start=0, stop=4, step=1)

Creating from an array (ndarray)

import numpy as np
import pandas as pd
pd.Series(np.random.randn(4),index=['a','b','c','d']) # the index must have the same length as data; a mismatch raises ValueError

a -0.333946
b 0.499868
c -0.356603
d -1.138034
dtype: float64

Creating from a dict

pd.Series({'a':1,'b':2,'c':3,'d':4})

a 1
b 2
c 3
d 4
dtype: int64
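
The dict case is the one where an explicit index can introduce missing values. The following is a minimal sketch, with illustrative variable names that are not from the original notes: labels not present in the dict become NaN, whereas list data with a mismatched index length raises an error.

```python
import pandas as pd

data = {'a': 1, 'b': 2, 'c': 3}

# With dict data, index selects and reorders by label.
# 'z' is not a key of the dict, so its value becomes NaN (and the dtype becomes float64).
s = pd.Series(data, index=['c', 'a', 'z'])
print(s)

# With list data, a length mismatch is an error rather than a NaN fill.
raised = False
try:
    pd.Series([1, 2, 3], index=['a', 'b'])
except ValueError as exc:
    raised = True
    print('ValueError:', exc)
```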

Creating a DataFrame object

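● The basic form is:
pd.DataFrame(data, index, columns)
● Unlike a Series, a DataFrame has both a row index (index) and column headers (columns)
● data accepts the following formats:
》A dict of lists, dicts, or Series
》A two-dimensional array
》A Series object
》Another DataFrame object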
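
The input formats listed above can be sketched as follows. The variable names and sample values are illustrative assumptions, not part of the original notes.

```python
import numpy as np
import pandas as pd

labels = ['a', 'b', 'c']

# A dict of lists: the keys become the column names.
df_from_dict = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]}, index=labels)

# A two-dimensional array: index and columns are passed explicitly.
df_from_array = pd.DataFrame(np.arange(6).reshape(3, 2), index=labels, columns=['x', 'y'])

# A single Series: the result has one column, named after the Series.
df_from_series = pd.DataFrame(pd.Series([1, 2, 3], index=labels, name='x'))

# Another DataFrame: the result has the same labels and values.
df_from_df = pd.DataFrame(df_from_dict)

for frame in (df_from_dict, df_from_array, df_from_series, df_from_df):
    print(frame, end='\n\n')
```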

# Creating a Series from a scalar value: the value is repeated for every index label
pd.Series(2,index=['a','b','c'])

a    2
b    2
c    2
dtype: int64

# In arithmetic operations, a Series automatically aligns data by index label, not by position.

a1 = pd.Series([1,2,3,4],index=['a','b','c','d'])

a2 = pd.Series([4,3,2,1],index=['d','c','b','a'])

print(a1+a2)

a    2
b    4
c    6
d    8
dtype: int64
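
In the example above both Series share the same labels. As a small sketch with made-up values, here is what alignment does when the labels only partly overlap: the result covers the union of the labels, and positions present in only one operand become NaN unless a fill value is supplied.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# 'a' and 'd' each exist in only one Series, so their sums are NaN.
total = s1 + s2
print(total)

# Series.add with fill_value treats a missing label as 0 instead.
filled = s1.add(s2, fill_value=0)
print(filled)
```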

# Use unique() to remove duplicates, value_counts() to count how often each value occurs, and astype() to convert the dtype
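
A short sketch of these three methods on a sample Series (the data is invented for illustration):

```python
import pandas as pd

s = pd.Series([3, 1, 3, 2, 1, 3])

# unique() returns the distinct values in order of first appearance.
uniq = s.unique()
print(uniq)

# value_counts() returns a Series indexed by value, sorted by count in descending order.
counts = s.value_counts()
print(counts)

# astype() returns a new Series with the requested dtype; s itself is unchanged.
as_float = s.astype('float64')
print(as_float.dtype)
```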

b = {'one':pd.Series([1,2,3],index=['a','b','c']),'two':pd.Series([1,2,4],index=['a','b','d'])}

m = pd.DataFrame(b)

print(m)

print(m.index)

print(m.columns)

   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  NaN
d  NaN  4.0
Index(['a', 'b', 'c', 'd'], dtype='object')
Index(['one', 'two'], dtype='object')

m['three']=m['one']+m['two']

m['f'] = m['one']>1

print(m)

   one  two  three      f
a  1.0  1.0    2.0  False
b  2.0  2.0    4.0   True
c  3.0  NaN    NaN   True
d  NaN  4.0    NaN  False

m.insert(1,'bar',m['one'])

m

	one 	bar 	two 	three 	f
a 	1.0 	1.0 	1.0 	2.0 	False
b 	2.0 	2.0 	2.0 	4.0 	True
c 	3.0 	3.0 	NaN 	NaN 	True
d 	NaN 	NaN 	4.0 	NaN 	False

del m['f']

m

	one 	bar 	two 	three
a 	1.0 	1.0 	1.0 	2.0
b 	2.0 	2.0 	2.0 	4.0
c 	3.0 	3.0 	NaN 	NaN
d 	NaN 	NaN 	4.0 	NaN

bar = m.pop('bar')

bar

a    1.0
b    2.0
c    3.0
d    NaN
Name: bar, dtype: float64

Indexing

# Select a row/column by label

b = {'one':pd.Series([1,2,3],index=['a','b','c']),'two':pd.Series([1,2,4],index=['a','b','d'])}

m = pd.DataFrame(b)

m

	one 	two
a 	1.0 	1.0
b 	2.0 	2.0
c 	3.0 	NaN
d 	NaN 	4.0

m.loc['a']

one    1.0
two    1.0
Name: a, dtype: float64

m.loc['a','one']

1.0
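# df.iloc[loc]: select a row/column by position (given as an integer); the letter "i" stands for "integer". With this accessor, always index with integers, lists of integers, or integer slices.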
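
A minimal sketch of position-based selection, reusing the same DataFrame m as in the label-based examples. The result variable names are illustrative.

```python
import pandas as pd

b = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 4], index=['a', 'b', 'd'])}
m = pd.DataFrame(b)

# First row by position; equivalent to m.loc['a'].
first_row = m.iloc[0]

# Single cell: row position 0, column position 1 (column 'two').
cell = m.iloc[0, 1]

# A list of integers selects several rows (here 'a' and 'c').
rows = m.iloc[[0, 2]]

# An integer slice excludes its end point: rows at positions 1 and 2, column 0.
block = m.iloc[1:3, 0]

print(first_row, cell, rows, block, sep='\n\n')
```

Note that slices behave differently between the two accessors: iloc slices exclude the end position, while loc slices include the end label.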

Panel

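● A Panel is a three-dimensional data container.
● The Panel data structure borrows from the panel data structure used in econometrics. A Panel object consists of three axes:
● items - axis 0: each item corresponds to one DataFrame contained inside.
● major_axis - axis 1: the index (rows) of each DataFrame.
● minor_axis - axis 2: the columns of each DataFrame.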

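Creation: pd.Panel() (note: Panel was deprecated in pandas 0.20 and removed in 0.25, so this call only works on older versions)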
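
Because Panel is not available in current pandas releases, the three-axis layout described above cannot be built with it directly. One possible substitute, sketched here under the assumption that a DataFrame with a two-level row index is acceptable, is to stack a dict of DataFrames with pd.concat. The names frames, item1, item2, and the sample values are hypothetical.

```python
import pandas as pd

# Each dict key plays the role of one entry on the items axis.
frames = {
    'item1': pd.DataFrame({'x': [1, 2], 'y': [3, 4]}, index=['r1', 'r2']),
    'item2': pd.DataFrame({'x': [5, 6], 'y': [7, 8]}, index=['r1', 'r2']),
}

# The dict keys become the outer level of a row MultiIndex;
# the original row labels (major_axis) become the inner level.
stacked = pd.concat(frames, names=['item', 'row'])
print(stacked)

# Selecting one item returns an ordinary DataFrame.
one_item = stacked.loc['item2']
print(one_item)

# A single value is addressed by (item, row) and column.
value = stacked.loc[('item1', 'r2'), 'y']
print(value)
```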
