getting_started_0

涛涛北京

Article link: https://blog.csdn.net/weixin_43522964/article/details/106464816

import numpy as np
import pandas as pd

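A Series is similar to a one-dimensional array, but with an added index of labels.

1. Create from an ndarray

s = pd.Series(data=np.random.randint(0,5,size=3), index=['a', 'b', 'c'], dtype=int)  # index defaults to 0, 1, 2, ...; a custom index must have the same length as data
s  # duplicate index labels are allowed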

a    1
b    1
c    3
dtype: int32
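The two comments above (equal length, duplicate labels allowed) can be checked with a small sketch. It uses fixed values instead of random ones so the result is reproducible; the names `values`, `dup`, and `length_error` are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

values = np.array([1, 1, 3])

# Duplicate labels are allowed in a Series index.
dup = pd.Series(values, index=['a', 'b', 'b'])
print(dup['b'])  # selecting a duplicated label returns a Series with both rows

# A custom index must have the same length as the data.
try:
    pd.Series(values, index=['a', 'b'])
    length_error = None
except ValueError as exc:
    length_error = exc
print(type(length_error).__name__)
```

Selecting a duplicated label returns a Series rather than a scalar, so code that expects a single value should make sure the index is unique first.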

2. Create from a dict

data_dict = {'world': 2, 'hello': 1}

s2 = pd.Series(data_dict)  # keeps insertion order; older versions may sort the keys alphabetically
s2

world    2
hello    1
dtype: int64

s2 = pd.Series(data_dict, index=['a', 'hello', 'world'])  # with an explicit index, values are looked up by label in index order; labels missing from the dict become NaN
s2

a        NaN
hello    1.0
world    2.0
dtype: float64
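The dtype change from int64 to float64 in the output above follows from the missing label. A minimal sketch, assuming a reasonably recent pandas; the names `by_insertion` and `by_labels` are illustrative.

```python
import pandas as pd

data_dict = {'world': 2, 'hello': 1}

by_insertion = pd.Series(data_dict)
by_labels = pd.Series(data_dict, index=['a', 'hello', 'world'])

print(by_insertion.dtype)      # integer dtype: nothing is missing
print(by_labels.dtype)         # float64: NaN cannot be stored in a plain integer Series
print(by_labels.isna().sum())  # number of labels that were not found in the dict
```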

3. Create from a scalar

s3 = pd.Series(5, index=['a', 'b', 'c'])  # the scalar is repeated to match the length of index
s3

a    5
b    5
c    5
dtype: int64

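4. Array-like operations

s.iloc[0]  # positional indexing; returns a scalar of the Series dtype (int32 here). Plain s[0] on a label index is deprecated in recent pandas

s[0:3]  # slicing returns a Series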

a    1
b    3
b    0
dtype: int32

s.iloc[[2, 0]]  # select by a list of positions (plain s[[2, 0]] relies on the deprecated positional fallback)

b    0
a    1
dtype: int32

np.exp(s)

a     2.718282
b    20.085537
b     1.000000
dtype: float64

s.array  # useful when the index is not needed

<PandasArray>
[1, 3, 0]
Length: 3, dtype: int32

s.to_numpy()  # returns an actual NumPy ndarray

array([1, 3, 0])
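The array-like operations above mix positions and labels, so a short sketch that separates the two may help. It assumes a Series with unique labels and fixed values; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 0], index=['a', 'b', 'c'])

first_by_position = s.iloc[0]   # position 0
first_by_label = s.loc['a']     # label 'a'
picked = s.iloc[[2, 0]]         # list of positions, order preserved
as_array = s.to_numpy()         # plain NumPy ndarray, index dropped

print(first_by_position, first_by_label)
print(picked)
print(type(as_array).__name__)
```

Using `.iloc` for positions and `.loc` for labels keeps the intent explicit, which matters most when the index itself contains integers.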

5. Dict-like operations

s2

a        NaN
hello    1.0
world    2.0
dtype: float64

s2['hello']

1.0

for i in s2.index:
    print(s2[i])

nan
1.0
2.0

'hello' in s2

True

s2.get('haha', np.asarray(0))  # the second argument is the default returned when the label is missing

array(0)
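One detail of the dict-like behaviour is easy to miss: `in` tests the labels, not the values, just as it tests keys on a dict. A minimal sketch with fixed data:

```python
import pandas as pd

s2 = pd.Series({'hello': 1.0, 'world': 2.0})

print('hello' in s2)          # 'hello' is a label
print(1.0 in s2)              # membership does not look at the values
print(1.0 in s2.to_numpy())   # test values on the underlying array instead
print(s2.get('haha'))         # None when no default is supplied
print(s2.get('haha', 0))      # the supplied default
```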

6. NumPy-style operations

s + s

a    2
b    2
c    6
dtype: int32

s

a    1
b    1
c    3
dtype: int32

s[1:] + s[:-1]  # operands are aligned by label; a label present on only one side gives NaN

a    NaN
b    2.0
c    NaN
dtype: float64

(s[1:] + s[:-1]).dropna()

b    2.0
dtype: float64
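Dropping the NaN rows is one way to handle labels that exist on only one side. Another option, sketched here with fixed values, is `Series.add` with `fill_value`, which treats the missing side as a given number. The names `aligned` and `filled` are illustrative.

```python
import pandas as pd

s = pd.Series([1, 1, 3], index=['a', 'b', 'c'])

aligned = s[1:] + s[:-1]                   # NaN where a label is missing on one side
filled = s[1:].add(s[:-1], fill_value=0)   # treat the missing side as 0 instead

print(aligned)
print(filled)
```

Which of the two is appropriate depends on whether a missing label should mean "unknown" or "zero" for the data at hand.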

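DataFrame: can be thought of as a SQL table, or as a dict of Series. It can be created from the following:

1. One-dimensional arrays, lists, or Series

2. Two-dimensional lists

Create from a dict

d = {"name":['jack', 'Tom'], "Age":[19, 20]}  # create from a dict; each key becomes a column

df1 = pd.DataFrame(d)  # an index can also be specified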

df1

	name	Age
0	jack	19
1	Tom	20

df1.Age

0    19
1    20
Name: Age, dtype: int64

Create from a dict of Series

d2 = {'name':pd.Series(data=['jack', 'Tom'], index=['a', 'b']),  # a Series always has an index; if none is given it defaults to 0, 1, 2, ...
      'Age':pd.Series(data=[19, 16], index=['a', 'b'])}

df2 = pd.DataFrame(d2)

df2

	name	Age
a	jack	19
b	Tom	16

When the Series have different indexes, the resulting DataFrame uses the union of those indexes.

d3 = {'name':pd.Series(data=['jack', 'Tom'], index=['a', 'b']),
      'Age':pd.Series(data=[19, 16])}
df3 = pd.DataFrame(d3)

df3

	name	Age
a	jack	NaN
b	Tom	NaN
0	NaN	19.0
1	NaN	16.0
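The union behaviour above is usually a surprise rather than the goal. The sketch below reproduces it with fixed data and then shows one possible way to line the rows up, by rebuilding the second Series on the first one's labels. The names `names`, `ages`, and `ages_relabelled` are illustrative.

```python
import pandas as pd

names = pd.Series(['jack', 'Tom'], index=['a', 'b'])
ages = pd.Series([19, 16])  # default index 0, 1

mismatched = pd.DataFrame({'name': names, 'Age': ages})
print(mismatched.shape)  # four rows: the union of both indexes

# One possible fix: reuse the labels of the first Series for the second.
ages_relabelled = pd.Series(ages.to_numpy(), index=names.index)
matched = pd.DataFrame({'name': names, 'Age': ages_relabelled})
print(matched)
```

This relabelling is only valid when the two Series are known to be in the same row order.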

pd.DataFrame(d2, index=['a', 'd'], columns=['name', 'M/F'])  # index labels found in the Series are used directly, others give rows of NaN; columns work the same way (a missing column is created empty)

	name	M/F
a	jack	NaN
d	NaN	NaN

Create from a list of dicts

data = [{'a':1, 'b':2}, {'a':10, 'b':15, 'c':20}]  # keys are still columns; each dict is one row

pd.DataFrame(data)

	a	b	c
0	1	2	NaN
1	10	15	20.0

pd.DataFrame(data, columns=['a', 'b'])

	a	b
0	1	2
1	10	15

Dict keys normally become columns, but calling pd.DataFrame.from_dict() with orient='index' makes the keys the row index instead.

By default, orient is 'columns'.

pd.DataFrame.from_dict({'a':[1,2,3], 'b':[4,5,6]}, orient='index', columns=['first', 'second', 'third'])

	first	second	third
a	1	2	3
b	4	5	6

pd.DataFrame.from_dict({'a':[1,2,3], 'b':[4,5,6]})

	a	b
0	1	4
1	2	5
2	3	6
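The effect of `orient` can be seen by building both layouts from the same dict and comparing shapes and labels. A small sketch; `raw`, `by_columns`, and `by_index` are illustrative names.

```python
import pandas as pd

raw = {'a': [1, 2, 3], 'b': [4, 5, 6]}

by_columns = pd.DataFrame.from_dict(raw)                 # keys become columns
by_index = pd.DataFrame.from_dict(raw, orient='index')   # keys become row labels

print(by_columns.shape, list(by_columns.columns))
print(by_index.shape, list(by_index.index))
```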

Adding, deleting, modifying, and querying columns

df4=pd.DataFrame.from_dict({'a':[1,2,3], 'b':[4,5,6]}, orient='index', columns=['first', 'second', 'third'])

df4

	first	second	third
a	1	2	3
b	4	5	6

Query

df4['first']  # returns a Series

a    1
b    4
Name: first, dtype: int64

df4.query('first > 2').query('second < 15')  # filter rows by condition

	first	second	third
b	4	5	6
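The chained `query` calls can also be written as a single expression or as a boolean mask. The sketch below rebuilds the same frame and checks that the two forms select the same rows; `via_query` and `via_mask` are illustrative names.

```python
import pandas as pd

df4 = pd.DataFrame.from_dict(
    {'a': [1, 2, 3], 'b': [4, 5, 6]},
    orient='index',
    columns=['first', 'second', 'third'],
)

via_query = df4.query('first > 2 and second < 15')
via_mask = df4[(df4['first'] > 2) & (df4['second'] < 15)]

print(via_query)
print(via_query.equals(via_mask))
```

The parentheses around each comparison in the mask form are required, because `&` binds more tightly than `>` and `<` in Python.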

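Modify

df4['third'] = df4['first'] * df4['second']
df4.iloc[0, 2] = 2  # set one cell by position (row 0, column 2) to 2; df4[1, 2] = 2 would instead add a column labelled (1, 2)
df4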

	first	second	third
a	1	2	2
b	4	5	20

Add

df4['flag'] = df4['first']>1
df4

	first	second	third	flag
a	1	2	2	False
b	4	5	20	True

df4['forth'] = 4
df4

	first	second	third	flag	forth
a	1	2	2	False	4
b	4	5	20	True	4

df4.insert(0, 'haha', df4['first'])  # insert at a given column position
df4

	haha	first	second	third	flag	forth
a	1	1	2	2	False	4
b	4	4	5	20	True	4

del df4['haha']
del df4['flag']
df4

	first	second	third	forth
a	1	2	2	4
b	4	5	20	4

assign adds new columns

df4.assign(ratio=lambda x: x['first'] / x['second'])  # assign returns a new DataFrame (a copy); the original is left unchanged

	first	second	third	forth	ratio
a	1	2	2	4	0.5
b	4	5	20	4	0.8

df4

	first	second	third	forth
a	1	2	2	4
b	4	5	20	4

df4.assign(sum2=df4['first'] + df4['second'])

	first	second	third	forth	sum2
a	1	2	2	4	3
b	4	5	20	4	9
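Because `assign` accepts callables, a later keyword can refer to a column created by an earlier one in the same call. A minimal sketch, assuming a reasonably recent pandas; the column names `total` and `share` are made up for illustration.

```python
import pandas as pd

df4 = pd.DataFrame({'first': [1, 4], 'second': [2, 5]}, index=['a', 'b'])

result = df4.assign(
    total=lambda x: x['first'] + x['second'],
    share=lambda x: x['first'] / x['total'],  # refers to the column created just above
)

print(result)
print(list(df4.columns))  # the original frame is unchanged
```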

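Delete

# del df4['forth'] would remove the column in place (as with 'haha' and 'flag' above);
# pop also removes it in place, and returns the removed column as a Series
df4.pop('forth')

a    4
b    4
Name: forth, dtype: int64

df4 = pd.DataFrame.from_dict({'a':[1,2,3], 'b':[4,5,6]}, orient='index', columns=['first', 'second', 'third'])  # rebuild the original frame for the examples that follow
df4

	first	second	third
a	1	2	3
b	4	5	6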

df4.loc['b'].to_numpy()  # select a row by label

array([4, 5, 6], dtype=int64)

df4.iloc[:2, :]  # slice rows and columns by position (a rectangular block)

	first	second	third
a	1	2	3
b	4	5	6
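`loc` and `iloc` differ in how slices treat the end point, which the two examples above do not show. A short sketch on the same frame; `by_label` and `by_position` are illustrative names.

```python
import pandas as pd

df4 = pd.DataFrame.from_dict(
    {'a': [1, 2, 3], 'b': [4, 5, 6]},
    orient='index',
    columns=['first', 'second', 'third'],
)

by_label = df4.loc['a':'b', 'first':'second']  # label slices include both end points
by_position = df4.iloc[0:1, 0:2]               # position slices exclude the stop

print(by_label)
print(by_position)
```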

DataFrame arithmetic

df4+1

	first	second	third
a	2	3	4
b	5	6	7

df4.sub(df4['first'], axis=0)  # subtract the 'first' column from every column, matching on the row index (axis=0)

	first	second	third
a	0	1	2
b	0	1	2

np.sqrt(df4)  # element-wise NumPy functions such as log, sqrt, and exp can be applied directly

	first	second	third
a	1.0	1.414214	1.732051
b	2.0	2.236068	2.449490
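The `axis=0` argument in the subtraction above matters. With the plain `-` operator, a Series is matched against the column labels of the DataFrame, so subtracting a column whose index holds the row labels aligns nothing. A sketch on the same frame; `row_wise` and `default` are illustrative names.

```python
import pandas as pd

df4 = pd.DataFrame.from_dict(
    {'a': [1, 2, 3], 'b': [4, 5, 6]},
    orient='index',
    columns=['first', 'second', 'third'],
)

row_wise = df4.sub(df4['first'], axis=0)  # Series index matched against the row labels
default = df4 - df4['first']              # Series index matched against the column labels

print(row_wise)
print(default)  # no label matches, so every cell is NaN
```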

Getting information about the table

df4.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, a to b
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   first   2 non-null      int64
 1   second  2 non-null      int64
 2   third   2 non-null      int64
dtypes: int64(3)
memory usage: 144.0+ bytes

print(df4.to_string())

   first  second  third
a      1       2      3
b      4       5      6

Accessing columns as attributes

df6 = pd.DataFrame({'foo1': np.random.randn(5),'foo2': np.random.randn(5)})

df6

	foo1	foo2
0	-0.663151	-1.037235
1	0.492857	-0.012955
2	0.775553	0.736301
3	0.533148	-2.451446
4	-0.748240	-0.465478

df6.foo1

0   -0.663151
1    0.492857
2    0.775553
3    0.533148
4   -0.748240
Name: foo1, dtype: float64
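Attribute access is convenient, but it only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method. A small sketch with made-up column names (`count`, `my col`) chosen to show both cases:

```python
import pandas as pd

df = pd.DataFrame({'foo1': [1.0, 2.0], 'count': [10, 20], 'my col': [0, 1]})

print(df.foo1.tolist())        # works: valid identifier, no clash
print(callable(df.count))      # df.count is the DataFrame method, not the column
print(df['count'].tolist())    # bracket access always reaches the column
print(df['my col'].tolist())   # names with spaces need brackets too
```

Bracket access is the safer default in code that handles arbitrary column names.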
