python pandas

心如熊猫

Article link: https://blog.csdn.net/weixin_45427650/article/details/105297923

Series and DataFrame

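Structure

1. Series

from pandas import Series, DataFrame   # names used throughout this post

a = Series([1,5,3,2,6])   # create a Series; the index is on the left, the values on the right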

0    1
1    5
2    3
3    2
4    6
dtype: int64


--------------------------------
a = Series([1,5,3,2,6],index =['a','b','c','d','e'])   # specify the index

a    1
b    5
c    3
d    2
e    6
dtype: int64

a.index    # view the index
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
a.values   # view the values
array([1, 5, 3, 2, 6], dtype=int64)
--------------------------------
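a['d']     # each index label maps to one value
2
a.iloc[2]  # select by position; plain a[2] on a labelled index is deprecated in recent pandas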
3
--------------------------------
Series computations

a[a<6]
a    1
b    5
c    3
d    2
dtype: int64

a*2

a     2
b    10
c     6
d     4
e    12
dtype: int64

-------------------------
w = {
    '张三':92,
    '李四':89,
    '王五':78,
    '赵六':67
}

t = Series(w)   # create a Series from a dict; the keys become the index
张三    92
李四    89
王五    78
赵六    67
dtype: int64
----------------------
t = Series(w)
t.index.name='nname'   # name the index to make the Series more readable

nname
张三    92
李四    89
王五    78
赵六    67
dtype: int64
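
As a recap of the Series operations above, the following sketch puts them in one runnable script. It uses the pd alias instead of the bare Series name, and the variable names by_label, by_position, small and doubled are only illustrative.

```python
import pandas as pd

# Same values and labels as the examples above
a = pd.Series([1, 5, 3, 2, 6], index=['a', 'b', 'c', 'd', 'e'])

by_label = a.loc['d']      # label-based lookup
by_position = a.iloc[2]    # position-based lookup
small = a[a < 6]           # a boolean mask keeps only the matching entries
doubled = a * 2            # arithmetic is applied element by element

# A dict becomes a Series whose index is the dict's keys
t = pd.Series({'张三': 92, '李四': 89, '王五': 78, '赵六': 67})
t.index.name = 'nname'

print(by_label, by_position)
print(small)
print(doubled)
print(t)
```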

2. DataFrame

q = {
    'name':['张三','李四','王五','小明'],
    'sex':['89','56','78','64'],
    'city':['北京','上海','天津','南京'],
    'year':['2001','2016','2019','2016']
}
i = DataFrame(q)     # the data has both a row index and a column index

	name	sex	city	year
0	张三	89	北京	2001
1	李四	56	上海	2016
2	王五	78	天津	2019
3	小明	64	南京	2016
--------------------------------
i = DataFrame(q,columns=['name','city','sex','year'])   # the columns argument sets the column order
	name	city	sex	year
0	张三	北京	89	2001
1	李四	上海	56	2016
2	王五	天津	78	2019
3	小明	南京	64	2016

--------------------------------
e = {
    'sex':{'张三':'89','李四':'78','王五':'65'},
    'city':{'张三':'北京','李四':'上海','王五':'天津'}
}
s = DataFrame(e)    # a nested dict also works: outer keys become columns, inner keys become the row index

	sex	city
张三	89	北京
李四	78	上海
王五	65	天津
------------------------------
s.values           # convert the DataFrame to a two-dimensional array
array([['89', '北京'],
       ['78', '上海'],
       ['65', '天津']], dtype=object)

Index objects cannot be modified in place

s.columns[0]='erer'   # assigning to one element of the columns Index raises an error
    
TypeError                                 Traceback (most recent call last)
<ipython-input-71-ee15e91da0ed> in <module>
----> 1 s.columns[0]='erer'

G:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in __setitem__(self, key, value)
   3908 
   3909     def __setitem__(self, key, value):
-> 3910         raise TypeError("Index does not support mutable operations")
   3911 
   3912     def __getitem__(self, key):

TypeError: Index does not support mutable operations
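
To make the point above concrete, here is a minimal sketch that triggers the same TypeError and then renames the column with DataFrame.rename, which returns a new DataFrame instead of mutating the Index. The variable names raised and renamed are illustrative and not part of the original post.

```python
import pandas as pd

s = pd.DataFrame({
    'sex': {'张三': '89', '李四': '78', '王五': '65'},
    'city': {'张三': '北京', '李四': '上海', '王五': '天津'},
})

# Item assignment on an Index is rejected
try:
    s.columns[0] = 'erer'
    raised = False
except TypeError:
    raised = True

# rename builds a new DataFrame with a new columns Index; s is unchanged
renamed = s.rename(columns={'sex': 'erer'})
print(raised)
print(renamed)
```

Replacing the whole Index at once (s.columns = [...]) is also allowed, because that swaps in a new Index object instead of editing the existing one.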

pandas index operations

This section mainly covers DataFrame
set_index: sets the row index; a column of data can also be used as the row index.

q = {
    'name':['张三','李四','王五','小明'],
    'sex':['89','56','78','64'],
    'city':['北京','上海','天津','南京'],
    'year':['2001','2016','2019','2016']
}
i = DataFrame(q)
i.set_index('name')

	sex	city	year
name			
张三	89	北京	2001
李四	56	上海	2016
王五	78	天津	2019
小明	64	南京	2016
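
One detail worth checking is that set_index returns a new DataFrame and leaves the original untouched. The sketch below shows this with the same data; the name by_name is an assumption made for the example.

```python
import pandas as pd

q = {
    'name': ['张三', '李四', '王五', '小明'],
    'sex': ['89', '56', '78', '64'],
    'city': ['北京', '上海', '天津', '南京'],
    'year': ['2001', '2016', '2019', '2016'],
}
i = pd.DataFrame(q)

# The result must be assigned; i itself still has name as an ordinary column
by_name = i.set_index('name')

print(list(i.columns))
print(by_name)
```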

After a DataFrame is sorted, each row keeps its original index label, so the index is no longer in sequence
Fix: use the reset_index function

q = {
    'name':['张三','李四','王五','小明'],
    'sex':['89','56','78','64'],
    'city':['北京','上海','天津','南京'],
    'year':['2001','2016','2019','2016']
}
i = DataFrame(q)    # original order
	name	sex	city	year
0	张三	89	北京	2001
1	李四	56	上海	2016
2	王五	78	天津	2019
3	小明	64	南京	2016

i2=i.sort_values(by='sex')   # sort by the sex column
i2
	name	sex	city	year
1	李四	56	上海	2016
3	小明	64	南京	2016
2	王五	78	天津	2019
0	张三	89	北京	2001

i3=i2.reset_index()   # rebuild the index; the old index is kept as an 'index' column (this step can be skipped in favor of the next one)
i3
	index	name	sex	city	year
0	1	李四	56	上海	2016
1	3	小明	64	南京	2016
2	2	王五	78	天津	2019
3	0	张三	89	北京	2001

i3=i2.reset_index(drop=True)   # drop=True discards the old index instead of keeping it as a column
i3

	name	sex	city	year
0	李四	56	上海	2016
1	小明	64	南京	2016
2	王五	78	天津	2019
3	张三	89	北京	2001
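
The sort and reset steps can be chained. In the sketch below the sex column is first converted to integers, which is an assumption added here: the original data stores the numbers as strings, and strings sort lexicographically, which only matches numeric order while all values have the same number of digits.

```python
import pandas as pd

q = {
    'name': ['张三', '李四', '王五', '小明'],
    'sex': ['89', '56', '78', '64'],
    'city': ['北京', '上海', '天津', '南京'],
    'year': ['2001', '2016', '2019', '2016'],
}
i = pd.DataFrame(q)
i['sex'] = i['sex'].astype(int)   # assumption: the values are meant to be numbers

# Sort, then replace the shuffled labels with a fresh 0..n-1 index
i3 = i.sort_values(by='sex').reset_index(drop=True)
print(i3)
```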

Selecting columns from a DataFrame:

q = {
    'name':['张三','李四','王五','小明'],
    'sex':['89','56','78','64'],
    'city':['北京','上海','天津','南京'],
    'year':['2001','2016','2019','2016']
}
i = DataFrame(q)
	name	sex	city	year
0	张三	89	北京	2001
1	李四	56	上海	2016
2	王五	78	天津	2019
3	小明	64	南京	2016

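Three ways to select columns; plain [] slicing selects rows, so it cannot be used to select columns
i['name']    # one column, returned as a Series
i.name       # attribute access, same result as i['name']
i[['name','sex']]  # a list of names returns a DataFrame

Selecting rows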
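
The three forms can be compared side by side as follows. The sketch also shows one way to get a contiguous range of columns, a label slice through loc, which is an addition not covered in the original text; the variable names are illustrative.

```python
import pandas as pd

q = {
    'name': ['张三', '李四', '王五', '小明'],
    'sex': ['89', '56', '78', '64'],
    'city': ['北京', '上海', '天津', '南京'],
    'year': ['2001', '2016', '2019', '2016'],
}
i = pd.DataFrame(q)

one = i['name']             # a single column is a Series
same = i.name               # attribute access returns the same column
two = i[['name', 'sex']]    # a list of names gives a DataFrame

# Label slices on columns go through loc; both endpoints are included
span = i.loc[:, 'name':'city']

print(type(one).__name__, type(two).__name__)
print(span)
```

Attribute access only works when the column name is a valid Python identifier and does not collide with an existing DataFrame attribute or method, so the bracket form is the safer default.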

q = {
    'name':['张三','李四','王五','小明'],
    'sex':['89','56','78','64'],
    'city':['北京','上海','天津','南京'],
    'year':['2001','2016','2019','2016']
}
i = DataFrame(q)
	name	sex	city	year
0	张三	89	北京	2001
1	李四	56	上海	2016
2	王五	78	天津	2019
3	小明	64	南京	2016

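Ways to select:
Rows can be sliced by position:
i[0:2] 
If name has been set as the index, rows can also be sliced by label, and both endpoints are included: i['张三':'李四']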

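Selecting individual rows

Use loc and iloc
loc: selects by row index label
iloc: selects by integer row position

i2 = i.set_index('name')   # the examples below assume name is the row index
i2.loc['张三']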

sex       89
city      北京
year    2001
Name: 张三, dtype: object
---
i2.loc[['张三','王五']]

	sex	city	year
name			
张三	89	北京	2001
王五	78	天津	2019

-------------------------------------------
i2.iloc[2]        # iloc takes integer positions, not strings
i2.iloc[[1,2]]
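
A runnable version of the loc and iloc lookups, assuming i2 is the name-indexed frame built with set_index. The names row, rows, third and pair are illustrative.

```python
import pandas as pd

q = {
    'name': ['张三', '李四', '王五', '小明'],
    'sex': ['89', '56', '78', '64'],
    'city': ['北京', '上海', '天津', '南京'],
    'year': ['2001', '2016', '2019', '2016'],
}
i2 = pd.DataFrame(q).set_index('name')

row = i2.loc['张三']              # one label gives a Series
rows = i2.loc[['张三', '王五']]    # a list of labels gives a DataFrame
third = i2.iloc[2]               # integer position 2 is the third row
pair = i2.iloc[[1, 2]]           # a list of positions gives a DataFrame

print(row)
print(rows)
print(third.name)
print(pair)
```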

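Selecting a subset of rows and columns:

Use loc (labels) or iloc (positions)
Both can restrict rows and columns at the same time; the older ix indexer, which mixed labels and positions, was removed in pandas 1.0

	sex	city	year
name			
张三	89	北京	2001
李四	56	上海	2016
王五	78	天津	2019
小明	64	南京	2016

i2.loc[['张三','王五'], i2.columns[0:2]]   # restrict rows by label and columns by position
i2.loc[:, ['sex','year']]   # all rows, but only the two named columns
i2.iloc[[1,3], :]   # rows at positions 1 and 3, all columns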
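
When row labels and column positions need to be combined, one option is to translate everything to labels (for loc) or everything to positions (for iloc). The sketch below does both and checks that the results agree; using Index.get_indexer for the label-to-position step is a choice made for this example, not something from the original post.

```python
import pandas as pd

q = {
    'name': ['张三', '李四', '王五', '小明'],
    'sex': ['89', '56', '78', '64'],
    'city': ['北京', '上海', '天津', '南京'],
    'year': ['2001', '2016', '2019', '2016'],
}
i2 = pd.DataFrame(q).set_index('name')

# All labels: turn column positions 0:2 into column names first
via_loc = i2.loc[['张三', '王五'], i2.columns[0:2]]

# All positions: turn row labels into integer positions first
positions = i2.index.get_indexer(['张三', '王五'])
via_iloc = i2.iloc[positions, 0:2]

print(via_loc)
print(via_loc.equals(via_iloc))
```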

Boolean selection

The following operators can be used: (!=), (~), (&), (|)


	sex	city	year
name			
张三	89	北京	2001
李四	56	上海	2016
王五	78	天津	2019
小明	64	南京	2016
----------
i2['sex']=='56'
name
张三    False
李四     True
王五    False
小明    False
Name: sex, dtype: bool
---
i2[i2['sex']=='56']
	sex	city	year
name			
李四	56	上海	2016
-------
i2[(i2['sex']=='56')&(i2['city']=='上海')]
	sex	city	year
name			
李四	56	上海	2016
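
The remaining operators work the same way. A short sketch, assuming the same name-indexed i2, with illustrative variable names; each condition needs its own parentheses because & and | bind more tightly than ==.

```python
import pandas as pd

q = {
    'name': ['张三', '李四', '王五', '小明'],
    'sex': ['89', '56', '78', '64'],
    'city': ['北京', '上海', '天津', '南京'],
    'year': ['2001', '2016', '2019', '2016'],
}
i2 = pd.DataFrame(q).set_index('name')

not_56 = i2[i2['sex'] != '56']        # inequality
negated = i2[~(i2['sex'] == '56')]    # ~ inverts a boolean mask
either = i2[(i2['city'] == '北京') | (i2['year'] == '2016')]   # logical or

print(not_56)
print(either)
```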

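Function application and mapping:
map: applies a function to each element of a Series
apply: applies a function to each row or column of a DataFrame
applymap: applies a function to each element of a DataFrame (renamed DataFrame.map in pandas 2.1)
To be expanded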
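
Until that section is filled in, here is a minimal sketch of the three methods on a small made-up frame (the columns x and y and their values are invented for illustration). Because applymap was renamed to DataFrame.map in pandas 2.1, the example picks whichever of the two the installed version provides.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})   # hypothetical data

# Series.map: one element at a time
squared = df['x'].map(lambda v: v ** 2)

# DataFrame.apply: the function receives a whole column (default) or a whole row (axis=1)
col_range = df.apply(lambda col: col.max() - col.min())
row_total = df.apply(lambda row: row['x'] + row['y'], axis=1)

# Element-wise over the whole frame: DataFrame.map in pandas >= 2.1, applymap before that
elementwise = df.map if hasattr(df, 'map') else df.applymap
plus_one = elementwise(lambda v: v + 1)

print(squared)
print(col_range)
print(row_total)
print(plus_one)
```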

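Sorting:

sort_index sorts by the index, ascending by default
sort_index(ascending=False) sorts in descending order

sort_values sorts by the values of one or more columns
sort_values(by=' ') ascending by default; pass the column name in by
sort_values(by=' ',ascending=False) descending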
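
The four calls can be tried on a small frame. The data below (a score column with index labels c, a, b) is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'score': [78, 92, 67]}, index=['c', 'a', 'b'])   # hypothetical data

by_index = df.sort_index()                        # a, b, c
by_index_desc = df.sort_index(ascending=False)    # c, b, a
by_value = df.sort_values(by='score')             # 67, 78, 92
by_value_desc = df.sort_values(by='score', ascending=False)

print(by_index)
print(by_value_desc)
```

All four return new DataFrames; df itself keeps its original order.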

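Aggregation

The DataFrame sum method totals each column
df.sum()
df.sum(axis=1) totals each row
df.describe() summarizes each numeric column: count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%) and maximum.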
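
A small worked example of the three calls, using an invented frame of scores (the column names math and art are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({'math': [90, 80, 70], 'art': [60, 75, 90]})   # hypothetical data

col_totals = df.sum()          # one total per column
row_totals = df.sum(axis=1)    # one total per row
summary = df.describe()        # one statistic per row of the result, one column per numeric column

print(col_totals)
print(row_totals)
print(summary)
```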

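Unique values and value counts:

Both are Series methods; on a DataFrame, call them on a single column (DataFrame also has its own value_counts since pandas 1.1, which counts unique rows):
unique returns the distinct values in order of first appearance;
value_counts() counts how many times each value occurs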

r = Series(['a','b','b','c','a','d'])
r.unique()
array(['a', 'b', 'c', 'd'], dtype=object)
---
r.value_counts()
a    2
b    2
c    1
d    1
dtype: int64
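
Applied to a DataFrame, the same two methods are called on one column at a time. A sketch using the year column from the earlier examples, with illustrative variable names:

```python
import pandas as pd

q = {
    'name': ['张三', '李四', '王五', '小明'],
    'sex': ['89', '56', '78', '64'],
    'city': ['北京', '上海', '天津', '南京'],
    'year': ['2001', '2016', '2019', '2016'],
}
i = pd.DataFrame(q)

years = i['year'].unique()          # distinct values, in order of first appearance
counts = i['year'].value_counts()   # most frequent value first

print(years)
print(counts)
```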

pandas visualization

Line and bar charts from a Series (plotting requires matplotlib)
.plot() line chart
.plot(kind='bar') bar chart

Line and bar charts from a DataFrame
df.plot() line chart
df.plot(kind='bar') bar chart
df.plot(kind='bar',stacked=True,alpha=0.7) stacked bar chart; alpha sets the color transparency
