Common Data Analysis Libraries: Pandas Data Structures

向日葵不嗑瓜子

Article link: https://blog.csdn.net/qq_36640156/article/details/105407634

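1. `pandas`

Features:
(1) Built specifically for data processing and analysis, with a rich set of functions for complex operations;
(2) Supports SQL-like data manipulation, time series analysis, and more.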

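2. Basic data structure: the `Series` type

(1)
A Series consists of a set of data and a matching index. The data can be a scalar, a list, or a dictionary. The index defaults to [0, 1, 2, …] and can also be set with the index parameter.
Examples:
From a list: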

import pandas as pd

s = pd.Series([1, 2, 3])
print(s)
###
0    1
1    2
2    3
dtype: int64

With an explicit index:

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
print(s)
###
a    1
b    2
c    3
dtype: int64

From a scalar (the value is repeated for every index label):

s = pd.Series(1, index=['a', 'b', 'c'])
print(s)
###
a    1
b    1
c    1
dtype: int64

From a dictionary (the keys become the index):

s = pd.Series({'a': 1, 'b': 2})
print(s)
###
a    1
b    2
dtype: int64

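If a label in the specified index also exists as a key in the dictionary, it keeps the dictionary's value. If a label has no matching key, its value is NaN, and the dtype becomes float64 to hold it.

s = pd.Series({'a': 1, 'b': 2}, index=['b', 'a', 'c'])
print(s)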
###
b    2.0
a    1.0
c    NaN
dtype: float64

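Both the index and the data (values) can be created from an ndarray, range(), and similar objects.
A Series behaves much like an ndarray and has both a custom (label) index and an automatic (positional) index.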
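
As a minimal sketch of the first point, the snippet below builds one Series whose data comes from an ndarray and whose index comes from range(), and another with the roles reversed. The variable names (`values`, `labels`, `s_range`, `s_labels`) are made up for illustration and do not come from the original article.

```python
import numpy as np
import pandas as pd

# Data from an ndarray, index from range() (illustrative names).
values = np.arange(10, 40, 10)            # 10, 20, 30
s_range = pd.Series(values, index=range(1, 4))

# Data from range(), index from an ndarray of labels.
labels = np.array(['x', 'y', 'z'])
s_labels = pd.Series(range(3), index=labels)

print(s_range)
print(s_labels)
```

Here `s_range` is labelled 1, 2, 3 rather than the default 0, 1, 2, which shows that the index is independent of the position of each value.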

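a = pd.Series({'a': 1, 'b': 5})
a.index
# Index(['a', 'b'], dtype='object')

a.values  # returns a one-dimensional NumPy ndarray
# array([1, 5], dtype=int64)

# Automatic (positional) index
a.iloc[0]  # 1 (a[0] also works in older pandas, but is deprecated for positional access)
# Custom (label) index
a['a']  # 1
# The two cannot be mixed: 1 is treated as a label, and no such label exists
a[['a', 1]]
## Older pandas versions returned NaN for the missing label; pandas 1.0 and later raise KeyError
a    1.0
1    NaN
dtype: float64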
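
To make the difference between the two kinds of index explicit, a sketch using the `.loc` (label) and `.iloc` (position) accessors may help. It assumes pandas 1.0 or later, where asking for a missing label raises KeyError; the variable names are illustrative.

```python
import pandas as pd

a = pd.Series({'a': 1, 'b': 5})

by_label = a.loc['a']         # label-based lookup
by_position = a.iloc[0]       # position-based lookup
subset = a.loc[['a', 'b']]    # a list of existing labels returns a Series

# Mixing a label with a position fails: 1 is not a label of this Series.
try:
    a.loc[['a', 1]]
    missing_raises = False
except KeyError:
    missing_raises = True

print(by_label, by_position, missing_raises)
```

Using `.loc` and `.iloc` avoids any ambiguity about whether an integer means a label or a position.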

As with a dictionary, the in operator tests membership against the index, not against the values.

'a' in a
# True

1 in a
# False
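
If the goal is to test the values rather than the index, one possible approach is to check against `.values` or to use `Series.isin`. This is a small sketch with illustrative variable names, not something the original article shows.

```python
import pandas as pd

a = pd.Series({'a': 1, 'b': 5})

label_present = 'a' in a          # checks the index
value_in_index = 1 in a           # 1 is a value, not a label
value_present = 1 in a.values     # checks the underlying ndarray
mask = a.isin([1])                # element-wise test, returns a boolean Series

print(label_present, value_in_index, value_present)
print(mask)
```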

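In arithmetic, Series automatically aligns data by index label, so labels that appear on only one side produce NaN. A Series can also be modified at any time, and the change takes effect immediately.

a = pd.Series([1, 3, 5], index=['a', 'b', 'c'])

b = pd.Series([2, 4, 5, 6], index=['c', 'd', 'e', 'b'])

print(a + b)
###
a    NaN
b    9.0
c    7.0
d    NaN
e    NaN
dtype: float64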
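
The following sketch illustrates both halves of that statement. It assumes the second Series is labelled 'c', 'd', 'e', 'b', and it uses `Series.add` with `fill_value=0` as one possible way to avoid NaN for labels present on only one side. The label 'f' is added purely for illustration.

```python
import pandas as pd

a = pd.Series([1, 3, 5], index=['a', 'b', 'c'])
b = pd.Series([2, 4, 5, 6], index=['c', 'd', 'e', 'b'])

# Assignment by label changes the Series in place.
a['a'] = 10

# Assigning to a label that does not exist yet appends it.
a['f'] = 7

# add() with fill_value treats a label missing from one side as 0,
# so the aligned result contains no NaN.
total = a.add(b, fill_value=0)
print(total)
```

Whether a missing label should count as 0 or stay NaN depends on the data, so `fill_value` is a choice to make deliberately rather than a default to rely on.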

(2) Methods

import numpy as np
import pandas as pd

series1 = pd.Series({'北京': 2.8, '上海': 3.01, '广州': 8.99, '江苏': 9.59, '浙江': 5.18})
series1.ndim  # 1
series1.dtype  # dtype('float64')
series1[0:3]  # positional slice, end excluded: 北京 2.8, 上海 3.01, 广州 8.99
series1['北京':'广州']  # label slice, both ends included


# Add data
series2 = pd.Series({'四川': 5.33})
a = pd.concat([series1, series2])  # series1 itself is not modified; Series.append was removed in pandas 2.0
print(series1, a)

北京    2.80
上海    3.01
广州    8.99
江苏    9.59
浙江    5.18
dtype: float64

北京    2.80
上海    3.01
广州    8.99
江苏    9.59
浙江    5.18
四川    5.33
dtype: float64

# Delete data
b = series1.drop('北京')  # series1 itself is not modified
print(series1, b)
北京    2.80
上海    3.01
广州    8.99
江苏    9.59
浙江    5.18
dtype: float64 
上海    3.01
广州    8.99
江苏    9.59
浙江    5.18
dtype: float64


c = series1.drop('北京', inplace=True)  # modifies series1 in place and returns None
print(series1, c)

上海    3.01
广州    8.99
江苏    9.59
浙江    5.18
dtype: float64 
None

3. Basic data structure: the `DataFrame` type

1. Basic syntax:

pandas.DataFrame(data, index, columns, dtype)

data can be an array, a list, or a dictionary
index gives the row labels
columns gives the column names (column labels)
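
As a rough illustration of how these parameters fit together, the sketch below passes all four by keyword. The sample data and the labels 'row1', 'row2', 'x', and 'y' are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: two rows, two numeric columns.
data = [[1, 2], [3, 4]]

df = pd.DataFrame(
    data,
    index=['row1', 'row2'],   # row labels
    columns=['x', 'y'],       # column labels
    dtype='float64',          # force every column to float64
)

print(df)
print(df.dtypes)
```

Passing the arguments by keyword avoids depending on their positional order.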

2. Common attributes
(1) Three ways to create a DataFrame:

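import numpy as np
import pandas as pd

# From a list of rows
list1 = [['张三', 23, '男'], ['李四', 24, '女'], ['李四', 21, '女']]
df1 = pd.DataFrame(list1, columns=['姓名', '年龄', '性别'])

# From a dictionary of columns; every list must have the same length
df2 = pd.DataFrame({'姓名': ['张三', '李四', '李四'], '年龄': [23, 24, 21], '性别': ['男', '女', '女']})

# From an ndarray, with custom row labels
array1 = np.array(list1)
df3 = pd.DataFrame(array1, columns=['姓名', '年龄', '性别'], index=['a', 'b', 'c'])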
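
One detail worth checking with the ndarray route: np.array() converts a list that mixes strings and integers into an array of strings, so the 年龄 column built from it is not numeric. The sketch below compares the two routes and converts the column back with `astype`. The names `df_list`, `df_array`, `was_numeric`, and `mean_age` are illustrative.

```python
import numpy as np
import pandas as pd

list1 = [['张三', 23, '男'], ['李四', 24, '女'], ['李四', 21, '女']]

df_list = pd.DataFrame(list1, columns=['姓名', '年龄', '性别'])
df_array = pd.DataFrame(np.array(list1), columns=['姓名', '年龄', '性别'])

# Built from the list, 年龄 keeps its integer type.
print(df_list['年龄'].dtype)

# Built from the ndarray, 年龄 holds strings such as '23'.
was_numeric = pd.api.types.is_numeric_dtype(df_array['年龄'])
print(df_array['年龄'].dtype, was_numeric)

# Convert the column back to integers before doing arithmetic on it.
df_array['年龄'] = df_array['年龄'].astype(int)
mean_age = df_array['年龄'].mean()
print(mean_age)
```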

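(2) Basic methods:

import numpy as np
import pandas as pd
df2 = pd.DataFrame({'姓名': ['张三', '李四', '李四'], '年龄': [23, 24, 21], '性别': ['男', '女', '女']})

df2.values  # the data as a NumPy ndarray
df2.shape  # size of each dimension: (rows, columns)
df2.dtypes  # data type of each column
df2.columns.tolist()  # column names as a list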
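
For reference, here is a short sketch of what these attributes return for the three-row table. It assumes the third 性别 entry is '女', so that all columns have the same length; the variable names are illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({
    '姓名': ['张三', '李四', '李四'],
    '年龄': [23, 24, 21],
    '性别': ['男', '女', '女'],   # third entry assumed, see the note above
})

n_rows, n_cols = df2.shape             # 3 rows, 3 columns
column_names = df2.columns.tolist()    # ['姓名', '年龄', '性别']
first_row = df2.values[0].tolist()     # ['张三', 23, '男']

print(n_rows, n_cols)
print(column_names)
print(first_row)
```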
