Data Analysis: Basic pandas Usage

Latest recommended article published 2024-07-12 16:41:07

漫天飘雪13

Tags: data analysis, python, data mining

Article link: https://blog.csdn.net/w1530/article/details/105698545


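Common pandas data types

1. Series: a one-dimensional labeled array; the labels are the index shown to the left of the values.
2. DataFrame: two-dimensional, a container of Series.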
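
To make "a container of Series" concrete, here is a minimal sketch. The sample data is hypothetical and used only for illustration; it shows that selecting one column of a DataFrame returns a Series that shares the DataFrame's row index.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["xiaohu", "xiaofang"], "age": [23, 24]})

age = df["age"]                      # selecting one column returns a Series
print(isinstance(age, pd.Series))    # True
print(age.index.equals(df.index))    # True: the column shares the row index
```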

Creating a pandas Series

import pandas as pd
t = pd.Series([1,2,3,4])
print(t,type(t))

Output:
0    1
1    2
2    3
3    4
dtype: int64 <class 'pandas.core.series.Series'>

Specifying the index

t = pd.Series([1,2,3,4],index=list('abcd'))
print(t)

Output:
a    1
b    2
c    3
d    4
dtype: int64

Creating a Series from a dict

# Create a Series from a dict: the keys become the index labels
temp_dict = {"name":"xioahong","age":30,"tel":10086}
t = pd.Series(temp_dict)
print(t)

Output:
name    xioahong
age           30
tel        10086
dtype: object
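
As a small extension of the dict example, the sketch below passes an explicit index alongside the dict. The data is hypothetical; the point is that a label with no matching key is filled with NaN, which also turns the values into floats.

```python
import pandas as pd

scores = {"a": 1, "b": 2}                      # hypothetical data
s = pd.Series(scores, index=["a", "b", "c"])   # "c" is not a key in the dict

print(s)
print(s.dtype)   # float64, because NaN is a floating-point value
```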

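Slicing and indexing a pandas Series

Slicing: pass the start position, end position, and step directly (start, end, step).
Indexing: for a single element, pass its position or its index label; for several elements, pass a list of positions or labels. In recent pandas versions, integer positions passed through plain [] on a label-indexed Series are deprecated, so .iloc is the explicit way to select by position.

temp_dict = {"name":"xioahong","age":30,"tel":10086}
t3 = pd.Series(temp_dict)
print(t3)

Output: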
name    xioahong
age           30
tel        10086
dtype: object

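# Get a value by its index label
print(t3['age'])
->30

# Take the first two consecutive rows
print(t3[:2])
Output:
name    xioahong
age           30
dtype: object

# Take two non-consecutive rows by position (.iloc makes the positional intent explicit)
print(t3.iloc[[0,2]])

Output: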
name    xioahong
tel        10086
dtype: object
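
The slicing and indexing rules above can also be written with the explicit accessors. This is a minimal sketch that rebuilds the same t3 Series; the variable names are illustrative only.

```python
import pandas as pd

t3 = pd.Series({"name": "xioahong", "age": 30, "tel": 10086})

first_and_last = t3.iloc[[0, 2]]          # list of positions
by_label = t3.loc[["name", "tel"]]        # list of labels
every_other = t3.iloc[::2]                # start, end, step: every second element
label_slice = t3.loc["name":"age"]        # a label slice includes the end label

print(first_and_last.equals(by_label))    # True
print(list(every_other.index))            # ['name', 'tel']
print(list(label_slice.index))            # ['name', 'age']
```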

Index and values of a Series

# Iterate over the index labels
for i in t3.index:
    print(i)
Output:
name
age
tel

# Iterate over the values
for i in t3.values:
    print(i)
Output:
xioahong
30
10086

print(type(t3.index))
# Output:
<class 'pandas.core.indexes.base.Index'>

# Convert t3.index to a list
print(list(t3.index))
Output:
['name', 'age', 'tel']
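
When both the label and the value are needed at once, one option is Series.items(), which yields (label, value) pairs. The sketch below rebuilds t3 so it runs on its own.

```python
import pandas as pd

t3 = pd.Series({"name": "xioahong", "age": 30, "tel": 10086})

# Iterate over (label, value) pairs in index order
for label, value in t3.items():
    print(label, value)

as_dict = t3.to_dict()   # back to a plain dict keyed by index label
print(as_dict)
```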

Reading external data with pandas

When the data is stored in a CSV file, we can read it directly with pd.read_csv.

# Read a CSV file with pandas
import pandas as pd
df = pd.read_csv("./dogNames2.csv")
print(df)
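
Since dogNames2.csv is not included in the article, here is a self-contained sketch that reads made-up CSV text from memory instead. The column names and values are assumptions for illustration, not the contents of the real file.

```python
import io
import pandas as pd

# Hypothetical CSV content; the real dogNames2.csv is not shown in the article.
csv_text = "name,count\nBella,3\nMax,5\n"

# read_csv accepts a path or any file-like object, so StringIO works here
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)   # column types are inferred from the data
```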

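Creating and using a pandas DataFrame

A DataFrame differs from a Series: a Series is one-dimensional and has only a row index, while a DataFrame has both a row index and a column index.
Row index: labels the rows, called index, axis 0 (axis=0);
Column index: labels the columns, called columns, axis 1 (axis=1).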
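
A short sketch of what the two axes mean in practice, using sum() as an arbitrary example of a method that takes an axis argument: axis=0 works down the rows and gives one result per column, while axis=1 works across the columns and gives one result per row.

```python
import numpy as np
import pandas as pd

t = pd.DataFrame(np.arange(12).reshape(3, 4))

col_sums = t.sum(axis=0)   # collapse the rows: one result per column
row_sums = t.sum(axis=1)   # collapse the columns: one result per row

print(col_sums.tolist())   # [12, 15, 18, 21]
print(row_sums.tolist())   # [6, 22, 38]
```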

# Create a DataFrame
import pandas as pd
import numpy as np
t = pd.DataFrame(np.arange(12).reshape(3,4))
print(t)
Output:
   0  1   2   3
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11

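# Pass a list of dicts to DataFrame
import pandas as pd
d = [{"name":"xiaohu","age":23,"tel":10081},
    {"name":"xiaofang","age":24,"tel":10082},
    {"name":"xiaogang","tel":10086}]
d1 = pd.DataFrame(d)
print(d1)  # when a dict is missing a key, the corresponding element is NaN
Output (recent pandas keeps the keys' insertion order; older versions sorted the columns alphabetically):
       name   age    tel
0    xiaohu  23.0  10081
1  xiaofang  24.0  10082
2  xiaogang   NaN  10086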
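
To see the effect of the missing key more directly, the sketch below rebuilds the same DataFrame and inspects the age column. It only uses isna() to locate the gap; how to fill or drop it is left open.

```python
import pandas as pd

d = [{"name": "xiaohu", "age": 23, "tel": 10081},
     {"name": "xiaofang", "age": 24, "tel": 10082},
     {"name": "xiaogang", "tel": 10086}]
d1 = pd.DataFrame(d)

print(d1["age"].dtype)             # float64, because NaN is a float
print(d1["age"].isna().tolist())   # [False, False, True]

missing_per_column = d1.isna().sum().to_dict()
print(missing_per_column)          # count of missing values in each column
```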

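- Descriptive information about a DataFrame

Just as we use shape, ndim, and dtype to learn the basics of an ndarray, we can use the following attributes and methods to inspect a DataFrame.

Attribute / method	Purpose
df.shape	number of rows and columns
df.ndim	number of dimensions
df.index, df.columns	row index, column index
df.head(3)	show the first 3 rows
df.tail(3)	show the last 3 rows
df.info()	overview: number of rows and columns, column index, non-null count per column
df.describe()	quick summary statistics: count, mean, standard deviation, max, quartiles, min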
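
The entries in the table can be tried on a small frame. This is a minimal sketch on a 3x4 frame built with np.arange; the frame itself is just sample data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("wxyz"))

print(df.shape)          # (3, 4)
print(df.ndim)           # 2
print(list(df.columns))  # ['w', 'x', 'y', 'z']
print(df.head(2))        # first 2 rows
print(df.tail(1))        # last row
df.info()                # prints its overview directly and returns None
summary = df.describe()  # a DataFrame of count, mean, std, min, quartiles, max
print(summary)
```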

pandas loc and iloc

df.loc selects rows (and optionally columns) by label
df.iloc selects rows (and optionally columns) by integer position

t = pd.DataFrame(np.arange(12).reshape(3,4),index=list("abc"),columns=list("wxyz"))
print(t)
Output:
   w  x   y   z
a  0  1   2   3
b  4  5   6   7
c  8  9  10  11

# Get a single value from the DataFrame
print(t.loc["a","z"])
Output:
3

# Get a single row
print(t.loc["a"])
print(type(t.loc["a"]))  # the result is a Series
Output:
w    0
x    1
y    2
z    3
Name: a, dtype: int32
<class 'pandas.core.series.Series'>

# Get one column
print(t.loc[:,"y"])
Output:
a     2
b     6
c    10
Name: y, dtype: int32

# Get several non-consecutive rows
print(t.loc[["a","c"],:])
Output:
   w  x   y   z
a  0  1   2   3
c  8  9  10  11

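# Get the values where the selected rows and columns intersect
print(t.loc[["a","b"],["w","x"]])
Output:
   w  x
a  0  1
b  4  5

# Unlike range(), a loc slice over consecutive rows includes the end label
print(t.loc["a":"c",["w","z"]])
Output: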
   w   z
a  0   3
b  4   7
c  8  11
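
The inclusive end of a loc slice is easy to mix up with positional slicing, so here is a side-by-side sketch on the same frame: a label slice with loc keeps the end label, while a position slice with iloc stops before the end position, like range().

```python
import numpy as np
import pandas as pd

t = pd.DataFrame(np.arange(12).reshape(3, 4),
                 index=list("abc"), columns=list("wxyz"))

by_label = t.loc["a":"b"]      # end label "b" is included
by_position = t.iloc[0:1]      # end position 1 is excluded

print(list(by_label.index))     # ['a', 'b']
print(list(by_position.index))  # ['a']
```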

Using iloc

print(t.iloc[1:,:2])  # slicing
Output:
   w  x
b  4  5
c  8  9

print(t.iloc[[0],[1]])  # select one cell; with two lists the result is a 1x1 DataFrame
Output:
   x
a  1

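print(t.iloc[[0,2],[2,1]])  # rows at positions 0 and 2, columns at positions 2 and 1
Output:
    y  x
a   2  1
c  10  9

print(t.iloc[:,[2,1]])  # all rows, columns at positions 2 and 1
Output:
    y  x
a   2  1
b   6  5
c  10  9

print(t.iloc[1])  # the row at position 1, i.e. the second row
Output: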
w    4
x    5
y    6
z    7
Name: b, dtype: int32
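
The iloc examples above return different kinds of objects depending on what is passed in. A brief sketch of the three cases on the same frame:

```python
import numpy as np
import pandas as pd

t = pd.DataFrame(np.arange(12).reshape(3, 4),
                 index=list("abc"), columns=list("wxyz"))

scalar = t.iloc[0, 1]       # two integers: a single value
frame = t.iloc[[0], [1]]    # two lists: a 1x1 DataFrame
row = t.iloc[1]             # one integer: a Series for that row

print(scalar)               # 1
print(frame.shape)          # (1, 1)
print(row.name)             # b
```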

# Convert a list to a DataFrame
import pandas as pd
a =[[1,2,3],[4,5,6],[7,8,9]]
a1 = pd.DataFrame(a)  # convert the nested list to a DataFrame
print(a1)
Output:
   0  1  2
0  1  2  3
1  4  5  6
2  7  8  9

# 1. Convert a DataFrame to an ndarray ---> np.array(df)
# 2. Convert the ndarray to a nested list [[1,2],[3,4]] with tolist()
# 3. Flatten the nested list: [[1,2],[3,4]] -> [1,2,3,4]
import pandas as pd
import numpy as np
t = pd.DataFrame(np.arange(16).reshape(4,4),index=list("abcd"),columns=list("wxyz"))
t = np.array(t)  # convert the DataFrame to an np.ndarray
nested = t.tolist()  # nested list of plain Python ints
print(nested)
t1 = [i for j in nested for i in j]  # for each inner list j, take each element i in turn
print(t1)
Output:
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
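
As an alternative to the list comprehension, the same flattening can be done in one chain with NumPy. This is a sketch of one possible shortcut, not the article's method: to_numpy() gives the ndarray, ravel() flattens it row by row, and tolist() converts it to plain Python ints.

```python
import numpy as np
import pandas as pd

t = pd.DataFrame(np.arange(16).reshape(4, 4),
                 index=list("abcd"), columns=list("wxyz"))

flat = t.to_numpy().ravel().tolist()   # DataFrame -> ndarray -> 1-D -> list
print(flat)   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
```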