Data Analysis with Pandas: A Detailed Guide

Latest recommended article published on 2023-12-30 13:51:29

喝可乐加雪碧

Category: Data Cleaning. Tags: pandas, data analysis, python, data mining, pip

Article link: https://blog.csdn.net/m0_58468567/article/details/128752112

I. Why Learn pandas

II. Reading External Data with pandas

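import pandas as pd

# Read a CSV file; the encoding must match the file ('gbk' is common for files saved on Chinese-locale systems)
df1 = pd.read_csv("./dogNames2.csv", encoding="gbk")

# Read tabular text from the system clipboard; sep defaults to r"\s+",
# and any extra keyword arguments are passed on to pd.read_csv
df2 = pd.read_clipboard(sep=r"\s+")

# Read the result of a SQL query from a MySQL database;
# sql_sentence (the query string) and connection (an open database connection)
# are placeholders that must be defined beforehand
df3 = pd.read_sql(sql_sentence, connection)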
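
The readers above depend on a local file, the system clipboard, and a live MySQL server, so they cannot be run in isolation. As a minimal sketch of the same calls, the example below substitutes an in-memory CSV string for the file and an in-memory SQLite database for MySQL. The column names, the table name `dogs`, and the data are all made up for illustration and do not come from the original article.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "name,n\nbella,3\nmax,5\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# read_sql takes a query string and a connection;
# an in-memory SQLite database stands in for MySQL here
conn = sqlite3.connect(":memory:")
df_csv.to_sql("dogs", conn, index=False)      # write the frame as a table
df_sql = pd.read_sql("SELECT name, n FROM dogs WHERE n > 3", conn)
conn.close()

print(df_csv.shape)   # (2, 2)
print(df_sql)
```

With a real MySQL server, the connection would typically be a SQLAlchemy engine rather than a `sqlite3` connection; the `pd.read_sql` call itself stays the same.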

III. Common pandas Data Types

Series: a one-dimensional labeled array (also called an indexed array)
DataFrame: a two-dimensional container of Series (it holds multiple Series)
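
To make the "container of Series" idea concrete, here is a small sketch that builds a DataFrame from two Series and then pulls a column and a row back out. The names and values are illustrative only.

```python
import pandas as pd

# Two Series that share the same index labels (illustrative data)
ages = pd.Series([30, 25], index=["xiaoming", "xiaohong"])
tels = pd.Series([10086, 10010], index=["xiaoming", "xiaohong"])

# A dict of Series becomes a DataFrame: the dict keys are the column names
df = pd.DataFrame({"age": ages, "tel": tels})

print(ages.ndim, df.ndim)                        # 1 2
print(isinstance(df["age"], pd.Series))          # True: each column is a Series
print(isinstance(df.loc["xiaoming"], pd.Series)) # True: a single row comes back as a Series too
```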

IV. The Series Data Type

1. Creating a Series, specifying an index, and changing the dtype

import numpy as np
import pandas as pd

# Create a Series from a list; a numpy.ndarray or a dict also works
t1 = pd.Series([3, 2, 4])
print(t1)
print(type(t1))
# Output:
# 0    3
# 1    2
# 2    4
# dtype: int64
# <class 'pandas.core.series.Series'>

# Create a Series from a numpy.ndarray
t11 = pd.Series(np.arange(3))
print(t11)
# Output (the trailing dtype line is omitted here):
# 0    0
# 1    1
# 2    2
# ---------------------------------------------------------------
# Specify the index of a Series
t1 = pd.Series([3, 2, 4], index=list("abc"))   # list("abc") splits the string into a list of characters
print(t1)
print(type(t1))
# Output:
# a    3
# b    2
# c    4
# dtype: int64
# <class 'pandas.core.series.Series'>
print(list("abc"))
# ['a', 'b', 'c']
# ---------------------------------------------------------------
# Create a Series from a dict: the keys become the index and the values become the data
temp_dict = {"name": "xiaoming", "age": 30, "tel": 10086}
t2 = pd.Series(temp_dict)
print(t2)
# Output:
# name    xiaoming
# age           30
# tel        10086
# dtype: object

import string
a = {string.ascii_uppercase[i]: i for i in range(10)}   # use the first ten uppercase letters as index labels
t3 = pd.Series(a)
print(t3)
# Output:
# A    0
# B    1
# C    2
# D    3
# E    4
# F    5
# G    6
# H    7
# I    8
# J    9
# dtype: int64

# When a different index is supplied, labels that match a dict key keep their value; the rest become NaN
t3 = pd.Series(a, index=list(string.ascii_uppercase[5:15]))
print(t3)
# Output:
# F    5.0
# G    6.0
# H    7.0
# I    8.0
# J    9.0
# K    NaN
# L    NaN
# M    NaN
# N    NaN
# O    NaN
# dtype: float64
# NaN is a float, so pandas automatically upcasts the Series dtype from int64 to float64
# ---------------------------------------------------------------
# Check the dtype of a Series
t1.dtype

# Change the dtype of a Series, the same way as in numpy
t1.astype(float)   # returns a new float Series; the dtype of t1 itself is unchanged
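
The two points above (NaN forcing a float dtype, and `astype` returning a new object) are easy to check directly. The sketch below uses `Series.reindex`, which aligns data to new labels much like passing `index=` to the constructor with a dict. The final step uses the nullable `"Int64"` extension dtype, which the original article does not cover; it is shown only as an optional way to keep integers alongside missing values.

```python
import pandas as pd

s = pd.Series({"A": 0, "B": 1, "C": 2})
print(s.dtype)                 # int64

# Labels missing from the data become NaN, and NaN is a float,
# so the result is upcast to float64
r = s.reindex(["B", "C", "D"])
print(r.dtype)                 # float64

# astype returns a new Series; the original keeps its dtype
f = s.astype(float)
print(s.dtype, f.dtype)        # int64 float64

# Assign the result back to make the change stick
s = s.astype(float)

# Optional: the nullable "Int64" dtype holds missing values without upcasting to float
n = r.astype("Int64")
print(n.dtype)                 # Int64
```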

2. Slicing and Indexing a Series

t2
# name    xiaoming
# age           30
# tel        10086
# dtype: object
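# ---------------------------------------------------------------
# Select a single row
# By label
t2["age"]
# 30

# By position: use .iloc (plain t2[2] on a label-indexed Series is deprecated in recent pandas versions)
t2.iloc[2]
# 10086
# ---------------------------------------------------------------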
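# Select several consecutive rows
t2[:2]                    # a position slice excludes its end point, so the third row (tel) is not returned
# name    xiaoming
# age           30
# dtype: object

t2[:"tel"]                # a label slice includes its end point, so the third row (tel) is returned
# name    xiaoming
# age           30
# tel        10086
# dtype: object
# ---------------------------------------------------------------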
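# Select several non-consecutive rows
t2[["age", "tel"]]
# age       30
# tel    10086
# dtype: object

t2.iloc[[0, 2]]           # by position; plain t2[[0, 2]] is deprecated on a label-indexed Series
# name    xiaoming
# tel        10086
# dtype: object
# ---------------------------------------------------------------
# Boolean indexing
t1[t1 > 10]     # keep the rows whose value is greater than 10 (empty here, since t1 holds 3, 2, 4)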
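
The difference between label-based and position-based selection is the part of this section that most often causes off-by-one surprises. As a minimal sketch, the example below repeats the selections with the explicit `.loc` (labels) and `.iloc` (positions) accessors, using the same `t1` and `t2` data as above; the threshold of 2 in the boolean mask is chosen here only so that the result is non-empty.

```python
import pandas as pd

t2 = pd.Series({"name": "xiaoming", "age": 30, "tel": 10086})

# .loc selects by label; a label slice includes its end point
by_label = t2.loc["name":"age"]

# .iloc selects by position; a position slice excludes its end point
by_pos = t2.iloc[0:2]

print(by_label.equals(by_pos))        # True: both hold name and age

# Single items and non-contiguous selections
print(t2.loc["tel"], t2.iloc[2])      # 10086 10086
print(list(t2.iloc[[0, 2]].index))    # ['name', 'tel']

# Boolean indexing: the mask is a Series of True/False aligned on the index
t1 = pd.Series([3, 2, 4], index=list("abc"))
mask = t1 > 2
print(t1[mask].to_dict())             # {'a': 3, 'c': 4}
```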

3. Series Index and Values

t2
# name    xiaoming
# age           30
# tel        10086
# dtype: object
# ---------------------------------------------------------------
# Get the index
t2.index
# Output:
# Index(['name', 'age', 'tel'], dtype='object')

# t2.index is iterable
for i in t2.index:
    print(i)
# Output:
# name
# age
# tel

# Check the type of t2.index
type(t2.index)
# Output:
# <class 'pandas.core.indexes.base.Index'>

# Count the labels in t2.index
len(t2.index)
# 3

# Convert t2.index to a list
list(t2.index)
# ['name', 'age', 'tel']

list(t2.index)[:2]
# ['name', 'age']
# ---------------------------------------------------------------
# Get the values
t2.values
# array(['xiaoming', 30, 10086], dtype=object)

# t2.values is iterable
for i in t2.values:
    print(i)
# Output:
# xiaoming
# 30
# 10086

# Check the type of t2.values
type(t2.values)
# Output:
# <class 'numpy.ndarray'>

# Count the elements in t2.values
len(t2.values)
# 3

# Convert t2.values to a list
list(t2.values)
# ['xiaoming', 30, 10086]

list(t2.values)[:2]
# ['xiaoming', 30]
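
Beyond `.index` and `.values`, a few related methods cover the same ground more directly. The sketch below is a supplement rather than part of the original walkthrough: it shows `items()` for iterating over label/value pairs together, `tolist()` as a shortcut for the `list(...)` conversions, `to_numpy()` as the accessor the pandas documentation recommends over `.values`, and the fact that `in` on a Series tests the index.

```python
import pandas as pd

t2 = pd.Series({"name": "xiaoming", "age": 30, "tel": 10086})

# items() yields (label, value) pairs, so one loop covers index and values
pairs = list(t2.items())
print(pairs)   # [('name', 'xiaoming'), ('age', 30), ('tel', 10086)]

# tolist() replaces list(t2.index) and list(t2.values)
labels = t2.index.tolist()
values = t2.tolist()

# to_numpy() returns the underlying data as a numpy.ndarray
arr = t2.to_numpy()
print(type(arr).__name__, arr.dtype)   # ndarray object

# Membership tests on a Series check the index, not the values
print("age" in t2, 30 in t2)           # True False
print(30 in t2.values)                 # True
```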

4. Sorting a Series