A Detailed Introduction to Pandas (Part 1)

Latest recommended article published on 2023-07-24 20:57:48

qq_40697046


Article link: https://blog.csdn.net/qq_40697046/article/details/121177735


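This article is an introductory pandas tutorial. It covers the basic concepts of pandas and its data types Series and DataFrame, including how to create, slice, and index them and how to read external data. It explains how to sort a DataFrame, how to select rows and columns, how to use loc and iloc, and how boolean indexing and string methods work. It also discusses handling missing data and common statistical methods. pandas is an important tool for data mining and analysis in Python.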


Basic introduction to pandas

Why learn pandas

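NumPy handles numerical data well, but that is often not enough. Besides numbers, real data frequently contains strings, time series, and other non-numeric types.
For example: data collected by a web crawler and stored in a database.
For example: the earlier YouTube example, which contained country information, video category (tag) information, and titles in addition to numeric fields.

So NumPy helps us process numbers, while pandas (which is built on NumPy) handles numbers and other types of data as well.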

What is pandas?

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

Common pandas data types

Series: a one-dimensional labeled array

Creating a Series

import numpy as np
import pandas as pd

# Create a Series with an explicit index; the index argument sets the labels
t = pd.Series(np.arange(10), index=list("abcdefghij"))
t
# Result:
# a    0
# b    1
# c    2
# d    3
# e    4
# f    5
# g    6
# h    7
# i    8
# j    9
# dtype: int32
# (the integer dtype depends on the platform and NumPy version; int64 is common elsewhere)

# Check the type
type(t)
# Result:
# <class 'pandas.core.series.Series'>

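Slicing and indexing a Series

Slicing: pass start, end, and optionally a step inside the square brackets.
Indexing: for a single element, pass its position or its index label; for several elements, pass a list of positions or a list of index labels.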

t
# Result:
# a    0
# b    1
# c    2
# d    3
# e    4
# f    5
# g    6
# h    7
# i    8
# j    9
# dtype: int32

# Slicing by position: start 2, end 10 (exclusive), step 2
t[2:10:2]
# Result:
# c    2
# e    4
# g    6
# i    8
# dtype: int32

# Indexing by a single position.
# Note: recent pandas versions deprecate integer positions inside [] on a
# label-indexed Series; t.iloc[1] is the explicit equivalent.
t[1]
# Result:
# 1

# Indexing by a list of positions (t.iloc[[2, 3, 6]] is the explicit form)
t[[2,3,6]]
# Result:
# c    2
# d    3
# g    6
# dtype: int32

# Boolean indexing: keep only the elements greater than 4
t[t>4]
# Result:
# f    5
# g    6
# h    7
# i    8
# j    9
# dtype: int32

# Indexing by a single label
t["f"]
# Result:
# 5

# Indexing by a list of labels
t[["a","f","g"]]
# Result:
# a    0
# f    5
# g    6
# dtype: int32
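Plain square brackets accept both positions and labels, which can be ambiguous. The following is a minimal sketch of the same lookups written with the explicit accessors iloc (by position) and loc (by label). It rebuilds the Series t from the examples above; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Same Series as in the examples above
t = pd.Series(np.arange(10), index=list("abcdefghij"))

# Position-based access with iloc
second = t.iloc[1]             # element at position 1
picked = t.iloc[[2, 3, 6]]     # elements at positions 2, 3 and 6
stepped = t.iloc[2:10:2]       # slice by position; the end is exclusive

# Label-based access with loc
f_value = t.loc["f"]           # element labelled "f"
by_label = t.loc[["a", "f", "g"]]
span = t.loc["c":"e"]          # label slices include both endpoints

print(second, f_value)
print(list(picked.index), list(span.index))
```

One design point worth noting: a position slice excludes its end, while a label slice such as t.loc["c":"e"] includes the "e" element.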

Index and values of a Series

t.index
# Result:
# Index(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'], dtype='object')

t.values
# Result:
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

type(t.index)
# Result:
# <class 'pandas.core.indexes.base.Index'>

type(t.values)
# Result:
# <class 'numpy.ndarray'>

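A Series is essentially built from two arrays: one holds the keys (the index) and the other holds the values, forming a key -> value mapping.
Many ndarray methods can also be used on a Series, for example argmax and clip.
A Series also has a where method, but it behaves differently from NumPy's where: Series.where keeps the values where the condition is True and replaces the others with NaN.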
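A short sketch of the three methods mentioned above, using the same Series t. The comparison with np.where is added here for illustration; it is applied to t.values so that it works on a plain ndarray.

```python
import numpy as np
import pandas as pd

t = pd.Series(np.arange(10), index=list("abcdefghij"))

# ndarray-style methods work directly on a Series
pos_of_max = t.argmax()     # integer position of the largest value
clipped = t.clip(3, 6)      # values below 3 become 3, values above 6 become 6

# Series.where keeps values where the condition is True and replaces
# the rest with NaN, so the result is converted to float
kept = t.where(t > 4)

# np.where with a single argument returns the positions where the
# condition holds instead of a masked copy
positions = np.where(t.values > 4)[0]

print(pos_of_max)
print(clipped.tolist())
print(kept.tolist())
print(positions.tolist())
```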

Reading external data with pandas

When the data is stored in a CSV file, it can be read directly with pd.read_csv

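# coding=utf-8
import pandas as pd

# Read the CSV file with pandas
df = pd.read_csv("./dogNames2.csv")

# Select the names used more than 800 times and fewer than 1000 times.
# Both conditions must hold, so they are combined with & (and), not | (or);
# each condition needs its own parentheses.
print(df[(800 < df["Count_AnimalName"]) & (df["Count_AnimalName"] < 1000)])

The original version of this script combined the two conditions with | instead of &. Every count is either greater than 800 or less than 1000, so that version matched every row, which is why the output below lists all 16220 rows:

      Row_Labels  Count_AnimalName
0              1                 1
1              2                 2
2          40804                 1
3          90201                 1
4          90203                 1
...          ...               ...
16215      37916                 1
16216      38282                 1
16217      38583                 1
16218      38948                 1
16219      39743                 1

[16220 rows x 2 columns]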
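To make the difference between "and" and "or" filters concrete, here is a minimal sketch that reads a small CSV from an in-memory string. The rows are made up for illustration and are not taken from dogNames2.csv; only the column names match.

```python
import io
import pandas as pd

# Hypothetical stand-in for dogNames2.csv (made-up rows, same column names)
csv_text = """Row_Labels,Count_AnimalName
BELLA,1195
MAX,1153
ROCKY,823
LUCKY,723
COCO,852
REX,12
"""
df = pd.read_csv(io.StringIO(csv_text))

# Each condition needs its own parentheses; & means "and", | means "or"
counts = df["Count_AnimalName"]
in_range = df[(counts > 800) & (counts < 1000)]
either = df[(counts > 800) | (counts < 1000)]

print(in_range)
print(len(either) == len(df))  # the "or" version matches every row
```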

DataFrame: two-dimensional, a container of Series

The pandas DataFrame

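A DataFrame object has both a row index and a column index.
Row index: identifies the individual rows; it is called index and corresponds to axis 0 (axis=0).

Column index: identifies the individual columns; it is called columns and corresponds to axis 1 (axis=1).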
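As a rough illustration of the two indexes and the two axes, the sketch below builds a small frame with made-up labels and aggregates along each axis.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x3 frame with explicit row and column labels
df = pd.DataFrame(np.arange(6).reshape(2, 3),
                  index=["r0", "r1"], columns=["x", "y", "z"])

print(df.index.tolist())    # row labels (axis 0)
print(df.columns.tolist())  # column labels (axis 1)

# axis=0 aggregates down the rows, giving one result per column
col_sums = df.sum(axis=0)
# axis=1 aggregates across the columns, giving one result per row
row_sums = df.sum(axis=1)

print(col_sums.tolist(), row_sums.tolist())
```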

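How are DataFrame and Series related?

1. A DataFrame is an ordered collection of Series objects.

2. A DataFrame can be seen as a two-dimensional array that has a row index and a column index.

3. A DataFrame can be seen as a special dictionary that maps each column label to a Series.

4. Common ways to create a DataFrame

a. From a dictionary of Series objects.
b. From a list of dictionaries.
c. From a two-dimensional NumPy array.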
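The three creation methods listed above can be sketched as follows. All names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# a. From a dict of Series: keys become column names, indexes are aligned
s_dict = {
    "name": pd.Series(["xiaoming", "xiaogang"], index=["a", "b"]),
    "age": pd.Series([20, 32], index=["a", "b"]),
}
df_a = pd.DataFrame(s_dict)

# b. From a list of dicts: each dict is one row; missing keys become NaN
rows = [{"name": "xiaoming", "age": 20}, {"name": "xiaogang"}]
df_b = pd.DataFrame(rows)

# c. From a 2-D NumPy array, with optional row and column labels
df_c = pd.DataFrame(np.arange(12).reshape(3, 4),
                    index=list("abc"), columns=list("WXYZ"))

# A DataFrame behaves like a dict of columns: each column is a Series
print(type(df_a["age"]))
print(df_b)
print(df_c.shape)
```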

Building a DataFrame from dictionaries and MongoDB data

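# coding=utf-8
from pymongo import MongoClient
import pandas as pd

# Connect to the default local MongoDB instance and read the "tv1"
# collection of the "douban" database
client = MongoClient()
collection = client["douban"]["tv1"]
data = collection.find()

# Flatten each document into a plain dict holding only the fields we need
data_list = []
for i in data:
    temp = {}
    temp["info"] = i["info"]
    temp["rating_count"] = i["rating"]["count"]
    temp["rating_value"] = i["rating"]["value"]
    temp["title"] = i["title"]
    temp["country"] = i["tv_category"]
    temp["directors"] = i["directors"]
    temp["actors"] = i["actors"]
    data_list.append(temp)
# t1 = data[0]
# t1 = pd.Series(t1)
# print(t1)

# A list of dicts becomes a DataFrame with one row per dict
df = pd.DataFrame(data_list)
# print(df)

# Show the first row
print(df.head(1))
# print("*"*100)
# print(df.tail(2))

# Show an overview of df
# print(df.info())
# print(df.describe())

# Split the "info" string of every row on "/" and collect the pieces as a list of lists
print(df["info"].str.split("/").tolist())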

Basic attributes of a DataFrame

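A minimal sketch of commonly used DataFrame attributes and overview methods, shown on a small made-up frame. Which attributes to highlight is an assumption here; head, tail, info, and describe are the ones that also appear in the MongoDB example above.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({"title": ["A", "B", "C"], "rating": [8.1, 7.4, 9.0]})

print(df.shape)     # (number of rows, number of columns)
print(df.ndim)      # number of dimensions, 2 for a DataFrame
print(df.dtypes)    # dtype of each column
print(df.index)     # row index
print(df.columns)   # column index
print(df.values)    # underlying data as a NumPy array

print(df.head(2))   # first 2 rows
print(df.tail(1))   # last row
df.info()           # prints row count, column dtypes, and memory usage
print(df.describe())  # summary statistics for the numeric columns
```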

Selecting rows or columns

Sorting a DataFrame

# coding=utf-8
import pandas as pd

df = pd.read_csv("./dogNames2.csv")

# print(df.head())
# print("*"*50)
# print(df.info())
# print("*"*50)

# Sort the DataFrame by count, largest first
df = df.sort_values(by="Count_AnimalName",ascending=False)
print(df.head(5))
print("*"*50)

# Things to note when selecting rows or columns in pandas
# - a slice inside the square brackets selects rows
# - a string inside the square brackets selects a column by its label
print(df[:20])
print("*"*50)

# Select one specific column
print(df["Row_Labels"])
print("*"*50)

# A single column is returned as a Series
print(type(df["Row_Labels"]))
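The sorting and selection rules above can be tried without the CSV file. The sketch below uses a few made-up rows with the same column names as dogNames2.csv; the variable names are illustrative.

```python
import pandas as pd

# Made-up rows with the same column names as dogNames2.csv
df = pd.DataFrame({
    "Row_Labels": ["REX", "BELLA", "COCO", "MAX"],
    "Count_AnimalName": [12, 1195, 852, 1153],
})

# Sort by count, largest first; the original row labels are kept
df_sorted = df.sort_values(by="Count_AnimalName", ascending=False)

top2 = df_sorted[:2]                 # slice -> rows, by position
names = df_sorted["Row_Labels"]      # string -> one column, returned as a Series
both = df_sorted[:2]["Row_Labels"]   # rows first, then a column

print(top2)
print(type(names))
print(both.tolist())
```

Because sort_values keeps the original index labels, the sorted frame's index is no longer 0, 1, 2, ...; slicing with [:2] still takes the first two rows by position.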
