Pandas: A Summary of Common Questions

celine0227

Category: Machine Learning. Tags: python, R, pycharm

Article link: https://blog.csdn.net/celine0227/article/details/119876977

I. Series

1. Creating a Series

(1) From a list

In [1]: import pandas as pd

In [2]: list_a = [2,4,5,6]

In [3]: pd.Series(list_a)
Out[3]:
0    2
1    4
2    5
3    6
dtype: int64

(2) From a dictionary

In [5]: pd.Series({'a':1,'b':3})
Out[5]:
a    1
b    3
dtype: int64
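# If a label passed in index is also a key of the dict, it keeps the dict's value; a label with no matching key gets NaN, which also turns the dtype into float64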
In [11]: pd.Series({'a':1,'b':3},index = ['b','a','c'])
Out[11]:
b    3.0
a    1.0
c    NaN
dtype: float64
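
The alignment rule above can be checked directly. The following is a minimal sketch, assuming a small hypothetical dict named `scores`; it locates the label that had no matching key and then fills the gap so the integer dtype can be restored.

```python
import pandas as pd

scores = {'a': 1, 'b': 3}  # hypothetical sample data

# Labels found in the dict keep their values; 'c' has no key, so it becomes NaN.
ser = pd.Series(scores, index=['b', 'a', 'c'])

# NaN forces a float64 dtype; isna() locates the unmatched label.
missing_labels = ser.index[ser.isna()].tolist()

# Filling the gap and casting back restores an integer dtype.
filled = ser.fillna(0).astype('int64')

print(missing_labels)
print(filled)
```

Whether 0 is a sensible fill value depends on the data; it is used here only for illustration.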

(3) From an array

import numpy as np
import pandas as pd
 
data = np.array([1, 2, 3])
ser = pd.Series(data)  # an ndarray can be passed directly; converting with .tolist() first is not required

(4) Other

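# range() function (the integer dtype printed in the outputs below depends on the platform and library versions; many setups print int64)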
In [12]: pd.Series(range(3))
Out[12]:
0    0
1    1
2    2
dtype: int32
# ndarray: both the index and the data can be created from ndarrays
In [9]: list_b = np.arange(6)
In [10]: pd.Series(list_b)
Out[10]:
0    0
1    1
2    2
3    3
4    4
5    5
dtype: int32

2. Series characteristics

(1) A Series has two parts: index and values

In [14]: a = pd.Series({'a':1,'b':5})

In [15]: a.index
Out[15]: Index(['a', 'b'], dtype='object')

In [16]: a.values  # returns the values as a NumPy ndarray
Out[16]: array([1, 5], dtype=int64)

(2) Converting a Series to an array

data2 = pd.Series([1, 2, 3])
data2.values  # the underlying values as a NumPy ndarray
out
array([1, 2, 3])
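
Besides `.values`, a Series also offers `to_numpy()` and `tolist()`. The sketch below reuses the `data2` name from the snippet above and shows both conversions side by side; it is one possible way to do this, not the only one.

```python
import numpy as np
import pandas as pd

data2 = pd.Series([1, 2, 3])

# to_numpy() returns a NumPy ndarray.
arr = data2.to_numpy()

# tolist() returns a plain Python list of Python scalars.
as_list = data2.tolist()

print(type(arr), arr)
print(type(as_list), as_list)
```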

(3) Accessing Series values

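# The automatic (positional) index and the custom (label) index coexist, but the two cannot be mixed in one lookup
In [17]: a[0]  # automatic index; recent pandas versions deprecate integer positions here in favor of a.iloc[0]
Out[17]: 1
# Custom index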
In [18]: a['a']
Out[18]: 1
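
To keep positional and label access unambiguous, the explicit accessors `.iloc` and `.loc` can be used. A minimal sketch, reusing the Series `a` defined earlier in this section:

```python
import pandas as pd

a = pd.Series({'a': 1, 'b': 5})

# .iloc always interprets its argument as an integer position.
first_by_position = a.iloc[0]
last_by_position = a.iloc[-1]

# .loc always interprets its argument as an index label.
first_by_label = a.loc['a']

print(first_by_position, last_by_position, first_by_label)
```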

3. A Series can be modified at any time, and the change takes effect immediately

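# Assumed setup (not shown in the original session): a must hold three values so that the new index fits
In [31]: a = pd.Series([1, 3, 5])

In [32]: a.index = ['c','d','e']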

In [33]: a
Out[33]:
c    1
d    3
e    5
dtype: int64
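
The sketch below illustrates in-place modification under one assumption: the Series holds three values, matching the output shown above. It also shows that a replacement index of the wrong length is rejected with a `ValueError`.

```python
import pandas as pd

a = pd.Series([1, 3, 5])  # assumed three-element Series

# Replacing the index takes effect immediately.
a.index = ['c', 'd', 'e']

# Individual values can be overwritten by label.
a['d'] = 30

# A new index must have the same length as the data.
try:
    a.index = ['x', 'y']
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(a)
print(type(mismatch_error).__name__)
```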

II. DataFrame

1. Creating a DataFrame

(1) Creating an empty DataFrame

import pandas as pd
df = pd.DataFrame()
print(df)
out
Empty DataFrame 
Columns: [] 
Index: []
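
An empty DataFrame is often used as a starting point that is filled in later. A minimal sketch, using a hypothetical `Name` column:

```python
import pandas as pd

df = pd.DataFrame()

# .empty reports whether the frame has no rows or no columns.
was_empty = df.empty

# Assigning a list to a new column creates the rows and a default integer index.
df['Name'] = ['Alex', 'Bob']

print(was_empty)
print(df)
```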

(2) From a list

One-dimensional list

data = [1,2,3,4,5]
df = pd.DataFrame(data) # the list becomes a single column
print(df)
out
   0
0  1
1  2
2  3
3  4
4  5

Two-dimensional list

data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age']) # the outer list maps to rows and each inner list to columns, giving 3 rows and 2 columns; columns sets the column labels
print(df)
out
     Name  Age
0    Alex   10
1     Bob   12
2  Clarke   13

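# Setting the data type
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age']).astype({'Age': float}) # cast only the numeric column to float; recent pandas versions raise an error when dtype=float is passed for data that contains strings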
print(df)
out
     Name   Age
0    Alex  10.0
1     Bob  12.0
2  Clarke  13.0

(3) From dictionaries

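data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}] # the list is the first dimension (rows); each dict holds the column values of one row
df = pd.DataFrame(data) # the first row has no value for column c, so NaN (Not a Number) is filled in automatically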
print(df)
out
   a   b     c
0  1   2   NaN
1  5  10  20.0
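
The missing entries produced this way can be counted and, if appropriate, filled. A minimal sketch using the same `data` as above; the fill value 0 is an assumption chosen only for illustration.

```python
import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)

# Count the NaN entries per column.
missing_per_column = df.isna().sum().to_dict()

# Fill only column 'c'; the other columns are left untouched.
df_filled = df.fillna({'c': 0})

print(missing_per_column)
print(df_filled)
```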

Selecting specific column labels

data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]

#With two column indices, values same as dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b']) # both labels exist in data, so only those columns are taken

#With two column indices with one index with other name
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1']) # b1 does not exist in data, so a b1 column filled with NaN is added
print(df1)
print(df2)
out
        a   b
first   1   2
second  5  10
        a  b1
first   1 NaN
second  5 NaN

(4) From Series

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
# each index must have the same length as its Series
# each dict key becomes a column label, and its pd.Series supplies the values of that column
df = pd.DataFrame(d)
print(df)
out
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4
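
The result above follows from index alignment: the row labels are the union of the two Series indexes, and a column gets NaN wherever its Series has no such label. A minimal sketch that inspects this on the same dict `d`:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# The row labels are the union of both indexes.
row_labels = df.index.tolist()

# 'one' has no label 'd', so that cell is NaN and the column becomes float64.
d_one_is_missing = pd.isna(df.loc['d', 'one'])
dtypes = df.dtypes.astype(str).to_dict()

print(row_labels)
print(d_one_is_missing)
print(dtypes)
```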

2. Selecting by rows and columns

import numpy as np
from pandas import DataFrame
import pandas as pd
 
df=DataFrame(np.arange(12).reshape((3,4)),index=['one','two','thr'],columns=list('abcd'))
df['a']  # select column a
df[['a','b']]  # select columns a and b

By numeric position

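# .ix accepted both integer positions and index/column labels, but it was deprecated in pandas 0.20 and removed in 1.0; the equivalents below use .iloc (positions) and .loc (labels)
df.iloc[0]  # row 0
df.iloc[0:1]  # row 0 (as a DataFrame)
df.loc['one':'two']  # rows one and two
df.iloc[0:2,0]  # rows 0 and 1, column 0
df['a'].iloc[0:1]  # row 0, column a
df.iloc[0:2].loc[:,'a':'c']  # rows 0 and 1, columns a, b and c
df.loc['one':'two','a':'c']  # rows one and two, columns a, b and c
df.iloc[0:2,0:1]  # rows 0 and 1, column 0
df.iloc[0:2,0:2]  # rows 0 and 1, columns 0 and 1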
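
Because `.ix` is no longer available in pandas 1.0 and later, selections that mix positions and labels need one side translated first. The following is a minimal sketch of two ways to do that on the same `df`; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape((3, 4)),
                  index=['one', 'two', 'thr'], columns=list('abcd'))

# Positions for rows, labels for columns: translate the labels to positions first.
col_positions = df.columns.get_indexer(['a', 'c'])
rows_by_pos_cols_by_label = df.iloc[0:2, col_positions]

# Alternatively, translate the row positions to labels and use .loc throughout.
same_rows_label_slice = df.loc[df.index[0:2], 'a':'c']

print(rows_by_pos_cols_by_label)
print(same_rows_label_slice)
```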

# iat gets a single value, by integer position only
df.iat[1,1]  # row 1, column 1
# at gets a single value, by index and column labels only
df.at['one','a']  # row one, column a
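
Both accessors can also be used to write a single cell. A minimal sketch on the same `df`; the value 100 is arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape((3, 4)),
                  index=['one', 'two', 'thr'], columns=list('abcd'))

# Read single cells by position and by label.
value_by_position = df.iat[1, 1]
value_by_label = df.at['one', 'a']

# Write a single cell by label, then read it back by position.
df.at['thr', 'd'] = 100
updated_value = df.iat[2, 3]

print(value_by_position, value_by_label, updated_value)
```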

By row and column labels (using index and columns only)

df.loc['one','a']  # row one, column a
df.loc['one':'two','a']  # rows one through two, column a
df.loc['one':'two','a':'c']  # rows one through two, columns a through c
df.loc['one':'two',['a','c']]  # rows one through two, columns a and c

By integer position only

df.iloc[0:2]  # first 2 rows
df.iloc[0]  # row 0
df.iloc[0:2,0:2]  # rows 0 and 1, columns 0 and 1
df.iloc[[0,2],[1,2,3]]  # rows 0 and 2, columns 1, 2 and 3
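
One difference between the two accessors is easy to miss: label slices with `.loc` include both endpoints, while position slices with `.iloc` exclude the stop position, as ordinary Python slices do. A minimal sketch comparing the two on the same `df`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape((3, 4)),
                  index=['one', 'two', 'thr'], columns=list('abcd'))

# Label slice: 'two' and 'c' are both included.
by_label = df.loc['one':'two', 'a':'c']

# Position slice: stops before row 2 and before column 3.
by_position = df.iloc[0:2, 0:3]

# Lists of positions pick out non-adjacent rows and columns.
picked = df.iloc[[0, 2], [1, 2, 3]]

print(by_label.equals(by_position))
print(picked)
```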
