Pandas Basics: An Introduction


qq_43592077


Article link: https://blog.csdn.net/qq_43592077/article/details/106946752


I. Object Creation

1. Creating a Series

A Series is a one-dimensional array-like object made up of a sequence of values and an associated set of data labels (its index).
Creating a Series by passing a list of values, letting pandas create a default integer index:

import numpy as np
import pandas as pd
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
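Besides letting pandas generate the default integer index, you can pass index labels yourself or build the Series from a dict, whose keys become the labels. A minimal sketch (the labels 'a', 'b', 'c' are arbitrary examples, not from the original):

```python
import numpy as np
import pandas as pd

# Default integer index 0..n-1, as in the example above
s = pd.Series([1, 3, 5, np.nan, 6, 8])

# Explicit labels passed through the index parameter
s_labeled = pd.Series([1, 3, 5], index=['a', 'b', 'c'])

# From a dict: the keys become the index labels, in insertion order
s_from_dict = pd.Series({'a': 1, 'b': 3, 'c': 5})

print(s_labeled)
print(s_from_dict.index.tolist())  # ['a', 'b', 'c']
```

Note that the NaN in s forces a float64 dtype, which is why the output above shows 1.0, 3.0 and so on instead of integers.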

2. Creating a DataFrame

Creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns:

dates = pd.date_range('20130101', periods=6)
dates

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df

                  A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
2013-01-06 -0.673690  0.113648 -1.478427  0.524988

Creating a DataFrame by passing a dict of objects that can be converted into a series-like structure:

df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})
df2

     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo

The columns of the resulting DataFrame have different dtypes.

df2.dtypes

A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object

3. Basic attributes and an overview of a DataFrame

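The figure that originally illustrated this section is missing. As a rough substitute, the sketch below shows attributes and methods commonly used to get an overview of a DataFrame; which of them the original figure covered is not known. It rebuilds the random df from above, so only the shapes and labels are predictable, not the numbers.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

print(df.shape)             # (rows, columns) -> (6, 4)
print(df.ndim)              # number of axes -> 2
print(df.size)              # total number of elements -> 24
print(df.dtypes)            # dtype of each column (all float64 here)
print(df.to_numpy().shape)  # underlying data as a NumPy array -> (6, 4)
df.info()                   # index range, columns, non-null counts, dtypes, memory usage
print(df.describe())        # count, mean, std, min, quartiles and max per numeric column
```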

II. Viewing Data

df.head()  # first 5 rows by default

                A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401

df.tail(3)

                A         B         C         D
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
2013-01-06 -0.673690  0.113648 -1.478427  0.524988

Display the index and columns:

df.index

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')
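head() and tail() both take an optional row count n, which defaults to 5. A small sketch on a hypothetical frame with predictable values (not from the original), so the selected rows are easy to verify:

```python
import pandas as pd

# Hypothetical 10-row frame: x = 0..9, y = x squared
df = pd.DataFrame({'x': range(10), 'y': [v * v for v in range(10)]})

print(df.head())    # rows 0-4 (n defaults to 5)
print(df.head(3))   # rows 0-2
print(df.tail(2))   # rows 8-9
print(df.index)     # RangeIndex(start=0, stop=10, step=1)
print(df.columns)   # the column labels 'x' and 'y'
```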

III. Applying NumPy Array Operations to a Series

1. NumPy operations and functions work on Series objects

In [26]: series04
Out[26]:
20071001       6789.98
20071002      34556.89
20071003    3748758.88

In [27]: series04[series04>10000]
Out[27]:
20071002      34556.89
20071003    3748758.88
dtype: float64

In [28]: series04/100
Out[28]:
20071001       67.8998
20071002      345.5689
20071003    37487.5888
dtype: float64

In [29]: series01
Out[29]:
0    1
1    2
2    3
3    4
dtype: int32

In [30]: np.exp(series01)
Out[30]:
0     2.718282
1     7.389056
2    20.085537
3    54.598150
dtype: float64
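The transcript above does not show how series04 and series01 were created. The sketch below rebuilds them from the displayed output so the operations can run end to end. The integer index labels for series04 are an assumption (the original may have used strings), and series01 may print as int64 rather than int32 depending on the platform.

```python
import numpy as np
import pandas as pd

# Values copied from the output above; integer index labels are assumed
series04 = pd.Series([6789.98, 34556.89, 3748758.88],
                     index=[20071001, 20071002, 20071003])
series01 = pd.Series([1, 2, 3, 4])

big = series04[series04 > 10000]  # boolean mask; surviving rows keep their labels
scaled = series04 / 100           # arithmetic is applied element-wise
exp = np.exp(series01)            # a NumPy ufunc returns a Series with the same index

print(big)
print(scaled)
print(exp)
```

In every case the index travels with the values, which is the main difference from running the same operations on a plain NumPy array.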

2. Series operations resemble those on a Python dict

• Access by custom index label
• Membership testing with the keyword in
• The .get() method

In [18]: b=pd.Series([9,8,7,6],index=list('abcd'))

In [19]: b['b']
Out[19]: 8

In [20]: 'c' in b
Out[20]: True

In [23]: b.get('f',100)
Out[23]: 100

In [25]: b.get('c',100)
Out[25]: 7
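As with a dict, the in operator checks the index labels, not the values. A short sketch reusing the Series b from above:

```python
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=list('abcd'))

print(b['b'])           # access by label, like d['b'] on a dict -> 8
print('c' in b)         # in tests the index labels -> True
print(8 in b)           # 8 is a value, not a label -> False
print(8 in b.values)    # test the values explicitly -> True
print(b.get('f', 100))  # missing label returns the default -> 100
print(list(b.items()))  # (label, value) pairs, like dict.items()
```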

3. Detecting missing values in a Series

The isnull and notnull functions in pandas

In [31]: score=pd.Series({'Tom':89,'John':88,'Merry':96,'Max':65})

In [32]: score
Out[32]:
Tom      89
John     88
Merry    96
Max      65
dtype: int64

In [33]: new_index=['Tom','Max','Joe','John','Merry']

In [34]: scores = pd.Series(score,index=new_index)

In [35]: scores
Out[35]:
Tom      89.0
Max      65.0
Joe       NaN
John     88.0
Merry    96.0
dtype: float64

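Because 'Joe' has no entry in score, his value becomes NaN, which also turns the dtype into float64. The pandas functions isnull and notnull detect such missing values in a Series; both return a Boolean Series.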

In [37]: pd.isnull(scores)
Out[37]:
Tom      False
Max      False
Joe       True
John     False
Merry    False
dtype: bool

In [38]: pd.notnull(scores)
Out[38]:
Tom       True
Max       True
Joe      False
John      True
Merry     True
dtype: bool

In [39]: scores[pd.isnull(scores)]
Out[39]:
Joe   NaN
dtype: float64

In [40]: scores[pd.notnull(scores)]
Out[40]:
Tom      89.0
Max      65.0
John     88.0
Merry    96.0
dtype: float64
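Series also provide method versions of these checks (isnull/isna and notnull/notna), plus helpers such as dropna and fillna for dealing with the missing entries. A minimal sketch rebuilding scores from the example above:

```python
import pandas as pd

score = pd.Series({'Tom': 89, 'John': 88, 'Merry': 96, 'Max': 65})
# 'Joe' is not in score, so his value is NaN after reindexing
scores = pd.Series(score, index=['Tom', 'Max', 'Joe', 'John', 'Merry'])

mask = scores.isnull()                  # method form of pd.isnull(scores)
print(mask.equals(pd.isnull(scores)))   # True
print(scores.isna().sum())              # number of missing values -> 1
print(scores.dropna())                  # same rows as scores[pd.notnull(scores)]
print(scores.fillna(0))                 # replace NaN with 0
```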