Pandas DataFrame in Python

A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects, where each key is a column label and each value is a Series.
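As a minimal sketch of the two ideas above (the dict-like container and label alignment), the following uses small made-up Series; the names s1, s2, container and total are illustrative and not part of the original example.

```python
import pandas as pd

# Two Series with partially overlapping index labels (illustrative data)
s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
s2 = pd.Series([10.0, 20.0, 30.0], index=['b', 'c', 'd'])

# A dict of Series becomes a DataFrame: the keys become column labels
# and the row index is the union of the two Series indexes
container = pd.DataFrame({'left': s1, 'right': s2})
print(container)

# Arithmetic aligns on labels, not on position: 'a' and 'd' appear in
# only one operand, so the result at those labels is NaN
total = s1 + s2
print(total)
```

Labels present in only one operand produce missing values rather than an error, which is worth keeping in mind when combining data from different sources.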

Syntax:

    class pandas.DataFrame(
        data=None, 
        index=None, 
        columns=None, 
        dtype=None, 
        copy=False
        )
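To make the constructor parameters concrete, here is a small sketch that passes data, index, columns and dtype by keyword. The values and labels are made up for illustration.

```python
import pandas as pd

# data is given as a list of rows; index and columns supply the labels
scores = pd.DataFrame(
    data=[[1, 2], [3, 4], [5, 6]],
    index=['r1', 'r2', 'r3'],
    columns=['c1', 'c2'],
    dtype='float64',   # force every column to float64
)
print(scores)
print(scores.dtypes)
```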

Example: creating a DataFrame

import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(101)

df = pd.DataFrame(randn(5, 4), index=['A', 'B', 'C', 'D', 'E'], columns=['W', 'X', 'Y', 'Z'])
print(df)

Output:

          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509

In the above example, each column is a Series, and the row labels (A to E) form the index shared by all of the columns.

To select a single column, index the DataFrame with the column label in square brackets:

print(df['W'])
'''
Output:
A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64
'''

print(type(df['W']))
'''
Output:
<class 'pandas.core.series.Series'>
'''

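The output above confirms that a DataFrame is a collection of Series sharing a common index. A column can also be retrieved with attribute (dot) access, similar to the table.column notation in SQL. This is the less preferred way, because it only works when the column label is a valid Python identifier and does not collide with an existing DataFrame attribute or method.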

print(df.W)

'''
Output:
A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64
'''
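Attribute access is generally considered less robust than bracket access, because a column label may contain characters that are not valid in a Python name or may clash with an existing DataFrame method. The sketch below illustrates both cases with a made-up frame; the name demo and its column labels are assumptions for illustration only.

```python
import pandas as pd

# Illustrative frame: one label contains a space, the other shadows a method
demo = pd.DataFrame({'unit price': [2.5, 4.0], 'count': [3, 1]})

# Bracket access always returns the column as a Series
print(demo['unit price'])
print(demo['count'])

# demo.count is the DataFrame.count method, not the 'count' column
print(callable(demo.count))   # True

# 'demo.unit price' is not valid Python syntax, so only brackets work there
```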

To get multiple columns from the DataFrame, pass a list of column labels:

print(df[['W','X']])
'''
Output:
          W         X
A  2.706850  0.628133
B  0.651118 -0.319318
C -2.018168  0.740122
D  0.188695 -0.758872
E  0.190794  1.978757
'''
# list('WX') expands to ['W', 'X']; this shortcut only works for single-character labels
print(df[list('WX')])

'''
Output:
          W         X
A  2.706850  0.628133
B  0.651118 -0.319318
C -2.018168  0.740122
D  0.188695 -0.758872
E  0.190794  1.978757
'''

To create a new column in a DataFrame, assign to a label that does not exist yet:

df['new'] = df['X']+df['Y']
print(df)

'''
Output:
          W         X         Y         Z       new
A  2.706850  0.628133  0.907969  0.503826  1.536102
B  0.651118 -0.319318 -0.848077  0.605965 -1.167395
C -2.018168  0.740122  0.528813 -0.589001  1.268936
D  0.188695 -0.758872 -0.933237  0.955057 -1.692109
E  0.190794  1.978757  2.605967  0.683509  4.584725
'''
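Assigning with df['new'] = ... modifies the DataFrame in place. When the original should stay unchanged, DataFrame.assign is one alternative; the sketch below uses a small made-up frame named base rather than the df from the running example.

```python
import pandas as pd

base = pd.DataFrame({'X': [1.0, 2.0], 'Y': [10.0, 20.0]}, index=['A', 'B'])

# assign() returns a new DataFrame with the extra column; base is untouched
extended = base.assign(new=base['X'] + base['Y'])
print(extended)
print(base.columns.tolist())   # ['X', 'Y']
```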

To remove a column from a DataFrame, use drop with axis=1:

# drop() returns a new DataFrame; df itself is not modified
df.drop('W', axis=1)
print(df)
'''
Output:
          W         X         Y         Z       new
A  2.706850  0.628133  0.907969  0.503826  1.536102
B  0.651118 -0.319318 -0.848077  0.605965 -1.167395
C -2.018168  0.740122  0.528813 -0.589001  1.268936
D  0.188695 -0.758872 -0.933237  0.955057 -1.692109
E  0.190794  1.978757  2.605967  0.683509  4.584725
'''

df = df.drop('W', axis=1)
print(df)
'''
Output:
          X         Y         Z       new
A  0.628133  0.907969  0.503826  1.536102
B -0.319318 -0.848077  0.605965 -1.167395
C  0.740122  0.528813 -0.589001  1.268936
D -0.758872 -0.933237  0.955057 -1.692109
E  1.978757  2.605967  0.683509  4.584725
'''

# inplace=True modifies df directly instead of returning a new DataFrame
df.drop('X', axis=1, inplace=True)
print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
E  2.605967  0.683509  4.584725
'''

To remove a row from the DataFrame, use drop with axis=0:

df.drop('E', axis=0, inplace=True)
print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''
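drop also accepts the columns= and index= keywords, which state the intent without an axis number. The following is a small sketch on a made-up frame (table, no_w and no_a are illustrative names).

```python
import pandas as pd

table = pd.DataFrame({'W': [1, 2], 'X': [3, 4], 'Y': [5, 6]}, index=['A', 'B'])

# columns= is equivalent to passing the labels with axis=1
no_w = table.drop(columns=['W'])
# index= is equivalent to passing the labels with axis=0
no_a = table.drop(index=['A'])

print(no_w)
print(no_a)
```

Both calls return new DataFrames, so table keeps all of its rows and columns.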

To explain why axis takes the values 0 and 1, we have to look at the shape of the DataFrame.

print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''

print(df.shape)
'''
Output:
(4, 3)
'''

The shape attribute returns a tuple. In the example above, element 0 of the tuple (4) is the number of rows and element 1 (3) is the number of columns. The axis argument uses the same numbering, which is why axis=0 refers to rows and axis=1 refers to columns when dropping a row or column.
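The same axis numbering applies to other DataFrame methods, not only to drop. As a sketch on a made-up frame (grid is an illustrative name), summing along each axis shows which dimension is collapsed:

```python
import pandas as pd

grid = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=['A', 'B'], columns=['X', 'Y', 'Z'])

n_rows, n_cols = grid.shape
print(n_rows, n_cols)   # 2 3

# axis=0 runs down the rows: one result per column
col_sums = grid.sum(axis=0)
# axis=1 runs across the columns: one result per row
row_sums = grid.sum(axis=1)

print(col_sums)
print(row_sums)
```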

Selecting rows in a DataFrame

print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''
# loc selects by label: the argument is the index label of the row
print(df.loc['B'])
'''
Output:
Y     -0.848077
Z      0.605965
new   -1.167395
Name: B, dtype: float64
'''

# iloc selects by integer position (0-based): the argument is the row number
print(df.iloc[1])
'''
Output:
Y     -0.848077
Z      0.605965
new   -1.167395
Name: B, dtype: float64
'''
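The difference between loc and iloc is easiest to see when the index labels are themselves integers that do not match the row positions. This is a minimal sketch with made-up data; the name shuffled is illustrative.

```python
import pandas as pd

# Integer labels deliberately out of order with the row positions
shuffled = pd.DataFrame({'val': [10, 20, 30]}, index=[2, 0, 1])

# loc looks up the row whose label is 0 (the second row, val 20)
print(shuffled.loc[0])
# iloc looks up the row at position 0 (the first row, val 10)
print(shuffled.iloc[0])
```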

Selecting subsets of rows and columns

print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''
# row, column
print(df.loc['C','Y'])
'''
Output: 0.5288134940893595
'''
# pass the list of rows and columns to get the subsets
print(df.loc[['B','C'],['Y','Z']])
'''
Output:
          Y         Z
B -0.848077  0.605965
C  0.528813 -0.589001
'''
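Besides lists of labels, both indexers accept slices. The sketch below uses a made-up frame with the same labels as the example (sub is an illustrative name) to show that label slices include both endpoints while position slices exclude the stop position.

```python
import pandas as pd

sub = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
    index=['A', 'B', 'C', 'D'],
    columns=['Y', 'Z', 'new'],
)

# Label slices with loc include both endpoints
by_label = sub.loc['B':'C', 'Y':'Z']
# Position slices with iloc exclude the stop position, as in plain Python
by_position = sub.iloc[1:3, 0:2]

print(by_label)
print(by_position.equals(by_label))   # True
```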

Translated from: https://www.includehelp.com/python/pandas-dataframe-in-python.aspx
