Pandas DataFrame in Python

cumtv80668

Tags: python, numpy, machine learning, data analysis, indexing

Original article: https://www.includehelp.com/python/pandas-dataframe-in-python.aspx

Python | Pandas DataFrame

A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects.
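The two claims above (label alignment and dict-like behaviour) can be checked with a small sketch. The frames `left` and `right` and their values are hypothetical, chosen only so that their row and column labels partly overlap.

```python
import pandas as pd

# Hypothetical frames whose row and column labels only partly overlap
left = pd.DataFrame({'W': [1, 2], 'X': [3, 4]}, index=['A', 'B'])
right = pd.DataFrame({'X': [10, 20], 'Y': [30, 40]}, index=['B', 'C'])

# Addition matches values by label, not by position; the result uses the
# union of labels, and any cell present in only one operand becomes NaN
total = left + right
print(total)

# Dict-like behaviour: column labels act as keys, each value is a Series
print(list(left.keys()))   # ['W', 'X']
print('W' in left)         # True
print(type(left['W']))     # <class 'pandas.core.series.Series'>
```

Only the cell at row B, column X exists in both frames, so it is the only non-NaN value in `total` (4 + 10 = 14).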

Syntax:

    class pandas.DataFrame(
        data=None, 
        index=None, 
        columns=None, 
        dtype=None, 
        copy=False
        )
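As a rough illustration of the constructor parameters listed above, the sketch below passes `data` as a dict of lists together with `index`, `columns` and `dtype`. The data values are hypothetical; `copy` is left at its default.

```python
import pandas as pd

# data as a dict of lists: the keys become column labels
data = {'W': [1, 2, 3], 'X': [4, 5, 6]}

# index supplies the row labels; dtype forces one dtype for every column
df_params = pd.DataFrame(data, index=['A', 'B', 'C'], dtype='float64')
print(df_params)
print(df_params.dtypes)

# with dict data, columns selects (and orders) which keys are kept
df_subset = pd.DataFrame(data, columns=['X'])
print(df_subset)
```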

Example: creating a DataFrame

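import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(101)

# 5x4 array of standard-normal values; index gives the row labels, columns the column labels
df = pd.DataFrame(randn(5, 4),
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])
print(df)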

Output

          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509

In the above example, each column is a Series, and all columns share the same row index labels (A to E).

To select a single column, index the DataFrame with the column label; the result is a Series:

print(df['W'])
'''
Output:
A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64
'''

print(type(df['W']))
'''
Output:
<class 'pandas.core.series.Series'>
'''

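This shows that a DataFrame is a collection of Series that share a common index. A column can also be retrieved with attribute (dot) access, which resembles SQL-style column references. This form is less preferred because it only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method: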

print(df.W)

'''
Output:
A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64
'''
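Dot access only works for column names that are valid Python identifiers and do not collide with an existing DataFrame attribute or method. The sketch below uses hypothetical column names (`count`, `my col`) picked to show where it breaks down; bracket access works for all of them.

```python
import pandas as pd

# Hypothetical column names chosen to expose the limits of dot access
df_names = pd.DataFrame({'count': [1, 2], 'my col': [3, 4], 'W': [5, 6]})

# Bracket access works for every label
print(df_names['count'])
print(df_names['my col'])

# Dot access works for a plain identifier that is not already an attribute
print(df_names.W)

# df_names.count is the DataFrame.count method, not the column,
# and 'my col' cannot be written with dot syntax at all
print(callable(df_names.count))   # True
```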

To get multiple columns, pass a list of column labels; the result is a DataFrame:

print(df[['W','X']])
'''
Output:
          W         X
A  2.706850  0.628133
B  0.651118 -0.319318
C -2.018168  0.740122
D  0.188695 -0.758872
E  0.190794  1.978757
'''
# list('WX') evaluates to ['W', 'X'], so this is equivalent to df[['W', 'X']]
print(df[list('WX')])

'''
Output:
          W         X
A  2.706850  0.628133
B  0.651118 -0.319318
C -2.018168  0.740122
D  0.188695 -0.758872
E  0.190794  1.978757
'''
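The number of brackets decides the return type: a single label gives a Series, while a list of labels, even a one-element list, gives a DataFrame. A minimal sketch, rebuilding the same `df` as in the example above:

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])

# Single label -> Series; list of labels -> DataFrame
single = df['W']
one_col_frame = df[['W']]

print(type(single).__name__)         # Series
print(type(one_col_frame).__name__)  # DataFrame
print(one_col_frame.shape)           # (5, 1)
```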

To create a new column, assign a Series to a new column label:

df['new'] = df['X']+df['Y']
print(df)

'''
Output:
          W         X         Y         Z       new
A  2.706850  0.628133  0.907969  0.503826  1.536102
B  0.651118 -0.319318 -0.848077  0.605965 -1.167395
C -2.018168  0.740122  0.528813 -0.589001  1.268936
D  0.188695 -0.758872 -0.933237  0.955057 -1.692109
E  0.190794  1.978757  2.605967  0.683509  4.584725
'''
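Assigning with `df['new'] = ...` modifies `df` in place. If the original frame should stay untouched, `DataFrame.assign` is one alternative: it returns a new DataFrame with the extra column. A minimal sketch, rebuilding the original `df`:

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])

# assign returns a NEW DataFrame containing the extra column; df is unchanged
df_with_new = df.assign(new=df['X'] + df['Y'])

print('new' in df.columns)           # False
print('new' in df_with_new.columns)  # True
```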

To remove a column, call drop with axis=1. By default drop returns a new DataFrame and leaves the original unchanged:

# returns a new DataFrame without W; df itself is not modified
df.drop('W', axis=1) 
print(df)
'''
Output:
          W         X         Y         Z       new
A  2.706850  0.628133  0.907969  0.503826  1.536102
B  0.651118 -0.319318 -0.848077  0.605965 -1.167395
C -2.018168  0.740122  0.528813 -0.589001  1.268936
D  0.188695 -0.758872 -0.933237  0.955057 -1.692109
E  0.190794  1.978757  2.605967  0.683509  4.584725
'''

df = df.drop('W', axis=1)
print(df)
'''
Output:
          X         Y         Z       new
A  0.628133  0.907969  0.503826  1.536102
B -0.319318 -0.848077  0.605965 -1.167395
C  0.740122  0.528813 -0.589001  1.268936
D -0.758872 -0.933237  0.955057 -1.692109
E  1.978757  2.605967  0.683509  4.584725
'''

# alternatively, inplace=True modifies df directly (drop then returns None)
df.drop('X', axis=1, inplace=True)
print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
E  2.605967  0.683509  4.584725
'''
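drop also accepts a columns= keyword as an alternative to axis=1, and a list of labels removes several columns in one call. The sketch below rebuilds the original `df` and also confirms that an inplace=True call returns None.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])

# columns= is a keyword alternative to axis=1
no_w = df.drop(columns='W')           # same result as df.drop('W', axis=1)

# a list drops several columns at once
no_wx = df.drop(columns=['W', 'X'])

# inplace=True changes df itself, and the call returns None
result = df.drop(columns='Z', inplace=True)
print(result)            # None
print(list(df.columns))  # ['W', 'X', 'Y']
```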

To remove a row, use axis=0 (the default for drop):

df.drop('E', axis=0, inplace=True)
print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''
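Because axis=0 is the default, the axis argument can be omitted for rows; the index= keyword is the row counterpart of columns=. A minimal sketch on a freshly rebuilt `df`:

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])

# axis=0 is the default, so this removes row E
no_e = df.drop('E')

# index= is the keyword form of axis=0; a list removes several rows
no_de = df.drop(index=['D', 'E'])
print(no_de.shape)   # (3, 4)
```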

To see why axis takes the values 0 and 1, look at the shape of the DataFrame:

print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''

print(df.shape)
'''
Output:
(4, 3)
'''

shape returns a tuple. In the example above, element 0 of the tuple (4) is the number of rows and element 1 (3) is the number of columns. This is why axis=0 refers to rows and axis=1 refers to columns when deleting a row or column.
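The same axis numbering applies to other DataFrame methods. In the sketch below, a hypothetical 4x3 frame of integers (same shape as `df` above) is summed along each axis: axis=0 collapses the rows and leaves one value per column, while axis=1 collapses the columns and leaves one value per row.

```python
import pandas as pd

# Hypothetical 4x3 frame with the same shape as df above
frame = pd.DataFrame([[1, 2, 3],
                      [4, 5, 6],
                      [7, 8, 9],
                      [10, 11, 12]],
                     index=['A', 'B', 'C', 'D'],
                     columns=['Y', 'Z', 'new'])
print(frame.shape)           # (4, 3)

# axis=0 (dimension 0, rows) is collapsed: one sum per column
col_sums = frame.sum(axis=0)
print(col_sums.tolist())     # [22, 26, 30]

# axis=1 (dimension 1, columns) is collapsed: one sum per row
row_sums = frame.sum(axis=1)
print(row_sums.tolist())     # [6, 15, 24, 33]

# drop follows the same convention
print(frame.drop('A', axis=0).shape)   # (3, 3)
print(frame.drop('Y', axis=1).shape)   # (4, 2)
```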

Selecting rows in a DataFrame

print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''
# loc selects a row by its label
print(df.loc['B'])
'''
Output:
Y     -0.848077
Z      0.605965
new   -1.167395
Name: B, dtype: float64
'''

# iloc selects a row by its integer position (0-based), so 1 is the second row, B
print(df.iloc[1])
'''
Output:
Y     -0.848077
Z      0.605965
new   -1.167395
Name: B, dtype: float64
'''
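With letter labels the difference between loc and iloc is easy to see, but it matters most when the row labels are integers that do not match the positions. A sketch with a hypothetical frame whose labels are 30, 10 and 20:

```python
import pandas as pd

# Hypothetical frame whose integer row labels are NOT 0, 1, 2
frame = pd.DataFrame({'val': ['a', 'b', 'c']}, index=[30, 10, 20])

# loc uses labels: the row labelled 10 is the second row
print(frame.loc[10, 'val'])   # b

# iloc uses positions: position 0 is the first row (label 30)
print(frame.iloc[0, 0])       # a

# loc[0] fails because no row carries the label 0
try:
    frame.loc[0]
except KeyError:
    print('KeyError: no row labelled 0')
```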

Selecting subsets of rows and columns

print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''
# a single value: df.loc[row_label, column_label]
print(df.loc['C','Y'])
'''
Output: 0.5288134940893595
'''
# pass the list of rows and columns to get the subsets
print(df.loc[['B','C'],['Y','Z']])
'''
Output:
          Y         Z
B -0.848077  0.605965
C  0.528813 -0.589001
'''
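Subsets can also be taken with slices. Label slices in loc include both endpoints, whereas iloc slices are end-exclusive like ordinary Python slices. The sketch below rebuilds the original 5x4 `df` (columns W to Z) rather than the reduced frame used above.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])

# loc label slices include BOTH endpoints: rows B to D, columns W to Y
by_label = df.loc['B':'D', 'W':'Y']
print(by_label.shape)                 # (3, 3)

# iloc slices exclude the end: positions 1, 2, 3 and 0, 1, 2
by_position = df.iloc[1:4, 0:3]
print(by_position.equals(by_label))   # True
```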
