熊猫Pivot_table（）– DataFrame数据分析

最新推荐文章于 2024-01-08 21:46:22 发布

cunchi4221

最新推荐文章于 2024-01-08 21:46:22 发布

阅读量1.2k

点赞数

文章标签： python java 机器学习数据分析大数据

原文链接：https://www.journaldev.com/33521/pandas-pivot_table-dataframe-data-analysis

版权

什么是数据透视表？ (What is a Pivot Table?)

A pivot table is a table of statistics that summarizes the data of a more extensive table. The summary of data is reached through various aggregate functions – sum, average, min, max, etc.

数据透视表是统计表，汇总了更广泛的表的数据。数据汇总可通过各种汇总函数获得-总和，平均值，最小值，最大值等。

A pivot table is a data processing technique to derive useful information from a table.

数据透视表是一种从表中获取有用信息的数据处理技术。

熊猫Pivot_table（）函数 (Pandas pivot_table() function)

Pandas pivot_table() function is used to create pivot table from a DataFrame object. We can generate useful information from the DataFrame rows and columns. The pivot_table() function syntax is:

熊猫的pivot_table（）函数用于从DataFrame对象创建数据透视表。我们可以从DataFrame的行和列中生成有用的信息。 ivot_table（）函数的语法为：

def pivot_table(
    data,
    values=None,
    index=None,
    columns=None,
    aggfunc="mean",
    fill_value=None,
    margins=False,
    dropna=True,
    margins_name="All",
    observed=False,
)

data: the DataFrame instance from which pivot table is created.
data ：从中创建数据透视表的DataFrame实例。
values: column to aggregate.
values ：要汇总的列。
index: the column to group by on the pivot table index.
index ：要在数据透视表索引上分组的列。
columns: the column to group by on the pivot table column.
column ：要在数据透视表列上分组的列。
aggfunc: the aggregate function to run on the data, default is numpy.mean
aggfunc ：在数据上运行的聚合函数，默认为numpy.mean
fill_value: value to replace null or missing value in the pivot table.
fill_value ：替换数据透视表中空值或缺失值的值。
margins: add all rows/columns. It’s useful in generating grand total of the records.
边距：添加所有行/列。在生成总计记录时很有用。
dropna: don’t include columns whose entries are all NaN.
dropna ：不包括所有条目均为NaN的列。
margins_name: Name of the row / column that will contain the totals when margins is True.
margins_name ：margins为True时将包含总计的行/列的名称。
observed: This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.
观察到的 ：仅当任何石斑鱼是分类者时才适用。如果为True：仅显示分类石斑鱼的观测值。如果为False：显示分类石斑鱼的所有值。

熊猫数据透视表示例 (Pandas Pivot Table Examples)

It’s better to use real-life data to understand the actual benefit of pivot tables. I have downloaded a sample CSV file from this link. Here is the direct download link for the CSV file.

最好使用实际数据来了解数据透视表的实际好处。我已经从此链接下载了示例CSV文件。这是CSV文件的直接下载链接。

The CSV file is a listing of 1,460 company funding records reported by TechCrunch. The below image shows the sample data from the file.

CSV文件列出了TechCrunch报告的1,460家公司资金记录。下图显示了文件中的样本数据。

We are interested in the columns – ‘company’, ‘city’, ‘state’, ‘raisedAmt’, and ’round’. Let’s create some pivot tables to generate useful statistics from this data.

我们对“公司”，“城市”，“州”，“ raisedAmt”和“回合”列感兴趣。让我们创建一些数据透视表，以根据这些数据生成有用的统计信息。

1.简单数据透视表示例 (1. Simple Pivot Table Example)

Let’s try to create a pivot table for average funding by the state.

让我们尝试创建一个枢纽分析表，以了解各州的平均资助情况。

import pandas as pd
import numpy as np

df = pd.read_csv('TechCrunchcontinentalUSA.csv', usecols=['company', 'city', 'state', 'raisedAmt', 'round'])

print('DataFrame Records:\n', df.head(6))

# average funding by State
df1 = pd.pivot_table(df, values='raisedAmt', columns='state')

print('\nAverage Funding by State:\n', df1)

Output:

输出：

DataFrame Records:
         company        city state  raisedAmt round
0      LifeLock       Tempe    AZ    6850000     b
1      LifeLock       Tempe    AZ    6000000     a
2      LifeLock       Tempe    AZ   25000000     c
3   MyCityFaces  Scottsdale    AZ      50000  seed
4      Flypaper     Phoenix    AZ    3000000     a
5  Infusionsoft     Gilbert    AZ    9000000     a

Average Funding by State:
 state             AZ            CA  ...            VA            WA
raisedAmt  5613750.0  1.072324e+07  ...  1.158261e+07  8.140103e+06

[1 rows x 33 columns]

We can also call pivot_table() function directly on the DataFrame object. The above pivot table can be generated using the below code snippet too.

我们还可以直接在DataFrame对象上调用ivot_table（）函数。上面的数据透视表也可以使用下面的代码段生成。

df1 = df.pivot_table(values='raisedAmt', columns='state')

2.具有“同意”功能的数据透视表 (2. Pivot Table with Agreegate Function)

The default aggregate function is numpy.mean. We can specify the aggregate function as numpy.sum to generate the total funding by the state.

默认的聚合函数是numpy.mean 。我们可以将合计函数指定为numpy.sum来生成州的总资金。

df1 = pd.pivot_table(df, values='raisedAmt', columns='state', aggfunc=np.sum)

print('\nTotal Funding by State:\n', df1)

Output:

输出：

Total Funding by State:
 state            AZ          CA         CO  ...         UT         VA         WA
raisedAmt  50523750  9361385000  126470000  ...  153080000  266400000  789590000

[1 rows x 33 columns]

3.公司总注资 (3. Total Funding by Company)

df1 = pd.pivot_table(df, values='raisedAmt', columns='company', aggfunc=np.sum)

print('\nTotal Funding by Company:\n', df1)

Output:

输出：

Total Funding by Company:
 company    23andMe     3Jam  4HomeMedia  ...    vbs tv       x+1    xkoto
raisedAmt  9000000  4000000     2850000  ...  10000000  16000000  7500000

[1 rows x 909 columns]

4.在数据透视表中设置索引列 (4. Setting Index Column in the Pivot Table)

Let’s try to create a pivot table for the average funding by round grouped by the state. The trick is to generate a pivot table with ’round’ as the index column.

让我们尝试创建一个按州分组的平均融资的数据透视表。诀窍是生成一个以“ round”作为索引列的数据透视表。

df1 = pd.pivot_table(df, values='raisedAmt', columns='state', index='round')
print('\nAverage Funding by round in State:\n', df1)

Output:

输出：

Average Funding by round in State:
 state                   AZ            CA  ...          VA            WA
round                                     ...                          
a             6.000000e+06  7.158314e+06  ...   9910000.0  6.570476e+06
angel         2.337500e+05  1.006784e+06  ...         NaN  8.935714e+05
b             6.850000e+06  1.238483e+07  ...   9850000.0  1.187826e+07
c             2.500000e+07  2.369708e+07  ...  19500000.0  1.592222e+07
d                      NaN  3.012188e+07  ...  20000000.0  8.500000e+06
debt_round             NaN  1.660833e+07  ...         NaN           NaN
e                      NaN  3.132500e+07  ...         NaN  2.200000e+07
seed          1.466667e+05  8.778214e+05  ...    350000.0  7.800000e+05
unattributed           NaN  1.933000e+07  ...         NaN  2.050000e+07

[9 rows x 33 columns]

5.用默认值替换空值 (5. Replacing Null Values with a default value)

df1 = pd.pivot_table(df, values='raisedAmt', columns='state', index='round', aggfunc=np.sum, fill_value=0)
print('\nTotal Funding by round in State:\n', df1)

Output:

输出：

Total Funding by round in State:
 state               AZ          CA        CO  ...        UT        VA         WA
round                                         ...                               
a             18000000  2526885000  25650000  ...  31800000  99100000  275960000
angel           233750    74502000   3950000  ...         0         0   12510000
b              6850000  2898050000  66900000  ...  67200000  68950000  273200000
c             25000000  2109040000  28850000  ...  54000000  78000000  143300000
d                    0   963900000         0  ...         0  20000000   17000000
debt_round           0   199300000    500000  ...         0         0          0
e                    0   250600000         0  ...         0         0   44000000
seed            440000    49158000    620000  ...     80000    350000    3120000
unattributed         0   289950000         0  ...         0         0   20500000

[9 rows x 33 columns]

5.多个索引列数据透视表示例 (5. Multiple Index Columns Pivot Table Example)

Let’s look at a more complex example. We will create a pivot table of total funding per company per round, state wise.

让我们看一个更复杂的例子。我们将在州级明智的情况下创建每个公司每轮总资金的数据透视表。

df1 = pd.pivot_table(df, values='raisedAmt', columns='state', index=['company', 'round'], aggfunc=np.sum, fill_value=0)
print('\nTotal Funding by company and round in State:\n', df1)

Output:

输出：

Total Funding by round in State:
 state             AZ       CA  CO  CT  DC  FL  GA  ...  PA  RI  TN  TX  UT  VA  WA
company    round                                   ...                            
23andMe    a       0  9000000   0   0   0   0   0  ...   0   0   0   0   0   0   0
3Jam       a       0  4000000   0   0   0   0   0  ...   0   0   0   0   0   0   0
4HomeMedia a       0  2850000   0   0   0   0   0  ...   0   0   0   0   0   0   0
5min       a       0        0   0   0   0   0   0  ...   0   0   0   0   0   0   0
           angel   0        0   0   0   0   0   0  ...   0   0   0   0   0   0   0
...               ..      ...  ..  ..  ..  ..  ..  ...  ..  ..  ..  ..  ..  ..  ..
uber       b       0  7600000   0   0   0   0   0  ...   0   0   0   0   0   0   0
utoopia    seed    0        0   0   0   0   0   0  ...   0   0   0   0   0   0   0
vbs tv     seed    0        0   0   0   0   0   0  ...   0   0   0   0   0   0   0
x+1        a       0        0   0   0   0   0   0  ...   0   0   0   0   0   0   0
xkoto      b       0        0   0   0   0   0   0  ...   0   0   0   0   0   0   0

[1405 rows x 33 columns]

参考资料 (References)

翻译自: https://www.journaldev.com/33521/pandas-pivot_table-dataframe-data-analysis

cunchi4221

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
熊猫Pivot_table（）– DataFrame数据分析

什么是数据透视表？ (What is a Pivot Table?)A pivot table is a table of statistics that summarizes the data of a more extensive table. The summary of data is reached through various aggregate functions – sum,...
复制链接

扫一扫