5 - Numerical Operations - Data Analysis

Create a DataFrame and specify its row index labels and column labels.

In [3]:
import pandas as pd
df = pd.DataFrame([[1,2,3],[4,5,6]],index=['a','b'],columns=['A','B','C'])
df
Out[3]:
   A  B  C
a  1  2  3
b  4  5  6

By default, sum() computes the total of each column.

In [4]:
df.sum()

Out[4]:
A    5
B    7
C    9
dtype: int64

Sum across each row with axis=1.

In [6]:
df.sum(axis=1)
Out[6]:
a     6
b    15
dtype: int64

The axis can also be given by name: axis='columns' is equivalent to axis=1.

In [7]:
df.sum(axis='columns')
Out[7]:
a     6
b    15
dtype: int64
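
The axis argument is easy to misread, so here is a minimal sketch that puts both directions side by side. It rebuilds the same small DataFrame as above; the variable names col_totals and row_totals are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=['a', 'b'], columns=['A', 'B', 'C'])

# axis=0 (alias 'index') collapses the rows: one result per column.
col_totals = df.sum(axis=0)
# axis=1 (alias 'columns') collapses the columns: one result per row.
row_totals = df.sum(axis=1)

print(col_totals.to_dict())  # {'A': 5, 'B': 7, 'C': 9}
print(row_totals.to_dict())  # {'a': 6, 'b': 15}
print(df.sum(axis='index').equals(col_totals))    # True
print(df.sum(axis='columns').equals(row_totals))  # True
```

In other words, the axis names the dimension that disappears from the result, not the one that is kept.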

Mean of each column

In [8]:
df.mean()
Out[8]:
A    2.5
B    3.5
C    4.5
dtype: float64

Mean of each row

In [9]:
df.mean(axis=1)
Out[9]:
a    2.0
b    5.0
dtype: float64

Median of each column

In [10]:
df.median()
Out[10]:
A    2.5
B    3.5
C    4.5
dtype: float64

Bivariate statistics

  • .cov(): covariance
In [11]:
df = pd.read_csv('C:/JupyterWork/data/titanic.csv')
df.head()
Out[11]:
   PassengerId  Survived  Pclass                                               Name     Sex   Age  SibSp  Parch            Ticket     Fare Cabin Embarked
0            1         0       3                            Braund, Mr. Owen Harris    male  22.0      1      0         A/5 21171   7.2500   NaN        S
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0          PC 17599  71.2833   C85        C
2            3         1       3                             Heikkinen, Miss. Laina  female  26.0      0      0  STON/O2. 3101282   7.9250   NaN        S
3            4         1       1       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1      0            113803  53.1000  C123        S
4            5         0       3                           Allen, Mr. William Henry    male  35.0      0      0            373450   8.0500   NaN        S
In [12]:
df.cov()
Out[12]:
              PassengerId  Survived     Pclass         Age      SibSp     Parch         Fare
PassengerId  66231.000000 -0.626966  -7.561798  138.696504 -16.325843 -0.342697   161.883369
Survived        -0.626966  0.236772  -0.137703   -0.551296  -0.018954  0.032017     6.221787
Pclass          -7.561798 -0.137703   0.699015   -4.496004   0.076599  0.012429   -22.830196
Age            138.696504 -0.551296  -4.496004  211.019125  -4.163334 -2.344191    73.849030
SibSp          -16.325843 -0.018954   0.076599   -4.163334   1.216043  0.368739     8.748734
Parch           -0.342697  0.032017   0.012429   -2.344191   0.368739  0.649728     8.661052
Fare           161.883369  6.221787 -22.830196   73.849030   8.748734  8.661052  2469.436846
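
To make the numbers in a covariance matrix less opaque, the sketch below recomputes one entry by hand and compares it with .cov(). The two-column frame and its values are made up for illustration and are not taken from the Titanic data.

```python
import pandas as pd

# Hypothetical toy data, chosen only to keep the arithmetic small.
df = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0], 'y': [2.0, 4.0, 6.0, 9.0]})

# Sample covariance: sum of products of deviations, divided by n - 1.
dx = df['x'] - df['x'].mean()
dy = df['y'] - df['y'].mean()
manual_cov = (dx * dy).sum() / (len(df) - 1)

pandas_cov = df.cov().loc['x', 'y']
print(abs(manual_cov - pandas_cov) < 1e-12)  # True

# The diagonal of the matrix holds each column's variance.
print(abs(df.cov().loc['x', 'x'] - df['x'].var()) < 1e-12)  # True
```

This also explains why the diagonal entries in the output above are so different in scale: each one is just the variance of that column.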

  • .corr(): correlation coefficient

In [13]:
df.corr()
Out[13]:
             PassengerId  Survived    Pclass       Age     SibSp     Parch      Fare
PassengerId     1.000000 -0.005007 -0.035144  0.036847 -0.057527 -0.001652  0.012658
Survived       -0.005007  1.000000 -0.338481 -0.077221 -0.035322  0.081629  0.257307
Pclass         -0.035144 -0.338481  1.000000 -0.369226  0.083081  0.018443 -0.549500
Age             0.036847 -0.077221 -0.369226  1.000000 -0.308247 -0.189119  0.096067
SibSp          -0.057527 -0.035322  0.083081 -0.308247  1.000000  0.414838  0.159651
Parch          -0.001652  0.081629  0.018443 -0.189119  0.414838  1.000000  0.216225
Fare            0.012658  0.257307 -0.549500  0.096067  0.159651  0.216225  1.000000
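
The outputs above silently skip text columns such as Name, which is how older pandas releases behaved. On pandas 2.0 and later the same calls raise an error on a frame that still contains text columns unless numeric_only=True is passed (the parameter is available from pandas 1.5). The sketch below assumes such a version and uses a made-up frame to show that, and to show how the default Pearson correlation relates to covariance.

```python
import pandas as pd

# Hypothetical toy data; 'name' stands in for a text column like Name.
df = pd.DataFrame({
    'name': ['p', 'q', 'r', 's'],
    'x': [1.0, 2.0, 3.0, 4.0],
    'y': [2.0, 4.0, 6.0, 9.0],
})

# numeric_only=True restricts the calculation to numeric columns.
corr = df.corr(numeric_only=True)
cov = df.cov(numeric_only=True)

# Pearson correlation is covariance rescaled by both standard deviations.
manual_corr = cov.loc['x', 'y'] / (df['x'].std() * df['y'].std())

print(list(corr.columns))                              # ['x', 'y']
print(abs(manual_corr - corr.loc['x', 'y']) < 1e-12)   # True
```

Because of that rescaling, correlation values always fall between -1 and 1, which makes them easier to compare across columns than raw covariances.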

value_counts(): counts how many times each value occurs in the given column, sorted by count in descending order by default

In [14]:
df['Age'].value_counts()
Out[14]:
24.00    30
22.00    27
18.00    26
19.00    25
30.00    25
28.00    25
21.00    24
25.00    23
36.00    22
29.00    20
32.00    18
27.00    18
35.00    18
26.00    18
16.00    17
31.00    17
20.00    15
33.00    15
23.00    15
34.00    15
39.00    14
17.00    13
42.00    13
40.00    13
45.00    12
38.00    11
50.00    10
2.00     10
4.00     10
47.00     9
         ..
71.00     2
59.00     2
63.00     2
0.83      2
30.50     2
70.00     2
57.00     2
0.75      2
13.00     2
10.00     2
64.00     2
40.50     2
32.50     2
45.50     2
20.50     1
24.50     1
0.67      1
14.50     1
0.92      1
74.00     1
34.50     1
80.00     1
12.00     1
36.50     1
53.00     1
55.50     1
70.50     1
66.00     1
23.50     1
0.42      1
Name: Age, Length: 88, dtype: int64

value_counts(): counts how many times each value occurs in the given column, here sorted by count in ascending order

In [15]:
df['Age'].value_counts(ascending = True)
Out[15]:
0.42      1
23.50     1
66.00     1
70.50     1
55.50     1
53.00     1
36.50     1
12.00     1
80.00     1
34.50     1
74.00     1
0.92      1
14.50     1
0.67      1
24.50     1
20.50     1
45.50     2
32.50     2
40.50     2
64.00     2
10.00     2
13.00     2
0.75      2
57.00     2
70.00     2
30.50     2
0.83      2
63.00     2
59.00     2
71.00     2
         ..
47.00     9
4.00     10
2.00     10
50.00    10
38.00    11
45.00    12
40.00    13
42.00    13
17.00    13
39.00    14
34.00    15
23.00    15
33.00    15
20.00    15
31.00    17
16.00    17
26.00    18
35.00    18
27.00    18
32.00    18
29.00    20
36.00    22
25.00    23
21.00    24
28.00    25
30.00    25
19.00    25
18.00    26
22.00    27
24.00    30
Name: Age, Length: 88, dtype: int64
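
Besides ascending, value_counts() also accepts normalize and dropna. The following is a small sketch of both on a made-up Series (the values are not from the dataset).

```python
import numpy as np
import pandas as pd

# Hypothetical ages with one missing value.
ages = pd.Series([22, 22, 35, np.nan, 35, 22])

counts = ages.value_counts()                  # NaN is dropped by default
shares = ages.value_counts(normalize=True)    # fractions of the non-null values
with_nan = ages.value_counts(dropna=False)    # keep NaN as its own entry

print(counts.to_dict())      # {22.0: 3, 35.0: 2}
print(shares.to_dict())      # {22.0: 0.6, 35.0: 0.4}
print(int(with_nan.sum()))   # 6
```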

Count how many passengers travelled in first, second, and third class.

In [16]:
df['Pclass'].value_counts(ascending = True)
Out[16]:
2    184
1    216
3    491
Name: Pclass, dtype: int64
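
The result above is ordered by count, so the classes appear as 2, 1, 3. If the goal is to read them in class order, one option is to sort the result by its index instead. A minimal sketch on made-up values:

```python
import pandas as pd

# Hypothetical class labels, standing in for df['Pclass'].
pclass = pd.Series([3, 1, 3, 2, 3, 1])

# value_counts() orders by count; sort_index() reorders by the label itself.
by_class = pclass.value_counts().sort_index()

print(by_class.to_dict())  # {1: 2, 2: 1, 3: 3}
```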

bins: split the value range into the given number of equal-width intervals and count the values that fall into each one

In [19]:
df['Age'].value_counts(ascending = True,bins = 5)
Out[19]:
(64.084, 80.0]       11
(48.168, 64.084]     69
(0.339, 16.336]     100
(32.252, 48.168]    188
(16.336, 32.252]    346
Name: Age, dtype: int64
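
The help text for value_counts describes bins as a convenience for pd.cut. The sketch below, on made-up ages, shows the two routes giving the same counts; sort=False is used here so that the intervals stay in ascending order rather than being ordered by count.

```python
import pandas as pd

# Hypothetical ages spanning 1 to 80.
ages = pd.Series([1, 5, 18, 22, 25, 30, 41, 47, 63, 80])

# Four equal-width intervals over the observed range, kept in interval order.
binned = ages.value_counts(bins=4, sort=False)

# Roughly the same thing done explicitly: cut first, then count each interval.
via_cut = pd.cut(ages, bins=4, include_lowest=True).value_counts(sort=False)

print(binned.tolist())                                       # [3, 3, 2, 2]
print(sorted(via_cut.tolist()) == sorted(binned.tolist()))   # True
```

Using pd.cut directly is worth considering when custom bin edges or labels are needed, since it accepts an explicit list of edges.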

Count the non-null values in the Age column

In [20]:
df['Age'].count()
Out[20]:
714
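
The Pclass counts earlier add up to 891 passengers, while Age reports only 714. The difference comes from count() skipping missing values, and it is also why the binned Age counts sum to 714. A minimal sketch on a made-up Series:

```python
import numpy as np
import pandas as pd

# Hypothetical ages with two missing entries.
ages = pd.Series([22.0, np.nan, 35.0, np.nan, 54.0])

print(len(ages))           # 5  (all rows)
print(ages.count())        # 3  (non-null values only)
print(ages.isna().sum())   # 2  (missing values)
```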

help() displays the documentation for a function.

In [21]:
help(pd.value_counts)
Help on function value_counts in module pandas.core.algorithms:

value_counts(values, sort=True, ascending=False, normalize=False, bins=None, dropna=True)
    Compute a histogram of the counts of non-null values.
    
    Parameters
    ----------
    values : ndarray (1-d)
    sort : boolean, default True
        Sort by values
    ascending : boolean, default False
        Sort in ascending order
    normalize: boolean, default False
        If True then compute a relative histogram
    bins : integer, optional
        Rather than count values, group them into half-open bins,
        convenience for pd.cut, only works with numeric data
    dropna : boolean, default True
        Don't include counts of NaN
    
    Returns
    -------
    value_counts : Series