Data Analysis with Pandas (1): Basic Data Structures Series and DataFrame

Latest recommended article published 2022-01-23 18:57:38

二叉叔


Category: Python Data Analysis. Tags: python, data analysis, pandas

Article link: https://blog.csdn.net/qq_33360009/article/details/108514355


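What characterizes Pandas: its data structures work like dictionaries, with values looked up by label, which often makes labeled data simpler to handle than with NumPy alone.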

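Series: a one-dimensional labeled array, with the index on the left and the values on the right. If no index is specified, an integer index from 0 to N-1 is created automatically.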

import pandas as pd
import numpy as np

s = pd.Series([1,2,3,np.nan,4,5])

print(s)
"""
0    1.0
1    2.0
2    3.0
3    NaN
4    4.0
5    5.0
dtype: float64
"""

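DataFrame: a tabular data structure holding an ordered collection of columns, where each column can have a different value type (numeric, string, boolean, and so on). It has both a row index and a column index.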

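b = pd.date_range('20160101',periods=6) # six consecutive daily dates starting at 2016-01-01
c = pd.DataFrame(np.random.randn(6,4),index=b,columns=['A','B','C','D']) # 6 rows x 4 columns of random values;
# index sets the row labels, columns sets the column labels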
print(c)
"""
                   A         B         C         D
2016-01-01 -0.253065 -2.071051 -0.640515  0.613663
2016-01-02 -1.147178  1.532470  0.989255 -0.499761
2016-01-03  1.221656 -2.390171  1.862914  0.778070
2016-01-04  1.473877 -0.046419  0.610046  0.204672
2016-01-05 -1.584752 -0.700592  1.487264 -1.778293
2016-01-06  0.633675 -1.414157 -0.277066 -0.442545
"""
print(c['C'])
"""                 C         
2016-01-01   -0.640515 
2016-01-02   0.989255 
2016-01-03   1.862914 
2016-01-04    0.610046  
2016-01-05   1.487264 
2016-01-06   -0.277066 
"""

d = pd.DataFrame(np.arange(12).reshape((3,4)))
print(d)

"""
   0  1   2   3
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11
"""

e = pd.DataFrame({'A' : 1.,
                    'B' : pd.Timestamp('20130102'),
                    'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
                    'D' : np.array([3] * 4,dtype='int32'),
                    'E' : pd.Categorical(["test","train","test","train"]),
                    'F' : 'foo'})
                    
print(e)

"""
     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo
"""

print(e.dtypes)

"""
A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
"""

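print(e.index) # show the row index
print(e.columns) # show the column index
print(e.values) # show only the values, as a NumPy array
e.describe() # summary statistics for the numeric columns
e.transpose() # swap rows and columns; equivalent to e.T
print(e.sort_index(axis=1, ascending=False)) # sort by column labels in descending order; axis=0 sorts by row labels
print(e.sort_values(by='B')) # sort rows by the values in column B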
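The difference between sorting by labels and sorting by values is easier to see on a frame whose rows and columns start out unordered. Here is a minimal sketch; the frame, its labels, and its values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Small hand-made frame; column names, row labels, and values are illustrative only.
df = pd.DataFrame(
    {"B": [3, 1, 2], "A": [30.0, 10.0, 20.0]},
    index=["r2", "r0", "r1"],
)

by_cols_desc = df.sort_index(axis=1, ascending=False)  # order columns by label, descending
print(list(by_cols_desc.columns))      # ['B', 'A']

by_rows = df.sort_index(axis=0)        # order rows by index label
print(list(by_rows.index))             # ['r0', 'r1', 'r2']

by_value = df.sort_values(by="B")      # order rows by the values in column B
print(list(by_value["B"]))             # [1, 2, 3]

print(df.T.shape)                      # (2, 3): rows and columns swapped
print(df.describe().loc["mean", "A"])  # 20.0
```

None of these calls modifies df in place; each returns a new object, so the result has to be assigned or printed to be used.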