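Getting started with pandas
Using Series:
- Series definition
- Creating a Series object and accessing its values and index
- Creating a Series object from a dictionary
- Checking whether a Series object contains missing values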
import pandas as pd
from pandas import Series, DataFrame
import numpy as np
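The Series topics listed above are not accompanied by code here, so the following is a minimal sketch of them rather than the original author's examples. The variable names (`s`, `populations`, `s_missing`) and the sample values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Basic creation: values plus an explicit index (labels are illustrative).
s = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
print(s.values)   # the underlying NumPy array
print(s.index)    # the Index object holding the labels
print(s["a"])     # access a single value by its label

# Creation from a dictionary: the keys become the index.
populations = {"ohio": 35000, "texas": 71000, "oregon": 16000}
s_dict = pd.Series(populations)

# Asking for a label that is not in the dictionary produces NaN there.
s_missing = pd.Series(populations, index=["california", "ohio", "texas"])

# isnull() returns a boolean Series that marks the missing entries.
mask = s_missing.isnull()
print(mask)
has_missing = bool(mask.any())
```

`mask.any()` reduces the per-element check to a single yes/no answer, which is usually what "does this Series have missing values" means in practice.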
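Learning the pandas DataFrame
- DataFrame definition: a tabular data structure that contains an ordered collection of columns, each of which can hold a different value type (Boolean, string, numeric, etc.)
- A DataFrame can be thought of as a dictionary of Series that all share the same index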
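The "dictionary of Series sharing one index" view can be made concrete with a small sketch. The names `index`, `state`, `pop_series` and `df` below are hypothetical and used only for illustration.

```python
import pandas as pd

index = ["one", "two", "three"]
state = pd.Series(["ohio", "ohio", "nevada"], index=index)
pop_series = pd.Series([1.5, 1.7, 2.4], index=index)

# Each dictionary key becomes a column; the Series supply the shared index.
df = pd.DataFrame({"state": state, "pop": pop_series})

# Selecting a column gives back a Series that carries the same index.
col = df["pop"]
print(type(col).__name__)
print(col.index.equals(df.index))
```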
Creating a DataFrame: the most common way is to pass a dictionary of equal-length lists or NumPy arrays
# Each key becomes a column; all lists must have the same length.
data = {'state': ['ohic', 'ohic', 'ohic', 'nevada', 'nevada'],
        'years': [2000, 2001, 2002, 2003, 2004],
        'pop': [1.5, 1.7, 3.5, 2.4, 1.9]}
frame = DataFrame(data)
frame
|   | state | years | pop |
|---|---|---|---|
| 0 | ohic | 2000 | 1.5 |
| 1 | ohic | 2001 | 1.7 |
| 2 | ohic | 2002 | 3.5 |
| 3 | nevada | 2003 | 2.4 |
| 4 | nevada | 2004 | 1.9 |
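The example above builds the frame from lists. As a minimal sketch of the NumPy-array variant and of the equal-length requirement, the following uses hypothetical names (`arrays`, `frame_np`, `raised`) that do not appear in the original.

```python
import numpy as np
import pandas as pd

# The same dictionary form, with NumPy arrays as the column values.
arrays = {
    "years": np.array([2000, 2001, 2002]),
    "pop": np.array([1.5, 1.7, 3.5]),
}
frame_np = pd.DataFrame(arrays)
print(frame_np.dtypes)

# Columns of unequal length are rejected with a ValueError.
try:
    pd.DataFrame({"years": [2000, 2001], "pop": [1.5]})
    raised = False
except ValueError:
    raised = True
print(raised)
```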
Specifying the column order of a DataFrame
frame = DataFrame(data, columns = ['pop', 'state', 'years'])
frame
|   | pop | state | years |
|---|---|---|---|
| 0 | 1.5 | ohic | 2000 |
| 1 | 1.7 | ohic | 2001 |
| 2 | 3.5 | ohic | 2002 |
| 3 | 2.4 | nevada | 2003 |
| 4 | 1.9 | nevada | 2004 |
fm = DataFrame(data, index = ['one','two', 'thress','four', 'five'], columns=['pop','state','years'])
fm
|   | pop | state | years |
|---|---|---|---|
| one | 1.5 | ohic | 2000 |
| two | 1.7 | ohic | 2001 |
| thress | 3.5 | ohic | 2002 |
| four | 2.4 | nevada | 2003 |
| five | 1.9 | nevada | 2004 |
fm.columns
Index(['pop', 'state', 'years'], dtype='object')
fm.index
Index(['one', 'two', 'thress', 'four', 'five'], dtype='object')
Working with DataFrame columns: viewing a column's values and assigning values to a column
fm['state']
one ohic
two ohic
thress ohic
four nevada
five nevada
Name: state, dtype: object
fm.state
one ohic
two ohic
thress ohic
four nevada
five nevada
Name: state, dtype: object
fm1 = DataFrame(data, index = ['one','two', 'thress','four', 'five'], columns=['pop','state','years','det'])
fm1
|   | pop | state | years | det |
|---|---|---|---|---|
| one | 1.5 | ohic | 2000 | NaN |
| two | 1.7 | ohic | 2001 | NaN |
| thress | 3.5 | ohic | 2002 | NaN |
| four | 2.4 | nevada | 2003 | NaN |
| five | 1.9 | nevada | 2004 | NaN |
fm1.det = 19
fm1
|   | pop | state | years | det |
|---|---|---|---|---|
| one | 1.5 | ohic | 2000 | 19 |
| two | 1.7 | ohic | 2001 | 19 |
| thress | 3.5 | ohic | 2002 | 19 |
| four | 2.4 | nevada | 2003 | 19 |
| five | 1.9 | nevada | 2004 | 19 |
v = [1,2,3,4,5]
fm1.det = v
fm1
|   | pop | state | years | det |
|---|---|---|---|---|
| one | 1.5 | ohic | 2000 | 1 |
| two | 1.7 | ohic | 2001 | 2 |
| thress | 3.5 | ohic | 2002 | 3 |
| four | 2.4 | nevada | 2003 | 4 |
| five | 1.9 | nevada | 2004 | 5 |
fm1['east'] = fm1.state=='ohic'
fm1
|   | pop | state | years | det | east |
|---|---|---|---|---|---|
| one | 1.5 | ohic | 2000 | 1 | True |
| two | 1.7 | ohic | 2001 | 2 | True |
| thress | 3.5 | ohic | 2002 | 3 | True |
| four | 2.4 | nevada | 2003 | 4 | False |
| five | 1.9 | nevada | 2004 | 5 | False |
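The examples above update the existing `det` column through attribute access but create `east` with bracket notation. The difference matters: attribute assignment only updates a column that already exists, and a column whose name matches a DataFrame method (such as `pop`) cannot be read as an attribute at all. The sketch below illustrates this with a hypothetical frame `df` and a hypothetical attribute `west`.

```python
import warnings
import pandas as pd

df = pd.DataFrame({"state": ["ohic", "nevada"],
                   "pop": [1.5, 2.4],
                   "det": [0, 0]})

# Existing column: attribute assignment updates it.
df.det = [1, 2]

# New column: bracket notation is required.
df["east"] = df.state == "ohic"

# Assigning to a new attribute name does NOT create a column;
# pandas emits a UserWarning and stores a plain Python attribute.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df.west = [False, True]

print(list(df.columns))

# 'pop' is also a DataFrame method, so df.pop is the method, not the column.
print(callable(df.pop))
print(df["pop"].tolist())
```

Bracket notation works in every case, so it is the safer default; attribute access is mainly a convenience for reading columns interactively.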
del fm1['east']
fm1
|   | pop | state | years | det |
|---|---|---|---|---|
| one | 1.5 | ohic | 2000 | 1 |
| two | 1.7 | ohic | 2001 | 2 |
| thress | 3.5 | ohic | 2002 | 3 |
| four | 2.4 | nevada | 2003 | 4 |
| five | 1.9 | nevada | 2004 | 5 |
fm1.columns
Index(['pop', 'state', 'years', 'det'], dtype='object')
data1 = {'a':{'x':1, 'y':2,'z':3},
'b':{'x':11,'y':12, 'z':13}}
fm2 = DataFrame(data1)
fm2
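No output is shown for `fm2` above. As a self-contained sketch of what the nested-dictionary form produces: the outer keys become the columns and the inner keys become the row index. The transpose at the end is an addition for illustration and is not part of the original.

```python
import pandas as pd

data1 = {"a": {"x": 1, "y": 2, "z": 3},
         "b": {"x": 11, "y": 12, "z": 13}}
fm2 = pd.DataFrame(data1)

print(list(fm2.columns))  # outer keys
print(list(fm2.index))    # inner keys
print(fm2)

# Transposing swaps the roles of the two key levels.
print(fm2.T)
```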
DataFrame attributes: name (on the index and the columns) and values
fm1.index.name = 'Uid'
fm1.columns.name = 'co_name'
fm1
| co_name | pop | state | years | det |
|---|---|---|---|---|
| Uid | | | | |
| one | 1.5 | ohic | 2000 | 1 |
| two | 1.7 | ohic | 2001 | 2 |
| thress | 3.5 | ohic | 2002 | 3 |
| four | 2.4 | nevada | 2003 | 4 |
| five | 1.9 | nevada | 2004 | 5 |
fm1.values
array([[1.5, 'ohic', 2000, 1],
[1.7, 'ohic', 2001, 2],
[3.5, 'ohic', 2002, 3],
[2.4, 'nevada', 2003, 4],
[1.9, 'nevada', 2004, 5]], dtype=object)
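The `dtype=object` in the output above arises because the columns hold different types, so the resulting array has to fall back to a common type. A minimal sketch of the contrast with a numeric-only selection follows; the frame `df` and the names `mixed` and `numeric` are hypothetical, and `to_numpy()` is used here as the method-based counterpart of `.values`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"pop": [1.5, 1.7],
                   "state": ["ohic", "nevada"],
                   "years": [2000, 2001]})

# Mixed column types: the array falls back to the generic object dtype.
mixed = df.to_numpy()
print(mixed.dtype)

# Only numeric columns: float and integer columns share a float dtype.
numeric = df[["pop", "years"]].to_numpy()
print(numeric.dtype)
```

Selecting the numeric columns first is one way to get an array suitable for numerical work.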