import numpy as np
import pandas as pd
### series
obj = pd.Series([4, 5, 6, 7])
obj
0    4
1    5
2    6
3    7
dtype: int64
obj.index
RangeIndex(start=0, stop=4, step=1)
obj.values
array([4, 5, 6, 7], dtype=int64)
obj.index.name='number'
obj
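number
0    4
1    5
2    6
3    7
dtype: int64

A Series can be viewed as a fixed-length, ordered dict: a mapping from index labels to values. `values` returns a NumPy array. Different Series are aligned on their index labels when combined, and labels without a match produce NaN.

### dataframe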
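The index alignment noted for Series is also what happens when a DataFrame is built from several Series. The following is a minimal sketch of that behaviour; the names `s1`, `s2`, `total` and `both` and their values are made up for illustration and do not appear in the notes above.

```python
import pandas as pd

# Two Series whose indexes only partly overlap (illustrative data).
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Dict-like access: look up a value by its label.
b_value = s1['b']

# Arithmetic aligns on index labels, not on position.
# Labels present in only one Series ('a' and 'd') become NaN.
total = s1 + s2
print(total)

# Building a DataFrame from the same Series aligns them the same way:
# the row index is the union of both indexes, with NaN where a label is missing.
both = pd.DataFrame({'s1': s1, 's2': s2})
print(both)
```

Because NaN is a floating-point value, the aligned result is stored as float64 even though both inputs hold integers.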
data = np.empty((3, 4))
for i in range(data.shape[0]):
    data[i] = i + 1  # fill every element of row i with the value i + 1
frame = pd.DataFrame(data, columns=['col_1', 'col_2', 'col_3', 'col_4'], index=['one', 'two', 'three'])
frame
       col_1  col_2  col_3  col_4
one      1.0    1.0    1.0    1.0
two      2.0    2.0    2.0    2.0
three    3.0    3.0    3.0    3.0
frame.columns
Index(['col_1', 'col_2', 'col_3', 'col_4'], dtype='object')
frame.index
Index(['one', 'two', 'three'], dtype='object')
frame.index.name='num'
frame
       col_1  col_2  col_3  col_4
num
one      1.0    1.0    1.0    1.0
two      2.0    2.0    2.0    2.0
three    3.0    3.0    3.0    3.0
frame['col_1']
one      1.0
two      2.0
three    3.0
Name: col_1, dtype: float64
frame.loc['one']
col_1    1.0
col_2    1.0
col_3    1.0
col_4    1.0
Name: one, dtype: float64
frame['col_5'] = 0
frame
       col_1  col_2  col_3  col_4  col_5
num
one      1.0    1.0    1.0    1.0      0
two      2.0    2.0    2.0    2.0      0
three    3.0    3.0    3.0    3.0      0
del frame['col_4']
frame
       col_1  col_2  col_3  col_5
num
one      1.0    1.0    1.0      0
two      2.0    2.0    2.0      0
three    3.0    3.0    3.0      0
frame.values
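array([[ 1.,  1.,  1.,  0.],
       [ 2.,  2.,  2.,  0.],
       [ 3.,  3.,  3.,  0.]])

1. A DataFrame is a tabular structure with both a row index and a column index, and row-oriented and column-oriented operations are treated roughly symmetrically. It is normally two-dimensional, but it can represent higher-dimensional data through hierarchical indexing.
2. Selecting by a single label returns a Series. The result can be a view on the frame's data rather than an independent copy, so call `.copy()` explicitly when a copy is needed.
3. `DataFrame.values` returns the data as a two-dimensional NumPy array.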
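The three notes above can be sketched in a few lines. This is only an illustration under stated assumptions: the small `frame` is rebuilt here with made-up values so the block is self-contained, and `col`, `arr`, `idx` and `deep` are hypothetical names. Whether a plain selection behaves as a view depends on the pandas version and its copy-on-write setting, so the sketch only demonstrates the explicit `.copy()`, which is safe either way.

```python
import numpy as np
import pandas as pd

# A small frame similar to the one above (illustrative values).
frame = pd.DataFrame(
    np.array([[1.0] * 3, [2.0] * 3, [3.0] * 3]),
    columns=['col_1', 'col_2', 'col_3'],
    index=['one', 'two', 'three'],
)

# Note 2: take an explicit copy before modifying a selected column,
# so the original frame is guaranteed to stay untouched.
col = frame['col_1'].copy()
col.loc['one'] = 100.0

# Note 3: .values exposes the data as a 2-D NumPy array.
arr = frame.values
print(arr.shape)

# Note 1: a hierarchical (two-level) row index lets a 2-D frame
# carry data that is logically higher-dimensional.
idx = pd.MultiIndex.from_product([['a', 'b'], [1, 2]], names=['outer', 'inner'])
deep = pd.DataFrame({'x': [0.1, 0.2, 0.3, 0.4]}, index=idx)
print(deep.loc['a'])          # all rows under outer label 'a'
print(deep.loc[('b', 2), 'x'])  # one cell addressed by both levels
```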
frame.index[0:2]
Index(['one', 'two'], dtype='object', name='num')
frame.index[2] = 3
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in ()
----> 1 frame.index[2]=3

C:\Users\lz\Anaconda3\lib\site-packages\pandas\indexes\base.py in __setitem__(self, key, value)
   1402
   1403     def __setitem__(self, key, value):
-> 1404         raise TypeError("Index does not support mutable operations")
   1405
   1406     def __getitem__(self, key):

TypeError: Index does not support mutable operations
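The error above shows that an Index cannot be modified element by element. A minimal sketch of two common ways to change a label anyway follows; the one-column `frame` and the names `mutated`, `renamed` and `relabeled` are assumptions made for this example.

```python
import pandas as pd

frame = pd.DataFrame({'col_1': [1.0, 2.0, 3.0]}, index=['one', 'two', 'three'])

# Item assignment on an Index raises TypeError, as in the traceback above.
try:
    frame.index[2] = 3
    mutated = True
except TypeError:
    mutated = False

# Option 1: rename() maps old labels to new ones and returns a new frame.
renamed = frame.rename(index={'three': 3})

# Option 2: replace the whole index at once (done on a copy here,
# so the original frame keeps its labels).
relabeled = frame.copy()
relabeled.index = ['one', 'two', 3]

print(renamed.index)
print(frame.index)  # the original labels are unchanged
```

Keeping the Index immutable is what allows the same Index object to be shared safely between several Series and DataFrames.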
frame
       col_1  col_2  col_3  col_5
num
one      1.0    1.0    1.0      0
two      2.0    2.0    2.0      0
three    3.0    3.0    3.0      0
frame.reindex(['two', 'one', 'four'], fill_value=0)
      col_1  col_2  col_3  col_5
num
two     2.0    2.0    2.0      0
one     1.0    1.0    1.0      0
four    0.0    0.0    0.0      0
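`reindex` returns a new object (its `copy` parameter defaults to `True`) whose rows follow the order of the given index. Labels that do not exist in the original get NA values; the `fill_value` argument replaces those NA values with a specified value.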
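To make the difference between the default behaviour and `fill_value` concrete, here is a minimal sketch. The two-column `frame` and the names `with_nan` and `filled` are assumptions introduced for this example.

```python
import pandas as pd

frame = pd.DataFrame(
    {'col_1': [1.0, 2.0, 3.0], 'col_5': [0, 0, 0]},
    index=['one', 'two', 'three'],
)

# Without fill_value, the unmatched label 'four' gets NaN in every column.
with_nan = frame.reindex(['two', 'one', 'four'])
print(with_nan)

# With fill_value, the unmatched row is filled with the given value instead.
filled = frame.reindex(['two', 'one', 'four'], fill_value=0)
print(filled)

# reindex returned new objects; the original frame keeps its rows and order.
print(frame.index)
```

Note that `fill_value` only applies to rows introduced by the reindexing; NaN values that were already present in the original data are left as they are.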