pandas usage practice

import pandas as pd
from pandas import Series, DataFrame
import numpy as np

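Basic pandas data structures:

  • Series: a one-dimensional, array-like object made up of a sequence of values and an associated index
  • DataFrame: a tabular data structure with an ordered collection of columns, where each column can hold a different value type (numeric, boolean, string, etc.)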

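The Series data structure

  • Creating a Series
  • Viewing its values and its index
  • Accessing a single value or a set of values through the index
  • Detecting missing values with isnull() and notnull()
  • The name attribute of a Series and of its index
  • Modifying the index by assignment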
obj = pd.Series([2,3,4,5,6])
obj
0    2
1    3
2    4
3    5
4    6
dtype: int64
obj.values
array([2, 3, 4, 5, 6], dtype=int64)
obj.index.values
array([0, 1, 2, 3, 4], dtype=int64)
obj2 = Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
obj2
d    4
b    7
a   -5
c    3
dtype: int64
obj2.index
Index(['d', 'b', 'a', 'c'], dtype='object')

Accessing a single value or a set of values through the index

obj2['a']
-5
obj2[['a','b']]
a   -5
b    7
dtype: int64
obj2 > 3
d     True
b     True
a    False
c    False
dtype: bool
obj2[obj2>3]
d    4
b    7
dtype: int64
# Create a Series from a dict
dic = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = Series(dic)
obj3
Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64
idx = ['China','Ohio','Texas','Oregon']
obj4 = Series(dic, index = idx)
obj4
China         NaN
Ohio      35000.0
Texas     71000.0
Oregon    16000.0
dtype: float64
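
When a dict is passed together with an explicit index, pandas looks up each index label in the dict: labels without a matching key ('China') become NaN, and keys that are not requested ('Utah') are left out. The sketch below shows that this should give the same result as building the Series first and then calling `reindex`; the name `same` is illustrative only.

```python
import pandas as pd

dic = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
idx = ['China', 'Ohio', 'Texas', 'Oregon']

# Labels are matched against the dict keys, not taken by position.
obj4 = pd.Series(dic, index=idx)

# Building from the dict and then reindexing follows the same alignment rule.
same = pd.Series(dic).reindex(idx)

# NaN is a float, so the integer values are upcast to float64.
print(obj4.dtype)
print(obj4.equals(same))
```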

The pandas functions isnull() and notnull() detect missing values.

pd.isnull(obj4)
China      True
Ohio      False
Texas     False
Oregon    False
dtype: bool
pd.isnull(obj4).sum()
1
# i=0
# if (pd.isnull(obj4)):
#     i+=1
#     print(i)
obj4.isnull().count()
4
obj4.isnull().sum()
1
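
The three calls above are easy to mix up. `isnull().count()` counts every entry of the boolean mask, so it returns the length (4), while `isnull().sum()` adds up the `True` values and gives the number of missing entries (1). The commented-out `if (pd.isnull(obj4))` loop would not work either, because a Series with several elements has no single truth value. A small sketch, with `obj4` rebuilt by hand and all other names chosen for illustration:

```python
import numpy as np
import pandas as pd

obj4 = pd.Series([np.nan, 35000.0, 71000.0, 16000.0],
                 index=['China', 'Ohio', 'Texas', 'Oregon'])

n_total = obj4.isnull().count()    # entries in the boolean mask
n_missing = obj4.isnull().sum()    # True counts as 1, False as 0
n_present = obj4.count()           # Series.count() skips NaN

# Using a whole Series as an `if` condition raises ValueError.
ambiguous = False
try:
    if pd.isnull(obj4):
        pass
except ValueError:
    ambiguous = True

print(n_total, n_missing, n_present, ambiguous)
```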

The name attribute of a Series and of its index

obj4.name = 'Ilove'
obj4
China         NaN
Ohio      35000.0
Texas     71000.0
Oregon    16000.0
Name: Ilove, dtype: float64
obj4.index.name = 'sheng'
# obj4.values.name = 'P'
---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

<ipython-input-36-e077b7c726ad> in <module>()
      1 obj4.index.name = 'sheng'
----> 2 obj4.values.name = 'P'


AttributeError: 'numpy.ndarray' object has no attribute 'name'
obj4
sheng
China         NaN
Ohio      35000.0
Texas     71000.0
Oregon    16000.0
Name: Ilove, dtype: float64
# The index of a Series can be modified by assignment
obj4.index = ['a','b','c','d']
obj4
a        NaN
b    35000.0
c    71000.0
d    16000.0
Name: Ilove, dtype: float64

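The DataFrame data structure

  • Creation: pass a dict of equal-length lists or NumPy arrays, or pass a nested dict
  • Accessing a column of a DataFrame returns a Series
  • Retrieving rows of a DataFrame (iloc[], loc[])
  • Columns can be modified by assignment, but the length of the assigned values must match the DataFrame
  • Assigning to a column that does not exist creates a new column; the del keyword deletes a column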
data = {'state': ['Ohio','Ohio','Ohio','Nevada','Nevada'],
       'year': [2000, 2001, 2002, 2001, 2002],
       'pop':[1.5, 1.7, 3.6, 2.4, 2.9]}
frame = DataFrame(data)

# Constructor parameters: DataFrame(data, index, columns, dtype, copy)
frame
    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9
# Specify the column order
frame = DataFrame(data, columns = ['pop', 'state','year'])
frame
   pop   state  year
0  1.5    Ohio  2000
1  1.7    Ohio  2001
2  3.6    Ohio  2002
3  2.4  Nevada  2001
4  2.9  Nevada  2002
frame2 = DataFrame(data, index = ['one','two','thress','four','five'], columns = ['year','state','pop','dept'])
# If a requested column is not found in the data, it is filled with missing values
frame2
        year   state  pop dept
one     2000    Ohio  1.5  NaN
two     2001    Ohio  1.7  NaN
thress  2002    Ohio  3.6  NaN
four    2001  Nevada  2.4  NaN
five    2002  Nevada  2.9  NaN
frame2.index.name='Digit'
frame2
        year   state  pop dept
Digit
one     2000    Ohio  1.5  NaN
two     2001    Ohio  1.7  NaN
thress  2002    Ohio  3.6  NaN
four    2001  Nevada  2.4  NaN
five    2002  Nevada  2.9  NaN

Accessing a column of a DataFrame returns a Series

frame2.year
Digit
one       2000
two       2001
thress    2002
four      2001
five      2002
Name: year, dtype: int64
frame2['year']
Digit
one       2000
two       2001
thress    2002
four      2001
five      2002
Name: year, dtype: int64
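
Attribute access (`frame2.year`) and bracket access (`frame2['year']`) return the same Series, but attribute access only works when the column name does not collide with an existing DataFrame attribute. The `pop` column used here is such a case, since `DataFrame.pop` is a method. A small sketch on a reduced frame (the variable names are illustrative):

```python
import pandas as pd

frame = pd.DataFrame({'year': [2000, 2001], 'pop': [1.5, 1.7]})

# Both forms return the same column as a Series.
year_attr = frame.year
year_item = frame['year']

# 'pop' is also the name of a DataFrame method, so attribute access
# returns the bound method instead of the column.
pop_attr = frame.pop
pop_col = frame['pop']

print(year_attr.equals(year_item))
print(callable(pop_attr))
```

Bracket access is the safer default when column names come from external data.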
# frame2.index = ['O','T','H','F','V']
# # Change the row index by assignment
# frame2.columns = ['Y','S','P','D']
# # Change the column index by assignment
frame2
        year   state  pop dept
Digit
one     2000    Ohio  1.5  NaN
two     2001    Ohio  1.7  NaN
thress  2002    Ohio  3.6  NaN
four    2001  Nevada  2.4  NaN
five    2002  Nevada  2.9  NaN

Retrieving rows of a DataFrame

# .ix is deprecated (and was removed in pandas 1.0); use .loc for label-based indexing
frame2.loc['thress']
year     2002
state    Ohio
pop       3.6
dept      NaN
Name: thress, dtype: object
frame2.loc['two',:]
year     2001
state    Ohio
pop       1.7
dept      NaN
Name: two, dtype: object
frame2.iloc[2]
year     2002
state    Ohio
pop       3.6
dept      NaN
Name: thress, dtype: object
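
`loc` selects by label and `iloc` selects by integer position. One difference that is easy to miss is slicing: a `loc` slice includes its end label, while an `iloc` slice excludes its end position. A minimal sketch on a copy of the data above without the `dept` column (all variable names are illustrative):

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame2 = pd.DataFrame(data,
                      index=['one', 'two', 'thress', 'four', 'five'],
                      columns=['year', 'state', 'pop'])

by_label = frame2.loc['thress']      # row whose label is 'thress'
by_position = frame2.iloc[2]         # third row, counting from 0

loc_slice = frame2.loc['one':'thress']   # end label is included
iloc_slice = frame2.iloc[0:2]            # end position is excluded

cell = frame2.loc['two', 'pop']      # single value by row and column label

print(len(loc_slice), len(iloc_slice), cell)
```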

Columns can be modified by assignment, but the length of the assigned values must match the DataFrame

frame2['dept'] = 15
frame2
        year   state  pop  dept
Digit
one     2000    Ohio  1.5    15
two     2001    Ohio  1.7    15
thress  2002    Ohio  3.6    15
four    2001  Nevada  2.4    15
five    2002  Nevada  2.9    15
frame2.dept = np.arange(5.)
frame2
        year   state  pop  dept
Digit
one     2000    Ohio  1.5   0.0
two     2001    Ohio  1.7   1.0
thress  2002    Ohio  3.6   2.0
four    2001  Nevada  2.4   3.0
five    2002  Nevada  2.9   4.0
# If the assigned value is a Series, its labels are aligned exactly to the DataFrame's index, and any gaps are filled with missing values
val = Series([-1.2, -1.5, -1.7, -1.3], index = ['two','thress','five','T'])
frame2.dept = val
frame2
        year   state  pop  dept
Digit
one     2000    Ohio  1.5   NaN
two     2001    Ohio  1.7  -1.2
thress  2002    Ohio  3.6  -1.5
four    2001  Nevada  2.4   NaN
five    2002  Nevada  2.9  -1.7
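
The two assignment styles follow different rules. A list or array is matched by position and must have the same length as the DataFrame, while a Series is matched by label: rows without a matching label get NaN, and labels absent from the DataFrame (like 'T' above) are silently dropped. A reduced sketch of both cases, with illustrative names and values:

```python
import pandas as pd

frame = pd.DataFrame({'pop': [1.5, 1.7, 3.6]},
                     index=['one', 'two', 'thress'])

# Series assignment aligns on labels; 'one' has no match, 'T' is ignored.
val = pd.Series([-1.2, -1.5, 9.9], index=['two', 'thress', 'T'])
frame['dept'] = val

# A plain list is positional, so a wrong length raises ValueError.
length_error = False
try:
    frame['bad'] = [1, 2]
except ValueError:
    length_error = True

print(frame)
print(length_error)
```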

Assigning to a column that does not exist creates a new column; the del keyword deletes a column

frame2['es'] = frame2.state == 'Ohio'
frame2
        year   state  pop  dept     es
Digit
one     2000    Ohio  1.5   NaN   True
two     2001    Ohio  1.7  -1.2   True
thress  2002    Ohio  3.6  -1.5   True
four    2001  Nevada  2.4   NaN  False
five    2002  Nevada  2.9  -1.7  False
del frame2['es']
frame2
        year   state  pop  dept
Digit
one     2000    Ohio  1.5   NaN
two     2001    Ohio  1.7  -1.2
thress  2002    Ohio  3.6  -1.5
four    2001  Nevada  2.4   NaN
five    2002  Nevada  2.9  -1.7
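
`del` removes the column from the DataFrame in place. When the original should stay untouched, `DataFrame.drop(columns=...)` returns a new DataFrame instead. A short sketch comparing the two on a reduced frame (the names `frame` and `dropped` are illustrative):

```python
import pandas as pd

frame = pd.DataFrame({'state': ['Ohio', 'Nevada'], 'pop': [1.5, 2.4]})

# Assigning to a new column name with brackets creates the column.
frame['es'] = frame['state'] == 'Ohio'

# drop() returns a copy without the column; `frame` itself is unchanged.
dropped = frame.drop(columns=['es'])
still_there = 'es' in frame.columns

# del modifies `frame` in place.
del frame['es']

print(still_there, list(frame.columns), list(dropped.columns))
```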

Creating a DataFrame from a nested dict: the outer keys become the column index and the inner keys become the row index

pop = {'Nevada':{2001:2.4, 2002:2.9},
      'Ohio':{2000:1.2, 2001:1.7, 2002:3.6}}
frame3 = DataFrame(pop)
frame3
      Nevada  Ohio
2000     NaN   1.2
2001     2.4   1.7
2002     2.9   3.6
frame3.T  # transpose
        2000  2001  2002
Nevada   NaN   2.4   2.9
Ohio     1.2   1.7   3.6
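
With a nested dict, the row index is the union of all inner keys, and any (row, column) pair missing from the input is filled with NaN. The order of the resulting rows may depend on the pandas version, so this sketch sorts the index explicitly before using it; that sorting step is an addition here and not part of the original notebook.

```python
import pandas as pd

pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.2, 2001: 1.7, 2002: 3.6}}

# Outer keys become columns, inner keys become row labels.
frame3 = pd.DataFrame(pop).sort_index()

# Nevada has no entry for 2000, so that cell is NaN.
missing = pd.isna(frame3.loc[2000, 'Nevada'])

# Transposing swaps rows and columns.
transposed = frame3.T

print(frame3)
print(transposed.shape)
```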

The name attribute of a DataFrame's row and column indexes, and the values attribute (returns a two-dimensional array)

frame3.index.name = 'GJ'
frame3.columns.name = 'year'
frame3
year  Nevada  Ohio
GJ
2000     NaN   1.2
2001     2.4   1.7
2002     2.9   3.6
frame3.values
array([[nan, 1.2],
       [2.4, 1.7],
       [2.9, 3.6]])
frame3.index
Int64Index([2000, 2001, 2002], dtype='int64', name='GJ')
frame3.columns
Index(['Nevada', 'Ohio'], dtype='object', name='year')
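
`values` returns the data as a two-dimensional NumPy array without the labels, and the dtype of that array depends on the columns: all-float columns give a float array, while mixed column types fall back to `object`. A brief sketch, assuming a pandas version that provides `to_numpy()` (0.24 or later); the frame `mixed` is a made-up example.

```python
import numpy as np
import pandas as pd

frame3 = pd.DataFrame({'Nevada': {2001: 2.4, 2002: 2.9},
                       'Ohio': {2000: 1.2, 2001: 1.7, 2002: 3.6}}).sort_index()
frame3.index.name = 'GJ'
frame3.columns.name = 'year'

arr = frame3.values          # 2-D ndarray, labels and names are dropped
arr2 = frame3.to_numpy()     # equivalent method form

# Columns of different types cannot share a numeric dtype.
mixed = pd.DataFrame({'year': [2000], 'state': ['Ohio']})
mixed_dtype = mixed.values.dtype

print(arr.shape, arr.dtype, mixed_dtype)
```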