Machine Learning Tools: NumPy and pandas

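Once you have collected a lot of data (images, music, text, IP addresses from security systems, logs, video, maps, seismic records, medical records of symptoms, or data in some other form), pandas provides many interfaces that convert it uniformly into pandas objects. You then work on those objects, for example converting a table into a matrix or another form, before moving on to more complex operations.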
Import the libraries:

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

Creating a Series by passing a list of values, letting pandas create a default integer index:

s = pd.Series([1, 3, 5, np.nan, 6, 8])

s

0 1.0

1 3.0

2 5.0

3 NaN

4 6.0

5 8.0

dtype: float64

Creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns:

dates = pd.date_range('20130101', periods=6)

print("dates:\n{}".format(dates))

df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

df

dates:

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')


You can also pass other kinds of arguments to build a different DataFrame, for example a dict whose values become the columns:

df2 = pd.DataFrame({'A': 1,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})

df2
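
Because each dict value becomes its own column, the columns of df2 end up with different dtypes. The following is a minimal sketch that rebuilds the same frame and inspects them; the name `dtypes` is only used here for illustration.

```python
import numpy as np
import pandas as pd

# Same construction as df2 above: scalar values are repeated to match
# the length of the longest column (4 rows here).
df2 = pd.DataFrame({
    'A': 1,
    'B': pd.Timestamp('20130102'),
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([3] * 4, dtype='int32'),
    'E': pd.Categorical(["test", "train", "test", "train"]),
    'F': 'foo',
})

# One dtype per column, returned as a Series indexed by column name.
dtypes = df2.dtypes
print(dtypes)
```

The exact dtype reported for some columns (for example the timestamp and string columns) can differ between pandas versions, so it is worth printing them rather than assuming.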

Selecting data by label
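
The post gives no code for this step, so here is a rough sketch of label-based selection with `.loc` and `.at`. It assumes a frame shaped like the df above, but filled with fixed integers instead of random numbers so the results are easy to check; the names `first_row`, `ab`, `window` and `value` are made up for the example.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
# Fixed values (0..23) instead of random ones, purely for illustration.
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

# One row selected by its index label; the result is a Series
# indexed by column name.
first_row = df.loc[dates[0]]

# All rows, but only the columns named 'A' and 'B'.
ab = df.loc[:, ['A', 'B']]

# Label slices include both endpoints, so this returns three rows.
window = df.loc['20130102':'20130104', ['A', 'B']]

# A single scalar, addressed by row label and column label.
value = df.at[dates[0], 'A']
```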

Operating with objects that have different dimensionality and need alignment. In addition, pandas automatically broadcasts along the specified dimension.

s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)

s

2013-01-01 NaN

2013-01-02 NaN

2013-01-03 1.0

2013-01-04 3.0

2013-01-05 5.0

2013-01-06 NaN

Freq: D, dtype: float64

df.sub(s, axis='index')
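
To make the alignment visible, here is a small sketch of the same subtraction on a frame filled with the constant 10 instead of random numbers (that constant, and the name `result`, are assumptions for the example only). Each value of s is matched to a row by its date label and subtracted from every column of that row.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
# Constant frame so the effect of the subtraction is obvious.
df = pd.DataFrame(np.full((6, 4), 10.0), index=dates, columns=list('ABCD'))

# shift(2) moves the values two rows down: the first two rows become NaN
# and the last two values (6 and 8) fall off the end.
s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)

# axis='index' aligns s with the row labels and broadcasts it across columns.
result = df.sub(s, axis='index')
print(result)
```

Rows where s is NaN come out as NaN in every column, which is why only 2013-01-03 through 2013-01-05 carry numbers in the result.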

Grouping

By "group by" we are referring to a process involving one or more of the following steps:

Splitting the data into groups based on some criteria

Applying a function to each group independently

Combining the results into a data structure

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})

df

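
The post builds the frame but does not show the grouping call itself. A minimal sketch of the split, apply and combine steps could look like this, assuming fixed integers in columns C and D (instead of random values) so that the sums can be checked by hand; `by_a` and `by_ab` are names chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    # Fixed values instead of np.random.randn(8), for illustration only.
    'C': [1, 2, 3, 4, 5, 6, 7, 8],
    'D': [10, 20, 30, 40, 50, 60, 70, 80],
})

# Split on column A, apply sum() to each group, combine into one frame
# with one row per distinct value of A.
by_a = df.groupby('A')[['C', 'D']].sum()

# Grouping by two columns produces a hierarchical (MultiIndex) result.
by_ab = df.groupby(['A', 'B'])[['C', 'D']].sum()

print(by_a)
print(by_ab)
```

Selecting the numeric columns explicitly before calling sum() avoids depending on how a given pandas version treats the remaining string column.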

Categoricals

Since version 0.15, pandas can include categorical data in a DataFrame.

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6], "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})


Convert the raw grades to a categorical data type.

df["grade"] = df["raw_grade"].astype("category")

df["grade"]

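Rename the categories to more meaningful names. Older pandas versions allowed in-place assignment to Series.cat.categories; rename_categories works across versions and returns a new Series:

df["grade"] = df["grade"].cat.rename_categories(["very good", "good", "very bad"])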

Reorder the categories and simultaneously add the missing categories (methods under Series.cat return a new Series by default).

df["grade"] = df["grade"].cat.set_categories(["very bad", "bad", "medium", "good", "very good"])

df["grade"]

Grouping by a categorical column also shows empty categories.

df.groupby("grade").size()

grade

very bad 1

bad 0

medium 0

good 2

very good 3

dtype: int64
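
Putting the categorical steps together, here is a self-contained sketch that should reproduce the counts above. Two choices in it are mine rather than the post's: it uses rename_categories, which returns a new Series, and it passes observed=False explicitly, because whether empty categories appear by default depends on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                   "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})

# The inferred categories are the sorted unique values: a, b, e.
df["grade"] = df["raw_grade"].astype("category")

# Rename positionally: a -> very good, b -> good, e -> very bad.
df["grade"] = df["grade"].cat.rename_categories(["very good", "good", "very bad"])

# Reorder and add the categories that have no rows yet.
df["grade"] = df["grade"].cat.set_categories(
    ["very bad", "bad", "medium", "good", "very good"])

# observed=False keeps categories with zero rows in the result.
sizes = df.groupby("grade", observed=False).size()
print(sizes)
```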

Plotting

Plotting docs

ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))

ts = ts.cumsum()

plt.figure(); ts.plot()

plt.show()

On DataFrame, plot() is a convenience to plot all of the columns with labels:

df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index,
                  columns=['A', 'B', 'C', 'D'])

df = df.cumsum()

plt.figure(); df.plot(); plt.legend(loc='best')

plt.show()
