Machine Learning NumPy Tools: pandas

牛油果2023

Category column: Machine Learning and Network Security

Article link: https://blog.csdn.net/anquanniu/article/details/103737612

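Once you have collected a lot of data (images, music, text, security-related IP addresses, logs, video, maps, seismic records, medical symptom data, or data in some other form), pandas provides many interfaces that load it uniformly into pandas objects. You then operate on those objects, for example by converting a table into a matrix or another form, before moving on to more complex operations.

Import the libraries: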

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

Creating a Series by passing a list of values, letting pandas create a default integer index:

s = pd.Series([1, 3, 5, np.nan, 6, 8])

0 1.0

1 3.0

2 5.0

3 NaN

4 6.0

5 8.0

dtype: float64
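
As a small check of the Series above, the following sketch inspects the index pandas created and the effect of the missing value. It uses only the values from the example. The comments describe what each call reports.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

print(s.index)         # default integer index created by pandas
print(s.dtype)         # float64, because NaN is a floating-point value
print(s.isna().sum())  # number of missing entries
```

The integers are stored as floats here because a single NaN forces the whole Series to a floating-point dtype.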

Creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns:

dates = pd.date_range('20130101', periods=6)

print("dates:\n{}".format(dates))

df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

dates:

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')
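
The introduction mentions converting a table into a matrix. A minimal sketch of that step, assuming a DataFrame shaped like the one above but filled with fixed numbers instead of random ones so the result is easy to check:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)

# Deterministic values (an assumption for illustration) instead of np.random.randn.
df = pd.DataFrame(np.arange(24, dtype="float64").reshape(6, 4),
                  index=dates, columns=list("ABCD"))

# to_numpy() returns the values as a plain 2-D NumPy array;
# the datetime index and the column labels are not part of the result.
matrix = df.to_numpy()

print(type(matrix))
print(matrix.shape)
```

Because the labels are dropped, keep `df.index` and `df.columns` around if the rows and columns need to be identified later.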

You can then pass other arguments to build and display different kinds of content.

df2 = pd.DataFrame({'A': 1,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})

df2
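
To see what this constructor produced, one option is to look at the per-column dtypes. The sketch below rebuilds the same DataFrame and prints them. The exact dtype strings for some columns (for example the timestamp resolution) can differ between pandas versions, so no expected output is shown.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "A": 1,                                   # scalar, broadcast to every row
    "B": pd.Timestamp("20130102"),            # scalar timestamp, broadcast
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
    "F": "foo",                               # scalar string, broadcast
})

# Each column keeps its own dtype; scalars are repeated to match the 4 rows.
print(df2.dtypes)
print(len(df2))
```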

Extracting data by label
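
A minimal sketch of label-based selection with `DataFrame.loc`, assuming a date-indexed DataFrame like the one created earlier. The fixed integer values are an assumption chosen so the selections are easy to verify.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

# One row, selected by its index label.
row = df.loc[dates[0]]

# A label slice on rows plus a list of column labels.
# Unlike positional slices, label slices include both endpoints.
block = df.loc["20130102":"20130104", ["A", "B"]]

# A single scalar, addressed by row label and column label.
value = df.loc[dates[0], "A"]

print(row)
print(block)
print(value)
```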

Operating with objects that have different dimensionality and need alignment. In addition, pandas automatically broadcasts along the specified dimension.

s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)

2013-01-01 NaN

2013-01-02 NaN

2013-01-03 1.0

2013-01-04 3.0

2013-01-05 5.0

2013-01-06 NaN

Freq: D, dtype: float64

df.sub(s, axis='index')
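
The effect of the subtraction is easier to follow with fixed numbers. In this sketch the DataFrame is filled with the constant 10 (an assumption for illustration, replacing the random values), so each result row is simply 10 minus the shifted Series value for that date.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.full((6, 4), 10.0), index=dates, columns=list("ABCD"))

# shift(2) moves the values two rows down, leaving NaN in the first two rows.
s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)

# axis='index' aligns s with the row labels and broadcasts it across all columns.
result = df.sub(s, axis="index")
print(result)
```

Rows where the Series is NaN come out as NaN in every column, which is why three of the six result rows are entirely missing.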


Grouping

By "group by" we are referring to a process involving one or more of the following steps:

Splitting the data into groups based on some criteria

Applying a function to each group independently

Combining the results into a data structure

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})
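
The split, apply, and combine steps described above can be sketched with `groupby` followed by `sum`. The text does not show the grouping call itself, so this is one plausible continuation. Columns C and D use fixed integers here (an assumption) so the sums can be checked by hand.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "C": np.arange(8),        # 0..7 instead of random values
    "D": np.arange(8) * 10,   # 0, 10, ..., 70
})

# Split on column A, sum C and D within each group, combine into a DataFrame.
by_a = df.groupby("A")[["C", "D"]].sum()

# Grouping on two columns produces a hierarchical (MultiIndex) result.
by_ab = df.groupby(["A", "B"])[["C", "D"]].sum()

print(by_a)
print(by_ab)
```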


Categoricals

Since version 0.15, pandas can include categorical data in a DataFrame.

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6], "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})

Convert the raw grades to a categorical data type.

df["grade"] = df["raw_grade"].astype("category")

df["grade"]

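Rename the categories to more meaningful names. Older pandas versions allowed assigning to Series.cat.categories in place; that setter was removed in pandas 2.0, so rename_categories is used here:

df["grade"] = df["grade"].cat.rename_categories(["very good", "good", "very bad"])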

Reorder the categories and simultaneously add the missing categories (methods under Series.cat return a new Series by default).

df["grade"] = df["grade"].cat.set_categories(["very bad", "bad", "medium", "good", "very good"])

df["grade"]

Grouping by a categorical column also shows empty categories. Pass observed=False explicitly, because the default for this argument depends on the pandas version.

df.groupby("grade", observed=False).size()

grade

very bad 1

bad 0

medium 0

good 2

very good 3

dtype: int64
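
Put together, the categorical steps can be run end to end as in the sketch below. It uses `rename_categories` and an explicit `observed=False`, which are choices made here for compatibility with recent pandas releases.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                   "raw_grade": ["a", "b", "b", "a", "a", "e"]})

# Convert to a categorical; the categories are the sorted unique values a, b, e.
df["grade"] = df["raw_grade"].astype("category")

# Rename a -> very good, b -> good, e -> very bad (order follows the categories).
df["grade"] = df["grade"].cat.rename_categories(["very good", "good", "very bad"])

# Reorder and add the categories that have no rows yet.
df["grade"] = df["grade"].cat.set_categories(
    ["very bad", "bad", "medium", "good", "very good"])

# observed=False keeps the empty categories in the result.
counts = df.groupby("grade", observed=False).size()
print(counts)
```

The counts follow the category order set in `set_categories`, not the order in which the values appear in the data.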

Plotting

See the pandas plotting docs.

ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))

ts = ts.cumsum()

plt.figure(); ts.plot()

plt.show()

On DataFrame, plot() is a convenience to plot all of the columns with labels:

df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index,
                  columns=['A', 'B', 'C', 'D'])

df = df.cumsum()

plt.figure(); df.plot(); plt.legend(loc='best')

plt.show()
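
For running the DataFrame plot in a script or on a machine without a display, a hedged variant is sketched below. The non-interactive "Agg" backend and the seeded random generator are assumptions added for reproducibility. They are not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so repeated runs draw the same curves
index = pd.date_range("1/1/2000", periods=1000)
df = pd.DataFrame(rng.standard_normal((1000, 4)),
                  index=index, columns=list("ABCD")).cumsum()

# DataFrame.plot() creates its own figure and returns the Axes it drew on.
ax = df.plot()
ax.legend(loc="best")

# One line is drawn per column, labeled with the column name.
labels = [line.get_label() for line in ax.get_lines()]
print(labels)

plt.close(ax.figure)  # release the figure once it is no longer needed
```

With an interactive backend, calling `plt.show()` instead of `plt.close(...)` displays the figure as in the original example.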
