Python Machine Learning from Beginner to Mastery (Part 2)

Latest recommended article published on 2024-04-26 16:24:02

Charlie。

Article link: https://blog.csdn.net/caoyu1221/article/details/80684821

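Machine learning is inseparable from its tool libraries, and pandas in particular. Today we will go over some pandas basics.

[All code in this post was run in Jupyter]

Getting to know pandas

pandas (a name often glossed as "Python data analysis library") is a tool built on NumPy, created to solve data analysis tasks. It incorporates a large number of libraries and some standard data types, and provides the tools needed to work efficiently with large datasets. pandas offers many functions and methods that let us process data quickly and conveniently, and it is one of the key reasons Python is a powerful and efficient environment for data analysis.
Data analysis has its "three musketeers", that is, three modules: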

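# The "three musketeers" of data analysis: three modules

import numpy as np

import pandas as pd
from pandas import Series, DataFrame

# The first two are for data analysis; matplotlib presents data as plots (a picture is worth a thousand words)
import matplotlib.pyplot as plt
# If you run the analysis on your own Ubuntu or Windows machine and plt.imshow does not display the image,
# call plt.show() afterwards (in Jupyter, %matplotlib inline enables inline display)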

1. Series

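A Series is an object similar to a one-dimensional array. It consists of two parts:

values: a set of data (of type ndarray)
index: the index labels associated with the data
1) Creating a Series
There are two ways to create one:
(1) From a list or a NumPy array
The default index is an integer index from 0 to N-1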

nd = np.random.randint(0,150,size=10)
nd
Series(nd)
Out[]:
0    107
1     28
2     81
3     11
4    148
5     68
6     44
7     69
8    131
9     88
dtype: int64

Or

# Strings in a Series are also displayed with dtype object
l = list('qwertyuiop')
s = Series(l)
s
Out[]:
0    q
1    w
2    e
3    r
4    t
5    y
6    u
7    i
8    o
9    p
dtype: object

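Specify the index with the index parameter

# MySQL has two kinds of indexes, and programming languages generally do too: enumerated (numeric) indexes and associative (string) indexes, as in a dict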
l = [1,2,3,4,5]
s = Series(l,index=list('abcde'))
s
Out[]:
a    1
b    2
c    3
d    4
e    5
dtype: int64

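The name parameter

# name is similar to a table name
# Series is used to create one-dimensional data
# ('数学' means math and '语文' means Chinese, the school subject)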
l = [1,2,3]
s1 = Series(np.random.randint(0,150,size=8), index=list('abcdefgh'), name='python')
s2 = Series(np.random.randint(0,150,size=8), index=list('abcdefgh'), name='数学')
s3 = Series(np.random.randint(0,150,size=8), index=list('abcdefgh'), name='语文')
display(s1,s2,s3)
Out[]:
a     11
b     28
c     15
d     64
e    126
f     75
g    103
h     86
Name: python, dtype: int64
a    112
b    120
c     14
d     95
e     66
f     48
g     49
h     87
Name: 数学, dtype: int64
a     48
b      1
c     22
d    114
e    121
f    147
g     64
h    120
Name: 语文, dtype: int64
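
To illustrate why name behaves like a table or column label, here is a minimal sketch that is not part of the original post. It places two named Series side by side with pd.concat. The variable names, the fixed seed and the English name 'math' are assumptions made only for this illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # fixed seed, chosen only for reproducibility
idx = list('abcd')
python_scores = pd.Series(rng.randint(0, 150, size=4), index=idx, name='python')
math_scores = pd.Series(rng.randint(0, 150, size=4), index=idx, name='math')

# Concatenating along axis=1 uses each Series' name as its column label
scores = pd.concat([python_scores, math_scores], axis=1)
print(scores.columns.tolist())  # ['python', 'math']
print(python_scores.name)       # python
```

The name is also available directly through the .name attribute, as the last line shows.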

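# The copy parameter
# Without copy=True, a Series may reference the ndarray it was built from; copy=True forces an independent copy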
nd = np.ones((10))
s = Series(nd,copy=True)
s
Out[]:
0    1.0
1    1.0
2    1.0
3    1.0
4    1.0
5    1.0
6    1.0
7    1.0
8    1.0
9    1.0
dtype: float64

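Note in particular that a Series created from an ndarray holds a reference to it, not a copy: changing an element of the Series also changes the corresponding element of the original ndarray. (This does not happen with lists.) This describes the default copy=False behaviour of the pandas versions this post was written for; newer versions with Copy-on-Write enabled may copy the array instead.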
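
The difference between a copy and a reference can be checked with a short sketch like the one below. It is an illustration under assumptions, not the original author's code: the variable names are made up, and the final check only reports whether memory is shared, since that depends on the installed pandas version.

```python
import numpy as np
import pandas as pd

nd = np.ones(5)

# copy=True: the Series owns its own data, so edits do not reach nd
s_copy = pd.Series(nd, copy=True)
s_copy.iloc[0] = 100.0
copy_left_array_untouched = nd[0] == 1.0

# A list is always converted into a new array, so the list is never modified
lst = [1, 2, 3]
s_list = pd.Series(lst)
s_list.iloc[0] = 100
list_untouched = lst[0] == 1

# Default construction: whether memory is shared depends on the pandas version
s_default = pd.Series(nd)
shares_memory = np.shares_memory(nd, s_default.to_numpy())

print(copy_left_array_untouched, list_untouched, shares_memory)
```

If shares_memory prints True on your installation, the Series is a view of nd, which is the situation the paragraph above describes.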

(2) From a dictionary

# In real applications, a dictionary is often the better fit for building a Series
# In these lessons I use ndarray for convenience
s = Series({'a': 1, 'b': 2, 'c': 3})
s
Out[]:
a    1
b    2
c    3
dtype: int64
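
As a small extension of the dictionary example, the sketch below shows what happens when an index is passed together with a dictionary. The label 'z' is a made-up key chosen to show the missing-value case.

```python
import pandas as pd

data = {'a': 1, 'b': 2, 'c': 3}

# Keys become the index labels, values become the data
s = pd.Series(data)

# Passing index= selects and reorders keys; a label missing from the dict gets NaN
s_partial = pd.Series(data, index=['c', 'a', 'z'])
print(s_partial)
```

Because NaN is a floating-point value, s_partial is stored as float64 rather than int64.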

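2) Indexing and slicing a Series
You can use square brackets with a single index (which returns the element itself), or square brackets containing a list of indexes (which still returns a Series). Indexing comes in two forms, explicit and implicit:

(1) Explicit indexing:

Use the elements of index as the index values
Use .loc[] (recommended)
You can think of pandas as an upgraded ndarray, but a Series can also be seen as an upgraded dict
Note that slices here are closed intervals: both endpoints are included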

s2
Out[]:
a    121
b
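
The following minimal sketch uses made-up values rather than the s2 above. It illustrates the points about explicit indexing: a single label returns an element, a list of labels returns a Series, and a .loc label slice includes both endpoints. The .iloc line is added only as a contrast with ordinary positional slicing.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=list('abcde'))

single = s.loc['b']            # one label -> a scalar element
several = s.loc[['a', 'c']]    # list of labels -> a Series
label_slice = s.loc['b':'d']   # label slice: both endpoints included
pos_slice = s.iloc[1:3]        # positional slice: end excluded, as with Python lists

print(single)                     # 20
print(list(several.index))        # ['a', 'c']
print(list(label_slice.index))    # ['b', 'c', 'd']
print(list(pos_slice.index))      # ['b', 'c']
```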
