Common usage of pandas, numpy, and scipy

Import the libraries
In [139]:
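import matplotlib.pyplot as plt
%matplotlib notebook 
import seaborn as sns
sns.set(style='whitegrid', context='notebook')
sns.reset_orig()

import pandas as pd
import numpy as np
import scipy as sp
# Import the submodules explicitly: older SciPy versions do not load them with `import scipy`
import scipy.io
import scipy.stats
import scipy.sparse
import scipy.optimize
import scipy.linalg
import scipy.interpolate
import scipy.special
import scipy.signal
import scipy.ndimage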

I. pandas

1. DataFrame: a tabular data structure

In [111]:
df = pd.DataFrame(data={'y': [1, 2, 3],
                       'score': [93.5, 89.4, 90.3],
                       'name': ['Dirac', 'Pauli', 'Bohr'],
                       'birthday': ['1902-08-08', '1900-04-25', '1895-10-07']})
print(type(df))
print(df.dtypes)
df
<class 'pandas.core.frame.DataFrame'>
birthday     object
name         object
score       float64
y             int64
dtype: object
Out[111]:
     birthday   name  score  y
0  1902-08-08  Dirac   93.5  1
1  1900-04-25  Pauli   89.4  2
2  1895-10-07   Bohr   90.3  3
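Once a DataFrame exists, the most common operations are column selection and row filtering with a boolean mask. A minimal sketch on a trimmed-down version of the table above:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Dirac', 'Pauli', 'Bohr'],
                   'score': [93.5, 89.4, 90.3]})
high = df[df['score'] > 90]        # boolean mask keeps rows where the condition holds
print(high['name'].tolist())       # ['Dirac', 'Bohr']
print(df.shape)                    # (3, 2)
```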

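2. read_csv: reading CSV data

to_csv writes the row index by default, so reading the file back produces an extra "Unnamed: 0" column; pass index=False to to_csv (or index_col=0 to read_csv) to avoid it.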

In [112]:
df.to_csv("./test.csv")
In [113]:
df = pd.read_csv('./test.csv')
df
Out[113]:
   Unnamed: 0    birthday   name  score  y
0           0  1902-08-08  Dirac   93.5  1
1           1  1900-04-25  Pauli   89.4  2
2           2  1895-10-07   Bohr   90.3  3
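A round trip that writes without the index, sketched with an in-memory buffer (io.StringIO) standing in for a file on disk, gives back exactly the original columns:

```python
import io
import pandas as pd

df = pd.DataFrame({'name': ['Dirac', 'Pauli'], 'score': [93.5, 89.4]})
buf = io.StringIO()                 # in-memory stand-in for a file on disk
df.to_csv(buf, index=False)         # skip the row index
buf.seek(0)
back = pd.read_csv(buf)
print(back.columns.tolist())        # ['name', 'score']
```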

3. Series: a one-dimensional labeled array

In [114]:
items = pd.Series(data=[93.5, 89.4, 90.3], name='score')
print(type(items))
items
<class 'pandas.core.series.Series'>
Out[114]:
0    93.5
1    89.4
2    90.3
Name: score, dtype: float64
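A Series can carry custom index labels, which allow lookup by label, and arithmetic on it is element-wise with the index preserved. A short sketch (using the names as labels is just an illustrative choice):

```python
import pandas as pd

items = pd.Series([93.5, 89.4, 90.3], index=['Dirac', 'Pauli', 'Bohr'], name='score')
print(items['Pauli'])        # 89.4, label-based lookup
print(items.mean())          # mean of the three scores
shifted = items + 1          # element-wise arithmetic, index labels kept
```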

4. concat: concatenating objects along an axis

In [115]:
items2 = pd.Series(data=['1902-08-08', '1900-04-25'], name='birthday')
print('')
print(items2)
print('')
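print('Stacked along axis=0 (rows appended one after another):')
print(pd.concat(objs=[items, items2], axis=0))
print('')
print('Aligned along axis=1 (each Series becomes a column):')
print(pd.concat(objs=[items, items2], axis=1))
0    1902-08-08
1    1900-04-25
Name: birthday, dtype: object

Stacked along axis=0 (rows appended one after another):
0          93.5
1          89.4
2          90.3
0    1902-08-08
1    1900-04-25
dtype: object

Aligned along axis=1 (each Series becomes a column):
   score    birthday
0   93.5  1902-08-08
1   89.4  1900-04-25
2   90.3         NaN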
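Stacking along axis=0 keeps the original index labels, which is why 0 and 1 appear twice in the output above. Passing ignore_index=True renumbers the result; a small sketch with made-up scores:

```python
import pandas as pd

a = pd.Series([93.5, 89.4, 90.3], name='score')
b = pd.Series([88.0, 91.2], name='score')                # hypothetical extra scores
stacked = pd.concat([a, b], axis=0, ignore_index=True)   # renumber the index 0..4
print(stacked.index.tolist())                            # [0, 1, 2, 3, 4]
```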

5. to_datetime: converting strings to datetimes

In [116]:
pd.to_datetime(arg=df.birthday, format='%Y-%m-%d')
Out[116]:
0   1902-08-08
1   1900-04-25
2   1895-10-07
Name: birthday, dtype: datetime64[ns]
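Real data often contains unparseable strings. A minimal sketch: with errors='coerce' they become NaT (missing) instead of raising, and the .dt accessor exposes date parts of the parsed values:

```python
import pandas as pd

s = pd.Series(['1902-08-08', '1900-04-25', 'not a date'])
dates = pd.to_datetime(s, format='%Y-%m-%d', errors='coerce')  # invalid entries -> NaT
print(dates.isna().tolist())       # [False, False, True]
print(dates.dt.year)               # year of each date (NaN where the value is NaT)
```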

6. merge: database-style joins of DataFrames

In [41]:
df_new = pd.DataFrame(data=list(zip(['Dirac', 'Pauli', 'Bohr', 'Einstein'],
                                    [True, False, True, True])),
                      columns=['name', 'friendly'])

df_merge = pd.merge(left=df, right=df_new, on='name', how='outer')
df_merge
Out[41]:
   Unnamed: 0    birthday      name  score    y  friendly
0         0.0  1902-08-08     Dirac   93.5  1.0      True
1         1.0  1900-04-25     Pauli   89.4  2.0     False
2         2.0  1895-10-07      Bohr   90.3  3.0      True
3         NaN         NaN  Einstein    NaN  NaN      True
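The how argument controls which keys survive: 'inner' keeps only names found in both tables, while 'outer' keeps all of them and fills the gaps with NaN. A sketch using indicator=True, which adds a _merge column recording where each row came from:

```python
import pandas as pd

left = pd.DataFrame({'name': ['Dirac', 'Pauli', 'Bohr'], 'score': [93.5, 89.4, 90.3]})
right = pd.DataFrame({'name': ['Dirac', 'Pauli', 'Bohr', 'Einstein'],
                      'friendly': [True, False, True, True]})
inner = pd.merge(left, right, on='name', how='inner')   # only names present in both
outer = pd.merge(left, right, on='name', how='outer', indicator=True)
print(len(inner), len(outer))                           # 3 4
print(outer[['name', '_merge']])                        # Einstein is 'right_only'
```

Row order in an outer merge can differ between pandas versions, so look rows up by key rather than by position.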

7. date_range: generating a fixed-frequency DatetimeIndex

In [117]:
# freq='M' means month end; pandas 2.2+ deprecates it in favor of 'ME'
pd.date_range(start=df.birthday[2], end=df.birthday[0],
              freq='M')
Out[117]:
DatetimeIndex(['1895-10-31', '1895-11-30', '1895-12-31', '1896-01-31',
               '1896-02-29', '1896-03-31', '1896-04-30', '1896-05-31',
               '1896-06-30', '1896-07-31', '1896-08-31', '1896-09-30',
               '1896-10-31', '1896-11-30', '1896-12-31', '1897-01-31',
               '1897-02-28', '1897-03-31', '1897-04-30', '1897-05-31',
               '1897-06-30', '1897-07-31', '1897-08-31', '1897-09-30',
               '1897-10-31', '1897-11-30', '1897-12-31', '1898-01-31',
               '1898-02-28', '1898-03-31', '1898-04-30', '1898-05-31',
               '1898-06-30', '1898-07-31', '1898-08-31', '1898-09-30',
               '1898-10-31', '1898-11-30', '1898-12-31', '1899-01-31',
               '1899-02-28', '1899-03-31', '1899-04-30', '1899-05-31',
               '1899-06-30', '1899-07-31', '1899-08-31', '1899-09-30',
               '1899-10-31', '1899-11-30', '1899-12-31', '1900-01-31',
               '1900-02-28', '1900-03-31', '1900-04-30', '1900-05-31',
               '1900-06-30', '1900-07-31', '1900-08-31', '1900-09-30',
               '1900-10-31', '1900-11-30', '1900-12-31', '1901-01-31',
               '1901-02-28', '1901-03-31', '1901-04-30', '1901-05-31',
               '1901-06-30', '1901-07-31', '1901-08-31', '1901-09-30',
               '1901-10-31', '1901-11-30', '1901-12-31', '1902-01-31',
               '1902-02-28', '1902-03-31', '1902-04-30', '1902-05-31',
               '1902-06-30', '1902-07-31'],
              dtype='datetime64[ns]', freq='M')
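Instead of an end date, date_range also accepts a number of periods. A small sketch producing three consecutive days (daily frequency is an arbitrary choice here):

```python
import pandas as pd

idx = pd.date_range(start='1902-08-08', periods=3, freq='D')  # 3 consecutive days
print(idx.strftime('%Y-%m-%d').tolist())  # ['1902-08-08', '1902-08-09', '1902-08-10']
```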

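8. read_table: reading delimited text data, similar to read_csv

read_table defaults to a tab separator (sep='\t'), so the comma-separated test.csv below is read as a single column; pass sep=',' to split the fields.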

In [119]:
df = pd.read_table(filepath_or_buffer='test.csv')
df
Out[119]:
  ,birthday,name,score,y
0  0,1902-08-08,Dirac,93.5,1
1  1,1900-04-25,Pauli,89.4,2
2  2,1895-10-07,Bohr,90.3,3
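A sketch of the fix on a few inline lines of CSV (io.StringIO stands in for the file): the default separator yields one column, while sep=',' plus index_col=0 recovers the original table:

```python
import io
import pandas as pd

text = ",birthday,name\n0,1902-08-08,Dirac\n1,1900-04-25,Pauli\n"
wrong = pd.read_table(io.StringIO(text))                      # sep='\t': one column
right = pd.read_table(io.StringIO(text), sep=',', index_col=0)
print(wrong.shape, right.shape)                               # (2, 1) (2, 2)
```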

9. util.testing: pandas' testing-helper module (deprecated since pandas 1.0 and removed in pandas 2.0)

In [128]:
# pandas.util.testing no longer exists in pandas 2.0; its `np` attribute was simply NumPy,
# so call NumPy directly
np.random.choice(['red','green'], 10)
Out[128]:
array(['green', 'red', 'green', 'red', 'red', 'green', 'green', 'red',
       'red', 'red'], 
      dtype='|S5')
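The draw above changes on every run. For reproducible draws, NumPy's Generator API can be seeded explicitly; a minimal sketch (seed 0 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(seed=0)              # reproducible random Generator
colors = rng.choice(['red', 'green'], size=10)   # 10 draws with replacement
print(colors)
```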

10. isnull: detecting missing values

In [121]:
test_list = [[None, 1, 2, 3, 4], [None, 1, None, 3, None]]
print(pd.isnull(test_list))

pd.isnull(df_merge)
[[ True False False False False]
 [ True False  True False  True]]
Out[121]:
   Unnamed: 0  birthday   name  score      y  friendly
0       False     False  False  False  False     False
1       False     False  False  False  False     False
2       False     False  False  False  False     False
3        True      True  False   True   True     False
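Detecting missing values is usually followed by counting, dropping, or filling them. A short sketch on a hypothetical two-row table:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Dirac', 'Einstein'], 'score': [93.5, np.nan]})
print(df.isnull().sum())           # number of missing values per column
complete = df.dropna()             # drop rows that contain any NaN
filled = df.fillna({'score': 0})   # replace NaN in 'score' with 0
```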

11. value_counts: counting how often each value occurs

In [121]:

df_merge['friendly'].value_counts()
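value_counts returns counts sorted in descending order, and normalize=True turns them into proportions. A sketch on the friendly column values from the merge example:

```python
import pandas as pd

friendly = pd.Series([True, False, True, True], name='friendly')
counts = friendly.value_counts()                 # absolute counts
shares = friendly.value_counts(normalize=True)   # proportions that sum to 1
print(counts.loc[True], counts.loc[False])       # 3 1
```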



II. numpy

1. array: the basic array type

In [46]:
np.array(object=[[1, 9, 9, 1], [2, 0, 1, 6]], dtype=np.float32)
Out[46]:
array([[ 1.,  9.,  9.,  1.],
       [ 2.,  0.,  1.,  6.]], dtype=float32)
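Every array carries a shape and a dtype, and astype converts to another dtype by returning a new array. A minimal sketch:

```python
import numpy as np

a = np.array([[1, 9, 9, 1], [2, 0, 1, 6]], dtype=np.float32)
print(a.shape, a.dtype)      # (2, 4) float32
b = a.astype(np.int64)       # converted copy; `a` itself is unchanged
```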

2. zeros: an array filled with zeros

In [47]:
np.zeros(shape=(2, 4), dtype=int)
Out[47]:
array([[0, 0, 0, 0],
       [0, 0, 0, 0]])
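Two related constructors are zeros_like, which copies the shape and dtype of an existing array, and full, which fills with any constant. A short sketch:

```python
import numpy as np

template = np.arange(6).reshape(2, 3)
z = np.zeros_like(template)      # same shape and dtype as `template`, all zeros
f = np.full((2, 3), 7.5)         # 2x3 array filled with 7.5
```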

3. arange: values from start to stop (stop excluded) with a given step

In [48]:
np.arange(start=1.5, stop=8.5, step=0.7, dtype=float)
Out[48]:
array([ 1.5,  2.2,  2.9,  3.6,  4.3,  5. ,  5.7,  6.4,  7.1,  7.8])
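With a float step, floating-point rounding can make the element count of arange hard to predict. When the number of points matters, linspace is the safer choice; a sketch contrasting the two:

```python
import numpy as np

ints = np.arange(0, 10, 3)         # integer step: stop value is excluded
print(ints)                        # [0 3 6 9]
pts = np.linspace(1.5, 7.8, 10)    # exactly 10 points, both ends included
```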

4. sqrt: element-wise square root

In [49]:
np.sqrt([16, 9, 4])
Out[49]:
array([ 4.,  3.,  2.])
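For negative real inputs np.sqrt returns nan (with a RuntimeWarning), while np.emath.sqrt switches to a complex result. A sketch of both:

```python
import numpy as np

with np.errstate(invalid='ignore'):     # silence the RuntimeWarning for sqrt(-4)
    real = np.sqrt([4.0, -4.0])         # -> [2.0, nan]
cplx = np.emath.sqrt([4.0, -4.0])       # -> [2+0j, 0+2j]
```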

5. ones: an array filled with ones

In [50]:
np.ones(shape=(2, 3, 1), dtype=str)  # np.unicode is no longer available in recent NumPy
Out[50]:
array([[[u'1'],
        [u'1'],
        [u'1']],

       [[u'1'],
        [u'1'],
        [u'1']]], 
      dtype='<U1')

6. sum: summation

In [51]:
vals = np.arange(0, 12, 1).reshape((3, 4))
print(vals)
print('')
print('sum entire array =', np.sum(vals))
print('sum along columns =', np.sum(vals, axis=0))
print('sum along rows =', np.sum(vals, axis=1))
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

('sum entire array =', 66)
('sum along columns =', array([12, 15, 18, 21]))
('sum along rows =', array([ 6, 22, 38]))
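The axis argument names the axis that is collapsed: axis=0 gives one sum per column, axis=1 one per row. keepdims=True keeps the reduced axis so the result broadcasts back against the original array, for example to normalize each row:

```python
import numpy as np

vals = np.arange(12).reshape(3, 4)
col_sums = vals.sum(axis=0)                  # [12 15 18 21]
row_sums = vals.sum(axis=1, keepdims=True)   # shape (3, 1)
normalized = vals / row_sums                 # broadcasting divides each row by its sum
```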

7. mean: arithmetic mean

In [52]:
vals = np.array([1, 2, 3, 4]*3).reshape((3, 4))
print(vals)
print('')
print('mean entire array =', np.mean(vals))
print('mean along columns =', np.mean(vals, axis=0))
print('mean along rows =', np.mean(vals, axis=1))
[[1 2 3 4]
 [1 2 3 4]
 [1 2 3 4]]

('mean entire array =', 2.5)
('mean along columns =', array([ 1.,  2.,  3.,  4.]))
('mean along rows =', array([ 2.5,  2.5,  2.5]))
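np.mean propagates NaN, while np.nanmean skips it; np.average additionally supports weights. A sketch with made-up values and weights:

```python
import numpy as np

vals = np.array([1.0, 2.0, np.nan, 4.0])
print(np.mean(vals))                             # nan
print(np.nanmean(vals))                          # mean of 1, 2 and 4
w = np.average([1, 2, 3, 4], weights=[4, 3, 2, 1])
print(w)                                         # 2.0
```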

8. linspace: a given number of evenly spaced values (both endpoints included)

In [53]:
np.linspace(0, 19.3, 6)
Out[53]:
array([  0.  ,   3.86,   7.72,  11.58,  15.44,  19.3 ])
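The spacing above is (19.3 - 0) / (6 - 1) = 3.86. linspace can also return this step (retstep=True) or leave out the stop value (endpoint=False):

```python
import numpy as np

pts, step = np.linspace(0, 19.3, 6, retstep=True)   # also return the spacing
half_open = np.linspace(0, 1, 4, endpoint=False)    # [0, 0.25, 0.5, 0.75]
```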

9. asarray: converts input to an array without copying when it is already an ndarray

In [54]:
vals = np.array([9, 2, 3, 5])
print(type(vals))
print(vals)
a = np.asarray(vals)
a += 1
print(vals) # vals changes because it was not copied when assigning 'a'
<type 'numpy.ndarray'>
[9 2 3 5]
[10  3  4  6]
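To contrast the two: np.asarray hands back the very same object when given an ndarray of a matching dtype, while np.array makes a copy by default, so only the copy is unaffected by in-place changes:

```python
import numpy as np

vals = np.array([9, 2, 3, 5])
view = np.asarray(vals)    # same object, no copy
copy = np.array(vals)      # independent copy
view += 1
print(view is vals)        # True
print(copy)                # [9 2 3 5], unaffected
```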

III. SciPy

1. stats: probability distributions and statistical tests

In [129]:
# Standard deviation, ignoring NaN (sp.nanstd was removed from SciPy; NumPy provides it)
vals = [0.0, np.nan, 8.3, 2.4, np.nan, 3.2]
np.nanstd(vals)
Out[129]:
3.0243801017729237
In [140]:
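# Normal distribution
x = np.linspace(0,10,50)
# Plot the Gaussian probability density function
plt.plot(x, sp.stats.norm.pdf(x=x, loc=5, scale=2))
# Draw Gaussian random samples
samples = sp.stats.norm.rvs(loc=5, scale=2, size=4)
print(samples)
plt.show()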
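Besides pdf and rvs, each distribution provides a cdf, and the module contains hypothesis tests. A sketch of a one-sample t-test on simulated data (sample size 200 and seed 0 are arbitrary choices):

```python
import numpy as np
from scipy import stats

p_below = stats.norm.cdf(5, loc=5, scale=2)     # P(X <= mean) = 0.5
rng = np.random.default_rng(0)
sample = stats.norm.rvs(loc=5, scale=2, size=200, random_state=rng)
t_stat, p_value = stats.ttest_1samp(sample, popmean=5)   # H0: the mean equals 5
print(p_below, p_value)
```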

2. sparse: compressed storage for sparse matrices

In [59]:
vals = np.array([[0, 3.4, 2], [0, 9.9, 0], [0, 0, -5.4]])
print(vals)
print('')
a = sp.sparse.csr_matrix(vals)
print(type(a))
print('non-zero entries =', a.data) # the stored non-zero values
print('diagonal entries =',a.diagonal())# the main diagonal
print('upper triangular =\n',sp.sparse.triu(a))
[[ 0.   3.4  2. ]
 [ 0.   9.9  0. ]
 [ 0.   0.  -5.4]]

<class 'scipy.sparse.csr.csr_matrix'>
('non-zero entries =', array([ 3.4,  2. ,  9.9, -5.4]))
('diagonal entries =', array([ 0. ,  9.9, -5.4]))
('upper triangular =\n', <3x3 sparse matrix of type '<type 'numpy.float64'>'
	with 4 stored elements in COOrdinate format>)
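A sparse matrix can also be built directly from (row, column, value) triplets without ever creating the dense array, and it supports matrix-vector products. A sketch reproducing the matrix above:

```python
import numpy as np
from scipy import sparse

rows = np.array([0, 0, 1, 2])
cols = np.array([1, 2, 1, 2])
data = np.array([3.4, 2.0, 9.9, -5.4])
a = sparse.csr_matrix((data, (rows, cols)), shape=(3, 3))  # from (row, col) pairs
print(a.nnz)                 # 4 stored values
dense = a.toarray()          # back to a regular ndarray
y = a @ np.ones(3)           # sparse matrix-vector product: the row sums
```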

3. optimize: optimization and root finding

In [141]:
# Find the roots of a function (brentq needs an interval where f changes sign)

f = lambda x: x**2 - 3*x + 2 # = (x-1)*(x-2)
print(f)
roots = (sp.optimize.brentq(f=f, a=0, b=1.5),
         sp.optimize.brentq(f=f, a=1.5, b=5))
print('First root =', roots[0])
print('Second root =', roots[1])
<function <lambda> at 0x0D724CF0>
('First root =', 1.0000000000000002)
('Second root =', 1.9999999999999998)
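root_scalar is a unified front end to the same bracketing solvers, and minimize_scalar finds a minimum instead of a root. A sketch on the same parabola, whose vertex lies at x = 1.5:

```python
from scipy import optimize

f = lambda x: x**2 - 3*x + 2
sol = optimize.root_scalar(f, bracket=[1.5, 5], method='brentq')  # root near 2
res = optimize.minimize_scalar(f)                                 # minimum near 1.5
print(sol.root, res.x)
```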
In [143]:
# Least-squares parameter fitting

x = np.linspace(0, 10, 10)
y = np.array([-0.5, -1.8, -1.3, -0.1, 0.4,
              1.6, 3.5, 8.9, 12.6, 24.8])

# Fit a quadratic model
f = lambda beta, x:  beta[0] + beta[1]*x + beta[2]*x**2

# Residuals between the model f and the observed values
error_function = lambda beta, x, y: f(beta, x) - y

beta_0 = (0.0, 0.0, 0.0)

beta, _ = sp.optimize.leastsq(func=error_function, x0=beta_0, args=(x, y))
print('optimal parameters =', beta)
plt.scatter(x, y);
plt.plot(x, [f(beta, xx) for xx in x])
plt.show()
('optimal parameters =', array([ 0.6, -2.2,  0.4]))
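curve_fit is a higher-level wrapper that takes the model function directly instead of a residual function. Because this quadratic model is linear in its parameters, np.polyfit should land on the same coefficients; a sketch checking that:

```python
import numpy as np
from scipy import optimize

x = np.linspace(0, 10, 10)
y = np.array([-0.5, -1.8, -1.3, -0.1, 0.4, 1.6, 3.5, 8.9, 12.6, 24.8])

def quadratic(x, b0, b1, b2):
    return b0 + b1*x + b2*x**2

params, cov = optimize.curve_fit(quadratic, x, y)  # initial guess defaults to all ones
poly = np.polyfit(x, y, 2)[::-1]                   # polyfit returns highest degree first
print(params)
```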

4. io: reading and writing MATLAB files

In [62]:
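# Save NumPy arrays as MATLAB data

# Create a random 8x6 array; print floats with 1 decimal (this setting persists for the session)
np.set_printoptions(precision=1)
matrix = np.random.random(size=(8, 6))
print(matrix)

# Build a dict mapping 'row0', 'row1', ... to each row
data_dict = {'row'+str(r_id): row for r_id, row in enumerate(matrix)}
# Write each row to the .mat file as a separate variable
scipy.io.savemat('random_array.mat', mdict=data_dict, oned_as='row')

# Load the data just saved
loaded_data_dict = scipy.io.loadmat('random_array.mat')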
loaded_data_dict
[[ 0.5  0.3  0.7  0.5  0.6  0.2]
 [ 0.2  0.8  0.6  0.6  0.7  0.6]
 [ 0.5  0.3  0.9  0.4  0.6  0.3]
 [ 0.9  0.   0.5  0.6  0.9  0.4]
 [ 0.5  0.3  0.2  0.5  0.7  0.1]
 [ 0.6  0.9  1.   0.8  0.2  0.3]
 [ 0.5  0.6  0.5  0.6  0.8  0.9]
 [ 0.6  0.8  0.   0.8  0.5  0.2]]
Out[62]:
{'__globals__': [],
 '__header__': 'MATLAB 5.0 MAT-file Platform: nt, Created on: Thu Apr 19 09:47:48 2018',
 '__version__': '1.0',
 'row0': array([[ 0.5,  0.3,  0.7,  0.5,  0.6,  0.2]]),
 'row1': array([[ 0.2,  0.8,  0.6,  0.6,  0.7,  0.6]]),
 'row2': array([[ 0.5,  0.3,  0.9,  0.4,  0.6,  0.3]]),
 'row3': array([[ 0.9,  0. ,  0.5,  0.6,  0.9,  0.4]]),
 'row4': array([[ 0.5,  0.3,  0.2,  0.5,  0.7,  0.1]]),
 'row5': array([[ 0.6,  0.9,  1. ,  0.8,  0.2,  0.3]]),
 'row6': array([[ 0.5,  0.6,  0.5,  0.6,  0.8,  0.9]]),
 'row7': array([[ 0.6,  0.8,  0. ,  0.8,  0.5,  0.2]])}
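A whole 2-D array can also be stored as a single MATLAB variable. savemat and loadmat accept file-like objects too, so this sketch round-trips through an in-memory buffer:

```python
import io
import numpy as np
from scipy import io as sio

matrix = np.arange(6, dtype=float).reshape(2, 3)
buf = io.BytesIO()                     # in-memory stand-in for a .mat file
sio.savemat(buf, {'matrix': matrix})   # one MATLAB variable named 'matrix'
buf.seek(0)
loaded = sio.loadmat(buf)
print(loaded['matrix'].shape)          # (2, 3)
```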

5. linalg: linear algebra

In [63]:
matrix = np.array([[4.3, 8.9],[2.2, 3.4]])
print(matrix)
print('')

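# Frobenius norm (the default norm for a matrix)
norm = sp.linalg.norm(matrix)
print('norm =', norm)
# Same value by hand: square root of the sum of squared entries (compare floats with isclose, not ==)
print(np.isclose(norm, np.sqrt(np.sum(matrix**2))))
print('')

# Eigenvalues and eigenvectors
eigvals, eigvecs = sp.linalg.eig(matrix)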
print('eigenvalues =', eigvals)
print('eigenvectors =\n', eigvecs)
[[ 4.3  8.9]
 [ 2.2  3.4]]

('norm =', 10.681760154581267)
True

('eigenvalues =', array([ 8.3+0.j, -0.6+0.j]))
('eigenvectors =\n', array([[ 0.9, -0.9],
       [ 0.4,  0.5]]))
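Another everyday task is solving a linear system A x = b. linalg.solve does this directly, which is more accurate and cheaper than computing the inverse first. A sketch with an arbitrary right-hand side:

```python
import numpy as np
from scipy import linalg

A = np.array([[4.3, 8.9], [2.2, 3.4]])
b = np.array([1.0, 2.0])
x = linalg.solve(A, b)         # solves A @ x = b without forming inv(A)
print(np.allclose(A @ x, b))   # True
d = linalg.det(A)              # 4.3*3.4 - 8.9*2.2 = -4.96
```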

6. interpolate: interpolation and spline fitting

In [144]:
# Fit a smoothing spline to scattered points (xs extends past 10, so the last part is extrapolated)

x = np.linspace(0, 10, 10)
xs = np.linspace(0, 11, 50)
y = np.array([0.5, 1.8, 1.3, 3.5, 3.4,
              5.2, 3.5, 1.0, -2.3, -6.3])
spline = sp.interpolate.UnivariateSpline(x, y)
plt.scatter(x, y);
plt.plot(xs, spline(xs))
plt.show()
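UnivariateSpline is a smoothing spline: it trades closeness to the data for smoothness and need not pass through every point. CubicSpline interpolates exactly. A sketch comparing the two (s=5.0 is an arbitrary smoothing factor):

```python
import numpy as np
from scipy import interpolate

x = np.linspace(0, 10, 10)
y = np.array([0.5, 1.8, 1.3, 3.5, 3.4, 5.2, 3.5, 1.0, -2.3, -6.3])
cs = interpolate.CubicSpline(x, y)                    # passes through every point
smooth = interpolate.UnivariateSpline(x, y, s=5.0)    # larger s gives a smoother curve
print(np.allclose(cs(x), y))                          # True
```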

7. special: special functions (Bessel and gamma functions, plus combinatorics such as comb, perm and factorial)

In [145]:
x = np.linspace(0,10,500)
fig, ax = plt.subplots(2)

ax[0].set_title('Zero and first order bessel functions of the first kind')
ax[0].plot(x, sp.special.j0(x), c='blue', alpha=0.6)
ax[0].plot(x, sp.special.j1(x), c='red', alpha=0.6)

ax[1].set_title('Zero and first order bessel functions of the second kind')
ax[1].plot(x, sp.special.y0(x), c='blue', alpha=0.6)
ax[1].plot(x, sp.special.y1(x), c='red', alpha=0.6)
ax[1].set_ylim(-2,1); ax[1].set_xlim(0.5,10)
ax[1].annotate(r'$Y_0$ and $Y_1$ approach -$\infty$', xy=(1,-1.7), xytext=(2.5, -0.9),
               arrowprops=dict(arrowstyle='->', lw=1), fontsize=15)

plt.show()
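The combinatorics functions named in the heading work like this; exact=True returns Python integers instead of floats:

```python
from scipy import special

print(special.comb(5, 2, exact=True))     # 10 ways to choose 2 of 5
print(special.perm(5, 2, exact=True))     # 20 ordered arrangements
print(special.factorial(5, exact=True))   # 120
print(special.gamma(6))                   # 120.0, since Gamma(n) = (n-1)!
```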

8. signal: signal processing

In [146]:
# A modified example posted in the docs:
# http://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.lfilter.html#scipy.signal.lfilter

import scipy.signal
np.random.seed(0)

x = np.linspace(0,6*np.pi,100)
# sph_jn was removed from SciPy; sph_jn(n=3, z)[0][0] returned the order-0 value,
# which spherical_jn(0, x) computes directly
y = sp.special.spherical_jn(0, x)
y = [yi + (np.random.random()-0.5)*0.7 for yi in y]
# y = np.sin(x)

# Design a 3rd-order low-pass Butterworth filter (cutoff = 0.08 x Nyquist frequency)
b, a = sp.signal.butter(3, 0.08)

# Initialize filter
zi = sp.signal.lfilter_zi(b, a)

# Apply filter
y_smooth, _ = sp.signal.lfilter(b, a, y, zi=zi*y[0])

plt.plot(x, y, c='blue', alpha=0.6)
plt.plot(x, y_smooth, c='red', alpha=0.6)
plt.title('Noisy spherical bessel function signal processing')
plt.savefig('noisy_signal_fit.png', bbox_inches='tight')
plt.show()
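lfilter runs the filter forward only, so the smoothed curve lags the input. signal.filtfilt applies the same filter forward and backward, which cancels that phase shift. A sketch on a synthetic 3 Hz sine with Gaussian noise (the signal, noise level and seed are assumptions for illustration):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 500)
clean = np.sin(2 * np.pi * 3 * t)
noisy = clean + rng.normal(scale=0.3, size=t.size)
b, a = signal.butter(3, 0.08)             # same 3rd-order low-pass design as above
smooth = signal.filtfilt(b, a, noisy)     # forward-backward pass: no phase lag
print(np.abs(smooth - clean).mean() < np.abs(noisy - clean).mean())  # True
```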

9. ndimage: n-dimensional image processing

In [148]:
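# Blur an image

# Load the image saved in the signal-processing example
figure = plt.imread('noisy_signal_fit.png')

# Gaussian blur: a larger sigma gives a stronger blur.
# The PNG is read as (height, width, channels); sigma 0 on the last axis keeps the color channels separate
figure_blur = sp.ndimage.gaussian_filter(figure, sigma=(2, 2, 0))

# Plot original and blurred images side by side
pics = [figure, figure_blur]
sns.set_style('white')
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for pic, ax in zip(pics, axes):
    ax.imshow(pic); ax.set_xticks([]); ax.set_yticks([])
plt.show()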
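The effect of gaussian_filter is easiest to see on a tiny synthetic image: a single bright pixel spreads into a blob whose peak drops while the total intensity stays (essentially) the same. A minimal sketch:

```python
import numpy as np
from scipy import ndimage

img = np.zeros((9, 9))
img[4, 4] = 1.0                                   # a single bright pixel
blurred = ndimage.gaussian_filter(img, sigma=1)   # spreads it into a Gaussian blob
print(blurred.max() < img.max())                  # True: the peak is lowered
print(blurred.sum())                              # total intensity stays close to 1
```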

10. misc: sample images (deprecated since SciPy 1.10 in favor of scipy.datasets)

In [149]:
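# Get the raccoon-face sample image
# scipy.misc.face is deprecated since SciPy 1.10; scipy.datasets.face replaces it
# (it downloads the image on first use and needs the optional `pooch` package)
import scipy.datasets
pics = scipy.datasets.face(), scipy.datasets.face(gray=True)

# Plot both versions; cmap only affects the 2-D grayscale image
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for pic, ax in zip(pics, axes):
    ax.imshow(pic, cmap='gray'); ax.set_xticks([]); ax.set_yticks([])
plt.show()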
