Common Tools and Libraries for AI Development in Python

tony_dr

Last modified: 2023-10-31 16:49:43

Category: Artificial Intelligence. Tags: artificial intelligence, python, programming languages

First published: 2023-10-27 13:53:04

Article link: https://blog.csdn.net/tony19820314/article/details/134068815

Jupyter Notebook

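Official site: Project Jupyter | Home

The official description is rather abstract: "Jupyter Notebooks are a community standard for communicating and performing interactive computing. They are a document that blends computations, outputs, explanatory text, mathematics, images, and rich media representations of objects." Put simply, Jupyter Notebook can be thought of as an interactive development environment for Python in which you edit and run Python code. Because a notebook can import Python libraries such as matplotlib, it is also easy to draw Matlab-style plots while the code runs.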

Google Colab Notebook

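Official site: https://colab.research.google.com/?utm_source=scs-index#scrollTo=gJr_9dXGpJ05

This link can be seen as a web-hosted Jupyter Notebook. It makes it very convenient to use Python and its commonly used libraries, to develop and run AI code, and to view the results. Usage is covered in another article of mine: 初识Google Colab Jupyter Notebook-CSDN博客 (Getting Started with Google Colab Jupyter Notebook)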

Pandas

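Official site: pandas - Python Data Analysis Library

"pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language." That is the summary from the official site. More concretely, Pandas can read many kinds of data sources, such as Excel sheets, JSON, XML, and SQL databases, and it provides basic processing of the loaded data, for example reporting its dimensions (number of rows and columns), selecting particular rows or columns, and merging and comparing data.

Because Pandas is also a Python library, it can be imported and used directly in the Jupyter Notebook or Google Colab Notebook described above. For example, the following code can be written directly in a Google Colab Notebook to read a CSV file stored on GitHub and store the data in advertising_df: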

import pandas as pd
url = "https://raw.githubusercontent.com/LinkedInLearning/artificial-intelligence-foundations-neural-networks-4381282/main/Advertising_2023.csv"
advertising_df = pd.read_csv(url, index_col=0)

The example above reads a CSV file stored at a GitHub link. The url is obtained as follows:

Find the file on GitHub and click it

Click Raw

Then copy the link from the address bar
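
The basic operations mentioned above (checking dimensions, selecting rows and columns, merging) can be sketched on a small in-memory table. This is only an illustration: the column names follow the ones used in this article, but the numbers and the region_df table are invented and do not come from Advertising_2023.csv.

```python
import pandas as pd

# Hypothetical data: the values are invented, only the column names follow the article.
ads_df = pd.DataFrame(
    {
        "digital": [345.2, 66.8, 25.5, 227.3],
        "TV": [156.0, 46.0, 18.3, 145.1],
        "sales": [22.1, 10.4, 9.3, 18.5],
    },
    index=[1, 2, 3, 4],
)

# Dimensions as (rows, columns)
print(ads_df.shape)

# Select columns: one pair of brackets gives a Series, two pairs give a DataFrame
sales = ads_df["sales"]
features = ads_df[["digital", "TV"]]

# Select rows with a boolean condition
high_sales = ads_df[ads_df["sales"] > 15]

# Merge with a second (also hypothetical) table that shares the same index
region_df = pd.DataFrame(
    {"region": ["north", "south", "east", "west"]}, index=[1, 2, 3, 4]
)
merged = ads_df.merge(region_df, left_index=True, right_index=True)
print(merged.head())
```

The same calls should work unchanged on a DataFrame loaded with pd.read_csv.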

NumPy

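Official site: NumPy

"The fundamental package for scientific computing with Python". NumPy provides powerful N-dimensional array functionality, along with "comprehensive mathematical functions, random number generators, linear algebra routines, Fourier transforms, and more".

Like Pandas, NumPy is a Python library, so it can be imported and used directly in the Jupyter Notebook or Google Colab Notebook described above. For example, the following code can be written directly in a Google Colab Notebook to fit a degree-1 polynomial (a straight line) to the vectors x and y with polyfit: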

import numpy as np
x = advertising_df['digital']
y = advertising_df['sales']
np.polyfit(x, y, 1)

Here advertising_df is the data read from the CSV file with Pandas earlier. Running the code prints:

array([ 0.01456994, 12.03168039])

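These are the values of the parameters a and b in y = ax + b, in that order (polyfit returns the highest-degree coefficient first).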
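
To make the coefficient order concrete, here is a minimal sketch on synthetic data. The points are invented so that they lie exactly on y = 2x + 1; they are not taken from the advertising data set.

```python
import numpy as np

# Synthetic data lying exactly on y = 2x + 1 (invented for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Degree-1 fit; coefficients come back highest power first: [a, b]
a, b = np.polyfit(x, y, 1)

# Evaluate the fitted line at a new point
y_new = np.polyval([a, b], 5.0)
print(a, b, y_new)
```

Up to floating-point rounding, the fit recovers the slope 2 and the intercept 1, and np.polyval applies them in the same highest-power-first order.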

Matplotlib

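Official site: Matplotlib — Visualization with Python

According to the official site, "Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python". In the article 初识Google Colab Jupyter Notebook-CSDN博客 (Getting Started with Google Colab Jupyter Notebook), Matplotlib was used to plot 100 random numbers.

Matplotlib can likewise be imported and used directly in a Google Colab Notebook.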
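
A plot of 100 random numbers like the one mentioned above could look roughly like this. It is a sketch rather than the code from that article: the seed, figure size, and labels are arbitrary choices made here.

```python
import numpy as np
import matplotlib.pyplot as plt

# 100 pseudo-random values in [0, 1); the seed is arbitrary and only makes the plot repeatable
rng = np.random.default_rng(0)
values = rng.random(100)

# Draw the values as a line plot against their index
fig, ax = plt.subplots(figsize=(10, 5))
(line,) = ax.plot(values)
ax.set_title("100 random values")
ax.set_xlabel("Index")
ax.set_ylabel("Value")
```

In a notebook the figure is rendered when the cell finishes; in a plain script, call plt.show() at the end.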

seaborn

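Official site: seaborn: statistical data visualization — seaborn 0.13.0 documentation

According to the official site, "Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics"; "Seaborn is a library for making statistical graphics in Python. It builds on top of matplotlib and integrates closely with pandas data structures".

Like Matplotlib, seaborn can be imported and used directly in a Google Colab Notebook. The code below uses Matplotlib and seaborn to draw a heatmap that reflects the correlations between the data columns. Because seaborn is built on matplotlib, settings made through Matplotlib also take effect on seaborn plots.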

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

plt.figure(figsize=(10,5))
sns.heatmap(advertising_df.corr(), annot=True, vmin=0, vmax=1, cmap='ocean')

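The code above uses Matplotlib's figure function to set the figure size, then uses seaborn's heatmap function to draw a heatmap of advertising_df.corr(), the correlation matrix of advertising_df. Running it displays the heatmap in the notebook output.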
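
The matrix that the heatmap visualizes can be inspected directly. The sketch below uses a small invented table (only the column names follow this article) to show what DataFrame.corr() returns, and why the vmin=0 setting matters when some correlations are negative.

```python
import pandas as pd

# Invented numbers; only the column names follow the article
df = pd.DataFrame(
    {
        "digital": [10.0, 20.0, 30.0, 40.0, 50.0],
        "radio": [5.0, 3.0, 4.0, 1.0, 2.0],
        "sales": [12.0, 15.0, 19.0, 22.0, 27.0],
    }
)

# Pairwise Pearson correlation between all numeric columns
corr = df.corr()
print(corr.round(2))

# Entries lie between -1 and 1; check whether any of them is negative
has_negative = bool((corr < 0).any().any())
print(has_negative)
```

With vmin=0 and vmax=1, any negative entry is drawn in the same colour as 0, so a range of vmin=-1 may be preferable when negative correlations need to be distinguished.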

scikit-learn

Official site: scikit-learn: machine learning in Python — scikit-learn 1.3.2 documentation

The official site describes it as "simple and efficient tools for predictive data analysis", "built on NumPy, SciPy, and matplotlib". Concretely, scikit-learn is a modeling tool, particularly for AI-related modeling. It implements many algorithms, for example gradient boosting, nearest neighbors, random forest, logistic regression, k-means, HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise), hierarchical clustering, and others. It can therefore be applied to many kinds of AI problems, such as classification, regression, and clustering.
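
As a small illustration of the regression use case, the sketch below fits a linear model with scikit-learn. The data is synthetic (generated here, not taken from the advertising data set), and LinearRegression is chosen only because it is one of the simplest estimators.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression data: y = 3*x0 - 2*x1 + small noise (invented for illustration)
rng = np.random.default_rng(42)
X = rng.random((200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.01, size=200)

# Hold out 40% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=101
)

# Fit on the training split, then compute R^2 on the held-out split
model = LinearRegression()
model.fit(X_train, y_train)
r2 = model.score(X_test, y_test)
print(model.coef_, model.intercept_, r2)
```

Other scikit-learn estimators follow the same fit / predict / score pattern, so swapping in a different algorithm usually changes only the line that constructs the model.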

TODO (example)

TensorFlow

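Official site: https://www.tensorflow.org/

The official site describes it as a way to "create production-grade machine learning (ML) models"; you can use pre-trained models or train your own. It is "an end-to-end platform for machine learning" that covers loading data, building ML models, deploying them, and running them. TensorFlow is therefore also an AI modeling tool.

TODO (example)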

Keras

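Official site: Keras: Deep Learning for humans

The official description: "Keras is a deep learning API written in Python, running on top of the machine learning platform TensorFlow. It was developed with a focus on enabling fast experimentation". Keras is therefore an AI modeling tool built on TensorFlow.

Keras can also be imported and used directly in a Google Colab Notebook. Below is a piece of code that builds an AI model with Keras.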

# Build the network
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# advertising_df is the DataFrame loaded with Pandas earlier
X = advertising_df[['digital', 'TV', 'radio', 'newspaper']]
y = advertising_df['sales']

# Feature normalization
# Note: normalized_feature is computed here, but the split below uses the raw X
normalized_feature = keras.utils.normalize(X.values)

from sklearn.model_selection import train_test_split

# Split the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)

# Build the model (a three-layer network with one hidden layer)
model = Sequential()
model.add(Dense(4, input_dim=4, activation='relu'))
model.add(Dense(3, activation='relu'))
model.add(Dense(1))

# Compile the model
model.compile(optimizer='adam', loss='mse', metrics=['mse'])

# Fit the model
history = model.fit(X_train, y_train, validation_data=(X_test, y_test),
                    epochs=32)

# Plot the model loss on the training and validation data
plt.figure(figsize=(15,8))
plt.plot(history.history['val_loss'])
plt.plot(history.history['loss'])
plt.title("Model Loss (MSE) on Training and Validation Data")
plt.ylabel('Loss - Mean Square Error')
plt.xlabel('Epoch')
plt.legend(['Val Loss', 'Train Loss'], loc='upper right')
plt.show()

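Notes on the code above:

Selecting a single column from a Pandas DataFrame takes one pair of square brackets, while selecting several columns takes two pairs. So y = advertising_df['sales'] is a single column (a one-dimensional Series), whereas X = advertising_df[['digital', 'TV', 'radio', 'newspaper']] is a new DataFrame made up of 4 columns.
keras.utils.normalize() scales the data by its norm. By default it uses the L2 norm (a norm measures the length of a vector) along the last axis, so each row is divided by its own length. Note that the result, normalized_feature, is not passed to train_test_split in the code above.
sklearn is the scikit-learn library introduced earlier. Its train_test_split function splits the data into a training set (X_train, y_train) and a test/validation set (X_test, y_test); test_size=0.4 is the fraction of the data held out for testing.
According to the Keras documentation, a Sequential model "is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor". model.add() appends a layer to the model. The code above adds 3 layers:
- The first layer serves as the "input layer": it accepts the 4 input features (input_dim=4) and has 4 units with the ReLU activation function
- The second layer is the "hidden layer", set here to 3 neurons (3 dimensions) with the ReLU activation function
- The third layer is the "output layer", with a single unit
model.fit starts training the model; epochs=32 means the training data is passed through the model 32 times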
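
Two of the notes above, the bracket rule and the L2 normalization, can be checked on a tiny invented table. The NumPy computation below is a stand-in chosen so that the sketch runs without TensorFlow; as far as I understand, keras.utils.normalize with its default arguments (axis=-1, order=2) performs the same row-wise scaling, but treat that equivalence as an assumption.

```python
import numpy as np
import pandas as pd

# Invented values; only the column names follow the article
df = pd.DataFrame({"digital": [3.0, 0.0], "TV": [4.0, 5.0], "sales": [10.0, 12.0]})

y = df["sales"]            # one pair of brackets: a Series, shape (2,)
X = df[["digital", "TV"]]  # two pairs of brackets: a DataFrame, shape (2, 2)
print(y.shape, X.shape)

# Row-wise L2 normalization: divide each row by its Euclidean length
norms = np.linalg.norm(X.values, ord=2, axis=-1, keepdims=True)
normalized = X.values / norms
print(normalized)
```

Each row of normalized has length 1: for example, the first row (3, 4) has length 5 and becomes (0.6, 0.8).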