openEuler Tutorial, from Beginner to Proficient: AI and Development Environment Setup and Programming Practice on openEuler 24.03 LTS (Part 15)

Latest recommended article published on 2025-11-26 08:46:08

License: CC 4.0 BY-SA

Tags: #ArtificialIntelligence #Linux #openEuler

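AI and Development Environment Setup and Programming Practice on openEuler 24.03 LTS

I. Setting Up the Machine Learning Development Environment

1. Install Anaconda (for openEuler 24.03 LTS)

If your openEuler system runs on the aarch64 (ARM64) architecture, download the ARM64 installer.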

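# Download Anaconda (Python 3.11 is used as the example in this article)
# If the URL returns 404, check https://repo.anaconda.com/archive/ for the exact file name
wget https://repo.anaconda.com/archive/Anaconda3-2024.06-Linux-aarch64.sh

# Install (follow the prompts; /opt/anaconda3 or a directory under your home is recommended)
bash Anaconda3-2024.06-Linux-aarch64.sh

# Reload the shell configuration (the installer adds the conda setup to ~/.bashrc if you accept that prompt)
source ~/.bashrc

# Verify
conda --version
python --version

⚠️ On x86_64, download the x86_64 installer instead.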
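
To avoid picking the wrong installer by hand, the architecture check can be scripted. The sketch below is only an illustration: the helper name `installer_url` is hypothetical, and the x86_64 file name is an assumption derived from the pattern of the aarch64 URL in the download command above, so confirm it against the archive listing before use.

```python
import platform

# Version string copied from the download URL in this article (assumption:
# the x86_64 installer follows the same naming pattern).
ANACONDA_VERSION = "2024.06"
BASE_URL = "https://repo.anaconda.com/archive"

# Map common machine names onto the two architectures discussed here.
ARCH_ALIASES = {
    "aarch64": "aarch64",
    "arm64": "aarch64",
    "x86_64": "x86_64",
    "amd64": "x86_64",
}


def installer_url(machine=None):
    """Build the installer URL for a machine name (default: this host)."""
    machine = (machine or platform.machine()).lower()
    arch = ARCH_ALIASES.get(machine)
    if arch is None:
        raise ValueError(f"unsupported architecture: {machine}")
    return f"{BASE_URL}/Anaconda3-{ANACONDA_VERSION}-Linux-{arch}.sh"


print(installer_url("aarch64"))
print(installer_url("x86_64"))
```

Calling `installer_url()` without an argument uses `platform.machine()` to detect the current host.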

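2. Basic conda usage

Command	Description
`conda create -n ml_env python=3.11`	Create a virtual environment named `ml_env`
`conda activate ml_env`	Activate the environment
`conda deactivate`	Leave the current environment
`conda install numpy pandas scikit-learn matplotlib`	Install packages into the active environment
`conda list`	List installed packages
`conda env remove -n ml_env`	Delete the environment

Example: create a dedicated machine learning environment

conda create -n ai_lab python=3.11 -y
conda activate ai_lab
conda install scikit-learn matplotlib pandas jupyter -y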
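
After installing, it can be useful to confirm from Python that the packages are visible in the active environment. The following is a minimal sketch using only the standard library; the helper name `check_packages` is hypothetical. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util


def check_packages(names):
    """Return {import_name: bool} by looking each top-level module up
    on the import path, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Import names, not conda package names (scikit-learn -> sklearn).
required = ["numpy", "pandas", "sklearn", "matplotlib"]

for name, found in check_packages(required).items():
    print(f"{name:<12} {'OK' if found else 'MISSING'}")
```

Run it with the `ai_lab` environment activated; a `MISSING` entry usually means the package was installed into a different environment.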

3. Python basics (syntax commonly used in AI work)

(1) NumPy array operations

import numpy as np

# Create an array
arr = np.array([[1, 2, 3], [4, 5, 6]])
print("Shape:", arr.shape)       # (2, 3)
print("Mean:", arr.mean())       # 3.5

# Vectorized arithmetic (no for loop needed)
scaled = arr * 2 + 1
print("Scaled:\n", scaled)
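
To make "vectorized" concrete, the sketch below writes the same `arr * 2 + 1` computation as an explicit loop and checks that both give identical results. It then shows broadcasting, where a per-column mean of shape `(3,)` is subtracted from every row of the `(2, 3)` array.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Explicit loop version of arr * 2 + 1, element by element
looped = np.empty_like(arr)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        looped[i, j] = arr[i, j] * 2 + 1

# Vectorized version: one expression applied to the whole array
vectorized = arr * 2 + 1
print("Same result:", np.array_equal(looped, vectorized))

# Broadcasting: the (3,) vector of column means is applied to each row
col_means = arr.mean(axis=0)      # [2.5 3.5 4.5]
centered = arr - col_means
print("Column means:", col_means)
print("Centered:\n", centered)
```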

(2) Data handling with Pandas

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'score': [88.5, 92.0, 76.5]
})

# Filter rows and compute statistics
high_score = df[df['score'] > 80]
print("High-scoring students:\n", high_score)
print("Mean age:", df['age'].mean())
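
Filtering conditions can be combined, and grouped statistics follow the same pattern. As a small sketch on the same three-row table: the `level` column and its `high`/`low` labels are introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'score': [88.5, 92.0, 76.5]
})

# Combine conditions with & (and) or | (or); each condition needs parentheses
selected = df[(df['score'] > 80) & (df['age'] <= 30)]
print(selected)

# Derive a label column, then aggregate per group
labeled = df.assign(level=df['score'].ge(80).map({True: 'high', False: 'low'}))
mean_age_by_level = labeled.groupby('level')['age'].mean()
print(mean_age_by_level)
```

`assign` returns a new DataFrame, so the original `df` is left unchanged.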

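II. Case Study: Cluster Analysis with scikit-learn

Overview

Use the K-Means clustering algorithm to segment customers by spending behavior and identify high-value customer groups.

Walkthrough

Step 1: Prepare simulated data

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate simulated customer data: [annual spend, average spend per purchase]
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.5,
                  random_state=42, center_box=(20, 80))

# Plot the raw, unlabeled data
plt.scatter(X[:, 0], X[:, 1], s=30, alpha=0.7)
plt.title("Raw distribution of customer spending")
plt.xlabel("Annual spend (thousand CNY)")
plt.ylabel("Average spend per purchase (hundred CNY)")
plt.grid(True)
plt.show()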

Step 2: Run K-Means clustering

# Create the K-Means model (4 clusters, chosen in advance)
kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)

# Fit the model and predict a cluster label for each sample
y_pred = kmeans.fit_predict(X)

# Get the cluster centers
centers = kmeans.cluster_centers_

# Plot the clustering result
plt.scatter(X[:, 0], X[:, 1], c=y_pred, cmap='viridis', s=30, alpha=0.7)
plt.scatter(centers[:, 0], centers[:, 1], c='red', marker='x', s=200, linewidths=3, label='Cluster centers')
plt.title("K-Means clustering result")
plt.xlabel("Annual spend (thousand CNY)")
plt.ylabel("Average spend per purchase (hundred CNY)")
plt.legend()
plt.grid(True)
plt.show()
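
The stated goal is to identify the high-value group, which the plot alone does not name. One simple way to read it off the fitted model is sketched below, under the assumption that "high value" means the cluster whose center has the largest annual spend (column 0); other definitions are equally reasonable.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same simulated data and model settings as in the steps above
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.5,
                  random_state=42, center_box=(20, 80))
kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)
labels = kmeans.fit_predict(X)
centers = kmeans.cluster_centers_

# Number of customers assigned to each cluster
sizes = np.bincount(labels, minlength=4)

# Assumption: the "high-value" cluster is the one with the largest
# annual-spend coordinate (column 0) at its center
high_value = int(np.argmax(centers[:, 0]))

for k in range(4):
    tag = "  <- high value" if k == high_value else ""
    print(f"cluster {k}: size={sizes[k]}, "
          f"center=({centers[k, 0]:.1f}, {centers[k, 1]:.1f}){tag}")

# Rows of X that belong to the high-value cluster
high_value_customers = X[labels == high_value]
```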

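Step 3: Evaluate the clustering (optional)

from sklearn.metrics import silhouette_score

score = silhouette_score(X, y_pred)
print(f"Silhouette coefficient (closer to 1 is better): {score:.3f}")
# Example output: 0.582 → moderate clustering quality (your value depends on the generated data)

✅ Typical uses: customer segmentation, anomaly detection, market segmentation.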
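
In the walkthrough the number of clusters is fixed at 4 in advance. With real data that number is usually unknown, and the silhouette coefficient can be used to compare candidates. The sketch below tries k from 2 to 6 on the same simulated data; the candidate range is an arbitrary choice made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.5,
                  random_state=42, center_box=(20, 80))

# Fit one model per candidate k and record its silhouette coefficient
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
    print(f"k={k}: silhouette={scores[k]:.3f}")

# Pick the k with the highest score
best_k = max(scores, key=scores.get)
print("Best k by silhouette:", best_k)
```

The highest-scoring k is a suggestion rather than a guarantee; it is worth checking the scatter plot as well.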

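III. Setting Up the Deep Learning Development Environment

1. About TensorFlow

- An end-to-end machine learning framework open-sourced by Google
- Supports training on CPU, GPU, and TPU
- Provides high-level APIs (such as Keras) that simplify model building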

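2. Install TensorFlow on openEuler

Note: openEuler 24.03 LTS is released for several architectures, including x86_64 and aarch64. On aarch64, a TensorFlow wheel from PyPI may not be available for every combination of TensorFlow and Python versions.
Workaround: use a community build or compile from source (conda-forge is recommended).

Method: install from conda-forge (supports aarch64)

# Activate the environment
conda activate ai_lab

# Add the conda-forge channel
conda config --add channels conda-forge

# Install tensorflow (community-maintained build)
conda install tensorflow -y

On x86_64, you can install directly with pip:
pip install tensorflow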

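3. Check that TensorFlow is installed correctly

import tensorflow as tf

print("TensorFlow version:", tf.__version__)
print("GPU available:", tf.config.list_physical_devices('GPU'))

# Simple computation test
a = tf.constant([[1, 2], [3, 4]])
b = tf.constant([[5, 6], [7, 8]])
c = tf.matmul(a, b)
print("Matrix multiplication result:\n", c.numpy())

Expected output (CPU mode; the version number depends on what was installed):

TensorFlow version: 2.16.1
GPU available: []
Matrix multiplication result:
 [[19 22]
 [43 50]]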

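IV. Case Study: Clothing Image Classification with TensorFlow

Overview

Use the Fashion-MNIST dataset (clothing images in 10 classes) to build a CNN model that classifies the images automatically.

Environment

# Make sure these are installed
conda install tensorflow matplotlib numpy -y

Walkthrough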

Step 1: Load and preprocess the data

import tensorflow as tf
from tensorflow.keras import layers, models
import matplotlib.pyplot as plt

# Load Fashion-MNIST (28x28 grayscale images, 60k training + 10k test)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

# Normalize pixel values to [0, 1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Add a channel dimension (the CNN expects (height, width, channels))
x_train = x_train[..., tf.newaxis]
x_test = x_test[..., tf.newaxis]

# Human-readable class names, indexed by label
class_names = ['T-shirt', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
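
The two preprocessing operations can be tried without downloading the dataset. The sketch below applies them to a small batch of random stand-in images (not Fashion-MNIST data) using NumPy only; `np.newaxis` plays the same role as `tf.newaxis`, since both are aliases for `None`.

```python
import numpy as np

# Stand-in batch: 4 random 28x28 "images" with uint8 pixels in [0, 255]
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Normalize to [0, 1] as float32
x = fake_images.astype('float32') / 255.0

# Append a channel axis: (batch, 28, 28) -> (batch, 28, 28, 1)
x = x[..., np.newaxis]

print("shape:", x.shape)
print("dtype:", x.dtype)
print("min/max:", x.min(), x.max())
```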

Step 2: Build the CNN model

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),  # declare the input shape explicitly
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')  # one output per class
])

# Compile the model (integer labels, so use the sparse loss)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Print the model structure
model.summary()
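
The output shapes and parameter counts that `model.summary()` reports can be reproduced by hand, which helps when reading the table. The sketch below is plain Python arithmetic and does not call TensorFlow. It assumes the Keras defaults used in this model: 3x3 kernels, stride 1, `'valid'` padding, and 2x2 pooling. The helper names are hypothetical.

```python
def conv2d(h, w, c_in, filters, k=3):
    """'valid' padding, stride 1: height and width each shrink by k - 1.
    Parameters: one k*k*c_in kernel plus one bias per filter."""
    params = (k * k * c_in + 1) * filters
    return (h - k + 1, w - k + 1, filters), params


def max_pool(h, w, c, size=2):
    """Non-overlapping pooling: integer-divide height and width."""
    return (h // size, w // size, c)


def dense(n_in, units):
    """Fully connected layer: n_in weights plus one bias per unit."""
    return units, (n_in + 1) * units


shape = (28, 28, 1)
total = 0

shape, p = conv2d(*shape, filters=32)   # -> (26, 26, 32), 320 params
total += p
shape = max_pool(*shape)                # -> (13, 13, 32)
shape, p = conv2d(*shape, filters=64)   # -> (11, 11, 64), 18496 params
total += p
shape = max_pool(*shape)                # -> (5, 5, 64)
shape, p = conv2d(*shape, filters=64)   # -> (3, 3, 64), 36928 params
total += p

flat = shape[0] * shape[1] * shape[2]   # Flatten -> 576 values
n, p = dense(flat, 64)                  # 36928 params
total += p
n, p = dense(n, 10)                     # 650 params
total += p

print("Flattened size:", flat)
print("Total trainable parameters:", total)
```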

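Step 3: Train the model

# Train (5 epochs is usually enough to reach about 90% accuracy)
# Note: the test set doubles as validation data here for simplicity
history = model.fit(x_train, y_train,
                    batch_size=32,
                    epochs=5,
                    validation_data=(x_test, y_test))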
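
The step counter that Keras prints during training follows directly from the dataset size and `batch_size`. As a quick worked calculation with the numbers used above (60,000 training images, batch size 32, 5 epochs):

```python
import math

n_train = 60_000   # Fashion-MNIST training images
batch_size = 32
epochs = 5

# One step processes one batch; the last batch may be smaller, hence ceil
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print("Steps per epoch:", steps_per_epoch)    # 1875
print("Total weight updates:", total_updates)  # 9375
```

A larger `batch_size` means fewer, larger steps per epoch, which typically trades update frequency for throughput.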

Step 4: Evaluate and visualize

# Accuracy on the test set
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"\nTest accuracy: {test_acc:.4f}")

# Plot the training history
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training accuracy')
plt.plot(history.history['val_accuracy'], label='Validation accuracy')
plt.title('Model accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training loss')
plt.plot(history.history['val_loss'], label='Validation loss')
plt.title('Model loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.tight_layout()
plt.show()

Step 5: Prediction example

# Predict the first 5 test images
predictions = model.predict(x_test[:5])
predicted_labels = predictions.argmax(axis=1)

# Show the results
plt.figure(figsize=(10, 2))
for i in range(5):
    plt.subplot(1, 5, i+1)
    plt.imshow(x_test[i].squeeze(), cmap='gray')
    plt.title(f"True: {class_names[y_test[i]]}\nPredicted: {class_names[predicted_labels[i]]}")
    plt.axis('off')
plt.show()
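
Each row returned by `model.predict` is a softmax output: 10 probabilities that sum to 1, one per class. `argmax(axis=1)` picks the most likely class, and the maximum itself can serve as a rough confidence value. The sketch below illustrates this with two hand-written probability rows; these numbers are made up for the example and are not output from the trained model.

```python
import numpy as np

class_names = ['T-shirt', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Hypothetical softmax rows for two images (illustration only)
probs = np.array([
    [0.01, 0.01, 0.02, 0.01, 0.02, 0.03, 0.05, 0.10, 0.05, 0.70],
    [0.05, 0.02, 0.60, 0.03, 0.15, 0.01, 0.10, 0.01, 0.02, 0.01],
])

labels = probs.argmax(axis=1)       # index of the largest probability per row
confidence = probs.max(axis=1)      # the largest probability itself

for idx, conf in zip(labels, confidence):
    print(f"{class_names[idx]:<10} confidence={conf:.2f}")
```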

✅ Typical result: about 90% accuracy, good enough for teaching or prototyping.

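V. Summary: Comparing the Two AI Development Workflows

Aspect	Machine learning (scikit-learn)	Deep learning (TensorFlow)
Data scale	Small to medium (<100,000 samples)	Large (images, speech)
Feature engineering	Features designed by hand	Features learned automatically
Model types	Decision trees, SVM, K-Means	CNN, RNN, Transformer
Hardware	CPU is sufficient	GPU acceleration recommended
Typical applications	Customer segmentation, regression	Image recognition, NLP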

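VI. Best Practices on openEuler

Architecture:
- On ARM64, prefer installing TensorFlow from conda-forge
- Avoid a bare pip install tensorflow unless you have confirmed that an ARM wheel exists for your TensorFlow and Python versions

Environment isolation:

conda create -n tf_gpu python=3.11  # if you have an NVIDIA GPU
conda create -n ml_cpu python=3.11  # CPU-only environment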
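
Which of the two environments applies to a given machine can be guessed from Python. The sketch below uses a simple heuristic, stated here as an assumption: if the `nvidia-smi` tool is on the PATH, an NVIDIA driver is probably installed. It does not prove that TensorFlow can actually use the GPU. The helper name `suggest_env` is hypothetical.

```python
import shutil


def suggest_env(gpu_tool="nvidia-smi"):
    """Return 'tf_gpu' if the given driver tool is found on PATH,
    otherwise 'ml_cpu'. Heuristic only: finding the tool does not
    guarantee a working CUDA setup."""
    return "tf_gpu" if shutil.which(gpu_tool) else "ml_cpu"


print("Suggested environment:", suggest_env())
```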

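Performance:
- Use tf.data to make data loading more efficient
- Enable mixed-precision training (tf.keras.mixed_precision)

Deployment:
- After training, export the model as a SavedModel:
```
# With Keras 3 (TensorFlow 2.16+), model.save() requires a .keras or .h5
# file name, so use export() to write a SavedModel directory
model.export('fashion_model')
```
- Use TensorFlow Serving or ONNX for production deployment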