Automatically training an AI model with GitHub Actions (including detailed YAML configuration)

ybdesire

Original article: https://blog.csdn.net/ybdesire/article/details/121986149

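1. Introduction

GitHub Actions (see reference 1) provides CI workflows: after a user pushes code, it can automatically fetch the latest code onto a virtual machine, build it, run automated tests, publish to third-party services, and so on.

According to reference 2, GitHub builds and tests code on virtual machines with 2 CPUs and 7 GB of memory. That is enough to train simple sklearn models. The rest of this article explains how to configure GitHub Actions to train a RandomForest model on the Iris dataset and print its accuracy on the test set.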
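
The hardware figures above come from reference 2 and may change over time. As a rough sketch (not part of the original article), a script like the following could be run as a workflow step to print what the runner actually provides. It uses only the standard library; the function name `describe_machine` is made up for illustration, and the memory lookup relies on `os.sysconf`, which works on Linux but may be unavailable on other systems.

```python
import os


def describe_machine():
    """Return (logical CPU count, physical memory in GiB or None if unknown)."""
    cpus = os.cpu_count()
    try:
        # Works on Linux (for example an ubuntu-latest runner); may be missing elsewhere.
        total_bytes = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
        mem_gib = total_bytes / 1024 ** 3
    except (AttributeError, ValueError, OSError):
        # os.sysconf or these names are not available on this platform.
        mem_gib = None
    return cpus, mem_gib


cpus, mem_gib = describe_machine()
print("logical CPUs:", cpus)
print("memory (GiB):", "unknown" if mem_gib is None else round(mem_gib, 1))
```

Printing these values at the start of a run makes it easy to judge whether a given model is small enough to train on the runner.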

2. Model training and evaluation code

The code used in this article is shown below; it is written entirely with sklearn.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# step-01: load data (as_frame=True returns pandas objects, so pandas must be installed)
data = load_iris(as_frame=True)
x_data = data.data.to_numpy()
y_data = data.target.to_numpy()
# step-02: split the dataset into train (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.2, random_state=666)
# step-03: train the model and evaluate it on the test set
# (no random_state is set on the forest, so the score may vary slightly between runs)
model = RandomForestClassifier()
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print('test acc = {0}'.format(score))

Save the code as main.py.
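
Since main.py reports its result as a line of the form `test acc = <number>`, a CI run could also be made to fail when the accuracy is too low. The following is only a sketch of that idea, not something the original article does: the helper names `parse_accuracy` and `passes` and the threshold `MIN_ACCURACY = 0.9` are assumptions chosen for illustration.

```python
import re

# Hypothetical threshold chosen for illustration; tune it for your own model.
MIN_ACCURACY = 0.9


def parse_accuracy(output):
    """Extract the number from text containing a line like 'test acc = 0.96'."""
    match = re.search(r"test acc = ([0-9]*\.?[0-9]+)", output)
    if match is None:
        raise ValueError("no 'test acc = <number>' line found")
    return float(match.group(1))


def passes(output, minimum=MIN_ACCURACY):
    """Return True when the reported accuracy reaches the minimum."""
    return parse_accuracy(output) >= minimum


# Example with a made-up output line in the format that main.py prints.
sample_output = "test acc = 0.95"
print("accuracy:", parse_accuracy(sample_output))
print("passes threshold:", passes(sample_output))
```

In a workflow, a script built on these helpers could read the output of main.py and exit with a non-zero status when `passes` returns False, which would mark the step as failed.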

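3. Configuring GitHub Actions

Put main.py in a GitHub repo.
Open Actions, choose "Simple workflow", set it up, and keep the default YAML configuration.

The default YAML configuration is saved as .github/workflows/blank.yml. The comments in the listing below describe what the workflow does.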

# This is a basic workflow to help you get started with Actions

name: CI

# Controls when the workflow will run
on:
  # Triggers: a push to the main branch, or a pull request targeting the main branch
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  # This workflow contains a single job called "build"
  build:
    # The job runs on an ubuntu-latest runner
    runs-on: ubuntu-latest

    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
      # Check out the repository at the commit that triggered the run (v2 is the version of the checkout action, not a commit or branch)
      - uses: actions/checkout@v2

      # Run a single echo command in the shell
      - name: Run a one-line script
        run: echo Hello, world!

      # Run several commands in the shell
      - name: Run a multi-line script
        run: |
          echo Add other actions to build,
          echo test, and deploy your project.

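For more on configuring the YAML file, see reference 1.

Adding a dependency installation step

Following the same pattern as the shell commands above, a step can run pip to install the dependencies: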

      - name: Run pip
        run: |
          pip install scikit-learn
          pip install pandas

Running the code

In the same way, a step can run the Python program:

      - name: Run python code
        run: |
          python main.py
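
Before pushing, it can help to rehearse the same two steps locally. The sketch below is an illustration rather than part of the original setup: `run_steps` and `WORKFLOW_STEPS` are hypothetical names, and the behaviour of stopping at the first failing step is meant to mirror how a job normally skips the remaining steps once one of them fails.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each (name, command) step in order and stop at the first failure.

    Returns the name of the failing step, or None if every step succeeded.
    """
    for name, command in steps:
        print("== {0} ==".format(name))
        result = subprocess.run(command)
        if result.returncode != 0:
            return name
    return None


# The two workflow steps from this section, written as argument lists.
# sys.executable keeps pip and main.py on the same Python interpreter.
WORKFLOW_STEPS = [
    ("Run pip", [sys.executable, "-m", "pip", "install", "scikit-learn", "pandas"]),
    ("Run python code", [sys.executable, "main.py"]),
]
```

Calling `run_steps(WORKFLOW_STEPS)` from the directory that contains main.py installs the dependencies and then runs the training script, returning the name of the step that failed, if any.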

The complete blank.yml configuration

The complete configuration is available at

https://github.com/ybdesire/action_test_build_ai_model/blob/main/.github/workflows/blank.yml
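
For convenience, here is a sketch of what the workflow looks like when the fragments shown in this section are put together, with the two template echo steps dropped. It is assembled from the snippets in this article, not copied from the linked file, so the file in the repository may differ in its details. The checkout action is kept at v2 as in the article; newer major versions of the action exist.

```yaml
name: CI

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # Check out the repository so the job can access main.py
      - uses: actions/checkout@v2

      # Install the libraries that main.py needs
      - name: Run pip
        run: |
          pip install scikit-learn
          pip install pandas

      # Train the model and print the test accuracy
      - name: Run python code
        run: |
          python main.py
```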

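4. How to trigger training

With the configuration above, every push to the main branch triggers training, as does a pull request targeting main; the workflow can also be started manually from the Actions tab. Open a run under Actions to follow the workflow's progress and read its output.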

References

1. GitHub Actions 入门教程 (Getting started with GitHub Actions). https://www.ruanyifeng.com/blog/2019/09/getting-started-with-github-actions.html
2. https://p3terx.com/archives/github-actions-virtual-environment-simple-evaluation.html