MNIST Examples for GGML - Convolutional network
1. Build
https://github.com/ggml-org/ggml
git clone https://github.com/ggml-org/ggml
cd ggml
# install python dependencies in a virtual environment
python3.10 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# build the examples
mkdir build && cd build
cmake ..
cmake --build . --config Release -j 8
(base) yongqiang@yongqiang:~$ cd llm_work/
(base) yongqiang@yongqiang:~/llm_work$ mkdir ggml_25_02_15
(base) yongqiang@yongqiang:~/llm_work$ cd ggml_25_02_15/
(base) yongqiang@yongqiang:~/llm_work/ggml_25_02_15$ git clone https://github.com/ggml-org/ggml.git
Cloning into 'ggml'...
remote: Enumerating objects: 13755, done.
remote: Counting objects: 100% (498/498), done.
remote: Compressing objects: 100% (193/193), done.
remote: Total 13755 (delta 331), reused 335 (delta 303), pack-reused 13257 (from 3)
Receiving objects: 100% (13755/13755), 12.88 MiB | 213.00 KiB/s, done.
Resolving deltas: 100% (9411/9411), done.
(base) yongqiang@yongqiang:~/llm_work/ggml_25_02_15$ cd ggml/
(base) yongqiang@yongqiang:~/llm_work/ggml_25_02_15/ggml$
(base) yongqiang@yongqiang:~/llm_work/ggml_25_02_15/ggml$ pip install -r requirements.txt
(base) yongqiang@yongqiang:~/llm_work/ggml_25_02_15/ggml$ vim build_ggml_linux_cpu.sh
(base) yongqiang@yongqiang:~/llm_work/ggml_25_02_15/ggml$ chmod a+x build_ggml_linux_cpu.sh
(base) yongqiang@yongqiang:~/llm_work/ggml_25_02_15/ggml$ cat build_ggml_linux_cpu.sh
#! /bin/bash
# build the examples
mkdir build && cd build
cmake ..
cmake --build . --config Debug -j 8
(base) yongqiang@yongqiang:~/llm_work/ggml_25_02_15/ggml$
(base) yongqiang@yongqiang:~/llm_work/ggml_25_02_15/ggml$ bash build_ggml_linux_cpu.sh
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- Including CPU backend
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native
-- x86 detected
-- Linux detected
-- Configuring done (10.1s)
-- Generating done (0.1s)
-- Build files have been written to: /home/yongqiang/llm_work/ggml_25_02_15/ggml/build
...
[ 99%] Linking CXX executable ../../bin/gpt-j
[ 99%] Built target gpt-j
[100%] Linking CXX executable ../../bin/sam
[100%] Built target sam
(base) yongqiang@yongqiang:~/llm_work/ggml_25_02_15/ggml$
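After the build completes, the MNIST binaries used below should be available under build/bin; a quick check (mnist-eval and mnist-train are the binary names used later in this post):

$ ls build/bin | grep mnist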
2. MNIST Examples for GGML
https://github.com/ggml-org/ggml/tree/master/examples/mnist
This directory contains simple examples of how to use GGML for training and inference using the MNIST dataset. All commands listed in this README assume the working directory to be examples/mnist.
MNIST dataset
https://yann.lecun.com/exdb/mnist/
Please note that training in GGML is a work-in-progress and not production ready.
2.1. Obtaining the data
A description of the dataset can be found on Yann LeCun's website. While it is in principle also possible to download the dataset from that website, these downloads are frequently throttled, so it is recommended to use HuggingFace instead. The dataset will be downloaded automatically when running mnist-train-fc.py.
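The TensorFlow-based training script obtains the same data through Keras instead; a minimal sketch of that step (the mnist.npz download it triggers is visible in the training log further below):

from tensorflow import keras

# Fetches mnist.npz from the TensorFlow mirror on first use and caches it locally.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
print(x_train.shape, x_test.shape)  # (60000, 28, 28) (10000, 28, 28)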
2.2. Convolutional network
2.2.1. To train a convolutional network using TensorFlow
$ python3 mnist-train-cnn.py mnist-cnn-f32.gguf
...
Test loss: 0.047947
Test accuracy: 98.46%
GGUF model saved to 'mnist-cnn-f32.gguf'
(base) yongqiang@yongqiang:~/llm_work/ggml_25_02_15/ggml/examples/mnist$ python3 mnist-train-cnn.py mnist-cnn-f32.gguf
2025-02-16 00:31:24.538461: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-02-16 00:31:25.845492: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-02-16 00:31:26.586782: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1739637087.450067 13320 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1739637087.641610 13320 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-02-16 00:31:29.615138: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 3s 0us/step
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
W0000 00:00:1739637115.587508 13320 gpu_device.cc:2344] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ conv2d (Conv2D) │ (None, 28, 28, 8) │ 80 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d (MaxPooling2D) │ (None, 14, 14, 8) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_1 (Conv2D) │ (None, 14, 14, 16) │ 1,168 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d_1 (MaxPooling2D) │ (None, 7, 7, 16) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ flatten (Flatten) │ (None, 784) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense (Dense) │ (None, 10) │ 7,850 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
Total params: 9,098 (35.54 KB)
Trainable params: 9,098 (35.54 KB)
Non-trainable params: 0 (0.00 B)
2025-02-16 00:31:56.042097: W external/local_xla/xla/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 169344000 exceeds 10% of free system memory.
Epoch 1/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 8s 108ms/step - accuracy: 0.4444 - loss: 1.9731 - val_accuracy: 0.8597 - val_loss: 0.5530
Epoch 2/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 7s 134ms/step - accuracy: 0.8565 - loss: 0.5076 - val_accuracy: 0.9253 - val_loss: 0.2565
Epoch 3/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 6s 115ms/step - accuracy: 0.9145 - loss: 0.2980 - val_accuracy: 0.9482 - val_loss: 0.1905
Epoch 4/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 5s 86ms/step - accuracy: 0.9342 - loss: 0.2310 - val_accuracy: 0.9577 - val_loss: 0.1591
Epoch 5/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 6s 117ms/step - accuracy: 0.9444 - loss: 0.1877 - val_accuracy: 0.9638 - val_loss: 0.1359
Epoch 6/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 5s 100ms/step - accuracy: 0.9548 - loss: 0.1577 - val_accuracy: 0.9680 - val_loss: 0.1187
Epoch 7/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 5s 88ms/step - accuracy: 0.9576 - loss: 0.1445 - val_accuracy: 0.9730 - val_loss: 0.1073
Epoch 8/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 5s 87ms/step - accuracy: 0.9628 - loss: 0.1256 - val_accuracy: 0.9740 - val_loss: 0.0994
Epoch 9/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 5s 96ms/step - accuracy: 0.9668 - loss: 0.1131 - val_accuracy: 0.9757 - val_loss: 0.0912
Epoch 10/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 5s 88ms/step - accuracy: 0.9672 - loss: 0.1094 - val_accuracy: 0.9767 - val_loss: 0.0882
Epoch 11/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 5s 84ms/step - accuracy: 0.9717 - loss: 0.0985 - val_accuracy: 0.9780 - val_loss: 0.0820
Epoch 12/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 6s 102ms/step - accuracy: 0.9732 - loss: 0.0901 - val_accuracy: 0.9792 - val_loss: 0.0793
Epoch 13/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 7s 130ms/step - accuracy: 0.9745 - loss: 0.0882 - val_accuracy: 0.9800 - val_loss: 0.0771
Epoch 14/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 5s 91ms/step - accuracy: 0.9747 - loss: 0.0870 - val_accuracy: 0.9808 - val_loss: 0.0742
Epoch 15/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 5s 87ms/step - accuracy: 0.9762 - loss: 0.0785 - val_accuracy: 0.9812 - val_loss: 0.0716
Epoch 16/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 5s 86ms/step - accuracy: 0.9771 - loss: 0.0766 - val_accuracy: 0.9817 - val_loss: 0.0699
Epoch 17/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 5s 93ms/step - accuracy: 0.9787 - loss: 0.0720 - val_accuracy: 0.9785 - val_loss: 0.0735
Epoch 18/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 5s 83ms/step - accuracy: 0.9783 - loss: 0.0716 - val_accuracy: 0.9832 - val_loss: 0.0666
Epoch 19/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 5s 99ms/step - accuracy: 0.9797 - loss: 0.0676 - val_accuracy: 0.9830 - val_loss: 0.0648
Epoch 20/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 5s 83ms/step - accuracy: 0.9806 - loss: 0.0652 - val_accuracy: 0.9825 - val_loss: 0.0673
Epoch 21/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 5s 86ms/step - accuracy: 0.9810 - loss: 0.0654 - val_accuracy: 0.9830 - val_loss: 0.0623
Epoch 22/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 5s 87ms/step - accuracy: 0.9816 - loss: 0.0601 - val_accuracy: 0.9838 - val_loss: 0.0598
Epoch 23/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 5s 83ms/step - accuracy: 0.9828 - loss: 0.0587 - val_accuracy: 0.9845 - val_loss: 0.0615
Epoch 24/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 4s 81ms/step - accuracy: 0.9815 - loss: 0.0574 - val_accuracy: 0.9822 - val_loss: 0.0644
Epoch 25/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 5s 85ms/step - accuracy: 0.9825 - loss: 0.0573 - val_accuracy: 0.9842 - val_loss: 0.0586
Epoch 26/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 5s 84ms/step - accuracy: 0.9829 - loss: 0.0570 - val_accuracy: 0.9843 - val_loss: 0.0575
Epoch 27/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 5s 88ms/step - accuracy: 0.9843 - loss: 0.0522 - val_accuracy: 0.9847 - val_loss: 0.0565
Epoch 28/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 5s 89ms/step - accuracy: 0.9836 - loss: 0.0536 - val_accuracy: 0.9838 - val_loss: 0.0578
Epoch 29/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 5s 101ms/step - accuracy: 0.9851 - loss: 0.0502 - val_accuracy: 0.9852 - val_loss: 0.0562
Epoch 30/30
54/54 ━━━━━━━━━━━━━━━━━━━━ 5s 84ms/step - accuracy: 0.9849 - loss: 0.0491 - val_accuracy: 0.9863 - val_loss: 0.0532
Training took 156.50s
Test loss: 0.050165
Test accuracy: 98.31%
GGUF model saved to 'mnist-cnn-f32.gguf'
(base) yongqiang@yongqiang:~/llm_work/ggml_25_02_15/ggml/examples/mnist$
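For reference, the printed summary corresponds to a model along the following lines. This is a minimal sketch reconstructed from the layer shapes and parameter counts above, not the exact contents of mnist-train-cnn.py; the activations, padding, and optimizer are assumptions:

from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    keras.layers.Conv2D(8, kernel_size=3, padding="same", activation="relu"),   # 3*3*1*8 + 8 = 80 params
    keras.layers.MaxPooling2D(pool_size=2),                                     # 28x28 -> 14x14
    keras.layers.Conv2D(16, kernel_size=3, padding="same", activation="relu"),  # 3*3*8*16 + 16 = 1,168 params
    keras.layers.MaxPooling2D(pool_size=2),                                     # 14x14 -> 7x7
    keras.layers.Flatten(),                                                     # 7*7*16 = 784 features
    keras.layers.Dense(10, activation="softmax"),                               # 784*10 + 10 = 7,850 params
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()  # 9,098 trainable parameters in total, matching the summary above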
2.2.2. To evaluate the model on the CPU using GGML
The saved model can be evaluated on the CPU using the mnist-eval binary:
$ ../../build/bin/mnist-eval mnist-cnn-f32.gguf data/MNIST/raw/t10k-images-idx3-ubyte data/MNIST/raw/t10k-labels-idx1-ubyte
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
______________________________________##________________
______________________________________##________________
______________________________________##________________
____________________________________##__________________
__________________________________####__________________
__________________________________##____________________
________________________________##______________________
______________________________##________________________
____________________________####________________________
____________________________##__________________________
__________________________##____________________________
________________________##______________________________
______________________##________________________________
____________________####________________________________
____________________##__________________________________
__________________##____________________________________
________________##______________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
mnist_model: using CUDA0 (NVIDIA GeForce RTX 3090) as primary backend
mnist_model: unsupported operations will be executed on the following fallback backends (in order of priority):
mnist_model: - CPU (AMD Ryzen 9 5950X 16-Core Processor)
mnist_model_init_from_file: loading model weights from 'mnist-cnn-f32.gguf'
mnist_model_init_from_file: model arch is mnist-cnn
mnist_model_init_from_file: successfully loaded weights from mnist-cnn-f32.gguf
main: loaded model in 91.99 ms
mnist_model_eval: model evaluation on 10000 images took 267.61 ms, 26.76 us/image
main: predicted digit is 1
main: test_loss=0.047955+-0.007029
main: test_acc=98.46+-0.12%
The transcript below is from a CPU-only machine; it evaluates the fully connected model (mnist-fc-f32.gguf) instead, hence the mnist-fc architecture in the output:
(base) yongqiang@yongqiang:~/llm_work/ggml_25_02_15/ggml/examples/mnist$ ../../build/bin/mnist-eval mnist-fc-f32.gguf data/MNIST/raw/t10k-images-idx3-ubyte data/MNIST/raw/t10k-labels-idx1-ubyte
...
mnist_model: using CPU (Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz) as primary backend
mnist_model_init_from_file: loading model weights from 'mnist-fc-f32.gguf'
mnist_model_init_from_file: model arch is mnist-fc
mnist_model_init_from_file: successfully loaded weights from mnist-fc-f32.gguf
main: loaded model in 48.56 ms
mnist_model_eval: model evaluation on 10000 images took 86.95 ms, 8.69 us/image
main: predicted digit is 7
main: test_loss=0.065847+-0.008569
main: test_acc=97.97+-0.14%
(base) yongqiang@yongqiang:~/llm_work/ggml_25_02_15/ggml/examples/mnist$
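The +- values reported by mnist-eval behave like the standard error of a proportion over the 10000 test images; a quick sanity check (an assumption about how the error bars are computed, not taken from the GGML sources):

import math

p, n = 0.9846, 10000  # test accuracy and number of test images from the log above
print(math.sqrt(p * (1.0 - p) / n))  # ~0.0012, i.e. the +-0.12% quoted for test_acc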
2.2.3. To train a convolutional model on the CPU using GGML
As with the fully connected network, the convolutional network can also be trained using GGML. Note in the log below that GGML holds out 3000 of the 60000 training images as a validation set:
$ ../../build/bin/mnist-train mnist-cnn mnist-cnn-f32.gguf data/MNIST/raw/train-images-idx3-ubyte data/MNIST/raw/train-labels-idx1-ubyte
(base) yongqiang@yongqiang:~/llm_work/ggml_25_02_15/ggml/examples/mnist$ ../../build/bin/mnist-train mnist-cnn mnist-cnn-f32.gguf data/MNIST/raw/train-images-idx3-ubyte data/MNIST/raw/train-labels-idx1-ubyte
mnist_model: using CPU (Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz) as primary backend
ggml_opt_fit: epoch 0001/0030:
train: [=========================| data=057000/057000, loss=2.181899+-0.015151, accuracy=29.55+-0.19%, t=00:00:17, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=1.672272+-0.005185, accuracy=74.50+-0.80%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0002/0030:
train: [=========================| data=057000/057000, loss=0.815365+-0.034044, accuracy=82.48+-0.16%, t=00:00:17, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.450107+-0.007070, accuracy=86.77+-0.62%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0003/0030:
train: [=========================| data=057000/057000, loss=0.365206+-0.004038, accuracy=89.72+-0.13%, t=00:00:17, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.320627+-0.004686, accuracy=91.50+-0.51%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0004/0030:
train: [=========================| data=057000/057000, loss=0.286422+-0.003434, accuracy=91.89+-0.11%, t=00:00:18, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.267885+-0.004228, accuracy=92.37+-0.48%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0005/0030:
train: [=========================| data=057000/057000, loss=0.242734+-0.002871, accuracy=93.12+-0.11%, t=00:00:18, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.230954+-0.004311, accuracy=93.60+-0.45%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0006/0030:
train: [=========================| data=057000/057000, loss=0.211461+-0.002635, accuracy=94.13+-0.10%, t=00:00:18, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.203311+-0.004524, accuracy=94.37+-0.42%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0007/0030:
train: [=========================| data=057000/057000, loss=0.187291+-0.002687, accuracy=94.71+-0.09%, t=00:00:18, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.176855+-0.004875, accuracy=94.93+-0.40%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0008/0030:
train: [=========================| data=057000/057000, loss=0.166853+-0.002464, accuracy=95.31+-0.09%, t=00:00:19, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.162126+-0.005521, accuracy=95.20+-0.39%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0009/0030:
train: [=========================| data=057000/057000, loss=0.154119+-0.002628, accuracy=95.56+-0.09%, t=00:00:18, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.149167+-0.004722, accuracy=95.47+-0.38%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0010/0030:
train: [=========================| data=057000/057000, loss=0.141465+-0.002342, accuracy=95.97+-0.08%, t=00:00:19, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.139933+-0.005049, accuracy=96.10+-0.35%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0011/0030:
train: [=========================| data=057000/057000, loss=0.131444+-0.002324, accuracy=96.17+-0.08%, t=00:00:19, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.127554+-0.004995, accuracy=96.30+-0.34%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0012/0030:
train: [=========================| data=057000/057000, loss=0.122645+-0.002175, accuracy=96.49+-0.08%, t=00:00:18, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.122382+-0.005499, accuracy=96.70+-0.33%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0013/0030:
train: [=========================| data=057000/057000, loss=0.115014+-0.002062, accuracy=96.65+-0.08%, t=00:00:19, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.115404+-0.004303, accuracy=96.60+-0.33%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0014/0030:
train: [=========================| data=057000/057000, loss=0.108763+-0.002293, accuracy=96.88+-0.07%, t=00:00:19, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.113306+-0.004818, accuracy=96.63+-0.33%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0015/0030:
train: [=========================| data=057000/057000, loss=0.103556+-0.002066, accuracy=97.01+-0.07%, t=00:00:19, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.107781+-0.006073, accuracy=96.83+-0.32%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0016/0030:
train: [=========================| data=057000/057000, loss=0.099061+-0.002315, accuracy=97.15+-0.07%, t=00:00:20, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.102977+-0.004760, accuracy=97.00+-0.31%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0017/0030:
train: [=========================| data=057000/057000, loss=0.095613+-0.001962, accuracy=97.26+-0.07%, t=00:00:21, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.098364+-0.003851, accuracy=96.97+-0.31%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0018/0030:
train: [=========================| data=057000/057000, loss=0.091659+-0.001890, accuracy=97.29+-0.07%, t=00:00:23, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.096844+-0.004442, accuracy=97.03+-0.31%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0019/0030:
train: [=========================| data=057000/057000, loss=0.088625+-0.002074, accuracy=97.44+-0.07%, t=00:00:21, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.093172+-0.004085, accuracy=97.17+-0.30%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0020/0030:
train: [=========================| data=057000/057000, loss=0.086569+-0.001856, accuracy=97.41+-0.07%, t=00:00:20, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.090506+-0.004352, accuracy=97.47+-0.29%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0021/0030:
train: [=========================| data=057000/057000, loss=0.082892+-0.001994, accuracy=97.62+-0.06%, t=00:00:22, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.090384+-0.005110, accuracy=97.43+-0.29%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0022/0030:
train: [=========================| data=057000/057000, loss=0.081264+-0.001867, accuracy=97.64+-0.06%, t=00:00:20, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.086740+-0.004202, accuracy=97.53+-0.28%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0023/0030:
train: [=========================| data=057000/057000, loss=0.078324+-0.001870, accuracy=97.71+-0.06%, t=00:00:20, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.084126+-0.004497, accuracy=97.63+-0.28%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0024/0030:
train: [=========================| data=057000/057000, loss=0.075900+-0.001737, accuracy=97.76+-0.06%, t=00:00:21, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.081060+-0.004879, accuracy=97.57+-0.28%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0025/0030:
train: [=========================| data=057000/057000, loss=0.074312+-0.001811, accuracy=97.84+-0.06%, t=00:00:22, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.082016+-0.004447, accuracy=97.93+-0.26%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0026/0030:
train: [=========================| data=057000/057000, loss=0.074381+-0.001647, accuracy=97.81+-0.06%, t=00:00:20, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.080011+-0.004273, accuracy=98.10+-0.25%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0027/0030:
train: [=========================| data=057000/057000, loss=0.070492+-0.001811, accuracy=97.89+-0.06%, t=00:00:18, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.077424+-0.005114, accuracy=97.90+-0.26%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0028/0030:
train: [=========================| data=057000/057000, loss=0.068885+-0.001675, accuracy=97.97+-0.06%, t=00:00:18, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.078224+-0.005973, accuracy=97.90+-0.26%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0029/0030:
train: [=========================| data=057000/057000, loss=0.067410+-0.001954, accuracy=98.01+-0.06%, t=00:00:18, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.077414+-0.005510, accuracy=98.00+-0.26%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: epoch 0030/0030:
train: [=========================| data=057000/057000, loss=0.066525+-0.001795, accuracy=98.03+-0.06%, t=00:00:18, ETA=00:00:00]
val: [=========================| data=003000/003000, loss=0.075325+-0.004926, accuracy=97.87+-0.26%, t=00:00:00, ETA=00:00:00]
ggml_opt_fit: training took 00:10:03
mnist_model_save: saving model to 'mnist-cnn-f32.gguf'
(base) yongqiang@yongqiang:~/llm_work/ggml_25_02_15/ggml/examples/mnist$
As always, the evaluation is done using mnist-eval, and like with the fully connected network the GGML graph is exported to mnist-cnn-f32.ggml.
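The raw files passed to mnist-train and mnist-eval use LeCun's IDX binary layout (a big-endian header followed by raw bytes); a minimal reader sketch based on that public format description, independent of the GGML sources:

import struct
import numpy as np

def read_idx_images(path):
    # Image file: big-endian uint32 magic (2051), count, rows, cols, then uint8 pixels.
    with open(path, "rb") as f:
        magic, num, rows, cols = struct.unpack(">IIII", f.read(16))
        assert magic == 2051
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(num, rows, cols)

def read_idx_labels(path):
    # Label file: big-endian uint32 magic (2049), count, then uint8 labels.
    with open(path, "rb") as f:
        magic, num = struct.unpack(">II", f.read(8))
        assert magic == 2049
        return np.frombuffer(f.read(), dtype=np.uint8)

images = read_idx_images("data/MNIST/raw/t10k-images-idx3-ubyte")
labels = read_idx_labels("data/MNIST/raw/t10k-labels-idx1-ubyte")
print(images.shape, labels[:5])  # (10000, 28, 28) and the first five labels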
2.3. Hardware Acceleration
Both the training and the evaluation code are hardware-agnostic as long as the corresponding GGML backend implements the necessary operations. A specific backend can be selected by appending its name to the above commands. The compute graphs then schedule their operations to preferentially use the specified backend. Note that if a backend does not implement some of the necessary operations, a CPU fallback is used instead, which may result in poor performance.
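For example, to prefer the first CUDA device (assuming a CUDA-enabled build; CUDA0 is the backend name printed in the evaluation log above):

$ ../../build/bin/mnist-eval mnist-cnn-f32.gguf data/MNIST/raw/t10k-images-idx3-ubyte data/MNIST/raw/t10k-labels-idx1-ubyte CUDA0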