300 Lines of Java a Day (Days 81-90, CNN: Convolutional Neural Networks)

Contents

Overview
Days 1-10: Basic syntax
Days 11-20: Linear data structures
Days 21-30: Trees and binary trees
Days 31-40: Graphs
Days 41-50: Searching and sorting
Days 51-60: kNN and NB
Days 61-70: Decision trees and ensemble learning
Days 81-90: CNN (convolutional neural networks) was preceded by Days 71-80: BP neural networks

The original code comes from https://github.com/DeepCompute/cnn. I previously wrote some code analysis in 深度学习基础技术分析4:CNN(含代码分析), and converted it into a C++ version in 深度学习C++代码配套教程(5. CNN 卷积神经网络). Here I modify some of the code once more (especially the variable names and comments, and with the parallel-computing functionality trimmed away) to make it my own version.
The data file train.format can also be downloaded from https://github.com/FanSmale/sampledata/.

Day 81: Dataset Reading and Storage

We have previously used arff files (Day 51) and compressed rating files (Day 54); here is one more way to store data. The image data is still stored in a structured way (an m*n dot matrix plus the class label).

  1. This program uses the java.util.List class. Many of the data structures we implemented earlier can be found directly in the java.util package, though writing them once yourself is always worthwhile. Compared with reading the data yourself and managing it with a double[][], the List class allows appending data, which is more flexible. Of course, there is some cost in efficiency, probably a constant factor.
  2. tempLine.split is a feature we have not used before. The String class actually has quite a rich set of methods; to keep developers from tampering with it, the class is final and cannot be subclassed.
  3. The original author of this code is fond of final, including for variables in parameter lists, to keep their values from being modified. I prefer to remove them; in practice it makes little difference.
  4. List<Instance> means the list may only store variables of type Instance. Subclasses of Instance are allowed too, of course.
  5. The Instance class is also public. Honestly, I am not entirely clear about access control; as long as the code compiles, it is fine.
  6. The label of Instance has type Double; a plain double would also work, and if the task is known to be classification, int would be even better.
  7. The longest part is the constructor, which reads the data from the file.
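Item 2 above mentions tempLine.split. A minimal, self-contained sketch of the same parsing idea used by the constructor below (the class name SplitDemo and the sample line are made up for illustration):

```java
import java.util.Arrays;

public class SplitDemo {
	// Split one text line by the given sign and parse each part as a double,
	// just as the Dataset constructor does for every line of the file.
	public static double[] parseLine(String paraLine, String paraSplitSign) {
		String[] tempParts = paraLine.split(paraSplitSign);
		double[] tempData = new double[tempParts.length];
		for (int i = 0; i < tempParts.length; i++) {
			tempData[i] = Double.parseDouble(tempParts[i]);
		} // Of for i
		return tempData;
	}// Of parseLine

	public static void main(String[] args) {
		double[] tempValues = parseLine("0.5,0.25,1", ",");
		System.out.println(Arrays.toString(tempValues));
	}// Of main
}// Of class SplitDemo
```

Note that split takes a regular expression, so signs such as "|" would need escaping; a plain comma is safe.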
package machinelearning.cnn;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Manage the dataset.
 * 
 * @author Fan Min minfanphd@163.com.
 */
public class Dataset {

	/**
	 * All instances organized by a list.
	 */
	private List<Instance> instances;

	/**
	 * The label index.
	 */
	private int labelIndex;

	/**
	 * The maximal label (labels start from 0).
	 */
	private double maxLabel = -1;

	/**
	 *********************** 
	 * The first constructor.
	 *********************** 
	 */
	public Dataset() {
		labelIndex = -1;
		instances = new ArrayList<Instance>();
	}// Of the first constructor

	/**
	 *********************** 
	 * The second constructor.
	 * 
	 * @param paraFilename
	 *            The filename.
	 * @param paraSplitSign
	 *            Often comma.
	 * @param paraLabelIndex
	 *            Often the last column.
	 *********************** 
	 */
	public Dataset(String paraFilename, String paraSplitSign, int paraLabelIndex) {
		instances = new ArrayList<Instance>();
		labelIndex = paraLabelIndex;

		File tempFile = new File(paraFilename);
		try {
			BufferedReader tempReader = new BufferedReader(new FileReader(tempFile));
			String tempLine;
			while ((tempLine = tempReader.readLine()) != null) {
				String[] tempDatum = tempLine.split(paraSplitSign);
				if (tempDatum.length == 0) {
					continue;
				} // Of if

				double[] tempData = new double[tempDatum.length];
				for (int i = 0; i < tempDatum.length; i++)
					tempData[i] = Double.parseDouble(tempDatum[i]);
				Instance tempInstance = new Instance(tempData);
				append(tempInstance);
			} // Of while
			tempReader.close();
		} catch (IOException e) {
			e.printStackTrace();
			System.out.println("Unable to load " + paraFilename);
			System.exit(0);
		}//Of try
	}// Of the second constructor

	/**
	 *********************** 
	 * Append an instance.
	 * 
	 * @param paraInstance
	 *            The given record.
	 *********************** 
	 */
	public void append(Instance paraInstance) {
		instances.add(paraInstance);
	}// Of append

	/**
	 *********************** 
	 * Append an instance specified by its attribute values and label.
	 *********************** 
	 */
	public void append(double[] paraAttributes, Double paraLabel) {
		instances.add(new Instance(paraAttributes, paraLabel));
	}// Of append

	/**
	 *********************** 
	 * Getter.
	 *********************** 
	 */
	public Instance getInstance(int paraIndex) {
		return instances.get(paraIndex);
	}// Of getInstance

	/**
	 *********************** 
	 * Getter.
	 *********************** 
	 */
	public int size() {
		return instances.size();
	}// Of size

	/**
	 *********************** 
	 * Getter.
	 *********************** 
	 */
	public double[] getAttributes(int paraIndex) {
		return instances.get(paraIndex).getAttributes();
	}// Of getAttributes

	/**
	 *********************** 
	 * Getter.
	 *********************** 
	 */
	public Double getLabel(int paraIndex) {
		return instances.get(paraIndex).getLabel();
	}// Of getLabel

	/**
	 *********************** 
	 * Unit test.
	 *********************** 
	 */
	public static void main(String args[]) {
		Dataset tempData = new Dataset("d:/c/cann/data/mnist/train.format", ",", 784);
		Instance tempInstance = tempData.getInstance(0);
		System.out.println("The first instance is: " + tempInstance);
	}// Of main

	/**
	 *********************** 
	 * An instance.
	 *********************** 
	 */
	public class Instance {
		/**
		 * Conditional attributes.
		 */
		private double[] attributes;

		/**
		 * Label.
		 */
		private Double label;

		/**
		 *********************** 
		 * The first constructor.
		 *********************** 
		 */
		private Instance(double[] paraAttrs, Double paraLabel) {
			attributes = paraAttrs;
			label = paraLabel;
		}//Of the first constructor

		/**
		 *********************** 
		 * The second constructor.
		 *********************** 
		 */
		public Instance(double[] paraData) {
			if (labelIndex == -1)
				// No label
				attributes = paraData;
			else {
				label = paraData[labelIndex];
				if (label > maxLabel) {
					// It is a new label
					maxLabel = label;
				} // Of if

				if (labelIndex == 0) {
					// The first column is the label
					attributes = Arrays.copyOfRange(paraData, 1, paraData.length);
				} else {
					// The last column is the label
					attributes = Arrays.copyOfRange(paraData, 0, paraData.length - 1);
				} // Of if
			} // Of if
		}// Of the second constructor

		/**
		 *********************** 
		 * Getter.
		 *********************** 
		 */
		public double[] getAttributes() {
			return attributes;
		}// Of getAttributes

		/**
		 *********************** 
		 * Getter.
		 *********************** 
		 */
		public Double getLabel() {
			if (labelIndex == -1)
				return null;
			return label;
		}// Of getLabel

		/**
		 *********************** 
		 * toString.
		 *********************** 
		 */
		public String toString(){
			return Arrays.toString(attributes) + ", " + label;
		}//Of toString
	}// Of class Instance
}// Of class Dataset

A class for managing the size of a convolution kernel. This is basic infrastructure code; its role only becomes clear when the network runs.

  1. It supports dividing one Size by another. Not every pair of Sizes can be divided, though; the comment before the method gives an example.
  2. It supports subtracting one Size from another.
package machinelearning.cnn;

/**
 * The size of a convolution core.
 * 
 * @author Fan Min minfanphd@163.com.
 */
public class Size {
	/**
	 * Cannot be changed after initialization.
	 */
	public final int width;

	/**
	 * Cannot be changed after initialization.
	 */
	public final int height;

	/**
	 *********************** 
	 * The first constructor.
	 * 
	 * @param paraWidth
	 *            The given width.
	 * @param paraHeight
	 *            The given height.
	 *********************** 
	 */
	public Size(int paraWidth, int paraHeight) {
		width = paraWidth;
		height = paraHeight;
	}// Of the first constructor

	/**
	 *********************** 
	 * Divide a size by another one. For example, (4, 12) / (2, 3) = (2, 4).
	 * 
	 * @param paraScaleSize
	 *            The given scale size.
	 * @return The new size.
	 *********************** 
	 */
	public Size divide(Size paraScaleSize) {
		int resultWidth = width / paraScaleSize.width;
		int resultHeight = height / paraScaleSize.height;
		if (resultWidth * paraScaleSize.width != width
				|| resultHeight * paraScaleSize.height != height)
			throw new RuntimeException("Unable to divide " + this + " with " + paraScaleSize);
		return new Size(resultWidth, resultHeight);
	}// Of divide

	/**
	 *********************** 
	 * Subtract another size from this one, and add a value to both dimensions.
	 * For example, (4, 12) - (2, 3) + 1 = (3, 10).
	 * 
	 * @param paraScaleSize
	 *            The given scale size.
	 * @param paraAppend
	 *            The appended size to both dimensions.
	 * @return The new size.
	 *********************** 
	 */
	public Size subtract(Size paraScaleSize, int paraAppend) {
		int resultWidth = width - paraScaleSize.width + paraAppend;
		int resultHeight = height - paraScaleSize.height + paraAppend;
		return new Size(resultWidth, resultHeight);
	}// Of subtract

	/**
	 *********************** 
	 * @return The string showing itself.
	 *********************** 
	 */
	public String toString() {
		String resultString = "(" + width + ", " + height + ")";
		return resultString;
	}// Of toString

	/**
	 *********************** 
	 * Unit test.
	 *********************** 
	 */
	public static void main(String[] args) {
		Size tempSize1 = new Size(4, 6);
		Size tempSize2 = new Size(2, 2);
		System.out.println(
				"" + tempSize1 + " divide " + tempSize2 + " = " + tempSize1.divide(tempSize2));

		try {
			System.out.println(
					"" + tempSize2 + " divide " + tempSize1 + " = " + tempSize2.divide(tempSize1));
		} catch (Exception ee) {
			System.out.println(ee);
		} // Of try

		System.out.println(
				"" + tempSize1 + " - " + tempSize2 + " + 1 = " + tempSize1.subtract(tempSize2, 1));
	}// Of main
}// Of class Size

Earlier we used integer constants (Day 51) and character constants (Day 74); enumerated types are yet another option. Their usage will only become clear in later programs.

package machinelearning.cnn;

/**
 * Enumerate all layer types.
 * 
 * @author Fan Min minfanphd@163.com.
 */
public enum LayerTypeEnum {
	INPUT, CONVOLUTION, SAMPLING, OUTPUT;
}//Of enum LayerTypeEnum
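A minimal sketch of how such an enum is typically consumed; the CnnLayer constructor on Day 84 switches over it in the same way. The enum is re-declared locally here so the snippet compiles on its own, and the class name EnumDemo and the descriptions are made up:

```java
public class EnumDemo {
	// Re-declared locally to keep the snippet self-contained;
	// the real type lives in machinelearning.cnn.LayerTypeEnum.
	public enum LayerTypeEnum {
		INPUT, CONVOLUTION, SAMPLING, OUTPUT
	}// Of enum LayerTypeEnum

	// Branch on the enum value, as a layer constructor would.
	public static String describe(LayerTypeEnum paraType) {
		switch (paraType) {
		case INPUT:
			return "reads the raw image";
		case CONVOLUTION:
			return "applies kernels";
		case SAMPLING:
			return "pools the maps";
		default:
			return "produces class scores";
		}// Of switch
	}// Of describe

	public static void main(String[] args) {
		// values() enumerates all constants in declaration order.
		for (LayerTypeEnum tempType : LayerTypeEnum.values()) {
			System.out.println(tempType + ": " + describe(tempType));
		} // Of for
	}// Of main
}// Of class EnumDemo
```

Compared with integer or character constants, the compiler rejects any value outside the enumeration, which is the main attraction.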

Day 82: Mathematical Operations

Commonly used mathematical operations, especially matrix operations, are collected and managed here. Today you should find some posts to study the principles of CNNs; otherwise many of these operations will be hard to understand.

  1. interface Operator defines an operator whose main purpose is to apply an operation to every element of a matrix, so you need to read the matrixOp method and its call sites to understand its role. This operator style is somewhat convoluted, but it is flexible and improves code reuse. Take the Operator variable one_value as an example: its ultimate purpose is to obtain the matrix 1 - A, i.e., to subtract each element of the matrix from 1 to form a new matrix.
  2. interface OperatorOnTwo is similar, but it supports two operands and hence two matrices, so matrix addition and subtraction need no separate code.
  3. matrixOp is overloaded to support different parameter lists.
  4. rot180 rotates the matrix by 180 degrees, implemented with two flips.
  5. convnValid is the (valid) convolution operation; convnFull is its reverse.
  6. scaleMatrix is mean pooling.
  7. kronecker is the reverse operation of pooling.
package machinelearning.cnn;

import java.io.Serializable;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/**
 * Math operations.
 * 
 * Adopted from cnn-master.
 */
public class MathUtils {

	/**
	 * An interface for different on-demand operators.
	 */
	public interface Operator extends Serializable {
		public double process(double value);
	}// Of interface Operator

	/**
	 * The one-minus-the-value operator.
	 */
	public static final Operator one_value = new Operator() {
		private static final long serialVersionUID = 3752139491940330714L;

		@Override
		public double process(double value) {
			return 1 - value;
		}// Of process
	};

	/**
	 * The sigmoid operator.
	 */
	public static final Operator sigmoid = new Operator() {
		private static final long serialVersionUID = -1952718905019847589L;

		@Override
		public double process(double value) {
			return 1 / (1 + Math.pow(Math.E, -value));
		}// Of process
	};

	/**
	 * An interface for operations with two operands.
	 */
	interface OperatorOnTwo extends Serializable {
		public double process(double a, double b);
	}// Of interface OperatorOnTwo

	/**
	 * Plus.
	 */
	public static final OperatorOnTwo plus = new OperatorOnTwo() {
		private static final long serialVersionUID = -6298144029766839945L;

		@Override
		public double process(double a, double b) {
			return a + b;
		}// Of process
	};

	/**
	 * Multiply.
	 */
	public static OperatorOnTwo multiply = new OperatorOnTwo() {

		private static final long serialVersionUID = -7053767821858820698L;

		@Override
		public double process(double a, double b) {
			return a * b;
		}// Of process
	};

	/**
	 * Minus.
	 */
	public static OperatorOnTwo minus = new OperatorOnTwo() {

		private static final long serialVersionUID = 7346065545555093912L;

		@Override
		public double process(double a, double b) {
			return a - b;
		}// Of process
	};

	/**
	 *********************** 
	 * Print a matrix
	 *********************** 
	 */
	public static void printMatrix(double[][] matrix) {
		for (int i = 0; i < matrix.length; i++) {
			String line = Arrays.toString(matrix[i]);
			line = line.replaceAll(", ", "\t");
			System.out.println(line);
		} // Of for i
		System.out.println();
	}// Of printMatrix

	/**
	 *********************** 
	 * Rotate the matrix 180 degrees.
	 *********************** 
	 */
	public static double[][] rot180(double[][] matrix) {
		matrix = cloneMatrix(matrix);
		int m = matrix.length;
		int n = matrix[0].length;
		for (int i = 0; i < m; i++) {
			for (int j = 0; j < n / 2; j++) {
				double tmp = matrix[i][j];
				matrix[i][j] = matrix[i][n - 1 - j];
				matrix[i][n - 1 - j] = tmp;
			}
		}
		for (int j = 0; j < n; j++) {
			for (int i = 0; i < m / 2; i++) {
				double tmp = matrix[i][j];
				matrix[i][j] = matrix[m - 1 - i][j];
				matrix[m - 1 - i][j] = tmp;
			}
		}
		return matrix;
	}// Of rot180

	private static Random myRandom = new Random(2);

	/**
	 *********************** 
	 * Generate a random matrix with the given size. Each value takes value in
	 * [-0.005, 0.095].
	 *********************** 
	 */
	public static double[][] randomMatrix(int x, int y, boolean b) {
		double[][] matrix = new double[x][y];
		// int tag = 1;
		for (int i = 0; i < x; i++) {
			for (int j = 0; j < y; j++) {
				matrix[i][j] = (myRandom.nextDouble() - 0.05) / 10;
			} // Of for j
		} // Of for i
		return matrix;
	}// Of randomMatrix

	/**
	 *********************** 
	 * Generate an array with the given length. The random initialization is
	 * commented out, so each value is currently 0.
	 *********************** 
	 */
	public static double[] randomArray(int len) {
		double[] data = new double[len];
		for (int i = 0; i < len; i++) {
			//data[i] = myRandom.nextDouble() / 10 - 0.05;
			data[i] = 0;
		} // Of for i
		return data;
	}// Of randomArray

	/**
	 *********************** 
	 * Randomly select batchSize distinct indices in [0, size).
	 *********************** 
	 */
	public static int[] randomPerm(int size, int batchSize) {
		Set<Integer> set = new HashSet<Integer>();
		while (set.size() < batchSize) {
			set.add(myRandom.nextInt(size));
		}
		int[] randPerm = new int[batchSize];
		int i = 0;
		for (Integer value : set)
			randPerm[i++] = value;
		return randPerm;
	}// Of randomPerm

	/**
	 *********************** 
	 * Clone a matrix. Do not use its reference directly.
	 *********************** 
	 */
	public static double[][] cloneMatrix(final double[][] matrix) {
		final int m = matrix.length;
		int n = matrix[0].length;
		final double[][] outMatrix = new double[m][n];

		for (int i = 0; i < m; i++) {
			for (int j = 0; j < n; j++) {
				outMatrix[i][j] = matrix[i][j];
			} // Of for j
		} // Of for i
		return outMatrix;
	}// Of cloneMatrix

	/**
	 *********************** 
	 * Matrix operation with the given operator on a single operand.
	 *********************** 
	 */
	public static double[][] matrixOp(final double[][] ma, Operator operator) {
		final int m = ma.length;
		int n = ma[0].length;
		for (int i = 0; i < m; i++) {
			for (int j = 0; j < n; j++) {
				ma[i][j] = operator.process(ma[i][j]);
			} // Of for j
		} // Of for i
		return ma;
	}// Of matrixOp

	/**
	 *********************** 
	 * Matrix operation with the given operator on two operands.
	 *********************** 
	 */
	public static double[][] matrixOp(final double[][] ma, final double[][] mb,
			final Operator operatorA, final Operator operatorB, OperatorOnTwo operator) {
		final int m = ma.length;
		int n = ma[0].length;
		if (m != mb.length || n != mb[0].length)
			throw new RuntimeException("ma.length:" + ma.length + "  mb.length:" + mb.length);

		for (int i = 0; i < m; i++) {
			for (int j = 0; j < n; j++) {
				double a = ma[i][j];
				if (operatorA != null)
					a = operatorA.process(a);
				double b = mb[i][j];
				if (operatorB != null)
					b = operatorB.process(b);
				mb[i][j] = operator.process(a, b);
			} // Of for j
		} // Of for i
		return mb;
	}// Of matrixOp

	/**
	 *********************** 
	 * Extend the matrix to a bigger one: each element is replicated into a
	 * scale.width * scale.height block.
	 *********************** 
	 */
	public static double[][] kronecker(final double[][] matrix, final Size scale) {
		final int m = matrix.length;
		int n = matrix[0].length;
		final double[][] outMatrix = new double[m * scale.width][n * scale.height];

		for (int i = 0; i < m; i++) {
			for (int j = 0; j < n; j++) {
				for (int ki = i * scale.width; ki < (i + 1) * scale.width; ki++) {
					for (int kj = j * scale.height; kj < (j + 1) * scale.height; kj++) {
						outMatrix[ki][kj] = matrix[i][j];
					}
				}
			}
		}
		return outMatrix;
	}// Of kronecker

	/**
	 *********************** 
	 * Scale the matrix.
	 *********************** 
	 */
	public static double[][] scaleMatrix(final double[][] matrix, final Size scale) {
		int m = matrix.length;
		int n = matrix[0].length;
		final int sm = m / scale.width;
		final int sn = n / scale.height;
		final double[][] outMatrix = new double[sm][sn];
		if (sm * scale.width != m || sn * scale.height != n)
			throw new RuntimeException("scale matrix");
		final int size = scale.width * scale.height;
		for (int i = 0; i < sm; i++) {
			for (int j = 0; j < sn; j++) {
				double sum = 0.0;
				for (int si = i * scale.width; si < (i + 1) * scale.width; si++) {
					for (int sj = j * scale.height; sj < (j + 1) * scale.height; sj++) {
						sum += matrix[si][sj];
					} // Of for sj
				} // Of for si
				outMatrix[i][j] = sum / size;
			} // Of for j
		} // Of for i
		return outMatrix;
	}// Of scaleMatrix

	/**
	 *********************** 
	 * Convolution full to obtain a bigger size. It is used in back-propagation.
	 *********************** 
	 */
	public static double[][] convnFull(double[][] matrix, final double[][] kernel) {
		int m = matrix.length;
		int n = matrix[0].length;
		final int km = kernel.length;
		final int kn = kernel[0].length;
		final double[][] extendMatrix = new double[m + 2 * (km - 1)][n + 2 * (kn - 1)];
		for (int i = 0; i < m; i++) {
			for (int j = 0; j < n; j++) {
				extendMatrix[i + km - 1][j + kn - 1] = matrix[i][j];
			} // Of for j
		} // Of for i
		return convnValid(extendMatrix, kernel);
	}// Of convnFull

	/**
	 *********************** 
	 * Convolution operation, from a given matrix and a kernel, sliding and sum
	 * to obtain the result matrix. It is used in forward.
	 *********************** 
	 */
	public static double[][] convnValid(final double[][] matrix, double[][] kernel) {
		// kernel = rot180(kernel);
		int m = matrix.length;
		int n = matrix[0].length;
		final int km = kernel.length;
		final int kn = kernel[0].length;
		int kns = n - kn + 1;
		final int kms = m - km + 1;
		final double[][] outMatrix = new double[kms][kns];

		for (int i = 0; i < kms; i++) {
			for (int j = 0; j < kns; j++) {
				double sum = 0.0;
				for (int ki = 0; ki < km; ki++) {
					for (int kj = 0; kj < kn; kj++)
						sum += matrix[i + ki][j + kj] * kernel[ki][kj];
				}
				outMatrix[i][j] = sum;

			}
		}
		return outMatrix;
	}// Of convnValid

	/**
	 *********************** 
	 * Convolution on a tensor.
	 *********************** 
	 */
	public static double[][] convnValid(final double[][][][] matrix, int mapNoX,
			double[][][][] kernel, int mapNoY) {
		int m = matrix.length;
		int n = matrix[0][mapNoX].length;
		int h = matrix[0][mapNoX][0].length;
		int km = kernel.length;
		int kn = kernel[0][mapNoY].length;
		int kh = kernel[0][mapNoY][0].length;
		int kms = m - km + 1;
		int kns = n - kn + 1;
		int khs = h - kh + 1;
		if (matrix.length != kernel.length)
			throw new RuntimeException("length");
		final double[][][] outMatrix = new double[kms][kns][khs];
		for (int i = 0; i < kms; i++) {
			for (int j = 0; j < kns; j++)
				for (int k = 0; k < khs; k++) {
					double sum = 0.0;
					for (int ki = 0; ki < km; ki++) {
						for (int kj = 0; kj < kn; kj++)
							for (int kk = 0; kk < kh; kk++) {
								sum += matrix[i + ki][mapNoX][j + kj][k + kk]
										* kernel[ki][mapNoY][kj][kk];
							}
					}
					outMatrix[i][j][k] = sum;
				}
		}
		return outMatrix[0];
	}// Of convnValid

	/**
	 *********************** 
	 * The sigmoid operation.
	 *********************** 
	 */
	public static double sigmod(double x) {
		return 1 / (1 + Math.pow(Math.E, -x));
	}// Of sigmod

	/**
	 *********************** 
	 * Sum all values of a matrix.
	 *********************** 
	 */
	public static double sum(double[][] error) {
		int m = error.length;
		int n = error[0].length;
		double sum = 0.0;
		for (int i = 0; i < m; i++) {
			for (int j = 0; j < n; j++) {
				sum += error[i][j];
			}
		}
		return sum;
	}// Of sum

	/**
	 *********************** 
	 * Ad hoc sum.
	 *********************** 
	 */
	public static double[][] sum(double[][][][] errors, int j) {
		int m = errors[0][j].length;
		int n = errors[0][j][0].length;
		double[][] result = new double[m][n];
		for (int mi = 0; mi < m; mi++) {
			for (int nj = 0; nj < n; nj++) {
				double sum = 0;
				for (int i = 0; i < errors.length; i++)
					sum += errors[i][j][mi][nj];
				result[mi][nj] = sum;
			}
		}
		return result;
	}// Of sum

	/**
	 *********************** 
	 * Get the index of the maximal value for the final classification.
	 *********************** 
	 */
	public static int getMaxIndex(double[] out) {
		double max = out[0];
		int index = 0;
		for (int i = 1; i < out.length; i++)
			if (out[i] > max) {
				max = out[i];
				index = i;
			}
		return index;
	}// Of getMaxIndex
}// Of MathUtils
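The operator pattern from item 1 can be shown in isolation. Below is a stripped-down sketch of the same idea (OperatorDemo is a made-up class for illustration; the real interface lives in MathUtils):

```java
public class OperatorDemo {
	// Same idea as MathUtils.Operator: a single element-wise function object.
	public interface Operator {
		double process(double paraValue);
	}// Of interface Operator

	// Apply the operator to every element in place, as matrixOp does.
	public static double[][] matrixOp(double[][] paraMatrix, Operator paraOperator) {
		for (int i = 0; i < paraMatrix.length; i++) {
			for (int j = 0; j < paraMatrix[0].length; j++) {
				paraMatrix[i][j] = paraOperator.process(paraMatrix[i][j]);
			} // Of for j
		} // Of for i
		return paraMatrix;
	}// Of matrixOp

	public static void main(String[] args) {
		double[][] tempMatrix = { { 0.25, 0.75 }, { 0.5, 1.0 } };
		// The one_value operator: 1 - x for each element, giving 1 - A.
		matrixOp(tempMatrix, new Operator() {
			public double process(double paraValue) {
				return 1 - paraValue;
			}// Of process
		});
		System.out.println(java.util.Arrays.deepToString(tempMatrix));
	}// Of main
}// Of class OperatorDemo
```

The loop is written once; any new element-wise operation only needs a new Operator instance, which is exactly the reuse the list above refers to.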

Day 83: Mathematical Operations (Continued)

There were quite a few operations yesterday, so today you can write some code of your own to test them. For example, test cases are best kept to matrices within 8*8, which makes correctness easy to verify by hand.
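In that spirit, here is one small self-contained check. It re-implements the sliding-window logic of convnValid on a 3*3 matrix with a 2*2 kernel, small enough to verify by hand (the class name ConvnTest is made up; the logic mirrors yesterday's method):

```java
public class ConvnTest {
	// A standalone copy of the valid-convolution logic for hand checking.
	public static double[][] convnValid(double[][] paraMatrix, double[][] paraKernel) {
		int tempRows = paraMatrix.length - paraKernel.length + 1;
		int tempCols = paraMatrix[0].length - paraKernel[0].length + 1;
		double[][] resultMatrix = new double[tempRows][tempCols];
		for (int i = 0; i < tempRows; i++) {
			for (int j = 0; j < tempCols; j++) {
				double tempSum = 0;
				for (int ki = 0; ki < paraKernel.length; ki++) {
					for (int kj = 0; kj < paraKernel[0].length; kj++) {
						tempSum += paraMatrix[i + ki][j + kj] * paraKernel[ki][kj];
					} // Of for kj
				} // Of for ki
				resultMatrix[i][j] = tempSum;
			} // Of for j
		} // Of for i
		return resultMatrix;
	}// Of convnValid

	public static void main(String[] args) {
		double[][] tempMatrix = { { 1, 2, 3 }, { 4, 5, 6 }, { 7, 8, 9 } };
		double[][] tempKernel = { { 1, 0 }, { 0, 1 } };
		// Each output element sums the main diagonal of a 2*2 window:
		// the top-left window {1, 2; 4, 5} gives 1 + 5 = 6.
		double[][] tempResult = convnValid(tempMatrix, tempKernel);
		System.out.println(java.util.Arrays.deepToString(tempResult));
		// Expected: [[6.0, 8.0], [12.0, 14.0]]
	}// Of main
}// Of class ConvnTest
```

The same approach works for rot180, scaleMatrix and kronecker: pick integer-valued matrices so the expected output can be computed on paper.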

Day 84: Network Structure and Parameters

Let us first look at the data of a single layer.

  1. For multi-dimensional arrays, you need to understand the meaning of each dimension.
  2. Different layer types are initialized differently.
  3. Unlike the coding style of the ANN, the main operations are not written in this class. The reason is that they may involve the data of the previous or the next layer. Of course, you can move the main operations into this class if you like, but the debugging effort would probably be considerable.
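Before reading the code, it helps to trace how the map sizes shrink through the layers. The sketch below uses the same arithmetic as Size.subtract and Size.divide, with assumed settings in the style of the network built later in the series (a 28*28 input, 5*5 kernels, 2*2 mean pooling; the class name MapSizeDemo is hypothetical):

```java
public class MapSizeDemo {
	// CONVOLUTION: out = in - kernel + 1, i.e. Size.subtract(kernelSize, 1).
	public static int afterConvolution(int paraIn, int paraKernel) {
		return paraIn - paraKernel + 1;
	}// Of afterConvolution

	// SAMPLING: out = in / scale, i.e. Size.divide(scaleSize);
	// the division must be exact, otherwise Size.divide throws an exception.
	public static int afterSampling(int paraIn, int paraScale) {
		if (paraIn % paraScale != 0) {
			throw new RuntimeException("Unable to divide " + paraIn + " by " + paraScale);
		} // Of if
		return paraIn / paraScale;
	}// Of afterSampling

	public static void main(String[] args) {
		int tempWidth = 28;
		tempWidth = afterConvolution(tempWidth, 5); // 24
		tempWidth = afterSampling(tempWidth, 2); // 12
		tempWidth = afterConvolution(tempWidth, 5); // 8
		tempWidth = afterSampling(tempWidth, 2); // 4
		System.out.println("Final map width: " + tempWidth);
	}// Of main
}// Of class MapSizeDemo
```

Keeping this chain in mind makes the dimension comments on kernel and outMaps below much easier to follow.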
package machinelearning.cnn;

/**
 * One layer, support all four layer types. The code mainly initializes, gets,
 * and sets variables. Essentially no algorithm is implemented.
 *
 * @author Fan Min minfanphd@163.com.
 */
public class CnnLayer {
	/**
	 * The type of the layer.
	 */
	LayerTypeEnum type;

	/**
	 * The number of output maps.
	 */
	int outMapNum;

	/**
	 * The map size.
	 */
	Size mapSize;

	/**
	 * The kernel size.
	 */
	Size kernelSize;

	/**
	 * The scale size.
	 */
	Size scaleSize;

	/**
	 * The number of classes, for the OUTPUT layer.
	 */
	int classNum = -1;

	/**
	 * Kernel. Dimensions: [front map][out map][width][height].
	 */
	private double[][][][] kernel;

	/**
	 * Bias. The length is outMapNum.
	 */
	private double[] bias;

	/**
	 * Out maps. Dimensions:
	 * [batchSize][outMapNum][mapSize.width][mapSize.height].
	 */
	private double[][][][] outMaps;

	/**
	 * Errors.
	 */
	private double[][][][] errors;

	/**
	 * For batch processing.
	 */
	private static int recordInBatch = 0;

	/**
	 *********************** 
	 * The first constructor.
	 * 
	 * @param paraNum
	 *            When the type is CONVOLUTION, it is the output map number;
	 *            when the type is OUTPUT, it is the number of classes.
	 * @param paraSize
	 *            When the type is INPUT, it is the map size; when the type is
	 *            CONVOLUTION, it is the kernel size; when the type is SAMPLING,
	 *            it is the scale size.
	 *********************** 
	 */
	public CnnLayer(LayerTypeEnum paraType, int paraNum, Size paraSize) {
		type = paraType;
		switch (type) {
		case INPUT:
			outMapNum = 1;
			mapSize = paraSize; // No deep copy.
			break;
		case CONVOLUTION:
			outMapNum = paraNum;
			kernelSize = paraSize;
			break;
		case SAMPLING:
			scaleSize = paraSize;
			break;
		case OUTPUT:
			classNum = paraNum;
			mapSize = new Size(1, 1);
			outMapNum = classNum;
			break;
		default:
			System.out.println("Internal error occurred in AbstractLayer.java constructor.");
		}// Of switch
	}// Of the first constructor

	/**
	 *********************** 
	 * Initialize the kernel.
	 * 
	 * @param paraFrontMapNum
	 *            The number of maps in the previous (front) layer.
	 *********************** 
	 */
	public void initKernel(int paraFrontMapNum) {
		kernel = new double[paraFrontMapNum][outMapNum][][];
		for (int i = 0; i < paraFrontMapNum; i++) {
			for (int j = 0; j < outMapNum; j++) {
				kernel[i][j] = MathUtils.randomMatrix(kernelSize.width, kernelSize.height, true);
			} // Of for j
		} // Of for i
	}// Of initKernel

	/**
	 *********************** 
	 * Initialize the output kernel. The code is revised to invoke
	 * initKernel(int).
	 *********************** 
	 */
	public void initOutputKernel(int paraFrontMapNum, Size paraSize) {
		kernelSize = paraSize;
		initKernel(paraFrontMapNum);
	}// Of initOutputKernel

	/**
	 *********************** 
	 * Initialize the bias. The original version declared an unused parameter
	 * "int frontMapNum", which is removed here.
	 *********************** 
	 */
	public void initBias() {
		bias = MathUtils.randomArray(outMapNum);
	}// Of initBias

	/**
	 *********************** 
	 * Initialize the errors.
	 * 
	 * @param paraBatchSize
	 *            The batch size.
	 *********************** 
	 */
	public void initErrors(int paraBatchSize) {
		errors = new double[paraBatchSize][outMapNum][mapSize.width][mapSize.height];
	}// Of initErrors

	/**
	 *********************** 
	 * Initialize out maps.
	 * 
	 * @param paraBatchSize
	 *            The batch size.
	 *********************** 
	 */
	public void initOutMaps(int paraBatchSize) {
		outMaps = new double[paraBatchSize][outMapNum][mapSize.width][mapSize.height];
	}// Of initOutMaps

	/**
	 *********************** 
	 * Prepare for a new batch.
	 *********************** 
	 */
	public static void prepareForNewBatch() {
		recordInBatch = 0;
	}// Of prepareForNewBatch

	/**
	 *********************** 
	 * Prepare for a new record.
	 *********************** 
	 */
	public static void prepareForNewRecord() {
		recordInBatch++;
	}// Of prepareForNewRecord

	/**
	 *********************** 
	 * Set one value of outMaps.
	 *********************** 
	 */
	public void setMapValue(int paraMapNo, int paraX, int paraY, double paraValue) {
		outMaps[recordInBatch][paraMapNo][paraX][paraY] = paraValue;
	}// Of setMapValue

	/**
	 *********************** 
	 * Set values of the whole map.
	 *********************** 
	 */
	public void setMapValue(int paraMapNo, double[][] paraOutMatrix) {
		outMaps[recordInBatch][paraMapNo] = paraOutMatrix;
	}// Of setMapValue

	/**
	 *********************** 
	 * Getter.
	 *********************** 
	 */
	public Size getMapSize() {
		return mapSize;
	}// Of getMapSize

	/**
	 *********************** 
	 * Setter.
	 *********************** 
	 */
	public void setMapSize(Size paraMapSize) {
		mapSize = paraMapSize;
	}// Of setMapSize

	/**
	 *********************** 
	 * Getter.
	 *********************** 
	 */
	public LayerTypeEnum getType() {
		return type;
	}// Of getType

	/**
	 *********************** 
	 * Getter.
	 *********************** 
	 */
	public int getOutMapNum() {
		return outMapNum;
	}// Of getOutMapNum

	/**
	 *********************** 
	 * Setter.
	 *********************** 
	 */
	public void setOutMapNum(int paraOutMapNum) {
		outMapNum = paraOutMapNum;
	}// Of setOutMapNum

	/**
	 *********************** 
	 * Getter.
	 *********************** 
	 */
	public Size getKernelSize() {
		return kernelSize;
	}// Of getKernelSize

	/**
	 *********************** 
	 * Getter.
	 *********************** 
	 */
	public Size getScaleSize() {
		return scaleSize;
	}// Of getScaleSize

	/**
	 *********************** 
	 * Getter.
	 *********************** 
	 */
	public double[][] getMap(int paraIndex) {
		return outMaps[recordInBatch][paraIndex];
	}// Of getMap

	/**
	 *********************** 
	 * Getter.
	 *********************** 
	 */
	public double[][] getKernel(int paraFrontMap, int paraOutMap) {
		return kernel[paraFrontMap][paraOutMap];
	}// Of getKernel

	/**
	 *********************** 
	 * Setter. Set one error.
	 *********************** 
	 */
	public void setError(int paraMapNo, int paraMapX, int paraMapY, double paraValue) {
		errors[recordInBatch][paraMapNo][paraMapX][paraMapY] = paraValue;
	}// Of setError

	/**
	 *********************** 
	 * Setter. Set one error matrix.
	 *********************** 
	 */
	public void setError(int paraMapNo, double[][] paraMatrix) {
		errors[recordInBatch][paraMapNo] = paraMatrix;
	}// Of setError

	/**
	 *********************** 
	 * Getter. Get one error matrix.
	 *********************** 
	 */
	public double[][] getError(int paraMapNo) {
		return errors[recordInBatch][paraMapNo];
	}// Of getError

	/**
	 *********************** 
	 * Getter. Get the whole error tensor.
	 *********************** 
	 */
	public double[][][][] getErrors() {
		return errors;
	}// Of getErrors

	/**
	 *********************** 
	 * Setter. Set one kernel.
	 *********************** 
	 */
	public void setKernel(int paraLastMapNo, int paraMapNo, double[][] paraKernel) {
		kernel[paraLastMapNo][paraMapNo] = paraKernel;
	}// Of setKernel

	/**
	 *********************** 
	 * Getter.
	 *********************** 
	 */
	public double getBias(int paraMapNo) {
		return bias[paraMapNo];
	}// Of getBias

	/**
	 *********************** 
	 * Setter.
	 *********************** 
	 */
	public void setBias(int paraMapNo, double paraValue) {
		bias[paraMapNo] = paraValue;
	}// Of setBias

	/**
	 *********************** 
	 * Getter.
	 *********************** 
	 */
	public double[][][][] getMaps() {
		return outMaps;
	}// Of getMaps

	/**
	 *********************** 
	 * Getter.
	 *********************** 
	 */
	public double[][] getError(int paraRecordId, int paraMapNo) {
		return errors[paraRecordId][paraMapNo];
	}// Of getError

	/**
	 *********************** 
	 * Getter.
	 *********************** 
	 */
	public double[][] getMap(int paraRecordId, int paraMapNo) {
		return outMaps[paraRecordId][paraMapNo];
	}// Of getMap

	/**
	 *********************** 
	 * Getter.
	 *********************** 
	 */
	public int getClassNum() {
		return classNum;
	}// Of getClassNum

	/**
	 *********************** 
	 * Getter. Get the whole kernel tensor.
	 *********************** 
	 */
	public double[][][][] getKernel() {
		return kernel;
	} // Of getKernel
}// Of class CnnLayer

Next comes the management of multiple layers.

package machinelearning.cnn;

import java.util.ArrayList;
import java.util.List;

/**
 * CnnLayer builder. 
 * 
 * @author Fan Min minfanphd@163.com.
 */
public class LayerBuilder {
	/**
	 * Layers.
	 */
	private List<CnnLayer> layers;

	/**
	 *********************** 
	 * The first constructor.
	 *********************** 
	 */
	public LayerBuilder() {
		layers = new ArrayList<CnnLayer>();
	}// Of the first constructor

	/**
	 *********************** 
	 * The second constructor.
	 *********************** 
	 */
	public LayerBuilder(CnnLayer paraLayer) {
		this();
		layers.add(paraLayer);
	}// Of the second constructor

	/**
	 *********************** 
	 * Add a layer.
	 * 
	 * @param paraLayer
	 *            The new layer.
	 *********************** 
	 */
	public void addLayer(CnnLayer paraLayer) {
		layers.add(paraLayer);
	}// Of addLayer
	
	/**
	 *********************** 
	 * Get the specified layer.
	 * 
	 * @param paraIndex
	 *            The index of the layer.
	 *********************** 
	 */
	public CnnLayer getLayer(int paraIndex) throws RuntimeException{
		if (paraIndex >= layers.size()) {
			throw new RuntimeException("CnnLayer " + paraIndex + " is out of range: "
					+ layers.size() + ".");
		}//Of if
		
		return layers.get(paraIndex);
	}//Of getLayer
	
	/**
	 *********************** 
	 * Get the output layer.
	 *********************** 
	 */
	public CnnLayer getOutputLayer() {
		return layers.get(layers.size() - 1);
	}//Of getOutputLayer

	/**
	 *********************** 
	 * Get the number of layers.
	 *********************** 
	 */
	public int getNumLayers() {
		return layers.size();
	}//Of getNumLayers
}// Of class LayerBuilder

Day 85: Network Construction (1. Copying the Code)

There is quite a lot of code, so copy it once first and try to understand as much as possible.

  1. initOperators initializes several more operators. Note that they depend on member variables that have already been initialized. This flexible style (interface) is worth learning.
  2. ALPHA and LAMBDA are hyperparameters that you can set yourself.
  3. setup initializes the whole network.
  4. forward and backPropagation follow the same idea as in the ANN, but the computations differ.
  5. updateParameters is invoked only once after a whole batch has gone through forward and backPropagation.
  6. rangeCheck is something I wrote while debugging; it is of little use now.
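Composing the three operators gives a rough picture of what a per-batch weight update looks like. The sketch below is my reading of how divideBatchSize, multiplyLambda and multiplyAlpha typically combine in updateParameters (which appears later), not the author's exact code; the class name UpdateSketch is made up:

```java
public class UpdateSketch {
	static final double ALPHA = 0.85; // Learning-rate-like hyperparameter.
	static final double LAMBDA = 0; // Weight-decay hyperparameter.

	// One weight update: average the accumulated batch gradient
	// (divideBatchSize), decay the old weight (multiplyLambda),
	// then step by ALPHA times the average gradient (multiplyAlpha).
	public static double updateWeight(double paraWeight, double paraGradientSum,
			int paraBatchSize) {
		double tempDelta = paraGradientSum / paraBatchSize;
		return paraWeight * (1 - LAMBDA * ALPHA) + ALPHA * tempDelta;
	}// Of updateWeight

	public static void main(String[] args) {
		// With LAMBDA = 0 the decay term vanishes and this is plain
		// batch-averaged gradient descent.
		System.out.println(updateWeight(0.5, 0.2, 4));
	}// Of main
}// Of class UpdateSketch
```

This also explains item 5: the gradient is summed over the batch, so the parameters can only be updated once the whole batch has been processed.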
package machinelearning.cnn;

import java.util.Arrays;
import machinelearning.cnn.Dataset.Instance;
import machinelearning.cnn.MathUtils.Operator;

/**
 * CNN.
 * 
 * @author Fan Min minfanphd@163.com.
 */
public class FullCnn {
	/**
	 * The learning rate.
	 */
	private static double ALPHA = 0.85;

	/**
	 * The weight decay (regularization) coefficient.
	 */
	public static double LAMBDA = 0;

	/**
	 * Manage layers.
	 */
	private static LayerBuilder layerBuilder;

	/**
	 * Train using a number of instances simultaneously.
	 */
	private int batchSize;

	/**
	 * Divide the given value by the batch size.
	 */
	private Operator divideBatchSize;

	/**
	 * Multiply the given value by ALPHA.
	 */
	private Operator multiplyAlpha;

	/**
	 * Multiply the given value by (1 - LAMBDA * ALPHA).
	 */
	private Operator multiplyLambda;

	/**
	 *********************** 
	 * The first constructor.
	 * 
	 * @param paraLayerBuilder
	 *            The given layer builder.
	 * @param paraBatchSize
	 *            The batch size.
	 *********************** 
	 */
	public FullCnn(LayerBuilder paraLayerBuilder, int paraBatchSize) {
		layerBuilder = paraLayerBuilder;
		batchSize = paraBatchSize;
		setup();
		initOperators();
	}// Of the first constructor

	/**
	 *********************** 
	 * Initialize operators using anonymous classes.
	 *********************** 
	 */
	private void initOperators() {
		divideBatchSize = new Operator() {
			private static final long serialVersionUID = 7424011281732651055L;

			@Override
			public double process(double value) {
				return value / batchSize;
			}// Of process
		};

		multiplyAlpha = new Operator() {
			private static final long serialVersionUID = 5761368499808006552L;

			@Override
			public double process(double value) {
				return value * ALPHA;
			}// Of process
		};

		multiplyLambda = new Operator() {
			private static final long serialVersionUID = 4499087728362870577L;

			@Override
			public double process(double value) {
				return value * (1 - LAMBDA * ALPHA);
			}// Of process
		};
	}// Of initOperators

	/**
	 *********************** 
	 * Setup according to the layer builder.
	 *********************** 
	 */
	public void setup() {
		CnnLayer tempInputLayer = layerBuilder.getLayer(0);
		tempInputLayer.initOutMaps(batchSize);

		for (int i = 1; i < layerBuilder.getNumLayers(); i++) {
			CnnLayer tempLayer = layerBuilder.getLayer(i);
			CnnLayer tempFrontLayer = layerBuilder.getLayer(i - 1);
			int tempFrontMapNum = tempFrontLayer.getOutMapNum();
			switch (tempLayer.getType()) {
			case INPUT:
				// Should not be input. Maybe an error should be thrown out.
				break;
			case CONVOLUTION:
				tempLayer.setMapSize(
						tempFrontLayer.getMapSize().subtract(tempLayer.getKernelSize(), 1));
				tempLayer.initKernel(tempFrontMapNum);
				tempLayer.initBias();
				tempLayer.initErrors(batchSize);
				tempLayer.initOutMaps(batchSize);
				break;
			case SAMPLING:
				tempLayer.setOutMapNum(tempFrontMapNum);
				tempLayer.setMapSize(tempFrontLayer.getMapSize().divide(tempLayer.getScaleSize()));
				tempLayer.initErrors(batchSize);
				tempLayer.initOutMaps(batchSize);
				break;
			case OUTPUT:
				tempLayer.initOutputKernel(tempFrontMapNum, tempFrontLayer.getMapSize());
				tempLayer.initBias();
				tempLayer.initErrors(batchSize);
				tempLayer.initOutMaps(batchSize);
				break;
			}// Of switch
		} // Of for i
	}// Of setup

	/**
	 *********************** 
	 * Forward computing.
	 *********************** 
	 */
	private void forward(Instance instance) {
		setInputLayerOutput(instance);
		for (int l = 1; l < layerBuilder.getNumLayers(); l++) {
			CnnLayer tempCurrentLayer = layerBuilder.getLayer(l);
			CnnLayer tempLastLayer = layerBuilder.getLayer(l - 1);
			switch (tempCurrentLayer.getType()) {
			case CONVOLUTION:
				setConvolutionOutput(tempCurrentLayer, tempLastLayer);
				break;
			case SAMPLING:
				setSampOutput(tempCurrentLayer, tempLastLayer);
				break;
			case OUTPUT:
				setConvolutionOutput(tempCurrentLayer, tempLastLayer);
				break;
			default:
				break;
			}// Of switch
		} // Of for l
	}// Of forward

	/**
	 *********************** 
	 * Set the input layer output. Given a record, copy its values to the
	 * input map.
	 * 
	 * @param paraRecord
	 *            The given record.
	 *********************** 
	 */
	private void setInputLayerOutput(Instance paraRecord) {
		CnnLayer tempInputLayer = layerBuilder.getLayer(0);
		Size tempMapSize = tempInputLayer.getMapSize();
		double[] tempAttributes = paraRecord.getAttributes();
		if (tempAttributes.length != tempMapSize.width * tempMapSize.height)
			throw new RuntimeException("input record does not match the map size.");

		for (int i = 0; i < tempMapSize.width; i++) {
			for (int j = 0; j < tempMapSize.height; j++) {
				tempInputLayer.setMapValue(0, i, j, tempAttributes[tempMapSize.height * i + j]);
			} // Of for j
		} // Of for i
	}// Of setInputLayerOutput

	/**
	 *********************** 
	 * Compute the convolution output according to the output of the last layer.
	 * 
	 * @param paraLayer
	 *            the current layer.
	 * @param paraLastLayer
	 *            the last layer.
	 *********************** 
	 */
	private void setConvolutionOutput(final CnnLayer paraLayer, final CnnLayer paraLastLayer) {
		// int mapNum = paraLayer.getOutMapNum();
		final int lastMapNum = paraLastLayer.getOutMapNum();

		// Attention: paraLayer.getOutMapNum() may not be right.
		for (int j = 0; j < paraLayer.getOutMapNum(); j++) {
			double[][] tempSumMatrix = null;
			for (int i = 0; i < lastMapNum; i++) {
				double[][] lastMap = paraLastLayer.getMap(i);
				double[][] kernel = paraLayer.getKernel(i, j);
				if (tempSumMatrix == null) {
					// On the first map.
					tempSumMatrix = MathUtils.convnValid(lastMap, kernel);
				} else {
					// Sum up convolution maps
					tempSumMatrix = MathUtils.matrixOp(MathUtils.convnValid(lastMap, kernel),
							tempSumMatrix, null, null, MathUtils.plus);
				} // Of if
			} // Of for i

			// Activation.
			final double bias = paraLayer.getBias(j);
			tempSumMatrix = MathUtils.matrixOp(tempSumMatrix, new Operator() {
				private static final long serialVersionUID = 2469461972825890810L;

				@Override
				public double process(double value) {
					return MathUtils.sigmod(value + bias);
				}

			});

			paraLayer.setMapValue(j, tempSumMatrix);
		} // Of for j
	}// Of setConvolutionOutput

	/**
	 *********************** 
	 * Compute the sampling output according to the output of the last layer.
	 * 
	 * @param paraLayer
	 *            the current layer.
	 * @param paraLastLayer
	 *            the last layer.
	 *********************** 
	 */
	private void setSampOutput(final CnnLayer paraLayer, final CnnLayer paraLastLayer) {
		// int tempLastMapNum = paraLastLayer.getOutMapNum();

		// Attention: paraLayer.outMapNum may not be right.
		for (int i = 0; i < paraLayer.outMapNum; i++) {
			double[][] lastMap = paraLastLayer.getMap(i);
			Size scaleSize = paraLayer.getScaleSize();
			double[][] sampMatrix = MathUtils.scaleMatrix(lastMap, scaleSize);
			paraLayer.setMapValue(i, sampMatrix);
		} // Of for i
	}// Of setSampOutput

	/**
	 *********************** 
	 * Train the cnn.
	 *********************** 
	 */
	public void train(Dataset paraDataset, int paraRounds) {
		for (int t = 0; t < paraRounds; t++) {
			System.out.println("Iteration: " + t);
			// Actually the number of batches in this round.
			int tempNumEpochs = paraDataset.size() / batchSize;
			if (paraDataset.size() % batchSize != 0)
				tempNumEpochs++;
			// logger.info("Iteration {}, epochsNum: {}", t, epochsNum);
			double tempNumCorrect = 0;
			int tempCount = 0;
			for (int i = 0; i < tempNumEpochs; i++) {
				int[] tempRandomPerm = MathUtils.randomPerm(paraDataset.size(), batchSize);
				CnnLayer.prepareForNewBatch();

				for (int index : tempRandomPerm) {
					boolean isRight = train(paraDataset.getInstance(index));
					if (isRight)
						tempNumCorrect++;
					tempCount++;
					CnnLayer.prepareForNewRecord();
				} // Of for index

				updateParameters();
				if (i % 50 == 0) {
					System.out.print("..");
					if (i + 50 > tempNumEpochs)
						System.out.println();
				} // Of if
			} // Of for i
			double p = 1.0 * tempNumCorrect / tempCount;
			if (t % 10 == 1 && p > 0.96) {
				ALPHA = 0.001 + ALPHA * 0.9;
				// logger.info("Set alpha = {}", ALPHA);
			} // Of if
			System.out.println("Training precision: " + p);
			// logger.info("Precision: {}/{}={}.", right, count, p);
		} // Of for t
	}// Of train

	/**
	 *********************** 
	 * Train the cnn with only one record.
	 * 
	 * @param paraRecord
	 *            The given record.
	 *********************** 
	 */
	private boolean train(Instance paraRecord) {
		forward(paraRecord);
		boolean result = backPropagation(paraRecord);
		return result;
	}// Of train

	/**
	 *********************** 
	 * Back-propagation.
	 * 
	 * @param paraRecord
	 *            The given record.
	 *********************** 
	 */
	private boolean backPropagation(Instance paraRecord) {
		boolean result = setOutputLayerErrors(paraRecord);
		setHiddenLayerErrors();
		return result;
	}// Of backPropagation

	/**
	 *********************** 
	 * Update parameters.
	 *********************** 
	 */
	private void updateParameters() {
		for (int l = 1; l < layerBuilder.getNumLayers(); l++) {
			CnnLayer layer = layerBuilder.getLayer(l);
			CnnLayer lastLayer = layerBuilder.getLayer(l - 1);
			switch (layer.getType()) {
			case CONVOLUTION:
			case OUTPUT:
				updateKernels(layer, lastLayer);
				updateBias(layer, lastLayer);
				break;
			default:
				break;
			}// Of switch
		} // Of for l
	}// Of updateParameters

	/**
	 *********************** 
	 * Update bias.
	 *********************** 
	 */
	private void updateBias(final CnnLayer paraLayer, CnnLayer paraLastLayer) {
		final double[][][][] errors = paraLayer.getErrors();
		// int mapNum = paraLayer.getOutMapNum();

		// Attention: getOutMapNum() may not be correct.
		for (int j = 0; j < paraLayer.getOutMapNum(); j++) {
			double[][] error = MathUtils.sum(errors, j);
			double deltaBias = MathUtils.sum(error) / batchSize;
			double bias = paraLayer.getBias(j) + ALPHA * deltaBias;
			paraLayer.setBias(j, bias);
		} // Of for j
	}// Of updateBias

	/**
	 *********************** 
	 * Update kernels.
	 *********************** 
	 */
	private void updateKernels(final CnnLayer paraLayer, final CnnLayer paraLastLayer) {
		// int mapNum = paraLayer.getOutMapNum();
		int tempLastMapNum = paraLastLayer.getOutMapNum();

		// Attention: getOutMapNum() may not be right
		for (int j = 0; j < paraLayer.getOutMapNum(); j++) {
			for (int i = 0; i < tempLastMapNum; i++) {
				double[][] tempDeltaKernel = null;
				for (int r = 0; r < batchSize; r++) {
					double[][] error = paraLayer.getError(r, j);
					if (tempDeltaKernel == null)
						tempDeltaKernel = MathUtils.convnValid(paraLastLayer.getMap(r, i), error);
					else {
						tempDeltaKernel = MathUtils.matrixOp(
								MathUtils.convnValid(paraLastLayer.getMap(r, i), error),
								tempDeltaKernel, null, null, MathUtils.plus);
					} // Of if
				} // Of for r

				tempDeltaKernel = MathUtils.matrixOp(tempDeltaKernel, divideBatchSize);
				if (!rangeCheck(tempDeltaKernel, -10, 10)) {
					System.exit(0);
				} // Of if
				double[][] kernel = paraLayer.getKernel(i, j);
				tempDeltaKernel = MathUtils.matrixOp(kernel, tempDeltaKernel, multiplyLambda,
						multiplyAlpha, MathUtils.plus);
				paraLayer.setKernel(i, j, tempDeltaKernel);
			} // Of for i
		} // Of for j
	}// Of updateKernels

	/**
	 *********************** 
	 * Set errors of all hidden layers.
	 *********************** 
	 */
	private void setHiddenLayerErrors() {
		// System.out.println("setHiddenLayerErrors");
		for (int l = layerBuilder.getNumLayers() - 2; l > 0; l--) {
			CnnLayer layer = layerBuilder.getLayer(l);
			CnnLayer nextLayer = layerBuilder.getLayer(l + 1);
			// System.out.println("layertype = " + layer.getType());
			switch (layer.getType()) {
			case SAMPLING:
				setSamplingErrors(layer, nextLayer);
				break;
			case CONVOLUTION:
				setConvolutionErrors(layer, nextLayer);
				break;
			default:
				break;
			}// Of switch
		} // Of for l
	}// Of setHiddenLayerErrors

	/**
	 *********************** 
	 * Set errors of a sampling layer.
	 *********************** 
	 */
	private void setSamplingErrors(final CnnLayer paraLayer, final CnnLayer paraNextLayer) {
		// int mapNum = layer.getOutMapNum();
		int tempNextMapNum = paraNextLayer.getOutMapNum();
		// Attention: getOutMapNum() may not be correct
		for (int i = 0; i < paraLayer.getOutMapNum(); i++) {
			double[][] sum = null;
			for (int j = 0; j < tempNextMapNum; j++) {
				double[][] nextError = paraNextLayer.getError(j);
				double[][] kernel = paraNextLayer.getKernel(i, j);
				if (sum == null) {
					sum = MathUtils.convnFull(nextError, MathUtils.rot180(kernel));
				} else {
					sum = MathUtils.matrixOp(
							MathUtils.convnFull(nextError, MathUtils.rot180(kernel)), sum, null,
							null, MathUtils.plus);
				} // Of if
			} // Of for j
			paraLayer.setError(i, sum);

			if (!rangeCheck(sum, -2, 2)) {
				System.out.println(
						"setSampErrors, error out of range.\r\n" + Arrays.deepToString(sum));
			} // Of if
		} // Of for i
	}// Of setSamplingErrors

	/**
	 *********************** 
	 * Set errors of a convolution layer.
	 *********************** 
	 */
	private void setConvolutionErrors(final CnnLayer paraLayer, final CnnLayer paraNextLayer) {
		// System.out.println("setConvErrors");
		for (int m = 0; m < paraLayer.getOutMapNum(); m++) {
			Size tempScale = paraNextLayer.getScaleSize();
			double[][] tempNextLayerErrors = paraNextLayer.getError(m);
			double[][] tempMap = paraLayer.getMap(m);
			double[][] tempOutMatrix = MathUtils.matrixOp(tempMap, MathUtils.cloneMatrix(tempMap),
					null, MathUtils.one_value, MathUtils.multiply);
			tempOutMatrix = MathUtils.matrixOp(tempOutMatrix,
					MathUtils.kronecker(tempNextLayerErrors, tempScale), null, null,
					MathUtils.multiply);
			paraLayer.setError(m, tempOutMatrix);

			// System.out.println("range check nextError");
			if (!rangeCheck(tempNextLayerErrors, -10, 10)) {
				System.out.println("setConvErrors, nextError out of range:\r\n"
						+ Arrays.deepToString(tempNextLayerErrors));
				System.out.println("the new errors are:\r\n" + Arrays.deepToString(tempOutMatrix));

				System.exit(0);
			} // Of if

			if (!rangeCheck(tempOutMatrix, -10, 10)) {
				System.out.println("setConvErrors, error out of range.");
				System.exit(0);
			} // Of if
		} // Of for m
	}// Of setConvolutionErrors

	/**
	 *********************** 
	 * Set errors of the output layer.
	 * 
	 * @param paraRecord
	 *            The given record.
	 * @return Whether the prediction is correct.
	 *********************** 
	 */
	private boolean setOutputLayerErrors(Instance paraRecord) {
		CnnLayer tempOutputLayer = layerBuilder.getOutputLayer();
		int tempMapNum = tempOutputLayer.getOutMapNum();

		double[] tempTarget = new double[tempMapNum];
		double[] tempOutMaps = new double[tempMapNum];
		for (int m = 0; m < tempMapNum; m++) {
			double[][] outmap = tempOutputLayer.getMap(m);
			tempOutMaps[m] = outmap[0][0];
		} // Of for m

		int tempLabel = paraRecord.getLabel().intValue();
		tempTarget[tempLabel] = 1;
		// Log.i(record.getLable() + "outmaps:" +
		// Util.fomart(outmaps)
		// + Arrays.toString(target));
		for (int m = 0; m < tempMapNum; m++) {
			tempOutputLayer.setError(m, 0, 0,
					tempOutMaps[m] * (1 - tempOutMaps[m]) * (tempTarget[m] - tempOutMaps[m]));
		} // Of for m

		return tempLabel == MathUtils.getMaxIndex(tempOutMaps);
	}// Of setOutputLayerErrors

	/**
	 *********************** 
	 * Setup the network.
	 *********************** 
	 */
	public void setup(int paraBatchSize) {
		CnnLayer tempInputLayer = layerBuilder.getLayer(0);
		tempInputLayer.initOutMaps(paraBatchSize);

		for (int i = 1; i < layerBuilder.getNumLayers(); i++) {
			CnnLayer tempLayer = layerBuilder.getLayer(i);
			CnnLayer tempLastLayer = layerBuilder.getLayer(i - 1);
			int tempLastMapNum = tempLastLayer.getOutMapNum();
			switch (tempLayer.getType()) {
			case INPUT:
				break;
			case CONVOLUTION:
				tempLayer.setMapSize(
						tempLastLayer.getMapSize().subtract(tempLayer.getKernelSize(), 1));
				tempLayer.initKernel(tempLastMapNum);
				tempLayer.initBias();
				tempLayer.initErrors(paraBatchSize);
				tempLayer.initOutMaps(paraBatchSize);
				break;
			case SAMPLING:
				tempLayer.setOutMapNum(tempLastMapNum);
				tempLayer.setMapSize(tempLastLayer.getMapSize().divide(tempLayer.getScaleSize()));
				tempLayer.initErrors(paraBatchSize);
				tempLayer.initOutMaps(paraBatchSize);
				break;
			case OUTPUT:
				tempLayer.initOutputKernel(tempLastMapNum, tempLastLayer.getMapSize());
				tempLayer.initBias();
				tempLayer.initErrors(paraBatchSize);
				tempLayer.initOutMaps(paraBatchSize);
				break;
			}// Of switch
		} // Of for i
	}// Of setup

	/**
	 *********************** 
	 * Predict for the dataset.
	 *********************** 
	 */
	public int[] predict(Dataset paraDataset) {
		System.out.println("Predicting ... ");
		CnnLayer.prepareForNewBatch();

		int[] resultPredictions = new int[paraDataset.size()];
		double tempCorrect = 0.0;

		Instance tempRecord;
		for (int i = 0; i < paraDataset.size(); i++) {
			tempRecord = paraDataset.getInstance(i);
			forward(tempRecord);
			CnnLayer outputLayer = layerBuilder.getOutputLayer();

			int tempMapNum = outputLayer.getOutMapNum();
			double[] tempOut = new double[tempMapNum];
			for (int m = 0; m < tempMapNum; m++) {
				double[][] outmap = outputLayer.getMap(m);
				tempOut[m] = outmap[0][0];
			} // Of for m

			resultPredictions[i] = MathUtils.getMaxIndex(tempOut);
			if (resultPredictions[i] == tempRecord.getLabel().intValue()) {
				tempCorrect++;
			} // Of if
		} // Of for

		System.out.println("Accuracy: " + tempCorrect / paraDataset.size());
		return resultPredictions;
	}// Of predict

	/**
	 *********************** 
	 * Range check, only for debugging.
	 * 
	 * @param paraMatrix
	 *            The given matrix.
	 * @param paraLowerBound
	 *            The lower bound.
	 * @param paraUpperBound
	 *            The upper bound.
	 * @return Whether all values are in range.
	 *********************** 
	 */
	public boolean rangeCheck(double[][] paraMatrix, double paraLowerBound, double paraUpperBound) {
		for (int i = 0; i < paraMatrix.length; i++) {
			for (int j = 0; j < paraMatrix[0].length; j++) {
				if ((paraMatrix[i][j] < paraLowerBound) || (paraMatrix[i][j] > paraUpperBound)) {
					System.out.println("" + paraMatrix[i][j] + " out of range (" + paraLowerBound
							+ ", " + paraUpperBound + ")\r\n");
					return false;
				} // Of if
			} // Of for j
		} // Of for i

		return true;
	}// Of rangeCheck

	/**
	 *********************** 
	 * The main entrance.
	 *********************** 
	 */
	public static void main(String[] args) {
		LayerBuilder builder = new LayerBuilder();
		// Input layer, the maps are 28*28.
		builder.addLayer(new CnnLayer(LayerTypeEnum.INPUT, -1, new Size(28, 28)));
		// Convolution output has size 24*24, 24 = 28 - 5 + 1.
		builder.addLayer(new CnnLayer(LayerTypeEnum.CONVOLUTION, 6, new Size(5, 5)));
		// Sampling output has size 12*12, 12 = 24 / 2.
		builder.addLayer(new CnnLayer(LayerTypeEnum.SAMPLING, -1, new Size(2, 2)));
		// Convolution output has size 8*8, 8 = 12 - 5 + 1.
		builder.addLayer(new CnnLayer(LayerTypeEnum.CONVOLUTION, 12, new Size(5, 5)));
		// Sampling output has size 4*4, 4 = 8 / 2.
		builder.addLayer(new CnnLayer(LayerTypeEnum.SAMPLING, -1, new Size(2, 2)));
		// Output layer, digits 0 - 9.
		builder.addLayer(new CnnLayer(LayerTypeEnum.OUTPUT, 10, null));
		// Construct the full CNN.
		FullCnn tempCnn = new FullCnn(builder, 10);

		Dataset tempTrainingSet = new Dataset("d:/c/cann/data/mnist/train.format", ",", 784);

		// Train the model.
		tempCnn.train(tempTrainingSet, 10);
		// tempCnn.predict(tempTrainingSet);
	}// Of main
}// Of class FullCnn
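The error set in setOutputLayerErrors is the usual sigmoid delta out * (1 - out) * (target - out). A standalone sketch (the class and method names here are mine, not from the project):

```java
public class OutputErrorDemo {
	// The sigmoid activation, matching MathUtils.sigmod in spirit.
	public static double sigmoid(double paraValue) {
		return 1.0 / (1.0 + Math.exp(-paraValue));
	}// Of sigmoid

	// The output-layer error: out * (1 - out) * (target - out).
	public static double delta(double paraOut, double paraTarget) {
		return paraOut * (1 - paraOut) * (paraTarget - paraOut);
	}// Of delta

	public static void main(String[] args) {
		double tempOut = sigmoid(0.0); // 0.5
		// For target 1: 0.5 * 0.5 * 0.5 = 0.125.
		System.out.println(delta(tempOut, 1.0));
	}// Of main
}// Of class OutputErrorDemo
```

The factor out * (1 - out) is the derivative of the sigmoid, so this is the same gradient form as in the BP network of days 71-80.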

Day 86: Network construction (2. Code understanding)

The structure of a CNN is actually the same as that of an ANN.

  1. According to the settings in main, the numbers of nodes in the layers are 1, 6, 6, 12, 12 and 10. Each node here stores a matrix.
  2. The 6 and 12 here are hyperparameters that you can set yourself.
  3. In a convolution layer, each edge corresponds to a convolution kernel. Sampling (pooling) layers need no parameters.
  4. The first layer has only 1 node because the images are binary. For color (RGB) images, the first layer should have 3 nodes.
  5. The last layer has 10 nodes because the digits 0-9 give 10 possible classes. They are all 1*1 matrices; classification picks the class whose value is the largest, which is again consistent with the ANN.
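The map sizes produced by the layers configured in main can be traced with a small sketch. The helper names here are hypothetical; only the arithmetic (valid convolution: out = in - kernel + 1; pooling: out = in / scale) comes from the setup code above.

```java
public class MapSizeTrace {
	// Valid convolution: output size = input size - kernel size + 1.
	public static int convolve(int paraMapSize, int paraKernelSize) {
		return paraMapSize - paraKernelSize + 1;
	}// Of convolve

	// Mean pooling with non-overlapping windows: output = input / scale.
	public static int sample(int paraMapSize, int paraScaleSize) {
		return paraMapSize / paraScaleSize;
	}// Of sample

	public static void main(String[] args) {
		int tempSize = 28;                // input maps
		tempSize = convolve(tempSize, 5); // 24
		tempSize = sample(tempSize, 2);   // 12
		tempSize = convolve(tempSize, 5); // 8
		tempSize = sample(tempSize, 2);   // 4
		// The output layer's kernels are 4*4, reducing each map to 1*1.
		System.out.println(tempSize);     // 4
	}// Of main
}// Of class MapSizeTrace
```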

Day 87: Experiments

  1. It can be seen that although the network is small, the results are quite good. The performance on the test set is comparable.
  2. Find another dataset on your own, build a network, run the experiments and report the results.
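When adapting the network to another dataset, the layer sizes must stay consistent. The following hypothetical helper (all names are my assumptions) checks whether a square image passes a chain of (valid convolution, pooling) pairs with integer map sizes:

```java
public class ArchitectureCheck {
	// Check that a square image of the given size passes a chain of
	// (valid convolution, mean pooling) pairs with consistent sizes.
	public static boolean fits(int paraImageSize, int[] paraKernelSizes, int[] paraScaleSizes) {
		int tempSize = paraImageSize;
		for (int i = 0; i < paraKernelSizes.length; i++) {
			tempSize = tempSize - paraKernelSizes[i] + 1; // valid convolution
			if (tempSize <= 0 || tempSize % paraScaleSizes[i] != 0) {
				return false; // the pooling window must divide the map evenly
			} // Of if
			tempSize /= paraScaleSizes[i]; // mean pooling
		} // Of for i
		return tempSize > 0;
	}// Of fits

	public static void main(String[] args) {
		// The chain from main above: 28 -> 24 -> 12 -> 8 -> 4.
		System.out.println(fits(28, new int[] { 5, 5 }, new int[] { 2, 2 })); // true
		// A 27*27 image fails: 27 - 5 + 1 = 23, which 2 does not divide.
		System.out.println(fits(27, new int[] { 5, 5 }, new int[] { 2, 2 })); // false
	}// Of main
}// Of class ArchitectureCheck
```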

Day 88: Summary (Days 01-10: Basic syntax)

Day 89: Summary (Days 11-50: Data structures)

Day 90: Summary (Days 51-87: Machine learning)
