Confused about why a neural network can't solve the mod-3 problem
Recently I tried to use a fully connected neural network to solve the following problem: given a 16-bit binary number, predict its remainder modulo 3.
The input has 16 dimensions, one per binary digit of the number; the output has 3 dimensions, a one-hot encoding of remainder 0, 1, or 2.
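To make the encoding concrete, here is a minimal sketch of how a single sample is represented (the number 22 is just an arbitrary example):

```python
n = 22                                        # example value
bits = [int(b) for b in format(n, "016b")]    # 16 binary digits, MSB first
label = n % 3                                 # class index: remainder mod 3
one_hot = [1 if i == label else 0 for i in range(3)]
print(bits)     # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0]
print(one_hot)  # [0, 1, 0]
```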
Along the way I tried training networks of 3, 10, and 20 layers, each with 256 units per layer.
The code is as follows:
import numpy as np
from keras.layers import Dense
from keras.models import Sequential

# Dataset preparation
nums = np.random.randint(0, 65536, 32768)
nums = list(set(nums))                                     # deduplicate samples
_outs = [x % 3 for x in nums]                              # labels: remainder mod 3
nums = [[int(x) for x in bin(elem)[2:]] for elem in nums]  # binary digits, MSB first
nums = [(16 - len(elem)) * [0] + elem for elem in nums]    # left-pad to 16 bits
nums = np.array(nums)
outs = np.zeros((len(nums), 3))                            # one-hot labels
for i in range(len(_outs)):
    outs[i, _outs[i]] = 1

# Model construction
model = Sequential([
    Dense(256, input_dim=16, activation="tanh"),
    Dense(256, activation="tanh"),
    Dense(256, activation="tanh"),
    Dense(256, activation="tanh"),  # I also tried stacking more Dense layers here
    Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.fit(nums, outs, epochs=10, batch_size=1000)

# Prediction
model.predict(np.array([[0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0]]))
Output:
Epoch 1/10
25760/25760 [==============================] - 5s 197us/step - loss: 1.2339
Epoch 2/10
25760/25760 [==============================] - 3s 128us/step - loss: 1.1033
Epoch 3/10
25760/25760 [==============================] - 3s 129us/step - loss: 1.1089
Epoch 4/10
25760/25760 [==============================] - 3s 127us/step - loss: 1.1034
Epoch 5/10
25760/25760 [==============================] - 3s 129us/step - loss: 1.1055
Epoch 6/10
25760/25760 [==============================] - 5s 186us/step - loss: 1.1030
Epoch 7/10
25760/25760 [==============================] - 5s 192us/step - loss: 1.1010
Epoch 8/10
25760/25760 [==============================] - 5s 188us/step - loss: 1.1053
Epoch 9/10
25760/25760 [==============================] - 5s 176us/step - loss: 1.1043
Epoch 10/10
25760/25760 [==============================] - 4s 138us/step - loss: 1.1053
array([[0.33267307, 0.30762094, 0.35970604]], dtype=float32)
As you can see, the loss on the training set never goes down, and accordingly the prediction is essentially "all three classes equally likely".
I haven't found an explanation for this yet. Weren't neural networks supposed to be able to fit any function? Is a network with this many layers still too shallow, or is the parameter count too small?
Or does this kind of problem have some special property?
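One observation of my own that may be relevant to that last question: since 2^k ≡ (-1)^k (mod 3), the remainder of a binary number mod 3 equals the alternating sum of its bits mod 3. A quick sketch verifying this identity over all 16-bit numbers:

```python
def mod3_via_bits(n):
    # 2**k ≡ (-1)**k (mod 3), so n mod 3 is the alternating bit sum mod 3
    bits = [int(b) for b in format(n, "016b")]  # MSB first
    # the digit at index i corresponds to exponent (15 - i)
    alt = sum(b * (-1) ** (15 - i) for i, b in enumerate(bits))
    return alt % 3  # Python's % always returns a non-negative result here

# exhaustive check over all 16-bit values
assert all(mod3_via_bits(n) == n % 3 for n in range(65536))
print("identity holds for all 16-bit numbers")
```

In this sense the target depends on every input bit at once, somewhat like a generalized parity function, which may be why it behaves so differently from smoother targets.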