The DeepETA team tested and tuned 7 different neural network architectures: MLP [4], NODE [5], TabNet [6], Sparsely Gated Mixture-of-Experts [7], HyperNetworks [8], Transformer [9], and Linear Transformer [10].
We found that an encoder-decoder architecture with self-attention provided the best accuracy.
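For illustration, the sketch below shows the general shape of such a model: each tabular feature is embedded as a token, a Transformer encoder applies self-attention across the feature tokens, and a small decoder head regresses the prediction. This is a minimal sketch assuming PyTorch; the class name `SelfAttentionEncoder`, the one-token-per-feature embedding, the pooling step, and all dimensions are illustrative assumptions, not the production DeepETA architecture.

```python
import torch
import torch.nn as nn


class SelfAttentionEncoder(nn.Module):
    """Minimal Transformer-style encoder over tabular feature tokens.

    Illustrative sketch only; not the DeepETA production model.
    """

    def __init__(self, d_model: int = 32, num_heads: int = 4,
                 num_layers: int = 2):
        super().__init__()
        # Project each scalar feature to a d_model-dimensional token,
        # so self-attention can mix information across features.
        self.feature_proj = nn.Linear(1, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Decoder head: pool the encoded tokens and regress a single value.
        self.head = nn.Linear(d_model, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_features) -> (batch, num_features, 1)
        tokens = self.feature_proj(x.unsqueeze(-1))
        encoded = self.encoder(tokens)        # self-attention across features
        pooled = encoded.mean(dim=1)          # simple average pooling
        return self.head(pooled).squeeze(-1)  # one prediction per example


# Usage: score a batch of 8 examples with 16 features each.
model = SelfAttentionEncoder()
pred = model(torch.randn(8, 16))
print(pred.shape)  # torch.Size([8])
```

Because self-attention operates over a set of feature tokens, the same encoder handles any number of input features without architectural changes, which is one practical appeal of this design for tabular data.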