C++/CUDA implementation of convolutional (or more generally, feed-forward) neural networks. It can model arbitrary layer connectivity and network depth. Any directed acyclic graph of layers will do. Training is done using the back-propagation algorithm.
Multi-GPU training support implementing data parallelism, model parallelism, and the hybrid approach described in the paper "One weird trick for parallelizing convolutional neural networks".
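To illustrate the data-parallel half of that hybrid approach, here is a minimal NumPy sketch (not cuda-convnet2's actual code; all names are illustrative): each simulated worker computes the gradient on its own shard of the batch, the shard gradients are averaged as an all-reduce would, and one SGD step is applied to the shared weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# A single linear layer y = x @ w trained with mean-squared error.
w = rng.normal(size=(4, 1))
x = rng.normal(size=(16, 4))                     # full batch of inputs
y = x @ np.array([[1.0], [2.0], [3.0], [4.0]])   # synthetic targets

def gradient(w, x_shard, y_shard):
    """MSE gradient of one shard with respect to w."""
    err = x_shard @ w - y_shard
    return 2.0 * x_shard.T @ err / len(x_shard)

# "Workers": split the batch into 4 equal shards, one per simulated GPU.
shards = zip(np.array_split(x, 4), np.array_split(y, 4))
grads = [gradient(w, xs, ys) for xs, ys in shards]

# All-reduce step: average the shard gradients, then take one SGD step.
g = np.mean(grads, axis=0)
w -= 0.1 * g
```

Because the shards are equal-sized, the averaged gradient is identical to the full-batch gradient; the point of doing it this way on real hardware is that each shard's forward/backward pass runs on its own GPU.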
Caffe: a fast open framework for deep learning.
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.
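The core idea behind Theano's symbolic differentiation can be sketched in a few lines of plain Python (this toy autodiff is illustrative only and vastly simpler than Theano itself): build an expression graph first, then sweep it in reverse topological order, pushing gradients from each node to its parents.

```python
class Var:
    """A node in a tiny expression graph."""
    def __init__(self, value, grad_fn=None, parents=()):
        self.value = value
        self.grad_fn = grad_fn    # maps this node's gradient to parent gradients
        self.parents = parents
        self.grad = 0.0

    def __add__(self, other):
        out = Var(self.value + other.value, parents=(self, other))
        out.grad_fn = lambda g: [(self, g), (other, g)]
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value, parents=(self, other))
        out.grad_fn = lambda g: [(self, g * other.value), (other, g * self.value)]
        return out

def topo(out):
    """Topological order of the graph ending at `out`."""
    order, seen = [], set()
    def visit(n):
        if id(n) in seen:
            return
        seen.add(id(n))
        for p in n.parents:
            visit(p)
        order.append(n)
    visit(out)
    return order

def backward(out):
    """Reverse-mode sweep: accumulate gradients into every node's .grad."""
    out.grad = 1.0
    for node in reversed(topo(out)):
        if node.grad_fn is not None:
            for parent, g in node.grad_fn(node.grad):
                parent.grad += g

# d(x*x + x)/dx at x = 3 is 2*3 + 1 = 7.
x = Var(3.0)
y = x * x + x
backward(y)
```

Theano adds a great deal on top of this skeleton: it optimizes the graph, compiles it to C or CUDA, and returns the gradient as a new symbolic expression rather than a number.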
A Machine Learning library based on Theano
Deep Learning for Java, Scala & Clojure on Hadoop, Spark & GPUs
purine version 2. This framework is described in the paper "Purine: A bi-graph based deep learning framework".
Petuum is a distributed machine learning framework. It aims to provide a generic algorithmic and systems interface to large-scale machine learning, taking care of the difficult systems "plumbing work" and algorithmic acceleration while simplifying the distributed implementation of ML programs - allowing you to focus on model refinement and Big Data analytics. Petuum runs efficiently at scale on research clusters and on cloud compute services such as Amazon EC2 and Google GCE.
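A toy sketch of the kind of "plumbing" such frameworks hide (this is not Petuum's actual API; all names here are hypothetical): workers pull a snapshot of shared parameters from a central table, compute a local update, and push an additive delta back.

```python
class ParamServer:
    """A minimal in-process stand-in for a distributed parameter table."""
    def __init__(self, dim):
        self.table = [0.0] * dim

    def pull(self):
        # Workers read a snapshot of the current parameters.
        return list(self.table)

    def push(self, delta):
        # The server applies additive updates from any worker.
        for i, d in enumerate(delta):
            self.table[i] += d

server = ParamServer(dim=3)

# Two "workers" each push one SGD step; the server accumulates them.
for worker_grad in ([0.5, 0.0, 1.0], [0.5, 2.0, -1.0]):
    params = server.pull()                    # read shared state
    delta = [-0.1 * g for g in worker_grad]   # local gradient step
    server.push(delta)                        # write back the update
```

Real systems of this kind layer sharding, replication, and consistency models (e.g. bounded-staleness reads) on top of this pull/push pattern; the sketch only shows the interface shape.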
A Community of Awesome Distributed Machine Learning C++ Projects
There's a lot of deep learning treasure here.