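Background: TensorFlow's performance is only passable, but it has the edge in deep learning; Spark is the opposite. Calling a TensorFlow-trained model from Spark keeps running into assorted small problems.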
I took a rough look at the following Spark-based deep learning options; each has its pros and cons:
elephas: see https://github.com/maxpumperla/elephas
dist-keras: see https://github.com/cerndb/dist-keras
sparknet: low activity, https://github.com/amplab/sparknet
dl4j: no built-in support for recommendation algorithms, https://github.com/deeplearning4j/dl4j-spark-ml or https://github.com/eclipse/deeplearning4j-examples
TensorFlowOnSpark: low activity, https://github.com/yahoo/TensorFlowOnSpark
spark-deep-learning: somewhat slow, Python only, https://github.com/databricks/spark-deep-learning
H2O: nothing special, https://github.com/h2oai/sparkling-water/tree/master/
analytics-zoo: no Windows support, but ships with NCF and other algorithms and supports TensorFlow, https://github.com/intel-analytics/analytics-zoo
BigDL: development pace and activity are decent; does not support TensorFlow or Keras; built-in models: LeNet/Inception/VGG/ResNet/RNN/autoencoder, https://github.com/intel-analytics/BigDL