We plan to build our big data platform on Spark, and I'm very happy about that.
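Spark's software stack is rich and comprehensive, covering offline data cleaning, stream processing, and iterative machine learning
(I can't recall the rest at the moment.)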
Databricks is the latest work from the Spark experts at Berkeley's AMPLab,
and it positions itself as "Databricks is a managed platform for running Apache Spark":
- It’s a point and click platform for those that prefer a user interface like data scientists or data analysts.
- However, this UI is accompanied by a sophisticated API for those that want to automate aspects of their data workloads with automated jobs.
- To meet the needs of enterprises, Databricks also includes features such as role-based access control and other intelligent optimizations that not only improve usability for users but also reduce costs and complexity for administrators.
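In other words, it provides data cleaning, machine learning, and user management features, which fit our needs well.
Databricks also provides both a web UI and a REST API.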
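To make the REST API side concrete, here is a minimal sketch of triggering an existing job through the Databricks Jobs API `run-now` endpoint (`POST /api/2.1/jobs/run-now`), authenticated with a personal access token as a Bearer header. It uses only the Python standard library. The workspace URL, token, and job ID are placeholders, and the helper names `build_run_now_request` and `trigger_job` are hypothetical, invented for this illustration. Check the current Databricks REST API reference for your workspace before relying on the exact path or payload.

```python
import json
import urllib.request


def build_run_now_request(host: str, token: str, job_id: int) -> urllib.request.Request:
    """Build (but do not send) a POST request to the Jobs API run-now endpoint.

    Hypothetical helper: host, token, and job_id are placeholders you supply.
    """
    url = f"{host.rstrip('/')}/api/2.1/jobs/run-now"
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # Personal access token, sent as a Bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def trigger_job(host: str, token: str, job_id: int, timeout: float = 30.0) -> dict:
    """Send the run-now request and return the parsed JSON response."""
    req = build_run_now_request(host, token, job_id)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would look like `trigger_job("https://<your-workspace>.cloud.databricks.com", "<token>", 123)`. The response should include a `run_id` that you can use to poll the run's status. Separating request construction from sending keeps the request easy to inspect and test without network access.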