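Zeppelin is a web-based notebook editor for Spark, comparable to the IPython notebook.
I. Zeppelin installation
(Prerequisite: Spark is already installed)
1 Download from https://zeppelin.apache.org/download.html (choose the precompiled binary package)
2 Extract the archive and start the daemon: sh bin/zeppelin-daemon.sh start
3 Permission problems: chown -R -v mapr:mapr zeppelin
4 Exception: Jackson version conflict
4.1 Error: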
com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
at [Source: {"id":"5","name":"textFile"}; line: 1, column: 1]
at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
at com.fasterxml.jackson.databind.DeserializationContext.mappingException(DeserializationContext.java:843)
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.addBeanProps(BeanDeserializerFactory.java:533)
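4.2 Cause: a Jackson version conflict. Check the pom file of your Spark build to see which Jackson version it requires, and download that version:
<fasterxml.jackson.version>2.4.4</fasterxml.jackson.version>
Spark depends on 2.4.4, whereas Zeppelin loads 2.5.3 (2.5.0 for jackson-annotations):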
[mapr@apm1 zeppelin-0.6.0-bin-netinst]$ find . | grep jackson
./lib/jackson-annotations-2.5.0.jar
./lib/jackson-core-2.5.3.jar
./lib/jackson-databind-2.5.3.jar
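
To compare the bundled versions with the one in Spark's pom, it can help to print each Jackson jar next to its version. The following is only a sketch: the function name list_jackson_versions is made up for illustration, and it assumes the version is the text after the last '-' in the file name, as in the listing above.

```shell
#!/bin/sh
# Print "<artifact> <version>" for every Jackson jar found under a directory.
# Assumption: file names look like jackson-core-2.5.3.jar (version after the last '-').
# Example call: list_jackson_versions ./lib
list_jackson_versions() {
    find "$1" -name 'jackson-*.jar' | sort | while read -r jar; do
        base=$(basename "$jar" .jar)   # jackson-core-2.5.3
        version=${base##*-}            # text after the last '-'
        name=${base%-*}                # text before the last '-'
        printf '%s %s\n' "$name" "$version"
    done
}
```

Any line whose version differs from the <fasterxml.jackson.version> value in Spark's pom is a candidate for the conflict described in 4.2.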
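4.3 Solution:
Replace the three jars listed above with their 2.4.4 counterparts. The following three files can be found among the Maven dependency jars: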
/lib/jackson-annotations-2.4.4.jar
/lib/jackson-databind-2.4.4.jar
/lib/jackson-core-2.4.4.jar
Then restart Zeppelin.
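
The replacement can be scripted roughly as follows. This is a sketch under assumptions, not part of Zeppelin: the function name swap_jackson_jars and the jackson-backup directory are hypothetical, and the three replacement jars are assumed to have been collected into a single source directory beforehand.

```shell
#!/bin/sh
# Swap the three Jackson jars in a Zeppelin lib directory for another version.
# Usage: swap_jackson_jars <zeppelin_lib_dir> <dir_with_new_jars> <version>
# Old jars are moved to a sibling "jackson-backup" directory (hypothetical name)
# instead of being deleted, so the change can be undone.
swap_jackson_jars() {
    lib_dir=$1
    src_dir=$2
    version=$3
    backup_dir="$(dirname "$lib_dir")/jackson-backup"

    # Check that every replacement exists before touching anything.
    for module in annotations core databind; do
        if [ ! -f "$src_dir/jackson-$module-$version.jar" ]; then
            echo "missing: $src_dir/jackson-$module-$version.jar" >&2
            return 1
        fi
    done

    mkdir -p "$backup_dir"
    for module in annotations core databind; do
        # [0-9]* keeps unrelated jars such as jackson-core-asl-*.jar untouched.
        for old in "$lib_dir"/jackson-"$module"-[0-9]*.jar; do
            if [ -f "$old" ]; then
                mv "$old" "$backup_dir/"
            fi
        done
        cp "$src_dir/jackson-$module-$version.jar" "$lib_dir/"
    done
}
```

With the directory from the listing above, a call might look like swap_jackson_jars ./lib /lib 2.4.4, followed by sh bin/zeppelin-daemon.sh restart.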
5 Open http://localhost:8080/ in a browser, set the default interpreter, and click Save.
II. Zeppelin usage
1 Load the bank.csv dataset
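val bankText = sc.textFile("bank.csv")

case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)

// Split each line on ';', drop the header row (first field is "age"),
// strip the double quotes and map the fields onto the Bank case class.
val bank = bankText.map(s => s.split(";")).filter(s => s(0) != "\"age\"").map(
  s => Bank(
    s(0).toInt,
    s(1).replaceAll("\"", ""),
    s(2).replaceAll("\"", ""),
    s(3).replaceAll("\"", ""),
    s(5).replaceAll("\"", "").toInt
  )
).toDF()

// Register the DataFrame as a temporary table so it can be queried with SQL.
bank.registerTempTable("bank")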
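
Before running the paragraph above, the same column selection can be checked from the shell. This is a rough sketch: the function name preview_bank_csv is made up, and it assumes the file uses ';' as separator with a quoted header row, which is what the Scala code expects. Fields 1, 2, 3, 4 and 6 in awk correspond to s(0), s(1), s(2), s(3) and s(5).

```shell
#!/bin/sh
# Print age, job, marital, education and balance for the first rows of the file.
# Usage: preview_bank_csv <file> [number_of_rows, default 5]
preview_bank_csv() {
    awk -F';' '
        $1 != "\"age\"" {        # skip the header row, as the Scala filter does
            gsub(/"/, "")        # strip double quotes; awk re-splits the fields
            print $1, $2, $3, $4, $6
        }' "$1" | head -n "${2:-5}"
}
```

If the first and last printed columns are not plain integers, the toInt calls in the Scala code would fail on that row.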
2 SQL statistics
3 SQL statistics
4 SQL statistics