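1. Introduction
Working in Jupyter Notebook means gluing many packages together, which gets tedious. matplotlib is a typical example: drawing a single chart takes a lot of code.
With PixieDust a single display call is enough, which effectively turns Jupyter into Excel.
First, import pixiedust (the name sounds like "leather-shoe dust" in Chinese 🤪).
For convenience, run jupyter pixiedust install to install a Spark kernel.
Use pixiedust.enableJobMonitor() to watch the progress of Spark jobs.
Use %pixiedustLog -l debug to view the logs.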
2. Package management
Some commonly used packages are listed at https://spark-packages.org/
pixiedust.installPackage("org.apache.commons:commons-csv:0")
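The string passed to installPackage appears to follow the Maven coordinate layout group:artifact:version. As a small sketch of that layout, the pieces can be pulled apart as shown below. The names Coordinate and parse_coordinate are hypothetical and not part of PixieDust, which does its own parsing.

```python
from typing import NamedTuple


class Coordinate(NamedTuple):
    group_id: str
    artifact_id: str
    version: str


def parse_coordinate(coordinate: str) -> Coordinate:
    """Split a 'group:artifact:version' string into its three parts.

    Hypothetical helper for illustration only.
    """
    parts = coordinate.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected group:artifact:version, got {coordinate!r}")
    return Coordinate(*parts)


coord = parse_coordinate("org.apache.commons:commons-csv:0")
print(coord.group_id)     # org.apache.commons
print(coord.artifact_id)  # commons-csv
print(coord.version)      # 0
```

Checking the coordinate this way before calling installPackage can catch a missing colon early, instead of waiting for the download to fail.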
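3. Big data
Installing pyspark directly also installs Spark along with it. There are two ways to get hold of Spark. Using SparkSession is recommended, although the snippet below creates the older SparkContext and SQLContext by hand: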
from pyspark import SparkContext
from pyspark.sql.context import SQLContext
sc = SparkContext('local', 'wordcount')
sqlContext = SQLContext(sc)
dd = sqlContext.createDataFrame(
    [(2010, 'Camping Equipment', 3),
     (2010, 'Golf Equipment', 1),
     (2010, 'Mountaineering Equipment', 1),
     (2010, 'Outdoor Protection', 2)])
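Since SparkSession is the recommended entry point, here is a sketch of building the same DataFrame through it. It assumes pyspark is installed and a Java runtime is available. The column names are borrowed from the app example in section 7, and build_dataframe is a hypothetical helper. pyspark is imported inside the function so the definitions load even where Spark is not set up.

```python
# Rows copied from the SQLContext example above.
ROWS = [
    (2010, "Camping Equipment", 3),
    (2010, "Golf Equipment", 1),
    (2010, "Mountaineering Equipment", 1),
    (2010, "Outdoor Protection", 2),
]
# Assumed column names; the snippet above leaves the columns unnamed.
COLUMNS = ["year", "zone", "unique_customers"]


def build_dataframe(app_name="wordcount"):
    """Create (or reuse) a local SparkSession and return (spark, DataFrame)."""
    from pyspark.sql import SparkSession  # lazy import: needs pyspark + Java

    spark = (
        SparkSession.builder
        .master("local")
        .appName(app_name)
        .getOrCreate()
    )
    df = spark.createDataFrame(ROWS, COLUMNS)
    return spark, df
```

In a notebook, spark, df = build_dataframe() followed by df.show() prints the four rows. If older code still expects sc, spark.sparkContext returns the underlying SparkContext.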
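The other way is to start a new kernel directly (see the PixieDust installation guide). That kernel comes with sc already defined, and sqlContext and friends are predefined as well. Type s or S and look at the completions to see which Spark variables are available.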
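As a programmatic alternative to typing s or S, the sketch below checks which of a few candidate names exist in a namespace. The candidate list is an assumption: sc and sqlContext are the names mentioned above, while spark is only a guess. defined_names is a hypothetical helper.

```python
def defined_names(namespace, candidates=("sc", "sqlContext", "spark")):
    """Return the candidate names that are present in the given namespace."""
    return [name for name in candidates if name in namespace]


# Stand-in namespace for demonstration; in a notebook pass globals() instead.
fake_kernel_namespace = {"sc": object(), "sqlContext": object(), "x": 1}
print(defined_names(fake_kernel_namespace))  # ['sc', 'sqlContext']
```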
4. Loading data
pixiedust.sampleData(…) accepts a local file, a URL, or one of the built-in sample datasets:
pixiedust.sampleData('file:///Users/bradfordnoble/pixiedust/data/nz.csv')
home_df = pixiedust.sampleData("https://openobjectstore.mybluemix.net/misc/milliondollarhomes.csv")
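The local-file form above is a file:// URL. Here is a small sketch of building one from an ordinary path with the standard library. to_file_uri is a hypothetical helper, and the path in the example is made up.

```python
from pathlib import Path


def to_file_uri(path):
    """Turn a local path into an absolute file:// URI."""
    # expanduser() handles "~"; absolute() anchors relative paths at the cwd.
    return Path(path).expanduser().absolute().as_uri()


uri = to_file_uri("/tmp/data/nz.csv")  # made-up path for illustration
print(uri)
# In a notebook with PixieDust installed, the URI can then be loaded with:
#     df = pixiedust.sampleData(uri)
```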
5. Displaying and analyzing data
Use display to render Spark objects as tables, charts, maps, and so on.
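A chart in display is configured by picking key columns, value columns, and an aggregation. To make that concrete, here is a plain-Python sketch of what a bar chart with zone as the key, unique_customers as the value, and a SUM aggregation would compute. It uses a few rows from the sample data in section 7 and is not PixieDust code. sum_by_key is a hypothetical helper.

```python
from collections import defaultdict

# A few rows taken from the sample data in section 7:
# (year, zone, unique_customers, revenue)
rows = [
    (2010, "Camping Equipment", 3, 200),
    (2010, "Camping Equipment", 10, 200),
    (2010, "Golf Equipment", 1, 240),
    (2011, "Camping Equipment", 4, 489),
    (2011, "Golf Equipment", 5, 234),
]


def sum_by_key(rows, key_index, value_index):
    """Group rows by one column and sum another, like a SUM bar chart."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[value_index]
    return dict(totals)


customers_by_zone = sum_by_key(rows, key_index=1, value_index=2)
print(customers_by_zone)
# {'Camping Equipment': 17, 'Golf Equipment': 6}
```

Each entry in the result corresponds to one bar. Switching the key to the year column or the value to revenue changes the chart without touching the data.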
6. Debugging
Use the %%pixie_debugger cell magic to open a debugging interface with breakpoint support.
7. Building your own app
This really comes down to writing web pages. Here is the simplest possible example:
from pixiedust.display.app import *

@PixieApp
class HelloWorldPixieApp:
    @route()
    def main(self):
        return """
        <input pd_options="clicked=true" type="button" value="Click Me">
        """

    @route(clicked="true")
    def _clicked(self):
        return """
        <input pd_options="clicked=false" type="button" value="You Clicked, Now Go back">
        """

# Run the app
HelloWorldPixieApp().run(runInDialog='false')
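To see why clicking the button swaps the markup, it helps to think of pd_options as updating a small state dictionary and of @route(...) as a condition on that state. The toy dispatcher below models that idea in plain Python. It is a sketch under assumptions, not PixieDust's implementation: trying the routes with the most conditions first and falling back to the empty route is my reading of the behavior, and route, ToyApp, and dispatch here are all hypothetical stand-ins.

```python
def route(**conditions):
    """Attach the state conditions a handler requires (toy version)."""
    def decorator(func):
        func.conditions = conditions
        return func
    return decorator


class ToyApp:
    @route()
    def main(self):
        return "Click Me"

    @route(clicked="true")
    def _clicked(self):
        return "You Clicked, Now Go back"


def dispatch(app, state):
    """Call the most specific handler whose conditions all match the state."""
    handlers = [
        getattr(app, name) for name in dir(app)
        if hasattr(getattr(app, name), "conditions")
    ]
    # Most conditions first, so the empty @route() acts as the fallback.
    handlers.sort(key=lambda h: len(h.conditions), reverse=True)
    for handler in handlers:
        if all(state.get(k) == v for k, v in handler.conditions.items()):
            return handler()
    raise LookupError("no matching route")


app = ToyApp()
print(dispatch(app, {}))                    # Click Me
print(dispatch(app, {"clicked": "true"}))   # You Clicked, Now Go back
print(dispatch(app, {"clicked": "false"}))  # Click Me
```

In this model the first button sets clicked to "true", which selects _clicked, and the second button sets it back to "false", which matches no specific route and so falls through to main.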
Here is an example that works with data:
@PixieApp
class HelloWorldPixieAppWithData:
    @route()
    def main(self):
        return """
        <div class="row">
            <div class="col-sm-2">
                <input pd_options="handlerId=dataframe"
                       pd_entity
                       pd_target="target{{prefix}}"
                       type="button" value="Preview Data">
            </div>
            <div class="col-sm-10" id="target{{prefix}}"></div>
        </div>
        """
# Create the DataFrame
df = SQLContext(sc).createDataFrame(
    [(2010, 'Camping Equipment', 3, 200), (2010, 'Camping Equipment', 10, 200), (2010, 'Golf Equipment', 1, 240),
     (2010, 'Mountaineering Equipment', 1, 348), (2010, 'Outdoor Protection', 2, 200), (2010, 'Personal Accessories', 2, 200),
     (2011, 'Camping Equipment', 4, 489), (2011, 'Golf Equipment', 5, 234), (2011, 'Mountaineering Equipment', 2, 123),
     (2011, 'Outdoor Protection', 4, 654), (2011, 'Personal Accessories', 2, 234), (2012, 'Camping Equipment', 5, 876),
     (2012, 'Golf Equipment', 5, 200), (2012, 'Mountaineering Equipment', 3, 156), (2012, 'Outdoor Protection', 5, 200),
     (2012, 'Personal Accessories', 3, 345), (2013, 'Camping Equipment', 8, 987), (2013, 'Golf Equipment', 5, 434),
     (2013, 'Mountaineering Equipment', 3, 278), (2013, 'Outdoor Protection', 8, 134), (2013, 'Personal Accessories', 4, 200)],
    ["year", "zone", "unique_customers", "revenue"])

# Run the app
HelloWorldPixieAppWithData().run(df, runInDialog='false')