EMR sample
create cluster
add the AmazonS3FullAccess policy
edit the cluster's security group in the EC2 console (e.g. allow inbound SSH, port 22)
ssh to cluster
command: spark-shell (opens the Scala command line)
copy and run:
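// read the sample data from S3, one record per line
val file = sc.textFile("s3://support.elasticmapreduce/bigdatademo/sample/wiki")
// split on spaces, key by field 1, sum the integer in field 2 (3 partitions)
val reducedList = file.map(l => l.split(" ")).map(l => (l(1), l(2).toInt)).reduceByKey(_+_, 3)
// keep the reduced RDD in memory; nothing runs until an action is called
reducedList.cache()
// swap to (count, key), sort descending, take the top 50 (this triggers the job)
val sortedList = reducedList.map(x => (x._2, x._1)).sortByKey(ascending = false).take(50)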
result =>
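
To check the logic without a cluster, here is a rough sketch of the same steps on plain Scala collections (no Spark). This is not the cluster output. The object name WikiCountSketch, the helper topPages, and the sample lines are made up for illustration, and the "project title views bytes" field layout is an assumption about the data.

```scala
object WikiCountSketch {
  // Hypothetical helper: same split / key / sum / sort steps as the Spark job,
  // but on an in-memory Seq instead of an RDD.
  def topPages(lines: Seq[String], n: Int): Seq[(Int, String)] = {
    val reduced: Map[String, Int] = lines
      .map(_.split(" "))
      .map(f => (f(1), f(2).toInt))   // key by field 1, value is field 2
      .groupBy(_._1)                  // local stand-in for reduceByKey
      .map { case (key, pairs) => (key, pairs.map(_._2).sum) }

    reduced.toSeq
      .map { case (key, count) => (count, key) } // swap to (count, key)
      .sortBy(-_._1)                             // descending by count
      .take(n)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample lines in the assumed layout.
    val sample = Seq(
      "en Main_Page 5 1000",
      "en Scala 3 500",
      "en Main_Page 2 400"
    )
    topPages(sample, 50).foreach(println)
  }
}
```

The object can also be pasted into spark-shell and called with WikiCountSketch.topPages(someLines, 50) on any local Seq[String].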