An Example of Integrating Spark and Cassandra
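This post walks through reading a Cassandra table into a Spark RDD with the Hadoop CQL3 input format (CqlPagingInputFormat), filtering it interactively, and saving the result to HDFS. First, start the Spark shell with the Cassandra jars on its classpath and Mesos as the master: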

# Put the Cassandra jars on the shell's classpath (the JVM expands the /* wildcard).
export SPARK_CLASSPATH=/usr/local/cassandra/current/lib/*
# Run against the Mesos master on hadoop1.
export MASTER=mesos://hadoop1:5050
./spark-shell

import java.nio.ByteBuffer
import java.util.{ Map => JMap }
import org.apache.cassandra.hadoop.ConfigHelper
import org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat
import org.apache.cassandra.utils.ByteBufferUtil
import org.apache.hadoop.conf.Configuration

// Build an RDD over a Cassandra table via the Hadoop CQL3 input format.
// Each record is a (key, value) pair of java.util.Maps from column name to raw
// ByteBuffer: the key map holds the partition/clustering columns, the value map
// the remaining columns.
def cql3RDD(host: String, port: Int)(ks: String, table: String) = {
  val conf = new Configuration(sc.hadoopConfiguration)
  ConfigHelper.setInputPartitioner(conf, "Murmur3Partitioner") // must match the cluster's partitioner
  ConfigHelper.setInputInitialAddress(conf, host)              // any reachable node in the ring
  ConfigHelper.setInputRpcPort(conf, port.toString)            // Thrift RPC port (9160 by default)
  ConfigHelper.setInputColumnFamily(conf, ks, table)           // keyspace and table to read
  sc.newAPIHadoopRDD(conf, classOf[CqlPagingInputFormat],
    classOf[JMap[String, ByteBuffer]], classOf[JMap[String, ByteBuffer]])
}
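CqlPagingInputFormat fetches each partition in pages (by default 1,000 CQL rows per page). If that is a poor fit for the table, the page size can be tuned inside cql3RDD next to the other ConfigHelper calls; a minimal sketch, assuming CqlConfigHelper from the same org.apache.cassandra.hadoop.cql3 package is available in your Cassandra version:

import org.apache.cassandra.hadoop.cql3.CqlConfigHelper

// Optional: read 5000 CQL rows per page instead of the default.
CqlConfigHelper.setInputCQLPageRowSize(conf, "5000")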

// Read keyspace webtrans_tm_tdb, table d_1, via the Thrift port.
val rdd = cql3RDD("hadoop1", 9160)("webtrans_tm_tdb", "d_1")
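Columns absent from a row come back as null in these maps, so a small null-safe decoding helper can keep predicates readable. A sketch (col is a hypothetical name, not part of any API here):

// Decode the named column as a UTF-8 string; None when the column is absent.
def col(m: JMap[String, ByteBuffer], name: String): Option[String] =
  Option(m.get(name)).map(b => ByteBufferUtil.string(b))

With it, the first filter below could be written as rdd filter { case (k, _) => col(k, "slang") == Some("en") }.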
// Keep rows whose key column "slang" is "en",
val filtered1 = rdd filter { case (k, v) => ByteBufferUtil.string(k.get("slang")) == "en" }
// whose key column "tlang" is "zh",
val filtered2 = filtered1 filter { case (k, v) => ByteBufferUtil.string(k.get("tlang")) == "zh" }
// and whose value column "scntn" contains "macau" (case-insensitively).
val filtered3 = filtered2 filter { case (k, v) => ByteBufferUtil.string(v.get("scntn")).toLowerCase.contains("macau") }
// Trigger the computation and count the matching rows.
filtered3.count
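Spark pipelines chained filters into a single pass over the data, so splitting the predicate in three costs nothing at runtime; the same logic can also be written as one filter:

val filtered = rdd filter { case (k, v) =>
  ByteBufferUtil.string(k.get("slang")) == "en" &&
  ByteBufferUtil.string(k.get("tlang")) == "zh" &&
  ByteBufferUtil.string(v.get("scntn")).toLowerCase.contains("macau")
}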
// Note: saveAsTextFile writes a directory of part files at this path, not a single file.
filtered3
  .map { case (k, v) => ByteBufferUtil.string(v.get("scntn")) } // illustrative formatting: emit the matched "scntn" text
  .saveAsTextFile("hdfs://hadoop1:54310/containsMacau.txt")