
Spark Solr (1) Read Data from SOLR

I ran into a lot of databind version conflicts between Spark and Solr, so I cloned the project and made some version updates there.

It is originally forked from lucidworks/spark-solr.
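To see which jackson versions the Spark and Solr sides each pull in before patching anything, the Maven dependency tree is a handy check (the includes filter shown here is just one way to narrow the output to jackson):

>mvn dependency:tree -Dincludes=com.fasterxml.jackson.core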

Only a few dependency versions are updated in pom.xml:

 <modelVersion>4.0.0</modelVersion>
 <groupId>com.lucidworks.spark</groupId>
 <artifactId>spark-solr</artifactId>
-<version>3.4.0-SNAPSHOT</version>
+<version>3.4.0.1</version>
 <packaging>jar</packaging>
 <name>spark-solr</name>
 <description>Tools for reading data from Spark into Solr</description>

@@ -39,11 +39,10 @@
 <java.version>1.8</java.version>
 <spark.version>2.2.1</spark.version>
 <solr.version>7.1.0</solr.version>
-<fasterxml.version>2.4.0</fasterxml.version>
+<fasterxml.version>2.6.7</fasterxml.version>
 <scala.version>2.11.8</scala.version>
 <scala.binary.version>2.11</scala.binary.version>
 1.1.1
-2.4.0
 128m

Command to build the package:

>mvn clean compile install -DskipTests

After building, I get a driver versioned 3.4.0.1 installed in my local Maven repository.
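If the install step worked, the rebuilt artifact should show up in the local repository (default ~/.m2 location assumed):

>ls ~/.m2/repository/com/lucidworks/spark/spark-solr/3.4.0.1/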

Set Up SOLR Spark Task

pom.xml to declare the dependencies:

<modelVersion>4.0.0</modelVersion>
<groupId>com.sillycat</groupId>
<artifactId>sillycat-spark-solr</artifactId>
<version>1.0</version>
<description>Fetch the Events from Kafka</description>
<name>Spark Streaming System</name>
<packaging>jar</packaging>

<properties>
  <spark.version>2.2.1</spark.version>
  <spark.solr.version>3.4.0.1</spark.solr.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <dependency>
    <groupId>com.lucidworks.spark</groupId>
    <artifactId>spark-solr</artifactId>
    <version>${spark.solr.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.3</version>
  </dependency>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
    <scope>test</scope>
  </dependency>
</dependencies>

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>3.6.1</version>
      <configuration>
        <source>1.8</source>
        <target>1.8</target>
      </configuration>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-assembly-plugin</artifactId>
      <version>2.4.1</version>
      <configuration>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <archive>
          <manifest>
            <mainClass>com.sillycat.sparkjava.SparkJavaApp</mainClass>
          </manifest>
        </archive>
      </configuration>
      <executions>
        <execution>
          <id>assemble-all</id>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
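With the assembly plugin bound to the package phase, building the fat jar used in the run commands below should just be:

>mvn clean package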

Here is the main implementation class, which connects to ZooKeeper and queries SOLR. The zkHost value below is a placeholder; point it at the ZooKeeper ensemble of your own SolrCloud.

package com.sillycat.sparkjava.app;

import java.util.List;

import org.apache.solr.common.SolrDocument;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;

import com.lucidworks.spark.rdd.SolrJavaRDD;
import com.sillycat.sparkjava.base.SparkBaseApp;

public class SeniorJavaFeedApp extends SparkBaseApp {

    private static final long serialVersionUID = -1219898501920199612L;

    // Placeholder ZooKeeper connect string for the SolrCloud; replace with your own
    private static final String zkHost = "zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/solr";

    protected String getAppName() {
        return "SeniorJavaFeedApp";
    }

    public void executeTask(List<String> params) {
        SparkConf conf = this.getSparkConf();
        SparkContext sc = new SparkContext(conf);
        String collection = "allJobs";
        String solrQuery = "expired: false AND title: Java* AND source_id: 4675";
        String keyword = "Architect";

        logger.info("Prepare the resource from " + solrQuery);
        JavaRDD<SolrDocument> rdd = this.generateRdd(sc, zkHost, collection, solrQuery);

        logger.info("Executing the calculation based on keyword " + keyword);
        List<SolrDocument> results = processRows(rdd, keyword);
        for (SolrDocument result : results) {
            logger.info("Find some jobs for you:" + result);
        }
        sc.stop();
    }

    private JavaRDD<SolrDocument> generateRdd(SparkContext sc, String zkHost, String collection, String solrQuery) {
        // Build a SOLR-backed RDD and push the query down to the shards
        SolrJavaRDD solrRDD = SolrJavaRDD.get(zkHost, collection, sc);
        JavaRDD<SolrDocument> resultsRDD = solrRDD.queryShards(solrQuery);
        return resultsRDD;
    }

    private List<SolrDocument> processRows(JavaRDD<SolrDocument> rows, String keyword) {
        // Keep only the documents whose title contains the keyword
        JavaRDD<SolrDocument> lines = rows.filter(new Function<SolrDocument, Boolean>() {

            private static final long serialVersionUID = 1L;

            public Boolean call(SolrDocument s) throws Exception {
                Object titleObj = s.getFieldValue("title");
                if (titleObj != null) {
                    String title = titleObj.toString();
                    if (title.contains(keyword)) {
                        return true;
                    }
                }
                return false;
            }
        });
        return lines.collect();
    }
}
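The base class SparkBaseApp and the entry point com.sillycat.sparkjava.SparkJavaApp referenced above (and in the assembly plugin) are not shown in this post. Here is a minimal sketch of how they could look, assuming the entry point simply instantiates the app class named in the first argument and hands the remaining arguments to executeTask; the local[4] default master and the Logger setup are my assumptions, not the original code.

package com.sillycat.sparkjava.base;

import java.io.Serializable;
import java.util.List;

import org.apache.log4j.Logger;
import org.apache.spark.SparkConf;

public abstract class SparkBaseApp implements Serializable {

    private static final long serialVersionUID = 1L;

    // Static so Spark never tries to serialize the logger with task closures
    protected static final Logger logger = Logger.getLogger(SparkBaseApp.class);

    protected abstract String getAppName();

    public abstract void executeTask(List<String> params);

    protected SparkConf getSparkConf() {
        // Default to a local master so the fat jar can run with plain "java -jar";
        // spark-submit overrides spark.master on the cluster
        return new SparkConf().setAppName(getAppName()).setIfMissing("spark.master", "local[4]");
    }
}

package com.sillycat.sparkjava;

import java.util.Arrays;
import java.util.List;

import com.sillycat.sparkjava.base.SparkBaseApp;

public class SparkJavaApp {

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("Usage: SparkJavaApp <app class name> [params...]");
            System.exit(1);
        }
        // Instantiate the requested app by its fully qualified class name
        SparkBaseApp app = (SparkBaseApp) Class.forName(args[0]).newInstance();
        List<String> params = Arrays.asList(args).subList(1, args.length);
        app.executeTask(params);
    }
}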

Here is how to run the Spark task locally and on the cluster.

#Run locally#

>java -jar target/sillycat-spark-solr-1.0-jar-with-dependencies.jar com.sillycat.sparkjava.app.CountLinesOfKeywordApp

>java -jar target/sillycat-spark-solr-1.0-jar-with-dependencies.jar com.sillycat.sparkjava.app.SeniorJavaFeedApp

#Run the binary locally#

>bin/spark-submit --class com.sillycat.sparkjava.SparkJavaApp /Users/carl/work/sillycat/sillycat-spark-java/sillycat-spark-solr/target/sillycat-spark-solr-1.0-jar-with-dependencies.jar com.sillycat.sparkjava.app.CountLinesOfKeywordApp

>bin/spark-submit --class com.sillycat.sparkjava.SparkJavaApp /Users/carl/work/sillycat/sillycat-spark-java/sillycat-spark-solr/target/sillycat-spark-solr-1.0-jar-with-dependencies.jar com.sillycat.sparkjava.app.SeniorJavaFeedApp

#Run binary on Remote YARN Cluster#

>bin/spark-submit --class com.sillycat.sparkjava.SparkJavaApp --master yarn-client /home/ec2-user/users/carl/sillycat-spark-java/sillycat-spark-solr/target/sillycat-spark-solr-1.0-jar-with-dependencies.jar com.sillycat.sparkjava.app.CountLinesOfKeywordApp

>bin/spark-submit --class com.sillycat.sparkjava.SparkJavaApp --master yarn-client /home/ec2-user/users/carl/sillycat-spark-java/sillycat-spark-solr/target/sillycat-spark-solr-1.0-jar-with-dependencies.jar com.sillycat.sparkjava.app.SeniorJavaFeedApp
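Note that Spark 2.x deprecates the yarn-client master string in favor of specifying the deploy mode separately; assuming the same jar and entry point, the equivalent submit would be:

>bin/spark-submit --class com.sillycat.sparkjava.SparkJavaApp --master yarn --deploy-mode client /home/ec2-user/users/carl/sillycat-spark-java/sillycat-spark-solr/target/sillycat-spark-solr-1.0-jar-with-dependencies.jar com.sillycat.sparkjava.app.SeniorJavaFeedApp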

