Integrating Spark with Elasticsearch

  1. Using Elasticsearch in a Spark program
  • Add the elasticsearch-hadoop dependency and upload the elasticsearch-hadoop jar to the cluster; a scope of provided is sufficient here (see the spark-submit sketch after the snippet for supplying the jar at runtime).
<dependencies>
	<dependency>
		<groupId>org.elasticsearch</groupId>
		<artifactId>elasticsearch-hadoop</artifactId>
		<version>2.4.0</version>
		<scope>provided</scope>
		<exclusions>
			<exclusion>
				<groupId>org.apache.spark</groupId>
				<artifactId>spark-core_2.10</artifactId>
			</exclusion>
			<exclusion>
				<groupId>org.apache.spark</groupId>
				<artifactId>spark-sql_2.10</artifactId>
			</exclusion>
			<exclusion>
				<groupId>org.apache.storm</groupId>
				<artifactId>storm-core</artifactId>
			</exclusion>
			<exclusion>
				<groupId>cascading</groupId>
				<artifactId>cascading-hadoop</artifactId>
			</exclusion>
		</exclusions>
	</dependency>
</dependencies>
<repositories>
	<repository>
		<id>cloudera-repos</id>
		<name>Cloudera Repos</name>
		<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
	</repository>
	<repository>
		<id>akka-repository</id>
		<name>Akka repository</name>
		<url>http://repo.akka.io/releases</url>
	</repository>
	<repository>
		<id>jboss</id>
		<url>http://repository.jboss.org/nexus/content/groups/public-jboss</url>
	</repository>
	<repository>
		<id>sonatype-oss</id>
		<url>http://oss.sonatype.org/content/repositories/snapshots</url>
		<snapshots><enabled>true</enabled></snapshots>
	</repository>
</repositories>
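Because the scope is provided, the elasticsearch-hadoop jar is not bundled into the application jar and has to be supplied at runtime. A minimal sketch with spark-submit (the class name, jar names, and path here are illustrative, not from the original setup):
spark-submit --master yarn \
	--class com.example.SparkEsDemo \
	--jars /opt/jars/elasticsearch-hadoop-2.4.0.jar \
	spark-es-demo.jar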
  • Use Elasticsearch in your code
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.hive.HiveContext
import org.elasticsearch.spark.sql._

def main(args: Array[String]): Unit = {
	val conf = new SparkConf()
	conf.setAppName("Spark Action ElasticSearch")
	conf.set("es.index.auto.create", "true")
	conf.set("es.nodes", "192.168.1.11")
	conf.set("es.port", "9200")
	val sc: SparkContext = new SparkContext(conf)
	val sqlContext = new HiveContext(sc)
	val df: DataFrame = sqlContext.sql("select * from info limit 50")
	// Save the data to ES ("index/type" is the target resource)
	df.saveToEs("myindex/info")
	// Read the data back from ES
	val esdf = sqlContext.read.format("org.elasticsearch.spark.sql").load("myindex/info")
	esdf.count()
	sc.stop()
}
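When only a subset of documents is needed, elasticsearch-hadoop can push a query down to Elasticsearch at read time through its es.query setting. A minimal sketch, assuming the same index/type as above (the query string itself is illustrative):
// Push a URI query down to ES so only matching documents are shipped to Spark
val filtered = sqlContext.read
	.format("org.elasticsearch.spark.sql")
	.option("es.query", "?q=usertype:5")
	.load("myindex/info")
filtered.show()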
  2. Using Elasticsearch from spark-shell, taking CDH as an example
  • Download the elasticsearch-hadoop jar from the Maven central repository, upload it to the directory /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/, then append the following line to the end of /opt/cloudera/parcels/CDH/lib/spark/conf/classpath.txt (Spark's classpath configuration file):
/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/elasticsearch-hadoop-2.4.0.jar
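For example, the append can be done with a single shell command (assuming the paths above):
echo '/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/elasticsearch-hadoop-2.4.0.jar' >> /opt/cloudera/parcels/CDH/lib/spark/conf/classpath.txt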
  • Start spark-shell with the following command:
spark-shell --master yarn --conf spark.es.nodes=192.168.1.11 --conf spark.es.port=9200 --conf spark.es.index.auto.create=true
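Once the shell starts, the spark.es.* settings are already applied to the SparkContext, so a quick sanity check can be run with the RDD API (a sketch; the index/type is the one created above):
import org.elasticsearch.spark._
// esRDD returns an RDD of (documentId, fieldMap) pairs
val rdd = sc.esRDD("myindex/info")
rdd.take(5).foreach(println)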
  3. Querying Elasticsearch over JDBC
  • Add the Maven dependencies (the elasticsearch-sql version is chosen to match the Elasticsearch version in use):
<dependency>
	<groupId>org.elasticsearch</groupId>
	<artifactId>elasticsearch</artifactId>
	<version>2.4.0</version>
</dependency>  
<dependency>
	<groupId>org.nlpcn</groupId>
	<artifactId>elasticsearch-sql</artifactId>
	<version>2.4.0</version>
</dependency>
<dependency>
	<groupId>com.alibaba</groupId>
	<artifactId>druid</artifactId>
	<version>1.0.15</version>
</dependency>
<dependency>
	<groupId>mysql</groupId>
	<artifactId>mysql-connector-java</artifactId>
	<version>5.1.35</version>
</dependency>
  • Query from code
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Properties;

import com.alibaba.druid.pool.DruidDataSource;

public static void query(){
	try {
		Connection connection = getConnection();
		// elasticsearch-sql uses "index/type" in the FROM clause
		String sql = "select * from bigdata/student where usertype > 5 limit 5";
		PreparedStatement ps = connection.prepareStatement(sql);
		ResultSet rs = ps.executeQuery();
		while(rs.next()){
			System.out.println(rs.getString("_id") +" "+rs.getString("recordtime")
			+"  "+rs.getInt("area")+"  "+rs.getInt("usertype")+"  "+rs.getInt("count"));
		}
		ps.close();
		connection.close();
	} catch (Exception e) {
		e.printStackTrace();
	}
}
/**
 * Obtain an ES JDBC connection.
 */
public static Connection getConnection() throws Exception{
	// 9300 is the ES transport port (the HTTP port is 9200)
	String url = "jdbc:elasticsearch://192.168.1.11:9300";
	Properties properties = new Properties();
	properties.put("url", url);
	// ElasticSearchDruidDataSourceFactory is provided by the elasticsearch-sql artifact
	DruidDataSource dds = (DruidDataSource) ElasticSearchDruidDataSourceFactory.createDataSource(properties);
	Connection connection = dds.getConnection();
	return connection;
}

Official reference: Elasticsearch for Apache Hadoop

Reposted from: https://my.oschina.net/cjun/blog/803613
