1. Create the Maven project; I won't go over this step in detail. The project structure is shown below.
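The original structure screenshot is not reproduced here; assuming the standard Maven + Scala layout (the project name spark-hive-demo is only a placeholder), it looks roughly like this:

spark-hive-demo
├── pom.xml
└── src
    └── main
        ├── scala          <- Scala source code (the demo program shown later)
        └── resources      <- hive-site.xml and log4j.properties go here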
2. Add the Spark dependencies to the pom file.
<properties>
    <spark.version>2.2.0</spark.version>
    <scala.version>2.11</scala.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>0.10.0.0</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.38</version>
    </dependency>
    <!--<dependency>-->
        <!--<groupId>commons-dbutils</groupId>-->
        <!--<artifactId>commons-dbutils</artifactId>-->
        <!--<version>1.6</version>-->
    <!--</dependency>-->
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.17</version>
    </dependency>
</dependencies>
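The build section of the original pom is not shown above. If the demo program is written in Scala, the project also needs a Scala compiler plugin so that Maven compiles the sources under src/main/scala; a minimal sketch using the commonly used scala-maven-plugin (the version number here is only an example):

<build>
    <plugins>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.2</version>
            <executions>
                <execution>
                    <goals>
                        <!-- compile Scala sources during the normal Maven compile phases -->
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>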
3. My Hive version is 1.2.2 and Hadoop is 2.7.1. Copy the Hive configuration file hive-site.xml into the resources directory of the Maven project (src/main/resources/). In the same directory, create a log4j.properties file so that the runtime log output is easy to inspect. The log4j configuration below can be copied and used as-is:
### Settings ###
log4j.rootLogger = info,stdout
### Output log messages to the console ###
log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target = System.out
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern = [%-5p] %d{yyyy-MM-dd HH:mm:ss,SSS} method:%l%n%m%n
The contents of the hive-site.xml file are given below.
<configuration>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <!-- Change this address to your own Hadoop NameNode address -->
        <value>hdfs://172.16.0.37:9000/user/hive/warehouse</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <!-- This is the address of the MySQL host that stores the Hive metastore data -->
        <value>jdbc:mysql://172.16.0.37:3306/hive?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>
</configuration>
Below is the demo test program.
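The original demo code is not reproduced here; the following is a minimal sketch of such a program, assuming Spark SQL with Hive support. The object name and the table default.test_table are placeholders, so replace them with a table that actually exists in your own warehouse.

import org.apache.spark.sql.SparkSession

object SparkHiveDemo {
  def main(args: Array[String]): Unit = {
    // Build a SparkSession with Hive support; the hive-site.xml on the classpath
    // tells Spark where the metastore and the warehouse directory live.
    val spark = SparkSession.builder()
      .appName("SparkHiveDemo")
      .master("local[2]")   // run locally for the test; remove when submitting to a cluster
      .enableHiveSupport()
      .getOrCreate()

    // List the databases known to the Hive metastore.
    spark.sql("show databases").show()

    // Query a table -- default.test_table is only a placeholder name.
    spark.sql("select * from default.test_table limit 10").show()

    spark.stop()
  }
}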
Below is the output from running it: