从零搭建IDEA环境:
1)安装Java JDK(jdk-8u101-windows-x64.exe)并配置环境变量,安装scala JDK(scala-2.11.0.msi)并配置环境变量;
2)安装maven(apache-maven-3.6.0-bin)仓库并配置本地仓库以及选用镜像:
<localRepository>E:\maven</localRepository>
<mirrors>
<mirror>
<id>alimaven</id>
<name>aliyun maven</name>
<url>http://maven.aliyun.com/nexus/content/groups/public/</url>
<mirrorOf>central</mirrorOf>
</mirror>
</mirrors>
3)破解IDEA并在IDEA上设置maven和JavaJDK和ScalaJDK
a.在File | Settings | Build, Execution, Deployment | Build Tools | Maven下设置maven home directory和User settings file。
b.将scala-intellij-bin-2018.3.7.zip放到IDEA安装目录的plugins目录底下(E:\RunSoft\IDEA\IntelliJ IDEA 2018.3.5\plugins)
,在File | Settings | Plugins下选择从本地安装该插件。
4)新建工程并配置POM.xml
例如:
<?xml version="1.0" encoding="UTF-8"?> <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> <modelVersion>4.0.0</modelVersion> <groupId>com.guande</groupId> <artifactId>sparkTest</artifactId> <version>1.0-SNAPSHOT</version> <properties> <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding> <scala.binary.version>2.11</scala.binary.version> <PermGen>64m</PermGen> <MaxPermGen>512m</MaxPermGen> <spark.version>2.0.0</spark.version> <scala.version>2.11</scala.version> </properties> <dependencies> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-core_${scala.version}</artifactId> <version>${spark.version}</version> <!--<exclusions>--> <!--<exclusion>--> <!--<groupId>org.slf4j</groupId>--> <!--<artifactId>slf4j-log4j12</artifactId>--> <!--</exclusion>--> <!--</exclusions>--> </dependency> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-streaming_${scala.version}</artifactId> <version>${spark.version}</version> </dependency> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-sql_${scala.version}</artifactId> <version>${spark.version}</version> </dependency> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-hive_${scala.version}</artifactId> <version>${spark.version}</version> </dependency> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-mllib_${scala.version}</artifactId> <version>${spark.version}</version> </dependency> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-streaming-kafka-0-8_${scala.version}</artifactId> <version>${spark.version}</version> </dependency> <dependency> <groupId>com.alibaba</groupId> <artifactId>fastjson</artifactId> <version>1.2.17</version> </dependency> <dependency> <groupId>com.huaban</groupId> <artifactId>jieba-analysis</artifactId> <version>1.0.2</version> </dependency> <dependency> <groupId>mysql</groupId> <artifactId>mysql-connector-java</artifactId> <version>5.1.31</version> </dependency> <dependency> <groupId>org.scalatest</groupId> <artifactId>scalatest_2.11</artifactId> <version>3.2.0-SNAP5</version> <scope>test</scope> </dependency> <!-- https://mvnrepository.com/artifact/junit/junit --> <dependency> <groupId>junit</groupId> <artifactId>junit</artifactId> <version>4.12</version> <scope>test</scope> </dependency> <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client --> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-client</artifactId> <version>2.6.1</version> </dependency> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-common</artifactId> <version>2.6.1</version> </dependency> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-hdfs</artifactId> <version>2.6.1</version> </dependency> <dependency> <groupId>org.apache.hbase</groupId> <artifactId>hbase-server</artifactId> <version>1.3.1</version> </dependency> <dependency> <groupId>org.apache.hbase</groupId> <artifactId>hbase-client</artifactId> <version>1.3.1</version> <exclusions> <exclusion> <groupId>log4j</groupId> <artifactId>log4j</artifactId> </exclusion> <exclusion> <groupId>org.apache.thrift</groupId> <artifactId>thrift</artifactId> </exclusion> <exclusion> <groupId>org.jruby</groupId> <artifactId>jruby-complete</artifactId> </exclusion> <exclusion> <groupId>org.slf4j</groupId> <artifactId>slf4j-log4j12</artifactId> </exclusion> <exclusion> <groupId>org.mortbay.jetty</groupId> <artifactId>jsp-2.1</artifactId> </exclusion> <exclusion> <groupId>org.mortbay.jetty</groupId> <artifactId>jsp-api-2.1</artifactId> </exclusion> <exclusion> <groupId>org.mortbay.jetty</groupId> <artifactId>servlet-api-2.5</artifactId> </exclusion> <exclusion> <groupId>com.sun.jersey</groupId> <artifactId>jersey-core</artifactId> </exclusion> <exclusion> <groupId>com.sun.jersey</groupId> <artifactId>jersey-json</artifactId> </exclusion> <exclusion> <groupId>com.sun.jersey</groupId> <artifactId>jersey-server</artifactId> </exclusion> <exclusion> <groupId>org.mortbay.jetty</groupId> <artifactId>jetty</artifactId> </exclusion> <exclusion> <groupId>org.mortbay.jetty</groupId> <artifactId>jetty-util</artifactId> </exclusion> <exclusion> <groupId>tomcat</groupId> <artifactId>jasper-runtime</artifactId> </exclusion> <exclusion> <groupId>tomcat</groupId> <artifactId>jasper-compiler</artifactId> </exclusion> <exclusion> <groupId>org.jruby</groupId> <artifactId>jruby-complete</artifactId> </exclusion> <exclusion> <groupId>org.jboss.netty</groupId> <artifactId>netty</artifactId> </exclusion> <exclusion> <groupId>io.netty</groupId> <artifactId>netty</artifactId> </exclusion> </exclusions> </dependency> <dependency> <groupId>org.apache.hbase</groupId> <artifactId>hbase-protocol</artifactId> <version>1.3.1</version> </dependency> <dependency> <groupId>org.apache.hbase</groupId> <artifactId>hbase-annotations</artifactId> <version>1.3.1</version> <type>test-jar</type> <scope>test</scope> </dependency> <dependency> <groupId>org.apache.hbase</groupId> <artifactId>hbase-hadoop-compat</artifactId> <version>1.3.1</version> <scope>test</scope> <type>test-jar</type> <exclusions> <exclusion> <groupId>log4j</groupId> <artifactId>log4j</artifactId> </exclusion> <exclusion> <groupId>org.apache.thrift</groupId> <artifactId>thrift</artifactId> </exclusion> <exclusion> <groupId>org.jruby</groupId> <artifactId>jruby-complete</artifactId> </exclusion> <exclusion> <groupId>org.slf4j</groupId> <artifactId>slf4j-log4j12</artifactId> </exclusion> <exclusion> <groupId>org.mortbay.jetty</groupId> <artifactId>jsp-2.1</artifactId> </exclusion> <exclusion> <groupId>org.mortbay.jetty</groupId> <artifactId>jsp-api-2.1</artifactId> </exclusion> <exclusion> <groupId>org.mortbay.jetty</groupId> <artifactId>servlet-api-2.5</artifactId> </exclusion> <exclusion> <groupId>com.sun.jersey</groupId> <artifactId>jersey-core</artifactId> </exclusion> <exclusion> <groupId>com.sun.jersey</groupId> <artifactId>jersey-json</artifactId> </exclusion> <exclusion> <groupId>com.sun.jersey</groupId> <artifactId>jersey-server</artifactId> </exclusion> <exclusion> <groupId>org.mortbay.jetty</groupId> <artifactId>jetty</artifactId> </exclusion> <exclusion> <groupId>org.mortbay.jetty</groupId> <artifactId>jetty-util</artifactId> </exclusion> <exclusion> <groupId>tomcat</groupId> <artifactId>jasper-runtime</artifactId> </exclusion> <exclusion> <groupId>tomcat</groupId> <artifactId>jasper-compiler</artifactId> </exclusion> <exclusion> <groupId>org.jruby</groupId> <artifactId>jruby-complete</artifactId> </exclusion> <exclusion> <groupId>org.jboss.netty</groupId> <artifactId>netty</artifactId> </exclusion> <exclusion> <groupId>io.netty</groupId> <artifactId>netty</artifactId> </exclusion> </exclusions> </dependency> <dependency> <groupId>org.apache.hbase</groupId> <artifactId>hbase-hadoop2-compat</artifactId> <version>1.3.1</version> <scope>test</scope> <type>test-jar</type> <exclusions> <exclusion> <groupId>log4j</groupId> <artifactId>log4j</artifactId> </exclusion> <exclusion> <groupId>org.apache.thrift</groupId> <artifactId>thrift</artifactId> </exclusion> <exclusion> <groupId>org.jruby</groupId> <artifactId>jruby-complete</artifactId> </exclusion> <exclusion> <groupId>org.slf4j</groupId> <artifactId>slf4j-log4j12</artifactId> </exclusion> <exclusion> <groupId>org.mortbay.jetty</groupId> <artifactId>jsp-2.1</artifactId> </exclusion> <exclusion> <groupId>org.mortbay.jetty</groupId> <artifactId>jsp-api-2.1</artifactId> </exclusion> <exclusion> <groupId>org.mortbay.jetty</groupId> <artifactId>servlet-api-2.5</artifactId> </exclusion> <exclusion> <groupId>com.sun.jersey</groupId> <artifactId>jersey-core</artifactId> </exclusion> <exclusion> <groupId>com.sun.jersey</groupId> <artifactId>jersey-json</artifactId> </exclusion> <exclusion> <groupId>com.sun.jersey</groupId> <artifactId>jersey-server</artifactId> </exclusion> <exclusion> <groupId>org.mortbay.jetty</groupId> <artifactId>jetty</artifactId> </exclusion> <exclusion> <groupId>org.mortbay.jetty</groupId> <artifactId>jetty-util</artifactId> </exclusion> <exclusion> <groupId>tomcat</groupId> <artifactId>jasper-runtime</artifactId> </exclusion> <exclusion> <groupId>tomcat</groupId> <artifactId>jasper-compiler</artifactId> </exclusion> <exclusion> <groupId>org.jruby</groupId> <artifactId>jruby-complete</artifactId> </exclusion> <exclusion> <groupId>org.jboss.netty</groupId> <artifactId>netty</artifactId> </exclusion> <exclusion> <groupId>io.netty</groupId> <artifactId>netty</artifactId> </exclusion> </exclusions> </dependency> <dependency> <groupId>org.apache.hbase</groupId> <artifactId>hbase-client</artifactId> <version>1.3.1</version> </dependency> <dependency> <groupId>org.apache.hbase</groupId> <artifactId>hbase-server</artifactId> <version>1.3.1</version> <!--<scope>provided</scope>--> </dependency> </dependencies> <build> <plugins> <plugin> <groupId>org.scala-tools</groupId> <artifactId>maven-scala-plugin</artifactId> <version>2.15.2</version> <executions> <execution> <goals> <goal>compile</goal> <goal>testCompile</goal> </goals> </execution> </executions> </plugin> <plugin> <artifactId>maven-compiler-plugin</artifactId> <version>3.6.0</version> <configuration> <source>1.8</source> <target>1.8</target> </configuration> </plugin> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-assembly-plugin</artifactId> <version>2.3</version> <configuration> <descriptorRefs> <descriptorRef>jar-with-dependencies</descriptorRef> </descriptorRefs> </configuration> </plugin> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-surefire-plugin</artifactId> <version>2.19</version> <configuration> <skip>true</skip> </configuration> </plugin> </plugins> <defaultGoal>compile</defaultGoal> </build> </project>
4)本地安装spark(spark-2.0.0-bin-hadoop2.6.tgz)和Hadoop(hadoop-2.6.1.tar.gz)并各自配置环境变量和path路径,例如:
SPARK_HOME:E:\RunSoft\spark-2.0.0-bin-hadoop2.6\spark-2.0.0-bin-hadoop2.6
HADOOP_HOME:E:\RunSoft\hadoop-2.6.1\hadoop-2.6.1
PATH:%SPARK_HOME%\bin;%HADOOP_HOME%\bin;
需要下载Hadoop对应版本的hadoop2.6-winutils&hadoop.dll放到Hadoop安装目录的bin文件夹和sbin文件夹下。
至此,win7下SPARK和HADOOP已经搭建完毕。
5)IDEA下运行SPARK程序需要SPARKSESSION设置如下:
val spark = SparkSession .builder() .master("local[2]") .appName("example") .config("spark.sql.warehouse.dir", "file:///e:/tmp/spark-warehouse") .getOrCreate()
master设置本地两个线程运行,
config设置本地仓库位置,不然用到HDFS地方会报错。
6)本地测试程序:
import java.sql.{DriverManager, ResultSet} import org.apache.spark.sql.SparkSession import org.apache.spark.{SparkConf, SparkContext} import scala.collection.mutable.ArrayBuffer object SparkTest { def main(args: Array[String]): Unit = { val spark = SparkSession .builder() .master("local[2]") .appName("example") .config("spark.sql.warehouse.dir", "file:///e:/tmp/spark-warehouse") .getOrCreate() import spark.implicits._ val jdbcDF = spark.read.format("jdbc") .option("url","jdbc:mysql://localhost:3306/luluTest") .option("driver","com.mysql.jdbc.Driver") .option("dbtable","users") .option("user","root") .option("password","root") .load() jdbcDF.show() } }