win7的IDEA上运行spark程序以及windows下搭建搭建单机版本的spark

从零搭建IDEA环境:

1)安装Java JDK(jdk-8u101-windows-x64.exe)并配置环境变量,安装scala JDK(scala-2.11.0.msi)并配置环境变量;

 

2)安装maven(apache-maven-3.6.0-bin)仓库并配置本地仓库以及选用镜像:

<localRepository>E:\maven</localRepository>

 

<mirrors>
<mirror>

<id>alimaven</id>

<name>aliyun maven</name>

<url>http://maven.aliyun.com/nexus/content/groups/public/</url>

<mirrorOf>central</mirrorOf>

</mirror>

</mirrors>

 

3)破解IDEA并在IDEA上设置maven和JavaJDK和ScalaJDK

a.在File | Settings | Build, Execution, Deployment | Build Tools | Maven下设置maven home directory和User settings file。

b.将scala-intellij-bin-2018.3.7.zip放到IDEA安装目录的plugins目录底下(E:\RunSoft\IDEA\IntelliJ IDEA 2018.3.5\plugins)

,在File | Settings | Plugins下选择从本地安装该插件。

 

4)新建工程并配置POM.xml

例如:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.guande</groupId>
    <artifactId>sparkTest</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <scala.binary.version>2.11</scala.binary.version>
        <PermGen>64m</PermGen>
        <MaxPermGen>512m</MaxPermGen>
        <spark.version>2.0.0</spark.version>
        <scala.version>2.11</scala.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
            <!--<exclusions>-->
            <!--<exclusion>-->
            <!--<groupId>org.slf4j</groupId>-->
            <!--<artifactId>slf4j-log4j12</artifactId>-->
            <!--</exclusion>-->
            <!--</exclusions>-->
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-8_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.17</version>
        </dependency>
        <dependency>
            <groupId>com.huaban</groupId>
            <artifactId>jieba-analysis</artifactId>
            <version>1.0.2</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.31</version>
        </dependency>
        <dependency>
            <groupId>org.scalatest</groupId>
            <artifactId>scalatest_2.11</artifactId>
            <version>3.2.0-SNAP5</version>
            <scope>test</scope>
        </dependency>
        <!-- https://mvnrepository.com/artifact/junit/junit -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.6.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.6.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.6.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-server</artifactId>
            <version>1.3.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>1.3.1</version>
            <exclusions>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>log4j</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.thrift</groupId>
                    <artifactId>thrift</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.jruby</groupId>
                    <artifactId>jruby-complete</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.mortbay.jetty</groupId>
                    <artifactId>jsp-2.1</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.mortbay.jetty</groupId>
                    <artifactId>jsp-api-2.1</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.mortbay.jetty</groupId>
                    <artifactId>servlet-api-2.5</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.sun.jersey</groupId>
                    <artifactId>jersey-core</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.sun.jersey</groupId>
                    <artifactId>jersey-json</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.sun.jersey</groupId>
                    <artifactId>jersey-server</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.mortbay.jetty</groupId>
                    <artifactId>jetty</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.mortbay.jetty</groupId>
                    <artifactId>jetty-util</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>tomcat</groupId>
                    <artifactId>jasper-runtime</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>tomcat</groupId>
                    <artifactId>jasper-compiler</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.jruby</groupId>
                    <artifactId>jruby-complete</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.jboss.netty</groupId>
                    <artifactId>netty</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>io.netty</groupId>
                    <artifactId>netty</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-protocol</artifactId>
            <version>1.3.1</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-annotations</artifactId>
            <version>1.3.1</version>
            <type>test-jar</type>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-hadoop-compat</artifactId>
            <version>1.3.1</version>
            <scope>test</scope>
            <type>test-jar</type>
            <exclusions>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>log4j</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.thrift</groupId>
                    <artifactId>thrift</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.jruby</groupId>
                    <artifactId>jruby-complete</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.mortbay.jetty</groupId>
                    <artifactId>jsp-2.1</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.mortbay.jetty</groupId>
                    <artifactId>jsp-api-2.1</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.mortbay.jetty</groupId>
                    <artifactId>servlet-api-2.5</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.sun.jersey</groupId>
                    <artifactId>jersey-core</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.sun.jersey</groupId>
                    <artifactId>jersey-json</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.sun.jersey</groupId>
                    <artifactId>jersey-server</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.mortbay.jetty</groupId>
                    <artifactId>jetty</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.mortbay.jetty</groupId>
                    <artifactId>jetty-util</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>tomcat</groupId>
                    <artifactId>jasper-runtime</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>tomcat</groupId>
                    <artifactId>jasper-compiler</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.jruby</groupId>
                    <artifactId>jruby-complete</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.jboss.netty</groupId>
                    <artifactId>netty</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>io.netty</groupId>
                    <artifactId>netty</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-hadoop2-compat</artifactId>
            <version>1.3.1</version>
            <scope>test</scope>
            <type>test-jar</type>
            <exclusions>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>log4j</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.thrift</groupId>
                    <artifactId>thrift</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.jruby</groupId>
                    <artifactId>jruby-complete</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.mortbay.jetty</groupId>
                    <artifactId>jsp-2.1</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.mortbay.jetty</groupId>
                    <artifactId>jsp-api-2.1</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.mortbay.jetty</groupId>
                    <artifactId>servlet-api-2.5</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.sun.jersey</groupId>
                    <artifactId>jersey-core</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.sun.jersey</groupId>
                    <artifactId>jersey-json</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.sun.jersey</groupId>
                    <artifactId>jersey-server</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.mortbay.jetty</groupId>
                    <artifactId>jetty</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.mortbay.jetty</groupId>
                    <artifactId>jetty-util</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>tomcat</groupId>
                    <artifactId>jasper-runtime</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>tomcat</groupId>
                    <artifactId>jasper-compiler</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.jruby</groupId>
                    <artifactId>jruby-complete</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.jboss.netty</groupId>
                    <artifactId>netty</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>io.netty</groupId>
                    <artifactId>netty</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>1.3.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-server</artifactId>
            <version>1.3.1</version>
            <!--<scope>provided</scope>-->
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <version>2.15.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.6.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.3</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.19</version>
                <configuration>
                    <skip>true</skip>
                </configuration>
            </plugin>

        </plugins>
        <defaultGoal>compile</defaultGoal>
    </build>
</project>

4)本地安装spark(spark-2.0.0-bin-hadoop2.6.tgz)和Hadoop(hadoop-2.6.1.tar.gz)并各自配置环境变量和path路径,例如:

SPARK_HOME:E:\RunSoft\spark-2.0.0-bin-hadoop2.6\spark-2.0.0-bin-hadoop2.6

HADOOP_HOME:E:\RunSoft\hadoop-2.6.1\hadoop-2.6.1

PATH:%SPARK_HOME%\bin;%HADOOP_HOME%\bin;

 

需要下载Hadoop对应版本的hadoop2.6-winutils&hadoop.dll放到Hadoop安装目录的bin文件夹和sbin文件夹下。

 

至此,win7下SPARK和HADOOP已经搭建完毕。

 

5)IDEA下运行SPARK程序需要SPARKSESSION设置如下:

val spark = SparkSession
  .builder()
  .master("local[2]")
  .appName("example")
  .config("spark.sql.warehouse.dir", "file:///e:/tmp/spark-warehouse")
  .getOrCreate()

master设置本地两个线程运行,

config设置本地仓库位置,不然用到HDFS地方会报错。

 

6)本地测试程序:

import java.sql.{DriverManager, ResultSet}

import org.apache.spark.sql.SparkSession
import org.apache.spark.{SparkConf, SparkContext}

import scala.collection.mutable.ArrayBuffer

object SparkTest {
  def main(args: Array[String]): Unit = {
       val spark = SparkSession
      .builder()
      .master("local[2]")
      .appName("example")
      .config("spark.sql.warehouse.dir", "file:///e:/tmp/spark-warehouse")
      .getOrCreate()
    import spark.implicits._

   val jdbcDF = spark.read.format("jdbc")
      .option("url","jdbc:mysql://localhost:3306/luluTest")
      .option("driver","com.mysql.jdbc.Driver")
      .option("dbtable","users")
      .option("user","root")
      .option("password","root")
      .load()
    jdbcDF.show()
  }
}

 

 

 

没有更多推荐了,返回首页