IDEA WordCount: Uploading the Jar to Spark, Debugging, and Troubleshooting

cqi024442

Tags: development tools, scala, java

Based on:

macOS

Spark 2.4.3

(Spark runs in standalone mode; reference blog: http://blog.itpub.net/69908925/viewspace-2644303/)

Scala 2.12.8 (installed locally)

IDEA 2019

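1 IDEA - File - Project Structure - Libraries - Scala SDK

Select version 2.11.12.

The version selected here must match the Scala version that Spark runs on. The default is the locally installed Scala 2.12.8, and a jar built with it fails with a main-class error when run on Spark.

2 Create a new project and add the dependencies to pom.xml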


<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.ny.service</groupId>
    <artifactId>scala517</artifactId>
    <version>1.0</version>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.scala-lang/scala-library -->
        <!-- Change the dependencies below to match your own Scala, Spark, and Hadoop versions -->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.11.12</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.4.3</version>
        </dependency>
    </dependencies>
    <build>
        <!-- Main source directory; adjust to your own path. If you have test sources, also add a testSourceDirectory -->
        <sourceDirectory>src/main/scala</sourceDirectory>
        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <version>2.15.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.3</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <!--<transformers>-->
                            <!--<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">-->
                            <!--<mainClass></mainClass>-->
                            <!--</transformer>-->
                            <!--</transformers>-->
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <addClasspath>true</addClasspath>
                            <useUniqueVersions>false</useUniqueVersions>
                            <classpathPrefix>lib/</classpathPrefix>
                            <!-- Change to your own package.ClassName; right-click the class -> Copy Reference -->
                            <mainClass>com.ny.service.WordCount</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

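For scala-library, choose the Scala version that ships with Spark, 2.11.12, which is also the latest version this Spark installation supports.

For org.apache.spark, also choose the 2.11 build (spark-core_2.11).

Otherwise the main class fails with an error like the one below, most likely because code compiled with Scala 2.12 references runtime classes that the Scala 2.11 library shipped with Spark does not contain: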

19/05/16 10:52:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:60010 (size: 22.9 KB, free: 366.3 MB)

19/05/16 10:52:03 INFO SparkContext: Created broadcast 0 from textFile at WordCount.scala:18

Exception in thread "main" java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: scala/runtime/java8/JFunction2$mcIII$sp

at com.nyc.WordCount$.main(WordCount.scala:24)

at com.nyc.WordCount.main(WordCount.scala)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

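How to check the Scala version used by Spark

Go to the following directory and look at the version in the scala-library jar file name: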

/usr/local/opt/spark-2.4.3/jars
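
The check above can also be scripted. The following is only a minimal sketch, assuming the jars directory contains a file named like scala-library-2.11.12.jar; the object name ScalaVersionCheck and its helper methods are hypothetical and are not part of Spark or of the original project.

```scala
object ScalaVersionCheck {
  // Matches file names such as "scala-library-2.11.12.jar" and captures the version.
  private val ScalaLibraryJar = """scala-library-(\d+\.\d+\.\d+)\.jar""".r

  /** Returns the full Scala version found in a list of jar file names, if any. */
  def fullVersion(jarNames: Seq[String]): Option[String] =
    jarNames.collectFirst { case ScalaLibraryJar(version) => version }

  /** Reduces a full version such as "2.11.12" to its binary version "2.11". */
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  /** True when an artifact id such as "spark-core_2.11" targets the given full Scala version. */
  def matches(artifactId: String, fullScalaVersion: String): Boolean =
    artifactId.endsWith("_" + binaryVersion(fullScalaVersion))

  def main(args: Array[String]): Unit = {
    // Assumption: the Spark jars directory from this article; pass another path as args(0).
    val dir = new java.io.File(args.headOption.getOrElse("/usr/local/opt/spark-2.4.3/jars"))
    // File.list() returns null when the directory does not exist, hence the Option wrapper.
    val names = Option(dir.list()).map(_.toSeq).getOrElse(Seq.empty)
    fullVersion(names) match {
      case Some(v) => println(s"Spark bundles Scala $v (binary version ${binaryVersion(v)})")
      case None    => println(s"No scala-library jar found in $dir")
    }
  }
}
```

The binary version (2.11 here) is what has to agree with both the Scala SDK selected in IDEA and the suffix of the spark-core artifact in pom.xml.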

3 WordCount test script


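package com.ny.service

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // 1 Create the configuration
    val conf = new SparkConf().setAppName("wc")
    // 2 Create the SparkContext sc
    val sc = new SparkContext(conf)
    // 3 Processing logic
    // Read the input file (path passed as the first argument)
    val lines = sc.textFile(args(0))
    // Flatten each line into words
    val words = lines.flatMap(_.split(" "))
    // Map each word to a (word, 1) pair
    val k2v = words.map((_, 1))
    // Sum the counts per word
    val results = k2v.reduceByKey(_ + _)
    // Save the results (path passed as the second argument)
    results.saveAsTextFile(args(1))
    // 4 Stop the SparkContext
    sc.stop()
  }
}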
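
To see what each step of the job computes without a cluster, the same flatMap, map, and reduce-by-key sequence can be tried on an ordinary in-memory collection. This is only an illustrative sketch: LocalWordCount is a hypothetical name, and groupBy plus sum stands in for Spark's reduceByKey, which has no direct equivalent on a plain Seq.

```scala
object LocalWordCount {
  /** Counts words with the same flatMap -> map -> reduce-by-key steps, on an in-memory Seq. */
  def count(lines: Seq[String]): Map[String, Int] = {
    // Split every line on single spaces and flatten the results into one list of words.
    val words = lines.flatMap(_.split(" "))
    // Pair each word with the count 1.
    val pairs = words.map((_, 1))
    // Group the pairs by word and add up the 1s in each group.
    pairs.groupBy(_._1).map { case (word, ones) => word -> ones.map(_._2).sum }
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq("hello scala", "hello java")
    // Sort by word only to make the printed order stable.
    count(sample).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Note that splitting on a single space yields empty tokens when a line contains consecutive spaces; this applies to the Spark job as well.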

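4 Package

After building the jar, move it into the Spark home directory and rename it wc.jar. Because Spark runs in standalone mode, no Hadoop cluster is started.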

nancylulululu:spark-2.4.3 nancy$ mv /Users/nancy/IdeaProjects/scala517/target/original-scala517-1.0.jar wc.jar

5 Run with spark-submit

bin/spark-submit \
--class com.ny.service.WordCount \
--master spark://localhost:7077 \
./wc.jar \
file:///usr/local/opt/spark-2.4.3/test/1test \
file:///usr/local/opt/spark-2.4.3/test/out

If the files are on Hadoop, replace the file:// paths with HDFS paths.
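
The file system that a path refers to is indicated by its URI scheme. The following small sketch uses only java.net.URI to show the difference; PathScheme is a hypothetical helper, and the hdfs:// example uses a placeholder host, port, and path (namenode:9000, /user/nancy/1test) that do not come from the original setup.

```scala
import java.net.URI

object PathScheme {
  /** Returns the URI scheme of a path argument, e.g. "file" or "hdfs"; None if it has no scheme. */
  def schemeOf(path: String): Option[String] = Option(new URI(path).getScheme)

  def main(args: Array[String]): Unit = {
    // The first path is the input used in this article.
    // The second is hypothetical: "namenode:9000" is a placeholder host and port.
    val paths = Seq(
      "file:///usr/local/opt/spark-2.4.3/test/1test",
      "hdfs://namenode:9000/user/nancy/1test"
    )
    paths.foreach(p => println(s"${schemeOf(p).getOrElse("(none)")} -> $p"))
  }
}
```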

Check the output files:

nancylulululu:out nancy$ ls
_SUCCESS	part-00000	part-00001
nancylulululu:out nancy$ cat part-00000
(scala,2)
(hive,1)
(mysql,1)
(hello,5)
(java,2)

Source: ITPUB blog, link: http://blog.itpub.net/69908925/viewspace-2644643/. If you repost this article, please credit the source; otherwise legal action may be taken.

Reposted from: http://blog.itpub.net/69908925/viewspace-2644643/
