First, download and install:
spark
IDEA
scala
Then configure the environment variables in /etc/profile (vim /etc/profile):
export JAVA_HOME=/usr/java/jdk1.7.0_60
export HADOOP_HOME=/itcast/hadoop-2.2.0
export apache=/java/apache-tomcat-7.0.27
export SCALA_HOME=/itcast/scala-2.10.5
export SPARK_HOME=/itcast/spark-1.3.0-bin-hadoop2.4
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$apache/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
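To make the new variables take effect in the current shell, load the profile with source. The following self-contained sketch repeats two of the exports above (normally they live in /etc/profile) and checks that the Spark bin directory ended up on PATH:

```shell
# Normally these exports live in /etc/profile and are loaded with:
#   source /etc/profile
export SPARK_HOME=/itcast/spark-1.3.0-bin-hadoop2.4
export SCALA_HOME=/itcast/scala-2.10.5
export PATH="$PATH:$SCALA_HOME/bin:$SPARK_HOME/bin"

# Sanity check: the Spark bin directory should now be on PATH
echo "$PATH" | grep -q "$SPARK_HOME/bin" && echo "PATH OK"
```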
In Spark's conf directory, copy the spark-env.sh.template file to spark-env.sh and add:
export SCALA_HOME=/itcast/scala-2.10.5
export JAVA_HOME=/usr/java/jdk1.7.0_60
export SPARK_MASTER_IP=192.168.1.118
export SPARK_WORKER_MEMORY=3000m
export MASTER=spark://192.168.1.118:7077
In the slaves file, configure the worker nodes. Since this is a single-machine setup, the master and worker run on the same host:
192.168.1.118
Start Spark, then check the master web UI in a browser:
http://192.168.1.118:8080
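The standalone master and workers are started with the scripts shipped in Spark's sbin directory (a sketch; the path assumes the SPARK_HOME from the /etc/profile example above, and the commands require an actual Spark installation):

```shell
# Start the master plus every worker listed in conf/slaves
$SPARK_HOME/sbin/start-all.sh

# Verify the daemons came up; jps should list Master and Worker
jps
```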
(2) Packaging a jar file in IDEA.
IDEA -> File -> Project Structure -> Artifacts -> create a new artifact to build the jar.
(3) NullPointerException when submitting a job to the cluster. The fix is to set the application name and master URL explicitly:
conf.setAppName("WorldCount")
conf.setMaster("spark://192.168.1.118:7077")
(4) Fixing the error: Initial job has not accepted any resources; check your cluster UI
15/03/26 22:29:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
15/03/26 22:29:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
15/03/26 22:30:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
15/03/26 22:30:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
The warning says that the job acquired no resources during initialization, and suggests checking the cluster UI to make sure the workers are registered and have sufficient memory.
This problem can have several causes, for example:
1. The submitting node cannot talk to the Spark worker nodes. After a job is submitted, a process starts on the submitting node (usually on port 4040) that shows the job's progress, and the workers report progress back to it; if the hostname or IP is configured incorrectly in /etc/hosts, this communication fails. So check that the hostname and IP are configured correctly.
2. There may be insufficient memory.
Check the memory setting:
conf.set("spark.executor.memory", "3000m")
Make sure to set SPARK_LOCAL_IP and SPARK_MASTER_IP.
Check the web UI on port 8080 to make sure some workers are in the ALIVE state and some cores are available.
In this case, the fix was:
export SPARK_WORKER_MEMORY=3000m  (increase the worker memory)
In the slaves file, change localhost to the IP address.
After that, the job submitted successfully.
(5) Problem during local testing with IDEA: org.apache.spark.SparkException: A master URL must be set in your configuration
Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:185)
at SparkDemo.SimpleApp$.main(SimpleApp.scala:13)
at SparkDemo.SimpleApp.main(SimpleApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Running it again, it still failed:
Exception in thread "main" java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
at akka.actor.ActorCell$.<init>(ActorCell.scala:336)
at akka.actor.ActorCell$.<clinit>(ActorCell.scala)
at akka.actor.RootActorPath.$div(ActorPath.scala:159)
at akka.actor.LocalActorRefProvider.<init>(ActorRefProvider.scala:464)
at akka.actor.LocalActorRefProvider.<init>(ActorRefProvider.scala:452)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$2.apply(DynamicAccess.scala:78)
at scala.util.Try$.apply(Try.scala:191)
at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:73)
at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
at scala.util.Success.flatMap(Try.scala:230)
at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:84)
at akka.actor.ActorSystemImpl.liftedTree1$1(ActorSystem.scala:584)
at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:577)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:108)
at akka.Akka$.delayedEndpoint$akka$Akka$1(Akka.scala:11)
at akka.Akka$delayedInit$body.apply(Akka.scala:9)
at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:383)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at akka.Akka$.main(Akka.scala:9)
at akka.Akka.main(Akka.scala)
As mentioned in an earlier post, I had installed Scala 2.11.5 and Spark 1.2.0; evidently there are compatibility problems between these Spark and Scala versions. Changing Scala to 2.10.4 solved the problem and the program ran correctly.
(6) Executors repeatedly exit (AppClient$ClientActor: Executor updated: ... is now EXITED) and SparkDeploySchedulerBackend reports "Asked to remove non-existent executor":
15/12/21 22:06:01 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 18
15/12/21 22:06:01 INFO AppClient$ClientActor: Executor added: app-20151221220543-0003/19 on worker-20151221042816-itcastdd-56547 (itcastdd:56547) with 2 cores
15/12/21 22:06:01 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151221220543-0003/19 on hostPort itcastdd:56547 with 2 cores, 2.9 GB RAM
15/12/21 22:06:01 INFO AppClient$ClientActor: Executor updated: app-20151221220543-0003/19 is now LOADING
15/12/21 22:06:01 INFO AppClient$ClientActor: Executor updated: app-20151221220543-0003/19 is now RUNNING
15/12/21 22:06:02 INFO AppClient$ClientActor: Executor updated: app-20151221220543-0003/19 is now EXITED (Command exited with code 1)
15/12/21 22:06:02 INFO SparkDeploySchedulerBackend: Executor app-20151221220543-0003/19 removed: Command exited with code 1
15/12/21 22:06:02 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 19
15/12/21 22:06:02 INFO AppClient$ClientActor: Executor added: app-20151221220543-0003/20 on worker-20151221042816-itcastdd-56547 (itcastdd:56547) with 2 cores
15/12/21 22:06:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151221220543-0003/20 on hostPort itcastdd:56547 with 2 cores, 2.9 GB RAM
15/12/21 22:06:02 INFO AppClient$ClientActor: Executor updated: app-20151221220543-0003/20 is now RUNNING
15/12/21 22:06:02 INFO AppClient$ClientActor: Executor updated: app-20151221220543-0003/20 is now LOADING
15/12/21 22:06:02 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
15/12/21 22:06:02 INFO AppClient$ClientActor: Executor updated: app-20151221220543-0003/20 is now EXITED (Command exited with code 1)
15/12/21 22:06:02 INFO SparkDeploySchedulerBackend: Executor app-20151221220543-0003/20 removed: Command exited with code 1
15/12/21 22:06:02 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 20
Got status update for unknown executor app-20150509185326-0001/11158
Fixing the recurring "SparkDeploySchedulerBackend: Asked to remove non-existent executor" error on a Spark cluster.
Straight to the point: with Spark 1.3.1, this problem is caused by jobs on the cluster terminating abnormally.
I ran into it because a failure in Hive's metastore caused many jobs to fail; after restarting Spark and deploying the application to the cluster, the application side kept repeating errors like the ones above.
Solution:
1. Stop the Spark cluster.
2. Delete Spark's temporary files on every node of the cluster. By default they are under /tmp/spark-*; the exact path depends on the SPARK_LOCAL_DIRS setting in spark-env.sh.
3. Start the Spark cluster again.
Note: Spark must be stopped before the temporary files are deleted; otherwise the files are still in use, the deletion errors out, and it does not succeed.
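The three steps above can be sketched as shell commands (assuming the default SPARK_LOCAL_DIRS of /tmp/spark-*; the rm must be run on every node of the cluster):

```shell
# 1. Stop the whole standalone cluster first so no files are in use
$SPARK_HOME/sbin/stop-all.sh

# 2. Remove Spark's temporary files on each node
rm -rf /tmp/spark-*

# 3. Bring the cluster back up
$SPARK_HOME/sbin/start-all.sh
```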
Another possible cause of this error is a problem with how the cluster was started, for example two Worker processes running on one server (one of them may not have been killed). In that case, jps shows many entries like:
17048 -- process information unavailable
12914 -- process information unavailable
14540 -- process information unavailable
13579 -- process information unavailable
16809 -- process information unavailable
When this happens, just kill the problematic Worker processes.
Troubleshooting tip
Turn up Spark's log level: the default log4j level is WARN; change it to INFO (no Spark restart needed), then run a test with spark-sql --master spark://your-master-ip:7077 and analyze the log output to help track down the error.
*****************************************************************************************************************************************************************
Solutions to the "A master URL must be set in your configuration" error shown in (5):
Method (1): In the IDE, open Run -> Edit Configurations and add "-Dspark.master=local" to the VM options on the right (this makes the program run locally, single-threaded).
Method (2): In code, call conf.setMaster("spark://192.168.1.118:7077")
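Put together, the two methods look roughly like this against the Spark 1.x API (a sketch, not the exact code from this post; the cluster URL is the example address used throughout this guide):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SimpleApp")
    // Method (2): set the master in code. "local[*]" runs in-process,
    // which matches passing -Dspark.master=local as in method (1).
    conf.setMaster("local[*]")
    // For a cluster run instead: conf.setMaster("spark://192.168.1.118:7077")
    val sc = new SparkContext(conf)
    val nums = sc.parallelize(1 to 4)
    println(nums.reduce(_ + _)) // prints 10
    sc.stop()
  }
}
```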
(7) Running again, still failing: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
Exception in thread "main" java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
	(the stack trace is identical to the one shown in (5) above)
This error is caused by a version incompatibility (encountered during local testing). The following combinations failed:
scala 2.11.7 + spark 1.3
scala 2.11.0 + spark 1.3
scala 2.11.5 + spark 1.3
Solution:
scala 2.10.1 + spark 1.4.1 worked both in the IDEA local test and on the Spark cluster.
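If the project is built with sbt, the version pinning can be expressed in build.sbt roughly as follows (a sketch; the coordinates assume the Spark 1.4.1 release as published on Maven Central):

```scala
// build.sbt — keep the Scala version in the 2.10.x line that Spark 1.4.1 targets
scalaVersion := "2.10.1"

// %% appends the Scala binary version (_2.10) to the artifact name,
// so the Spark jars and the project always agree on the Scala version
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1" % "provided"
```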
*****************************************************************************************************************************************************************
(8) java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
The fix is:
Download the package from https://github.com/srccodes/hadoop-common-2.2.0-bin
(the binaries are at https://github.com/srccodes/hadoop-common-2.2.0-bin/tree/master/bin).
After downloading, copy all of the files into Hadoop's bin directory,
and also place a copy under Windows\System32.
Then add this as the first line of the main method in IDEA:
System.setProperty("hadoop.home.dir", "d:\\hadoop-2.2.0")
After that, it runs successfully.
(9) Fixing Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=yw, access=EXECUTE, inode="/yw/b.txt":root:supergroup:-rw-r--r--
This is a permissions problem.
When IDEA submits a job through the Hadoop plugin, it writes the job into HDFS as the local administrator user by default, i.e. under /user/xxx on HDFS (in my case /user/hadoop). Since the administrator user has no write permission on that Hadoop directory, the exception occurs.
The solutions are:
Method 1: add this property to hdfs-site.xml:
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
Method 2: give everyone read access to the files.
Open up the permissions on the Hadoop directory with:
$ hadoop fs -chmod 777 /yw
$ hadoop fs -chmod -R 777 /yw
**************************************************************************************************************************************************************
(10) java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums(IILjava/nio/ByteBuffer;ILjava/nio/ByteBuffer;IILjava/lang/String;JZ)V
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, 192.168.1.118): java.lang.ClassNotFoundException: com.hq.WorldCount$$anonfun$main$2
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
Running the local test from IDEA threw the errors above.
Solutions:
(1) Add conf.setMaster("spark://192.168.1.118:7077") in the code.
Note: when debugging in IDEA, it is enough to go to Run -> Edit Configurations and add "-Dspark.master=local" to the VM options.
(2) If that still does not work, try adjusting the configuration.
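For the ClassNotFoundException on the application's own classes (com.hq.WorldCount$$anonfun$main$2), a common cause is that the application jar was never shipped to the executors. One way to address this with the Spark 1.x API is SparkConf.setJars; the sketch below assumes the jar built in step (2), and the jar path is a hypothetical example, not a path from this post:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WorldCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("WorldCount")
      .setMaster("spark://192.168.1.118:7077")
      // Ship the packaged application jar to every executor so classes
      // such as WorldCount$$anonfun$main$2 can be loaded remotely.
      // The path is a hypothetical example — point it at the jar built in (2).
      .setJars(Seq("D:\\out\\artifacts\\WorldCount.jar"))
    val sc = new SparkContext(conf)
    // ... the actual word-count job goes here ...
    sc.stop()
  }
}
```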