Building a Beginner-Level Spark Program with IDEA, and Troubleshooting Common Problems

This article describes how to build a beginner-level Spark Scala program with IDEA: downloading and installing Spark, IDEA, and Scala, configuring environment variables, and resolving problems encountered when starting Spark. Errors hit after packaging the jar in IDEA and submitting it, such as "job has not accepted any resources", were resolved by setting the master URL to "spark://192.168.1.118:7077". Spark/Scala version compatibility is also covered: downgrading Scala from 2.11.5 to 2.10.4 fixed a NoSuchMethodError. Finally, a fix is given for the "ClassNotFoundException" that appeared during local testing.

     (1) First, download and install:

    Spark

    IDEA

    Scala

    Configure the environment variables

       Edit /etc/profile:

    export JAVA_HOME=/usr/java/jdk1.7.0_60
    export HADOOP_HOME=/itcast/hadoop-2.2.0
    export apache=/java/apache-tomcat-7.0.27
    export SCALA_HOME=/itcast/scala-2.10.5
    export SPARK_HOME=/itcast/spark-1.3.0-bin-hadoop2.4
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$apache/bin:$SCALA_HOME/bin:$SPARK_HOME/bin


     

       In Spark's conf directory, copy spark-env.sh.template to spark-env.sh and edit it:

      
        export SCALA_HOME=/itcast/scala-2.10.5
        export JAVA_HOME=/usr/java/jdk1.7.0_60
        export SPARK_MASTER_IP=192.168.1.118
        export SPARK_WORKER_MEMORY=3000m
        export master=spark://192.168.1.118:7077
        
      In the slaves file, configure the worker nodes.
       Since everything runs on a single machine here, the master and the worker are on the same host:
       192.168.1.118

      

      Start Spark

       Run start-all.sh; once it finishes, use jps to confirm that the Master and Worker processes are running.
       Then check the web UI in a browser:

       http://192.168.1.118:8080

      


(2) Packaging the jar file in IDEA


    Create a new project, copy your own code into it, and package it again.

    In IDEA: File -> Project Structure -> Artifacts -> create a new artifact and build it into a jar.
    
    
(3) Fix for a java.lang.NullPointerException when submitting a job to the cluster: set the application name and master URL on the SparkConf.

         conf.setAppName("WorldCount")
         conf.setMaster("spark://192.168.1.118:7077")
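
The post never shows the full program that gets packaged. Below is a minimal sketch of a word-count application consistent with the snippets above; the package and object names follow the com.hq.WorldCount class that appears in the stack traces later on, and the HDFS input path is an assumption for illustration.

    package com.hq

    import org.apache.spark.{SparkConf, SparkContext}

    object WorldCount {
      def main(args: Array[String]): Unit = {
        // Both the app name and the master URL must be set,
        // otherwise the driver fails before the job even starts (see (3) above and (5) below).
        val conf = new SparkConf()
          .setAppName("WorldCount")
          .setMaster("spark://192.168.1.118:7077")
        val sc = new SparkContext(conf)

        // Input path is an assumption; /yw/b.txt is the HDFS file that shows up
        // in the AccessControlException later in the post.
        val counts = sc.textFile("hdfs://192.168.1.118:9000/yw/b.txt")
          .flatMap(_.split(" "))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.collect().foreach(println)
        sc.stop()
      }
    }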
        
        

(4) Fixing the error: Initial job has not accepted any resources; check your cluster UI


The submission keeps printing the following TaskSchedulerImpl warning:

   15/03/26 22:29:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
   15/03/26 22:29:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
   15/03/26 22:30:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
   15/03/26 22:30:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

   The warning means the job could not obtain any resources during initialization; it suggests checking the cluster UI to make sure the workers are registered and have sufficient memory.

This problem can have several causes:

    1. The node submitting the job cannot communicate with the Spark worker nodes. After a job is submitted, a process starts on the submitting node to display the job's progress (its UI usually listens on port 4040), and the workers need to report progress back to that port. If the hostname or IP is configured incorrectly in /etc/hosts, this communication fails, so check that the hostname and IP mappings are correct.

    2. There may not be enough memory.

      Check the application's memory request:

       conf.set("spark.executor.memory", "3000m")

      Make sure to set SPARK_LOCAL_IP and SPARK_MASTER_IP.

      Check the web UI on port 8080 and make sure some workers are in the ALIVE state and that some cores are available.

In this case, the actual fix was:
      export SPARK_WORKER_MEMORY=3000m to increase the Spark worker memory, and
      in the slaves file, replacing localhost with the actual IP address.

After that, the job submitted successfully.
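
For reference, here is how the application-side memory request mentioned in cause 2 sits on the SparkConf. This is only a sketch using the values from this post; the key point is that the requested executor memory must fit within what SPARK_WORKER_MEMORY allows, or no executor can ever be scheduled.

    // inside main(), before creating the SparkContext
    import org.apache.spark.SparkConf

    // Request 3000 MB per executor. If this exceeds the worker's
    // SPARK_WORKER_MEMORY (3000m in spark-env.sh above), the scheduler can
    // never place an executor and the warning above repeats indefinitely.
    val conf = new SparkConf()
      .setAppName("WorldCount")
      .setMaster("spark://192.168.1.118:7077")
      .set("spark.executor.memory", "3000m")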



(5) A problem when testing locally in IDEA: Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration


After reading through part of Spark's official documentation, I tried running SparkPi in IntelliJ and hit a few problems along the way.

Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:185)
    at SparkDemo.SimpleApp$.main(SimpleApp.scala:13)
    at SparkDemo.SimpleApp.main(SimpleApp.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties


Solution: either of the following works.
Method (1): in the IDE, go to Run -> Edit Configurations and enter "-Dspark.master=local" in the VM options field on the right; this tells the program to run locally in a single thread.
Method (2): set the master directly in code, e.g. conf.setMaster("spark://192.168.1.118:7077") to run against the cluster.
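
A pattern that combines both methods (not something the post itself uses, just a common convention) is to set the master only when it has not already been supplied, so the same code runs unchanged from IDEA and under spark-submit:

    // inside main()
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("WorldCount")
    // Keep whatever -Dspark.master=... was passed (IDEA VM options or
    // spark-submit); fall back to a local run when nothing was supplied.
    conf.setIfMissing("spark.master", "local[*]")
    val sc = new SparkContext(conf)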
After applying the fix and running again, a different error appears:

Exception in thread "main" java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
at akka.actor.ActorCell$.<init>(ActorCell.scala:336)
at akka.actor.ActorCell$.<clinit>(ActorCell.scala)
at akka.actor.RootActorPath.$div(ActorPath.scala:159)
at akka.actor.LocalActorRefProvider.<init>(ActorRefProvider.scala:464)
at akka.actor.LocalActorRefProvider.<init>(ActorRefProvider.scala:452)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$2.apply(DynamicAccess.scala:78)
at scala.util.Try$.apply(Try.scala:191)
at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:73)
at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
at scala.util.Success.flatMap(Try.scala:230)
at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:84)
at akka.actor.ActorSystemImpl.liftedTree1$1(ActorSystem.scala:584)
at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:577)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:108)
at akka.Akka$.delayedEndpoint$akka$Akka$1(Akka.scala:11)
at akka.Akka$delayedInit$body.apply(Akka.scala:9)
at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:383)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at akka.Akka$.main(Akka.scala:9)
at akka.Akka.main(Akka.scala)

As mentioned in an earlier post, I had Scala 2.11.5 installed with Spark 1.2.0, so there are clearly some compatibility issues between the Spark and Scala versions. Switching Scala to 2.10.4 solved the problem and the program ran. For local testing, the combinations Scala 2.11.7, 2.11.0, and 2.11.5 with Spark 1.3 all produced this error, while Scala 2.10.1 with Spark 1.4.1 worked both in IDEA and on the Spark cluster.



(6) Executors are repeatedly added and immediately removed: INFO AppClient$ClientActor: Executor updated: app-20151221220543-0003/... / ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor

15/12/21 22:06:01 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 18
15/12/21 22:06:01 INFO AppClient$ClientActor: Executor added: app-20151221220543-0003/19 on worker-20151221042816-itcastdd-56547 (itcastdd:56547) with 2 cores
15/12/21 22:06:01 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151221220543-0003/19 on hostPort itcastdd:56547 with 2 cores, 2.9 GB RAM
15/12/21 22:06:01 INFO AppClient$ClientActor: Executor updated: app-20151221220543-0003/19 is now LOADING
15/12/21 22:06:01 INFO AppClient$ClientActor: Executor updated: app-20151221220543-0003/19 is now RUNNING
15/12/21 22:06:02 INFO AppClient$ClientActor: Executor updated: app-20151221220543-0003/19 is now EXITED (Command exited with code 1)
15/12/21 22:06:02 INFO SparkDeploySchedulerBackend: Executor app-20151221220543-0003/19 removed: Command exited with code 1
15/12/21 22:06:02 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 19
15/12/21 22:06:02 INFO AppClient$ClientActor: Executor added: app-20151221220543-0003/20 on worker-20151221042816-itcastdd-56547 (itcastdd:56547) with 2 cores
15/12/21 22:06:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151221220543-0003/20 on hostPort itcastdd:56547 with 2 cores, 2.9 GB RAM
15/12/21 22:06:02 INFO AppClient$ClientActor: Executor updated: app-20151221220543-0003/20 is now RUNNING
15/12/21 22:06:02 INFO AppClient$ClientActor: Executor updated: app-20151221220543-0003/20 is now LOADING
15/12/21 22:06:02 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
15/12/21 22:06:02 INFO AppClient$ClientActor: Executor updated: app-20151221220543-0003/20 is now EXITED (Command exited with code 1)
15/12/21 22:06:02 INFO SparkDeploySchedulerBackend: Executor app-20151221220543-0003/20 removed: Command exited with code 1
15/12/21 22:06:02 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 20


Meanwhile, the Spark master keeps printing the following warning:

Got status update for unknown executor app-20150509185326-0001/11158  

Fixing the recurring "SparkDeploySchedulerBackend: Asked to remove non-existent executor" errors on a Spark cluster

Getting straight to the point: on Spark 1.3.1, this problem is caused by jobs on the cluster that terminated abnormally.

In my case, a Hive metastore failure caused many jobs to fail; after restarting Spark and deploying applications to the cluster again, the application side kept reporting the error above.

Solution

    Stop the Spark cluster.
    On every node, delete Spark's temporary files, which by default are under /tmp/spark-*; the exact path depends on SPARK_LOCAL_DIRS in spark-env.sh.
    Start the Spark cluster again.

Note that Spark must be stopped before the temporary files are deleted; otherwise the files are still in use, the deletion fails, and it does not complete.
Another possible cause of this error is a problem with how the cluster was started, for example two Worker processes running on the same server (one of them perhaps never killed). In that case, jps shows many process entries like the following:

17048 -- process information unavailable  
12914 -- process information unavailable  
14540 -- process information unavailable  
13579 -- process information unavailable  
16809 -- process information unavailable  

When this happens, just kill the problematic Worker processes.

Troubleshooting tips

Raise Spark's log verbosity: the default log4j level is WARN, and it can be changed to INFO (no Spark restart is needed). Then run a test with spark-sql --master spark://your-master-ip:7077 and analyze the log output; this also helps track down the error.
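
If you prefer to adjust verbosity from the application itself rather than editing conf/log4j.properties (an alternative to the approach above, not something this post describes), the standard log4j 1.x API can be called at the start of main:

    import org.apache.log4j.{Level, Logger}

    // Make Spark's and Akka's internal logging more verbose while debugging;
    // switch back to Level.WARN once the cause has been found.
    Logger.getLogger("org.apache.spark").setLevel(Level.INFO)
    Logger.getLogger("akka").setLevel(Level.INFO)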


(7) java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
 

 Solution:

     Download this package: https://github.com/srccodes/hadoop-common-2.2.0-bin
     (the binaries are under https://github.com/srccodes/hadoop-common-2.2.0-bin/tree/master/bin)

     After downloading, copy all of the files into Hadoop's bin directory,

     and put a copy under C:\Windows\System32 as well.

     Then add the following as the first line of the main method in IDEA:
        System.setProperty("hadoop.home.dir", "d:\\hadoop-2.2.0")

After that it runs successfully.
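
To make the ordering explicit, here is a sketch (assuming the hadoop-common binaries were unpacked to d:\hadoop-2.2.0 as above, and a local master for Windows-side testing): the property has to be set before the SparkContext, or anything else that loads Hadoop classes, is created.

    object WorldCount {
      def main(args: Array[String]): Unit = {
        // Must come first, before SparkConf/SparkContext are created, so that
        // Hadoop's shell utilities can locate winutils.exe on Windows.
        System.setProperty("hadoop.home.dir", "d:\\hadoop-2.2.0")

        val conf = new org.apache.spark.SparkConf()
          .setAppName("WorldCount")
          .setMaster("local[*]")   // local testing on Windows (assumption)
        val sc = new org.apache.spark.SparkContext(conf)
        // ... rest of the job ...
        sc.stop()
      }
    }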


(8) Fixing Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=yw, access=EXECUTE, inode="/yw/b.txt":root:supergroup:-rw-r--r--


    This is a permissions problem.
    When IDEA submits a job through the Hadoop plugin, it writes the job into HDFS as the local user (administrator by default), which maps to /user/xxx on HDFS (in my case /user/hadoop). Because the administrator user has no write permission on that Hadoop directory, the exception is thrown.

    Solutions:

Method 1: add the following parameter to hdfs-site.xml:

 <property>
        <name>dfs.permissions</name>
        <value>false</value>
  </property>

Method 2: open up the file permissions so that everyone can access them.
  Grant full permissions on the Hadoop directory with the following commands:
  $ hadoop fs -chmod 777 /yw
  $ hadoop fs -chmod -R 777 /yw
 
 

(9)

java.lang.UnsatisfiedLinkError:     org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums(IILjava/nio/ByteBuffer;ILjava/nio/ByteBuffer;IILjava/lang/String;JZ)V

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, 192.168.1.118): java.lang.ClassNotFoundException: com.hq.WorldCount$$anonfun$main$2

    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)


 Running the local test from IDEA throws the errors above.


 Solutions:


 (1) Add conf.setMaster("spark://192.168.1.118:7077") to the code.

   Note: if you are only debugging inside IDEA, it is enough to go to Run -> Edit Configurations and enter -Dspark.master=local in the VM options. For cluster runs, also see the sketch after this list for shipping the application jar to the workers.

 (2) If that still does not fix it, try the following:

In the Hadoop 2.6 source code, find NativeCrc32.java, create a package in your project with exactly the same package name, and copy NativeCrc32.java into it without changing anything; the error then goes away. The copy of this class inside the Hadoop installation package may be broken, while the one in the downloaded source is correct.
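
When running against the cluster from IDEA as in option (1), the executors also need to be able to load your compiled classes, or the ClassNotFoundException for com.hq.WorldCount$$anonfun$... keeps recurring. A common companion step, not spelled out in this post, is to hand the packaged jar to the SparkConf; the jar path below is a placeholder for whatever IDEA's artifact build produced:

    // inside main()
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("WorldCount")
      .setMaster("spark://192.168.1.118:7077")
      // Ship the application jar to the workers so their executors can load
      // com.hq.WorldCount and its anonymous function classes.
      .setJars(Seq("D:\\out\\artifacts\\WorldCount_jar\\WorldCount.jar"))  // placeholder path

    val sc = new SparkContext(conf)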


 
