Problems encountered while building Hive on Spark

1. Download the Spark 1.5.0 source code from the official website.

2. Build it following the official instructions:

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package

./make-distribution.sh --name custom-spark --tgz -Phadoop-2.6 -Pyarn
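If the build succeeds, make-distribution.sh leaves a tarball in the Spark source root, named following the spark-&lt;version&gt;-bin-&lt;name&gt;.tgz convention. A minimal sketch of installing it; the /opt install path is only an example, not from the original post:

tar -xzf spark-1.5.0-bin-custom-spark.tgz -C /opt
export SPARK_HOME=/opt/spark-1.5.0-bin-custom-spark
export PATH=$SPARK_HOME/bin:$PATH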

If you are using Scala 2.11, do the following instead:

./dev/change-scala-version.sh 2.11
mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package

In that case there is no need to run ./make-distribution.sh --name custom-spark --tgz -Phadoop-2.4 -Pyarn again.

Then copy ./assembly/target/scala-2.11/spark-assembly-1.5.0-hadoop2.6.0.jar (roughly 137 MB) into $HIVE_HOME/lib.

After Hive starts, run set hive.execution.engine=spark; and Hive will use Spark as its execution engine.
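A minimal sketch of the Hive-side settings, assuming Spark runs on YARN in client mode; the memory values and the table name are placeholders, not from the original post:

-- in the Hive CLI
set hive.execution.engine=spark;
set spark.master=yarn-client;
set spark.executor.memory=1g;
set spark.driver.memory=1g;
select count(*) from test_table;   -- any query that launches a Spark job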

Problem hit while debugging: you must tune YARN's memory settings, otherwise the job will fail to obtain resources.

YARN: Diagnostic Messages for this Task:
Container [pid=7830,containerID=container_1397098636321_27548_01_000297] is running beyond physical memory limits. Current usage: 2.1 GB of 2 GB physical memory used; 2.7 GB of 4.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_1397098636321_27548_01_000297 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 7830 7816 7830 7830 (java) 2547 390 2924818432 539150 /export/servers/jdk1.6.0_25/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx2224m -Djava.io.tmpdir=/data2/nm/local/usercache/admin/appcache/application_1397098636321_27548/container_1397098636321_27548_01_000297/tmp -Dlog4j.configuration=container-log4j.properties......

Check the job memory limit in yarn-site.xml:

  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>
  </property>

 

Fix:
1. Increase yarn.scheduler.minimum-allocation-mb.
2. Pass --hiveconf mapred.child.java.opts=-Xmx????m, making sure it is smaller than yarn.scheduler.minimum-allocation-mb.
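A concrete sketch of the two fixes; the 3072 MB and -Xmx2560m values are illustrative only and must be tuned to your cluster:

<!-- yarn-site.xml: raise the minimum container allocation -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>3072</value>
</property>

# start Hive with a child JVM heap smaller than the container size
hive --hiveconf mapred.child.java.opts=-Xmx2560m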

If it is the virtual memory limit that is exceeded, as in the log below, adjust yarn.nodemanager.vmem-pmem-ratio.

 

The logs show no obvious ERROR, but contain entries like the following:

2012-05-16 13:08:20,876 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 18, cluster_timestamp: 1337134318909, }, attemptId: 1, }, id: 6, }, state: C_COMPLETE, diagnostics: "Container [pid=15641,containerID=container_1337134318909_0018_01_000006] is running beyond virtual memory limits. Current usage: 32.1mb of 1.0gb physical memory used; 6.2gb of 2.1gb virtual memory used. Killing container.\nDump of the process-tree for container_1337134318909_0018_01_000006 :\n\t|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE\n\t| - 15641 26354 15641 15641 (java) 36 2 6686339072 8207 /home/zhouchen.zm/jdk1.6.0_23/bin/java

Cause: this comes from how YARN computes the virtual memory limit. In the example above the application requested 1 GB of memory; YARN multiplies that by a ratio (2.1 by default) to obtain the allowed virtual memory. When the virtual memory the program actually uses exceeds that computed limit, YARN reports the error above and kills the container. Raising the ratio resolves it; the parameter is yarn.nodemanager.vmem-pmem-ratio in yarn-site.xml.
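For example, raising the ratio in yarn-site.xml; the value 4.2 is only an illustration, and yarn.nodemanager.vmem-check-enabled can alternatively be set to false to disable the virtual-memory check entirely if that is acceptable in your cluster:

<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4.2</value>
</property>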

The relevant yarn-site.xml, for reference:

<!-- Site specific YARN configuration properties -->
  <property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>qxy1</value>
  </property>   
 <property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
  </property>
  <property>
    <description>List of directories to store localized files in. An
      application's localized file directory will be found in:
      ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
      Individual containers' work directories, called container_${contid}, will
      be subdirectories of this.
   </description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>${hadoop.tmp.dir}/nm-local-dir</value>
  </property>

 <property>
    <description>Amount of physical memory, in MB, that can be allocated
    for containers.</description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
  </property>

  <property>
    <description>Ratio between virtual memory to physical memory when
    setting memory limits for containers. Container allocations are
    expressed in terms of physical memory, and virtual memory usage
    is allowed to exceed this allocation by this ratio.
    </description>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>

  <property>
    <description>Number of vcores that can be allocated
    for containers. This is used by the RM scheduler when allocating
    resources for containers. This is not used to limit the number of
    physical cores used by YARN containers.</description>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
  </property>

<property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>

 <property>
    <description>The minimum allocation for every container request at the RM,
    in MBs. Memory requests lower than this will throw a
    InvalidResourceRequestException.</description>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>
  </property>

  <property>
    <description>The maximum allocation for every container request at the RM,
    in MBs. Memory requests higher than this will throw a
    InvalidResourceRequestException.</description>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
  </property>


  <property>
    <description>Path to file with nodes to include.</description>
    <name>yarn.resourcemanager.nodes.include-path</name>
    <value></value>
  </property>

     <property>
    <description>
      Where to store container logs. An application's localized log directory
      will be found in ${yarn.nodemanager.log-dirs}/application_${appid}.
      Individual containers' log directories will be below this, in directories
      named container_{$contid}. Each container directory will contain the files
      stderr, stdin, and syslog generated by that container.
    </description>
    <name>yarn.nodemanager.log-dirs</name>
    <value>${yarn.log.dir}/userlogs</value>
  </property>

  <property>
    <description>Time in seconds to retain user logs. Only applicable if
    log aggregation is disabled
    </description>
    <name>yarn.nodemanager.log.retain-seconds</name>
    <value>10800</value>
  </property>

  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
  </property>
 <property>
  <description>The remote log dir will be created at
      {yarn.nodemanager.remote-app-log-dir}/${user}/{thisParam}
    </description>
    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
    <value>logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
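With the values above, each NodeManager advertises 4096 MB and the CapacityScheduler normalizes every request up to a multiple of the 2048 MB minimum allocation, so a node can run at most two containers at a time. A quick way to check what a node actually reports (the node id is a placeholder):

yarn node -list                      # list NodeManagers and their state
yarn node -status <node-id>:<port>   # shows used and available memory for that node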

 

Spark fails to start with the following error:

Error: A JNI error has occurred, please check your installation and try again

Fix: set SPARK_DIST_CLASSPATH to the Hadoop classpath in conf/spark-env.sh, e.g. SPARK_DIST_CLASSPATH=$(/home/hadoop/hadoop-2.7.2/bin/hadoop classpath). The full set of spark-env.sh entries:

export SCALA_HOME=/opt/scala-2.11.8
export SPARK_MASTER_IP=192.168.233.159
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/opt/hadoop-2.6.2/etc/hadoop
export JAVA_HOME=/opt/jdk1.8.0_77
export SPARK_DIST_CLASSPATH=$(/opt/hadoop-2.6.2/bin/hadoop classpath)    ## add this line
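To confirm that SPARK_DIST_CLASSPATH resolves correctly before restarting Spark, a quick sanity check (not from the original post):

/opt/hadoop-2.6.2/bin/hadoop classpath          # should print the Hadoop jar and conf directories
source conf/spark-env.sh && echo $SPARK_DIST_CLASSPATH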

 

Reposted from: https://www.cnblogs.com/qxyy/p/6238890.html
