Spark
binbincoder
Born fearless, fighting to the final chapter
Reading HBase from the Spark Scala API (with Kerberos authentication)
Reading HBase from the Spark Scala API (with Kerberos authentication). Component versions: <!-- Languages --><java.version>1.8</java.version><scala.version>2.11.8</scala.version><scala.binary.version>2.11<... Original · 2020-03-26 15:08:14 · 1085 views · 0 comments
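The teaser only shows the version properties; as a rough sketch of the technique the title describes (the classes are standard Hadoop/HBase/Spark APIs, but the principal, keytab path, and table name are placeholders, not values from the post):

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("hbase-kerberos-read-sketch"))

// Kerberos login from a keytab; principal and keytab path are placeholders.
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set("hadoop.security.authentication", "kerberos")
UserGroupInformation.setConfiguration(hbaseConf)
UserGroupInformation.loginUserFromKeytab("user@EXAMPLE.COM", "/etc/security/keytabs/user.keytab")

// Tell TableInputFormat which table to scan, then read it as an RDD of HBase Results.
hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")
val hbaseRDD = sc.newAPIHadoopRDD(
  hbaseConf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])
println(s"rows: ${hbaseRDD.count()}")
```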
Spark Tuning: JVM Tuning
Spark and the JVM: the old generation holds a small number of long-lived objects, such as connection pools; the young generation holds the large numbers of objects that Spark tasks create themselves while running operator functions. JVM mechanics: when objects enter the JVM they are placed in the Eden region and one survivor region, while the other survivor region stays empty. Once Eden and the active survivor region fill up, a minor GC (young-generation garbage collection) is triggered to clear objects that are no longer in use and make room for new ones. Objects that survive and are not cle... Original · 2018-11-20 19:53:18 · 250 views · 0 comments
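As a hedged sketch of how this tuning is usually applied in practice (the configuration keys are real Spark settings, but the values and the choice of GC flags are illustrative assumptions, not recommendations from the post):

```scala
import org.apache.spark.SparkConf

// Illustrative values only; the teaser above does not prescribe specific numbers.
val conf = new SparkConf()
  .setAppName("jvm-tuning-sketch")
  // Legacy (pre-unified-memory) knob: shrink the RDD cache fraction so task objects
  // created in operator functions get more young-generation headroom.
  .set("spark.storage.memoryFraction", "0.4")
  // Surface minor/full GC activity in the executor logs so the effect can be measured.
  .set("spark.executor.extraJavaOptions",
    "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
```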
Problems running a Spark project
Problems running a Spark project: [root@aaaaaaa apptooljar]# spark-submit --master spark://aaaaaaa:6066 --deploy-mode client --class controller.KafkaConsumeFlume /opt/software/apptooljar/BIOUSERAI.jar Exception in thre... Original · 2019-09-19 18:59:32 · 862 views · 0 comments
Revisiting Hadoop and Spark: Cloudera Manager bugs
There is always a question waiting for you. Version problem: NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object. Fix: the Spark/Scala version your program was compiled against does not match the version bundled with CM or installed on your own cluster; how to check the versions, and how to reinstall, is covered below... (Please leave a message)... Original · 2019-09-19 18:50:40 · 219 views · 0 comments
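A minimal build.sbt sketch of the fix described above, assuming an sbt build; the version numbers are placeholders and must be replaced with whatever the CM/cluster installation actually ships:

```scala
// build.sbt (sketch): compile against the same Scala/Spark versions the cluster runs.
scalaVersion := "2.11.8" // placeholder: must match the cluster's Scala binary version

libraryDependencies ++= Seq(
  // "provided" keeps the cluster's own Spark jars in charge at runtime instead of
  // bundling a second, possibly mismatched copy inside the application jar.
  "org.apache.spark" %% "spark-core" % "2.4.0" % "provided", // placeholder version
  "org.apache.spark" %% "spark-sql"  % "2.4.0" % "provided"
)
```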
Protocol message end-group tag did not match expected tag; Host Details: local host is / destination host is
Problem: Exception in thread "main" java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.; Host Det... Original · 2019-05-31 18:23:50 · 2289 views · 0 comments
java.lang.NoClassDefFoundError: org/apache/spark/sql/internal/SQLConf$SQLConfigBuilder$
Problem: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/internal/SQLConf$SQLConfigBuilder$ at org.apache.spark.sql.hive.HiveUtils.<init... Original · 2019-05-31 18:19:16 · 2224 views · 0 comments
java.lang.NumberFormatException: multiple points
java.lang.NumberFormatException: multiple points at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1110) at java.lang.Double.parseDouble(Double.java:540) at java.text.DigitList.get... Original · 2019-01-16 09:12:41 · 359 views · 0 comments
Failed to start database 'metastore_db' with class loader
19/01/15 22:53:00 ERROR Schema: Failed initialising database. Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Term... Original · 2019-01-15 22:55:34 · 2557 views · 0 comments
requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 but was actually org.apache.spark.mllib... Original · 2019-01-21 14:13:22 · 3180 views · 0 comments
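The teaser cuts off at the exception, but the message itself points at the usual cause: a legacy spark.mllib vector column being fed to a spark.ml stage. A minimal sketch of one standard conversion (MLUtils is a real Spark helper; the DataFrame is an assumption, not the post's data):

```scala
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.DataFrame

// Converts every spark.mllib vector column in the DataFrame (e.g. "features")
// into the spark.ml vector type that ML pipeline stages expect.
def toMlVectors(df: DataFrame): DataFrame =
  MLUtils.convertVectorColumnsToML(df)
```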
Caused by: java.lang.ClassNotFoundException: org.apache.spark.ml.param.Param
The problem: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ml/param/Param at saprk_ML.demo01.main(demo01.scala) Caused by: java.lang.ClassNotFoundException: org.apache.spark.ml.p... Original · 2019-01-21 09:36:59 · 847 views · 0 comments
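The teaser stops at the stack trace; one common cause of a missing org.apache.spark.ml.* class (an assumption here, not something the post confirms) is that the spark-mllib module is simply absent from the build, e.g. in sbt:

```scala
// Placeholder version: keep it in line with the spark-core/spark-sql version in use.
// Drop "provided" when running locally from the IDE so the jar is on the classpath.
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.4.0" % "provided"
```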
The correct way to use foreachRDD in Spark Streaming
Misconception 1: creating the connection object (for example a network or database connection) on the driver. If you create the connection object on the driver and then use it inside an RDD operator function, the object has to be serialized and shipped from the driver to the workers. Connection objects (such as a Connection) usually do not support serialization, so this typically fails with serialization errors. Connection objects must therefore be created on the worker, never on the drive... Original · 2019-01-14 22:11:01 · 252 views · 0 comments
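A sketch of the pattern the post describes, using a plain JDBC connection; the URL, credentials, and table are placeholders, and the post's own example may well use a connection pool instead:

```scala
import java.sql.DriverManager
import org.apache.spark.streaming.dstream.DStream

// Wrong: a Connection created on the driver cannot be serialized to executors.
// Right: open (or borrow from a pool) one connection per partition, on the worker.
def saveToDb(stream: DStream[(String, Int)]): Unit =
  stream.foreachRDD { rdd =>
    rdd.foreachPartition { records =>
      // Runs on the executor: the connection never crosses the driver/worker boundary.
      val conn = DriverManager.getConnection("jdbc:mysql://db-host:3306/test", "user", "pass")
      val stmt = conn.prepareStatement("INSERT INTO wordcount(word, cnt) VALUES (?, ?)")
      records.foreach { case (word, cnt) =>
        stmt.setString(1, word)
        stmt.setInt(2, cnt)
        stmt.executeUpdate()
      }
      stmt.close()
      conn.close()
    }
  }
```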
scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;
A maddening version problem: Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less; at org.apache.spark.util.Utils... Original · 2019-01-24 16:20:54 · 3441 views · 0 comments
The most complete guide to fixing Spark data skew
My original write-up: https://www.cnblogs.com/gentle-awen/p/10141315.html 1. Understanding data skew. How data skew happens: during a shuffle, values are emitted, pulled, and aggregated by key, and all the values for a given key are always sent to a single reduce task. Suppose the values for several keys total 900,000, but one key may... Original · 2018-12-20 09:19:40 · 296 views · 0 comments
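One of the standard remedies such a guide walks through is two-stage aggregation with salted keys; the sketch below is illustrative only and not necessarily the code from the linked write-up:

```scala
import scala.util.Random
import org.apache.spark.rdd.RDD

// Two-stage aggregation for a skewed reduceByKey: salt the keys so a hot key's values
// spread across several reduce tasks, aggregate, then strip the salt and aggregate again.
def saltedSum(pairs: RDD[(String, Long)], salts: Int = 10): RDD[(String, Long)] =
  pairs
    .map { case (k, v) => (s"${Random.nextInt(salts)}_$k", v) }    // stage 1: salted key
    .reduceByKey(_ + _)                                            // partial sums per salted key
    .map { case (saltedKey, v) => (saltedKey.split("_", 2)(1), v) } // drop the salt prefix
    .reduceByKey(_ + _)                                            // stage 2: final per-key sums
```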
Error:(3, 12) Implementation restriction: case classes cannot have more than 22 parameters
1. Straight to the code: package com.etlstu import java.util.Properties import com.utils.NBF import com.utilsStu.logmetadata import org.apache.spark.rdd.RDD import org.apache.spark.sql.{DataFrame, SQLContext, SaveMod... Original · 2018-11-20 08:43:11 · 546 views · 1 comment
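The usual workaround for this 22-parameter restriction (enforced for case classes on Scala 2.10; 2.11+ relaxes it at the cost of some conveniences) is to group the columns into nested case classes. A schematic sketch with made-up field names, not the post's actual log-metadata class:

```scala
// Instead of one flat case class with 25+ columns, split the record into
// smaller logical groups and nest them.
case class UserFields(userId: String, userName: String, region: String)
case class EventFields(eventType: String, eventTime: Long, payload: String)
case class LogRecord(user: UserFields, event: EventFields, source: String)
```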
Loading a large file into Redis as a cache (bonus: broadcast variables)
Using Redis in the project to cache file data (serving the same purpose as a broadcast variable): package com.app import com.utils.{JedisConnectionPool, RptUtils} import org.apache.commons.lang.StringUtils import org.apache.spark.sql.{DataFrame, Row, SQLContext} import... Original · 2018-11-20 19:41:50 · 1363 views · 0 comments
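JedisConnectionPool and RptUtils are the author's own utility classes; as a generic sketch of the same idea with the plain Jedis client (host, port, and key layout are placeholders):

```scala
import org.apache.spark.rdd.RDD
import redis.clients.jedis.Jedis

// Look up dimension data cached in Redis from inside each partition, so every
// executor opens its own client instead of shipping one from the driver.
def enrich(lines: RDD[(String, String)]): RDD[(String, String, String)] =
  lines.mapPartitions { iter =>
    val jedis = new Jedis("redis-host", 6379)   // placeholder host/port
    val out = iter.map { case (appId, row) =>
      val appName = Option(jedis.get(appId)).getOrElse("unknown")
      (appId, appName, row)
    }.toList                                    // materialize before closing the client
    jedis.close()
    out.iterator
  }
```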