Problem 1: The job cannot get enough resources from YARN and keeps waiting
2016-09-20 16:49:25,657 [WARN ] 70 org.apache.spark.scheduler.cluster.YarnScheduler - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2016-09-20 16:49:40,657 [WARN ] 70 org.apache.spark.scheduler.cluster.YarnScheduler - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2016-09-20 16:49:55,657 [WARN ] 70 org.apache.spark.scheduler.cluster.YarnScheduler - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2016-09-20 16:50:10,657 [WARN ] 70 org.apache.spark.scheduler.cluster.YarnScheduler - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2016-09-20 16:50:25,657 [WARN ] 70 org.apache.spark.scheduler.cluster.YarnScheduler - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2016-09-20 16:50:40,657 [WARN ] 70 org.apache.spark.scheduler.cluster.YarnScheduler - Initial job has not accepted any resources; check
Solutions:
1. Too many applications are running on YARN; wait for them to release memory.
2. Request less memory when submitting your own application.
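As a minimal sketch of solution 2, assuming the job sets its resources through SparkConf: the app name and all values below are hypothetical and should be sized to what the YARN queue can actually grant. The same settings can also be passed to spark-submit with --num-executors, --executor-memory and --executor-cores.

```scala
import org.apache.spark.SparkConf

object SmallFootprintConf {
  def main(args: Array[String]): Unit = {
    // Hypothetical values: ask YARN for fewer and smaller executors
    // so the request fits into the memory that is currently free.
    val conf = new SparkConf()
      .setAppName("SmallFootprintJob")        // hypothetical app name
      .set("spark.executor.instances", "2")   // number of executors
      .set("spark.executor.memory", "1g")     // heap per executor
      .set("spark.executor.cores", "1")       // cores per executor

    // Print the effective settings to confirm what will be requested.
    println(conf.toDebugString)
  }
}
```

In client mode the driver JVM is already running when this code executes, so driver memory has to be lowered on the command line (--driver-memory) rather than in SparkConf.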
Problem 2: Spark code fails to compile with: missing parameter type
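package cn.spark.study.core.upgrade

import java.util

import org.apache.spark.SparkConf
import org.apache.spark.api.java.{JavaRDD, JavaSparkContext}

object Cartesian {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setAppName("Cartesian").setMaster("local")
    val sc: JavaSparkContext = new JavaSparkContext(conf)
    // cartesian computes the Cartesian product of two RDDs.
    // For example, given two RDDs with 10 records each, cartesian pairs
    // every record of one RDD with every record of the other,
    // and the resulting pairs form the Cartesian product.
    // Small example:
    // 4 tops and 4 pairs of trousers live in two separate RDDs.
    // Every top is paired with every pair of trousers to try out outfit combinations.
    val clothes: util.List[String] = util.Arrays.asList("jacket", "T-shirt", "leather jacket", "trench coat")
    val clothesRDD: JavaRDD[String] = sc.parallelize(clothes)
    val trousers: util.List[String] = util.Arrays.asList("leather pants", "sweatpants", "jeans", "casual pants")
    val trousersRDD: JavaRDD[String] = sc.parallelize(trousers)
    // Does not compile ("missing parameter type"), most likely because the Java API's map
    // expects an org.apache.spark.api.java.function.Function rather than a Scala lambda:
    // clothesRDD.cartesian(trousersRDD).map(row => (row._1, row._2)).foreach(println)
    // Works: switch to the underlying Scala RDDs through .rdd first.
    clothesRDD.rdd.cartesian(trousersRDD.rdd).map(row => (row._1, row._2)).foreach(println)
    sc.stop()
  }
}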
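When the program is written in Scala anyway, the Java wrapper classes can be avoided altogether. The following is only a sketch of that alternative, not the original author's code: the object name CartesianScala and the helper outfits are hypothetical, and it uses SparkContext and plain Scala collections in place of JavaSparkContext and java.util.List.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CartesianScala {
  // Hypothetical helper: builds every (top, trousers) pair with the Scala RDD API.
  def outfits(sc: SparkContext): Array[(String, String)] = {
    val clothesRDD = sc.parallelize(Seq("jacket", "T-shirt", "leather jacket", "trench coat"))
    val trousersRDD = sc.parallelize(Seq("leather pants", "sweatpants", "jeans", "casual pants"))
    // RDD.cartesian returns RDD[(String, String)], so Scala lambdas and
    // tuple accessors work on the result without any .rdd conversion.
    clothesRDD.cartesian(trousersRDD).collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CartesianScala").setMaster("local")
    val sc = new SparkContext(conf)
    try outfits(sc).foreach(println)
    finally sc.stop()
  }
}
```

With 4 tops and 4 pairs of trousers, the product contains 16 pairs.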
Problem 3: The DataFrame's column order does not match the schema of the table it is saved to
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - 22/02/07 18:44:18 INFO yarn.Client:
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - client token: N/A
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - diagnostics: User class threw exception: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:javax.jdo.JDOException: Exception thrown when executing query : SELECT DISTINCT 'org.apache.hadoop.hive.metastore.model.MPartition' AS `NUCLEUS_TYPE`,`A0`.`CREATE_TIME`,`A0`.`LAST_ACCESS_TIME`,`A0`.`PART_NAME`,`A0`.`PART_ID` FROM `PARTITIONS` `A0` LEFT OUTER JOIN `TBLS` `B0` ON `A0`.`TBL_ID` = `B0`.`TBL_ID` LEFT OUTER JOIN `DBS` `C0` ON `B0`.`DB_ID` = `C0`.`DB_ID` WHERE `B0`.`TBL_NAME` = ? AND `C0`.`NAME` = ? AND `A0`.`PART_NAME` = ?
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:677)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:388)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:252)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at org.apache.hadoop.hive.metastore.ObjectStore.getMPartition(ObjectStore.java:1602)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionWithAuth(ObjectStore.java:1864)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at java.lang.reflect.Method.invoke(Method.java:498)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:103)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at com.sun.proxy.$Proxy10.getPartitionWithAuth(Unknown Source)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partition_with_auth(HiveMetaStore.java:3097)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at java.lang.reflect.Method.invoke(Method.java:498)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:139)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:97)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at com.sun.proxy.$Proxy12.get_partition_with_auth(Unknown Source)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partition_with_auth.getResult(ThriftHiveMetastore.java:10073)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partition_with_auth.getResult(ThriftHiveMetastore.java:10057)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at java.security.AccessController.doPrivileged(Native Method)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at javax.security.auth.Subject.doAs(Subject.java:422)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - at java.lang.Thread.run(Thread.java:745)
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - NestedThrowablesStackTrace:
07-02-2022 18:44:18 CST AutoAnalyseActiveType INFO - java.sql.SQLException: Illegal mix of collations (latin1_bin,IMPLICIT) and (utf8_unicode_ci,COERCIBLE) for operation '='