1. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Spark does not know where the Hive metastore is, so it cannot instantiate the metastore client.
Fix: copy hive-site.xml into the spark/conf directory.
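2. Spark not Serializable (org.apache.spark.SparkException: Task not serializable)
A non-serializable object is used inside a function that Spark has to ship to the executors. In Java, an anonymous function written inside a class holds a reference to the enclosing instance, so if a class passes an anonymous function to a Spark operation, that class must implement the Serializable interface, and any member variable that cannot be serialized must be marked transient.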
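The effect of transient can be checked without Spark at all. The sketch below uses only standard Java serialization; the class names (Helper, BadJob, GoodJob) are hypothetical and only stand in for a job class that holds a non-serializable member such as a client or connection.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializableDemo {
    // Hypothetical helper that does NOT implement Serializable.
    static class Helper {
        String name = "helper";
    }

    // Cannot be serialized: it keeps a non-transient reference to Helper.
    static class BadJob implements Serializable {
        private static final long serialVersionUID = 1L;
        Helper helper = new Helper();
    }

    // Can be serialized: the Helper field is transient, so it is skipped.
    static class GoodJob implements Serializable {
        private static final long serialVersionUID = 1L;
        transient Helper helper = new Helper();
        int threshold = 10;
    }

    // Returns true if the object can be written with standard Java serialization.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("BadJob serializable:  " + canSerialize(new BadJob()));
        System.out.println("GoodJob serializable: " + canSerialize(new GoodJob()));
    }
}
```

A transient field comes back as null after deserialization, so code running on the executor has to re-create it (for example, lazily on first use) before touching it.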
3. Errors when Spark is started with the Hive configuration loaded
(1) java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Caused by: MetaException(message:Version information not found in metastore. )
Fix: set hive.metastore.schema.verification to false in hive-site.xml.
Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure:
Fix: the Hive Metastore Server process is not running. Start it with: nohup hive --service metastore &
(2)org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The specified datastore driver
("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
Fix: add the following line to spark-env.sh: export SPARK_CLASSPATH="/Users/zouziwen/soft/spark-1.6.3/lib/mysql-connector-java-5.0.8-bin.jar"
(3)java.lang.OutOfMemoryError: PermGen space
Fix: raise the JVM memory limits (PermGen exists only on Java 7 and earlier), for example: -Xms1024m -Xmx1024m -XX:MaxNewSize=256m -XX:MaxPermSize=256m
(4)java.lang.NoClassDefFoundError: javax/jdo/JDOException
Fix: add the jars in the lib directory under the Spark directory to the runtime classpath.
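For classpath errors such as (2) and (4), it can help to confirm which class is actually missing before changing any configuration. The following is a minimal diagnostic sketch using only the JDK; ClasspathCheck and isOnClasspath are hypothetical names, and the check only covers the class loader that loaded this class, which may differ from the one Spark uses for its own jars.

```java
class ClasspathCheck {
    // Returns true if the named class can be located by this class's class loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the two error messages above.
        String[] required = {"com.mysql.jdbc.Driver", "javax.jdo.JDOException"};
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Running it with the same classpath as the failing application shows whether the MySQL connector or the JDO API jar is the one that still needs to be added.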
(5)org.apache.spark.sql.AnalysisException: Table not found
Fix: when the job is run from IDEA, hive-site.xml cannot be found; add the file to IDEA's runtime environment (the run classpath).
(6)HDFS error: could only be replicated to 0 nodes, instead of 1
Fix (note: this wipes all data stored in HDFS):
stop all hadoop services
delete dfs/name and dfs/data directories
hadoop namenode -format # Answer with a capital Y
start hadoop services
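4. Java objects cannot be modified inside functions executed by Spark
Spark serializes the function together with the objects it captures and runs it on the executors, so any change is applied to a copy and never reaches the original object on the driver. Return the result from the operation or use an accumulator instead.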
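The behaviour can be imitated without a cluster. The sketch below does not use Spark; it assumes that shipping a captured object to an executor is roughly equivalent to a Java serialization round trip, and the names ClosureCopyDemo, roundTrip, and driverSizeAfterTask are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

class ClosureCopyDemo {
    // Serializes and deserializes a value, producing an independent copy.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    // Returns the size of the "driver" list after the "task" has mutated its own copy.
    static int driverSizeAfterTask() {
        ArrayList<String> driverList = new ArrayList<>();
        ArrayList<String> taskCopy = roundTrip(driverList); // what the task would see
        taskCopy.add("record");                             // mutation happens on the copy
        return driverList.size();                           // the original is untouched
    }

    public static void main(String[] args) {
        System.out.println("driver list size after task: " + driverSizeAfterTask());
    }
}
```

The driver-side list stays empty because the task only ever held a deserialized copy of it.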
5. Hive startup problem
Fix: if HDFS is still in safe mode, leave it with: hadoop dfsadmin -safemode leave
6. Program hangs when using a map
The map's rehash method keeps executing, which leaves the threads waiting.
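One common cause of this symptom, assumed here rather than stated in the note, is a plain java.util.HashMap being written by several threads at once, which can corrupt its internal table while it resizes. A minimal sketch of the usual remedy is to use ConcurrentHashMap for any map shared between threads; ConcurrentMapDemo and fillConcurrently are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ConcurrentMapDemo {
    // Fills one shared map from several threads; each thread writes its own key range.
    static Map<Integer, Integer> fillConcurrently(int threads, int perThread) {
        Map<Integer, Integer> map = new ConcurrentHashMap<>(); // safe for concurrent writers
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int offset = t * perThread;
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(offset + i, i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println("entries: " + fillConcurrently(4, 10_000).size());
    }
}
```

If the map is only ever used by one thread, the hang has a different cause and this change will not help.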
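7. An uneven number of keys or an uneven number of values per key causes data skew: when the data goes through a shuffle operator, a few tasks receive most of the records and a large number of tasks appear stalled.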
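A common mitigation for skew, not mentioned in the note itself, is to salt the hot key so that its records spread over several partitions. The sketch below does not use Spark; it only imitates hash partitioning (key hash modulo the partition count), and SaltingDemo, partitionOf, and maxPartitionLoad are hypothetical names.

```java
class SaltingDemo {
    // Imitates hash partitioning: non-negative hash modulo the partition count.
    static int partitionOf(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Sends `records` records of one hot key through the partitioner and
    // returns the size of the largest partition. saltBuckets <= 1 means no salting.
    static int maxPartitionLoad(int records, int numPartitions, int saltBuckets) {
        int[] load = new int[numPartitions];
        for (int i = 0; i < records; i++) {
            // Salted keys look like "hot_0", "hot_1", ...; the suffix is the salt.
            String key = saltBuckets <= 1 ? "hot" : "hot_" + (i % saltBuckets);
            load[partitionOf(key, numPartitions)]++;
        }
        int max = 0;
        for (int n : load) {
            max = Math.max(max, n);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println("largest partition, no salt: " + maxPartitionLoad(8000, 8, 1));
        System.out.println("largest partition, 8 salts: " + maxPartitionLoad(8000, 8, 8));
    }
}
```

In a real job the salted keys are aggregated first, then the salt is stripped and the partial results are aggregated a second time to get the final value per original key.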
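8. ExecutorLostFailure: the job requests too many resources and cannot be guaranteed exclusive use of them, so its machines get preempted. Requesting fewer resources fixes it.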