Spark error collection:
Heartbeat tuning parameter: spark.executor.heartbeatInterval
tasks is bigger than spark.driver.maxResultSize
Fix: --conf spark.driver.maxResultSize=3g
When submitting a Spark job, if it fails with a database-not-found error, add --driver-class-path mysql-connector-java-5.1.41-bin.jar
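As an illustration, a submit command combining the two flags above might look like this (the script name and sizes are hypothetical; adjust the jar path to your environment):

```shell
# Hypothetical spark-submit invocation; my_job.py and 3g are examples.
spark-submit \
  --conf spark.driver.maxResultSize=3g \
  --driver-class-path mysql-connector-java-5.1.41-bin.jar \
  my_job.py
```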
ERROR ContextCleaner: Error cleaning broadcast 51
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds].
Cause: too many logs
Fix: add the following two parameters when submitting; they are no longer needed on subsequent runs
--conf spark.cleaner.referenceTracking.blocking=true
--conf spark.cleaner.referenceTracking.blocking.shuffle=true
py4j.protocol.Py4JJavaError: An error occurred while calling
z:org.apache.spark.api.python.PythonRDD.runJob.
org.apache.spark.SparkException: Job aborted due to stage failure: Serialized task 0:0 was 216138516 bytes, which exceeds max allowed: spark.rpc.message.maxSize (134217728 bytes). Consider increasing spark.rpc.message.maxSize or using broadcast variables for large values.
Fix: increase spark.rpc.message.maxSize; the default is 128 MB
Add when submitting: --conf spark.rpc.message.maxSize=512
spark.sql.crossJoin.enabled for Spark 2.x
Fix: add the following line to the script
spark.conf.set("spark.sql.crossJoin.enabled", True)
/bin/sh: 1: /home//run.sh: Permission denied
If this error appears when a scheduled job runs, it usually means the script lacks execute permission.
Fix: cd to the directory containing run.sh, then chmod a+x run.sh
That is, grant execute (or read/write) permission to the file named in the error message.
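If the permission fix has to happen from inside a Python job rather than a shell, the same `chmod a+x` can be done with the standard library; `make_executable` is a hypothetical helper name:

```python
import os
import stat

def make_executable(path):
    """Equivalent of `chmod a+x path`: add execute bits for user, group, other."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```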
Df = spark.createDataFrame(Rdd)
ValueError: RDD is empty
When converting an RDD to a DataFrame, the RDD must not be empty.
Data read from HBase is garbled after being turned into an RDD; how to fix it?
.encode('utf-8').decode('utf-8')
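The underlying issue is that in Python 3 HBase values arrive as raw bytes, which print as `b'...'` escapes that look like garbage; decoding them explicitly restores readable text. A minimal sketch (the sample string stands in for a real HBase value):

```python
# A value read from HBase, simulated here as UTF-8 encoded bytes.
raw = "新闻标题".encode("utf-8")
# Decode explicitly to get a readable Python str.
text = raw.decode("utf-8")
```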
DataFrame write to MySQL fails:
java.sql.SQLException: Unknown initial character set index '224' received from server. Initial client character set can be forced via the 'characterEncoding' property
Appending ?useUnicode=true&characterEncoding=UTF-8 to the JDBC URL fixes it.
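For example, the JDBC URL with the encoding parameters appended might be built like this (host, port, and database name are illustrative):

```python
# Hypothetical connection settings; localhost/3306/mydb are examples.
base_url = "jdbc:mysql://localhost:3306/mydb"
jdbc_url = base_url + "?useUnicode=true&characterEncoding=UTF-8"
# Pass jdbc_url as the 'url' argument of the DataFrame JDBC writer.
```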
Python error collection:
print list[0][1][i]
IndexError: list index out of range
Never index past the end of a list. For example, if you ask for 10 words but only 5 were generated, check the length first and iterate only within it.
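One way to stay inside the list's length is to slice instead of indexing, since slicing never raises; `take_safe` is a hypothetical helper name:

```python
def take_safe(items, n):
    """Return at most n items; slicing never raises IndexError."""
    return items[:n]

# Only 5 words were generated even though 10 were requested.
words = ["a", "b", "c", "d", "e"]
for w in take_safe(words, 10):   # iterates over 5 items, no IndexError
    pass
```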
When converting data read via happybase into a dict, a KeyError can occur because some fields are empty:
dict = {'title': d['news:title'].encode('utf-8').decode('utf-8'),
KeyError: 'news:title'
Fix: wrap the access to d['news:title'] in try/except before assigning the value into the dict.
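A sketch of that try/except guard, assuming a row dict as returned by happybase (`safe_field` and the default value are hypothetical):

```python
def safe_field(row, key, default=""):
    """Read a possibly-missing HBase column value, decoding bytes if present.

    Some rows lack certain qualifiers, so a bare row[key] raises KeyError.
    """
    try:
        value = row[key]
    except KeyError:
        return default
    if isinstance(value, bytes):      # happybase hands values back as bytes
        value = value.decode("utf-8")
    return value
```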
dict = {'title': d['news:title'].encode('utf-8').decode('utf-8'),
AttributeError: 'int' object has no attribute 'encode'
Fix: convert the int value to str first, then encode/decode it.
TensorFlow error collection:
tensorflow.python.framework.errors_impl.InternalError:
Failed to create session.
Fix: check GPU usage with nvidia-smi, kill the stale python process, or find a GPU with free memory and submit the job to that one.
(nvidia-smi -L lists all available NVIDIA devices.)
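Pinning the job to a specific free GPU can be done through the CUDA_VISIBLE_DEVICES environment variable, set before TensorFlow is imported; the index "1" below is hypothetical — pick a free card from the nvidia-smi output:

```python
import os

# Pin this process to one GPU *before* importing tensorflow.
# "1" is an example index; choose a free one from `nvidia-smi`.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
```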