Summary of Errors Encountered While Debugging Apache Griffin

This post collects the errors I have run into while working with Apache Griffin, including missing database tables, Spring Boot jar problems, Livy errors, MySQL failing to open files, Spark startup exceptions, batch kill commands not working, Elasticsearch startup errors, Scala version conflicts, Griffin configuration errors, ClassNotFound, Zookeeper timeouts, failure to register the Spark-ES source, version compatibility issues, and Gson parsing exceptions. For each one a fix is given, such as checking database column lengths, editing configuration files, or updating dependency versions.

1. Database error: Table 'quartz.DATACONNECTOR' doesn't exist

2021-01-18 14:54:54.135 ERROR 122541 --- [http-nio-8081-exec-8] o.a.c.c.C.[.[.[.[dispatcherServlet]     [175] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is org.springframework.transaction.TransactionSystemException: Could not commit JPA transaction; nested exception is javax.persistence.RollbackException: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.0.v20150309-bf26070): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'quartz.DATACONNECTOR' doesn't exist
Error Code: 1146
Call: INSERT INTO DATACONNECTOR (ID, CONFIG, CREATEDDATE, DATAFRAMENAME, DATATIMEZONE, DATAUNIT, MODIFIEDDATE, NAME, TYPE, VERSION) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        bind => [7, {"database":"griffin_demo","table.name":"demo_tgt","where":"dt=#YYYYMMdd# AND hour=#HH#"}, 1610952894112, null, GMT+8, 1hour, null, target1610952607162, HIVE, 1.2]
Query: InsertObjectQuery(DataConnector{name=target1610952607162type=HIVE, version='1.2', config={"database":"griffin_demo","table.name":"demo_tgt","where":"dt=#YYYYMMdd# AND hour=#HH#"}})] with root cause

com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'quartz.DATACONNECTOR' doesn't exist

This one needs to be looked at case by case, but most of the time the value being inserted is longer than the database column allows, so JPA fails to create the table and every later insert then fails with "table doesn't exist". You can try editing the DataConnector class in the source:

    @JsonIgnore
    @Transient
    private String defaultDataUnit = "365000d";

    // add columnDefinition = "TEXT" so the column is created as TEXT
    @JsonIgnore
    @Column(length = 20480, columnDefinition = "TEXT")
    private String config;

    @Transient
    private Map<String, Object> configMap;

DataConnector lives at ./service/src/main/java/org/apache/griffin/core/measure/entity/DataConnector.java.
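
Before or after patching the entity, it helps to confirm what actually exists on the MySQL side. A quick check from the shell, with the quartz schema taken from the error message above and the credentials as placeholders:

# does the table exist at all, and how is the CONFIG column currently defined?
mysql -u root -p -e "SHOW TABLES LIKE 'DATACONNECTOR';" quartz
mysql -u root -p -e "SHOW COLUMNS FROM DATACONNECTOR;" quartz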

2. spring-boot-01-helloworld-1.0-SNAPSHOT.jar has no main manifest attribute

When compiling and installing the Griffin source with Maven, the resulting jar may be missing the main manifest attribute. Add the Spring Boot build plugin to the pom and run mvn install again:

<build>
  <plugins>
    <plugin>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-maven-plugin</artifactId>
      <executions>
        <execution>
          <goals>
            <goal>repackage</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

In fact griffin-0.7.0 already declares this plugin; you only need to change the goal to repackage, then remember to clean and install again to rebuild.
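
The rebuild itself is the usual Maven cycle; a sketch run from the Griffin source root (the path and the -DskipTests flag are only illustrative):

cd griffin-0.7.0              # wherever your Griffin source checkout lives
mvn clean install -DskipTests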

3. Livy error: Could not find Livy jars directory

This is most likely because the wrong package was downloaded: the incubating-livy source zip from the official site contains no jars. You need to download and install the livy-server-0.x.x binary package instead.
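
Once unpacked, the two are easy to tell apart (file names below are illustrative; adjust to the version you actually downloaded). The binary release contains the jars/ directory the error is complaining about, while the source-only zip does not:

unzip apache-livy-*-bin.zip
ls apache-livy-*-bin/jars | head      # should list the Livy server jars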

4. MySQL error: Failed to open file 'xxx.sql', error

Paths typed inside the mysql client are resolved relative to the directory you started mysql from on Linux. If, say, we launched MySQL from /usr/local/tomcat, then every path we give is looked up under /usr/local/tomcat: the source /sqlfile/xxx.sql; we wrote earlier is, from MySQL's point of view, an instruction to find /usr/local/tomcat/sqlfile/xxx.sql. So when you want to load a .sql file with mysql, first cd into the directory that holds it, then start MySQL and run source xxx.sql;
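
Put together, the working sequence looks like this (paths are placeholders):

cd /path/to/sqlfile           # directory that actually contains the script
mysql -u root -p
mysql> source xxx.sql;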

5. Spark startup error: java.net.ConnectException: Call From xxx to xxx:8020 failed on connection exception: Connection refused

1. Make sure the firewall is off; otherwise many of the ports you have configured cannot be reached.

systemctl stop firewalld
systemctl status firewalld
systemctl disable firewalld

2. Make sure /etc/hosts contains only the cluster servers' IP-to-hostname mappings; otherwise you have to spell out the full IP every time instead of using the hostname.

vim /etc/hosts
192.168.239.131 Hadoop101
192.168.239.132 Hadoop102
192.168.239.133 Hadoop103

3. Check $HADOOP_HOME/etc/hadoop/core-site.xml (or hdfs-site.xml): be clear about the NameNode host name and, above all, the port of the HDFS NameNode, and make sure it matches the port used by spark.eventLog.dir in spark/conf/spark-defaults.conf and by -Dspark.history.fs.logDirectory in spark/conf/spark-env.sh. If one side says 8020 and the other 9000, they will obviously never connect.

# spark/conf/spark-env.sh
-Dspark.history.fs.logDirectory=hdfs://hadoop101:9000/spark_directory"

# spark/conf/spark-defaults.conf
spark.eventLog.dir               hdfs://hadoop101:9000/spark_directory

<!-- core-site.xml: address of the NameNode in HDFS -->
  <property>
    <name>fs.defaultFS</name>
    <!-- hdfs is the scheme, hadoop101 is the NameNode host name, 9000 is the port -->
    <value>hdfs://hadoop101:9000</value>
  </property>
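
To see which address and port HDFS is really using, so it can be matched against the two Spark settings above, the standard Hadoop CLI can print the effective value:

hdfs getconf -confKey fs.defaultFS    # e.g. hdfs://hadoop101:9000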

6. | xargs kill never seems to kill the process

When setting up a big-data cluster you often write shell scripts to start or stop a group of processes in one go. In my Flume script the flume consumer process would never shut down, possibly because the Kafka behind it had already been stopped. Either way, you can append -9 to the batch | xargs kill to kill the processes forcibly:

"stop"){
   
    for i in hadoop103
    do
        echo " --------停止 $i 消费 flume-------"
        ssh $i "ps -ef | grep kafka-flume-hdfs | grep -v grep |awk '{print \$2}' | xargs kill -9"
        done
};;

A quick breakdown: ps -ef lists every process; grep filters on the given pattern, and -v inverts the match (here it drops the grep command itself); awk '{print $2}' prints the second column, which is the process ID; xargs passes those IDs as arguments to kill, so a whole batch of processes is killed at once, and -9 sends SIGKILL, terminating the process immediately instead of asking it to exit.
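
To confirm the stop branch really worked, a quick follow-up check (host and process pattern taken from the script above); no output means nothing is left running:

ssh hadoop103 "ps -ef | grep kafka-flume-hdfs | grep -v grep"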

7. Elasticsearch 5.2 startup error

Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000c5330000, 986513408, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 986513408 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /usr/local/elasticsearch/hs_err_pid1675.log

Elasticsearch 5.2 allocates a 1 GB JVM heap by default, and my VM does not have that much memory, so shrink the JVM allocation:

# vim config/jvm.options  
-Xms1g  
-Xmx1g  

change it to

-Xms512m  
-Xmx512m  

Spark has JVM memory settings too; as an in-memory compute engine it naturally eats memory at runtime. On a test setup you can also reduce Spark's memory (edit spark.driver.memory in spark/conf/spark-defaults.conf, e.g. to 512m), otherwise it will throw similar errors. Ideally, though, give the VM cluster more memory, especially the host running the Hadoop NameNode, since it has to hold a lot of metadata; if that host also runs compute tasks, give it even more.
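
As a minimal sketch of that change, assuming a default layout under $SPARK_HOME (512m is only meant for a small test VM, not a recommendation):

echo "spark.driver.memory    512m" >> $SPARK_HOME/conf/spark-defaults.conf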

8. Scala version mismatch

Griffin's core code is written in Scala, and Spark itself is built on Scala, so when you run spark-submit to execute a Griffin data-quality job it uses the Scala bundled with Spark, which may not match the Scala version pinned in Griffin's pom. This is easy to overlook and causes errors; the Griffin website mentions it as well. The latest griffin-0.7.0 parent pom pins Scala 2.11, so install Spark 2.3 or 2.4 and do not move up to Spark 3.0, which ships with Scala 2.12 and will fail at runtime.
The scala.binary.version property in pom.xml:

<properties>
        <encoding>UTF-8</encoding>
        <project.build.sourceEncoding>${encoding}</project.build.sourceEncoding>
        <project.reporting.outputEncoding>${encoding}</project.reporting.outputEncoding>

        <java.version>1.8</java.version>
        <scala.binary.version>2.11</scala.binary.version>
        <scala211.binary.version>2.11</scala211.binary.version>
        <scala.version>${scala.binary.version}.0</scala.version>
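
To check which Scala version your Spark installation actually bundles, and compare it with the pom above, the version banner is enough (its exact wording varies a little between Spark builds):

spark-submit --version 2>&1 | grep -i scala    # e.g. "Using Scala version 2.11.12"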

9. java.lang.AssertionError: assertion failed: Connector is undefined or invalid

21/06/28 18:54:28 ERROR measure.Application$: assertion failed: Connector is undefined or invalid
java.lang.AssertionError: assertion failed: Connector is undefined or invalid
        at scala.Predef$.assert(Predef.scala:170)
        at org.apache.griffin.measure.configuration.dqdefinition.DataSourceParam.validate(DQConfig.scala:100)
        at org.apache.griffin.measure.configuration.dqdefinition.DQConfig$$anonfun$validate$5.apply(DQConfig.scala:74)
        at org.apache.griffin.measure.configuration.dqdefinition.DQConfig$$anonfun$validate$5.apply(DQConfig.scala:74)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at org.apache.griffin.measure.configuration.dqdefinition.DQConfig.validate(DQConfig.scala:74)
        at org.apache.griffin.measure.configuration.dqdefinition.reader.ParamReader$class.validate(ParamReader.scala:43)
        at org.apache.griffin.measure.configuration.dqdefinition.reader.ParamFileReader.validate(ParamFileReader.scala:33)
        at org.apache.griffin.measure.configuration.dqdefinition.reader.ParamFileReader$$anonfun$readConfig$1.apply(ParamFileReader.scala:40)
        at org.apache.griffin.measure.configuration.dqdefinition.reader.ParamFileReader$$anonfun$readConfig$1.apply(ParamFileReader.scala:36)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.griffin.measure.configuration.dqdefinition.reader.ParamFileReader.readConfig(ParamFileReader.scala:36)
        at org.apache.griffin.measure.Application$.readParamFile(Application.scala:127)
        at org.apache.griffin.measure.Application$.main(Application.scala:61)
        at org.apache.griffin.measure.Application.main(Application.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
      