My manager handed me a task: export the data in HBase into Elasticsearch via MapReduce, redesigning the storage structure on the ES side along the way. As a newcomer touching ES for the first time, I ran into the pitfalls below, so I am writing them up here.
First, let me recommend two blog posts. By following them, plus well over a hundred rounds of my own trial and error, I eventually finished the task. Many thanks to both authors for sharing.
1.https://blog.csdn.net/fxsdbt520/article/details/53893421?utm_source=itdadao&utm_medium=referral
2.https://blog.csdn.net/u014231523/article/details/52816218
Here are the pitfalls I personally hit:
Following the first blog post, I got the basic project set up, but running it threw the error below:
Exception in thread "main" org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.hbase.zookeeper.MetaTableLocator
at org.apache.hadoop.hbase.client.RpcRetryingCaller.translateException(RpcRetryingCaller.java:229)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:202)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:821)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:193)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:89)
at org.apache.hadoop.hbase.client.MetaScanner.allTableRegions(MetaScanner.java:324)
at org.apache.hadoop.hbase.client.HRegionLocator.getAllRegionLocations(HRegionLocator.java:88)
at org.apache.hadoop.hbase.util.RegionSizeCalculator.init(RegionSizeCalculator.java:94)
at org.apache.hadoop.hbase.util.RegionSizeCalculator.<init>(RegionSizeCalculator.java:81)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:256)
at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:237)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at org.eminem.hadoop.ESInitCall.run(ESInitCall.java:51)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.eminem.hadoop.ESInitCall.main(ESInitCall.java:75)
Caused by: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.hbase.zookeeper.MetaTableLocator
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:596)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:580)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:559)
at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1185)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1152)
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:300)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:151)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
... 26 more
This error is caused by a Guava version conflict between HBase and Elasticsearch: HBase depends on guava-12.0.1 while ES depends on guava-18.0 (newer Guava releases made the Stopwatch constructor non-public, hence the IllegalAccessError). After searching online for half a day, the fix that finally worked was to create a separate Maven project that depends only on the ES jar, shading it so that HBase and ES stay apart, as follows.
This is an empty project with nothing but a pom.xml; its full contents are:
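To confirm which Guava versions the two dependencies actually pull in, Maven's dependency tree is a quick diagnostic (a sketch; run it inside the main project before splitting things up):

```shell
# List every com.google.guava:guava artifact on the classpath.
# In this setup you would expect guava-12.0.1 (via hbase-client)
# and guava-18.0 (via elasticsearch) to both show up.
mvn dependency:tree -Dincludes=com.google.guava:guava
```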
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>my.elasticsearch</groupId>
    <artifactId>es-shaded</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>
    <name>es-shaded</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <elasticsearch.version>2.3.4</elasticsearch.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>${elasticsearch.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.3.2</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.1</version>
                <configuration>
                    <createDependencyReducedPom>false</createDependencyReducedPom>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <relocations>
                                <relocation>
                                    <pattern>com.google.guava</pattern>
                                    <shadedPattern>my.elasticsearch.guava</shadedPattern>
                                </relocation>
                                <relocation>
                                    <pattern>org.joda</pattern>
                                    <shadedPattern>my.elasticsearch.joda</shadedPattern>
                                </relocation>
                                <relocation>
                                    <pattern>com.google.common</pattern>
                                    <shadedPattern>my.elasticsearch.common</shadedPattern>
                                </relocation>
                                <relocation>
                                    <pattern>org.elasticsearch</pattern>
                                    <shadedPattern>my.elasticsearch</shadedPattern>
                                </relocation>
                            </relocations>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer" />
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
Once this is configured, run Maven clean, update, and package; a shaded jar is produced in the project's build output directory. Then close this helper project and reference the generated jar from the main project's pom.xml.
Note:
This step is essential: in the main project you must also change the package names of every imported Elasticsearch class to the relocated ones.
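One way to wire this up (a sketch, assuming the shaded project was installed into the local repository with `mvn clean install` rather than referenced by file path) is a plain dependency on its coordinates in the main project's pom.xml:

```xml
<!-- In the main project's pom.xml: pull in the shaded ES jar built above.
     The coordinates match the es-shaded project shown earlier. -->
<dependency>
    <groupId>my.elasticsearch</groupId>
    <artifactId>es-shaded</artifactId>
    <version>1.0-SNAPSHOT</version>
</dependency>
```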
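For instance (a sketch derived from the relocation rules above; the exact class list depends on your own code), every Elasticsearch import in the main project switches from the original prefix to the shaded one:

```java
// Before shading: the original Elasticsearch package names
// import org.elasticsearch.client.transport.TransportClient;
// import org.elasticsearch.common.settings.Settings;

// After shading: the relocated package names from the es-shaded jar
// (org.elasticsearch -> my.elasticsearch per the relocation rule)
import my.elasticsearch.client.transport.TransportClient;
import my.elasticsearch.common.settings.Settings;
```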
With the basic problem solved, the program ran fine in Eclipse, but on the cluster it kept failing with
Error: FAIL_ON_SYMBOL_HASH_OVERFLOW
This had me stumped. The cluster logs showed the full error:
2018-08-14 14:51:15,802 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchFieldError: FAIL_ON_SYMBOL_HASH_OVERFLOW
at my.elasticsearch.common.xcontent.json.JsonXContent.<clinit>(JsonXContent.java:49)
at my.elasticsearch.common.xcontent.XContentFactory.contentBuilder(XContentFactory.java:122)
at my.elasticsearch.action.index.IndexRequest.source(IndexRequest.java:382)
at my.elasticsearch.action.index.IndexRequest.source(IndexRequest.java:372)
at my.elasticsearch.action.update.UpdateRequest.doc(UpdateRequest.java:472)
at my.elasticsearch.action.update.UpdateRequestBuilder.setDoc(UpdateRequestBuilder.java:163)
at org.eminem.hadoop.mapper.ESInitMapper.map(ESInitMapper.java:135)
at org.eminem.hadoop.mapper.ESInitMapper.map(ESInitMapper.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Every answer I found online blamed a wrong Jackson version, but adding all sorts of Jackson versions to both projects' pom files changed nothing. (FAIL_ON_SYMBOL_HASH_OVERFLOW is a JsonFactory.Feature introduced in newer jackson-core releases; an older jackson-core on the cluster's task classpath shadows the one the shaded ES client expects.) In the end I tried the same Maven relocation trick as before and added the following to the ES project's pom.xml:
<relocation>
    <pattern>com.fasterxml.jackson</pattern>
    <shadedPattern>my.elasticsearch</shadedPattern>
</relocation>
After adding this relocation, I rebuilt and repackaged the project, deployed it to the cluster again, and this time it ran successfully!
Below is my code, adapted from the first blog post, which I have shared on Gitee (码云):