Pitfalls encountered importing HBase 1.1.2 data into Elasticsearch 2.3.4 with MapReduce

My manager asked me to import data from HBase into ES with a MapReduce job, redesigning the storage structure in ES along the way. Being a newcomer who had never touched ES before, I ran into the following pitfalls, which I summarize here:

First, let me recommend two blog posts. By following them, plus a hundred-odd experiments of my own, I eventually finished the task. Many thanks to both authors for sharing.

1.https://blog.csdn.net/fxsdbt520/article/details/53893421?utm_source=itdadao&utm_medium=referral 

2.https://blog.csdn.net/u014231523/article/details/52816218 

Here are the pitfalls I personally ran into:

After setting up the basic project following blog post 1, running it throws the following error:

Exception in thread "main" org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.hbase.zookeeper.MetaTableLocator
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.translateException(RpcRetryingCaller.java:229)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:202)
	at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
	at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)
	at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
	at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)
	at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:821)
	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:193)
	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:89)
	at org.apache.hadoop.hbase.client.MetaScanner.allTableRegions(MetaScanner.java:324)
	at org.apache.hadoop.hbase.client.HRegionLocator.getAllRegionLocations(HRegionLocator.java:88)
	at org.apache.hadoop.hbase.util.RegionSizeCalculator.init(RegionSizeCalculator.java:94)
	at org.apache.hadoop.hbase.util.RegionSizeCalculator.<init>(RegionSizeCalculator.java:81)
	at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:256)
	at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:237)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Unknown Source)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
	at org.eminem.hadoop.ESInitCall.run(ESInitCall.java:51)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.eminem.hadoop.ESInitCall.main(ESInitCall.java:75)
Caused by: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.hbase.zookeeper.MetaTableLocator
	at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:596)
	at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:580)
	at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:559)
	at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61)
	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1185)
	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1152)
	at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:300)
	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:151)
	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
	... 26 more

This error is caused by a Guava version conflict between HBase and ES: HBase depends on guava-12.0.1, while ES depends on guava-18.0. After half a day of searching online, the solution I ended up with was to create a separate Maven project that depends only on ES, so that the HBase and ES dependencies are isolated from each other.
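To see exactly which Guava versions each side pulls in, Maven's dependency tree can be inspected in each project. This is a standard Maven command, not something from the original post:

```shell
# Show every dependency path through which a Guava artifact is pulled in
mvn dependency:tree -Dincludes=com.google.guava:guava
```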

This is an empty project containing nothing but a pom.xml; here is its configuration:

<project xmlns="http://maven.apache.org/POM/4.0.0"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>my.elasticsearch</groupId>
	<artifactId>es-shaded</artifactId>
	<version>1.0-SNAPSHOT</version>
	<packaging>jar</packaging>

	<name>es-shaded</name>
	<url>http://maven.apache.org</url>

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
		<elasticsearch.version>2.3.4</elasticsearch.version>
	</properties>

	<dependencies>
		<dependency>
			<groupId>org.elasticsearch</groupId>
			<artifactId>elasticsearch</artifactId>
			<version>${elasticsearch.version}</version>
		</dependency>
	</dependencies>
	
	<build>
		<plugins>
			<plugin>
				<artifactId>maven-compiler-plugin</artifactId>
				<version>2.3.2</version>
				<configuration>
					<source>1.8</source>
					<target>1.8</target>
				</configuration>
			</plugin>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-shade-plugin</artifactId>
				<version>2.4.1</version>
				<configuration>
					<createDependencyReducedPom>false</createDependencyReducedPom>
				</configuration>
				<executions>
					<execution>
						<phase>package</phase>
						<goals>
							<goal>shade</goal>
						</goals>
						<configuration>
							<relocations>
								<relocation>
									<pattern>com.google.guava</pattern>
									<shadedPattern>my.elasticsearch.guava</shadedPattern>
								</relocation>
								<relocation>
									<pattern>org.joda</pattern>
									<shadedPattern>my.elasticsearch.joda</shadedPattern>
								</relocation>
								<relocation>
									<pattern>com.google.common</pattern>
									<shadedPattern>my.elasticsearch.common</shadedPattern>
								</relocation>
								<relocation>
									<pattern>org.elasticsearch</pattern>
									<shadedPattern>my.elasticsearch</shadedPattern>
								</relocation>
							</relocations>
							<transformers>
								<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer" />
							</transformers>
						</configuration>
					</execution>
				</executions>
			</plugin>
		</plugins>
	</build>
</project>

Once configured, run Maven clean, update, and package; a jar is generated in the Maven build output directory. Then close this shaded project and reference its jar from the main project's pom.xml.
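One way to reference the jar (a sketch, assuming the shaded project was first installed to the local Maven repository with `mvn install`; the coordinates are taken from the pom above) is a regular dependency in the main project's pom.xml:

```xml
<!-- Assumes the es-shaded project was installed locally with `mvn install` -->
<dependency>
	<groupId>my.elasticsearch</groupId>
	<artifactId>es-shaded</artifactId>
	<version>1.0-SNAPSHOT</version>
</dependency>
```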

Note:

This relocation configuration is essential: in the main project you must change the package names of the imported ES classes accordingly. For example:
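The original screenshot of the example is missing here. As a sketch of what the renaming looks like: given the `org.elasticsearch` → `my.elasticsearch` relocation in the pom above, an import of the ES 2.x `TransportClient` (used here purely as an illustration) would change like this:

```java
// Before: the original Elasticsearch package
// import org.elasticsearch.client.transport.TransportClient;

// After: the relocated package produced by the shade plugin
import my.elasticsearch.client.transport.TransportClient;
```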

With the basic problem solved, the program ran fine in Eclipse, but on the cluster it kept failing with

Error: FAIL_ON_SYMBOL_HASH_OVERFLOW 

This had me stumped. Checking the cluster logs, the error was:

2018-08-14 14:51:15,802 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchFieldError: FAIL_ON_SYMBOL_HASH_OVERFLOW
	at my.elasticsearch.common.xcontent.json.JsonXContent.<clinit>(JsonXContent.java:49)
	at my.elasticsearch.common.xcontent.XContentFactory.contentBuilder(XContentFactory.java:122)
	at my.elasticsearch.action.index.IndexRequest.source(IndexRequest.java:382)
	at my.elasticsearch.action.index.IndexRequest.source(IndexRequest.java:372)
	at my.elasticsearch.action.update.UpdateRequest.doc(UpdateRequest.java:472)
	at my.elasticsearch.action.update.UpdateRequestBuilder.setDoc(UpdateRequestBuilder.java:163)
	at org.eminem.hadoop.mapper.ESInitMapper.map(ESInitMapper.java:135)
	at org.eminem.hadoop.mapper.ESInitMapper.map(ESInitMapper.java:1)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Searching online, every answer said the Jackson version was wrong, but no matter which Jackson versions I added to the two projects' pom files, nothing worked. Finally I tried the same Maven relocation trick as before and added the following to the ES project's pom:

	<relocation>
		<pattern>com.fasterxml.jackson</pattern>
		<shadedPattern>my.elasticsearch</shadedPattern>
	</relocation>

After adding this configuration, the resulting file is:
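The screenshot of the resulting file is missing; combining the original relocations with the new Jackson entry, the `<relocations>` section of the shaded project's pom should look like this:

```xml
<relocations>
	<relocation>
		<pattern>com.google.guava</pattern>
		<shadedPattern>my.elasticsearch.guava</shadedPattern>
	</relocation>
	<relocation>
		<pattern>org.joda</pattern>
		<shadedPattern>my.elasticsearch.joda</shadedPattern>
	</relocation>
	<relocation>
		<pattern>com.google.common</pattern>
		<shadedPattern>my.elasticsearch.common</shadedPattern>
	</relocation>
	<relocation>
		<pattern>org.elasticsearch</pattern>
		<shadedPattern>my.elasticsearch</shadedPattern>
	</relocation>
	<relocation>
		<pattern>com.fasterxml.jackson</pattern>
		<shadedPattern>my.elasticsearch</shadedPattern>
	</relocation>
</relocations>
```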

With this configuration in place, I rebuilt and repackaged the project, deployed it to the cluster, and it finally ran successfully!


Below is my code, adapted from blog post 1, shared on Gitee:

https://gitee.com/zhangxiaoze/hbaseToEs
