Background
When copying HBase snapshots across clusters, the export job frequently fails with a FileNotFoundException on a path under /hbase/.tmp/data/xxx.
Below, the failure scenario is reproduced, the root cause is analyzed, and some common workarounds are given:
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File /datafs/.tmp/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3 not found.
at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:119)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:419)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:107)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:595)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.connect(WebHdfsFileSystem.java:1855)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:673)
... 23 more
18/08/13 20:14:14 INFO mapreduce.Job: map 100% reduce 0%
18/08/13 20:14:14 INFO mapreduce.Job: Job job_1533546266978_0038 failed with state FAILED due to: Task failed task_1533546266978_0038_m_000000
Root cause
Between the time the snapshot is created and the time the cross-cluster copy runs, some StoreFiles are moved, so they can no longer be resolved at their recorded locations (a bug that shows up when using webhdfs).
Reproducing the scenario
Preparation
- Environment:
Source cluster: HBase 1.2.0-cdh5.10.0
Target cluster: HBase 1.2.0-cdh5.12.1
1. Create table mytable with 2 regions split at '03' and one column family info, then Put 6 rows (a Java-API equivalent is sketched after the puts below)
put 'mytable','01','info:age','1'
put 'mytable','02','info:age','2'
put 'mytable','03','info:age','3'
put 'mytable','04','info:age','1'
put 'mytable','05','info:age','1'
put 'mytable','06','info:age','1'
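For reference, a minimal sketch of the same table setup through the HBase 1.2 Java Admin API (the original reproduction used the shell; this snippet is illustrative only and uses the standard client classes):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
try (Connection conn = ConnectionFactory.createConnection(conf);
     Admin admin = conn.getAdmin()) {
  HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("mytable"));
  desc.addFamily(new HColumnDescriptor("info"));
  // Pre-split at "03" so the table starts with 2 regions.
  admin.createTable(desc, new byte[][] { Bytes.toBytes("03") });
}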
2. Create snapshot mysnapshot, which generates the following files:
[root@test108 ~]# hdfs dfs -ls /datafs/.hbase-snapshot/mysnapshot/
Found 2 items
-rw-r--r-- 2 hbase hbase 32 2018-08-13 18:48 /datafs/.hbase-snapshot/mysnapshot/.snapshotinfo
-rw-r--r-- 2 hbase hbase 466 2018-08-13 18:48 /datafs/.hbase-snapshot/mysnapshot/data.manifest
- .snapshotinfo contains the snapshot metadata, i.e. an HBaseProtos.SnapshotDescription object:
name: "mysnapshot"
table: "mytable"
creation_time: 1533774121010
type: FLUSH
version: 2
- data.manifest
contains the table schema, attributes, and column families as per-region manifest entries; the important part is the store_files information:
region_info {
region_id: 1533784567273
table_name {
namespace: "default"
qualifier: "mytable"
}
start_key: "03"
end_key: ""
offline: false
split: false
replica_id: 0
}
family_files {
family_name: "info"
store_files {
name: "3c5e9ec890f04560a396040fa8b592a3"
file_size: 1115
}
}
3. Modify data
Use Put to update the rows of one of the regions:
put 'mytable','04','info:age','4'
put 'mytable','05','info:age','5'
put 'mytable','06','info:age','6'
4. Run flush and major_compact
to simulate the major/minor compactions that can happen while the cross-cluster copy is running:
hbase(main):001:0> flush 'mytable'
0 row(s) in 0.8200 seconds
hbase(main):002:0> major_compact 'mytable'
0 row(s) in 0.1730 seconds
At this point the StoreFile 3c5e9ec890f04560a396040fa8b592a3 has been moved under archive:
[root@test108 ~]# hdfs dfs -ls -R /datafs/archive/data/default/mytable/c48642fecae3913e0d09ba236b014667
drwxr-xr-x - hbase hbase 0 2018-08-15 08:30 /datafs/archive/data/default/mytable/c48642fecae3913e0d09ba236b014667/info
-rw-r--r-- 2 hbase hbase 1115 2018-08-13 18:48 /datafs/archive/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
Reproducing the error
[root@a2502f06 ~]# hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
> -Dipc.client.fallback-to-simple-auth-allowed=true \
> -Dmapreduce.job.queuename=root.default \
> -snapshot mysnapshot \
> -copy-from webhdfs://archive.cloudera.com/datafs \
> -copy-to webhdfs://nameservice1/hbase/ \
> -chuser hbase -chgroup hbase -chmod 755 -overwrite
The console reports FileNotFoundException and the job fails:
18/08/13 20:59:34 INFO mapreduce.Job: Task Id : attempt_1533546266978_0037_m_000000_0, Status : FAILED
Error: java.io.FileNotFoundException: File /datafs/.tmp/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3 not found.
at sun.reflect.GeneratedConstructorAccessor14.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:450)
Source code analysis
1. Before copying, ExportSnapshot first copies .snapshotinfo and data.manifest to .hbase-snapshot/.tmp/mysnapshot on the target cluster:
[root@a2502f06 ~]# hdfs dfs -ls /hbase/.hbase-snapshot/.tmp/mysnapshot
Found 2 items
-rwxr-xr-x 2 hbase hbase 32 2018-08-13 20:28 /hbase/.hbase-snapshot/.tmp/mysnapshot/.snapshotinfo
-rwxr-xr-x 2 hbase hbase 466 2018-08-13 20:28 /hbase/.hbase-snapshot/.tmp/mysnapshot/data.manifest
2. It then parses data.manifest and builds the logical splits, one per StoreFile. Each map task reads one SnapshotFileInfo, which carries only the HFileLink information, not a concrete path:
String region = regionInfo.getEncodedName();
String hfile = storeFile.getName();
Path path = HFileLink.createPath(table, region, family, hfile);
SnapshotFileInfo fileInfo = SnapshotFileInfo.newBuilder()
.setType(SnapshotFileInfo.Type.HFILE)
.setHfile(path.toString())
.build();
3. Map phase
For each SnapshotFileInfo it reads, the mapper assembles the 4 possible locations of the StoreFile, which are probed in the following order:
/datafs/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
/datafs/.tmp/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
/datafs/mobdir/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
/datafs/archive/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
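These candidates come from the HFileLink wrapping the StoreFile reference. A paraphrased sketch of how the four locations are assembled (compare HFileLink in the HBase source; variable names here are illustrative, the directory layout matches the paths above):
import org.apache.hadoop.fs.Path;

// The relative part "data/<namespace>/<table>/<region>/<family>/<hfile>" is the
// same for every candidate; only the root it is resolved against differs.
Path relative    = new Path("data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3");
Path rootDir     = new Path("/datafs");
Path originPath  = new Path(rootDir, relative);                      // /datafs/data/...
Path tempPath    = new Path(new Path(rootDir, ".tmp"), relative);    // /datafs/.tmp/data/...
Path mobPath     = new Path(new Path(rootDir, "mobdir"), relative);  // /datafs/mobdir/data/...
Path archivePath = new Path(new Path(rootDir, "archive"), relative); // /datafs/archive/data/...
// FileLink.getLocations() returns them in exactly this order:
// origin (data), .tmp, mobdir, archive.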
When the map task starts reading, ExportSnapshot.ExportMapper#openSourceFile initializes the InputStream; during that, FileLink.tryOpen() is called to determine the real location of the StoreFile (it iterates over the 4 paths; an exception means the file is not at that location, so the next one is tried):
private FSDataInputStream tryOpen() throws IOException {
for (Path path: fileLink.getLocations()) {
if (path.equals(currentPath)) continue;
try {
in = fs.open(path, bufferSize);
if (pos != 0) in.seek(pos);
assert(in.getPos() == pos) : "Link unable to seek to the right position=" + pos;
currentPath = path;
return(in);
} catch (FileNotFoundException e) {
// Try another file location
}
}
throw new FileNotFoundException("Unable to open link: " + fileLink);
}
Debugging shows that fs here is an org.apache.hadoop.hdfs.web.WebHdfsFileSystem instance.
Unfortunately, with WebHdfsFileSystem neither open() nor getPos() throws an exception here for a missing file, so the very first candidate is accepted (even though the file actually lives under archive):
/datafs/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
This path is then set as currentPath (it is remembered so the next pass can skip it).
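A minimal illustration of that behaviour, using the URI and path from this reproduction (this snippet is not from the original post): with webhdfs the actual OPEN request is deferred, so opening a non-existent path succeeds and the FileNotFoundException only appears on the first read().
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Open a path that no longer exists under data/ (the file has been archived).
FileSystem fs = FileSystem.get(URI.create("webhdfs://archive.cloudera.com"), new Configuration());
FSDataInputStream in = fs.open(new Path(
    "/datafs/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3"));
// No exception so far; the FileNotFoundException only surfaces here:
in.read();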
When InputStream.read(buffer) is invoked on that stream, FileLink's read() runs:
@Override
public int read() throws IOException {
int res;
try {
res = in.read();
} catch (FileNotFoundException e) {
res = tryOpen().read();
} catch (NullPointerException e) { // HDFS 1.x - DFSInputStream.getBlockAt()
res = tryOpen().read();
} catch (AssertionError e) { // assert in HDFS 1.x - DFSInputStream.getBlockAt()
res = tryOpen().read();
}
if (res > 0) pos += 1;
return res;
}
Because the stream was not initialized with the correct path, the first in.read() throws FileNotFoundException (the first one).
The catch block then calls tryOpen().read() to walk the 4 locations again; currentPath is the data path and is skipped, so the next location is tried (the file is not there either):
/datafs/.tmp/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
Since fs.open() over webhdfs again succeeds, tryOpen() returns a stream for the .tmp path; its read() throws FileNotFoundException (the second one), which is no longer caught, so it propagates up and the task fails. This is why the error usually complains about a file missing under .tmp, even though .tmp has little to do with the real problem.
2018-08-13 20:13:59,738 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /datafs/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
2018-08-13 20:13:59,740 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /datafs/.tmp/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
2018-08-13 20:13:59,741 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /datafs/mobdir/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
--------------------------------------------------------------------------------------------------------------------------------------------
2018-08-13 20:13:59,830 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:java.io.FileNotFoundException: File /datafs/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3 not found.
2018-08-13 20:13:59,833 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:java.io.FileNotFoundException: File /datafs/.tmp/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3 not found.
--------------------------------------------------------------------------------------------------------------------------------------------
2018-08-13 20:13:59,833 ERROR [main] org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper: Error copying webhdfs://archive.cloudera.com/datafs/archive/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3 to webhdfs://nameservice1/hbase/archive/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
java.io.FileNotFoundException: File /datafs/.tmp/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3 not found.
at sun.reflect.GeneratedConstructorAccessor14.newInstance(Unknown Source)
The FileNotFoundExceptions between the two separators are the two exceptions thrown during read().
The "File does not exist" warnings above the separator come from ExportSnapshot's getSourceFileStatus() call; there you can see that after probing data, .tmp and mobdir it does reach the correct archive path (the successful lookup is not logged).
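The reason the status lookup succeeds where open() fails is that fs.getFileStatus() raises FileNotFoundException immediately for a missing path, even over webhdfs, so iterating the same four locations falls through to archive. A paraphrased sketch of that loop (compare FileLink#getFileStatus in the HBase source; fs and fileLink as in the tryOpen() snippet above, not a verbatim copy):
// Probing by status: every miss throws right away, so the iteration
// reaches the archive location instead of stopping at data/.tmp.
for (Path path : fileLink.getLocations()) {
  try {
    return fs.getFileStatus(path);   // throws FileNotFoundException for data, .tmp, mobdir
  } catch (FileNotFoundException e) {
    // try the next location
  }
}
throw new FileNotFoundException("Unable to open link: " + fileLink);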
Solutions
To sum up: because of the behaviour above, the lookup in practice only ever checks the data and .tmp directories and never reaches archive.
So there are two directions: either keep the StoreFiles from ending up under archive in the first place, or make the lookup resolve the archive path correctly.
Avoid the StoreFile ending up in archive
From production experience, while data is being written heavily a region keeps producing StoreFiles; once their number reaches the threshold, a major/minor compaction is triggered,
and the compacted-away StoreFiles are moved into archive. The following measures help avoid a compaction during the copy:
- Run major_compact on the table before creating the snapshot (see the sketch after this list)
- If the table can tolerate being unavailable for a while (from a few minutes to tens of minutes), disable it before the operation
- Or raise hbase.hstore.compactionThreshold appropriately (when writes to the table are infrequent)
- Depending on the workload, leave as large a gap as possible between data ingestion and the copy (so that compactions have already finished on their own)
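A minimal sketch of the first measure through the Java Admin API (the original post used the shell; note that majorCompact() is asynchronous, so in practice confirm the compaction has finished, e.g. via the compaction state or the UI, before taking the snapshot):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;

Configuration conf = HBaseConfiguration.create();
try (Connection conn = ConnectionFactory.createConnection(conf);
     Admin admin = conn.getAdmin()) {
  TableName table = TableName.valueOf("mytable");
  admin.majorCompact(table);           // asynchronous: only requests the compaction
  // ... wait for the compaction to complete ...
  admin.snapshot("mysnapshot", table); // then snapshot the compacted table
}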
Avoid using webhdfs
With an hdfs:// source the exception is thrown as expected at open time, so the lookup proceeds correctly (not verified in detail here).
Fix the bug in the source code
so that the path lookup can correctly fall through to the archive directory.
Borrowing from getSourceFileStatus(), add a fs.getFileStatus() call inside the for loop so that a FileNotFoundException is thrown as expected while iterating:
private FSDataInputStream tryOpen() throws IOException {
for (Path path : fileLink.getLocations()) {
if (path.equals(currentPath)) continue;
try {
fs.getFileStatus(path); // added: forces a FileNotFoundException here if the path does not exist
in = fs.open(path, bufferSize);
if (pos != 0) in.seek(pos);
assert(in.getPos() == pos) : "Link unable to seek to the right position=" + pos;
if (LOG.isTraceEnabled()) {
if (currentPath == null) {
LOG.debug("link open path=" + path);
} else {
LOG.trace("link switch from path=" + currentPath + " to path=" + path);
}
}
currentPath = path;
return(in);
} catch (FileNotFoundException e) {
// Try another file location
}
}
throw new FileNotFoundException("Unable to open link: " + fileLink);
}
Extract ExportSnapshot into its own module and reorganize its HFileLink, FileLink and WALLink dependencies,
then package it as a hadoop jar so that the change does not affect anything else.