Reference: the 尚硅谷 (Atguigu) Hadoop tutorial
Link: http://i8n.cn/U4sxdj
- Concepts
After the NameNode is formatted, the following files are created in the /opt/module/hadoop-2.7.2/data/tmp/dfs/name/current directory:
fsimage_0000000000000000000
fsimage_0000000000000000000.md5
seen_txid
VERSION
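(1) Fsimage file: a persistent checkpoint of the HDFS file system metadata.
It contains the serialized inode information for every directory and file in HDFS.
(2) Edits file: a log of every update operation made to the HDFS file system. Every write operation performed by a file system client is first recorded in the Edits file.
(3) seen_txid file: holds a single number, the transaction ID that appears in the name of the last edits_ file.
(4) Each time the NameNode starts, it reads the Fsimage file into memory and then replays the update operations recorded in Edits, so that the in-memory metadata is current and consistent. In effect, the NameNode merges the Fsimage and Edits files at startup.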
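Point (4) can be illustrated with a toy model. The sketch below is not Hadoop code: the dict-based namespace, the `apply_edit` and `start_namenode` helpers, and the simplified operation tuples are all hypothetical, chosen only to show how a checkpoint plus a replayed log yields the current metadata.

```python
# Toy model of NameNode startup: load a checkpoint, then replay logged edits.
# Everything here is illustrative; real Fsimage/Edits files are binary and far richer.

def apply_edit(namespace, edit):
    """Apply one simplified edit record (op, args...) to the in-memory namespace."""
    op = edit[0]
    if op == "ADD":            # create a file entry
        _, path = edit
        namespace[path] = {"blocks": []}
    elif op == "ADD_BLOCK":    # attach a block id to an existing file
        _, path, block_id = edit
        namespace[path]["blocks"].append(block_id)
    elif op == "RENAME":       # move an entry to a new path
        _, src, dst = edit
        namespace[dst] = namespace.pop(src)
    elif op == "DELETE":       # remove an entry if it exists
        _, path = edit
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown op: {op}")


def start_namenode(fsimage, edits):
    """Return the up-to-date namespace: checkpoint state plus all later edits."""
    # Copy the checkpoint so the "on-disk" image is left untouched.
    namespace = {path: {"blocks": list(meta["blocks"])} for path, meta in fsimage.items()}
    for edit in edits:         # edits must be replayed in transaction order
        apply_edit(namespace, edit)
    return namespace


# Checkpoint taken while the file system was still empty (assumed for this demo).
fsimage = {}
# Simplified edits logged afterwards for an upload of /xinyue.txt.
edits = [
    ("ADD", "/xinyue.txt._COPYING_"),
    ("ADD_BLOCK", "/xinyue.txt._COPYING_", 1073741825),
    ("RENAME", "/xinyue.txt._COPYING_", "/xinyue.txt"),
]
current = start_namenode(fsimage, edits)
print(current)
```

The checkpoint alone knows nothing about /xinyue.txt; only after the edits are replayed does the in-memory namespace contain the file and its block.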
- Viewing the Fsimage file with oiv
(1) Locate the oiv and oev commands
[hadoop@hadoop102 current]$ hdfs
oiv apply the offline fsimage viewer to an fsimage
oiv_legacy apply the offline fsimage viewer to an legacy fsimage
oev apply the offline edits viewer to an edits file
(2) Basic syntax
hdfs oiv -p <file type> -i <image file> -o <output path for the converted file>
(3) Hands-on example
[hadoop@hadoop102 current]$ pwd
/opt/module/hadoop-2.7.2/data/tmp/dfs/name1/current
[hadoop@hadoop102 current]$ ll
total 1064
-rw-rw-r--. 1 hadoop hadoop 42 Aug 16 02:44 edits_0000000000000000001-0000000000000000002
-rw-rw-r--. 1 hadoop hadoop 534 Aug 16 03:44 edits_0000000000000000003-0000000000000000010
-rw-rw-r--. 1 hadoop hadoop 1048576 Aug 16 03:44 edits_inprogress_0000000000000000011
-rw-rw-r--. 1 hadoop hadoop 991 Aug 16 03:46 fsiamge.xml
-rw-rw-r--. 1 hadoop hadoop 353 Aug 16 02:44 fsimage_0000000000000000002
-rw-rw-r--. 1 hadoop hadoop 62 Aug 16 02:44 fsimage_0000000000000000002.md5
-rw-rw-r--. 1 hadoop hadoop 442 Aug 16 03:44 fsimage_0000000000000000010
-rw-rw-r--. 1 hadoop hadoop 62 Aug 16 03:44 fsimage_0000000000000000010.md5
-rw-rw-r--. 1 hadoop hadoop 3 Aug 16 03:44 seen_txid
-rw-rw-r--. 1 hadoop hadoop 205 Aug 16 02:40 VERSION
[hadoop@hadoop102 current]$ hdfs oiv -p XML -i fsimage_0000000000000000010 -o fs.xml
[hadoop@hadoop102 current]$ cat fs.xml
Copy the displayed XML into an XML file created in IDEA and format it. Part of the result is shown below.
<?xml version="1.0"?>
<fsimage>
<NameSection>
<genstampV1>1000</genstampV1>
<genstampV2>1001</genstampV2>
<genstampV1Limit>0</genstampV1Limit>
<lastAllocatedBlockId>1073741825</lastAllocatedBlockId>
<txid>10</txid>
</NameSection>
<INodeSection>
<lastInodeId>16386</lastInodeId>
<inode>
<id>16385</id>
<type>DIRECTORY</type>
<name></name>
<mtime>1597560655009</mtime>
<permission>hadoop:supergroup:rwxr-xr-x</permission>
<nsquota>9223372036854775807</nsquota>
<dsquota>-1</dsquota>
</inode>
<inode>
<id>16386</id>
<type>FILE</type>
<name>xinyue.txt</name>
<replication>3</replication>
<mtime>1597560655002</mtime>
<atime>1597560654373</atime>
<perferredBlockSize>134217728</perferredBlockSize>
<permission>hadoop:supergroup:rw-r--r--</permission>
<blocks>
<block>
<id>1073741825</id>
<genstamp>1001</genstamp>
<numBytes>14</numBytes>
</block>
</blocks>
</inode>
</INodeSection>
<INodeReferenceSection></INodeReferenceSection>
<SnapshotSection>
<snapshotCounter>0</snapshotCounter>
</SnapshotSection>
<INodeDirectorySection>
<directory>
<parent>16385</parent>
<inode>16386</inode>
</directory>
</INodeDirectorySection>
<FileUnderConstructionSection></FileUnderConstructionSection>
<SnapshotDiffSection>
<diff>
<inodeid>16385</inodeid>
</diff>
</SnapshotDiffSection>
<SecretManagerSection>
<currentId>0</currentId>
<tokenSequenceNumber>0</tokenSequenceNumber>
</SecretManagerSection>
<CacheManagerSection>
<nextDirectiveId>1</nextDirectiveId>
</CacheManagerSection>
</fsimage>
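The XML can also be read programmatically instead of by eye. The following is a minimal sketch using only Python's standard library; it embeds a trimmed copy of the output above and assumes the layout shown there, in which each `<directory>` lists its children as `<inode>` elements (other Hadoop versions may lay this section out differently). The `list_paths` helper is a name introduced here for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the oiv output: only the two sections needed to rebuild paths.
FSIMAGE_XML = """<?xml version="1.0"?>
<fsimage>
  <INodeSection>
    <lastInodeId>16386</lastInodeId>
    <inode>
      <id>16385</id>
      <type>DIRECTORY</type>
      <name></name>
    </inode>
    <inode>
      <id>16386</id>
      <type>FILE</type>
      <name>xinyue.txt</name>
    </inode>
  </INodeSection>
  <INodeDirectorySection>
    <directory>
      <parent>16385</parent>
      <inode>16386</inode>
    </directory>
  </INodeDirectorySection>
</fsimage>"""


def list_paths(xml_text):
    """Map each full path to its inode type using the parent/child links."""
    root = ET.fromstring(xml_text)

    # inode id -> (name, type); the root directory has an empty name.
    inodes = {}
    for inode in root.find("INodeSection").findall("inode"):
        inodes[inode.findtext("id")] = (inode.findtext("name") or "", inode.findtext("type"))

    # child inode id -> parent inode id
    parent_of = {}
    for directory in root.find("INodeDirectorySection").findall("directory"):
        parent = directory.findtext("parent")
        for child in directory.findall("inode"):
            parent_of[child.text] = parent

    def full_path(inode_id):
        parts = []
        while inode_id in parent_of:       # walk up until the root is reached
            parts.append(inodes[inode_id][0])
            inode_id = parent_of[inode_id]
        return "/" + "/".join(reversed(parts))

    return {full_path(inode_id): kind for inode_id, (_, kind) in inodes.items()}


paths = list_paths(FSIMAGE_XML)
print(paths)
```

Note that nothing in either section names a DataNode: the image describes the namespace and the block IDs only.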
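Question: the Fsimage does not record which DataNodes hold each block. Why?
Because the NameNode does not persist block locations. After the cluster starts, each DataNode is required to report its block information to the NameNode, and it reports again at regular intervals.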
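The block-to-DataNode mapping therefore lives only in memory and is rebuilt from these reports. A rough sketch of that idea follows; the `BlockMap` class and its methods are invented for this illustration and are not a Hadoop API, and the host names hadoop103 and hadoop104 are assumed.

```python
from collections import defaultdict


class BlockMap:
    """Hypothetical in-memory map: block id -> set of DataNodes holding a replica."""

    def __init__(self):
        self.locations = defaultdict(set)

    def process_block_report(self, datanode, block_ids):
        """Replace everything previously known about this DataNode with its new report."""
        for holders in self.locations.values():
            holders.discard(datanode)
        for block_id in block_ids:
            self.locations[block_id].add(datanode)

    def where(self, block_id):
        """Return the DataNodes currently known to hold the block, sorted by name."""
        return sorted(self.locations.get(block_id, ()))


block_map = BlockMap()            # empty after every restart: nothing is read from disk

# Initial reports after cluster startup (host names are assumptions).
for node in ("hadoop102", "hadoop103", "hadoop104"):
    block_map.process_block_report(node, [1073741825])
after_startup = block_map.where(1073741825)

# A later periodic report: hadoop103 no longer holds the block.
block_map.process_block_report("hadoop103", [])
after_rereport = block_map.where(1073741825)

print(after_startup)
print(after_rereport)
```

Because the map is refreshed by every report, it stays accurate even when replicas move or disappear, which a location list frozen into the Fsimage could not do.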
- Viewing the Edits file with oev
(1) Basic syntax
hdfs oev -p <file type> -i <edits file> -o <output path for the converted file>
(2) Hands-on example
[hadoop@hadoop102 current]$ hdfs oev -p XML -i edits_0000000000000000003-0000000000000000010 -o edits.xml
[hadoop@hadoop102 current]$ cat edits.xml
Copy the displayed XML into an XML file created in IDEA and format it. The result is shown below.
<?xml version="1.0" encoding="UTF-8"?>
<EDITS>
<EDITS_VERSION>-63</EDITS_VERSION>
<RECORD>
<OPCODE>OP_START_LOG_SEGMENT</OPCODE>
<DATA>
<TXID>3</TXID>
</DATA>
</RECORD>
<RECORD>
<OPCODE>OP_ADD</OPCODE>
<DATA>
<TXID>4</TXID>
<LENGTH>0</LENGTH>
<INODEID>16386</INODEID>
<PATH>/xinyue.txt._COPYING_</PATH>
<REPLICATION>3</REPLICATION>
<MTIME>1597560654373</MTIME>
<ATIME>1597560654373</ATIME>
<BLOCKSIZE>134217728</BLOCKSIZE>
<CLIENT_NAME>DFSClient_NONMAPREDUCE_95516467_1</CLIENT_NAME>
<CLIENT_MACHINE>192.168.88.102</CLIENT_MACHINE>
<OVERWRITE>true</OVERWRITE>
<PERMISSION_STATUS>
<USERNAME>hadoop</USERNAME>
<GROUPNAME>supergroup</GROUPNAME>
<MODE>420</MODE>
</PERMISSION_STATUS>
<RPC_CLIENTID>65a65bd9-5b47-47af-b426-641b6999d9af</RPC_CLIENTID>
<RPC_CALLID>3</RPC_CALLID>
</DATA>
</RECORD>
<RECORD>
<OPCODE>OP_ALLOCATE_BLOCK_ID</OPCODE>
<DATA>
<TXID>5</TXID>
<BLOCK_ID>1073741825</BLOCK_ID>
</DATA>
</RECORD>
<RECORD>
<OPCODE>OP_SET_GENSTAMP_V2</OPCODE>
<DATA>
<TXID>6</TXID>
<GENSTAMPV2>1001</GENSTAMPV2>
</DATA>
</RECORD>
<RECORD>
<OPCODE>OP_ADD_BLOCK</OPCODE>
<DATA>
<TXID>7</TXID>
<PATH>/xinyue.txt._COPYING_</PATH>
<BLOCK>
<BLOCK_ID>1073741825</BLOCK_ID>
<NUM_BYTES>0</NUM_BYTES>
<GENSTAMP>1001</GENSTAMP>
</BLOCK>
<RPC_CLIENTID></RPC_CLIENTID>
<RPC_CALLID>-2</RPC_CALLID>
</DATA>
</RECORD>
<RECORD>
<OPCODE>OP_CLOSE</OPCODE>
<DATA>
<TXID>8</TXID>
<LENGTH>0</LENGTH>
<INODEID>0</INODEID>
<PATH>/xinyue.txt._COPYING_</PATH>
<REPLICATION>3</REPLICATION>
<MTIME>1597560655002</MTIME>
<ATIME>1597560654373</ATIME>
<BLOCKSIZE>134217728</BLOCKSIZE>
<CLIENT_NAME></CLIENT_NAME>
<CLIENT_MACHINE></CLIENT_MACHINE>
<OVERWRITE>false</OVERWRITE>
<BLOCK>
<BLOCK_ID>1073741825</BLOCK_ID>
<NUM_BYTES>14</NUM_BYTES>
<GENSTAMP>1001</GENSTAMP>
</BLOCK>
<PERMISSION_STATUS>
<USERNAME>hadoop</USERNAME>
<GROUPNAME>supergroup</GROUPNAME>
<MODE>420</MODE>
</PERMISSION_STATUS>
</DATA>
</RECORD>
<RECORD>
<OPCODE>OP_RENAME_OLD</OPCODE>
<DATA>
<TXID>9</TXID>
<LENGTH>0</LENGTH>
<SRC>/xinyue.txt._COPYING_</SRC>
<DST>/xinyue.txt</DST>
<TIMESTAMP>1597560655009</TIMESTAMP>
<RPC_CLIENTID>65a65bd9-5b47-47af-b426-641b6999d9af</RPC_CLIENTID>
<RPC_CALLID>9</RPC_CALLID>
</DATA>
</RECORD>
<RECORD>
<OPCODE>OP_END_LOG_SEGMENT</OPCODE>
<DATA>
<TXID>10</TXID>
</DATA>
</RECORD>
</EDITS>
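To get a quick overview of such a log, the records can be summarized with a few lines of Python. This is a sketch over a trimmed copy of the output above (XML declaration and most fields dropped); `summarize` is a helper name introduced here. It assumes that `<MODE>` holds the permission bits written in decimal, which is consistent with 420 being octal 644 and with the rw-r--r-- permission that oiv shows for the same file.

```python
import stat
import xml.etree.ElementTree as ET

# Trimmed copy of the oev output: four of the eight records, fewer fields each.
EDITS_XML = """<EDITS>
  <EDITS_VERSION>-63</EDITS_VERSION>
  <RECORD>
    <OPCODE>OP_START_LOG_SEGMENT</OPCODE>
    <DATA><TXID>3</TXID></DATA>
  </RECORD>
  <RECORD>
    <OPCODE>OP_ADD</OPCODE>
    <DATA>
      <TXID>4</TXID>
      <PATH>/xinyue.txt._COPYING_</PATH>
      <PERMISSION_STATUS>
        <USERNAME>hadoop</USERNAME>
        <GROUPNAME>supergroup</GROUPNAME>
        <MODE>420</MODE>
      </PERMISSION_STATUS>
    </DATA>
  </RECORD>
  <RECORD>
    <OPCODE>OP_RENAME_OLD</OPCODE>
    <DATA>
      <TXID>9</TXID>
      <SRC>/xinyue.txt._COPYING_</SRC>
      <DST>/xinyue.txt</DST>
    </DATA>
  </RECORD>
  <RECORD>
    <OPCODE>OP_END_LOG_SEGMENT</OPCODE>
    <DATA><TXID>10</TXID></DATA>
  </RECORD>
</EDITS>"""


def summarize(xml_text):
    """Return one dict per record: transaction id, opcode and decoded mode (if any)."""
    records = []
    for record in ET.fromstring(xml_text).findall("RECORD"):
        data = record.find("DATA")
        mode = data.findtext("PERMISSION_STATUS/MODE")   # None when the record has no mode
        records.append({
            "txid": int(data.findtext("TXID")),
            "opcode": record.findtext("OPCODE"),
            # Treat the decimal value as permission bits of a regular file.
            "mode": None if mode is None else stat.filemode(stat.S_IFREG | int(mode)),
        })
    return records


records = summarize(EDITS_XML)
for entry in records:
    print(entry["txid"], entry["opcode"], entry["mode"])
```

The transaction IDs run from the number at the start of the file name to the number at its end, so the segment edits_0000000000000000003-0000000000000000010 opens with TXID 3 and closes with TXID 10.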
Question: how does the NameNode decide which Edits to merge the next time it starts?
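One way to reason about which Edits get merged at startup: the NameNode loads the fsimage with the highest transaction ID and then replays only the edits that contain later transactions. The sketch below applies that rule to file names alone, using the directory listing from the oiv example. `plan_startup` is a hypothetical helper, not part of Hadoop, and it ignores the seen_txid and .md5 checks that a real startup performs.

```python
import re


def plan_startup(filenames):
    """Pick the newest fsimage and the edit segments holding later transactions."""
    images = []      # (txid of the checkpoint, file name)
    segments = []    # (first txid, last txid or None if still in progress, file name)
    for name in filenames:
        if m := re.fullmatch(r"fsimage_(\d+)", name):
            images.append((int(m.group(1)), name))
        elif m := re.fullmatch(r"edits_(\d+)-(\d+)", name):
            segments.append((int(m.group(1)), int(m.group(2)), name))
        elif m := re.fullmatch(r"edits_inprogress_(\d+)", name):
            segments.append((int(m.group(1)), None, name))

    checkpoint_txid, image = max(images)
    to_replay = [
        name
        for first, last, name in sorted(segments, key=lambda s: s[0])
        # Finalized segments matter only if they end after the checkpoint;
        # the in-progress segment is always a candidate.
        if last is None or last > checkpoint_txid
    ]
    return image, to_replay


listing = [
    "edits_0000000000000000001-0000000000000000002",
    "edits_0000000000000000003-0000000000000000010",
    "edits_inprogress_0000000000000000011",
    "fsiamge.xml",
    "fsimage_0000000000000000002",
    "fsimage_0000000000000000002.md5",
    "fsimage_0000000000000000010",
    "fsimage_0000000000000000010.md5",
    "seen_txid",
    "VERSION",
]
image, to_replay = plan_startup(listing)
print(image)
print(to_replay)
```

With this listing, transactions 1 through 10 are already covered by fsimage_0000000000000000010, so only the in-progress segment starting at transaction 11 remains to be replayed.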