This article mainly explains how to get HDFS RAID up and running. Basic DataNode and NameNode setup is already well covered by earlier write-ups, so it is not repeated here.
1. Put hadoop-0.21.0-raid.jar under $HADOOP_HOME/lib
First you have to locate the jar. In the project source directory, run
find ./ -name "*raid.jar"
to find hadoop-0.21.0-raid.jar. $HADOOP_HOME is the environment variable you configured; it points to your Hadoop installation directory.
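For example, from the unpacked Hadoop source/release directory, something like the following should do it (a minimal sketch; head -1 just guards against multiple matches):
# locate the raid jar and copy it into the runtime lib directory
cp "$(find ./ -name '*raid.jar' | head -1)" "$HADOOP_HOME/lib/"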
2. Put raid.xml under $HADOOP_HOME/conf
Again, use find from the project source directory:
find ./ -name "raid.xml"
then copy the raid.xml it finds into $HADOOP_HOME/conf.
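Same idea as with the jar (a sketch, assuming a single match):
# copy the sample raid.xml into the conf directory and confirm it landed
cp "$(find ./ -name 'raid.xml' | head -1)" "$HADOOP_HOME/conf/"
ls -l "$HADOOP_HOME/conf/raid.xml"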
Reference configuration:
<configuration>
  <srcPath prefix="hdfs://192.168.5.208:9000/home"> <!-- IP first, then port; this cluster is accessed through port 9000, so 9000 is used here -->
    <policy name="dhruba">
      <property>
        <name>srcReplication</name>
        <value>3</value>
        <description>Pick files for RAID only if their replication factor is
          greater than or equal to this value.
        </description>
      </property>
      <property>
        <name>targetReplication</name>
        <value>2</value>
        <description>After RAIDing, decrease the replication factor of a file to
          this value.
        </description>
      </property>
      <property>
        <name>metaReplication</name>
        <value>2</value>
        <description>The replication factor of the RAID meta file.
        </description>
      </property>
      <property>
        <name>modTimePeriod</name>
        <value>3600000</value>
        <description>Time (in milliseconds) after a file is modified before it
          becomes a candidate for RAIDing.
        </description>
      </property>
    </policy>
  </srcPath>
  <!--srcPath prefix="hdfs://dfs1.xxx.com:9000/warehouse/table1">
    <policy name="table1">
      <property>
        <name>targetReplication</name>
        <value>1</value>
        <description>After RAIDing, decrease the replication factor of a file to
          this value.
        </description>
      </property>
      <property>
        <name>metaReplication</name>
        <value>2</value>
        <description>The replication factor of the RAID meta file.
        </description>
      </property>
      <property>
        <name>modTimePeriod</name>
        <value>3600000</value>
        <description>Time (in milliseconds) after a file is modified before it
          becomes a candidate for RAIDing.
        </description>
      </property>
    </policy>
  </srcPath-->
</configuration>
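As a rough sense of what this policy saves (a back-of-the-envelope estimate, assuming the XOR code with one parity block per stripe and the stripe length of 10 set in hdfs-site.xml in the next step): before RAID each data block costs 3x storage (srcReplication 3); after RAID it costs 2x (targetReplication 2) plus parity stored at 2 replicas (metaReplication 2) amortized over 10 data blocks, i.e. 2 + 2/10 = 2.2x, about a 27% reduction.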
3. hdfs-site.xml configuration
The hdfs-site.xml, core-site.xml, and mapred-site.xml files configured here all go under $HADOOP_HOME/conf.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/zsdnr/hadoop21/data</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/zsdnr/hadoop21/name</value>
  </property>
  <property>
    <name>raid.config.file</name>
    <value>/home/zsdnr/llm/work_hadoop/hadoop-0.21.0/conf/raid.xml</value>
    <description>This is needed by the RaidNode.</description>
  </property>
  <property>
    <name>fs.hdfs.impl</name>
    <value>org.apache.hadoop.hdfs.DistributedRaidFileSystem</value>
    <description>The FileSystem for hdfs: uris.</description>
  </property>
  <property>
    <name>hdfs.raid.locations</name>
    <value>hdfs://192.168.5.208:9000/raid</value> <!-- parity files are stored here -->
    <description>The location for parity files. If this is not defined, it defaults to /raid.</description>
  </property>
  <property>
    <name>hdfs.raid.stripeLength</name>
    <value>10</value>
    <description>The number of blocks in a file to be combined into a single raid parity block. The default value is 5. The higher the number, the more disk space you save when you enable raid, since one parity block covers more data blocks.</description>
  </property>
  <property>
    <name>raid.har.partfile.size</name>
    <value>4294967296</value>
    <description>The size of HAR part files that store raid parity files. The default is 4GB. The higher the number, the fewer files are used to store the HAR archive.</description>
  </property>
  <property>
    <name>dfs.block.replicator.classname</name>
    <value>org.apache.hadoop.hdfs.server.namenode.BlockPlacementPolicyDefault</value>
    <description>The name of the class which specifies how to place blocks in HDFS. The class BlockPlacementPolicyRaid will try to avoid co-located replicas of the same stripe. This greatly reduces the probability of raid file corruption.</description>
  </property>
  <property>
    <name>raid.mapred.fairscheduler.pool</name>
    <value>none</value>
    <description>The name of the fair scheduler pool to use.</description>
  </property>
  <property>
    <name>raid.classname</name>
    <value>org.apache.hadoop.raid.DistRaidNode</value>
    <description>Specify which implementation of RaidNode to use (class name).</description>
  </property>
  <property>
    <name>raid.policy.rescan.interval</name>
    <value>5000</value>
    <description>Specify the periodicity in milliseconds after which all source paths are rescanned and parity blocks recomputed if necessary. By default, this value is 1 hour.</description>
  </property>
  <property>
    <name>fs.raid.underlyingfs.impl</name>
    <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
    <description>Specify the filesystem that is layered immediately below the DistributedRaidFileSystem. By default, this value is DistributedFileSystem.</description>
  </property>
  <property>
    <name>raid.blockfix.classname</name>
    <value>org.apache.hadoop.raid.LocalBlockFixer</value>
    <description>Specify the BlockFixer implementation to use. The default is org.apache.hadoop.raid.DistBlockFixer.</description>
  </property>
</configuration>
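One detail worth flagging: dfs.block.replicator.classname above keeps the default placement policy, even though its own description says BlockPlacementPolicyRaid is what avoids co-locating replicas of the same stripe. If your raid jar actually contains that class (an assumption; check with jar -tf hadoop-0.21.0-raid.jar | grep BlockPlacementPolicyRaid), you could switch to it:
<property>
  <name>dfs.block.replicator.classname</name>
  <value>org.apache.hadoop.hdfs.server.namenode.BlockPlacementPolicyRaid</value>
</property>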
4. core-site.xml configuration
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.5.208:9000</value>
  </property>
</configuration>
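A quick check that this NameNode address is actually reachable before going further (assumes HDFS is already started):
./bin/hadoop fs -ls hdfs://192.168.5.208:9000/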
5. mapred-site.xml configuration
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://192.168.5.208:9001</value>
  </property>
</configuration>
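Since raid.classname is set to DistRaidNode, parity generation runs as MapReduce jobs, so this JobTracker must be up. Note that mapred.job.tracker is conventionally written as plain host:port with no scheme (192.168.5.208:9001); if the RaidNode cannot reach the JobTracker, dropping the hdfs:// prefix is the first thing to try. A quick reachability check:
./bin/hadoop job -list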
6. Run it
./bin/hadoop org.apache.hadoop.raid.RaidNode
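That runs the RaidNode in the foreground. To keep it running after logout and capture its output (a sketch; the log file name is just a choice, not a Hadoop convention):
nohup ./bin/hadoop org.apache.hadoop.raid.RaidNode > $HADOOP_HOME/logs/raidnode.out 2>&1 &
tail -f $HADOOP_HOME/logs/raidnode.out
Once a policy has fired (after modTimePeriod elapses), parity files should show up under the configured location:
./bin/hadoop fs -ls /raid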
Errors
- The RaidNode may repeatedly fail to connect at startup. In my case this was because some config files used localhost while others used the IP address; that mix does not work. Be consistent: either use localhost everywhere or the IP everywhere.
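A quick way to spot that mix across the conf files (the grep pattern is illustrative; adjust the IP to yours):
grep -n 'localhost\|192\.168\.' $HADOOP_HOME/conf/*.xml
# every address-bearing property should agree: all localhost, or all the same IP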