Common Hadoop Problems and Solutions

Latest recommended article published on 2021-11-16 17:08:52

iteye_3607

Tags: big data, java, operating system

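1: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out

Answer:

The program needs to open many files for analysis. The system default limit is usually 1024 (you can see it with ulimit -a). That is enough for normal use, but too low for this kind of program.

Fix:

Modify two files.

/etc/security/limits.conf

vi /etc/security/limits.conf

Add:

* soft nofile 102400

* hard nofile 409600

$ cd /etc/pam.d/

$ sudo vi login

Add: session required /lib/security/pam_limits.so

A correction to the answer above:

This error occurs during the shuffle in the reduce task's preprocessing phase, when the number of failed attempts to fetch the output of completed maps exceeds the limit, which defaults to 5. Many things can cause it, for example an unstable network connection, connection timeouts, poor bandwidth, or blocked ports. It normally does not occur when the network inside the cluster is in good condition.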

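2: Too many fetch-failures

Answer:

This problem mainly arises when connectivity between the nodes is incomplete.

1) Check /etc/hosts

The local IP must map to the server's hostname.

The file must contain the IP and hostname of every server.

2) Check .ssh/authorized_keys

The file must contain the public key of every server (including the server itself).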
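The /etc/hosts check above can be automated. The following is only a minimal sketch: the class name HostsCheck, the method missingHosts, and the sample host names are hypothetical and are not part of Hadoop. It parses the text of a hosts file and reports which expected server names have no entry.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Hadoop: finds cluster host names that
// are missing from the content of an /etc/hosts file.
final class HostsCheck {

    // Returns the expected host names that have no entry in hostsContent.
    static Set<String> missingHosts(String hostsContent, Collection<String> expected) {
        Set<String> present = new HashSet<>();
        for (String line : hostsContent.split("\\R")) {
            int hash = line.indexOf('#');          // drop trailing comments
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            String[] fields = line.trim().split("\\s+");
            // fields[0] is the IP address; every following field is a name or alias
            for (int i = 1; i < fields.length; i++) {
                present.add(fields[i]);
            }
        }
        Set<String> missing = new TreeSet<>(expected);
        missing.removeAll(present);
        return missing;
    }

    public static void main(String[] args) {
        String sample = "127.0.0.1 localhost\n192.168.1.10 master\n192.168.1.11 slave1\n";
        System.out.println(missingHosts(sample, java.util.Arrays.asList("master", "slave1", "slave2")));
    }
}
```

Running the same check on every node, with the full list of cluster host names, shows quickly which machine has an incomplete hosts file.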

3: Processing is very slow: maps finish quickly, but reduces are slow and reduce=0% appears repeatedly

Answer:

Apply the checks from item 2, then

set export HADOOP_HEAPSIZE=4000 in conf/hadoop-env.sh

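4: The DataNode starts, but cannot be accessed and cannot be stopped

When reformatting a new distributed file system, you need to delete the local file system path configured as dfs.name.dir on the NameNode (where the NameNode persists the namespace and the transaction logs), and also delete the dfs.data.dir path on every DataNode (the local file system path where the DataNode stores block data). In this setup, that means deleting /home/hadoop/NameData on the NameNode and /home/hadoop/DataNode1 and /home/hadoop/DataNode2 on the DataNodes. The reason is that when Hadoop formats a new distributed file system, each stored namespace is tied to the version created at format time (see the VERSION file under /home/hadoop/NameData/current, which records the version information). When reformatting, it is best to delete the NameData directory first, and the dfs.data.dir of every DataNode must be deleted. Only then will the version information recorded by the NameNode and the DataNodes match.

Note: deleting is dangerous. Do not delete anything you are not sure about, and back up everything you are about to delete!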

5: java.io.IOException: Could not obtain block: blk_194219614024901469_1100 file=/user/hive/warehouse/src_20090724_log/src_20090724_log

This usually happens because a node has dropped out and is not connected.

6: Error: Java heap space

<property>
<name>mapred.child.java.opts</name>
</property>

With the right JVM size set as this property's value in your hadoop-site.xml, you will have to copy this to all mapred nodes and restart the cluster.

Or: hadoop jar jarfile [main class] -D mapred.child.java.opts=-Xmx1024m

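7: Fixing the Hadoop OutOfMemoryError:

This exception clearly means the JVM does not have enough memory. Change the JVM memory size on all DataNodes.

java -Xms1024m -Xmx4096m

As a rule of thumb, the JVM's maximum memory should be about half of the total memory. Our machines have 8 GB, so we set it to 4096m. This value may still not be optimal.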

8: NameNode in safe mode

Solution:

bin/hadoop dfsadmin -safemode leave

9: reduce exceeds 100%

"Reduce Task Progress shows > 100% when the total size of map outputs (for a single reducer) is high"

Cause:

During the merge phase of reduce, the progress check is inaccurate, so the status goes above 100%. The statistics code then fails with the following error:

java.lang.ArrayIndexOutOfBoundsException: 3
at org.apache.hadoop.mapred.StatusHttpServer$TaskGraphServlet.getReduceAvarageProgresses(StatusHttpServer.java:228)
at org.apache.hadoop.mapred.StatusHttpServer$TaskGraphServlet.doGet(StatusHttpServer.java:159)
……

10: java.net.NoRouteToHostException: No route to host

Solution:

sudo /etc/init.d/iptables stop

11: After changing the NameNode, a select run in Hive still points to the previous NameNode address

This is because: When you create a table, hive actually stores the location of the table (e.g. hdfs://ip:port/user/root/...) in the SDS and DBS tables in the metastore. So when I bring up a new cluster the master has a new IP, but hive's metastore is still pointing to the locations within the old cluster. I could modify the metastore to update with the new IP every time I bring up a cluster. But the easier and simpler solution was to just use an elastic IP for the master.

So every occurrence of the old NameNode address in the metastore has to be replaced with the current NameNode address.
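The address replacement itself can be sketched in a few lines. This is a hedged illustration only: the class LocationRewriter, its method, and the sample addresses are hypothetical, and the code does not touch the metastore database. It only shows how a stored location string could be rewritten to point at a new NameNode host and port while keeping the path.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: rewrites the host and port of an hdfs:// location
// string, leaving the path untouched. Reading and writing the metastore
// tables is out of scope for this sketch.
final class LocationRewriter {

    static String withNewNameNode(String location, String newHost, int newPort) {
        URI old = URI.create(location);
        if (!"hdfs".equals(old.getScheme())) {
            return location;                      // leave non-HDFS locations alone
        }
        try {
            // URI(scheme, userInfo, host, port, path, query, fragment)
            return new URI("hdfs", null, newHost, newPort, old.getPath(), null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot rewrite " + location, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(withNewNameNode("hdfs://10.0.0.1:9000/user/root/t1", "10.0.0.2", 9000));
    }
}
```

Back up the metastore before applying any bulk update based on a rewrite like this.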

12: Your DataNode is started and you can create directories with bin/hadoop dfs -mkdir, but you get an error message when you try to put files into the HDFS (e.g., when you run a command like bin/hadoop dfs -put).

Solution:

Go to the HDFS info web page (open your web browser and go to http://namenode:dfs_info_port where namenode is the hostname of your NameNode and dfs_info_port is the port you chose for dfs.info.port; if you followed the QuickStart on your personal computer then this URL will be http://localhost:50070). Once at that page, click on the number where it tells you how many DataNodes you have to look at a list of the DataNodes in your cluster.

If it says you have used 100% of your space, then you need to free up room on the local disk(s) of the DataNode(s).

If you are on Windows then this number will not be accurate (there is some kind of bug either in Cygwin's df.exe or in Windows). Just free up some more space and you should be okay. On one Windows machine we tried, the disk had 1GB free but Hadoop reported that it was 100% full. Then we freed up another 1GB and then it said that the disk was 99.15% full and started writing data into the HDFS again. We encountered this bug on Windows XP SP2.

13: Your DataNodes won't start, and you see something like this in logs/*datanode*:

Incompatible namespaceIDs in /tmp/hadoop-ross/dfs/data

Cause:

Your Hadoop namespaceID became corrupted. Unfortunately the easiest thing to do is to reformat the HDFS.

Solution:

You need to do something like this:

bin/stop-all.sh
rm -Rf /tmp/hadoop-your-username/*
bin/hadoop namenode -format

14: You can run Hadoop jobs written in Java (like the grep example), but your HadoopStreaming jobs (such as the Python example that fetches web page titles) won't work.

Cause:

You might have given only a relative path to the mapper and reducer programs. The tutorial originally just specified relative paths, but absolute paths are required if you are running in a real cluster.

Solution:

Use absolute paths like this from the tutorial:

bin/hadoop jar contrib/hadoop-0.15.2-streaming.jar \
  -mapper $HOME/proj/hadoop/multifetch.py \
  -reducer $HOME/proj/hadoop/reducer.py \
  -input urls/* \
  -output titles

15: 09/08/31 18:25:45 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.1.11:50010

09/08/31 18:25:45 INFO hdfs.DFSClient: Abandoning block blk_-8575812198227241296_1001
09/08/31 18:25:51 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.1.16:50010
……
to create new block.
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2731)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2182)
09/08/31 18:26:09 WARN hdfs.DFSClient: Error Recovery for block blk_7193173823538206978_1001 bad datanode[2] nodes == null
09/08/31 18:26:09 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/umer/8GB_input" - Aborting...
put: Bad connect ack with firstBadLink 192.168.1.16:50010

Solution:

1) '/etc/init.d/iptables stop' --> stopped firewall

2) SELINUX=disabled in '/etc/selinux/config' file --> disabled selinux

It worked for me after these two changes.

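16: Working around jline.ConsoleReader.readLine not working on Windows

In the main() function of CliDriver.java there is a statement reader.readLine that reads from standard input, but on Windows it always returns null. This reader is an instance of jline.ConsoleReader, which makes debugging in Eclipse on Windows inconvenient.

We can use java.util.Scanner instead. Replace the original

while ((line = reader.readLine(curPrompt + "> ")) != null)

with:

Scanner sc = new Scanner(System.in);
// nextLine() never returns null; it throws NoSuchElementException at end of input, so check hasNextLine() first
while (sc.hasNextLine() && (line = sc.nextLine()) != null)

Recompile and redeploy, and SQL statements can then be read from standard input normally.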
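As a standalone illustration of the Scanner loop, here is a minimal sketch. The class name ScannerLoop and the sample statements are made up for the example, and it reads from a string instead of System.in so it can be run anywhere.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class: collects every line a Scanner can deliver.
final class ScannerLoop {

    static List<String> readAll(Scanner sc) {
        List<String> lines = new ArrayList<>();
        // hasNextLine() returns false at end of input, so the loop ends
        // cleanly instead of nextLine() throwing NoSuchElementException.
        while (sc.hasNextLine()) {
            lines.add(sc.nextLine());
        }
        return lines;
    }

    public static void main(String[] args) {
        // A string-backed Scanner stands in for new Scanner(System.in).
        System.out.println(readAll(new Scanner("select 1;\nselect 2;")));
    }
}
```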

17: Problems with IO write operations

0-1246359584298, infoPort=50075, ipcPort=50020):Got exception while serving blk_-5911099437886836280_1292 to /172.16.100.165:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.16.100.165:50010 remote=/172.16.100.165:50930]
at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
……

It seems there are many reasons that it can timeout, the example given in HADOOP-3831 is a slow reading client.

Solution: try setting dfs.datanode.socket.write.timeout=0 in hadoop-site.xml.

My understanding is that this issue should be fixed in Hadoop 0.19.1 so that we should leave the standard timeout. However until then this can help resolve issues like the one you're seeing.

18: status of 255 error

Error type:

java.io.IOException: Task process exit with nonzero status of 255.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:424)

Cause:

Set mapred.jobtracker.retirejob.interval and mapred.userlog.retain.hours to a higher value. By default, their values are 24 hours. These might be the reason for failure, though I'm not sure.

Restarting a single datanode

If a datanode had a problem and, once it is fixed, needs to rejoin the cluster without restarting the whole cluster, do the following on that node:

bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start tasktracker

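How to add a node to Hadoop

The steps I actually followed to add a node:

1. Set up the environment on the new slave first, including ssh, jdk, and copies of the relevant config, lib, bin, and so on;

2. Add the new datanode's host entry to the cluster's namenode and the other datanodes;

3. Add the new datanode's IP to conf/slaves on the master;

4. Restart the cluster and confirm the new datanode shows up in it;

5. Run bin/start-balancer.sh. This takes a long time.

Notes:

1. Without balancing, the cluster will put all new data on the new node, which lowers MapReduce efficiency;

2. bin/start-balancer.sh can be run as is, or with the parameter -threshold 5.

threshold is the balancing threshold. The default is 10%. The lower the value, the more evenly the nodes are balanced, but the longer it takes.

3. The balancer can also run on a cluster that has MapReduce jobs running. The default dfs.balance.bandwidthPerSec is low, 1 MB/s. When no MapReduce jobs are running, raise this setting to shorten the balancing time.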
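To make the threshold concrete: the balancer treats a datanode as balanced when its disk utilization differs from the cluster-wide utilization by no more than the threshold, in percentage points. The following sketch only illustrates that comparison. The class BalanceCheck and the sample numbers are hypothetical, and this is not the balancer's actual code.

```java
// Hypothetical illustration of the -threshold semantics, not Hadoop code.
final class BalanceCheck {

    // All arguments are percentages in the range 0..100.
    static boolean isBalanced(double nodeUsedPct, double clusterUsedPct, double thresholdPct) {
        return Math.abs(nodeUsedPct - clusterUsedPct) <= thresholdPct;
    }

    public static void main(String[] args) {
        // A node at 62% in a cluster at 55% overall:
        System.out.println(isBalanced(62, 55, 10)); // within the default 10% threshold
        System.out.println(isBalanced(62, 55, 5));  // outside a stricter 5% threshold
    }
}
```

With -threshold 5 the same node would still have data moved off it, which is why a lower threshold takes longer.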

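Other notes:

1. Make sure the slave's firewall is turned off;

2. Make sure the new slave's IP has been added to /etc/hosts on the master and the other slaves, and conversely that the IPs of the master and the other slaves have been added to /etc/hosts on the new slave.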

Number of mappers and reducers

URL: http://wiki.apache.org/hadoop/HowManyMapsAndReduces

Setting the number of mappers: it depends on the input files and on the file splits. The upper bound of a file split is dfs.block.size, the lower bound can be set with mapred.min.split.size, and in the end the InputFormat decides.

A good recommendation:

The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum). Increasing the number of reduces increases the framework overhead, but increases load balancing and lowers the cost of failures.

<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<description>The maximum number of reduce tasks that will be run simultaneously by a task tracker.
</description>
</property>
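The recommendation above is simple arithmetic, sketched here in code. The class ReducerCount, its method, and the example cluster size are hypothetical. Rounding to the nearest integer is an assumption of this sketch, since the wiki text does not say how to round.

```java
// Hypothetical helper for the 0.95 / 1.75 rule of thumb quoted above.
final class ReducerCount {

    // factor is 0.95 or 1.75; maxReduceTasksPerNode corresponds to
    // mapred.tasktracker.reduce.tasks.maximum.
    static int suggested(int nodes, int maxReduceTasksPerNode, double factor) {
        // Rounding to the nearest integer is this sketch's own choice.
        return (int) Math.round(factor * nodes * maxReduceTasksPerNode);
    }

    public static void main(String[] args) {
        // Example: 10 nodes, 2 reduce slots per node.
        System.out.println(suggested(10, 2, 0.95));
        System.out.println(suggested(10, 2, 1.75));
    }
}
```

The result would then be passed to the job, for example through JobConf.setNumReduceTasks.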

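Adding a new disk to a single node

1. Edit dfs.data.dir on the node that gets the new disk, listing the new and old directories separated by commas

2. Restart dfs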

Syncing Hadoop code

hadoop-env.sh

# host:path where hadoop code should be rsync'd from. Unset by default.

# export HADOOP_MASTER=master:/home/$USER/src/hadoop

Merging small HDFS files with a command

hadoop fs -getmerge <src> <dest>

How to restart a reduce job

Introduced recovery of jobs when JobTracker restarts. This facility is off by default.

Introduced config parameters "mapred.jobtracker.restart.recover", "mapred.jobtracker.job.history.block.size", and "mapred.jobtracker.job.history.buffer.size".

Not verified yet.

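How to decommission HDFS nodes

The dfsadmin help text in the current version does not explain this clearly, and a bug has already been filed. The correct procedure is:

1. Point dfs.hosts at a file listing the current slaves, using the full path of the file. Note that the host names in the list must be the full names, that is, what uname -n returns.

2. Put the full names of the nodes to be decommissioned in another file, for example slaves.ex, and point the dfs.hosts.exclude parameter at the full path of that file.

3. Run bin/hadoop dfsadmin -refreshNodes

4. In the web UI or in the output of bin/hadoop dfsadmin -report, the decommissioning nodes show the state Decommission in progress until all data that needs to be replicated has been copied.

5. When that is done, remove the decommissioned nodes from slaves (that is, from the file dfs.hosts points to).

Incidentally, the -refreshNodes command has three other uses:

1. Add allowed nodes to the list (add the host names to dfs.hosts)

2. Remove nodes directly, without re-replicating their data (remove the host names from dfs.hosts)

3. The reverse of decommissioning: stop the decommissioning of nodes that are listed both in the exclude file and in dfs.hosts and are currently being decommissioned, that is, turn Decommission in progress nodes back into Normal (called in service in the web UI)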

Using DistributedCache

It works like a global variable, but because the data is too large to be set in the config file, DistributedCache is used instead.

How to use it: (see The Definitive Guide, p. 240)

1. From the command line: use -files to bring in the files to be looked up (they can be local files or HDFS files (using hdfs://xxx?)), or -archives (JAR, ZIP, tar, etc.)

% hadoop jar job.jar MaxTemperatureByStationNameUsingDistributedCacheFile \
  -files input/ncdc/metadata/stations-fixed-width.txt input/ncdc/all output

2. In the program:

public void configure(JobConf conf) {
  metadata = new NcdcStationMetadata();
  try {
    metadata.initialize(new File("stations-fixed-width.txt"));
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
}

Another, indirect way to use it (it does not seem to exist in hadoop-0.19.0):

call addCacheFile() or addCacheArchive() to add files,

and use getLocalCacheFiles() or getLocalCacheArchives() to get them.

Hadoop job status web pages

There are web-based interfaces to both the JobTracker (MapReduce master) and NameNode (HDFS master) which display status pages about the state of the entire system. By default, these are located at http://job.tracker.addr:50030/ and http://name.node.addr:50070/.

Hadoop monitoring

OnlyXP (52388483) 13:17:02

Use nagios for alerting and ganglia for monitoring charts.

split size

FileInputFormat input splits: (see The Definitive Guide, p. 190)

mapred.min.split.size: default=1, the smallest valid size in bytes for a file split.

mapred.max.split.size: default=Long.MAX_VALUE, the largest valid size.

dfs.block.size: default = 64M; on our system it is set to 128M.

If the minimum split size is set larger than the block size, each split becomes larger than one block, so a split covers more than one block (some of which may have to be fetched from other nodes) and there are fewer splits.

If the maximum split size is set smaller than the block size, blocks are split further.

split size = max(minimumSize, min(maximumSize, blockSize));

where by default minimumSize < blockSize < maximumSize.
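The formula can be checked with a few values. This is a minimal sketch, and the class name SplitSize and the sample sizes are chosen only for illustration.

```java
// Hypothetical helper that evaluates the split size formula quoted above.
final class SplitSize {

    static final long MB = 1024L * 1024L;

    static long compute(long minimumSize, long maximumSize, long blockSize) {
        return Math.max(minimumSize, Math.min(maximumSize, blockSize));
    }

    public static void main(String[] args) {
        // Defaults: the split size equals the block size.
        System.out.println(compute(1, Long.MAX_VALUE, 64 * MB) / MB);
        // Minimum above the block size: splits grow past one block.
        System.out.println(compute(128 * MB, Long.MAX_VALUE, 64 * MB) / MB);
        // Maximum below the block size: blocks are cut into smaller splits.
        System.out.println(compute(1, 32 * MB, 64 * MB) / MB);
    }
}
```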

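sort by value

Hadoop does not provide a direct way to sort by value, because doing so would hurt MapReduce performance.

It can be achieved by combining several pieces, though. For the concrete implementation see The Definitive Guide, p. 250.

Basic idea:

1. Combine key and value into a new key;

2. Override the partitioner so that it partitions by the old key;

conf.setPartitionerClass(FirstPartitioner.class);

3. Define a custom key comparator that sorts by the old key first and then by the old value;

conf.setOutputKeyComparatorClass(KeyComparator.class);

4. Override the group comparator so that it also groups by the old key;

conf.setOutputValueGroupingComparator(GroupComparator.class);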
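The four steps can be illustrated without any Hadoop classes. The sketch below is an assumption-laden stand-in: Pair, partition, KEY_COMPARATOR, and GROUP_COMPARATOR are hypothetical names that mirror the roles of the composite key, FirstPartitioner, KeyComparator, and GroupComparator. It is not the book's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration of the secondary sort idea; no Hadoop APIs are used.
final class SecondarySortSketch {

    // Step 1: the composite key made of the old key and the old value.
    static final class Pair {
        final String key;
        final int value;
        Pair(String key, int value) { this.key = key; this.value = value; }
        @Override public String toString() { return key + ":" + value; }
    }

    // Step 2: partition by the old key only, so equal keys meet in one reducer.
    static int partition(Pair p, int numPartitions) {
        return (p.key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Step 3: sort by the old key first, then by the old value.
    static final Comparator<Pair> KEY_COMPARATOR =
            Comparator.comparing((Pair p) -> p.key).thenComparingInt(p -> p.value);

    // Step 4: group by the old key only.
    static final Comparator<Pair> GROUP_COMPARATOR = Comparator.comparing((Pair p) -> p.key);

    static List<String> sortedAsStrings(List<Pair> pairs) {
        List<Pair> copy = new ArrayList<>(pairs);
        copy.sort(KEY_COMPARATOR);
        List<String> out = new ArrayList<>();
        for (Pair p : copy) {
            out.add(p.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sortedAsStrings(Arrays.asList(
                new Pair("b", 2), new Pair("a", 3), new Pair("a", 1))));
    }
}
```

Because grouping ignores the value while sorting does not, a reducer sees all values of one old key in sorted order within a single group.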

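Handling small input files

Using a large number of small files as input lowers Hadoop's efficiency.

There are three ways to combine small files for processing:

1. Merge the small files into one SequenceFile to speed up MapReduce.

See WholeFileInputFormat and SmallFilesToSequenceFileConverter, The Definitive Guide, p. 194

2. Use CombineFileInputFormat, which extends FileInputFormat. I have not implemented this myself;

3. Use hadoop archives (similar to packaging) to reduce the NameNode memory consumed by the metadata of small files. (This method may not always work, so it is not recommended.)

Method:

Archive the /my/files directory and its subdirectories into files.har, and put it under the /my directory

bin/hadoop archive -archiveName files.har /my/files /my

List the files in the archive:

bin/hadoop fs -lsr har:///my/files.har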

skip bad records

JobConf conf = new JobConf(ProductMR.class);
conf.setJobName("ProductMR");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Product.class);
conf.setMapperClass(Map.class);
conf.setReducerClass(Reduce.class);
conf.setMapOutputCompressorClass(DefaultCodec.class);
conf.setInputFormat(SequenceFileInputFormat.class);
conf.setOutputFormat(SequenceFileOutputFormat.class);
String objpath = "abc1";
SequenceFileInputFormat.addInputPath(conf, new Path(objpath));
SkipBadRecords.setMapperMaxSkipRecords(conf, Long.MAX_VALUE);
SkipBadRecords.setAttemptsToStartSkipping(conf, 0);
SkipBadRecords.setSkipOutputPath(conf, new Path("data/product/skip/"));
String output = "abc";
SequenceFileOutputFormat.setOutputPath(conf, new Path(output));
JobClient.runJob(conf);

For skipping failed tasks try: mapred.max.map.failures.percent

counters

There are three kinds of counters:

1. built-in counters: Map input bytes, Map output records...

2. enum counters

Usage:

enum Temperature {
  MISSING,
  MALFORMED
}

reporter.incrCounter(Temperature.MISSING, 1);

Output:

09/04/20 06:33:36 INFO mapred.JobClient: Air Temperature Recor
09/04/20 06:33:36 INFO mapred.JobClient: Malformed=3
09/04/20 06:33:36 INFO mapred.JobClient: Missing=66136856

3. dynamic counters:

Usage:

reporter.incrCounter("TemperatureQuality", parser.getQuality(), 1);

Output:

09/04/20 06:33:36 INFO mapred.JobClient: TemperatureQuality
09/04/20 06:33:36 INFO mapred.JobClient: 2=1246032
09/04/20 06:33:36 INFO mapred.JobClient: 1=973422173
09/04/20 06:33:36 INFO mapred.JobClient: 0=1

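Possible causes of the "does not have a scheme" error when debugging Hive in Eclipse on Windows

1. The "hive.metastore.local" option in the Hive configuration file is set to false. Change it to true, because this is a standalone setup.

2. The HIVE_HOME environment variable is not set, or is set incorrectly.

3. "does not have a scheme" is most likely caused by "hive-default.xml" not being found. For how to fix a missing hive-default.xml when debugging Hive in Eclipse, see: http://bbs.hadoopor.com/thread-292-1-1.html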

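1. Chinese text problem

Chinese text is parsed from a URL, but Hadoop still prints it as garbage? We used to think Hadoop did not support Chinese. After reading the source code, we found that Hadoop merely does not support writing Chinese in GBK encoding.

The following code is from TextOutputFormat.class. Hadoop's default output formats all derive from FileOutputFormat, which has two subclasses: one for binary stream output, and TextOutputFormat for text output.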

public class TextOutputFormat<K, V> extends FileOutputFormat<K, V> {
  protected static class LineRecordWriter<K, V>
      implements RecordWriter<K, V> {
    private static final String utf8 = "UTF-8"; // hard-coded to UTF-8 here
    private static final byte[] newline;
    static {
      try {
        newline = "\n".getBytes(utf8);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + utf8 + " encoding");
      }
    }
    …
    public LineRecordWriter(DataOutputStream out, String keyValueSeparator) {
      this.out = out;
      try {
        this.keyValueSeparator = keyValueSeparator.getBytes(utf8);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + utf8 + " encoding");
      }
    }
    …
    private void writeObject(Object o) throws IOException {
      if (o instanceof Text) {
        Text to = (Text) o;
        out.write(to.getBytes(), 0, to.getLength()); // this also needs to change
      } else {
        out.write(o.toString().getBytes(utf8));
      }
    }
    …
}

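As the code shows, Hadoop's default output is hard-coded to UTF-8. So if the Chinese text was decoded correctly, setting the character set of the Linux client to UTF-8 makes the Chinese readable, because Hadoop wrote it in UTF-8.

Most databases define their fields in GBK, though. What if you want Hadoop to write Chinese in GBK to stay compatible with the database?

We can define a new class: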

public class GbkOutputFormat<K, V> extends FileOutputFormat<K, V> {
  protected static class LineRecordWriter<K, V>
      implements RecordWriter<K, V> {
    // just switch to gbk
    private static final String gbk = "gbk";
    private static final byte[] newline;
    static {
      try {
        newline = "\n".getBytes(gbk);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + gbk + " encoding");
      }
    }
    …
    public LineRecordWriter(DataOutputStream out, String keyValueSeparator) {
      this.out = out;
      try {
        this.keyValueSeparator = keyValueSeparator.getBytes(gbk);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + gbk + " encoding");
      }
    }
    …
    private void writeObject(Object o) throws IOException {
      // The Text branch is dropped so that every object, Text or not,
      // is converted to a String and re-encoded as GBK.
      // if (o instanceof Text) {
      //   Text to = (Text) o;
      //   out.write(to.getBytes(), 0, to.getLength());
      // } else {
      out.write(o.toString().getBytes(gbk));
      // }
    }
    …
}

Then add conf1.setOutputFormat(GbkOutputFormat.class) to the MapReduce code

and the Chinese text is written in GBK.
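The difference between the two encodings is easy to see outside Hadoop. The following minimal sketch uses only the standard library. The class name EncodingDemo is made up, and it assumes the JVM ships the GBK charset, which it checks before using.

```java
import java.nio.charset.Charset;

// Standalone illustration: the same two Chinese characters encode to a
// different number of bytes in UTF-8 and in GBK.
final class EncodingDemo {

    static int encodedLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // the two characters of the word "Chinese"
        System.out.println("UTF-8 bytes: " + encodedLength(text, "UTF-8"));
        if (Charset.isSupported("GBK")) {   // GBK is an optional charset in some runtimes
            System.out.println("GBK bytes: " + encodedLength(text, "GBK"));
        }
    }
}
```

A consumer that expects GBK but receives the UTF-8 bytes, or the other way around, will display garbage, which is the symptom described above.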

2. An error thrown during an otherwise normal MapReduce run

java.io.IOException: All datanodes xxx.xxx.xxx.xxx:xxx are bad. Aborting…
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2158)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)

java.io.IOException: Could not get block locations. Aborting…
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)

Investigation showed that the cause was too many open files on the Linux machine. The command ulimit -n shows that the default limit on open files in Linux is 1024. Edit /etc/security/limits.conf and add: hadoop soft nofile 65535

Then run the program again (preferably after changing the limit on all datanodes), and the problem is solved.
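To confirm that a raised limit is actually visible to a Java process such as a Hadoop daemon, the JVM can report its own file descriptor limit. This is a hedged sketch: the class name FdLimit is made up, and it relies on the com.sun.management.UnixOperatingSystemMXBean interface, which is available on HotSpot/OpenJDK on Unix-like systems but not guaranteed on every JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Reports the open-file limit as seen by the current JVM process.
final class FdLimit {

    // Returns the maximum number of file descriptors, or -1 when the
    // platform bean does not expose it (for example on Windows).
    static long maxOpenFiles() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("max open files for this JVM: " + maxOpenFiles());
    }
}
```

If the value printed under the hadoop user is still 1024 after editing limits.conf, the new limit has not taken effect for that login session.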

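3. After running for a while, stop-all.sh can no longer stop Hadoop and reports

no tasktracker to stop, no datanode to stop

The cause is that, when stopping, Hadoop relies on the process IDs of the mapred and dfs daemons on each node. By default the PID files are stored under /tmp, and Linux by default deletes files in this directory periodically (typically once a month or about every 7 days). Once PID files such as hadoop-hadoop-jobtracker.pid and hadoop-hadoop-namenode.pid have been deleted, the stop scripts can no longer find the corresponding processes.

Setting export HADOOP_PID_DIR in the configuration file (conf/hadoop-env.sh) solves this problem.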

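Problem:

Incompatible namespaceIDs in /usr/local/hadoop/dfs/data: namenode namespaceID = 405233244966; datanode namespaceID = 33333244

Cause:

Every run of hadoop namenode -format generates a new namespaceID for the NameNode, but the DataNode data under the hadoop.tmp.dir directory still keeps the previous namespaceID. The mismatch between the two namespaceIDs prevents the DataNode from starting. So deleting the hadoop.tmp.dir directory before each hadoop namenode -format lets the DataNode start successfully. Note that this means deleting the local directory that hadoop.tmp.dir points to, not an HDFS directory.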

Problem: Storage directory not exist

2010-02-09 21:37:53,203 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory D:\hadoop\run\dfs_name_dir does not exist.
2010-02-09 21:37:53,203 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory D:\hadoop\run\dfs_name_dir is in an inconsistent state: storage directory does not exist or is not accessible.

Solution: the storage directory D:\hadoop\run\dfs_name_dir does not exist, so simply create it by hand.

Problem: NameNode is not formatted

Solution: HDFS has not been formatted yet. Run hadoop namenode -format once and then start again.

Running jps for bin/hadoop reports the following exception:

Exception in thread "main" java.lang.NullPointerException
at sun.jvmstat.perfdata.monitor.protocol.local.LocalVmManager.activeVms(LocalVmManager.java:127)
at sun.jvmstat.perfdata.monitor.protocol.local.MonitoredHostProvider.activeVms(MonitoredHostProvider.java:133)
at sun.tools.jps.Jps.main(Jps.java:45)

Cause:

The /tmp folder under the system root was deleted. Recreate the /tmp folder.

An "unable to create log directory /tmp/..." error in bin/hive may have the same cause.
