Environment: Hadoop 0.20.2, JDK 1.6.0_13
1. Passwordless SSH login to localhost
Make sure the ssh service is running on your Linux system and that you can log in to the local machine without a password. If you cannot, set it up as follows:
1) Open a terminal and run:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
2) SSH to localhost:
$ ssh localhost
On the first login you will be told that a connection to 127.0.0.1 cannot be established and asked whether to create one; just type yes. A successful passwordless login looks like this:
[root@localhost Hadoop-0.19.2]# ssh localhost
Last login: Sun Aug 1 18:35:37 2010 from 192.168.0.104
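If ssh localhost still prompts for a password after the steps above, a common cause is overly permissive permissions on ~/.ssh: sshd silently ignores the key file in that case. A minimal fix, assuming a default OpenSSH configuration:

```shell
# sshd refuses to use authorized_keys when ~/.ssh or the file itself
# is group/world-writable, so tighten the permissions explicitly.
mkdir -p ~/.ssh
touch ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```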
2. Hadoop configuration
All of the following files are in the conf directory.
1) Configure JAVA_HOME
In hadoop-env.sh, set JAVA_HOME to point at your JDK (or JRE) installation directory.
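A minimal sketch of the line to add to conf/hadoop-env.sh; the JDK path below is an assumed example, so substitute the directory where your JDK is actually installed:

```shell
# /usr/lib/jvm/jdk1.6.0_13 is an assumed install path -- adjust it
# to match your own JDK location.
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_13
```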
2) Configure the site files
Add the following properties to the three configuration files:
conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
3) Configure the Hadoop path, so that hadoop can be run from any directory:
export PATH=$PATH:/path/to/hadoop/bin
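For example, assuming the Hadoop tarball was unpacked under /usr/local (the location is illustrative, adjust to your own install), the lines to add to your shell profile might look like:

```shell
# Assumed install location -- change this to wherever you unpacked
# the hadoop-0.20.2 tarball.
HADOOP_HOME=/usr/local/hadoop-0.20.2
export PATH=$PATH:$HADOOP_HOME/bin
```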
3. Testing
1) Run the command hadoop version; it should print the Hadoop version information.
2) Format the NameNode:
hadoop namenode -format
After formatting, you must run start-all.sh to start the daemons; otherwise the setup does not take effect, and subsequent commands fail with the following exception:
java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)
at org.apache.hadoop.ipc.Client.call(Client.java:743)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
at org.apache.hadoop.examples.Grep.run(Grep.java:87)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.examples.Grep.main(Grep.java:93)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:304)
at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:860)
at org.apache.hadoop.ipc.Client.call(Client.java:720)
... 27 more
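In other words, HDFS is only reachable once the daemons are up. A typical sequence, assuming the commands are run from the Hadoop install directory (jps ships with the JDK and simply lists the running Java processes):

```shell
bin/hadoop namenode -format   # one-time: initialize the HDFS metadata
bin/start-all.sh              # start NameNode, DataNode, JobTracker, TaskTracker
jps                           # verify the daemon processes are running
```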
4. Running a job
This example copies the files under the conf directory into an input directory, searches their contents for matches of the given regular expression, and writes every match to files in the output directory.
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*
If the run succeeds, you will see the matched phrases printed.
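As a sanity check of the pattern itself, the same regular expression can be tried with plain grep, no Hadoop needed; dfs[a-z.]+ matches a literal dfs followed by one or more lowercase letters or dots:

```shell
# -E enables extended regex syntax; -o prints only the matched part.
echo '<name>dfs.replication</name>' | grep -oE 'dfs[a-z.]+'
# prints: dfs.replication
```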
However, the third command (the hadoop jar step) throws an exception:
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/admin/input
The reason is that the input directory was never uploaded to HDFS: with fs.default.name pointing at hdfs://localhost:9000, the job looks for its input in HDFS rather than on the local filesystem.
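A sketch of the fix, assuming the daemons are running and the commands are issued from the Hadoop install directory: upload the local files into HDFS first, then rerun the job.

```shell
bin/hadoop fs -put conf input    # copy the conf directory into HDFS as input
bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
bin/hadoop fs -cat 'output/*'    # read the results back from HDFS
```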