1: Download
I originally wanted to upload the archive to CSDN, but it only accepts files up to 70 MB, so you will have to find it online.
2: Unpack to C:\cygwin\hadoop
3: Configuration
Files to configure under hadoop/conf:
1.hadoop-env.sh
Set the JDK path via export JAVA_HOME:
# Set Hadoop-specific environment variables here.
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use. Required.
export JAVA_HOME=/cygdrive/d/java/Tomcat6/jdk
# Extra Java CLASSPATH elements. Optional.
# export HADOOP_CLASSPATH=
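Note that JAVA_HOME must be in Cygwin's /cygdrive form, not the Windows D:\... form. A minimal sketch of the conversion (the JDK path below is the one from this setup; adjust for your machine — this assumes GNU sed, which Cygwin ships):

```shell
# Convert a Windows path to the /cygdrive form that hadoop-env.sh expects.
win_path='D:\java\Tomcat6\jdk'
# Replace backslashes with slashes, then turn "D:" into "/cygdrive/d".
cyg_path=$(printf '%s\n' "$win_path" | sed -e 's|\\|/|g' -e 's|^\([A-Za-z]\):|/cygdrive/\L\1|')
echo "$cyg_path"   # /cygdrive/d/java/Tomcat6/jdk
```

On a live Cygwin install, `cygpath -u 'D:\java\Tomcat6\jdk'` does the same conversion.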
2.core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
3.hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>
The actual number of replications can be specified when the file is created.
</description>
</property>
</configuration>
<value>1</value> sets the number of replicas kept for each file in the filesystem. When running on a single DataNode, HDFS cannot replicate blocks to three DataNodes (the default), so the value is lowered to 1.
4.mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
<description>JobTracker address</description>
</property>
</configuration>
4: Start Hadoop
Step 1: create a logs directory under the hadoop directory to hold the log files.
Step 2: format a new distributed filesystem, i.e. format the NameNode to create HDFS.
Run: bin/hadoop namenode -format
If you hit an error mentioning org.apache.hadoop.util.PlatformName, see the notes on that Hadoop 0.21.0 problem under Cygwin.
lenovo@lenovo-PC /hadoop
$ bin/hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
[INFO ][mgmnt ] Local JMX connector started
13/07/02 09:43:50 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = lenovo-PC/10.6.3.180
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.21.0
STARTUP_MSG: classpath = C:\cygwin\hadoop\conf;D:\java\Tomcat6\jdk\lib\tools.jar;C:\cygwin\hadoop\;C:\cygwin\hadoop\hadoop-common-0.21.0.jar;C:\cygwin\hadoop\......
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.21 -r 985326; compiled by 'tomwhite' on Tue Aug 17 01:02:28 EDT 2010
************************************************************/
13/07/02 09:43:51 INFO namenode.FSNamesystem: defaultReplication = 1
13/07/02 09:43:51 INFO namenode.FSNamesystem: maxReplication = 512
13/07/02 09:43:51 INFO namenode.FSNamesystem: minReplication = 1
13/07/02 09:43:51 INFO namenode.FSNamesystem: maxReplicationStreams = 2
13/07/02 09:43:51 INFO namenode.FSNamesystem: shouldCheckForEnoughRacks = false
13/07/02 09:43:51 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
13/07/02 09:43:51 INFO namenode.FSNamesystem: fsOwner=lenovo
13/07/02 09:43:51 INFO namenode.FSNamesystem: supergroup=supergroup
13/07/02 09:43:51 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/07/02 09:43:51 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/07/02 09:43:51 INFO common.Storage: Image file of size 112 saved in 0 seconds .
13/07/02 09:43:52 INFO common.Storage: Storage directory \tmp\hadoop-lenovo\dfs\name has been successfully formatted.
13/07/02 09:43:52 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at lenovo-PC/10.6.3.180
************************************************************/
Step 3: start Hadoop by running bin/start-all.sh
lenovo@lenovo-PC /hadoop
$ bin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-mapred.sh
starting namenode, logging to C:\cygwin\hadoop\logs/hadoop-lenovo-namenode-lenovo-PC.out
lenovo@localhost's password:
localhost: starting datanode, logging to C:\cygwin\hadoop\logs/hadoop-lenovo-datanode-lenovo-PC.out
lenovo@localhost's password:
localhost: starting secondarynamenode, logging to C:\cygwin\hadoop\logs/hadoop-lenovo-secondarynamenode-lenovo-PC.out
starting jobtracker, logging to C:\cygwin\hadoop\logs/hadoop-lenovo-jobtracker-lenovo-PC.out
lenovo@localhost's password:
localhost: starting tasktracker, logging to C:\cygwin\hadoop\logs/hadoop-lenovo-tasktracker-lenovo-PC.out
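The script prompts for a password each time it starts a daemon over SSH. A common way to avoid this is key-based login to localhost; a minimal sketch, assuming the Cygwin sshd service is already configured and running:

```shell
# Minimal sketch: passwordless SSH to localhost, so start-all.sh can
# launch each daemon without prompting. Assumes sshd is already running.
mkdir -p ~/.ssh && chmod 700 ~/.ssh
# Generate a key pair with an empty passphrase if none exists yet.
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# Authorize the public key for logins to this machine.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```

After this, `ssh localhost` should log in without a prompt, and start-all.sh runs unattended.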
You will be prompted for your password three times, and five JVM processes are started. Check the processes:
lenovo@lenovo-PC /hadoop
$ ps
PID PPID PGID WINPID TTY UID STIME COMMAND
29756 1 30500 30576 ? 1000 10:02:19 /cygdrive/d/java/Tomcat6/jdk/bin/java
28184 1 21636 26008 pty0 1000 10:01:52 /cygdrive/d/java/Tomcat6/jdk/bin/java
23856 1 21636 22664 pty0 1000 10:01:11 /cygdrive/d/java/Tomcat6/jdk/bin/java
31320 4868 31320 31332 pty0 1000 10:03:38 /usr/bin/ps
27224 1 19108 27980 ? 1000 10:01:49 /cygdrive/d/java/Tomcat6/jdk/bin/java
25556 1 24812 26308 ? 1000 10:01:27 /cygdrive/d/java/Tomcat6/jdk/bin/java
8732 1 8732 8732 ? 1000 09:12:27 /usr/bin/mintty
4868 8732 4868 9376 pty0 1000 09:12:27 /usr/bin/bash
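A quick sanity check is to count the java processes: with all daemons up there should be five (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker). Applied to the listing captured above (reproduced here as a variable so the sketch is self-contained; on a live system pipe `ps` into the same grep):

```shell
# Count Hadoop's JVMs in a captured ps listing; expect 5 after start-all.sh.
ps_listing='29756  /cygdrive/d/java/Tomcat6/jdk/bin/java
28184  /cygdrive/d/java/Tomcat6/jdk/bin/java
23856  /cygdrive/d/java/Tomcat6/jdk/bin/java
27224  /cygdrive/d/java/Tomcat6/jdk/bin/java
25556  /cygdrive/d/java/Tomcat6/jdk/bin/java'
jvm_count=$(printf '%s\n' "$ps_listing" | grep -c 'jdk/bin/java')
echo "$jvm_count"   # 5
```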
Everything is running.
Commands:
1): bin/hadoop fs -copyFromLocal local/hibernate.rar /user/pdf/hibernate.rar
Uploads a local file to HDFS:
lenovo@lenovo-PC /hadoop
$ bin/hadoop fs -copyFromLocal local/hibernate.rar /user/pdf/hibernate.rar
13/07/02 10:15:06 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
13/07/02 10:15:06 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2): list the uploaded file
lenovo@lenovo-PC /hadoop
$ bin/hadoop fs -ls /user/pdf
13/07/02 15:57:26 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
13/07/02 15:57:26 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
Found 1 items
-rw-r--r-- 1 lenovo supergroup 3375455 2013-07-02 10:15 /user/pdf/hibernate.rar
You can also check through the web interfaces; Hadoop's default NameNode and JobTracker addresses are:
- NameNode - http://localhost:50070/
- JobTracker - http://localhost:50030/