The best place to learn big data technologies is the official Apache site: http://www.apache.org/
Disclaimer: much of this material was gathered bit by bit from the web. My thanks to all the predecessors for their quiet contributions; there are too many sources to credit individually. If anything here infringes your rights, please contact the author or file a complaint with the platform, and I will remove it immediately. This is shared purely for learning.
Hadoop 2.8.1 pseudo-distributed environment setup
Tools needed:
the hadoop-2.8.1 installation package (hadoop-2.8.1.tar.gz)
Prerequisite:
Java and its environment variables are already set up; see https://mp.csdn.net/postedit/90669180
Create the target directory:
mkdir -p /opt/module
1. Unpack hadoop-2.8.1.tar.gz
tar -xzvf hadoop-2.8.1.tar.gz
2. mv hadoop-2.8.1 /opt/module
3. cd /opt/module/hadoop-2.8.1
4. Set up an SSH trust relationship (passwordless login to localhost)
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
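A note on the chmod step: with StrictModes enabled (the sshd default), sshd ignores an authorized_keys file that is writable by group or others, so the 0600 mode matters. A minimal sketch of checking the mode, using a throwaway path rather than the real ~/.ssh:

```shell
# Demo in a throwaway directory (not the real ~/.ssh):
mkdir -p /tmp/ssh_demo
touch /tmp/ssh_demo/authorized_keys
chmod 0600 /tmp/ssh_demo/authorized_keys
# Print the octal mode; 600 means owner read/write only:
stat -c '%a' /tmp/ssh_demo/authorized_keys
```

After the real setup, `ssh localhost` should log in without prompting for a password.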
5. cd etc/hadoop (that is, /opt/module/hadoop-2.8.1/etc/hadoop, the Hadoop configuration directory, not the system /etc)
6. vi core-site.xml
Add the default file system:
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
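Note that the property must sit inside the file's <configuration> root element; the complete minimal core-site.xml for this setup looks like:

```xml
<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
```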
7. vi hdfs-site.xml
Set the block replication factor (1 is sufficient for a single-node, pseudo-distributed setup):
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
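As with core-site.xml, the property goes inside the <configuration> root element; the complete minimal hdfs-site.xml:

```xml
<?xml version="1.0"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
```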
8. vi hadoop-env.sh
Change the JAVA_HOME line to the absolute path of your JDK, e.g. export JAVA_HOME=/usr/java/jdk
9. Add Hadoop to the environment variables
vi /etc/profile
Append at the end:
export HADOOP_HOME=/opt/module/hadoop-2.8.1
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Then run source /etc/profile so the change takes effect in the current shell.
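A quick sanity check that both Hadoop bin directories ended up on the PATH (a sketch assuming the install path used in this guide):

```shell
# Same exports as added to /etc/profile (install path from this guide):
export HADOOP_HOME=/opt/module/hadoop-2.8.1
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
# Count PATH entries under the Hadoop install; expect 2 (bin and sbin):
echo "$PATH" | tr ':' '\n' | grep -c '^/opt/module/hadoop-2.8.1/'
```

Once this is in place, hadoop and hdfs commands resolve from any directory.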
10. Format the file system
hdfs namenode -format
In my case this step failed with java.net.UnknownHostException: hadoop: hadoop: unknown error
which means the hostname hadoop could not be resolved.
Fix:
edit the machine's hosts file
vi /etc/hosts
and add the line 127.0.0.1 hadoop
then reboot the machine with reboot (strictly speaking the hosts change takes effect immediately, so the reboot is optional)
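For reference, the resulting /etc/hosts should contain something like the following (assuming the machine's hostname is hadoop, as the error message above suggests):

```
127.0.0.1   localhost
127.0.0.1   hadoop
```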
11. Format again
This time it succeeds.
12. Start HDFS
start-dfs.sh
13. Run jps
to check the running Java processes:
2993 Jps
2153 NameNode
2281 DataNode
2443 SecondaryNameNode
HDFS is now installed and working; next we will configure MapReduce and YARN.
Reference blog post:
https://blog.csdn.net/u012343297/article/details/79978526
Run the wordcount example program that ships with Hadoop:
hadoop jar /opt/module/hadoop-2.8.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.1.jar wordcount /1.txt /output
/1.txt is an input file in HDFS holding sample text, e.g. words like "aaaa f f s d ad a das da da d ad a d" (upload a local file first with hdfs dfs -put 1.txt /)
/output must be a directory that does not exist yet; the job fails if it already exists
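For illustration, here is how a small input file can be prepared locally (the words are arbitrary, not the author's exact 1.txt; the upload itself needs the running cluster, so it is shown commented out):

```shell
# Create a small sample input file (arbitrary words):
printf 'a b c a b c a b c d d v v qa\n' > /tmp/1.txt
# Count the words locally; wordcount's map phase emits one record per word:
wc -w < /tmp/1.txt
# Upload to HDFS (requires the cluster from the steps above; not run here):
# hdfs dfs -put /tmp/1.txt /1.txt
```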
Execution log:
19/05/30 00:29:12 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/05/30 00:29:13 INFO input.FileInputFormat: Total input files to process : 1
19/05/30 00:29:13 INFO mapreduce.JobSubmitter: number of splits:1
19/05/30 00:29:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1559144078198_0001
19/05/30 00:29:14 INFO impl.YarnClientImpl: Submitted application application_1559144078198_0001
19/05/30 00:29:14 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1559144078198_0001/
19/05/30 00:29:14 INFO mapreduce.Job: Running job: job_1559144078198_0001
19/05/30 00:29:24 INFO mapreduce.Job: Job job_1559144078198_0001 running in uber mode : false
19/05/30 00:29:24 INFO mapreduce.Job: map 0% reduce 0%
19/05/30 00:29:32 INFO mapreduce.Job: map 100% reduce 0%
19/05/30 00:29:38 INFO mapreduce.Job: map 100% reduce 100%
19/05/30 00:29:39 INFO mapreduce.Job: Job job_1559144078198_0001 completed successfully
19/05/30 00:29:39 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=55
        FILE: Number of bytes written=272557
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=122
        HDFS: Number of bytes written=25
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=4760
        Total time spent by all reduces in occupied slots (ms)=3669
        Total time spent by all map tasks (ms)=4760
        Total time spent by all reduce tasks (ms)=3669
        Total vcore-milliseconds taken by all map tasks=4760
        Total vcore-milliseconds taken by all reduce tasks=3669
        Total megabyte-milliseconds taken by all map tasks=4874240
        Total megabyte-milliseconds taken by all reduce tasks=3757056
    Map-Reduce Framework
        Map input records=15
        Map output records=14
        Map output bytes=85
        Map output materialized bytes=55
        Input split bytes=92
        Combine input records=14
        Combine output records=6
        Reduce input groups=6
        Reduce shuffle bytes=55
        Reduce input records=6
        Reduce output records=6
        Spilled Records=12
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=216
        CPU time spent (ms)=1230
        Physical memory (bytes) snapshot=426430464
        Virtual memory (bytes) snapshot=4178669568
        Total committed heap usage (bytes)=280494080
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=30
    File Output Format Counters
        Bytes Written=25
Check the files written to HDFS:
[root@hadoop opt]# hdfs dfs -lsr /
lsr: DEPRECATED: Please use 'ls -R' instead.
-rw-r--r-- 1 root supergroup 30 2019-05-30 00:25 /1.txt
drwxr-xr-x - root supergroup 0 2019-05-30 00:29 /output
-rw-r--r-- 1 root supergroup 0 2019-05-30 00:29 /output/_SUCCESS
-rw-r--r-- 1 root supergroup 25 2019-05-30 00:29 /output/part-r-00000
drwx------ - root supergroup 0 2019-05-30 00:29 /tmp
drwx------ - root supergroup 0 2019-05-30 00:29 /tmp/hadoop-yarn
drwx------ - root supergroup 0 2019-05-30 00:29 /tmp/hadoop-yarn/staging
drwxr-xr-x - root supergroup 0 2019-05-30 00:29 /tmp/hadoop-yarn/staging/history
drwxrwxrwt - root supergroup 0 2019-05-30 00:29 /tmp/hadoop-yarn/staging/history/done_intermediate
drwxrwx--- - root supergroup 0 2019-05-30 00:29 /tmp/hadoop-yarn/staging/history/done_intermediate/root
-rwxrwx--- 1 root supergroup 33561 2019-05-30 00:29 /tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1559144078198_0001-1559147353861-root-word+count-1559147377370-1-1-SUCCEEDED-default-1559147363243.jhist
-rwxrwx--- 1 root supergroup 347 2019-05-30 00:29 /tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1559144078198_0001.summary
-rwxrwx--- 1 root supergroup 134620 2019-05-30 00:29 /tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1559144078198_0001_conf.xml
drwx------ - root supergroup 0 2019-05-30 00:29 /tmp/hadoop-yarn/staging/root
drwx------ - root supergroup 0 2019-05-30 00:29 /tmp/hadoop-yarn/staging/root/.staging
View the job result:
[root@hadoop opt]# hdfs dfs -cat /output/part-r-00000
a 3
b 3
c 3
d 2
qa 1
v 2
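What wordcount computes can be mimicked locally with coreutils: split the text into one word per line (map), sort the words (shuffle), then count each group (reduce). A sketch using an input whose word frequencies happen to match the part-r-00000 above (the real 1.txt contents are not shown in full in this post):

```shell
# Sample input with the same word frequencies as the job output above:
printf 'a b c a b c a b c d d v v qa\n' > /tmp/wc_demo.txt
# map: one word per line; shuffle: sort; reduce: uniq -c counts each group
tr -s ' ' '\n' < /tmp/wc_demo.txt | sort | uniq -c | awk '{print $2"\t"$1}'
# prints:
# a	3
# b	3
# c	3
# d	2
# qa	1
# v	2
```

The Combine input records=14 / Combine output records=6 counters above reflect the same collapsing of 14 words into 6 distinct keys, done early on the map side.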