1. Starting the three HDFS processes under the machine name ruozedata001
[root@ruozedata001 ~]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.29.171.249 netmask 255.255.240.0 broadcast 172.29.175.255
ether 00:16:3e:21:6f:f8 txqueuelen 1000 (Ethernet)
RX packets 4722034 bytes 1503204221 (1.3 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 6980918 bytes 1047089295 (998.5 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1 (Local Loopback)
RX packets 704934 bytes 521804039 (497.6 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 704934 bytes 521804039 (497.6 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@ruozedata001 ~]#
[root@ruozedata001 ~]# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.29.171.249 ruozedata001
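Before starting anything, it is worth sanity-checking that the hostname every config file refers to resolves to the private NIC address (172.29.171.249 here), not to 127.0.0.1 — a loopback mapping is a classic cause of DataNode registration failures. A sketch that parses entries shaped like the /etc/hosts above (on the server itself, `getent hosts ruozedata001` does the same job):

```shell
# Resolve ruozedata001 from hosts-file-style entries (values are the ones
# from this document); the result must be the private IP, not 127.0.0.1.
entry=$(printf '127.0.0.1 localhost\n172.29.171.249 ruozedata001\n' \
  | awk '$2=="ruozedata001"{print $1}')
echo "$entry"    # prints: 172.29.171.249
```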
NameNode starts on ruozedata001:
[ruoze@ruozedata001 hadoop]$ cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ruozedata001:9000</value>
</property>
</configuration>
DataNode starts on ruozedata001:
[ruoze@ruozedata001 hadoop]$ vi workers
ruozedata001
SecondaryNameNode starts on ruozedata001:
[ruoze@ruozedata001 hadoop]$ cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>ruozedata001:9868</value>
</property>
<property>
<name>dfs.namenode.secondary.https-address</name>
<value>ruozedata001:9869</value>
</property>
</configuration>
2. The /tmp directory and pid files
When a daemon starts, it writes its process ID into a pid file.
When it stops, the script reads the PID from that file and runs kill -9 <pid>.
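That start/stop handshake can be sketched as follows (the real logic lives in Hadoop's daemon scripts; here a mktemp path stands in for /tmp/hadoop-ruoze-datanode.pid and our own shell stands in for the daemon):

```shell
# Minimal sketch of the pid-file handshake between start and stop.
PID_FILE=$(mktemp)                # stands in for /tmp/hadoop-ruoze-datanode.pid
echo $$ > "$PID_FILE"             # start: record the daemon's PID
pid=$(cat "$PID_FILE")            # stop: read the recorded PID back...
kill -0 "$pid" 2>/dev/null && echo "stop would kill PID $pid"   # ...then kill it
rm -f "$PID_FILE"
```

If the pid file is missing (as in the experiment below), the stop path has nothing to read and the running daemon is simply left alone.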
[root@ruozedata001 tmp]# cat hadoop-ruoze-datanode.pid
1822
[root@ruozedata001 tmp]# ps -ef| grep 1822
ruoze 1822 1 1 21:18 ? 00:00:04 /usr/java/jdk1.8.0_121/bin/java -Dproc_datanode -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=ERROR,RFAS -Dyarn.log.dir=/home/ruoze/app/hadoop-3.2.2/logs -Dyarn.log.file=hadoop-ruoze-datanode-ruozedata001.log -Dyarn.home.dir=/home/ruoze/app/hadoop-3.2.2 -Dyarn.root.logger=INFO,console -Djava.library.path=/home/ruoze/app/hadoop-3.2.2/lib/native -Dhadoop.log.dir=/home/ruoze/app/hadoop-3.2.2/logs -Dhadoop.log.file=hadoop-ruoze-datanode-ruozedata001.log -Dhadoop.home.dir=/home/ruoze/app/hadoop-3.2.2 -Dhadoop.id.str=ruoze -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml org.apache.hadoop.hdfs.server.datanode.DataNode
root 2192 1022 0 21:23 pts/2 00:00:00 grep --color=auto 1822
[root@ruozedata001 tmp]# rm -f hadoop-ruoze-datanode.pid
[root@ruozedata001 tmp]#
[ruoze@ruozedata001 hadoop]$ ps -ef|grep hadoop
root 708 1 0 Nov07 ? 00:00:00 /sbin/dhclient -H hadoop001 -1 -q -lf /var/lib/dhclient/dhclient--eth0.lease -pf /var/run/dhclient-eth0.pid eth0
ruoze 1822 1 1 21:18 ? 00:00:04 /usr/java/jdk1.8.0_121/bin/java -Dproc_datanode -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=ERROR,RFAS -Dyarn.log.dir=/home/ruoze/app/hadoop-3.2.2/logs -Dyarn.log.file=hadoop-ruoze-datanode-ruozedata001.log -Dyarn.home.dir=/home/ruoze/app/hadoop-3.2.2 -Dyarn.root.logger=INFO,console -Djava.library.path=/home/ruoze/app/hadoop-3.2.2/lib/native -Dhadoop.log.dir=/home/ruoze/app/hadoop-3.2.2/logs -Dhadoop.log.file=hadoop-ruoze-datanode-ruozedata001.log -Dhadoop.home.dir=/home/ruoze/app/hadoop-3.2.2 -Dhadoop.id.str=ruoze -Dhadoop.root.logger=INFO,RFA -Dhadoo.policy.file=hadoop-policy.xml org.apache.hadoop.hdfs.server.datanode.DataNode
ruoze 2165 32344 0 21:19 pts/0 00:00:00 tail -200f hadoop-ruoze-datanode-ruozedata001.log
ruoze 2644 32344 0 21:24 pts/0 00:00:00 grep --color=auto hadoop
[ruoze@ruozedata001 hadoop]$
[ruoze@ruozedata001 hadoop]$
[ruoze@ruozedata001 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [ruozedata001]
Starting datanodes
Starting secondary namenodes [ruozedata001]
[ruoze@ruozedata001 hadoop]$ ps -ef|grep hadoop
root 708 1 0 Nov07 ? 00:00:00 /sbin/dhclient -H hadoop001 -1 -q -lf /var/lib/dhclient/dhclient--eth0.lease -pf /var/run/dhclient-eth0.pid eth0
ruoze 1822 1 1 21:18 ? 00:00:04 /usr/java/jdk1.8.0_121/bin/java -Dproc_datanode -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=ERROR,RFAS -Dyarn.log.dir=/home/ruoze/app/hadoop-3.2.2/logs -Dyarn.log.file=hadoop-ruoze-datanode-ruozedata001.log -Dyarn.home.dir=/home/ruoze/app/hadoop-3.2.2 -Dyarn.root.logger=INFO,console -Djava.library.path=/home/ruoze/app/hadoop-3.2.2/lib/native -Dhadoop.log.dir=/home/ruoze/app/hadoop-3.2.2/logs -Dhadoop.log.file=hadoop-ruoze-datanode-ruozedata001.log -Dhadoop.home.dir=/home/ruoze/app/hadoop-3.2.2 -Dhadoop.id.str=ruoze -Dhadoop.root.logger=INFO,RFA -Dhadoo.policy.file=hadoop-policy.xml org.apache.hadoop.hdfs.server.datanode.DataNode
ruoze 2165 32344 0 21:19 pts/0 00:00:00 tail -200f hadoop-ruoze-datanode-ruozedata001.log
ruoze 2794 1 23 21:24 ? 00:00:04 /usr/java/jdk1.8.0_121/bin/java -Dproc_namenode -Djava.net.preferIPv4Stack=true -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dyarn.log.dir=/home/ruoze/app/hadoop-3.2.2/logs -Dyarn.log.file=hadoop-ruoze-namenode-ruozedata001.log -Dyarn.home.dir=/home/ruoze/app/hadoop-3.2.2 -Dyarn.root.logger=INFO,console -Djava.library.path=/home/ruoze/app/hadoop-3.2.2/lib/native -Dhadoop.log.dir=/home/ruoze/app/hadoop-3.2.2/logs -Dhadoop.log.file=hadoop-ruoze-namenode-ruozedata001.log -Dhadoop.home.dir=/home/ruoze/app/hadoop-3.2.2 -Dhadoop.id.str=ruoze -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml org.apache.hadoop.hdfs.server.namenode.NameNode
ruoze 3111 1 18 21:25 ? 00:00:02 /usr/java/jdk1.8.0_121/bin/java -Dproc_secondarynamenode -Djava.net.preferIPv4Stack=true -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dyarn.log.dir=/home/ruoze/app/hadoop-3.2.2/logs -Dyarn.log.file=hadoop-ruoze-secondarynamenode-ruozedata001.log -Dyarn.home.dir=/home/ruoze/app/hadoop-3.2.2 -Dyarn.root.logger=INFO,console -Djava.library.path=/home/ruoze/app/hadoop-3.2.2/lib/native -Dhadoop.log.dir=/home/ruoze/app/hadoop-3.2.2/logs -Dhadoop.log.file=hadoop-ruoze-secondarynamenode-ruozedata001.log -Dhadoop.home.dir=/home/ruoze/app/hadoop-3.2.2 -Dhadoop.id.str=ruoze -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
ruoze 3231 32344 0 21:25 pts/0 00:00:00 grep --color=auto hadoop
[ruoze@ruozedata001 hadoop]$
Takeaway: during overnight maintenance you update a config or a jar and assume the DataNode restart took effect. In fact the DN was never restarted at all — with the pid file gone, the scripts had no way to manage the old process, so PID 1822 just kept running.
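A quick way to avoid that trap is to check the process's recorded start time instead of trusting the scripts. A sketch, using our own shell as a stand-in for the DataNode:

```shell
# Verify the DataNode really restarted by checking its start time.
# On the server: pid=$(cat /tmp/hadoop-ruoze-datanode.pid)
pid=$$                            # stand-in PID for this sketch
ps -o pid=,lstart= -p "$pid"      # if lstart predates your maintenance window,
                                  # the process never restarted
```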
3. The data storage directories also live in /tmp — dangerous
[root@ruozedata001 tmp]#
[root@ruozedata001 tmp]# ll hadoop-ruoze/
total 8
drwxrwxr-x 5 ruoze ruoze 4096 Nov 21 10:39 dfs
drwxr-xr-x 3 ruoze ruoze 4096 Nov 21 11:03 mapred
[root@ruozedata001 tmp]# ll hadoop-ruoze/dfs/
total 12
drwx------ 3 ruoze ruoze 4096 Nov 26 21:18 data
drwxrwxr-x 3 ruoze ruoze 4096 Nov 26 21:25 name
drwxrwxr-x 3 ruoze ruoze 4096 Nov 26 21:25 namesecondary
[root@ruozedata001 tmp]#
core-site.xml:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/ruoze/tmp</value>
</property>
mv, then ln -s: the configuration still reads the old location /tmp, while the data actually lives in /home/ruoze/tmp.
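The mv-then-symlink trick can be sketched on scratch paths (assumption: this mirrors the real /tmp/hadoop-ruoze → /home/ruoze/tmp move; stop HDFS before moving and restart afterwards):

```shell
# Migrate a data directory while keeping the old path valid via a symlink.
old_parent=$(mktemp -d); new_parent=$(mktemp -d)   # stand-ins for /tmp and /home/ruoze/tmp
mkdir -p "$old_parent/hadoop-ruoze/dfs/name"
mv "$old_parent/hadoop-ruoze" "$new_parent/"                  # data now lives at the new path
ln -s "$new_parent/hadoop-ruoze" "$old_parent/hadoop-ruoze"   # old path keeps working
ls "$old_parent/hadoop-ruoze/dfs"                             # prints: name
```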
4. Changing the pid file directory (in hadoop-env.sh)
# Where pid files are stored. /tmp by default.
export HADOOP_PID_DIR=/home/ruoze/tmp
5. Reminders
NN
DN
dfs.datanode.data.dir: e.g. 10 physical disks, 5T each
dfs.datanode.data.dir: /data01/dfs/dn,/data02/dfs/dn,/data03/dfs/dn.......
That is, if a single disk can write at 30 MB/s, 10 disks together reach 300 MB/s
(what takes 10 s on one disk takes 1 s across ten).
Multiple disks mean more storage capacity and higher read/write I/O throughput — certainly faster than a single disk.
So in production, the DN data directory parameter must never fall back to the default under ${hadoop.tmp.dir}; spell it out explicitly for your actual disk layout.
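A minimal hdfs-site.xml sketch of the explicit multi-disk layout described above (the /dataNN mount points are illustrative — substitute your actual mounts):

```xml
<!-- One comma-separated directory per physical disk; the DataNode
     round-robins block writes across these volumes. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data01/dfs/dn,/data02/dfs/dn,/data03/dfs/dn</value>
</property>
```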
hadoop.tmp.dir (default): /tmp/hadoop-ruoze
hadoop.tmp.dir (changed): /home/ruoze/tmp
[ruoze@ruozedata001 tmp]$ pwd
/home/ruoze/tmp
[ruoze@ruozedata001 tmp]$ ll
total 4
drwxrwxr-x 3 ruoze ruoze 4096 Nov 26 22:01 hadoop-ruoze
[ruoze@ruozedata001 tmp]$
<property>
<name>hadoop.tmp.dir</name>
<value>/home/ruoze/tmp/hadoop-${user.name}</value>
</property>
Correction: judging from this experiment, the mv of the directory — the data files here do not include the data-directory portion.
namenode NN
secondarynamenode SNN
datanode DN
6. YARN deployment
ResourceManager RM
NodeManager NM
[ruoze@ruozedata001 hadoop]$ vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
</configuration>
[ruoze@ruozedata001 hadoop]$ vi yarn-site.xml
<?xml version="1.0"?>
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
[ruoze@ruozedata001 hadoop]$ sbin/start-yarn.sh
Starting resourcemanager
Starting nodemanagers
[ruoze@ruozedata001 hadoop]$
Open it in a browser — but first go to the Aliyun security group and open port 8088:
http://101.132.194.91:8088/cluster
Crypto-mining malware infection: the symptoms are that logging in and running commands is very sluggish, and one process pins the CPU at 100%
1685 root 10 -10 133564 18040 10608 S 500 0.2 891:55.72 AliYunDun
https://segmentfault.com/a/1190000015264170
In yarn-site.xml, move the RM web UI off the default port:
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>ruozedata001:8123</value>
</property>
http://101.132.194.91:8123/cluster
7. WordCount example (wc): word-frequency counting
[ruoze@ruozedata001 ~]$
[ruoze@ruozedata001 ~]$ vi .bashrc
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=
export HADOOP_HOME=/home/ruoze/app/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
[ruoze@ruozedata001 ~]$ . .bashrc
[ruoze@ruozedata001 ~]$ which hdfs
~/app/hadoop/bin/hdfs
Data preparation:
[ruoze@ruozedata001 ~]$ hdfs dfs -mkdir /input
[ruoze@ruozedata001 ~]$
[ruoze@ruozedata001 ~]$ vi 1.log
jepson
ruoze
xingxing
a b c
b a c
jepson
www.ruozedata.com ruoze a b c
[ruoze@ruozedata001 ~]$ ll
total 32
-rw-rw-r-- 1 ruoze ruoze 72 Nov 26 22:32 1.log
drwxrwxr-x 3 ruoze ruoze 4096 Nov 21 09:37 app
drwxrwxr-x 2 ruoze ruoze 4096 Nov 21 09:27 data
drwxrwxr-x 2 ruoze ruoze 4096 Nov 21 09:27 lib
drwxrwxr-x 2 ruoze ruoze 4096 Nov 21 09:27 log
drwxrwxr-x 2 ruoze ruoze 4096 Nov 21 09:31 software
drwxrwxr-x 2 ruoze ruoze 4096 Nov 21 09:27 sourcecode
drwxrwxr-x 3 ruoze ruoze 4096 Nov 26 22:24 tmp
[ruoze@ruozedata001 ~]$ hdfs dfs -put 1.log /input
[ruoze@ruozedata001 ~]$ hdfs dfs -ls /input
Found 1 items
-rw-r--r-- 1 ruoze supergroup 72 2021-11-26 22:32 /input/1.log
[ruoze@ruozedata001 ~]$ hdfs dfs -cat /input/1.log
jepson
ruoze
xingxing
a b c
b a c
jepson
www.ruozedata.com ruoze a b c
[ruoze@ruozedata001 ~]$
[ruoze@ruozedata001 hadoop]$ find ./ -name '*example*'
./libexec/hadoop-layout.sh.example
./lib/native/examples
./share/doc/hadoop/hadoop-yarn/hadoop-yarn-common/apidocs/org/apache/hadoop/yarn/webapp/example
./share/doc/hadoop/api/org/apache/hadoop/security/authentication/examples
./share/doc/hadoop/api/org/apache/hadoop/examples
./share/doc/hadoop/hadoop-mapreduce-examples
./share/doc/hadoop/hadoop-auth-examples
./share/hadoop/yarn/yarn-service-examples
./share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-3.2.2-sources.jar
./share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-3.2.2-test-sources.jar
./share/hadoop/mapreduce/lib-examples
./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar    <-- found it
./etc/hadoop/hadoop-user-functions.sh.example
./etc/hadoop/shellprofile.d/example.sh
./etc/hadoop/ssl-server.xml.example
./etc/hadoop/ssl-client.xml.example
[ruoze@ruozedata001 hadoop]$
The first job run failed:
Application application_1637936683364_0002 failed 2 times due to AM Container for appattempt_1637936683364_0002_000002 exited with exitCode: 127
Failing this attempt.Diagnostics: [2021-11-26 22:38:02.615]Exception from container-launch.
Container id: container_1637936683364_0002_02_000001
Exit code: 127
[2021-11-26 22:38:02.617]Container exited with a non-zero exit code 127. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
/bin/bash: /bin/java: No such file or directory
[2021-11-26 22:38:02.618]Container exited with a non-zero exit code 127. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
/bin/bash: /bin/java: No such file or directory
For more detailed output, check the application tracking page: http://ruozedata001:8123/cluster/app/application_1637936683364_0002 Then click on links to logs of each attempt.
. Failing the application.
2021-11-26 22:38:02,863 INFO mapreduce.Job: Counters: 0
Fix: create a symlink
[root@ruozedata001 tmp]# ln -s /usr/java/jdk1.8.0_121/bin/java /bin/java
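The symlink works; a more conventional alternative (an assumption on my part — not what the session above did) is to set JAVA_HOME in etc/hadoop/hadoop-env.sh so launched containers find java without depending on /bin/java. Sketched against a scratch file:

```shell
# Set JAVA_HOME in hadoop-env.sh (path from this document); a mktemp file
# stands in for $HADOOP_HOME/etc/hadoop/hadoop-env.sh in this sketch.
hadoop_env=$(mktemp)
echo 'export JAVA_HOME=/usr/java/jdk1.8.0_121' >> "$hadoop_env"
grep -c '^export JAVA_HOME' "$hadoop_env"    # prints: 1
```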
[ruoze@ruozedata001 hadoop]$ yarn jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar wordcount /input /output1
2021-11-26 22:42:59,054 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2021-11-26 22:42:59,605 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/ruoze/.staging/job_1637936683364_0004
2021-11-26 22:43:00,260 INFO input.FileInputFormat: Total input files to process : 1
2021-11-26 22:43:01,129 INFO mapreduce.JobSubmitter: number of splits:1 [the split count is 1 — split rules]
2021-11-26 22:43:01,652 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1637936683364_0004
2021-11-26 22:43:01,653 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-11-26 22:43:01,834 INFO conf.Configuration: resource-types.xml not found
2021-11-26 22:43:01,835 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2021-11-26 22:43:01,902 INFO impl.YarnClientImpl: Submitted application application_1637936683364_0004
2021-11-26 22:43:01,946 INFO mapreduce.Job: The url to track the job: http://ruozedata001:8123/proxy/application_1637936683364_0004/
2021-11-26 22:43:01,947 INFO mapreduce.Job: Running job: job_1637936683364_0004
2021-11-26 22:43:08,067 INFO mapreduce.Job: Job job_1637936683364_0004 running in uber mode : false
2021-11-26 22:43:08,068 INFO mapreduce.Job: map 0% reduce 0%
2021-11-26 22:43:12,127 INFO mapreduce.Job: map 100% reduce 0%
2021-11-26 22:43:17,204 INFO mapreduce.Job: map 100% reduce 100%
2021-11-26 22:43:18,220 INFO mapreduce.Job: Job job_1637936683364_0004 completed successfully
2021-11-26 22:43:18,303 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=94
FILE: Number of bytes written=469331
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=173
HDFS: Number of bytes written=60
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1 [1 map task]
Launched reduce tasks=1 [1 reduce task]
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=2101
Total time spent by all reduces in occupied slots (ms)=2322
Total time spent by all map tasks (ms)=2101
Total time spent by all reduce tasks (ms)=2322
Total vcore-milliseconds taken by all map tasks=2101
Total vcore-milliseconds taken by all reduce tasks=2322
Total megabyte-milliseconds taken by all map tasks=2151424
Total megabyte-milliseconds taken by all reduce tasks=2377728
Map-Reduce Framework
Map input records=7
Map output records=15
Map output bytes=131
Map output materialized bytes=94
Input split bytes=101
Combine input records=15
Combine output records=7
Reduce input groups=7
Reduce shuffle bytes=94
Reduce input records=7
Reduce output records=7
Spilled Records=14
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=116
CPU time spent (ms)=1030
Physical memory (bytes) snapshot=541995008
Virtual memory (bytes) snapshot=5572702208
Total committed heap usage (bytes)=467140608
Peak Map Physical memory (bytes)=302088192
Peak Map Virtual memory (bytes)=2783002624
Peak Reduce Physical memory (bytes)=239906816
Peak Reduce Virtual memory (bytes)=2789699584
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=72
File Output Format Counters
Bytes Written=60
[ruoze@ruozedata001 hadoop]$
[ruoze@ruozedata001 hadoop]$ hdfs dfs -cat /output1/part-r-00000
a 3
b 3
c 3
jepson 2
ruoze 2
www.ruozedata.com 1
xingxing 1
[ruoze@ruozedata001 hadoop]$
[ruoze@ruozedata001 hadoop]$ hdfs dfs -cat /input/1.log
jepson
ruoze
xingxing
a b c
b a c
jepson
www.ruozedata.com ruoze a b c
[ruoze@ruozedata001 hadoop]$
Step 1 — map: split each line into words on spaces, and assign every word an initial count of 1
(jepson,1)
(ruoze,1)
(xingxing,1)
(a,1) (b,1) (c,1)
(b,1) (a,1) (c,1)
Step 2 — reduce: aggregate by word and count how many times each word appears
a: 1+1+1=3 ==> a 3
...
SQL equivalent: select word, sum(value) from t group by word;
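The two steps can be reproduced in one shell pipeline: tr plays the map (emit one word per line), and sort | uniq -c plays the shuffle plus reduce (one count per word). Input is the 1.log contents from above:

```shell
# Local WordCount: map = tr, shuffle = sort, reduce = uniq -c.
printf 'jepson\nruoze\nxingxing\na b c\nb a c\njepson\nwww.ruozedata.com ruoze a b c\n' \
  | tr ' ' '\n' | sort | uniq -c | awk '{print $2, $1}'
# prints the same 7 lines as part-r-00000 above: a 3, b 3, c 3, jepson 2, ...
```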
https://github.com/apache/hadoop
https://github.com/apache/hadoop/blob/release-3.2.2-RC3/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/WordCount.java
spark
flink
Insight
a 100 --> a 100*10
b 200 --> b 200*10
map = mapping: the 10x year-end bonus — each record is transformed one for one
reduce = reduction: the flat 3000-yuan payout — many records are aggregated down to one result