1. Original files
1.1 File count
[hadoop@emr-worker-10 result]$ ll room54000-htm_2018-*
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 18:59 room54000-htm_2018-09-30-00.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 18:59 room54000-htm_2018-09-30-01.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:00 room54000-htm_2018-09-30-02.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 18:59 room54000-htm_2018-09-30-03.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:00 room54000-htm_2018-09-30-04.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:02 room54000-htm_2018-09-30-05.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:02 room54000-htm_2018-09-30-06.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:03 room54000-htm_2018-09-30-07.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:03 room54000-htm_2018-09-30-08.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:05 room54000-htm_2018-09-30-09.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:06 room54000-htm_2018-09-30-10.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:06 room54000-htm_2018-09-30-11.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:06 room54000-htm_2018-09-30-12.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:07 room54000-htm_2018-09-30-13.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:09 room54000-htm_2018-09-30-14.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:09 room54000-htm_2018-09-30-15.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:10 room54000-htm_2018-09-30-16.txt
1.2 File contents
[hadoop@emr-worker-10 result]$ cat room54000-htm_2018-*
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
7
0
0
0
0
0
2. Summing a single column of values
cat room54000-htm_2018-*|grep -v 0|awk '{sum +=$1};END {print sum}'
cat room54000-htm_2018-*|awk '{sum +=$1};END {print sum}'
[hadoop@emr-worker-10 result]$ cat room54000-htm_2018-*|awk '{sum +=$1};END {print sum}'
516
[hadoop@emr-worker-10 result]$ cat room54000-htm_2018-*|grep -v 0|awk '{sum +=$1};END {print sum}'
516
[hadoop@emr-worker-10 result]$
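The two commands agree in the run above, but `grep -v 0` removes every line that contains the digit 0 anywhere, so a value such as 10 would be dropped as well. Filtering out zeros is also unnecessary, because adding 0 does not change the sum. The following is a minimal sketch of a more defensive variant, not the method used above: `sum_column` is a hypothetical helper name, and the sample files and their values are made up for the demonstration.

```shell
#!/bin/sh
# sum_column: hypothetical helper that sums the first column of its input.
# Lines whose first field is not a plain integer are skipped, and an empty
# input prints 0 instead of an empty line.
sum_column() {
    awk '$1 ~ /^-?[0-9]+$/ { sum += $1 } END { print sum + 0 }' "$@"
}

# Demo on throwaway sample data (made-up values, not the real batch output).
demo_dir=$(mktemp -d)
printf '0\n'  > "$demo_dir/part-00.txt"
printf '7\n'  > "$demo_dir/part-01.txt"
printf '10\n' > "$demo_dir/part-02.txt"

cat "$demo_dir"/part-*.txt | sum_column              # prints 17
cat "$demo_dir"/part-*.txt | grep -v 0 | sum_column  # prints 7: both 0 and 10 were dropped

rm -rf "$demo_dir"
```

If zero lines really must be filtered out first, `grep -vx 0` matches only lines that are exactly `0`.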
3. Notes
This applies after a batch run, when the summary result written by each part has to be summed again into one overall total.