Viewing an HDFS file: first lines, last lines, random lines, line count, and specific lines


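Today I wanted to look at the last 30 lines of a file on HDFS and found that the -tail option of the HDFS command does not accept a line count. It takes only a file (plus an optional -f) and prints the last kilobyte of that file.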

$ cd /opt/cdh-5.7.6/hadoop-2.6.0-cdh5.7.6/
$ bin/hdfs dfs -tail -30 /datas/access_log
-tail: Illegal option -30
Usage: hadoop fs [generic options] -tail [-f] <file>

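So how do we get that? The answer is the pipe, a small but powerful shell operator that gives us exactly what we need.

A pipe takes the output of the command on its left and feeds it to the command on its right as input.

So we can pipe the output of -cat into shell commands such as more, head, tail, and wc, and do with them whatever the shell can do but the HDFS command cannot.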
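
To make the pipe idea concrete without needing a cluster, here is a minimal sketch. produce_lines is a hypothetical stand-in for `bin/hdfs dfs -cat <file>`; any command that writes lines to standard output behaves the same way.

```shell
# Hypothetical stand-in for `bin/hdfs dfs -cat <file>`: writes five lines to stdout.
produce_lines() { printf 'line %d\n' 1 2 3 4 5; }

# The pipe connects the producer's stdout to the consumer's stdin.
line_count=$(produce_lines | wc -l)      # count the lines
last_two=$(produce_lines | tail -n 2)    # keep only the last two lines

echo "lines: $line_count"
echo "$last_two"
```

The consumer never needs to know where the lines came from, which is why the same tools work on a local file and on an HDFS file.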

Count the lines in the HDFS file

$ bin/hdfs dfs -cat /datas/access_log | wc -l
1546

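While we are at it, a quick review of the shell's wc command

On Linux, wc (word count) counts the lines, words, and bytes in the given files and prints the results.

1. Command format:

wc [options] file...

2. What it does:

Counts the lines, words, and bytes in each given file. If no file name is given, wc reads from standard input. When more than one file is given, it also prints a total line for all of them.

3. Options:

-c  count bytes.

-l  count lines (strictly speaking, newline characters).

-m  count characters.

-w  count words. A word is a string of characters delimited by spaces, tabs, or newlines.

-L  print the length of the longest line.

--help  show help

--version  show version information

However the options are ordered on the command line, wc prints the counts in a fixed order: lines, words, characters, bytes, longest line length.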

$ bin/hdfs dfs -cat /datas/access_log | wc
   1546   15458  174449

$ bin/hdfs dfs -cat /datas/access_log | wc -c
174449

$ bin/hdfs dfs -cat /datas/access_log | wc -l
1546

$ bin/hdfs dfs -cat /datas/access_log | wc -l -c
   1546  174449

$ bin/hdfs dfs -cat /datas/access_log | wc -l -m
   1546  174449

$ bin/hdfs dfs -cat /datas/access_log | wc -c -m
 174449  174449
 
$ bin/hdfs dfs -cat /datas/access_log | wc -w -m
  15458  174449

$ bin/hdfs dfs -cat /datas/access_log | wc -l -L
   1546     193

$ wc --help
Usage: wc [OPTION]... [FILE]...
  or:  wc [OPTION]... --files0-from=F
Print newline, word, and byte counts for each FILE, and a total line if
more than one FILE is specified.  With no FILE, or when FILE is -,
read standard input.
  -c, --bytes            print the byte counts
  -m, --chars            print the character counts
  -l, --lines            print the newline counts
      --files0-from=F    read input from the files specified by
                           NUL-terminated names in file F;
                           If F is - then read names from standard input
  -L, --max-line-length  print the length of the longest line
  -w, --words            print the word counts
      --help     display this help and exit
      --version  output version information and exit

Report wc bugs to bug-coreutils@gnu.org
GNU coreutils home page: <http://www.gnu.org/software/coreutils/>
General help using GNU software: <http://www.gnu.org/gethelp/>
For complete documentation, run: info coreutils 'wc invocation'

$ wc --version
wc (GNU coreutils) 8.4
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Paul Rubin and David MacKenzie.
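
One detail from the help text above is worth a quick experiment: -l prints the newline count, so a final line that lacks a trailing newline is not counted. The sketch below uses a hypothetical stand-in function instead of a real HDFS file and compares wc -l with grep -c '', which counts every line, including an unterminated last one.

```shell
# Hypothetical stand-in for `bin/hdfs dfs -cat <file>`:
# two lines of text, the second one without a trailing newline.
no_trailing_newline() { printf 'first\nsecond'; }

# wc -l counts newline characters only.
newline_count=$(no_trailing_newline | wc -l)

# grep -c '' counts every line, terminated or not.
line_count=$(no_trailing_newline | grep -c '')

echo "wc -l: $newline_count, grep -c '': $line_count"
```

If the two numbers differ by one for a real file, the file simply does not end with a newline.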

View the last 30 lines of the HDFS file

$ bin/hdfs dfs -cat /datas/access_log | tail -30
ns.wtbts.org - - [12/Mar/2004:11:16:56 -0800] "GET /dccstats/stats-spam-ratio.1year.png HTTP/1.0" 200 1853
ns.wtbts.org - - [12/Mar/2004:11:16:56 -0800] "GET /dccstats/stats-hashes.1year.png HTTP/1.0" 200 1572
67.131.107.5 - - [12/Mar/2004:11:39:14 -0800] "GET / HTTP/1.1" 200 3169
67.131.107.5 - - [12/Mar/2004:11:39:25 -0800] "GET /twiki/bin/view/Main/WebHome HTTP/1.1" 200 10419
67.131.107.5 - - [12/Mar/2004:11:39:31 -0800] "GET /twiki/pub/TWiki/TWikiLogos/twikiRobot46x50.gif HTTP/1.1" 200 2877
10.0.0.153 - - [12/Mar/2004:12:23:11 -0800] "GET / HTTP/1.1" 304 -
10.0.0.153 - - [12/Mar/2004:12:23:17 -0800] "GET /cgi-bin/mailgraph2.cgi HTTP/1.1" 200 2987
10.0.0.153 - - [12/Mar/2004:12:23:18 -0800] "GET /cgi-bin/mailgraph.cgi/mailgraph_0_err.png HTTP/1.1" 200 6324
10.0.0.153 - - [12/Mar/2004:12:23:18 -0800] "GET /cgi-bin/mailgraph.cgi/mailgraph_1.png HTTP/1.1" 200 8964
10.0.0.153 - - [12/Mar/2004:12:23:18 -0800] "GET /cgi-bin/mailgraph.cgi/mailgraph_0.png HTTP/1.1" 200 6225
10.0.0.153 - - [12/Mar/2004:12:23:18 -0800] "GET /cgi-bin/mailgraph.cgi/mailgraph_2_err.png HTTP/1.1" 200 7001
10.0.0.153 - - [12/Mar/2004:12:23:18 -0800] "GET /cgi-bin/mailgraph.cgi/mailgraph_2.png HTTP/1.1" 200 9514
10.0.0.153 - - [12/Mar/2004:12:23:18 -0800] "GET /cgi-bin/mailgraph.cgi/mailgraph_1_err.png HTTP/1.1" 200 6949
10.0.0.153 - - [12/Mar/2004:12:23:18 -0800] "GET /cgi-bin/mailgraph.cgi/mailgraph_3.png HTTP/1.1" 200 6644
10.0.0.153 - - [12/Mar/2004:12:23:18 -0800] "GET /cgi-bin/mailgraph.cgi/mailgraph_3_err.png HTTP/1.1" 200 5554
10.0.0.153 - - [12/Mar/2004:12:23:40 -0800] "GET /dccstats/index.html HTTP/1.1" 304 -
10.0.0.153 - - [12/Mar/2004:12:23:41 -0800] "GET /dccstats/stats-spam.1day.png HTTP/1.1" 200 2964
10.0.0.153 - - [12/Mar/2004:12:23:41 -0800] "GET /dccstats/stats-spam-ratio.1day.png HTTP/1.1" 200 2341
10.0.0.153 - - [12/Mar/2004:12:23:41 -0800] "GET /dccstats/stats-spam-ratio.1week.png HTTP/1.1" 200 2346
10.0.0.153 - - [12/Mar/2004:12:23:41 -0800] "GET /dccstats/stats-spam.1week.png HTTP/1.1" 200 3438
10.0.0.153 - - [12/Mar/2004:12:23:41 -0800] "GET /dccstats/stats-hashes.1week.png HTTP/1.1" 200 1670
10.0.0.153 - - [12/Mar/2004:12:23:41 -0800] "GET /dccstats/stats-spam.1month.png HTTP/1.1" 200 2651
10.0.0.153 - - [12/Mar/2004:12:23:41 -0800] "GET /dccstats/stats-spam-ratio.1month.png HTTP/1.1" 200 2023
10.0.0.153 - - [12/Mar/2004:12:23:41 -0800] "GET /dccstats/stats-hashes.1month.png HTTP/1.1" 200 1636
10.0.0.153 - - [12/Mar/2004:12:23:41 -0800] "GET /dccstats/stats-spam.1year.png HTTP/1.1" 200 2262
10.0.0.153 - - [12/Mar/2004:12:23:41 -0800] "GET /dccstats/stats-spam-ratio.1year.png HTTP/1.1" 200 1906
10.0.0.153 - - [12/Mar/2004:12:23:41 -0800] "GET /dccstats/stats-hashes.1year.png HTTP/1.1" 200 1582
216.139.185.45 - - [12/Mar/2004:13:04:01 -0800] "GET /mailman/listinfo/webber HTTP/1.1" 200 6051
pd95f99f2.dip.t-dialin.net - - [12/Mar/2004:13:18:57 -0800] "GET /razor.html HTTP/1.1" 200 2869
d97082.upc-d.chello.nl - - [12/Mar/2004:13:25:45 -0800] "GET /SpamAssassin.html HTTP/1.1" 200 7368
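
A small variation, as a sketch: tail -30 is the historical spelling, and tail -n 30 is the POSIX form of the same thing. The helper name last_lines is my own, and seq stands in for the HDFS file so the snippet runs anywhere.

```shell
# last_lines N: print the last N lines of stdin (defaults to 10, like tail itself).
last_lines() { tail -n "${1:-10}"; }

# seq is only a stand-in for `bin/hdfs dfs -cat <file>`.
seq 1 100 | last_lines 3
```

With the real file this would be used as: bin/hdfs dfs -cat /datas/access_log | last_lines 30. Keep in mind that the whole file still streams through the pipe before tail can print anything, so this is best kept to files of moderate size.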

View the first 20 lines of the HDFS file

$ bin/hdfs dfs -cat /datas/access_log | head -20
64.242.88.10 - - [07/Mar/2004:16:05:49 -0800] "GET /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12846
64.242.88.10 - - [07/Mar/2004:16:06:51 -0800] "GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.1" 200 4523
64.242.88.10 - - [07/Mar/2004:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291
64.242.88.10 - - [07/Mar/2004:16:11:58 -0800] "GET /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 200 7352
64.242.88.10 - - [07/Mar/2004:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
64.242.88.10 - - [07/Mar/2004:16:23:12 -0800] "GET /twiki/bin/oops/TWiki/AppendixFileSystem?template=oopsmore&param1=1.12&param2=1.12 HTTP/1.1" 200 11382
64.242.88.10 - - [07/Mar/2004:16:24:16 -0800] "GET /twiki/bin/view/Main/PeterThoeny HTTP/1.1" 200 4924
64.242.88.10 - - [07/Mar/2004:16:29:16 -0800] "GET /twiki/bin/edit/Main/Header_checks?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12851
64.242.88.10 - - [07/Mar/2004:16:30:29 -0800] "GET /twiki/bin/attach/Main/OfficeLocations HTTP/1.1" 401 12851
64.242.88.10 - - [07/Mar/2004:16:31:48 -0800] "GET /twiki/bin/view/TWiki/WebTopicEditTemplate HTTP/1.1" 200 3732
64.242.88.10 - - [07/Mar/2004:16:32:50 -0800] "GET /twiki/bin/view/Main/WebChanges HTTP/1.1" 200 40520
64.242.88.10 - - [07/Mar/2004:16:33:53 -0800] "GET /twiki/bin/edit/Main/Smtpd_etrn_restrictions?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12851
64.242.88.10 - - [07/Mar/2004:16:35:19 -0800] "GET /mailman/listinfo/business HTTP/1.1" 200 6379
64.242.88.10 - - [07/Mar/2004:16:36:22 -0800] "GET /twiki/bin/rdiff/Main/WebIndex?rev1=1.2&rev2=1.1 HTTP/1.1" 200 46373
64.242.88.10 - - [07/Mar/2004:16:37:27 -0800] "GET /twiki/bin/view/TWiki/DontNotify HTTP/1.1" 200 4140
64.242.88.10 - - [07/Mar/2004:16:39:24 -0800] "GET /twiki/bin/view/Main/TokyoOffice HTTP/1.1" 200 3853
64.242.88.10 - - [07/Mar/2004:16:43:54 -0800] "GET /twiki/bin/view/Main/MikeMannix HTTP/1.1" 200 3686
64.242.88.10 - - [07/Mar/2004:16:45:56 -0800] "GET /twiki/bin/attach/Main/PostfixCommands HTTP/1.1" 401 12846
64.242.88.10 - - [07/Mar/2004:16:47:12 -0800] "GET /robots.txt HTTP/1.1" 200 68
64.242.88.10 - - [07/Mar/2004:16:47:46 -0800] "GET /twiki/bin/rdiff/Know/ReadmeFirst?rev1=1.5&rev2=1.4 HTTP/1.1" 200 5724
cat: Unable to write to output stream.
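
The last line of that output, "cat: Unable to write to output stream.", is not a failure of the command. head exits as soon as it has its 20 lines and closes the pipe, so the -cat side has nowhere left to write and reports it on standard error. A minimal sketch of the same situation, with seq as a stand-in producer and a helper name (first_lines) of my own:

```shell
# first_lines N: print the first N lines of stdin (defaults to 10, like head itself).
first_lines() { head -n "${1:-10}"; }

# The producer wants to write 100000 lines, but the consumer stops after 3.
# 2>/dev/null silences whatever the producer reports once the pipe is closed.
seq 1 100000 2>/dev/null | first_lines 3
```

With the real file this would be: bin/hdfs dfs -cat /datas/access_log 2>/dev/null | head -20. The trade-off is that the redirect also hides genuine errors, such as a wrong path, so I would add it only once the command is known to work.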

View 10 random lines of the HDFS file

$ bin/hdfs dfs -cat /datas/access_log | shuf -n 10
10.0.0.153 - - [12/Mar/2004:12:23:17 -0800] "GET /cgi-bin/mailgraph2.cgi HTTP/1.1" 200 2987
64.242.88.10 - - [08/Mar/2004:14:23:54 -0800] "GET /twiki/bin/oops/TWiki/RyanFreebern?template=oopsmore&param1=1.2&param2=1.2 HTTP/1.1" 200 11263
p213.54.168.132.tisdip.tiscali.de - - [08/Mar/2004:05:26:06 -0800] "GET /twiki/bin/edit/Main/UvscanAndPostFix?topicparent=Main.WebHome HTTP/1.1" 401 12851
ts05-ip44.hevanet.com - - [10/Mar/2004:08:55:40 -0800] "GET /twiki/bin/view/Main/KevinWGagel HTTP/1.1" 200 4901
10.0.0.153 - - [10/Mar/2004:12:07:07 -0800] "GET /icons/gnu-head-tiny.jpg HTTP/1.1" 304 -
spot.nnacorp.com - - [08/Mar/2004:09:02:54 -0800] "GET /twiki/pub/TWiki/TWikiLogos/twikiRobot46x50.gif HTTP/1.1" 304 -
64.242.88.10 - - [08/Mar/2004:05:56:08 -0800] "GET /twiki/bin/view/TWiki/TWikiRegistration?rev=r1.4 HTTP/1.1" 200 12113
lj1117.inktomisearch.com - - [10/Mar/2004:18:13:54 -0800] "GET /twiki/bin/view/Main/VishaalGolam HTTP/1.0" 200 4577
10.0.0.153 - - [11/Mar/2004:15:52:38 -0800] "GET /dccstats/stats-spam-ratio.1week.png HTTP/1.1" 200 2434
64.242.88.10 - - [08/Mar/2004:10:34:55 -0800] "GET /twiki/bin/view/TWiki/WebSearch?skin=print HTTP/1.1" 200 7196
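
shuf comes with GNU coreutils and may be missing on other systems. As a fallback, here is a sketch that swaps in a different technique, reservoir sampling written in awk; it is not what the command above does internally, and the function name sample_lines is my own. It keeps only N lines in memory and prints them in reservoir order rather than shuffled.

```shell
# sample_lines N: print N lines chosen uniformly at random from stdin.
# If the input has fewer than N lines, all of them are printed.
sample_lines() {
  awk -v n="$1" '
    BEGIN { srand() }                          # seed from the current time
    NR <= n { keep[NR] = $0; next }            # fill the reservoir first
    {
      i = int(rand() * NR) + 1                 # uniform integer in 1..NR
      if (i <= n) keep[i] = $0                 # replace a slot with probability n/NR
    }
    END {
      m = (NR < n) ? NR : n
      for (i = 1; i <= m; i++) print keep[i]
    }
  '
}

# seq is only a stand-in for `bin/hdfs dfs -cat <file>`.
seq 1 1000 | sample_lines 10
```

With the real file this would be: bin/hdfs dfs -cat /datas/access_log | sample_lines 10. Because srand() seeds from the clock, two runs within the same second can return the same sample.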

View lines 3 through 10 of the file

$ bin/hdfs dfs -cat /datas/access_log | sed -n '3,10p'
64.242.88.10 - - [07/Mar/2004:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291
64.242.88.10 - - [07/Mar/2004:16:11:58 -0800] "GET /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 200 7352
64.242.88.10 - - [07/Mar/2004:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
64.242.88.10 - - [07/Mar/2004:16:23:12 -0800] "GET /twiki/bin/oops/TWiki/AppendixFileSystem?template=oopsmore&param1=1.12&param2=1.12 HTTP/1.1" 200 11382
64.242.88.10 - - [07/Mar/2004:16:24:16 -0800] "GET /twiki/bin/view/Main/PeterThoeny HTTP/1.1" 200 4924
64.242.88.10 - - [07/Mar/2004:16:29:16 -0800] "GET /twiki/bin/edit/Main/Header_checks?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12851
64.242.88.10 - - [07/Mar/2004:16:30:29 -0800] "GET /twiki/bin/attach/Main/OfficeLocations HTTP/1.1" 401 12851
64.242.88.10 - - [07/Mar/2004:16:31:48 -0800] "GET /twiki/bin/view/TWiki/WebTopicEditTemplate HTTP/1.1" 200 3732
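
One refinement, as a sketch: sed -n '3,10p' keeps reading to the end of the input even though nothing after line 10 will be printed. Adding a q command at the last wanted line lets sed stop there. The helper name line_range is my own, and seq stands in for the HDFS file.

```shell
# line_range FIRST LAST: print lines FIRST..LAST of stdin, then stop reading.
line_range() { sed -n "${1},${2}p;${2}q"; }

# seq is only a stand-in for `bin/hdfs dfs -cat <file>`.
seq 1 100000 | line_range 3 5
```

With the real file this would be: bin/hdfs dfs -cat /datas/access_log | line_range 3 10. Because sed now exits early, the -cat side may report that it could not write to the output stream, just as it does with head.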

View the line with a given timestamp, plus the 5 lines before and after it

$ bin/hdfs dfs -cat /datas/access_log | grep -C 5 07/Mar/2004:16:31:48
64.242.88.10 - - [07/Mar/2004:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
64.242.88.10 - - [07/Mar/2004:16:23:12 -0800] "GET /twiki/bin/oops/TWiki/AppendixFileSystem?template=oopsmore&param1=1.12&param2=1.12 HTTP/1.1" 200 11382
64.242.88.10 - - [07/Mar/2004:16:24:16 -0800] "GET /twiki/bin/view/Main/PeterThoeny HTTP/1.1" 200 4924
64.242.88.10 - - [07/Mar/2004:16:29:16 -0800] "GET /twiki/bin/edit/Main/Header_checks?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12851
64.242.88.10 - - [07/Mar/2004:16:30:29 -0800] "GET /twiki/bin/attach/Main/OfficeLocations HTTP/1.1" 401 12851
64.242.88.10 - - [07/Mar/2004:16:31:48 -0800] "GET /twiki/bin/view/TWiki/WebTopicEditTemplate HTTP/1.1" 200 3732
64.242.88.10 - - [07/Mar/2004:16:32:50 -0800] "GET /twiki/bin/view/Main/WebChanges HTTP/1.1" 200 40520
64.242.88.10 - - [07/Mar/2004:16:33:53 -0800] "GET /twiki/bin/edit/Main/Smtpd_etrn_restrictions?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12851
64.242.88.10 - - [07/Mar/2004:16:35:19 -0800] "GET /mailman/listinfo/business HTTP/1.1" 200 6379
64.242.88.10 - - [07/Mar/2004:16:36:22 -0800] "GET /twiki/bin/rdiff/Main/WebIndex?rev1=1.2&rev2=1.1 HTTP/1.1" 200 46373
64.242.88.10 - - [07/Mar/2004:16:37:27 -0800] "GET /twiki/bin/view/TWiki/DontNotify HTTP/1.1" 200 4140

View the line with a given timestamp, plus the 5 lines before it

$ bin/hdfs dfs -cat /datas/access_log | grep -B 5 07/Mar/2004:16:31:48
64.242.88.10 - - [07/Mar/2004:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
64.242.88.10 - - [07/Mar/2004:16:23:12 -0800] "GET /twiki/bin/oops/TWiki/AppendixFileSystem?template=oopsmore&param1=1.12&param2=1.12 HTTP/1.1" 200 11382
64.242.88.10 - - [07/Mar/2004:16:24:16 -0800] "GET /twiki/bin/view/Main/PeterThoeny HTTP/1.1" 200 4924
64.242.88.10 - - [07/Mar/2004:16:29:16 -0800] "GET /twiki/bin/edit/Main/Header_checks?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12851
64.242.88.10 - - [07/Mar/2004:16:30:29 -0800] "GET /twiki/bin/attach/Main/OfficeLocations HTTP/1.1" 401 12851
64.242.88.10 - - [07/Mar/2004:16:31:48 -0800] "GET /twiki/bin/view/TWiki/WebTopicEditTemplate HTTP/1.1" 200 3732

View the line with a given timestamp, plus the 5 lines after it

$ bin/hdfs dfs -cat /datas/access_log | grep -A 5 07/Mar/2004:16:31:48
64.242.88.10 - - [07/Mar/2004:16:31:48 -0800] "GET /twiki/bin/view/TWiki/WebTopicEditTemplate HTTP/1.1" 200 3732
64.242.88.10 - - [07/Mar/2004:16:32:50 -0800] "GET /twiki/bin/view/Main/WebChanges HTTP/1.1" 200 40520
64.242.88.10 - - [07/Mar/2004:16:33:53 -0800] "GET /twiki/bin/edit/Main/Smtpd_etrn_restrictions?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12851
64.242.88.10 - - [07/Mar/2004:16:35:19 -0800] "GET /mailman/listinfo/business HTTP/1.1" 200 6379
64.242.88.10 - - [07/Mar/2004:16:36:22 -0800] "GET /twiki/bin/rdiff/Main/WebIndex?rev1=1.2&rev2=1.1 HTTP/1.1" 200 46373
64.242.88.10 - - [07/Mar/2004:16:37:27 -0800] "GET /twiki/bin/view/TWiki/DontNotify HTTP/1.1" 200 4140
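
The three grep commands above leave the pattern unquoted, which works here because the timestamp contains nothing the shell treats specially. A slightly more defensive sketch quotes the pattern and adds -F so that grep matches it as a literal string rather than a regular expression. The helper name context_around is my own, and seq stands in for the HDFS file.

```shell
# context_around PATTERN [N]: print each line containing the literal PATTERN,
# together with N lines of context on both sides (N defaults to 5).
context_around() { grep -F -C "${2:-5}" -- "$1"; }

# seq is only a stand-in for `bin/hdfs dfs -cat <file>`.
seq 1 20 | context_around 10 2
```

With the real file this would be: bin/hdfs dfs -cat /datas/access_log | context_around '07/Mar/2004:16:31:48' 5. If the pattern matches in several places that are far apart, grep prints each group of lines separately with a -- line between them.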