Shell Programming: Counting Text Occurrences with a Dictionary

毕小宝

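Category: Project Development Issues. Tags: Shell, dictionary, text statistics

This is an original article by the author. Do not reproduce it without the author's permission.

Article link: https://blog.csdn.net/wojiushiwo945you/article/details/90515056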

Problem

I came across this question on CSDN Q&A. Given a file test.log with the following content:

a,e
a,
b,e
b,
c,e
c,e
c,
d,e
d,e

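The counting rule is as follows: split each line on the comma. If the second field is e, the line is not counted. Otherwise the line is counted, and counts are accumulated per first field. In other words, for each distinct first field, count the lines whose second field is not e.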
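To make the rule concrete, here is a minimal sketch of the per-line decision, assuming Bash. The helper name `is_counted` is hypothetical and only used for illustration; it splits one line on the comma and succeeds when the second field is not e.

```shell
#!/usr/bin/env bash
# Sketch of the per-line rule; `is_counted` is a hypothetical helper name.
# Returns success (0) when the line should be counted, i.e. when its
# second comma-separated field is not "e".
is_counted() {
  local first second
  IFS=',' read -r first second <<< "$1"
  [[ $second != "e" ]]
}

# Try the rule on three sample lines from test.log.
for line in 'a,e' 'a,' 'c,'; do
  if is_counted "$line"; then
    echo "$line -> counted"
  else
    echo "$line -> skipped"
  fi
done
```

With these three sample lines, `a,e` is skipped while `a,` and `c,` are counted.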

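I wanted a change of pace to clear my head, so I spent a little time writing this script. The process is written up below.

Approach

For counting in a Shell script, a dictionary (a Bash associative array) is a natural fit. Loop over the file and process each line as follows: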

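Use awk to split the line and get the first and second columns.
Look up the value stored in the dictionary for the first column.
If the second column is e and the first column is not in the dictionary, add it with a value of 0 (the line is not counted).
If the second column is e and the first column is already in the dictionary, do nothing (the line is not counted).
If the second column is not e and the first column is not in the dictionary, add it with a value of 1.
If the second column is not e and the first column is already in the dictionary, increment its value by 1.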
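The steps above rely on a few basic dictionary operations: insert, update, and test whether a key exists. The following is a minimal sketch of those operations with a Bash associative array, assuming Bash 4.0 or later. The names `counts` and `lookup` are illustrative and do not come from the original script.

```shell
#!/usr/bin/env bash
# Minimal sketch of Bash associative array ("dictionary") operations.
# Assumes Bash 4.0+; `counts` and `lookup` are illustrative names.

declare -A counts                     # declare an associative array

counts[a]=0                           # insert key "a" with value 0
counts[b]=1                           # insert key "b" with value 1
counts[b]=$(( ${counts[b]} + 1 ))     # increment the value stored under "b"

# Print the value stored under a key, or "missing" if the key is absent.
# ${counts[$key]+set} expands to "set" only when the key exists, so a
# stored 0 is not confused with an absent key.
lookup() {
  local key=$1
  if [[ -n ${counts[$key]+set} ]]; then
    echo "${counts[$key]}"
  else
    echo "missing"
  fi
}

lookup a   # prints 0
lookup b   # prints 2
lookup z   # prints missing
```

The existence test matters because an empty lookup result and a stored value are different things: a key stored with 0 still exists.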

The Shell script is as follows:

#!/bin/bash
# Requires Bash 4.0+ for associative arrays (declare -A).
INPUT_FILE=/home/test.log
declare -A dic
echo 'start sumup'
while read -r line
do
  # Split the line on the comma into its first and second columns.
  firstCol=$(echo "${line}" | awk -F ',' '{print $1}')
  secondCol=$(echo "${line}" | awk -F ',' '{print $2}')

  # Empty when the key has not been stored yet.
  storedValue=${dic[$firstCol]}
  if [ "${secondCol}" = 'e' ] && [ -z "${storedValue}" ]; then
     echo "${firstCol} not exist and second field is e set 0"
     dic[$firstCol]=0
  elif [ "${secondCol}" != 'e' ] && [ -z "${storedValue}" ]; then
     echo "${firstCol} not exist and second field is not e,set 1"
     dic[$firstCol]=1
  elif [ "${secondCol}" = 'e' ] && [ -n "${storedValue}" ]; then
     echo "${firstCol} exist and second field is e ,do nothing"
  else
     echo "${firstCol} sumup 1"
     dic[$firstCol]=$(( storedValue + 1 ))
  fi
done < "${INPUT_FILE}"

echo 'print the dictionary content'

# Bash does not guarantee the iteration order of associative array keys.
for key in "${!dic[@]}"
do
   value=${dic[$key]}
   if [ "${value}" -eq 0 ]; then
      echo "${key} null"
   else
      echo "${key} : ${value}"
   fi
done

Output:

start sumup
a not exist and second field is e set 0
a sumup 1
b not exist and second field is e set 0
b sumup 1
c not exist and second field is e set 0
c exist and second field is e ,do nothing
c sumup 1
d not exist and second field is e set 0
d exist and second field is e ,do nothing
print the dictionary content
a : 1
b : 1
c : 1
d null
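As a cross-check, the same counting rule can also be expressed in a single awk pass, using awk's built-in associative arrays instead of a Bash dictionary. This is an alternative sketch, not the script above. The function name `count_non_e` and the `order` array (which records the order in which keys are first seen, so the output order is deterministic) are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: the counting rule in one awk pass.
# `count_non_e` is a hypothetical helper; it reads the file named by $1.
count_non_e() {
  awk -F',' '
    # First time a key is seen: start it at 0 and remember its position.
    !($1 in cnt) { cnt[$1] = 0; order[++n] = $1 }
    # Count only the lines whose second field is not "e".
    $2 != "e"    { cnt[$1]++ }
    END {
      for (i = 1; i <= n; i++) {
        k = order[i]
        if (cnt[k] == 0) {
          print k " null"
        } else {
          print k " : " cnt[k]
        }
      }
    }
  ' "$1"
}
```

Calling `count_non_e /home/test.log` on the sample file should print the same four summary lines as the dictionary dump above, in first-seen key order. It also avoids starting two awk processes per input line.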