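1. First, a look at the text: a log of more than 400,000 lines. The requirement: sort the lines by the value that follows tbmt: in column 6, treating that value as a hexadecimal number.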
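The reason a plain text sort is not enough is that the hex values have different lengths, and a character-by-character comparison ignores length. A small sketch with made-up sample values (they are not taken from the real log) shows the difference:

```shell
#!/bin/bash
# Made-up sample values; the real log contents are not shown in this post.
values="2 a 1f 100"

# Plain bytewise sort: compares character by character, so "100" lands before "2".
text_order=$(printf '%s\n' $values | LC_ALL=C sort | tr '\n' ' ')

# Numeric order: convert each hex string to decimal, sort on that, drop it again.
hex_order=$(for v in $values; do
    printf '%d %s\n' "0x$v" "$v"
done | sort -n -k1,1 | cut -d' ' -f2 | tr '\n' ' ')

echo "text order: $text_order"
echo "hex order:  $hex_order"
```

Sorting strings of the same length does give the right order (as long as the letter case is consistent), which is the idea the shell script in the next step relies on.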
2. The shell script is shown below. On the 400,000-line file, with the debug printing left on, it takes roughly 6-8 hours to finish.
#! /bin/bash
# --------------------------------------------------------------------------------
# File name : sort_beta.sh
# Date : 2018/07/28
# Author : Gick
# Mail : gickleeeee@gmail.com
# Usage : Sort the log file in ascending order
# Version : 1.0
# --------------------------------------------------------------------------------
#################### check that both file arguments were given #####################
if test -z "$1"
then
    echo "please input your log_file!"
    exit 1
fi
if test -z "$2"
then
    echo "please input your log_file for storing the fixed data!"
    exit 1
fi
#################### classify lines by the string length of column 6 #####################
# column 6 looks like tbmt:<hex>, so a length of 6-9 means 1-4 hex digits
while read -r line
do
    echo "$line" > temp.txt
    ColumnData=`awk '{print $6}' ./temp.txt`
    echo "$ColumnData"
    ColumnData=${#ColumnData}
    case $ColumnData in
        6)
            echo "================= SIZE = 6 ===================="
            echo "$line" >> temp1.txt
            ;;
        7)
            echo "================= SIZE = 7 ===================="
            echo "$line" >> temp2.txt
            ;;
        8)
            echo "================= SIZE = 8 ===================="
            echo "$line" >> temp3.txt
            ;;
        9)
            echo "================= SIZE = 9 ===================="
            echo "$line" >> temp4.txt
            ;;
    esac
done < "$1"
#################### sort data of different string lengths separately #####################
# 6.6b: skip leading blanks, then start the key at character 6 of column 6 (right after tbmt:)
echo "================= SIZE = 6 ===================="
sort -k 6.6b temp1.txt > "$2"
echo "================= SIZE = 7 ===================="
sort -k 6.6b temp2.txt >> "$2"
echo "================= SIZE = 8 ===================="
sort -k 6.6b temp3.txt >> "$2"
echo "================= SIZE = 9 ===================="
sort -k 6.6b temp4.txt >> "$2"
#################### remove temporary files #####################
rm -f temp.txt temp1.txt temp2.txt temp3.txt temp4.txt
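Much of the runtime above probably comes from the per-line work: every one of the 400,000 lines rewrites temp.txt, starts a new awk process, and prints to the terminal. A possible single-pass alternative is sketched below. It is not the original author's method. The function name sort_by_tbmt and the 8-digit padding width are assumptions made for this example; the script above only handles values of 1-4 hex digits.

```shell
#!/bin/bash
# Sketch: sort a log by the hex value after "tbmt:" in column 6 in one pass.
# Assumptions (not from the original script): values fit in 8 hex digits,
# and the function name sort_by_tbmt is made up for this example.
sort_by_tbmt() {
    # 1) awk prefixes each line with the value, lower-cased and zero-padded,
    #    so that equal-width strings compare the same way the numbers do
    # 2) sort orders bytewise on that prefix only (-s keeps ties in input order)
    # 3) cut removes the prefix again
    awk '{
        v = tolower(substr($6, 6))          # text after the 5-character "tbmt:"
        while (length(v) < 8) v = "0" v     # left-pad with zeros to 8 digits
        print v "\t" $0
    }' "$1" | LC_ALL=C sort -s -t "$(printf '\t')" -k1,1 | cut -f2-
}

# Usage: sort_by_tbmt input.log > sorted.log
```

Here awk runs once for the whole file instead of once per line, and the zero-padding replaces the four temporary files that grouped lines by length.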
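3. Next, the Python script. I have not timed it, but it should take less time than the shell version above. Someone else wrote it; I am bookmarking it here to study later. It is time to switch to Python.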