Linux Data Manipulation

 

1. How Can I Sort Linux Files? Use the sort Command.

The syntax of the sort command is pretty strange, but if you study the following examples, you should be able to adapt one of them for your own use. The general form of the sort command is

sort <flags> <sort fields> <file name>

 

The most common flags are as follows:

-f Fold lowercase to uppercase when comparing (so "Bill" and "bill" are treated the same).
-r Sort in reverse order (so "Z" starts the list instead of "A").
-n Sort a column in numerical order.
-tx Use x as the field delimiter (replace x with a comma or other character).
-u Suppress all but one line in each set of lines with equal sort fields.

 

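Specify the sort keys like this (this +m -n form is the obsolete spelling; current POSIX and GNU versions of sort write the same key as -k m+1,n, so "+2 -3" becomes "-k 3,3"):

+m Start at the first character of the m+1th field.
-n End at the last character of the nth field (if -n is omitted, assume the end of the line).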

 

For example, suppose a file named company.data contains the following:

Jan Itorre 406378 Sales
Jim Nasium 031762 Marketing
Mel Ancholie 636496 Research
Ed Jucacion 396082 Sales

sort -r +2 -3 company.data > sorted.data

Mel Ancholie 636496 Research
Jan Itorre 406378 Sales
Ed Jucacion 396082 Sales
Jim Nasium 031762 Marketing
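The same sort can be written with the -k key syntax, which is the form current versions of sort expect. The following is a minimal sketch, assuming a POSIX sort and a scratch directory from mktemp; the directory and the variable name sorted are illustrative and not part of the original example.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample file from the text.
cat > company.data <<'EOF'
Jan Itorre 406378 Sales
Jim Nasium 031762 Marketing
Mel Ancholie 636496 Research
Ed Jucacion 396082 Sales
EOF

# -k 3,3 means "the key starts at field 3 and ends at field 3",
# which is what the older "+2 -3" spelling expresses.
# LC_ALL=C keeps the comparison byte-based regardless of locale.
sorted=$(LC_ALL=C sort -r -k 3,3 company.data)
printf '%s\n' "$sorted"
```

Leaving off the ",3" would extend the key from field 3 to the end of the line, just as omitting the end position does in the older syntax.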

Now change the data in company.data to the following:

Itorre, Jan:406378:Sales
Nasium, Jim:031762:Marketing
Ancholie, Mel:636496:Research
Jucacion, Ed:396082:Sales

sort -t: +1 -2 company.data

Nasium, Jim:031762:Marketing
Jucacion, Ed:396082:Sales
Itorre, Jan:406378:Sales
Ancholie, Mel:636496:Research

sort -t: -u +2 company.data

Nasium, Jim:031762:Marketing
Ancholie, Mel:636496:Research
Itorre, Jan:406378:Sales
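For the colon-delimited file, the two commands above can also be expressed with -k. This is a sketch under the same assumptions as before (POSIX sort, a scratch directory); the variable names by_serial and by_dept are made up for the illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > company.data <<'EOF'
Itorre, Jan:406378:Sales
Nasium, Jim:031762:Marketing
Ancholie, Mel:636496:Research
Jucacion, Ed:396082:Sales
EOF

# Equivalent of "sort -t: +1 -2": sort on field 2 only.
by_serial=$(LC_ALL=C sort -t: -k 2,2 company.data)

# Equivalent of "sort -t: -u +2": the key runs from field 3 to the end
# of the line, and -u keeps one line per distinct key (one per department).
by_dept=$(LC_ALL=C sort -t: -u -k 3 company.data)

printf '%s\n' "$by_serial"
printf '%s\n' "$by_dept"
```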

 

2. How Can I Eliminate Duplicates in a Linux File? Use the uniq Command.

uniq <flags> <file name>

 

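The uniq command compares only adjacent lines, so sort the file first if the duplicates are scattered. Here are the flags you can use with the uniq command: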

-u Print only lines that appear once in the input file.
-d Print only the lines that appear more than once in the input file.
-c Precede each output line with a count of the number of times it was found.

For example, suppose a file named my.books contains the following:

Atopic Dermatitis for Dummies
Atopic Dermatitis for Dummies
Chronic Rhinitis Unleashed
Chronic Rhinitis Unleashed
Chronic Rhinitis Unleashed
Learn Nasal Endoscopy in 21 Days

 

uniq my.books

Atopic Dermatitis for Dummies
Chronic Rhinitis Unleashed
Learn Nasal Endoscopy in 21 Days

uniq -u my.books

Learn Nasal Endoscopy in 21 Days

uniq -d my.books

Atopic Dermatitis for Dummies
Chronic Rhinitis Unleashed

uniq -c my.books   

2 Atopic Dermatitis for Dummies
3 Chronic Rhinitis Unleashed
1 Learn Nasal Endoscopy in 21 Days
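The examples above work because the duplicate titles in my.books sit next to each other. The sketch below shows what happens when they do not; the file shuffled.books and the variable names are hypothetical, introduced only for this illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Same titles as my.books, but with the duplicates scattered.
cat > shuffled.books <<'EOF'
Chronic Rhinitis Unleashed
Atopic Dermatitis for Dummies
Chronic Rhinitis Unleashed
Learn Nasal Endoscopy in 21 Days
Atopic Dermatitis for Dummies
Chronic Rhinitis Unleashed
EOF

# uniq alone finds no adjacent repeats here, so all six lines survive.
unsorted_count=$(uniq shuffled.books | grep -c '')

# Sorting first brings equal lines together, so uniq can collapse them.
deduped=$(LC_ALL=C sort shuffled.books | uniq)
sorted_count=$(printf '%s\n' "$deduped" | grep -c '')

echo "uniq alone: $unsorted_count lines; sort | uniq: $sorted_count lines"
```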

 

3. How Do I Select Columns From a File? Use the cut Command.

The cut command takes a vertical slice of a file, printing only the specified columns or fields. Unlike the sort command, which splits fields on blanks by default, cut splits fields on tab characters unless you specify your own delimiter with -d. It's easiest to think of a column as just the nth character on each line. In other words, "column 5" consists of the fifth character of each line.

Here is a summary of the most common flags for the cut command:

-c [n | n,m | n-m] Specify a single column, multiple columns (separated by a comma), or range of columns (separated by a dash).

-f [n | n,m | n-m] Specify a single field, multiple fields (separated by a comma), or range of fields (separated by a dash).

-dc Use c as the field delimiter (the default is a tab).

-s Suppress (don't print) lines not containing the delimiter.

 

For example, suppose a file named company.data contains the following:

406378:Sales:Itorre:Jan
031762:Marketing:Nasium:Jim
636496:Research:Ancholie:Mel
396082:Sales:Jucacion:Ed

If you want to print just columns 1 to 6 of each line (the employee serial numbers), use the -c1-6 flag, as in this command:

cut -c1-6 company.data

406378
031762
636496
396082

If you want to print just columns 4 and 8 of each line (the fourth digit of the serial number and the first letter of the department), use the -c4,8 flag, as in this command:

cut -c4,8 company.data

3S
7M
4R
0S

And since this file obviously has fields delimited by colons, we can pick out just the last names by specifying the -d: and -f3 flags, like this:

cut -d: -f3 company.data

Itorre
Nasium
Ancholie
Jucacion
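The -f flag also accepts lists and ranges, and -s changes how lines without the delimiter are handled. Here is a short sketch of both, assuming a POSIX cut; the variable names and the "no delimiter here" test line are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > company.data <<'EOF'
406378:Sales:Itorre:Jan
031762:Marketing:Nasium:Jim
636496:Research:Ancholie:Mel
396082:Sales:Jucacion:Ed
EOF

# Fields 1 and 3 (serial number and last name); cut keeps the delimiter
# between the fields it prints.
serial_and_name=$(cut -d: -f1,3 company.data)

# Fields 3 through 4 (last name and first name).
names=$(cut -d: -f3-4 company.data)

# A line with no delimiter is passed through unchanged unless -s is given.
with_line=$(printf 'no delimiter here\n406378:Sales\n' | cut -d: -f2)
without_line=$(printf 'no delimiter here\n406378:Sales\n' | cut -d: -s -f2)

printf '%s\n' "$serial_and_name" "$names" "$with_line" "$without_line"
```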

 

4. How Do I Search and Replace Words in a File? Use the sed Command.

The general forms of the sed command are as follows:

Substitution sed 's/<oldstring>/<newstring>/g' <file>
Deletion sed '<start>,<end>d' <file>

 

For example, suppose a file named poem.txt contains the following:

Mary had a little lamb
Mary fried a lot of spam
Jack ate a Spam sandwich
Jill had a lamb spamwich

 

sed 's/lamb/ham/g' poem.txt

Mary had a little ham
Mary fried a lot of spam
Jack ate a Spam sandwich
Jill had a ham spamwich

sed '2,3d' poem.txt

Mary had a little lamb
Jill had a lamb spamwich

sed '1,/Jack/d' poem.txt

Jill had a lamb spamwich

sed 's/lamb$/ham/g' poem.txt > new.file

Mary had a little ham
Mary fried a lot of spam
Jack ate a Spam sandwich
Jill had a lamb spamwich
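A few variations on the commands above may help when adapting them. This is a sketch assuming a POSIX sed; the variable names are illustrative only. It shows the effect of the g flag, a substitution limited by an address, and two expressions chained with -e.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > poem.txt <<'EOF'
Mary had a little lamb
Mary fried a lot of spam
Jack ate a Spam sandwich
Jill had a lamb spamwich
EOF

# Without the g flag, only the first match on each line is replaced.
first_only=$(printf 'lamb lamb lamb\n' | sed 's/lamb/ham/')
every_match=$(printf 'lamb lamb lamb\n' | sed 's/lamb/ham/g')

# An address in front of s limits the substitution to matching lines:
# here only lines containing "Jill" are edited.
jill_only=$(sed '/Jill/s/lamb/ham/' poem.txt)

# Several -e expressions run in order on each line.
combined=$(sed -e 's/lamb/ham/g' -e '2,3d' poem.txt)

printf '%s\n' "$first_only" "$every_match" "$jill_only" "$combined"
```

Note that sed writes the edited text to standard output and leaves poem.txt itself unchanged, which is why the last example in the text redirects the result to new.file.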

 

5. How Do I Select Certain Records From a File? Use the grep Command.

 

6. How Do I Crunch Data From a File? Use the awk Command.

The awk command combines the functions of grep and sed, making it one of the most powerful Unix commands.

awk <pattern> '{print <stuff>}' <file>

For example, suppose a file named words.data contains the following:

nail hammer wood
pedal foot car
clown pie circus

awk '{print "Hit the",$1,"with your",$2}' words.data

Hit the nail with your hammer
Hit the pedal with your foot
Hit the clown with your pie

awk '/^clown/ {print "See the",$1,"at the",$3}' words.data

See the clown at the circus

 

As another example, suppose a file named grades.data contains the following:

Rogers 87 100 95
Lambchop 66 89 76
Barney 12 36 27

awk '{print "Avg for",$1,"is",($2+$3+$4)/3}' grades.data

Avg for Rogers is 94
Avg for Lambchop is 77
Avg for Barney is 25
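The pattern in front of the braces does not have to be a regular expression. The sketch below, assuming a POSIX awk, uses a numeric comparison as the pattern, an END block for a running total, and -F for a custom field separator; the threshold of 70 and the variable names are arbitrary choices for the illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > grades.data <<'EOF'
Rogers 87 100 95
Lambchop 66 89 76
Barney 12 36 27
EOF

# A pattern can be a comparison: print names whose average is at least 70.
passing=$(awk '($2+$3+$4)/3 >= 70 {print $1}' grades.data)

# An END block runs after the last line; here it reports the class
# average of the first score column (NR is the number of lines read).
first_avg=$(awk '{sum += $2} END {print sum/NR}' grades.data)

# -F sets the field separator, like -t for sort and -d for cut.
dept=$(printf '406378:Sales:Itorre:Jan\n' | awk -F: '{print $2}')

printf '%s\n' "$passing" "$first_avg" "$dept"
```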

 

7. How Do I Find Files with Linux? Use the find Command.

8. How Do I Count Words, Lines, and Bytes? Use the wc Command.

wc -l file1 file2 file3

120 file1
200 file2
150 file3
470 total

-c Count bytes.
-l Count lines.
-w Count words.

With no flags, wc behaves as if -lwc were given.
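When the count is needed inside a script, it is often convenient to get the bare number without the file name. The following is a minimal sketch, assuming a POSIX wc; the file sample.txt and its contents are made up for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two lines, five words, 24 bytes (including the two newlines).
printf 'one two three\nfour five\n' > sample.txt

# Reading from standard input makes wc print only the number,
# with no file name after it.
lines=$(wc -l < sample.txt)
words=$(wc -w < sample.txt)
bytes=$(wc -c < sample.txt)

# Some wc implementations pad the number with spaces; arithmetic
# expansion strips the padding.
lines=$((lines)); words=$((words)); bytes=$((bytes))

echo "$lines lines, $words words, $bytes bytes"
```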

 

 

 

 

 

 

 

 
