Unix and Perl Primer for Biologists - Part 2: Advanced Unix - Reading Notes (U33-U36)

U33: Match making
We often want to search files for lines that match a certain pattern. The Unix command grep does this (and much more). To find only the header lines in a FASTA file, we can use grep, which just requires that you specify a pattern to search for and one or more files to search:
U34_match_making_grep_the_same_patterned_shown
One common option is to get grep to show the lines that don't match your input pattern. You can do this with the -v option; in this example we see just the sequence part of the FASTA file.
U34_match_making_grep_v_the_different_pattern_shown
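Both searches can be tried on a small made-up file. This is a minimal sketch: demo.fasta and its sequences are hypothetical, not the course data.

```shell
# Work in a scratch directory so nothing real is overwritten
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical two-record FASTA file (not the course data)
printf '>seq1 example\nATGCCC\nGGGTGA\n>seq2 example\nTTTAAA\n' > demo.fasta

# Header lines only: every FASTA header contains '>'
headers=$(grep '>' demo.fasta)
echo "$headers"

# Sequence lines only: -v inverts the match
sequences=$(grep -v '>' demo.fasta)
echo "$sequences"
```

The quotes around '>' matter: without them the shell would treat > as output redirection instead of passing it to grep as the pattern.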
U34: Your first ever Unix pipe
Often we want to look at the output from a command in a controlled manner. The output of any command can be sent to any other Unix program (as long as the second program accepts input of some sort) by using what is known as a pipe, which is implemented with the '|' character. Any time you run a Unix program or command that outputs a lot of text to the screen, you can instead pipe that output into the less program.
Inside less, press the forward slash (/) key to specify a search pattern: type ATGTGA after the slash and press enter, and less will highlight the location of each match. Note that grep matches patterns on a per-line basis, so if one line ended with ATG and the next line started with TGA, grep would not find ATGTGA.
U34_Unix_pip_grep_less
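The per-line behaviour can be checked with a short pipeline. This is a sketch under stated assumptions: split.fasta is a hypothetical file, and the -c option of grep (print a count of matching lines) and the tr command are not part of the original notes. They stand in for less here so that the example runs without interaction.

```shell
workdir=$(mktemp -d)

# Hypothetical sequence in which ATGTGA is split across a line break
printf '>split example\nCCCCATG\nTGACCCC\n' > "$workdir/split.fasta"

# grep tests each line separately, so the split motif is not found.
# -c prints the number of matching lines; '|| true' ignores the
# non-zero exit status grep returns when nothing matches.
per_line=$(grep -v '>' "$workdir/split.fasta" | grep -c 'ATGTGA' || true)
echo "$per_line"

# Deleting the newlines first (tr -d '\n') joins the sequence lines,
# which exposes the motif to grep
joined=$(grep -v '>' "$workdir/split.fasta" | tr -d '\n' | grep -c 'ATGTGA' || true)
echo "$joined"
```

For interactive browsing, the last command of either pipeline can be replaced with less.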
U35: Heads and tails
Sometimes we just want to see a few lines to get a feeling for what the output looks like, or to check that our program (or Unix command) is working properly. The head and tail commands show (by default) the first or last 10 lines of a file. The example below also uses the -i option of grep, which 'ignores' case (upper-case or lower-case letters) when searching.
U35_Unix_pip_grep_i_*_head
U35_unix_grep_i_*_authour_original

The * character acts as a wildcard that the shell expands to every file in the current directory, so grep searches all of them, and the head command restricts the total amount of output to 10 lines. Notice that the output also includes the name of the file containing the matching pattern.
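A minimal sketch of this combination, assuming two hypothetical files (lower.txt and upper.txt) in an otherwise empty scratch directory:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two hypothetical files with 8 sequence lines each, one in lower case
for i in 1 2 3 4 5 6 7 8; do
  echo "acgtcA$i" >> lower.txt
  echo "ACGTCA$i" >> upper.txt
done

# -i ignores case, * expands to every file in the directory,
# and head keeps only the first 10 of the 16 matching lines
first_ten=$(grep -i 'ACGTC' * | head)
echo "$first_ten"
```

Each output line is prefixed with a file name (for example lower.txt:) because grep was given more than one file to search.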

U36: Getting fancy with regular expressions
A concept that is supported by many Unix programs and also by most programming languages (including Perl) is that of using regular expressions.

grep "^ATG.*ACACAC.*TGA$" chr1.fasta | less

u36_grep_regular_expression

grep "^ATG*ACACAC*TGA$" chr1.fasta | less

None (the second command prints no matching lines)

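In the first pattern, ^ anchors the match to the start of the line, $ anchors it to the end, and . matches any single character, so .* means 'any run of characters'. The asterisk in a regular expression is similar to, but NOT the same as, the other asterisks that we have seen so far. An asterisk in a regular expression means: 'match zero or more of the preceding character or pattern'. Without the dots, G* and C* only mean 'zero or more Gs' and 'zero or more Cs', which is why the second pattern is so much more restrictive.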
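The difference between the two patterns above can be checked on a few made-up lines. This is a sketch only: lines.txt and its contents are hypothetical, and single quotes are used so that the shell leaves $ and * untouched.

```shell
workdir=$(mktemp -d)

# Hypothetical sequence lines, one per line
printf 'ATGCCACACACGGTGA\nATGACACACTGA\nCCATGACACACTGA\n' > "$workdir/lines.txt"

# '.*' allows any run of characters between the fixed parts,
# so the first two lines match; the third does not start with ATG
with_dot=$(grep '^ATG.*ACACAC.*TGA$' "$workdir/lines.txt")
echo "$with_dot"

# Without the dot, 'G*' and 'C*' only allow repeats of G and of C,
# so only the line with nothing extra between the parts matches
without_dot=$(grep '^ATG*ACACAC*TGA$' "$workdir/lines.txt")
echo "$without_dot"
```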

grep "ACGT" chr1.fasta | less

U36_regular_expression_ACGT

grep "AC.GT" chr1.fasta | less

U36_regular_expression_AC.GT

grep "AC*GT" chr1.fasta | less

U36_regular_expression_AC*GT
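The three patterns above select different lines, which is easiest to see on a small test file. This is a minimal sketch; motifs.txt and its four lines are made up for illustration.

```shell
workdir=$(mktemp -d)

# Hypothetical test lines
printf 'TTACGTTT\nTTACTGTT\nTTAGTTTT\nTTACCCGT\n' > "$workdir/motifs.txt"

# Literal text: only the line containing exactly ACGT
literal=$(grep 'ACGT' "$workdir/motifs.txt")
echo "$literal"

# '.' requires exactly one character (any character) between AC and GT
any_one=$(grep 'AC.GT' "$workdir/motifs.txt")
echo "$any_one"

# 'C*' allows zero or more Cs between A and GT: AGT, ACGT, ACCCGT
zero_or_more_c=$(grep 'AC*GT' "$workdir/motifs.txt")
echo "$zero_or_more_c"
```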

grep "A...T" chr1.fasta  | less

u36_regular_expression_A...T

grep "AG*T" chr1.fasta  | less

u36_regular_expression_AG*T

grep "A*G*C*T*" chr1.fasta  | less

u36_regular_expression_A*G*C*T*
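The last pattern is worth a closer look: every element of A*G*C*T* may match zero times, so the pattern can match an empty string and therefore matches every line. The sketch below counts matches for the three patterns above on a hypothetical four-line file; the -c option (print a count of matching lines) is an addition to the original notes.

```shell
workdir=$(mktemp -d)

# Hypothetical test lines; the third contains only Ns and the fourth is empty
printf 'AGGGT\nACCCT\nNNNNN\n\n' > "$workdir/tests.txt"

# A, any three characters, then T: AGGGT and ACCCT
three_any=$(grep -c 'A...T' "$workdir/tests.txt")
echo "$three_any"

# A, zero or more Gs, then T: only AGGGT
g_repeat=$(grep -c 'AG*T' "$workdir/tests.txt")
echo "$g_repeat"

# Every element is optional, so all four lines match, even the empty one
everything=$(grep -c 'A*G*C*T*' "$workdir/tests.txt")
echo "$everything"
```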
