Programming Snacks (continuously updated)

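I plan to build a collection of programming exercises in R, Python, and shell. Every problem is solved in all three languages, and my own code is attached (corrections are welcome). It is meant for programming beginners who, like me, are teaching themselves.
Learning to program is mainly about learning a way of thinking: given a concrete problem, how do you break it down? That is a test of logic. When you are just starting out, read a lot of other people's code to get a rough sense of direction, then gradually develop your own approach. Once you have the logic, the languages mostly differ in their syntax rules. If you need several languages at once, get into the habit of comparing them and spotting the differences. The way in that I recommend is to read two or three introductory books on the language first; you don't need to memorize everything, but you should at least have a framework in mind. After that, whether it is programming thinking or fluency with the rules, the fastest way to improve is practice on real problems. Let's keep each other going!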


Project Euler


Problem 1:Multiples of 3 and 5

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23. Find the sum of all the multiples of 3 or 5 below 1000.

R
sum=0
for(i in 1:999){
  if (i%%3==0 | i%%5==0){
    sum=sum+i
    }
  }
print(sum)

Run on Linux: Rscript 1.R

python
sum=0
for i in range(1,1000):
  if i%3==0 or i%5==0:
    sum=sum+i
print(sum)

Run on Linux: python 1.py

shell
sum=0
for (( i = 0; i <= 999; i++ ))
do
  if [ $[i%3] -eq 0 ] || [ $[i%5] -eq 0 ]
  then
    sum=$[ $sum + $i]
  fi
done
echo $sum

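Run on Linux: bash 1.sh (the (( )) loop and $[ ] arithmetic are bash syntax, so sh may fail on systems where sh is not bash)
Arithmetic in shell is much more cumbersome than in R or Python, variables are written differently, and you have to watch the whitespace carefully.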


Problem 2:Even Fibonacci numbers

Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, the first 10 terms will be:1, 2, 3, 5, 8, 13, 21, 34, 55, 89, … By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.

R
c=c(1,2)
i=3
c[i]=c[i-1]+c[i-2]
even_sum=2
while (c[i]<4000000){
  i=i+1
  c[i]=c[i-1]+c[i-2]
  if (c[i]%%2==0){
    even_sum=even_sum+c[i]
  }
}
print(even_sum)
python
list=[1,2]
i=2
list.append(list[i-1]+list[i-2])
even_sum=2
while list[i]<4000000:
  i=i+1
  list.append(list[i-1]+list[i-2])
  if list[i]%2==0:
    even_sum=even_sum+list[i]
print(even_sum)

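A Python list cannot be grown by assigning to a new index. With list=[1,2], writing list[2]=3 raises IndexError: list assignment index out of range. Use append instead.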
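
A quick way to see the difference for yourself. This is only a minimal sketch, and the names fib and message are mine rather than part of the solution above:

```python
# Growing a list: assigning to a missing index fails, append works.
fib = [1, 2]
message = ""
try:
    fib[2] = 3                  # index 2 does not exist yet
except IndexError as err:
    message = str(err)          # keep the error text for inspection

fib.append(fib[-1] + fib[-2])   # append grows the list by one element
print(message)
print(fib)
```

Negative indices such as fib[-1] read from the end of the list, which avoids keeping a separate counter i.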

shell
c=(1 2)
i=2
c[i]=$[c[i-1]+c[i-2]]
even_sum=2
while [ ${c[i]} -lt 4000000 ]
do
  i=$[$i+1]
  c[i]=$[c[i-1]+c[i-2]]
  if [ $[${c[i]}%2] -eq 0 ]
  then
    even_sum=$[ $even_sum + ${c[$i]}]
  fi
done
echo $even_sum

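Each shell solution takes me twice as long as the R and Python ones combined. Compared with shell, the other two are far friendlier to write. Watch the whitespace.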


Problem 3:Largest prime factor

The prime factors of 13195 are 5, 7, 13 and 29.
What is the largest prime factor of the number 600851475143 ?
# This one seems to come up fairly often in interviews.

R
num = 600851475143
i = 2
while(T){
  i = i + 1
  if(num%%i == 0){
    num = num/i
    j = i
  }else if(i>num){
    break
  }
}
print(j)

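while(T) is an infinite loop and is normally paired with break. The three languages spell it differently (while(T) in R, while True: in Python, while true in shell), so watch for the small differences.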

python
num = 600851475143
i = 2
while True:
  i = i + 1
  if num%i == 0:
    num=num//i  # integer division keeps num an int in Python 3
    j = i
  elif i>num:
    break
  else:
    continue
print(j)
shell
num=600851475143
i=2
while true
do
  i=$[$i+1]
  if [ $[$num%$i] -eq 0 ]
  then
    num=$[$num/$i]
    j=$i
  elif [ $i -gt $num ]
  then
    break
  else
    continue
  fi
done
echo $j

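tips:
The difference between =, == and -eq:
== is a string comparison (in [[ ]] and in bash's [ ]). It also appears to work on integers, but only because it compares their text, so 01 and 1 are not equal under ==.
= and == are equivalent inside [ ] when testing strings.
-eq is for INTEGER comparison.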


Problem 4:Largest palindrome product

A palindromic number reads the same both ways.
The largest palindrome made from the product of two 2-digit numbers is 9009 = 91 × 99.
Find the largest palindrome made from the product of two 3-digit numbers.

R
num=c(1,2)
c=1
for(k in 100:999){
  for(n in 100:k){
    i=as.character(n*k)
    b=unlist(strsplit(i,split=""))
    a=length(b) 
    ri=list() 
    while(a>0){
      ri=append(ri,b[a])
      a=a-1
    }
    ri=unlist(ri)
    if(all(b==ri)){
      i=as.numeric(i) 
      num[c]=i
      c=c+1 
    }
  } 
}   
print(max(num))

This amounts to writing a reverse function by hand.

python
num=[]
for k in range(100,1000):   # range is half-open, so 1000 is needed to include 999
  for n in range(100,k+1):
    i=str(n*k)
    ri=''.join(reversed(i))
    if i == ri:
      num.append(int(i))
print(max(num))
shell
c=1
for(( k=100 ; k<=999 ; k++ ))
do
  for(( n=100 ; n<=k ; n++ ))
  do
    i=$[$n*$k]
    LEN=${#i}
    ri=
    for((a=LEN;a>=0;a--))
    do
      ri=$ri${i:$a:1}
    done
    if [ $i -eq $ri ]
    then
      num[c]=$i
      c=$[$c+1]
    fi
  done
done

MAX=${num[1]}
for I in ${!num[*]}
do
  if [[ ${MAX} -lt ${num[${I}]} ]]
  then
    MAX=${num[${I}]}
  fi
done
echo $MAX

Problem 5:Smallest multiple

2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder.
What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?

R
prime=c(2)
b=2
for(i in 3:20){
  k=ceiling(sqrt(i))
  a=0
  for(n in 1:k){
    if(i%%n != 0){
      a=a+1
    }
  }
  if(a == k-1){
    prime[b]=i
    b=b+1
  }
}

num=c(1)
d=2
for(c in 1:length(prime)){
  m=1
  r=prime[c]
  while(r^m<20){
    m=m+1
  }
  num[d]=r^(m-1)
  d=d+1
}

product=1
for(s in 1:length(num)){
  product=product*num[s]
}
print(product)

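Idea: 1. find the primes below 20 (a prime i is not divisible by any number from 2 to sqrt(i)); 2. for each prime, find its highest power that does not exceed 20; 3. the product of those highest powers is the answer.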

python
import math
prime=[2]
for i in range(3,21):
  k=math.ceil(math.sqrt(i))
  a=0
  for n in range(1,k+1):
    if i%n != 0:
      a=a+1
  if a == k-1:
    prime.append(i)
    
num=[]
for r in prime:
  m=1
  while r**m < 20:
    m=m+1
  num.append(r**(m-1))
  
product=1
print(num)
for s in num:
  product=product*s
print(product)

Note: Python ranges are half-open (closed on the left, open on the right). Exponentiation is ^ in R and ** in Python.
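
The prime-power result can be cross-checked with a different technique: fold the least common multiple over 1..20, using lcm(a, b) = a*b / gcd(a, b). This is not the method used in the solutions here, just a sketch for comparison; the function names lcm and smallest_multiple are my own.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def smallest_multiple(limit):
    """Smallest positive number divisible by every integer in 1..limit."""
    return reduce(lcm, range(1, limit + 1), 1)


print(smallest_multiple(10))
print(smallest_multiple(20))
```

For limit=10 this reproduces the 2520 quoted in the problem statement, which is a handy sanity check before trusting the value for 20.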

shell
p=1
prime[0]=2
for(( i=3 ; i<=20 ; i++ ))
do
  k=$[$i-1]
  a=0
  for(( n=1 ; n<=k ; n++ ))
  do
    if [ $[$i%$n] -ne 0 ]
    then
      a=$[$a+1]
    fi
  done
  # i is prime when none of 2..i-1 divides it, i.e. there are i-2 non-divisors
  if [ $a -eq $[$i-2] ]
  then
    prime[p]=$i
    p=$[$p+1]
  fi
done

e=0
for r in ${!prime[*]}
do
  m=1
  while [ $[${prime[${r}]}**$m] -lt 20 ]
  do
    m=$[$m+1]
  done
  num[e]=$[${prime[${r}]}**$[$m-1]]
  e=$[e+1]
done

product=1
for s in ${!num[*]}
do
  product=$[$product*${num[${s}]}]
done
echo $product

Problem 6:Sum square difference

The sum of the squares of the first ten natural numbers is,
1^2 + 2^2 + … + 10^2 = 385
The square of the sum of the first ten natural numbers is,
(1 + 2 + … + 10)^2 = 55^2 = 3025
Hence the difference between the sum of the squares of the first ten natural numbers and the square of the sum is 3025 − 385 = 2640.
Find the difference between the sum of the squares of the first one hundred natural numbers and the square of the sum.

R
s1=1
sum=1
for(i in 2:100){
  s1=s1+i^2
  sum=sum+i
}
s2=sum^2
diff_s=s2-s1
print(diff_s)
python
s1=1
sum=1
for i in range(2,101):
  s1=s1+i**2
  sum=sum+i
s2=sum**2
diff_s=s2-s1
print(diff_s)
shell
s1=1
sum=1
for(( i=2 ; i<=100 ;i++ ))
do
  s1=$[$s1+$[$i**2]]
  sum=$[$sum+$i]
done
s2=$[$sum**2]
diff_s=$[$s2-$s1]
echo $diff_s
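
The loops above can also be checked against the closed forms 1 + 2 + … + n = n(n+1)/2 and 1^2 + 2^2 + … + n^2 = n(n+1)(2n+1)/6. A small sketch follows; the function name sum_square_difference is my own choice.

```python
def sum_square_difference(n):
    """(1 + ... + n)^2 minus (1^2 + ... + n^2), using closed forms."""
    total = n * (n + 1) // 2                  # sum of the first n integers
    squares = n * (n + 1) * (2 * n + 1) // 6  # sum of their squares
    return total ** 2 - squares


print(sum_square_difference(10))
print(sum_square_difference(100))
```

For n=10 this gives the 2640 from the problem statement, so the same function can be trusted for n=100.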

Problem 7:10001st prime

By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can see that the 6th prime is 13.
What is the 10 001st prime number?

R
i=1
p=2
prime=c(2)
while(T){
  p=p+1
  m=0
  for(n in 1:ceiling(sqrt(p))){
    if(p%%n == 0){
      m=m+1
    }
  }
  if(m==1){
    i=i+1
    prime[i]=p
  }
  if(i==10001){
    break
  }
}
print(prime[10001])
#104743
python
import math
i=0
p=2
prime=list()
prime.append(2)
while True:
  p=p+1
  m=0
  k=math.ceil(p**0.5)
  for n in range(1,k+1):
    if p%n==0:
      m=m+1
  if m==1:
    i=i+1
    prime.append(p)
  if i==10000:
    break
print(prime[10000])
shell
i=0
p=2
n=0
prime=(2)
while true
do
  p=$[$p+1]
  m=$n
  o=$(echo "sqrt($p)" | bc)
  o=$[$o+1]
  for(( k=1 ; k<=o ; k++ ))
  do
    if [ $[$p%$k] -eq 0 ]
    then
      m=$[$m+1]
    fi
  done
  if [ $m -eq 1 ]
  then
    i=$[$i+1]
    prime[i]=$p
  fi
  if [ $i -eq 10000 ]
  then
    break
  fi
done
prime10001=${prime[10000]}
echo $prime10001

Problem 8:Largest product in a series

The four adjacent digits in the 1000-digit number that have the greatest product are 9 × 9 × 8 × 9 = 5832.
73167176531330624919225119674426574742355349194934
96983520312774506326239578318016984801869478851843
85861560789112949495459501737958331952853208805511
12540698747158523863050715693290963295227443043557
66896648950445244523161731856403098711121722383113
62229893423380308135336276614282806444486645238749
30358907296290491560440772390713810515859307960866
70172427121883998797908792274921901699720888093776
65727333001053367881220235421809751254540594752243
52584907711670556013604839586446706324415722155397
53697817977846174064955149290862569321978468622482
83972241375657056057490261407972968652414535100474
82166370484403199890008895243450658541227588666881
16427171479924442928230863465674813919123162824586
17866458359124566529476545682848912883142607690042
24219022671055626321111109370544217506941658960408
07198403850962455444362981230987879927244284909188
84580156166097919133875499200524063689912560717606
05886116467109405077541002256983155200055935729725
71636269561882670428252483600823257530420752963450
Find the thirteen adjacent digits in the 1000-digit number that have the greatest product. What is the value of this product?

R
num="7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450"

## find the largest 13-digit number
ndata=c()
for(i in 1:988){
  n=substring(num,i,i+12)
  n=as.numeric(n) 
  ndata=c(ndata,n)
}        
ndata=sort(ndata)
print(ndata[988])

## find the 13 adjacent digits with the largest product
ndata=c()
for(i in 1:988){
  n=substring(num,i,i+12)
  value=1
  for(k in 1:13){
    a=as.numeric(substring(n,k,k))
    value=a*value
  }      
  ndata=c(ndata,value)
}
maxvalue=0   
for(v in 1:988){  
  if(ndata[v] >= maxvalue){
    maxvalue=ndata[v]
  } 
}   
print(maxvalue)

I wrote a sorting algorithm earlier, so here I simply use R's built-in sort().

python
num="7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450"

## find the largest 13-digit number
ndata=[]
for i in range(0,988):
  n=num[i:i+13]
  n=int(n)
  ndata.append(n)
ndata.sort()
print(ndata[987])

## find the 13 adjacent digits with the largest product
ndata=[]
for i in range(0,988):
  n=num[i:i+13]
  value=1
  for k in range(0,13):
    a=n[k:k+1]
    a=int(a)
    value=a*value
  ndata.append(value)
maxvalue=0
for v in range(0,988):
  if ndata[v]>maxvalue:
    maxvalue=ndata[v]
print(maxvalue)
shell

num="7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450"

## find the largest 13-digit number
for(( i=0 ; i<=987 ; i++ ))
do
  n=${num:$i:13}
  ndata[$i]=$n
done

MAX=${ndata[0]}
for I in ${!ndata[*]}
do
  # 10# forces base 10; a window with a leading 0 would otherwise be read as octal
  if [[ 10#${MAX} -lt 10#${ndata[${I}]} ]]
  then
    MAX=${ndata[${I}]}
  fi
done
echo $MAX

## find the 13 adjacent digits with the largest product
for(( i=0 ; i<=987 ; i++ ))
do
  n=${num:$i:13}
  value=1
  for(( k=0 ; k<=12 ; k++))
  do
    a=${n:$k:1}
    value=$[$a*$value]
  done
  ndata[$i]=$value
done
MAX=${ndata[1]}
for I in ${!ndata[*]}
do
  if [[ ${MAX} -lt ${ndata[${I}]} ]]
  then
    MAX=${ndata[${I}]}
  fi
done
echo $MAX

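Sorting in shell can be done with awk, but passing a shell array into awk is fairly involved, so I don't recommend it. That is why the maximum is found with a linear scan instead.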

Problem 9:Special Pythagorean triplet

A Pythagorean triplet is a set of three natural numbers, a < b < c, for which,
a^2 + b^2 = c^2
For example, 3^2 + 4^2 = 9 + 16 = 25 = 5^2.
There exists exactly one Pythagorean triplet for which a + b + c = 1000.
Find the product abc.

R
for(a in 1:999){   # stop at 999: a=1000 would divide by zero on the next line
  b=(500000-1000*a)/(1000-a)
  if(b%%1==0 && b<1000 && b>0 && a<b){
    c=sqrt(a^2+b^2)
    print(a)
    print(b)
    print(c)
  }
}
#200,375,425
python
for a in range(1,1000):
  b=(500000-1000*a)/(1000-a)
  if b%1==0 and b>0 and b<1000 and a<b:
    c=(a**2+b**2)**0.5
    print(a)
    print(b)
    print(c)
shell
for(( a=1 ; a<1000 ; a++ ))
do
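  b=$(echo "scale=10;$[500000-1000*$a]/$[1000-$a]" | bc)
  # with scale=10 an integer result is printed with exactly ten zeros after the point
  k=".0000000000"
  if [[ $b == *$k ]] && [ $(echo "$b > 0"|bc) = 1 ] && [ $(echo "$b < 1000"|bc) = 1 ] && [ $(echo "$b > $a"|bc) = 1 ]
  then
    # bc writes powers with ^, not **
    c=$(echo "sqrt($a^2+$b^2)" | bc)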
    echo $a
    echo $b
    echo $c
  fi
done

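In shell, / is integer division. To keep decimals (floating point) you need bc. Likewise, the R and Python trick for testing whether a number is an integer cannot be reused as-is.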
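
Where the formula for b comes from: substitute c = 1000 - a - b into a^2 + b^2 = c^2 and expand, which leaves 0 = 1000^2 - 2000a - 2000b + 2ab, so b = (500000 - 1000a) / (1000 - a). The sketch below applies the same formula with integer arithmetic only, so no floating-point "is it an integer" test is needed. The function name special_triplet and the total parameter are my own additions for illustration.

```python
def special_triplet(total=1000):
    """Return (a, b, c) with a < b < c, a^2 + b^2 = c^2 and a + b + c = total."""
    for a in range(1, total // 3 + 1):       # a is the smallest, so a < total / 3
        num = total * total - 2 * total * a  # numerator of b, kept as an integer
        den = 2 * (total - a)                # denominator of b
        if num % den == 0:                   # b is a whole number
            b = num // den
            c = total - a - b
            if a < b < c:
                return a, b, c
    return None


a, b, c = special_triplet()
print(a, b, c)
print(a * b * c)
```

The last line prints the product abc that the problem actually asks for.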

Problem 10:Summation of primes

The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.
Find the sum of all the primes below two million.

R
sum=2
for(i in 3:2000000){
  m=0
  n=ceiling(sqrt(i))
  for(k in 1:n){
    if(i%%k == 0){
      m=m+1
    }
  }
  if(m==1){
    sum=sum+i
  }
}
print(sum)
python
import math
sum=2
for i in range(3,2000000):
  m=0
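  # trial division only up to sqrt(i), as in the R version; going up to i is far too slow here
  for k in range(1,int(math.sqrt(i))+1):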
    if i%k==0:
      m=m+1
  if m==1:
    sum=sum+i
print(sum)
shell
sum=2
for(( i=3 ; i<=2000000 ; i++))
do
  m=0
  # trial division only up to sqrt(i), as in the R version
  for(( k=1 ; k*k<=i ; k++ ))
  do
    if [ $[$i%$k] -eq 0 ]
    then
      m=$[$m+1]
    fi
  done
  if [ $m -eq 1 ]
  then
    sum=$[$sum+$i]
  fi
done
echo $sum
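
Trial division works but is slow for two million numbers. A common alternative, which is not what the solutions above use, is the sieve of Eratosthenes: cross out the multiples of each prime and sum whatever survives. A minimal sketch, with sum_primes_below as a name of my own:

```python
def sum_primes_below(limit):
    """Sum of all primes strictly below limit, via the sieve of Eratosthenes."""
    if limit < 2:
        return 0
    is_prime = bytearray([1]) * limit  # is_prime[i] == 1 means "i may be prime"
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p < limit:
        if is_prime[p]:
            # every multiple of p from p*p upward is composite
            for multiple in range(p * p, limit, p):
                is_prime[multiple] = 0
        p += 1
    return sum(i for i in range(limit) if is_prime[i])


print(sum_primes_below(10))
print(sum_primes_below(2000000))
```

The first call reproduces the 17 from the problem statement. The sieve trades memory (one byte per number) for speed.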

Rosalind

Problem 1:Counting DNA Nucleotides

A string is simply an ordered collection of symbols selected from some alphabet and formed into a word; the length of a string is the number of symbols that it contains.
An example of a length 21 DNA string (whose alphabet contains the symbols ‘A’, ‘C’, ‘G’, and ‘T’) is
“ATGCTTCAGAAAGGTCTTACG.”
Given: A DNA string s of length at most 1000 nt.
Return: Four integers (separated by spaces) counting the respective number
of times that the symbols ‘A’, ‘C’, ‘G’, and ‘T’ occur in s.
Sample Dataset
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
Sample Output
20 12 17 21

R
string="AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"
list=unlist(strsplit(string,split=""))
len=length(list)
a=0
t=0
c=0
g=0
for(n in 1:len){
  if(list[n]=="A"){
    a=a+1
  }else if(list[n]=="T"){
    t=t+1
  }else if(list[n]=="C"){
    c=c+1
  }else if(list[n]=="G"){
    g=g+1
  }else{
    next
  }
}
sprintf("A:%s",a)
sprintf("T:%s",t)
sprintf("C:%s",c)
sprintf("G:%s",g)

sprintf formats a string together with variable values. print combined with paste works too.

python
string="AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"
a="A"
t="T"
c="C"
g="G"
print ("A:",string.count(a))
print ("T:",string.count(t))
print ("C:",string.count(c))
print ("G:",string.count(g))
shell
string="AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"
LEN=${#string}
a=0       
t=0      
c=0       
g=0      
for(( n=0 ; n<LEN ; n++ ))
do
  if [[ "${string:${n}:1}" = "A" ]]
  then
    a=$[$a+1]
  elif [[ ${string:${n}:1} = "T" ]]
  then 
    t=$[$t+1] 
  elif [[ ${string:${n}:1} = "C" ]]
  then
    c=$[$c+1]
  else 
    g=$[$g+1]
  fi
done
echo "A:${a}"
echo "T:${t}"
echo "C:${c}"
echo "G:${g}"

Problem 2:Transcribing DNA into RNA

An RNA string is a string formed from the alphabet containing ‘A’, ‘C’, ‘G’, and ‘U’.
Given a DNA string t corresponding to a coding strand, its transcribed RNA string u is formed by replacing all occurrences of ‘T’ in t with ‘U’ in u.
Given: A DNA string t having length at most 1000 nt.
Return: The transcribed RNA string of t.
Sample Dataset
GATGGAACTTGACTACGTAAATT
Sample Output
GAUGGAACUUGACUACGUAAAUU

R
string="GATGGAACTTGACTACGTAAATT"
string2=gsub("T","U",string)
print(string2)
python
dna="GATGGAACTTGACTACGTAAATT"   # named dna so the built-in str is not shadowed
table = str.maketrans("T","U")
RNA=dna.translate(table)
print(RNA)
shell
string="GATGGAACTTGACTACGTAAATT"
echo ${string//T/U}

Problem 3:Complementing a Strand of DNA

In DNA strings, symbols ‘A’ and ‘T’ are complements of each other, as are ‘C’ and ‘G’.
The reverse complement of a DNA string s is the string s^c formed by reversing the symbols of s, then taking the complement of each symbol (e.g., the reverse complement of “GTCA” is “TGAC”).
Given: A DNA string s of length at most 1000 bp.
Return: The reverse complement s^c of s.
Sample Dataset
AAAACCCGGT
Sample Output
ACCGGGTTTT

R
string="AAAACCCGGT"
DNA=unlist(strsplit(string,split=""))
len=length(DNA)
reverse=list()
while(len>0){
  if(DNA[len]=="A"){
    DNA[len]="T"
  }else if(DNA[len]=="T"){
    DNA[len]="A"
  }else if(DNA[len]=="C"){
    DNA[len]="G"
  }else if(DNA[len]=="G"){
    DNA[len]="C"
  }
  reverse=append(reverse,DNA[len])
  len=len-1
}
print(paste(unlist(reverse),collapse=""))
python
string="AAAACCCGGT"
reverse=''.join(reversed(string))
table = ''.maketrans("ATCG","TAGC")
DNA=reverse.translate(table)
print(DNA)
shell
string="AAAACCCGGT"
LEN=${#string}
n=0
DNA=""
for((a=$[$LEN-1];a>=0;a--))
do
  if [[ ${string:${a}:1} = A ]]
  then
    rc[n]=T
    DNA=$DNA${rc[n]}
  elif [[ ${string:${a}:1} = T ]]
  then
    rc[n]=A
    DNA=$DNA${rc[n]}
  elif [[ ${string:${a}:1} = C ]]
  then
    rc[n]=G
    DNA=$DNA${rc[n]}
  elif [[ ${string:${a}:1} = G ]]
  then
    rc[n]=C
    DNA=$DNA${rc[n]}
  else
    continue
  fi
  n=$[$n+1]
done
echo $DNA

Problem 4:Rabbits and Recurrence Relations

A sequence is an ordered collection of objects (usually numbers), which are allowed to repeat. Sequences can be finite or infinite. Two examples are the finite sequence (π, −√2, 0, π) and the infinite sequence of odd numbers (1, 3, 5, 7, 9, …). We use the notation a_n to represent the n-th term of a sequence.
A recurrence relation is a way of defining the terms of a sequence with respect to the values of previous terms. In the case of Fibonacci’s rabbits from the introduction, any given month will contain the rabbits that were alive the previous month, plus any new offspring. A key observation is that the number of offspring in any month is equal to the number of rabbits that were alive two months prior. As a result, if F_n represents the number of rabbit pairs alive after the n-th month, then we obtain the Fibonacci sequence having terms F_n that are defined by the recurrence relation F_n = F_(n−1) + F_(n−2) (with F_1 = F_2 = 1 to initiate the sequence). Although the sequence bears Fibonacci’s name, it was known to Indian mathematicians over two millennia ago.
When finding the n-th term of a sequence defined by a recurrence relation, we can simply use the recurrence relation to generate terms for progressively larger values of n. This problem introduces us to the computational technique of dynamic programming, which successively builds up solutions by using the answers to smaller cases.
Given: Positive integers n≤40 and k≤5.
Return: The total number of rabbit pairs that will be present after n months, if we begin with 1 pair and in each generation, every pair of reproduction-age rabbits produces a litter of k rabbit pairs (instead of only 1 pair).
Sample Dataset
5 3
Sample Output
19

R
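rabbit<-function(n,k){
  F=c(1,1)   # F[1]=F[2]=1, as a local vector so R's built-in F (FALSE) is not relied on
  i=3
  while(i<=n){   # build the terms up to the requested month n
    F[i]=F[i-1]+k*F[i-2]
    i=i+1
  }
  print (F[n])
}
rabbit(5,3)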
python
def rabbit(n,k):
  F={}
  F[1]=1
  F[2]=1
  i=3
  while i<=n:   # build the terms up to the requested month n
    F[i]=F[i-1]+k*F[i-2]
    i=i+1
  print (F[n])
rabbit(5,3)
shell
function rabbit(){
  local n=$1 k=$2   # read the two arguments passed to the function
  F[1]=1
  F[2]=1
  i=3
  while [ $i -le $n ]
  do
    F[i]=$[F[i-1]+$[$k*${F[$[$i-2]]}]]
    i=$[$i+1]
  done
  echo ${F[$n]}
}
n=5
k=3
rabbit $n $k

Problem 5:Computing GC Content

The GC-content of a DNA string is given by the percentage of symbols in the string that are ‘C’ or ‘G’.
For example, the GC-content of “AGCTATAG” is 37.5%.
Note that the reverse complement of any DNA string has the same GC-content.
DNA strings must be labeled when they are consolidated into a database.
A commonly used method of string labeling is called FASTA format.
In this format, the string is introduced by a line that begins with ‘>’, followed by some labeling information.
Subsequent lines contain the string itself; the first line to begin with ‘>’ indicates the label of the next string.
In Rosalind’s implementation, a string in FASTA format will be labeled by the
ID “Rosalind_xxxx”, where “xxxx” denotes a four-digit code between 0000 and 9999.
Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).
Return: The ID of the string having the highest GC-content, followed by the GC-content of that string.
Rosalind allows for a default error of 0.001 in all decimal answers unless otherwise stated;
please see the note on absolute error below.
Sample Dataset
>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT
Sample Output
Rosalind_0808
60.919540

R
sample <- file("sample.txt", "r")
line=readLines(sample,n=1)
name=c()
gc_count=c()   # number of G/C bases per record
total=c()      # number of bases per record
n=0

while( length(line) != 0 ) {
  if(substring(line,1,1)==">"){
    # header line: start a new record
    n=n+1
    name[n]=substring(line,2)
    gc_count[n]=0
    total[n]=0
  }else{
    # sequence line: a record may span several lines, so add to the running counts
    line2=unlist(strsplit(line,split=""))
    gc_count[n]=gc_count[n]+sum(line2=="G" | line2=="C")
    total[n]=total[n]+length(line2)
  }
  line=readLines(sample,n=1)
}
close(sample)

rosalind=data.frame(name,GC=100*gc_count/total)   # GC content as a percentage
print(rosalind[which.max(rosalind$GC),])

Reading a file line by line in R:
file <- file("file.txt", "r")
line=readLines(file,n=1)
while( length(line) != 0 ) {
  line=readLines(file,n=1)
}

python
count={}
bases=['A','T','C','G']
fp=open("sample.txt","r")
for line in fp:
  if line.startswith('>'):
    line=line.rstrip()
    id=line[1:]
    count[id]={}
    for base in bases:
      count[id][base]=0
  else:
    for base in bases:
      count[id][base]+=line.count(base)

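list_GC={}
for id,atcg_count in count.items():
  GC=atcg_count['G']+atcg_count['C']
  total=atcg_count['G']+atcg_count['C']+atcg_count['A']+atcg_count['T']
  GCp=GC*100.0/total   # GC content as a percentage
  list_GC[id]=GCp

# best is a (GC content, id) pair; the names total and best avoid shadowing sum and max
best=max(zip(list_GC.values(),list_GC.keys()))
print(best[1])
print(best[0])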
shell
function GC(){
  n=-1
  while read LINE
  do
    if [[ "${LINE:0:1}" = ">" ]]
    then
      # header line: start a new record
      n=$[$n+1]
      namelist[n]=${LINE:1}
      gclist[n]=0
      sumlist[n]=0
    else
      # sequence line: a record may span several lines, so add to the running counts
      LEN=${#LINE}
      for(( k=0; k<LEN ; k++ ))
      do
        base=${LINE:${k}:1}
        if [[ $base = C ]] || [[ $base = G ]]
        then
          gclist[n]=$[${gclist[n]}+1]
        fi
      done
      sumlist[n]=$[${sumlist[n]}+$LEN]
    fi
  done < sample.txt
  MAX=0
  for I in ${!gclist[*]}
  do
    # GC content as a percentage with six decimals
    GCp=`echo "scale=6; 100*${gclist[I]}/${sumlist[I]}" | bc`
    if [ $(echo "${MAX} < ${GCp}"|bc) -eq 1 ]
    then
      MAX=$GCp
      NAME=${namelist[I]}
    fi
  done
  echo $NAME
  echo $MAX
}

GC

Note: 1. How to read a file line by line in shell.
2. Floating-point arithmetic and comparison.

Problem 6:Counting Point Mutations

Given two strings s and t of equal length, the Hamming distance between s and t, denoted d_H(s, t), is the number of corresponding symbols that differ in s and t. See Figure 2.
Given: Two DNA strings s and t of equal length (not exceeding 1 kbp).
Return: The Hamming distance d_H(s, t).
Sample Dataset
GAGCCTACTAACGGGAT
CATCGTAATGACGGCCT
Sample Output
7

R
A="GAGCCTACTAACGGGAT"
B="CATCGTAATGACGGCCT"
len=nchar(A)
HD=0
for(i in 1:len){
  a=substring(A,i,i)
  b=substring(B,i,i)
  if(a!=b){
    HD=HD+1
  }
}
print(HD)         
python
A="GAGCCTACTAACGGGAT"
B="CATCGTAATGACGGCCT"
n=len(A)   # named n so the built-in len is not shadowed
HD=0
for i in range(0,n):
  if A[i]!=B[i]:
    HD=HD+1
print(HD)
shell
A="GAGCCTACTAACGGGAT"
B="CATCGTAATGACGGCCT"
len=${#A}
HD=0
for(( i=0 ; i<len ; i++))
do
  if [[ "${A:${i}:1}" != "${B:${i}:1}" ]]
  then
    HD=$[$HD+1]
  fi
done
echo $HD

String comparison in shell: [ "$A" = "$B" ] or [ "$A" != "$B" ].

Problem 7:Mendel’s First Law

Probability is the mathematical study of randomly occurring phenomena. We will model such a phenomenon with a random variable, which is simply a variable that can take a number of different distinct outcomes depending on the result of an underlying random process.
For example, say that we have a bag containing 3 red balls and 2 blue balls. If we let X represent the random variable corresponding to the color of a drawn ball, then the probability of each of the two outcomes is given by Pr(X = red) = 3/5 and Pr(X = blue) = 2/5.
Random variables can be combined to yield new random variables. Returning to the ball example, let Y model the color of a second ball drawn from the bag (without replacing the first ball). The probability of Y being red depends on whether the first ball was red or blue. To represent all outcomes of X and Y, we therefore use a probability tree diagram. This branching diagram represents all possible individual probabilities for X and Y, with outcomes at the endpoints (“leaves”) of the tree. The probability of any outcome is given by the product of probabilities along the path from the beginning of the tree; see Figure 2 for an illustrative example.
An event is simply a collection of outcomes. Because outcomes are distinct, the probability of an event can be written as the sum of the probabilities of its constituent outcomes. For our colored ball example, let A be the event "Y is blue." Pr(A) is equal to the sum of the probabilities of two different outcomes: Pr(X = blue and Y = blue) + Pr(X = red and Y = blue), or 1/10 + 3/10 = 2/5 (see Figure 2 above).
Given: Three positive integers k, m, and n, representing a population containing k + m + n organisms: k individuals are homozygous dominant for a factor, m are heterozygous, and n are homozygous recessive.
Return: The probability that two randomly selected mating organisms will produce an individual possessing a dominant allele (and thus displaying the dominant phenotype). Assume that any two organisms can mate.
Sample Dataset
2 2 2
Sample Output
0.78333

R
k=2
m=2
n=2

name=c("AAAA","AAAa","AAaa","AaAA","AaAa","Aaaa","aaAA","aaAa","aaaa")
value=c(1,1,1,1,0.75,0.5,1,0.5,0)
x<-data.frame(name,value)

event=c(rep("AA",k),rep("Aa",m),rep("aa",n))
len=length(event)
probaility=0

for(i in 1:len){
  event2=event[-i]
  len2=len-1
  for(j in 1:len2){   # j rather than m, so the input m is not overwritten
    pair=paste(event[i],event2[j],sep="", collapse = NULL)
    a=which(x$name==pair)
    probaility=probaility+(x[a,2]/(len*len2))
  }
}
print(probaility)
python
k=2
m=2
n=2

x=[('AAAA',1),('AAAa',1),('AAaa',1),('AaAA',1),('AaAa',0.75),('Aaaa',0.5),('aaAA',1),('aaAa',0.5),('aaaa',0)]
x=dict(x)

event=['AA']*k+['Aa']*m+['aa']*n
len=len(event)
probaility=0

for i in range(len):
  event1=['AA']*k+['Aa']*m+['aa']*n
  event1.pop(i)
  len2=len-1
  for o in range(len2):
    var=[event[i],event1[o]]
    p=''.join(var)
    for name,value in x.items():
      if p==name:
        probaility=probaility+(value/(len*len2))
print(probaility)               
shell
k=2
m=2
n=2

name=('AAAA' 'AAAa' 'AAaa' 'AaAA' 'AaAa' 'Aaaa' 'aaAA' 'aaAa' 'aaaa')
value[0]=1
value[1]=1
value[2]=1
value[3]=1
value[4]=$(echo "0.75" | bc)
value[5]=$(echo "0.5" | bc)
value[6]=1
value[7]=$(echo "0.5" | bc)
value[8]=0

event=('AA' 'AA' 'Aa' 'Aa' 'aa' 'aa')
len=${#event[@]}
          
probaility=0
          
for(( i=0 ; i<len ; i++ ))
do        
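  event2=('AA' 'AA' 'Aa' 'Aa' 'aa' 'aa')
  unset event2[i]
  event2=("${event2[@]}")   # re-index: unset leaves a gap, and the slice offset used below follows the indices
  len2=${#event2[*]}
  for(( o=0 ; o<len2 ; o++ ))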
  do
    var=${event[@]:$i:1}${event2[@]:$o:1}
    for(( p=0 ; p<9 ; p++ ))
    do
      if [[ "$var" = "${name[@]:$p:1}" ]]
      then
        probaility=`echo "scale=10; $probaility+(${value[@]:$p:1}/($len*$len2))" | bc` 
      fi
  done 
  done
done 
echo $probaility

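Quote the strings and use = when testing whether two strings are equal. I extract array elements with the slice form ${name[@]:p:1} rather than ${name[p]}. Be aware that unset leaves a gap in the indices and the slice offset follows the indices, so re-index with event2=("${event2[@]}") before slicing an array you have unset from.
After writing this much shell, my feeling is that it really only suits basic operations on a Linux system. For writing programs, R and Python are both far more convenient.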

Problem 8:Translating RNA into Protein

The 20 commonly occurring amino acids are abbreviated by using 20 letters from the English alphabet (all letters except for B, J, O, U, X, and Z). Protein strings are constructed from these 20 symbols. Henceforth, the term genetic string will incorporate protein strings along with DNA strings and RNA strings.
The RNA codon table dictates the details regarding the encoding of specific codons into the amino acid alphabet.
Given: An RNA string s corresponding to a strand of mRNA (of length at most 10 kbp).
Return: The protein string encoded by s.
Sample Dataset
AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA
Sample Output
MAMAPRTEINSTRING

R
string="AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA"
len0=nchar(string)
len=floor(len0/3)-1

mRNA=c("UUU","UUC","UUA","UUG","CUU","CUC","CUA","CUG","AUU","AUC","AUA","AUG","GUU","GUC","GUA","GUG","UCU","UCC","UCA","UCG","CCU","CCC","CCA","CCG","ACU","ACC","ACA","ACG","GCU","GCC","GCA","GCG","UAU","UAC","CAU","CAC","CAA","CAG","AAU","AAC","AAA","AAG","GAU","GAC","GAA","GAG","UGU","UGC","UGG","CGU","CGC","CGA","CGG","AGU","AGC","AGA","AGG","GGU","GGC","GGA","GGG")
AA=c("F","F","L","L","L","L","L","L","I","I","I","M","V","V","V","V","S","S","S","S","P","P","P","P","T","T","T","T","A","A","A","A","Y","Y","H","H","Q","Q","N","N","K","K","D","D","E","E","C","C","W","R","R","R","R","S","S","R","R","G","G","G","G")
x<-data.frame(mRNA,AA)
k=0

protein=c()
seq=""

for(i in 0:len){
  label=1+3*i
  label2=3+3*i
  string2=substring(string,label,label2)
  for(n in 1:61){
    if(string2==mRNA[n]){
      k=k+1
      protein[k]=AA[n]
      seq=paste(seq,protein[k],sep="",collapse=NULL)
    }else{
      next
    }
  }
}
print(seq)
python
import math

str="AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA"

intab=["UUU","UUC","UUA","UUG","CUU","CUC","CUA","CUG","AUU","AUC","AUA","AUG","GUU","GUC","GUA","GUG","UCU","UCC","UCA","UCG","CCU","CCC","CCA","CCG","ACU","ACC","ACA","ACG","GCU","GCC","GCA","GCG","UAU","UAC","CAU","CAC","CAA","CAG","AAU","AAC","AAA","AAG","GAU","GAC","GAA","GAG","UGU","UGC","UGG","CGU","CGC","CGA","CGG","AGU","AGC","AGA","AGG","GGU","GGC","GGA","GGG"]

outtab=["F","F","L","L","L","L","L","L","I","I","I","M","V","V","V","V","S","S","S","S","P","P","P","P","T","T","T","T","A","A","A","A","Y","Y","H","H","Q","Q","N","N","K","K","D","D","E","E","C","C","W","R","R","R","R","S","S","R","R","G","G","G","G"]

len0=len(str)
len=math.floor(len0/3)
protein=[]
k=0

for i in range(0,len):
  label=3*i
  label2=3*i+3
  str2=str[label:label2]
  for n in range(0,61):
    if str2==intab[n]:
      protein.append(outtab[n])
      k=k+1
seq=''.join(protein)
print(seq)

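Tuples would work here as well. maketrans cannot be used with lists, because it maps single characters rather than three-letter codons.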
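
Since the two lists are really a lookup table, a dictionary is a natural fit in Python. The sketch below is an alternative to the nested loop, not a replacement for it. It builds the standard RNA codon table from the compact 64-letter string (bases in the order U, C, A, G, with * marking stop codons). The names BASES, AMINO, CODON_TABLE and translate are my own, and stopping at the first stop codon is my assumption about the intended behaviour; the solutions above simply skip stop codons because they are not in the lists.

```python
from itertools import product

BASES = "UCAG"
# One letter per codon, in the order UUU, UUC, UUA, UUG, UCU, ...; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(rna):
    """Translate an RNA string codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]  # one dictionary lookup per codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA"))
```

Each codon costs one dictionary lookup instead of a scan over 61 entries, which matters once the input approaches the 10 kbp limit.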

shell
str="AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA"

intab=("UUU" "UUC" "UUA" "UUG" "CUU" "CUC" "CUA" "CUG" "AUU" "AUC" "AUA" "AUG" "GUU" "GUC" "GUA" "GUG" "UCU" "UCC" "UCA" "UCG" "CCU" "CCC" "CCA" "CCG" "ACU" "ACC" "ACA" "ACG" "GCU" "GCC" "GCA" "GCG" "UAU" "UAC" "CAU" "CAC" "CAA" "CAG" "AAU" "AAC" "AAA" "AAG" "GAU" "GAC" "GAA" "GAG" "UGU" "UGC" "UGG" "CGU" "CGC" "CGA" "CGG" "AGU" "AGC" "AGA" "AGG" "GGU" "GGC" "GGA" "GGG")

outtab=("F" "F" "L" "L" "L" "L" "L" "L" "I" "I" "I" "M" "V" "V" "V" "V" "S" "S" "S" "S" "P" "P" "P" "P" "T" "T" "T" "T" "A" "A" "A" "A" "Y" "Y" "H" "H" "Q" "Q" "N" "N" "K" "K" "D" "D" "E" "E" "C" "C" "W" "R" "R" "R" "R" "S" "S" "R" "R" "G" "G" "G" "G")

len0=${#str}
len=$[$len0/3]
k=0
seq=""

for(( i=0 ; i<len ; i++ ))
do
  label=$[3*$i]
  str2=${str:$label:3}
  for(( n=0 ; n<61 ; n++ ))
  do
    if [[ $str2 == ${intab[${n}]} ]]
    then
      protein[k]=${outtab[${n}]}
      k=$[$k+1]
      seq=$seq${outtab[${n}]}
    fi
  done
done

echo $seq
Problem 9:Finding a Motif in DNA

Given two strings s and t, t is a substring of s if t is contained as a contiguous collection of symbols in s (as a result, t must be no longer than s).
The position of a symbol in a string is the total number of symbols found to its left, including itself (e.g., the positions of all occurrences of ‘U’ in “AUGCUUCAGAAAGGUCUUACG” are 2, 5, 6, 15, 17, and 18). The symbol at position i of s is denoted by s[i].
A substring of s can be represented as s[j:k], where j and k represent the starting and ending positions of the substring in s; for example, if s = “AUGCUUCAGAAAGGUCUUACG”, then s[2:5] = “UGCU”.
The location of a substring s[j:k] is its beginning position j; note that t will have multiple locations in s if it occurs more than once as a substring of s (see the Sample below).
Given: Two DNA strings s and t (each of length at most 1 kbp).
Return: All locations of t as a substring of s.
Sample Dataset
GATATATGCATATACTT
ATAT
Sample Output
2 4 10
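
The problem statement counts positions from 1 and includes the end position of a substring, while Python indexes from 0 and excludes the end of a slice. The small sketch below checks the two examples from the statement under that convention. The helper name substring_1based is hypothetical, introduced only for this illustration.

```python
s = "AUGCUUCAGAAAGGUCUUACG"

# 1-based positions of every 'U' (enumerate counts from 0, so add 1)
u_positions = [i + 1 for i, ch in enumerate(s) if ch == "U"]

def substring_1based(text, j, k):
    """Return the substring from 1-based position j to k, inclusive."""
    # shifting only the start by one converts to Python's 0-based, end-exclusive slice
    return text[j - 1:k]

print(u_positions)
print(substring_1based(s, 2, 5))
```

This off-by-one shift is why the python solution has to add 1 to the loop index before reporting a position, while the R solution (R indexes from 1) does not.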

R
string="GATATATGCATATACTT"
sub="ATAT"

len=nchar(string)-3
position=c()

for(i in 1:len){
  str2=substring(string,i,i+3)
  if(str2==sub){
    position=c(position,i)
  }
}
print(position)
python
string="GATATATGCATATACTT"
sub="ATAT"

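n=len(string)-len(sub)+1   # number of candidate start positions
position=[]

for i in range(n):
  # compare the window of len(sub) characters starting at index i
  if string[i:i+len(sub)]==sub:
    position.append(i+1)   # report 1-based positions
print(*position)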
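
The window-by-window comparison above can also be written with the built-in str.find method. This is only a sketch, and find_all is a name chosen for this example. The search resumes one character after each hit, so overlapping matches (such as positions 2 and 4 in the sample) are still found.

```python
def find_all(s, t):
    """Return the 1-based start positions of every (possibly overlapping) occurrence of t in s."""
    positions = []
    start = s.find(t)                  # str.find returns -1 when there is no match
    while start != -1:
        positions.append(start + 1)    # convert the 0-based index to a 1-based position
        start = s.find(t, start + 1)   # resume one character later to allow overlaps
    return positions

print(*find_all("GATATATGCATATACTT", "ATAT"))
```
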
shell
string="GATATATGCATATACTT"
sub="ATAT"

len=$[${#string}-3]
k=0

for(( i=0 ; i<len ; i++))
do
  str2=${string:$i:4}
  if [[ $str2 == $sub ]]
  then
    position[k]=$[$i+1]
    k=$[$k+1]
  fi
done

echo ${position[*]}
Problem 10:Consensus and Profile

A matrix is a rectangular table of values divided into rows and columns. An m×n matrix has m rows and n columns. Given a matrix A, we write A[i,j] to indicate the value found at the intersection of row i and column j.

Say that we have a collection of DNA strings, all having the same length n. Their profile matrix is a 4×n matrix P in which P[1,j] represents the number of times that ‘A’ occurs in the jth position of one of the strings, P[2,j] represents the number of times that ‘C’ occurs in the jth position, and so on (see below).

A consensus string c is a string of length n formed from our collection by taking the most common symbol at each position; the jth symbol of c therefore corresponds to the symbol having the maximum value in the jth column of the profile matrix. Of course, there may be more than one most common symbol, leading to multiple possible consensus strings.

DNA Strings    A T C C A G C T
               G G G C A A C T
               A T G G A T C T
               A A G C A A C C
               T T G G A A C T
               A T G C C A T T
               A T G G C A C T

Profile     A  5 1 0 0 5 5 0 0
            C  0 0 1 4 2 0 6 1
            G  1 1 6 3 0 1 0 0
            T  1 5 0 0 0 1 1 6

Consensus      A T G C A A C T

Given: A collection of at most 10 DNA strings of equal length (at most 1 kbp) in FASTA format.
Return: A consensus string and profile matrix for the collection. (If several possible consensus strings exist, then you may return any one of them.)
Sample Dataset
>Rosalind_1
ATCCAGCT
>Rosalind_2
GGGCAACT
>Rosalind_3
ATGGATCT
>Rosalind_4
AAGCAACC
>Rosalind_5
TTGGAACT
>Rosalind_6
ATGCCATT
>Rosalind_7
ATGGCACT
Sample Output
ATGCAACT
A: 5 1 0 0 5 5 0 0
C: 0 0 1 4 2 0 6 1
G: 1 1 6 3 0 1 0 0
T: 1 5 0 0 0 1 1 6

R
str1="ATCCAGCT"
str2="GGGCAACT"
str3="ATGGATCT"
str4="AAGCAACC"
str5="TTGGAACT"
str6="ATGCCATT"
str7="ATGGCACT"

strs=c(str1,str2,str3,str4,str5,str6,str7)
bases=c("A","C","G","T")
len=nchar(str1)
# profile matrix: one row per base, one column per position
profile=matrix(0,nrow=4,ncol=len,dimnames=list(bases,NULL))
finseq=""

for(i in 1:len){
  # i-th character of every string (all seven, including str7)
  col=substring(strs,i,i)
  # the factor levels keep zero counts and fix the row order A, C, G, T
  freq=table(factor(col,levels=bases))
  profile[,i]=as.vector(freq)
  finseq=paste(finseq,bases[which.max(freq)],sep="")
}

print(finseq)
print(profile)
python
from collections import Counter
from numpy import zeros
str1="ATCCAGCT"
str2="GGGCAACT"
str3="ATGGATCT"
str4="AAGCAACC"
str5="TTGGAACT"
str6="ATGCCATT"
str7="ATGGCACT"

strs=[str1,str2,str3,str4,str5,str6,str7]
n=len(str1)
seq=[]
bases=['A','C','G','T']   # row order of the profile matrix
matrix=zeros((4,n),dtype=int)

for i in range(n):
  # i-th character of every string (all seven, including str7)
  col=[s[i] for s in strs]
  count=Counter(col)
  seq.append(count.most_common(1)[0][0])
  dic=dict(count)
  # bases that never occur in this column get a count of 0
  for base in bases:
    if base not in dic:
      dic[base]=0
  # sorted() orders the (base,count) pairs alphabetically: A, C, G, T
  dicsort=sorted(dic.items())
  for row in range(4):
    matrix[row,i]=dicsort[row][1]

finseq="".join(seq)
print(finseq)
print(matrix)

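The main things to learn here are dictionaries, Counter (from the collections module), the sorted function, and the basics of working with matrices.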
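
The problem gives its input in FASTA format, while the solutions here hard-code the strings. The sketch below shows one possible way to read the sequences from FASTA text (header lines begin with ">") and to build the profile column by column with zip. The function names parse_fasta and profile_and_consensus are invented for this example, and ties are broken in favour of the base that comes first in "ACGT", which the problem allows.

```python
from collections import Counter

def parse_fasta(text):
    """Return the sequences from FASTA text; header lines start with '>'."""
    seqs, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current:                      # a new header closes the previous record
                seqs.append("".join(current))
            current = []
        elif line:
            current.append(line)             # a sequence may span several lines
    if current:
        seqs.append("".join(current))
    return seqs

def profile_and_consensus(seqs, bases="ACGT"):
    # zip(*seqs) yields one tuple per column of the alignment
    counts = [Counter(col) for col in zip(*seqs)]
    profile = {b: [c[b] for c in counts] for b in bases}
    consensus = "".join(max(bases, key=lambda b: c[b]) for c in counts)
    return consensus, profile

fasta = """>Rosalind_1
ATCCAGCT
>Rosalind_2
GGGCAACT
>Rosalind_3
ATGGATCT
>Rosalind_4
AAGCAACC
>Rosalind_5
TTGGAACT
>Rosalind_6
ATGCCATT
>Rosalind_7
ATGGCACT
"""

consensus, profile = profile_and_consensus(parse_fasta(fasta))
print(consensus)
for b in "ACGT":
    print(f"{b}:", *profile[b])
```

Because the sequences are kept in a list, the same code handles any number of strings without listing str1 to str7 by hand.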

shell
str1="ATCCAGCT"
str2="GGGCAACT"
str3="ATGGATCT"
str4="AAGCAACC"
str5="TTGGAACT"
str6="ATGCCATT"
str7="ATGGCACT"

len=${#str1}

declare -A seq

finseq=()

for(( i=0 ; i<len ; i++ ))
do
  col=(${str1:$i:1} ${str2:$i:1} ${str3:$i:1} ${str4:$i:1} ${str5:$i:1} ${str6:$i:1} ${str7:$i:1})
  len2=${#col[*]}
  freq1=0
  freq2=0
  freq3=0
  freq4=0
  for(( n=0 ; n<len2 ; n++ ))
  do
    if [[ "${col[n]}" = "A" ]]
    then
      freq1=$[$freq1+1]
    elif [[ "${col[n]}" = "T" ]]
    then
      freq2=$[$freq2+1]
    elif [[ "${col[n]}" = "C" ]]
    then
      freq3=$[$freq3+1]
    else
      freq4=$[$freq4+1]
    fi
  done
  seq=(["A"]="$freq1" ["T"]="$freq2" ["C"]="$freq3" ["G"]="$freq4")

  max=${seq["A"]}
  key="A"
  for key in ${!seq[*]}
  do
    if [ ${seq["$key"]} -ge $max ]
    then
      max=${seq["$key"]}
      k="$key"
    fi
  done
  finseq[$i]=$k
done

finseq2=""
for(( m=0 ; m<${#finseq[*]} ; m++ ))
do
  finseq2=$finseq2${finseq[m]}
done
echo $finseq2