有两个都可以在新浪爱问资料
Bioinformatics.For.Dummies.2nd.Ed.2007.pdf
An Introduction
to Bioinformatics Algorithms.pdf
另外看到 Virginia 大学的一些课程
The 2012 Computational Genomics Course has been rescheduled to
November 28 - December 4, 2012
用mothur从*.sff进行数据处理的流程
#create flowgram file from sff from 454 machine
#sffinfo(sff=test.sff,flow=T)
#trim and bin sequences in flowgram
#trim.flows(flow=test.flow,
oligos=test.oligos,bdiffs=0,pdiffs=1,processors=2)
#denoise to remove sequencing errors, create fasta and qual
files
#shhh.flows(file=test.flow.files, processors=2)
#bin by barcode, trim fasta files, remove low quality
sequence
#trim.seqs(fasta=test.shhh.fasta, name=test.shhh.names,
oligos=test.oligos,flip=T,minlength=200,maxlength=500,maxambig=0,maxhomop=8,bdiffs=0,pdiffs=1,processors=2)
#remove redundant sequences
#unique.seqs(fasta=test.shhh.trim.fasta,
name=test.shhh.trim.names)
#align sequences to template 16S rRNA gene
#align.seqs(fasta=test.shhh.trim.unique.fasta,
reference=silva.bacteria/silva.bacteria.fasta, processors=2)
#summarize results so far
#summary.seqs(fasta=test.shhh.trim.unique.align,
name=test.shhh.trim.names)
#determine clear-span region
#screen.seqs(fasta=test.shhh.trim.unique.align,
name=test.shhh.trim.names, group=test.shhh.groups, end=6333,
optimize=start, criteria=85, processors=2)
#remove sequences and cut alignment to clear span region
#filter.seqs(fasta=test.shhh.trim.unique.good.align, vertical=T,
trump=., processors=2)
#remove redundant sequences
#unique.seqs(fasta=test.shhh.trim.unique.good.filter.fasta,
name=test.shhh.trim.good.names)
#pre cluster sequences
#pre.cluster(fasta=test.shhh.trim.unique.good.filter.unique.fasta,
name=test.shhh.trim.unique.good.filter.names,
group=test.shhh.good.groups, diffs=2)
#run ClimeraSlayer to identify potential chimeras
#chimera.uchime(fasta=test.shhh.trim.unique.good.filter.unique.precluster.fasta,
name=test.shhh.trim.unique.good.filter.unique.precluster.names,
group=test.shhh.good.groups, processors=2)
#remove chimeras identified by ChimeraSlayer
#remove.seqs(accnos=test.shhh.trim.unique.good.filter.unique.precluster.uchime.accnos,
fasta=test.shhh.trim.unique.good.filter.unique.precluster.fasta,
name=test.shhh.trim.unique.good.filter.unique.precluster.names,
group=test.shhh.good.groups)
CSHL Computational Genomics - November, 2011 -- Metagenomics
I
This workshop will run from the Unix/MacOS command line.
Students on Mac's can login to their workstations, and start the
Terminal application (in
Applications/Utilitles.
Students on PC's need to login to courses.cshl.edu.
Once in a terminal window, run the script
/ecg/seqprg/scripts/init_meta1.sh. Type:
/ecg/seqprg/scripts/init_meta1.sh
If things have worked properly, you should have the directory
meta1_work in your home directory, and it should contain
several files.
cd meta1_work
ls
To do the analysis, we will run the mothur program to
analyze microbial communities. We are using an example from
http://www.mothur.org/wiki/Schloss_SOP.
To run the mothur program, type:
/ecg/seqprg/bin/mothur
All commands in mothur look like functions, and need to
end with (), for example: help() or
quit().
In this workshop, we will be doing microbial community analysis
on a sample set of 16S rRNA sequences from a human stool sample.
The mothur program will allow us to address the following
questions:
What is the taxonomic makeup of the sample.
How diverse is the community (what is the dynamic range of
abundance); species "evenness", species "richness".
How to compare two microbial community samples.
In real analyses, the sequences must be pre-processed to remove
bar codes, primers, and non-rRNA sequences. This pre-processing has
already been done for the sequences labeled "final.*". The
preprocessing steps are listed here
The first step classifies the sequences taxonomically, and bins
them into clades. This has already been done, producing the files
final.names,final.taxonomy,final.group,final.fasta,
etc. These are linked into your meta1_work directory. In
addition, a time-consuming step produces final.dist.
To characterize the taxonomic makeup, we first cluster sequences
into OTUs (taxa, clades), use make.shared to count
sequence abundance within those OTUs.
The lines below show commands that can be copied and pasted into
the command line of mothur. Lines beginning with
# are comments to explain the steps; you only need to copy
lines that do not begin with #.
#Cluster sequences into OTU -- we are using the command cluster.split to do that as it allows us to cluster
# sequences according a taxonomic level like order or family
cluster.split(fasta=final.fasta, taxonomy=final.taxonomy, name=final.names, taxlevel=3, processors=4)
# The make.shared creates a file that represent the number of times
# that an OTU is observed in multiple samples
make.shared(list=final.an.list, group=final.groups, label=0.03)
# Since some samples might have better coverage, sub-sample to get a dataset
# with the same number of sequences per sample
sub.sample(shared=final.an.shared,size=400)
# assign a consensus taxonomy to each OTU
classify.otu(list=final.an.list, name=final.names, taxonomy=final.taxonomy, label=0.03, cutoff=80)
# Calculate the relative abundance of taxa
phylotype(taxonomy=final.taxonomy, name=final.names, label=1)
# The make.shared creates a file that represent the number of times that an OTU is observed in multiple samples
make.shared(list=final.tx.list, group=final.groups, label=1)
# Since some samples might have better coverage, subsample to get a dataset
# with the same number of sequences per sample
sub.sample(shared=final.tx.shared, size=400)
# assign a consensus taxonomy to each OTU
classify.otu(list=final.tx.list, name=final.names, taxonomy=final.taxonomy, label=1)
Once the sequences have been classified, we can build a tree
that represents the population:
#Here we want to build a phylogenetic tree
sub.sample(fasta=final.fasta, name=final.names, group=final.groups, persample=T, size=400)
dist.seqs(fasta=final.subsample.unique.fasta, output=lt, processors=2)
#Build Tree
clearcut(phylip=final.subsample.unique.phylip.dist)
You can visualize the tree with TreeView (in
/ecg/Applications/TreeView).
To quantify population diversity, collector's curves are
produced:
#Generates collector's curves
collect.single(shared=final.an.0.03.subsample.shared, calc=chao-invsimpson, freq=100)
rarefaction.single(freq=100)
summary.single(calc=nseqs-coverage-sobs-invsimpson)
To compare two populations, one can use a heat map or venn
diagram:
#Generate heatmaps and a venn diagram to compare samples
heatmap.bin(scale=log2, numotu=50)
heatmap.sim(calc=jclass-thetayc)
venn(groups=26-23-28-44)
#PCOA Analysis to Compare samples
dist.shared(shared=final.an.0.03.subsample.shared, calc=thetayc-jclass)
pcoa(phylip=final.an.0.03.subsample.thetayc.0.03.lt.dist)
pcoa(phylip=final.an.0.03.subsample.jclass.0.03.lt.dist)
#Non-metric multidimensional scaling
nmds(phylip=final.an.0.03.subsample.thetayc.0.03.lt.dist)
nmds(phylip=final.an.0.03.subsample.jclass.0.03.lt.dist)
These programs create .svg files, which you should be
able to visualize with a web browser.