RATT liftover

The purpose of this project is to annotate the X. couchianus and X. hellerii genomes. It relies mainly on RATT (Rapid Annotation Transfer Tool, http://ratt.sourceforge.net/), which aligns the genomes and lifts the X. maculatus gene models over to the two other species.

I. Install RATT on Xiphos

A. Install the prerequisites for RATT: MUMmer and NCBI BLAST+

1. Download MUMmer 3.23 and install it, then add /opt/research/Software/MUMmer3.23/ to the PATH environment variable.

The 64-bit build of MUMmer must be compiled with: make CPPFLAGS="-O3 -DSIXTYFOURBITS"


2. Download and install NCBI BLAST+ (optional)

Run rpm -ql ncbi-blast-2.2.26+-3.x86_64 to list the files installed by the package, including the BLAST+ executables.


B. Download RATT

1. Create the install directory and check out RATT from Subversion:

mkdir /opt/research/Software/ratt


svn co https://ratt.svn.sourceforge.net/svnroot/ratt ratt


2. Set the environment variables

RATT_HOME: the RATT installation directory

RATT_CONFIG: the path to the RATT configuration file
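The setup from steps A.1, B.1 and B.2 can be collected into one file that is sourced before running RATT. This is a sketch, assuming the checkout ended up in /opt/research/Software/ratt and that the configuration file shipped with RATT is named RATT.config; check_ratt_env is a hypothetical helper, not part of RATT.

```shell
# Environment for RATT on Xiphos (sketch; paths taken from steps A.1 and B.1).
# ASSUMPTION: the RATT checkout lives in /opt/research/Software/ratt and
# its config file is RATT.config.
export MUMMER_DIR=/opt/research/Software/MUMmer3.23
export RATT_HOME=/opt/research/Software/ratt
export RATT_CONFIG="$RATT_HOME/RATT.config"
export PATH="$MUMMER_DIR:$RATT_HOME:$PATH"

# Hypothetical helper: report tools or files that are still missing.
# Returns 0 when everything is found, 1 otherwise.
check_ratt_env() {
  local missing=0 tool
  for tool in nucmer show-coords start.ratt.sh; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  if [ ! -f "$RATT_CONFIG" ]; then
    echo "missing: $RATT_CONFIG" >&2
    missing=1
  fi
  return "$missing"
}
```

Saved as, say, ratt_env.sh, it would be loaded with `source ratt_env.sh && check_ratt_env`.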


3. Download the example files and run the example. Place the annotated reference Tb_H37Rv.embl in a directory named embl and use F11.fasta as the target sequence:

start.ratt.sh embl F11.fasta F11 Strain


II. Use the X. maculatus annotation to lift over an X. couchianus transcriptome

1. Download the X. maculatus EMBL files from Ensembl

The EMBL files were downloaded from ftp://ftp.ensembl.org/pub/release-70/embl/xiphophorus_maculatus/ and extracted into /home/ys14/research/storage1/reference_assemblies/Ensembl/release70


2. Clean and split the EMBL file

Run emblCorrecterPreRATT.pl to rewrite feature locations that RATT cannot parse in the newer EMBL format, for example converting


complement(join(1..10),join(20..30))

into

complement(join(1..10,20..30))
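The conversion shown above amounts to collapsing every "),join(" into a comma on feature-table lines. The sed sketch below only illustrates that rewrite; fix_embl_joins is a hypothetical name, it handles only locations written on a single FT line, and it may over-match other location operators, so emblCorrecterPreRATT.pl remains the tool to use.

```shell
# Illustration only: flatten "join(a..b),join(c..d)" into "join(a..b,c..d)"
# on EMBL feature-table (FT) lines. Locations wrapped over several FT lines
# are NOT handled. Reads the files given as arguments, or stdin.
fix_embl_joins() {
  sed '/^FT/ s/),join(/,/g' "$@"
}
```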



Then run emblSplitter.pl to split each sequence into its own EMBL file; otherwise the FASTA file created from the EMBL file will contain only one sequence name.
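What the split does can be sketched in awk, assuming standard EMBL flat-file records that start with an "ID" line and end with a "//" line. split_embl is a hypothetical helper shown for illustration; emblSplitter.pl is the script actually used here.

```shell
# Sketch: write each EMBL record to OUTDIR/<name>.embl, where <name> is the
# first word after "ID" with any trailing ";" removed.
# usage: split_embl input.embl outdir
split_embl() {
  local infile=$1 outdir=$2
  mkdir -p "$outdir"
  awk -v dir="$outdir" '
    /^ID   / {
      name = $2
      sub(/;$/, "", name)          # "seqA;" -> "seqA"
      out = dir "/" name ".embl"
    }
    out != "" { print > out }      # copy the line into the current record
    /^\/\// {                      # "//" ends a record
      if (out != "") close(out)
      out = ""
    }
  ' "$infile"
}
```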




2. Get the X. couchianus v4 genome

/opt/research/storage1/couchianus_genome/v4.0.1/supercontigs.fasta


3a. Run RATT with the Species option

RATT offers several transfer types, chosen mainly by the evolutionary distance between the reference and the target. The "Species" option is intended for sequences with 40-90% similarity and the "Strain" option for 90-99% similarity. Try both and compare which gives the better transfer.

./start.ratt.sh embl supercontigs.fasta XcouFromXM70 Species

./start.ratt.sh embl supercontigs.fasta XcouFromXM70strain Strain


The Species option appears to be more sensitive than the Strain option, although it is slightly slower. Use the Species results for all downstream analyses.
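The similarity ranges quoted for the two transfer types can be written as a small lookup, which makes the rule explicit when setting up a new species pair. choose_ratt_mode is a hypothetical helper; it takes an integer percent similarity estimated by other means and only encodes the two ranges given in these notes.

```shell
# Map an integer % similarity to a RATT transfer type using the ranges in
# these notes: Species for 40-89, Strain for 90-99. Anything else is reported
# as outside those two ranges (exit status 1).
choose_ratt_mode() {
  local pid=$1
  if [ "$pid" -ge 90 ] && [ "$pid" -le 99 ]; then
    echo Strain
  elif [ "$pid" -ge 40 ] && [ "$pid" -lt 90 ]; then
    echo Species
  else
    echo "outside the Species/Strain ranges: ${pid}%" >&2
    return 1
  fi
}
```

For example, `./start.ratt.sh embl supercontigs.fasta XcouFromXM70 "$(choose_ratt_mode 85)"` would run the Species transfer.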



3b. (Optional) Run the parallel version of RATT, start.rattForkC1.sh

It allows RATT to use multiple CPUs; set the number of CPUs in the variable $MAX_PROCESSES.

Caution: this script can use a lot of RAM. A safe estimate is 10 times the genome size per CPU, so for a 1 Gb genome each CPU needs about 10 GB of RAM.
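The rule of thumb above can be turned into a quick estimate before choosing $MAX_PROCESSES. A minimal sketch; both function names are hypothetical, genome size is given in gigabases, and the factor of 10 is the estimate from these notes, not a measured value.

```shell
# Total RAM (GB) for GENOME_GB gigabases run on N_CPU processes,
# using ~10 x genome size per CPU.
estimate_ratt_ram_gb() {
  awk -v g="$1" -v n="$2" 'BEGIN { printf "%g\n", 10 * g * n }'
}

# Largest MAX_PROCESSES that fits into AVAIL_GB of RAM under the same rule
# (0 means even a single process would not fit).
max_processes_for_ram() {
  awk -v g="$1" -v a="$2" 'BEGIN { print int(a / (10 * g)) }'
}
```

With these rules, `estimate_ratt_ram_gb 1 4` prints 40 and `max_processes_for_ram 0.75 64` prints 8.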


4. Process the RATT output

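# combine the per-sequence RATT results into one EMBL file
# (run once, or delete someName.final.embl first: on a rerun the glob also matches the output file)
cat *.final.embl >someName.final.embl
# post-RATT clean-up of the combined EMBL file, using the target genome FASTA
emblCorrecterPostRATT.pl someName.final.embl targetSpeciesGenome.fa >someName.final.clean.embl
# convert the cleaned EMBL file to GFF
embl2GFF3.pl someName.final.clean.embl >someName.gff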


Cautions:

The GFF output is only pseudo-GFF3 because exons do not have a Parent attribute, so it works only with some GFF parsers.

The reference names in the first column also need cleaning: use sed to remove the "runID." prefix and the ".final" suffix around each sequence ID, and save the result as someName.clean.gff.
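The notes suggest sed for this clean-up; the sketch below swaps in awk so that only column 1 is touched and the run ID is matched literally rather than as a regular expression. It assumes seqids of the form RUNID.SEQID.final; clean_gff_seqids is a hypothetical name.

```shell
# Strip the "RUN_ID." prefix and ".final" suffix from column 1 of a GFF file.
# Comment lines (starting with #) pass through unchanged.
# usage: clean_gff_seqids RUN_ID < someName.gff > someName.clean.gff
clean_gff_seqids() {
  local run_id=$1
  awk -v pre="$run_id." 'BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }
    {
      # literal prefix match, so dots in the run ID are not regex wildcards
      if (index($1, pre) == 1) $1 = substr($1, length(pre) + 1)
      sub(/\.final$/, "", $1)
      print
    }'
}
```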


cat someName.clean.gff |perl gffGetmRNA.pl --genome=targetSpeciesGenome.fa --mrna=targetOutputTranscriptome.fa




III. Lift over the annotation to the X. hellerii genome

A. Repeat the steps above with the X. hellerii v3 genome; the results are under /home/ys14/research/Software/ratt/Xhel




IV. Identify the overlapping regions of the transcriptomes for differential gene expression analyses in the hybrids.


A. Add genes missing from the lifted-over X. couchianus transcriptome


1. Extract the X. maculatus sequences for the 153 genes not transferred by RATT and for the 30 manually annotated X. maculatus genes.
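Pulling these genes out of the X. maculatus transcriptome is a lookup of FASTA records by ID. A minimal awk sketch, assuming one ID per line in the list file and that the ID is the first word of each FASTA header; extract_fasta_ids is a hypothetical name.

```shell
# Print the FASTA records whose ID (first word after ">") is in IDS_FILE.
# usage: extract_fasta_ids ids.txt transcriptome.fa > subset.fa
extract_fasta_ids() {
  awk 'FILENAME == ARGV[1] { want[$1] = 1; next }   # load the wanted IDs
       /^>/ { id = substr($1, 2); keep = (id in want) }
       keep' "$1" "$2"
}
```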


2. Run a reciprocal best-hit BLAST search of these sequences against the de novo X. couchianus transcriptome (~/Yingjia/mafft/mp_Xc_5stage_combinedSkin_11202012.fa), then extract the matching X. couchianus genes from the de novo transcriptome and add them to the lifted-over transcriptome.

The new file is /home/ys14/research/storage1/reference_assemblies/Ensembl/release70/XcouRATTLifed/XcouV4genomeFromXmacEn70DenovSuppl.fa
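The reciprocal best-hit step can be sketched with awk over two BLAST+ tabular reports, for example produced by `makeblastdb -in db.fa -dbtype nucl` followed by `blastn -query q.fa -db db -outfmt 6 -out q_vs_db.tsv` in each direction. This is an illustration of the technique, not the script used for these notes: the name reciprocal_best_hits is hypothetical, the best hit is chosen by bitscore (column 12 of -outfmt 6), and ties keep the first hit seen.

```shell
# Reciprocal best hits from two BLAST -outfmt 6 tables.
# usage: reciprocal_best_hits a_vs_b.tsv b_vs_a.tsv > rbh_pairs.tsv
# Output: one tab-separated "A_id  B_id" pair per line (order not defined).
reciprocal_best_hits() {
  awk 'BEGIN { FS = OFS = "\t" }
    # first table: best subject per query for the A to B search
    FILENAME == ARGV[1] {
      s = $12 + 0
      if (!($1 in bestAB) || s > scoreAB[$1]) { bestAB[$1] = $2; scoreAB[$1] = s }
      next
    }
    # second table: best subject per query for the B to A search
    {
      s = $12 + 0
      if (!($1 in bestBA) || s > scoreBA[$1]) { bestBA[$1] = $2; scoreBA[$1] = s }
    }
    # keep a pair only when each sequence is the best hit of the other
    END {
      for (a in bestAB) {
        b = bestAB[a]
        if ((b in bestBA) && bestBA[b] == a) print a, b
      }
    }' "$1" "$2"
}
```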


B. Run MAFFT to align the X. maculatus and X. couchianus sequences

1. Rename the sequences so that the IDs in the X. maculatus and X. couchianus transcriptomes match exactly between the two species. (Update: make the names slightly different before running MAFFT so that mafft_parallel.pl is not confused by identical names.)

For X. couchianus:

/home/ys14/research/storage1/reference_assemblies/Ensembl/release70/XMvsXC/XcouV4GenomeFromXmac70DenovoSuppl.fa

For X. maculatus:

XmacEB70en.fa
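One way to keep shared gene IDs while making the names slightly different per species is to append a species tag to each header. A sketch, assuming the shared ID is the first word of the header; tag_fasta_ids and the tags Xmac/Xcou are hypothetical, and the header description after the first word is dropped.

```shell
# Append "_SUFFIX" to the first word of every FASTA header and drop the rest
# of the header line; sequence lines are copied unchanged.
# usage: tag_fasta_ids Xcou < in.fa > out.fa
tag_fasta_ids() {
  awk -v s="$1" '/^>/ { print $1 "_" s; next } { print }'
}
```

The same gene then appears as, for example, >ENSXMAT..._Xmac and >ENSXMAT..._Xcou, and stripping the suffix recovers the shared ID.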



2. Run the MAFFT alignment script

perl mafftPaired.pl XcouList.txt XmacEB70en.fa XcouV4GenomeFromXmac70DenovoSuppl.fa




C. Run MAFFT to align the X. maculatus, X. couchianus and X. hellerii sequences

perl mafft_parallel.pl threeListWithoutTitanModID.txt XmacEB70en.fa XcouV4GenomeFromXmac70DenovoSupplModID.fa XhelV3FromXmacEB70DenovsuppleModID.fa -n=50





