MySQL and OrthoMCL: Installation, Configuration and Usage


weixin_39723678


Article link: https://blog.csdn.net/weixin_39723678/article/details/113692727


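This post explains how to install and configure MySQL and the Perl modules it needs, with particular attention to custom installation as a non-root user. It then covers installing MCL, followed by installing, configuring and using OrthoMCL: installing the database tables, adjusting the input files, filtering, running BLAST, parsing the results, loading them, computing pairs, dumping the data and MCL clustering.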


1. MySQL installation and configuration (see the linked post)

2. Create the database and database user

2.1 Log in to the MySQL server as root

./bin/mysql --defaults-file=mysql.cnf -u root -p

2.2 Create the database and user

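CREATE DATABASE orthomcl;  # create the database

CREATE USER 'orthomcl'@'localhost' IDENTIFIED BY '123456';  # create the user and set its password

GRANT SELECT, INSERT, UPDATE, DELETE, CREATE VIEW, CREATE, INDEX, DROP ON orthomcl.* TO 'orthomcl'@'localhost';  # grant privileges

The password set here must match dbPassword in the OrthoMCL config file (step 6). The separate CREATE USER ... IDENTIFIED BY statement is needed on MySQL 8.0, where GRANT no longer creates users and the PASSWORD() function has been removed; it also works on MySQL 5.x.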

3. Install the Perl modules MySQL access needs

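3.1 Check whether the modules are installed (no error means the module is installed; an error means it is not)

$ perl -MDBI -e 1   (no error for me, so DBI is installed)

$ perl -MDBD::mysql -e 1   (this one errors, so DBD::mysql must be installed; see step 3.2)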
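The same check can be scripted for several modules at once. A minimal Python sketch, assuming perl is on your PATH; it runs exactly the `perl -M<module> -e 1` test above and reports the result from the exit status:

```python
import subprocess


def perl_module_installed(module: str, perl: str = "perl") -> bool:
    """Return True if `perl -M<module> -e 1` exits with status 0 (the module loads)."""
    try:
        result = subprocess.run(
            [perl, f"-M{module}", "-e", "1"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        # The perl executable itself was not found
        return False
    return result.returncode == 0


if __name__ == "__main__":
    for name in ("DBI", "DBD::mysql"):
        print(name, "installed" if perl_module_installed(name) else "missing")
```

A missing perl binary is reported as "missing" too, so check `which perl` first if everything shows up as missing.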

3.2 Install as root (without root access, use Method 1 or Method 2 below)

perl -MCPAN -e shell

cpan> o conf makepl_arg "mysql_config=/share/soft/mysql/bin/mysql_config"

cpan> install Data::Dumper

cpan> install DBI

cpan> force install DBD::mysql

If you get an error like "install_driver(mysql) failed: Can't load '/usr/local/lib64/perl5/auto/DBD/mysql/mysql.so' for module DBD::mysql: libmysqlclient.so.18:", add the directory that contains libmysqlclient.so.18 to /etc/ld.so.conf and then run ldconfig (details in the linked post).
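If you are not sure where libmysqlclient.so.18 lives, a small search helps. Below is a hedged Python sketch; the search root in the `__main__` block is a hypothetical example based on this post's install path, so replace it with your own MySQL prefix:

```python
from pathlib import Path


def find_lib_dirs(roots, libname: str = "libmysqlclient.so.18") -> list:
    """Return the sorted directories under `roots` that contain `libname`."""
    found = set()
    for root in roots:
        root_path = Path(root)
        if not root_path.is_dir():
            continue  # skip roots that do not exist
        for hit in root_path.rglob(libname):
            found.add(str(hit.parent))
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical search root; replace with your own MySQL install prefix
    for directory in find_lib_dirs(["/share/soft/mysql"]):
        print(directory)
```

Each printed directory is a line to append to /etc/ld.so.conf (as root) before running ldconfig. rglob walks the whole tree, so point it at the MySQL prefix rather than at /.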

Method 1: set up a per-user CPAN configuration and install DBD::mysql through CPAN (the method given in mysqlInstallGuide.txt, which ships with OrthoMCL)

Installing modules as a standard user

-- Follow the steps below to install modules as a non-root user. We assume ~/myperl (myperl in your home directory) is your custom perl directory.

1. In your home directory, create the PERL and CPAN directories, and a blank CPAN config module:

$ mkdir myperl

$ mkdir .cpan

$ mkdir .cpan/CPAN

$ echo "\$CPAN::Config = {}" > ~/.cpan/CPAN/MyConfig.pm

2. Configure your environment by adding the following to your .bash_profile file in your home directory:

######################################

if [ -z "$PERL5LIB" ]

then

# If PERL5LIB wasn't previously defined, set it...

PERL5LIB=~/myperl/lib

else

# ...otherwise, extend it.

PERL5LIB=$PERL5LIB:~/myperl/lib

fi

MANPATH=$MANPATH:~/myperl/man

export PERL5LIB MANPATH

######################################

3. Create the necessary directories and process your .bash_profile:

$ mkdir -p ~/myperl/lib

$ mkdir -p ~/myperl/man/man{1,3}

$ source ~/.bash_profile

4. Confirm that your custom PERL5LIB paths have been set:

$ perl -wle 'print for grep /myperl/, @INC'

-- You should see paths relative to your home directory. If not, repeat steps 1-3.

5. Invoke the CPAN shell and complete CPAN configuration:

$ perl -MCPAN -we shell

-- CPAN will request that you set your config. Accepting the default (type …

cpan> install Data::Dumper

cpan> install DBI

cpan> force install DBD::mysql

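Method 2: the usual way to install a Perl module into a custom directory (see my earlier post); perl Makefile.PL, make and make install below are run inside the unpacked DBD-mysql source directory.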

mkdir /share/workdir/yangl/perl_lib (my home directory is /share/workdir/yangl; a regular user would usually run mkdir ~/perl_lib instead, so substitute your own directory)

perl Makefile.PL PREFIX=/share/workdir/yangl/perl_lib --cflags="-I/share/workdir/yangl/soft/mysql/include/" --libs="-L/share/workdir/yangl/soft/mysql/lib/ -lmysqlclient -lz" --mysql_config="/share/workdir/yangl/soft/mysql/bin/mysql_config"

Note: run perl Makefile.PL --help, or read the module's documentation (DBD-mysql-3.0008/INSTALL.html), for a description of the options.

--cflags gives the MySQL C header directory. On Red Hat Linux the default header directory is /usr/include/mysql; because my MySQL is a custom install, my headers are in /share/workdir/yangl/soft/mysql/include/

--libs gives the MySQL libraries. On Red Hat Linux the default library directory is /usr/lib/mysql; for my custom install it is /share/workdir/yangl/soft/mysql/lib/

--mysql_config points to MySQL's mysql_config script, normally in the bin directory of the MySQL installation; mine is /share/workdir/yangl/soft/mysql/bin/mysql_config

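The default --cflags and --libs values can also be obtained from the mysql_config script:

MySQL C compiler flags (including the header directory): mysql_config --cflags

MySQL linker flags (library directory and libraries): mysql_config --libs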

make

make install

Add this as the last line of .bashrc or .bash_profile:

export PERL5LIB="/share/workdir/yangl/perl_lib/lib/perl5:/share/workdir/yangl/perl_lib/lib64/perl5:/share/workdir/yangl/perl_lib/share/perl5:$PERL5LIB"

Then run:

source ~/.bashrc

or source ~/.bash_profile (whichever file you edited)

4. Install MCL

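mcl implements the Markov Clustering algorithm; the latest source is available at http://www.micans.org/mcl/src/mcl-latest.tar.gz. After downloading, install it with ./configure, make, make test and make install, which puts mcl in /usr/local/bin/mcl by default (this step is simple and rarely causes problems). Without root, pass --prefix to configure, as below: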

Unpack: tar -xvf mcl-latest.tar.gz (this unpacks to mcl-14-137)

cd mcl-14-137

./configure --prefix=/share/workdir/yangl/soft/mcl_14_137/

make

make install

vi ~/.bash_profile and append as the last line: export PATH=/share/workdir/yangl/soft/mcl_14_137/bin:$PATH

source ~/.bash_profile

5. Install and configure OrthoMCL

a. Download OrthoMCL (http://orthomcl.org/common/downloads/software/) and unpack it; the package contains four folders: bin, config, doc and lib.

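b. Add the bin directory to $PATH: vi /etc/profile and append as the last line: export PATH=/share/soft/OrthoMCL/orthomclSoftware-v2.0.9/bin:$PATH (editing /etc/profile requires root; without root, add the same line to ~/.bash_profile instead)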

The runtime environment is now ready. The following steps are carried out for every analysis; here I test all of them:

6. orthomclInstallSchema: install the OrthoMCL database tables

Create a my_orthomcl_dir directory inside the unpacked directory (it can in fact be anywhere), copy orthomcl.config.template into it, and edit the file as follows:

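dbVendor=mysql

dbConnectString=dbi:mysql:orthomcl

dbLogin=orthomcl

dbPassword=123456

similarSequencesTable=SimilarSequences

orthologTable=Ortholog

inParalogTable=InParalog

coOrthologTable=CoOrtholog

interTaxonMatchView=InterTaxonMatch

percentMatchCutoff=50

evalueExponentCutoff=-5

oracleIndexTblSpc=NONE

Here dbVendor=mysql selects MySQL as the database, dbConnectString=dbi:mysql:orthomcl connects to the MySQL database named orthomcl, and dbLogin/dbPassword are the user and password created in step 2.2. Keep each setting as a plain key=value line, as in the template, with no trailing comments.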
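Before running orthomclInstallSchema it can help to sanity-check the file. A hedged Python sketch: it reads the key=value lines, lists missing keys from a hand-picked subset (my choice, not an official list), and shows how the two cutoffs act on a BLAST pair as I read the OrthoMCL user guide: pairs with a percent match below percentMatchCutoff, or an e-value exponent above evalueExponentCutoff, are ignored.

```python
def parse_orthomcl_config(text: str) -> dict:
    """Parse an orthomcl.config-style file: one key=value per line.

    Blank lines and lines starting with '#' are skipped. As a deliberate
    strictness check (an assumption of this sketch), a '#' inside a value is
    rejected, because a plain key=value reader would keep it as part of the value.
    """
    config = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"line {lineno}: expected key=value, got {raw!r}")
        if "#" in value:
            raise ValueError(f"line {lineno}: inline comment in {raw!r}")
        config[key.strip()] = value.strip()
    return config


# Hand-picked subset of keys to check; not an official list
REQUIRED_KEYS = ("dbVendor", "dbConnectString", "dbLogin", "dbPassword",
                 "percentMatchCutoff", "evalueExponentCutoff")


def missing_keys(config: dict) -> list:
    """Return the required keys that are absent from the parsed config."""
    return [key for key in REQUIRED_KEYS if key not in config]


def passes_cutoffs(percent_match: float, evalue_exp: int, config: dict) -> bool:
    """Keep a pair if percent match >= percentMatchCutoff and the e-value
    exponent <= evalueExponentCutoff (e.g. 1e-30 has exponent -30)."""
    return (percent_match >= float(config["percentMatchCutoff"])
            and evalue_exp <= int(config["evalueExponentCutoff"]))
```

With the settings above, a pair at 60% match and e-value 1e-30 is kept, while a pair at 60% match and e-value 1e-3 is dropped because -3 is above -5. Note that the sketch would also reject a password containing '#'.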

Usage:

orthomclInstallSchema config_file sql_log_file [table_suffix]

# e.g. [myname@localhost my_orthomcl_dir]$ orthomclInstallSchema orthomcl.config.template install_schema.log

# This creates the tables in the database from the settings in orthomcl.config.template (initializing the orthomcl database for the following steps)

7. orthomclAdjustFasta: create the OrthoMCL input files

Notes:

First, for any organism that has multiple protein fasta files, combine them all into one single proteome fasta file


Then, create an empty my_orthomcl_dir/compliantFasta/ directory, and change to that directory. Run orthomclAdjustFasta once for each input proteome fasta file. It will produce a compliant file in the new directory. Check each file to ensure that the proteins all have proper IDs.


Usage:

orthomclAdjustFasta taxon_code fasta_file id_field

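# e.g. orthomclAdjustFasta hum human.fasta 4

taxon_code: a 3- or 4-character abbreviation of the species name;

fasta_file: the input protein fasta file;

id_field: an integer giving which field of the definition line is used as the output protein ID (unique_protein_id); fields are separated by spaces or "|". For example, in >gi|89106888|ref|AP_000668.1| the ID AP_000668.1 is in field 4.

The output file is named taxon_code.fasta, e.g. hum.fasta, and each sequence name becomes >taxon_code|unique_protein_id, e.g. >hum|AP_000668.1.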
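The header rewrite can be pictured with a short Python sketch. It is not the orthomclAdjustFasta source; it assumes, as described above, that fields are counted from 1 after dropping '>' and splitting on whitespace or '|', and it additionally rejects duplicate IDs because later steps need unique IDs.

```python
import re


def adjust_header(header: str, taxon_code: str, id_field: int) -> str:
    """Rewrite a FASTA definition line to >taxon_code|unique_protein_id."""
    # Drop '>' and split on runs of whitespace or '|'; ignore empty trailing fields
    fields = [f for f in re.split(r"[\s|]+", header.lstrip(">").strip()) if f]
    if not 1 <= id_field <= len(fields):
        raise ValueError(f"id_field {id_field} out of range for {header!r}")
    return f">{taxon_code}|{fields[id_field - 1]}"


def adjust_fasta(lines, taxon_code: str, id_field: int):
    """Yield the lines of a compliant FASTA file; sequence lines pass through unchanged."""
    seen = set()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            new_header = adjust_header(line, taxon_code, id_field)
            if new_header in seen:
                raise ValueError(f"duplicate ID {new_header}")
            seen.add(new_header)
            yield new_header
        else:
            yield line
```

For example, adjust_header(">gi|89106888|ref|AP_000668.1|", "hum", 4) gives ">hum|AP_000668.1", matching the example above.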

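8. orthomclFilterFasta: filter the input files

(Run this from the parent directory of compliantFasta: cd ../ to return to my_orthomcl_dir.) This step filters the protein files rewritten in step 7, removing sequences shorter than min_length and sequences whose percentage of stop codons exceeds max_percent_stops (you choose both values).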

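Requirements: 1) in compliantFasta, all protein sequences of each species are in a single fasta file;

2) each fasta file must be named xxxx.fasta, where xxxx is a unique 3-4 letter taxon code;

3) every sequence name must have the form >xxxx|yyyyyyyy, where xxxx is the unique 3-4 letter taxon code and yyyyyyyy is a unique ID.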
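These requirements can be checked before filtering. A minimal Python sketch; the regular expressions encode the rules above (3-4 letter taxon code, xxxx.fasta file names, >xxxx|id headers whose taxon matches the file name) and are my reading of them, not OrthoMCL code:

```python
import re
from pathlib import Path

FILE_RE = re.compile(r"^([A-Za-z]{3,4})\.fasta$")      # rule 2: xxxx.fasta
HEADER_RE = re.compile(r"^>([A-Za-z]{3,4})\|(\S+)$")   # rule 3: >xxxx|yyyy


def check_compliant_dir(directory) -> list:
    """Return a list of problems; an empty list means every file passed."""
    problems = []
    seen_ids = set()
    for path in sorted(Path(directory).iterdir()):
        match = FILE_RE.match(path.name)
        if not match:
            problems.append(f"{path.name}: file name is not xxxx.fasta")
            continue
        taxon = match.group(1)
        with path.open() as handle:
            for lineno, line in enumerate(handle, start=1):
                if not line.startswith(">"):
                    continue
                header = line.strip()
                found = HEADER_RE.match(header)
                if not found or found.group(1) != taxon:
                    problems.append(f"{path.name}:{lineno}: bad header {header!r}")
                elif header in seen_ids:
                    problems.append(f"{path.name}:{lineno}: duplicate ID {header!r}")
                else:
                    seen_ids.add(header)
    return problems
```

Calling check_compliant_dir("compliantFasta") from my_orthomcl_dir prints nothing to fix when the list is empty.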

Usage:

orthomclFilterFasta input_dir min_length max_percent_stops [good_proteins_file poor_proteins_file]

# e.g. orthomclFilterFasta compliantFasta/ 10 20

Output: my_orthomcl_dir/goodProteins.fasta, my_orthomcl_dir/poorProteins.fasta, and a report of suspicious proteomes
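The filtering rule, sketched in Python under one stated assumption: stop codons are counted as '*' characters in the protein sequence. A sequence goes to the poor set if it is shorter than min_length or its stop percentage exceeds max_percent_stops; writing the output files and the suspicious-proteome report are left out.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_proteins(records, min_length: int, max_percent_stops: float):
    """Split (header, sequence) records into (good, poor) lists.

    Assumption: stops are counted as '*' characters in the sequence.
    """
    good, poor = [], []
    for header, seq in records:
        length = len(seq)
        stop_percent = 100.0 * seq.count("*") / length if length else 100.0
        if length < min_length or stop_percent > max_percent_stops:
            poor.append((header, seq))
        else:
            good.append((header, seq))
    return good, poor
```

With min_length 10 and max_percent_stops 20, as in the example command, a 10-residue sequence with no stops is kept, while a 3-residue sequence or a 10-residue sequence with 4 stops (40%) is rejected.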

9. BLAST: run an all-vs-all BLAST on goodProteins.fasta from the previous step

(BLAST 2.2.26 is used here)

Usage:

formatdb -i goodProteins.fasta

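blastall -p blastp -i goodProteins.fasta -d goodProteins.fasta -m 8 -F F -b 1000 -v 1000 -a 2 -o all_VS_all.out.tab # this all-vs-all search effectively provides the similarity data that MCL later clusters

Options: -p blastp runs a protein search; -i is the query file; -d is the database built by formatdb; -m 8 gives tabular output; -F F turns off low-complexity filtering; -b 1000 and -v 1000 allow up to 1000 database sequences per query in the alignments and one-line descriptions; -a 2 uses two processors; -o is the output file.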
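-m 8 output is BLAST's 12-column tab-separated table (query, subject, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score). A small Python reader, useful for spot-checking all_VS_all.out.tab before parsing; the Hit field names are my own labels, and the 'e-180' handling is an assumption about legacy BLAST output:

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    percent_identity: float
    alignment_length: int
    mismatches: int
    gap_opens: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bit_score: float


def to_evalue(text: str) -> float:
    # Assumption: legacy BLAST may print very small e-values as e.g. 'e-180' (meaning 1e-180)
    return float("1" + text if text.startswith("e") else text)


def parse_m8(lines):
    """Yield Hit records from BLAST -m 8 output (12 tab-separated columns per line)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        c = line.rstrip("\n").split("\t")
        if len(c) != 12:
            raise ValueError(f"expected 12 columns, got {len(c)}: {line!r}")
        # Columns 4-10 (indices 3..9) are all integers
        yield Hit(c[0], c[1], float(c[2]), *(int(x) for x in c[3:10]),
                  to_evalue(c[10].strip()), float(c[11]))
```

For example, counting hits per query with collections.Counter(h.query for h in parse_m8(open("all_VS_all.out.tab"))) quickly shows whether any protein failed to hit even itself.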

10. orthomclBlastParser: parse the BLAST results from step 9 so they can be loaded into the OrthoMCL database.

Usage:

orthomclBlastParser my_blast_results compliantFasta/ > similarSequences.txt

# e.g. orthomclBlastParser all_VS_all.out.tab compliantFasta/ > similarSequences.txt
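As I understand the OrthoMCL user guide, each line of similarSequences.txt holds: query_id, subject_id, query_taxon, subject_taxon, evalue_mant, evalue_exp, percent_ident, percent_match. The hedged Python sketch below covers only the first six columns: the taxon is the code before '|' set by orthomclAdjustFasta, and the e-value is split into mantissa and exponent. percent_ident and percent_match are computed by orthomclBlastParser from all HSPs and the sequence lengths (presumably why it takes compliantFasta/ as an argument), which the sketch does not attempt.

```python
def split_evalue(evalue: str):
    """Split a BLAST e-value string into (mantissa, exponent), e.g. '2e-30' -> (2.0, -30).

    Assumptions: a value printed as 'e-180' means 1e-180, and values without an
    exponent (e.g. '0.002') are renormalized to scientific notation.
    """
    text = evalue.strip()
    if text.startswith("e"):
        text = "1" + text
    # Format as d.dddddde[+-]NN, then split at the 'e'
    mantissa, exponent = f"{float(text):e}".split("e")
    return float(mantissa), int(exponent)


def taxon_of(protein_id: str) -> str:
    """Taxon code before the first '|', e.g. 'hum|AP_000668.1' -> 'hum'."""
    return protein_id.split("|", 1)[0]


def similar_sequences_prefix(query: str, subject: str, evalue: str) -> list:
    """First six similarSequences.txt columns for one query/subject pair (hypothetical helper)."""
    mantissa, exponent = split_evalue(evalue)
    return [query, subject, taxon_of(query), taxon_of(subject), mantissa, exponent]
```

The exponent column is what evalueExponentCutoff in the config file is compared against.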

11. orthomclLoadBlast: load similarSequences.txt into the database

Usage:

orthomclLoadBlast orthomcl.config.template similarSequences.txt

# e.g. orthomclLoadBlast orthomcl.config.template similarSequences.txt

12. orthomclPairs: compute pairs from the data in the SimilarSequences table of the database

This produces three tables in MySQL:

- PotentialOrthologs table

- PotentialInParalogs table

- PotentialCoOrthologs table

Usage:

orthomclPairs orthomcl.config orthomcl_pairs.log cleanup=no

# e.g. orthomclPairs orthomcl.config.template orthomcl_pairs.log cleanup=no
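To give a feel for what "potential orthologs" means, the core idea is reciprocal best hits between proteins of different species. The Python sketch below is a deliberately simplified illustration of that idea only: orthomclPairs itself works in SQL on the SimilarSequences table, applies the cutoffs from the config file, has further rules (ties, weighting), and also computes in-paralogs and co-orthologs, none of which is reproduced here. The (query, subject, evalue) input tuples are a hypothetical format.

```python
def taxon_of(protein_id: str) -> str:
    """Taxon code before the first '|'."""
    return protein_id.split("|", 1)[0]


def reciprocal_best_hits(hits):
    """hits: iterable of (query, subject, evalue). Return a set of sorted (a, b) pairs
    where a's best hit in b's species is b, and b's best hit in a's species is a."""
    # best[(query, subject_taxon)] = (evalue, subject); first hit wins on ties
    best = {}
    for query, subject, evalue in hits:
        if taxon_of(query) == taxon_of(subject):
            continue  # within-species hits matter for in-paralogs, not orthologs
        key = (query, taxon_of(subject))
        if key not in best or evalue < best[key][0]:
            best[key] = (evalue, subject)
    pairs = set()
    for (query, _), (_, subject) in best.items():
        back = best.get((subject, taxon_of(query)))
        if back is not None and back[1] == query:
            pairs.add(tuple(sorted((query, subject))))
    return pairs
```

In a toy input where hum|A's best mouse hit is mou|X and mou|X's best human hit is hum|A, only that pair is reported; a one-directional best hit such as hum|B to mou|X is not.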

13. orthomclDumpPairsFiles: export the data from the database

This produces the mclInput file and a pairs directory containing the relationships between the proteins.

Usage:

orthomclDumpPairsFiles orthomcl.config.template

# e.g. orthomclDumpPairsFiles orthomcl.config.template

14. mcl: cluster the output of the previous step with MCL; the result is written to the mclOutput file

Usage:

mcl my_orthomcl_dir/mclInput --abc -I 1.5 -o my_orthomcl_dir/mclOutput

# e.g. mcl mclInput --abc -I 1.5 -o mclOutput
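To make -I concrete: mcl alternates expansion (squaring the column-stochastic flow matrix) and inflation (raising each entry to the power given by -I and renormalizing the columns); a higher -I gives tighter, smaller clusters. Below is a toy dense Python sketch of that loop, reading the same three-column --abc format as mclInput (label, label, weight). It is for illustration only: the real mcl uses sparse matrices, pruning and its own self-loop handling, and the self-loop weight of 1.0 here is an assumption.

```python
def read_abc(lines):
    """Parse 'label1 label2 weight' lines (the --abc format) into a symmetric weight dict."""
    weights = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        a, b, w = parts[0], parts[1], float(parts[2])
        weights[(a, b)] = w
        weights[(b, a)] = w
    return weights


def normalize_columns(m):
    """Scale each column of square matrix m to sum to 1 (zero columns stay zero)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Dense square matrix product a @ b."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(weights, inflation=1.5, max_iter=100, tol=1e-9, eps=1e-6):
    """Toy Markov clustering; returns clusters as sorted lists of labels."""
    labels = sorted({a for a, _ in weights})
    idx = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    m = [[0.0] * n for _ in range(n)]
    for (a, b), w in weights.items():
        m[idx[a]][idx[b]] = w
    for i in range(n):
        m[i][i] = max(m[i][i], 1.0)  # self-loops (weight 1.0 is an assumption)
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow spreads along paths of length 2
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep mass are attractors; each attractor's non-zero columns form a cluster
    clusters = {frozenset(labels[j] for j in range(n) if m[i][j] > eps) for i in range(n)}
    clusters.discard(frozenset())
    return sorted((sorted(c) for c in clusters), key=lambda c: (-len(c), c))
```

Writing each returned cluster as one tab-separated line mimics the layout of mclOutput, one cluster per line.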

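15. orthomclMclToGroups: number the MCL clusters

In the output file (groups.txt), each line is one family.

Note: my_prefix is the prefix of each family ID in groups.txt; e.g. with GF_, every family in groups.txt starts with GF_.

1 means family numbering starts at 1.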

Usage:

orthomclMclToGroups my_prefix 1 < mclOutput > groups.txt

# e.g. orthomclMclToGroups G_ 1 < mclOutput > groups.txt
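Assuming each line of groups.txt has the form `G_1: hum|A mou|X ...` (prefix plus number, a colon, then space-separated members), the numbering step and a per-species member count can be sketched in Python. This mirrors the description above rather than the orthomclMclToGroups source:

```python
from collections import Counter


def mcl_to_groups(lines, prefix: str, start: int = 1):
    """Number each non-empty cluster line as '<prefix><n>: member member ...'."""
    groups = []
    number = start
    for line in lines:
        members = line.split()  # mclOutput members are tab-separated
        if not members:
            continue
        groups.append(f"{prefix}{number}: {' '.join(members)}")
        number += 1
    return groups


def taxa_counts(group_line: str) -> dict:
    """Count members per taxon code in one groups.txt line."""
    _, _, members = group_line.partition(": ")
    return dict(Counter(member.split("|", 1)[0] for member in members.split()))
```

For example, mcl_to_groups(open("mclOutput"), "G_", 1) reproduces the numbering of the command above, and taxa_counts on a family line shows how many proteins each species contributes.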