Format of the AMPs.fasta file:
>AP00001 |antibacterial |anticancer/tumor |antifungal
GLWSKIKEVGKEAAKAAAKAAGKAALGAVSEAV
>AP00004 |antibacterial |antifungal
NLCERASLTWTGNCGNTGHCDTQCRNWESAKHGACHKRGNWKCFCYFDC
>AP00005 |antibacterial
VFIDILDKVENAIHNAAQVGIGFAKPFEKLINPK
>AP00006 |antibacterial
GNNRPVYIPQPRPPHPRI
>AP00008 |antibacterial
RLCRIVVIRVCR
>AP00020 |antibacterial |anticancer/tumor |antifungal
GLFDIVKKIAGHIAGSI
>AP00026 |antibacterial |anticancer/tumor |antifungal |anti-HIV |antiviral
FKCRRWQWRMKKLGAPSITCVRRAF
>AP00027 |antibacterial
ITPATPFTPAIITEITAAVIA
>AP00035 |antibacterial |anticancer/tumor
KSSAYSLQMGATAIKQVKKLFKKWGW
>AP00050 |antibacterial
GIGASILSAGKSALKGLAKGLAEHFAN
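As an illustration of how records in this layout can be split into their parts, here is a minimal sketch. It assumes each header line starts with `>`, that the header fields are separated by `|`, and that the sequence sits on the line(s) following its header. The names `parse_amp_fasta` and `_make_record` are hypothetical helpers, not part of any library, and the two sample records are copied from the listing above.

```python
from io import StringIO


def _make_record(header, chunks):
    # Hypothetical helper: split "AP00006 |antibacterial" into id and activity labels.
    fields = [field.strip() for field in header.split("|")]
    identifier = fields[0]
    activities = [field for field in fields[1:] if field]
    return identifier, activities, "".join(chunks)


def parse_amp_fasta(handle):
    """Yield (identifier, activities, sequence) tuples from an open FASTA handle."""
    header, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped, so collect them all.
            chunks.append(line)
    if header is not None:
        yield _make_record(header, chunks)


# Two records taken from the sample above, used as in-memory input.
sample = StringIO(
    ">AP00006 |antibacterial\n"
    "GNNRPVYIPQPRPPHPRI\n"
    ">AP00008 |antibacterial\n"
    "RLCRIVVIRVCR\n"
)
records = list(parse_amp_fasta(sample))
lengths = [len(sequence) for _, _, sequence in records]
```

For a real file, the same function could be called on `open("./AMPs.fasta")` instead of the in-memory sample; the resulting `lengths` list is the kind of input a histogram of sequence lengths would need.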
Code task: read the sequences and compute the distribution of sequence lengths.
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
sns.set_style('dark')
train_pos_data_path = r"./AMPs.fasta"
trai