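A while ago, while browsing the web, I stumbled on a site that explains the various file types used in bioinformatics in detail. I think it is very helpful for beginners, so I am sharing it here: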
For example, clicking on the BAM format shows what it is used for:
BAM
To load a set of BAM files merged into a single track see Merged BAM File.
A BAM file (.bam) is the binary version of a SAM file. A SAM file (.sam) is a tab-delimited text file that contains sequence alignment data. These formats are described on the SAM Tools web site: http://www.htslib.org.
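Since a SAM file is plain tab-delimited text, one alignment line can be split with nothing but the standard library. This is only a rough sketch: the `SamRecord` type and `parse_sam_line` function are names I made up for illustration, the example read is invented, and the field order follows the 11 mandatory columns of the SAM specification. For real work a dedicated library is the better choice.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The 11 mandatory SAM columns (hypothetical helper type)."""
    qname: str
    flag: int
    rname: str
    pos: int      # 1-based leftmost mapping position
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str


def parse_sam_line(line: str) -> SamRecord:
    """Split one alignment line on tabs; optional tags after column 11 are ignored."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 columns")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]), fields[9], fields[10],
    )


# Invented example line, not taken from a real data set.
example = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"
record = parse_sam_line(example)
print(record.rname, record.pos, record.cigar)  # prints: chr1 100 4M
```

Header lines (those starting with `@`) are not handled here and would need to be skipped before calling the function.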
BAM, rather than SAM, is the recommended format for IGV. Starting with IGV 2.0.11, IUPAC ambiguity codes in BAM files are supported.
Indexing: IGV requires that both SAM and BAM files be sorted by position and indexed, and that the index files follow a specific naming convention. Specifically, a BAM index file should be named by appending .BAI to the bam file name. A SAM index filename is created by appending .SAI.
- The index files must have the same base file name and must reside in the same directory as the file that it indexes.
- For example, the index file for test-xyz.bam would be named test-xyz.bam.bai or test-xyz.bai.
- Multiple tools are available for sorting and indexing BAM files, including igvtools, the samtools package, and in GenePattern. The GenePattern module for sorting and indexing is Picard.SortSam.
- SAM files can be sorted and indexed using igvtools. Note: The .SAI index is an IGV format, and it does not work with samtools or any other application.
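The naming rule for BAM index files described above can be checked with a few lines of Python. A small sketch, assuming the lowercase `.bai` spelling used in the `test-xyz.bam` example; `candidate_index_paths` and `find_index` are hypothetical helper names, not part of IGV or samtools.

```python
from pathlib import Path
from typing import List, Optional


def candidate_index_paths(bam_path: str) -> List[Path]:
    """Return the two index names from the example, in the BAM file's own directory."""
    bam = Path(bam_path)
    return [
        bam.with_name(bam.name + ".bai"),  # test-xyz.bam -> test-xyz.bam.bai
        bam.with_suffix(".bai"),           # test-xyz.bam -> test-xyz.bai
    ]


def find_index(bam_path: str) -> Optional[Path]:
    """Return the first index file that exists on disk, or None if there is none."""
    for candidate in candidate_index_paths(bam_path):
        if candidate.is_file():
            return candidate
    return None


for path in candidate_index_paths("data/test-xyz.bam"):
    print(path.name)
```

If `find_index` returns `None`, the file most likely still needs to be sorted and indexed with one of the tools listed above before IGV can load it.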
Chromosome names: Chromosome names must be consistent between the selected reference genome and the SAM/BAM data files. For convenience, IGV equates chromosome numbers and names of the form chr# (e.g., 1 and chr1 are equivalent).
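To get a feel for the `1` versus `chr1` equivalence, here is a tiny sketch. It only covers the numeric `chr#` form mentioned in the text; how IGV treats other names is not described there, so this is my own simplification, and `normalize_chrom` and `same_chrom` are made-up names.

```python
def normalize_chrom(name: str) -> str:
    """Strip a leading 'chr' only when the rest is a chromosome number."""
    if name.startswith("chr") and name[3:].isdigit():
        return name[3:]
    return name


def same_chrom(a: str, b: str) -> bool:
    """Treat '1' and 'chr1' as the same chromosome."""
    return normalize_chrom(a) == normalize_chrom(b)


print(same_chrom("1", "chr1"))     # prints: True
print(same_chrom("chr1", "chr2"))  # prints: False
```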
One-based index: Start and end positions are identified using a one-based index. The end position is included. For example, setting start-end to 1-2 describes two bases, the first and second in the sequence.
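The one-based, end-inclusive convention is easy to get wrong in Python, where slices are zero-based and exclude the end. A minimal sketch of the conversion, with `slice_one_based` and `interval_length` as illustrative names of my own:

```python
def interval_length(start: int, end: int) -> int:
    """Number of bases in a one-based interval whose end position is included."""
    return end - start + 1


def slice_one_based(sequence: str, start: int, end: int) -> str:
    """Extract bases start..end (one-based, inclusive) from a sequence string."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Shift only the start: Python's exclusive end equals the inclusive one-based end.
    return sequence[start - 1:end]


print(interval_length(1, 2))          # prints: 2
print(slice_one_based("ACGT", 1, 2))  # prints: AC
```

So start-end 1-2 gives the first and second bases, matching the description above.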
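The page explains that BAM is the binary version of a SAM file and contains sequence alignment information ... ... For the format details, see http://www.htslib.org.
The link is: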
http://software.broadinstitute.org/software/igv/FileFormats