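ReferenceSequence is a class in the HTSJDK library that wraps a reference sequence read from a reference file, either a whole contig or a sub-region of one. It is a central class when working with reference genome data: it carries the sequence name, the index of the contig in the source file, and the bases themselves.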
ReferenceSequence class overview
Functionality
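- Represents a reference sequence: the ReferenceSequence class encapsulates the sequence data of one contig (for example a chromosome) of a reference genome. It holds the sequence name, the 0-based index of the contig in the source file, and the nucleotide bases.
- Provides the sequence data: the class exposes the bases either as a byte array or as a String, together with the sequence length.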
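To make the description above concrete, here is a minimal sketch of the data such an object carries. RefSeqSketch is a hypothetical stand-in written for this example so that it compiles without HTSJDK on the classpath. It mirrors only the constructor arguments (name, contig index, bases) of the class listed in the source code at the end of this article, and it is not the library class itself.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for illustration only; not the HTSJDK class. */
final class RefSeqSketch {
    private final String name;      // contig name, e.g. "chr1"
    private final int contigIndex;  // 0-based position of the contig in the source file
    private final byte[] bases;     // one byte per base

    RefSeqSketch(String name, int contigIndex, byte[] bases) {
        this.name = name;
        this.contigIndex = contigIndex;
        this.bases = bases;
    }

    String getName() { return name; }

    int getContigIndex() { return contigIndex; }

    int length() { return bases.length; }

    /** Copies the bytes into a String; convenient for short sequences only. */
    String getBaseString() { return new String(bases, StandardCharsets.US_ASCII); }

    public static void main(String[] args) {
        byte[] bases = "ACGTN".getBytes(StandardCharsets.US_ASCII);
        RefSeqSketch seq = new RefSeqSketch("chr1", 0, bases);
        System.out.println(seq.getName() + " index=" + seq.getContigIndex()
                + " length=" + seq.length() + " bases=" + seq.getBaseString());
    }
}
```

With HTSJDK available, the same three arguments would be passed to the real constructor, and instances are more commonly obtained from a reference file reader than built by hand.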
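Main properties and methods
- Sequence name and position:
  getName(): returns the name of the reference sequence (the contig name).
  getContigIndex(): returns the 0-based index of this contig in the source file.
  length(): returns the length of the sequence in bases.
  The class does not store start or end coordinates.
- Sequence data:
  getBaseString(): returns the bases as a String, for example "ACGT". The bases are copied and widened to two-byte characters, so this is meant for short sequences.
  getBases(): returns the bases as a byte array. This is the internal array, not a copy, and must not be modified.
- Region access:
  ReferenceSequence itself offers no sub-sequence method. A region is obtained through ReferenceSequenceFile#getSubsequenceAt(String, long, long), which returns a ReferenceSequence holding only the requested bases.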
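Extracting a region from a bases array, using the 1-based inclusive coordinates common in genomics, can be sketched as follows. The class RegionSliceSketch and its helper slice are hypothetical names introduced for this example and are not part of HTSJDK; the sketch works on a plain byte array so that it runs with the JDK alone.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of HTSJDK. */
final class RegionSliceSketch {

    /** Returns a copy of the bases from start to stop, both 1-based and inclusive. */
    static byte[] slice(byte[] bases, int start, int stop) {
        if (start < 1 || stop < start || stop > bases.length) {
            throw new IllegalArgumentException(
                    "Invalid region " + start + "-" + stop + " for length " + bases.length);
        }
        // Convert to the 0-based, end-exclusive range expected by copyOfRange.
        return Arrays.copyOfRange(bases, start - 1, stop);
    }

    public static void main(String[] args) {
        byte[] contig = "ACGTACGT".getBytes(StandardCharsets.US_ASCII);
        byte[] region = slice(contig, 3, 6);
        System.out.println(new String(region, StandardCharsets.US_ASCII)); // prints GTAC
    }
}
```

Because the helper returns a copy, callers can modify the result without touching the array it was cut from.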
Source code
/*
* The MIT License
*
* Copyright (c) 2009 The Broad Institute
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
package htsjdk.samtools.reference;

import htsjdk.beta.plugin.HtsRecord;
import htsjdk.samtools.util.StringUtil;

/**
 * Wrapper around a reference sequence that has been read from a reference file.
 *
 * @author Tim Fennell
 */
public class ReferenceSequence implements HtsRecord {
    private final String name;
    private final byte[] bases;
    private final int contigIndex;
    private final int length;

    /**
     * creates a fully formed ReferenceSequence
     *
     * @param name the name of the sequence from the source file
     * @param index the zero based index of this contig in the source file
     * @param bases the bases themselves stored as one-byte characters
     */
    public ReferenceSequence(String name, int index, byte[] bases) {
        this.name = name;
        this.contigIndex = index;
        this.bases = bases;
        this.length = bases.length;
    }

    /** Gets the set of names given to this sequence in the source file. */
    public String getName() { return name; }

    /**
     * Gets the array of bases that define this sequence. The bases can include any
     * letter and possibly include masking information in the form of lower case
     * letters. This array is mutable (obviously!) and is NOT a clone of the array
     * held internally. Do not modify it!!!
     */
    public byte[] getBases() { return bases; }

    /**
     * Returns the bases represented by this ReferenceSequence as a String. Since this will copy the bases
     * and convert them to two-byte characters, this should not be used on very long reference sequences,
     * but as a convenience when manipulating short sequences returned by
     * {@link ReferenceSequenceFile#getSubsequenceAt(String, long, long)}
     *
     * @return The set of bases represented by this ReferenceSequence, as a String
     */
    public String getBaseString() { return StringUtil.bytesToString(bases); }

    /** Gets the 0-based index of this contig in the source file from which it came. */
    public int getContigIndex() { return contigIndex; }

    /** Gets the length of this reference sequence in bases. */
    public int length() { return length; }

    public String toString() {
        return "ReferenceSequence " + getName();
    }
}
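The Javadoc on getBases() warns that the returned array is the one held internally and that lower-case letters may encode masking. A caller that wants to normalise case should therefore work on a copy. The sketch below shows one way to do that on a plain byte array; SafeBasesSketch, upperCaseCopy and countSoftMasked are hypothetical names introduced for this example and are not part of HTSJDK.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helpers for illustration only; not part of HTSJDK. */
final class SafeBasesSketch {

    /** Returns an upper-cased copy and leaves the input array untouched. */
    static byte[] upperCaseCopy(byte[] bases) {
        byte[] copy = bases.clone();
        for (int i = 0; i < copy.length; i++) {
            if (copy[i] >= 'a' && copy[i] <= 'z') {
                copy[i] = (byte) (copy[i] - ('a' - 'A'));
            }
        }
        return copy;
    }

    /** Counts lower-case bases, which may indicate soft-masking. */
    static int countSoftMasked(byte[] bases) {
        int masked = 0;
        for (byte b : bases) {
            if (b >= 'a' && b <= 'z') {
                masked++;
            }
        }
        return masked;
    }

    public static void main(String[] args) {
        // Stands in for the array that getBases() would hand out.
        byte[] shared = "ACgtNNac".getBytes(StandardCharsets.US_ASCII);
        byte[] upper = upperCaseCopy(shared);
        System.out.println(new String(upper, StandardCharsets.US_ASCII));  // prints ACGTNNAC
        System.out.println(countSoftMasked(shared));                       // prints 4
        System.out.println(new String(shared, StandardCharsets.US_ASCII)); // prints ACgtNNac
    }
}
```

Cloning costs one extra array per call, which is acceptable for short regions; for whole chromosomes it is usually better to read the shared array without modifying it.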