Lecture 3: DNA-seq-1

The figures in this post are taken from the course video: Next-Generation Sequencing Data Analysis, Lecture 3, DNA-seq

Review

Alignment strategies
Smith-Waterman (too slow to use in practice)
Fast alignment
Hash table
Seed and extension
Mask(for mismatches)
Suffix tree/prefix tree
Suffix array
Burrows-Wheeler Transform
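
As a reminder of the last item in the review list, here is a minimal sketch of the Burrows-Wheeler transform built from sorted rotations. It is illustrative only: the function names `bwt` and `inverse_bwt` and the `$` sentinel are assumptions made for this example, and this naive version materializes every rotation, which would not scale to a genome.

```python
def bwt(text, sentinel="$"):
    """Return the Burrows-Wheeler transform of text (naive rotation-sorting version)."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    # Sort all cyclic rotations; '$' sorts before letters in ASCII.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last_column, sentinel="$"):
    """Rebuild the original text by repeatedly prepending the last column and sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The row that ends with the sentinel is the original text plus the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


print(bwt("banana"))                  # annb$aa
print(inverse_bwt(bwt("GATTACA")))    # GATTACA
```

The transform groups identical characters that share the same following context, which is what makes the compressed index searchable.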

File format

VCF(Variant Call Format)

Usually stored in a compressed manner and can be indexed
QUAL: Phred-scaled quality of the variant call, i.e. -10 log10 of the probability that the call is wrong
Higher QUAL value: less likely to be a mistake
FILTER
PASS: this position passed all the filters defined in the file header
q10;s50: semicolon-separated list of the filters that were not met
INFO: additional information
Optional
18 predefined options
Examples:
DB: dbSNP membership
DP: combined depth across all the samples
NS: number of samples with data
AF: estimated allele frequency
SB: strand bias at this position
AA: ancestral allele
Genotype fields: individual samples
Examples:
GT: genotype (0 reference, 1 first alternative, 2 second alternative…)
GQ: conditional genotype quality
-10 log10 P(GT call is wrong | the site is variant)
DP: read depth at this position in this sample
HQ: haplotype qualities
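
The fields listed above can be pulled out of a VCF data line with a few string splits. The sketch below is a simplified illustration, not a full parser: `parse_vcf_line` and `phred_to_error_prob` are hypothetical helper names, the record is made up for this example, and header parsing, type conversion of INFO values, and escaping rules are all ignored.

```python
def phred_to_error_prob(q):
    """Convert a Phred-scaled score Q = -10 * log10(p) back to the probability p."""
    return 10 ** (-q / 10)


def parse_vcf_line(line, sample_names):
    """Split one VCF data line into fixed columns, INFO entries and per-sample fields."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, filt, info = cols[:8]

    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags such as DB.
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            info_dict[key] = value if sep else True

    # FILTER is PASS, '.', or a ';'-separated list of the filters that failed.
    failed = [] if filt in ("PASS", ".") else filt.split(";")

    # Column 9 names the per-sample keys; the following columns hold the values.
    samples = {}
    if len(cols) > 9:
        keys = cols[8].split(":")
        for name, field in zip(sample_names, cols[9:]):
            samples[name] = dict(zip(keys, field.split(":")))

    return {
        "CHROM": chrom, "POS": int(pos), "ID": vid, "REF": ref,
        "ALT": alt.split(","),
        "QUAL": None if qual == "." else float(qual),
        "FAILED_FILTERS": failed, "INFO": info_dict, "SAMPLES": samples,
    }


# Made-up record for illustration only (not real data).
example = "chr1\t12345\t.\tG\tA\t30\tq10;s50\tNS=2;DP=14;AF=0.5;DB\tGT:GQ:DP\t0/1:48:8\t1/1:43:6"
record = parse_vcf_line(example, ["S1", "S2"])
print(record["FAILED_FILTERS"])               # ['q10', 's50']
print(record["SAMPLES"]["S2"]["GT"])          # 1/1
print(phred_to_error_prob(record["QUAL"]))    # about 0.001
```

A QUAL of 30 therefore corresponds to roughly a 1 in 1000 chance that the variant call is wrong.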

Data visualization

Genome Browsers

Bring together genome data and additional annotation data for viewing in a single browser of the genome
Genome Browsers provide context
Organize data based on chromosomal locations
Search for or navigate to genomic areas of interest, then select and view annotation tracks for the region
EBI (Ensembl) genome browser
NCBI (Map Viewer)
UCSC Genome Browser
http://genome.ucsc.edu
Use this Gateway to search by
Gene names, symbols, IDs
Chromosome number (chr7) or region (chr11:1038475-1075482)
Keywords: kinase, receptor…
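
The position syntax accepted by the Gateway search is easy to handle in scripts as well. The following is a minimal sketch of a parser for strings such as `chr7` or `chr11:1038475-1075482`; the function name `parse_region` and the regular expression are assumptions made for this example and do not reproduce the browser's own parsing.

```python
import re

# Matches 'chr7' or 'chr11:1038475-1075482'; commas inside coordinates are tolerated.
_REGION = re.compile(r"^(?P<chrom>[^:\s]+)(?::(?P<start>[\d,]+)-(?P<end>[\d,]+))?$")


def parse_region(text):
    """Return (chrom, start, end); start and end are None for a whole chromosome."""
    match = _REGION.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid region: {text!r}")
    chrom = match.group("chrom")
    if match.group("start") is None:
        return chrom, None, None
    start = int(match.group("start").replace(",", ""))
    end = int(match.group("end").replace(",", ""))
    if start > end:
        raise ValueError("start must not exceed end")
    return chrom, start, end


print(parse_region("chr11:1038475-1075482"))   # ('chr11', 1038475, 1075482)
print(parse_region("chr7"))                    # ('chr7', None, None)
```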
Viewing NGS data
Text files
Upload data/files to GENOME BROWSER sites
BED, GFF, GTF, WIG, MAF, BED detail, Personal Genome SNP, PSL
Binary files
Only portions of the files needed for display are transferred to UCSC
Makes it possible to display very large files
BAM, bigBed, bigWig, …
Viewing options
Hide: removes a track from view
Dense: all items collapsed into a single line
Squish: each item on a separate line, but at 50% height and packed
Pack: each item separate, but efficiently stacked(full height)
Full: each item on separate line

Integrative Genomics Viewer(IGV)

Supports a wide variety of data including sequence alignments, microarrays and genomic annotations
Java-based

Genetic variation

SNP(Single nucleotide polymorphism)
1 in every few hundred bp
Mutation rate ≈ 10^-9
Short indels(insertion/deletion)
1 in every few kb
Mutation rate: variable
Microsatellite(STR) repeat number
1 in every few kb
2-6 bp repeat units
Mutation rate < 10^-3
Minisatellites
1 in every few kb
10-100bp repeat units
Mutation rate < 10^-1
Repeated genes
rRNA, histones
Large structural variations
Insertions/deletions
Duplications
Inversions
Copy number variations

SNP

Types of SNP
Transition: A<->G or C<->T (purine to purine, or pyrimidine to pyrimidine)
Transversion: substitution between a purine and a pyrimidine
For the whole human genome, a ts/tv ratio of about 2.0-2.1 is typical; in exons it is about 2.8-3.0
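
The transition/transversion distinction and the ts/tv ratio can be sketched in a few lines. The helper names `classify_snp` and `ts_tv_ratio` are made up for this example, and the input list is a toy set of (ref, alt) pairs rather than real calls.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError("expected two different bases from A, C, G, T")
    # A transition stays within one chemical class; a transversion crosses classes.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_ratio(snps):
    """Compute ts/tv from an iterable of (ref, alt) pairs."""
    counts = {"transition": 0, "transversion": 0}
    for ref, alt in snps:
        counts[classify_snp(ref, alt)] += 1
    if counts["transversion"] == 0:
        raise ZeroDivisionError("no transversions observed")
    return counts["transition"] / counts["transversion"]


toy_calls = [("A", "G"), ("C", "T"), ("A", "T")]   # toy data, not real calls
print(ts_tv_ratio(toy_calls))                      # 2.0
```

In practice, a ts/tv ratio far below the expected range is often taken as a hint that a call set contains many false positives.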
SNPs and haplotype
Haplotypes are ‘blocks’ of associated SNPs
Structural variations
Traditionally defined as deletions, insertions or inversions > 1 kb
Often involves repetitive regions of the genome and complex rearrangements
No optimal method for SV discovery (before NGS)
Underlying hypothesis for GWAS
Common disease, common variants
Common variants present in more than 1-5% of the population contribute to common disease
GWAS generally do not capture rare variants
Successful GWAS stories
Significant associations reported through March 2010 (Manolio, N Engl J Med, 2010)
~800 SNPs, 545 studies, 150 diseases/traits
GWAS limitations: lack of functional information
Disease/trait-associated SNPs are not necessarily causative variants
Statistical power
Reduce false positives and improve reproducibility of results
Missing heritability
Median odds ratio per copy of the risk allele: 1.33
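
To give a feel for how modest an odds ratio of 1.33 is, the sketch below computes an allelic odds ratio from a 2x2 table of allele counts in cases and controls. The function name `allelic_odds_ratio` and all counts are hypothetical and chosen only so that the result lands near 1.33; they do not come from any study.

```python
def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds ratio for carrying the risk allele, from a 2x2 table of allele counts."""
    if min(case_risk, case_other, control_risk, control_other) <= 0:
        raise ValueError("all four counts must be positive")
    # (odds of the risk allele in cases) / (odds of the risk allele in controls)
    return (case_risk * control_other) / (case_other * control_risk)


# Hypothetical counts: risk allele frequency 40.0% in cases, 33.4% in controls.
odds_ratio = allelic_odds_ratio(400, 600, 334, 666)
print(round(odds_ratio, 2))   # 1.33
```

With these made-up numbers, a difference of under 7 percentage points in allele frequency between cases and controls already yields an odds ratio of about 1.33, which illustrates why individual GWAS hits explain little heritability.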
NGS breakthrough in genetics of complex disease
Whole genome sequencing following GWAS (Holm et al. Nat Gen 2011): sick sinus syndrome
Exome sequencing (Ng et al. Nat Gen 2011): Miller syndrome
Pooled sequencing (Calvo et al. Nat Gen 2011): human complex I disorder
