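Purpose: computes overall statistics for a FASTQ file and draws a random subsample of its reads.
Reported results: read count; base count; number of reads and bases in the random sample; mean, standard deviation (std), minimum and maximum of read length; mean, std, minimum and maximum of base quality; quality encoding type; overall proportions of A, T, C, G and N; mean quality at each position; and the proportions of A, T, C, G and N at each position.
Routine QC plots: the per-position base proportions can be plotted as a line chart.
Limitation: it does not report Q10, Q20, Q30 or Q40 statistics.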
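Because the tool itself does not report Q20/Q30, one way to fill the gap is a small awk pass over the FASTQ. The sketch below is not part of fastqStatsAndSubsample; the function name q20_q30 is hypothetical, and it assumes 4-line FASTQ records with Phred+33 (sanger) quality encoding.

```shell
# Print the fraction of bases with Phred quality >= 20 and >= 30.
# Assumptions: 4-line FASTQ records, Phred+33 (sanger) quality encoding.
q20_q30() {
    awk '
    BEGIN {
        # Build a character -> ASCII code lookup for printable characters.
        for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
    }
    NR % 4 == 0 {
        # Every fourth line is the quality string of a record.
        n = length($0)
        for (i = 1; i <= n; i++) {
            q = ord[substr($0, i, 1)] - 33
            total++
            if (q >= 20) q20++
            if (q >= 30) q30++
        }
    }
    END {
        if (total == 0) { print "no bases found"; exit 1 }
        printf "bases\t%d\nQ20\t%.4f\nQ30\t%.4f\n", total, q20 / total, q30 / total
    }' "$@"
}
```

For a compressed input, decompress on the fly, for example: gzip -dc ./SRR12495872_R1.fastq.gz | q20_q30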
Installation
Precompiled binary
wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/fastqStatsAndSubsample -O ./fastqStatsAndSubsample
chmod +x fastqStatsAndSubsample
Usage examples
Compute statistics and subsample
./fastqStatsAndSubsample ./SRR12495872_R1.fastq.gz SRR12495872_R1.fastq.stats out.fastq
Compute statistics only
./fastqStatsAndSubsample ./SRR12495872_R1.fastq.gz SRR12495872_R1.fastq.stats /dev/null -sampleSize=0
Subsample only
./fastqStatsAndSubsample ./SRR12495872_R1.fastq.gz /dev/null out.fastq -sampleSize=100000 -seed=0 -smallOk
Output description
The subsampled FASTQ is not covered here.
The statistics file SRR12495872_R1.fastq.stats looks like this:
readCount 50821729
baseCount 5082172900
sampleCount 0
basesInSample 0
readSizeMean 100
readSizeStd 0
readSizeMin 100
readSizeMax 100
qualMean 35.1354
qualStd 4.54484
qualMin 0
qualMax 38
qualType sanger
qualZero 33
atRatio 0.480138
aRatio 0.244547
cRatio 0.258272
gRatio 0.261249
tRatio 0.235276
nRatio 0.000656494
posCount 100
qualPos 35.4974,35.1461,35.3933,35.4193,35.316,35.2175,35.3414,35.3884,35.3579,34.986,35.4138,35.3698,35.3432,35.3768,35.4025,35.3268,35.3399,35.311,35.3328,35.3289,35.3423,35.3589,35.2835,35.3695,35.2807,35.3289,35.1453,35.2607,35.3204,35.3035,35.3366,35.2983,35.0874,35.3182,35.2416,35.3461,35.3459,35.2278,35.2765,35.1016,35.3196,35.3376,35.254,35.2512,35.3065,35.2289,35.2178,35.1355,35.0732,35.073,35.1848,35.2288,35.1898,35.1463,35.1982,35.1285,35.1596,35.1605,35.0551,35.0179,35.0736,35.1559,35.1659,35.0121,35.0153,35.1173,35.1311,35.12,35.0524,35.0479,35.0677,35.0814,35.0918,35.0347,34.9949,35.1458,35.1236,34.9751,34.9741,34.9135,35.0164,34.9108,34.8027,34.8002,34.9373,34.8415,34.581,34.9733,34.8735,34.888,34.7643,34.8559,34.8722,34.9029,34.6963,34.8232,34.8793,34.742,34.6882,34.5478,
aAtPos 0.13281,0.186559,0.220777,0.273199,0.28819,0.331917,0.192625,0.229008,0.217459,0.331929,0.254028,0.220821,0.245343,0.248402,0.247138,0.243193,0.248404,0.253101,0.24568,0.251505,0.250563,0.246021,0.251884,0.25156,0.245253,0.2533,0.251119,0.24647,0.250663,0.250725,0.243986,0.251421,0.248719,0.244188,0.249781,0.249128,0.244515,0.249786,0.247222,0.242237,0.248423,0.249178,0.242485,0.248702,0.24863,0.243347,0.249039,0.246634,0.241177,0.246753,0.246521,0.241486,0.248243,0.247614,0.241603,0.24748,0.246434,0.241299,0.246628,0.246968,0.241017,0.246185,0.244467,0.239992,0.245016,0.245479,0.239743,0.244439,0.245193,0.239619,0.24622,0.24523,0.240471,0.244428,0.243326,0.240225,0.242786,0.244985,0.238887,0.245635,0.243936,0.240062,0.244657,0.243521,0.238854,0.244147,0.24339,0.239019,0.245004,0.244758,0.238022,0.24376,0.243973,0.23936,0.243513,0.244155,0.238228,0.242724,0.242533,0.238429,
cAtPos 0.377866,0.222265,0.347485,0.249054,0.213259,0.213088,0.236759,0.280406,0.265232,0.2024,0.271404,0.277954,0.247559,0.249691,0.257943,0.257043,0.252333,0.253302,0.25637,0.249411,0.255635,0.259531,0.252262,0.256691,0.260232,0.250832,0.255154,0.259318,0.253819,0.258041,0.261697,0.253976,0.257975,0.261697,0.253386,0.258603,0.262567,0.253385,0.261161,0.260433,0.253378,0.259395,0.262477,0.255074,0.258569,0.260444,0.254387,0.25906,0.260468,0.256164,0.259006,0.261099,0.255177,0.258021,0.262061,0.254938,0.260426,0.261468,0.255034,0.258772,0.262152,0.254136,0.258027,0.26071,0.256739,0.259583,0.262487,0.256214,0.259784,0.26213,0.255073,0.259187,0.261963,0.255003,0.260059,0.261673,0.255412,0.258875,0.262235,0.254022,0.257529,0.260284,0.255476,0.258737,0.261333,0.253182,0.256638,0.26247,0.255204,0.257763,0.260316,0.256143,0.258791,0.260398,0.255424,0.25573,0.261752,0.255288,0.256227,0.260407,
gAtPos 0.416133,0.314979,0.233949,0.298618,0.292352,0.227698,0.213388,0.225708,0.231686,0.246319,0.283614,0.254625,0.249291,0.252021,0.252307,0.262263,0.261516,0.252783,0.264729,0.262718,0.252599,0.262364,0.259634,0.25439,0.260479,0.260966,0.25331,0.263079,0.26252,0.252568,0.262108,0.26115,0.251707,0.262516,0.261485,0.254005,0.262794,0.262375,0.253658,0.261654,0.262616,0.253426,0.263096,0.261641,0.253456,0.263781,0.261373,0.254265,0.260864,0.259481,0.25441,0.264416,0.262886,0.256189,0.262564,0.262664,0.25503,0.263135,0.262179,0.255805,0.263789,0.26295,0.256215,0.263102,0.261148,0.256222,0.262248,0.262816,0.255624,0.26343,0.263162,0.256645,0.2634,0.264364,0.256026,0.264167,0.264207,0.25651,0.262403,0.262224,0.257549,0.264861,0.264253,0.257821,0.264354,0.262461,0.254854,0.263931,0.263296,0.257041,0.263676,0.263796,0.257404,0.266807,0.265043,0.259135,0.264878,0.265008,0.258417,0.264295,
tAtPos 0.0729129,0.27619,0.19748,0.17912,0.206198,0.227229,0.357227,0.264874,0.28561,0.215163,0.190941,0.245672,0.257713,0.249656,0.242611,0.23742,0.237747,0.240632,0.233215,0.236363,0.241197,0.232084,0.234756,0.237359,0.231709,0.2349,0.237278,0.230096,0.232938,0.238649,0.232163,0.23345,0.236505,0.231596,0.234099,0.238259,0.230117,0.233026,0.237957,0.231515,0.235582,0.238,0.231712,0.234582,0.239344,0.232425,0.235198,0.240036,0.234652,0.234627,0.240032,0.232992,0.233677,0.237233,0.233102,0.234876,0.238096,0.234037,0.23473,0.238421,0.231697,0.235883,0.240377,0.231943,0.234274,0.238646,0.235512,0.236513,0.239328,0.234782,0.23554,0.238837,0.234099,0.236182,0.240456,0.233924,0.237588,0.239592,0.235505,0.236034,0.240922,0.234087,0.235497,0.239882,0.235433,0.238759,0.237692,0.234575,0.236486,0.24043,0.23503,0.236288,0.239802,0.233424,0.235932,0.240909,0.235129,0.236956,0.242732,0.233975,
nAtPos 0.000278483,6.55232e-06,0.000308726,7.77227e-06,7.67388e-07,6.83566e-05,9.64155e-07,2.99085e-06,1.26127e-05,0.00418892,1.36359e-05,0.000928835,9.38772e-05,0.000230826,1.96766e-08,7.971e-05,7.87065e-08,0.000181261,6.00137e-06,3.52212e-06,6.82779e-06,5.50945e-07,0.00146449,2.75473e-07,0.00232747,1.8496e-06,0.00313897,0.00103631,6.00334e-05,1.62529e-05,4.7342e-05,3.69921e-06,0.00509404,2.75473e-06,0.00124889,5.11592e-06,6.59167e-06,0.00142789,1.67251e-06,0.0041609,1.67251e-06,9.05125e-07,0.00023059,1.31833e-06,7.08358e-07,4.26983e-06,1.98734e-06,5.52913e-06,0.00283865,0.0029744,3.16203e-05,6.49329e-06,1.72761e-05,0.000942609,0.000669497,4.24622e-05,1.46001e-05,6.21781e-05,0.00142894,3.42767e-05,0.00134486,0.000845013,0.000914038,0.00425283,0.00282377,7.01865e-05,9.07092e-06,1.83386e-05,7.07178e-05,3.82317e-05,4.40756e-06,0.000100941,6.63496e-05,2.3061e-05,0.000133152,1.1196e-05,7.53615e-06,3.71888e-05,0.00096988,0.0020846,6.37129e-05,0.000707198,0.000116958,3.91762e-05,2.57961e-05,0.00145021,0.0074269,5.50945e-06,1.01728e-05,8.61836e-06,0.00295545,1.32424e-05,3.01249e-05,1.22585e-05,8.92335e-05,7.119e-05,1.33211e-05,2.373e-05,9.11421e-05,0.00289429,
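The per-position fields are comma-separated lists on a single line, which is awkward to plot directly. A minimal sketch of reshaping them into a tab-separated table is shown here, assuming the text (non-JSON) stats format shown above; the function names stats_to_table and gc_content are hypothetical helpers, not part of the tool. gc_content reports GC content among called bases, that is (C+G)/(A+C+G+T), excluding N.

```shell
# Reshape the per-position fields of a text-format stats file into a TSV
# table with one row per read position, ready for plotting.
stats_to_table() {
    awk '
    $1 ~ /^(qualPos|aAtPos|cAtPos|gAtPos|tAtPos|nAtPos)$/ {
        n = split($2, v, ",")
        # Each list ends with a trailing comma, so drop the empty last item.
        if (v[n] == "") n--
        for (i = 1; i <= n; i++) val[$1, i] = v[i]
        if (n > npos) npos = n
    }
    END {
        print "pos\tqual\tA\tC\tG\tT\tN"
        for (i = 1; i <= npos; i++)
            print i "\t" val["qualPos", i] "\t" val["aAtPos", i] "\t" \
                  val["cAtPos", i] "\t" val["gAtPos", i] "\t" \
                  val["tAtPos", i] "\t" val["nAtPos", i]
    }' "$1"
}

# GC content among called bases: (C + G) / (A + C + G + T).
gc_content() {
    awk '
    $1 ~ /^[acgt]Ratio$/ { r[$1] = $2 }
    END {
        acgt = r["aRatio"] + r["cRatio"] + r["gRatio"] + r["tRatio"]
        if (acgt == 0) { print "no base ratios found"; exit 1 }
        printf "%.6f\n", (r["cRatio"] + r["gRatio"]) / acgt
    }' "$1"
}
```

For example, stats_to_table SRR12495872_R1.fastq.stats > per_position.tsv produces a file that most plotting tools can read directly.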
Help
- Basic syntax
fastqStatsAndSubsample <in.fastq> <out.stats> <out.fastq>
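<in.fastq>: the input FASTQ, which may be compressed with gzip or bzip2.
<out.stats>: statistics for the whole input FASTQ. Plain text by default; add -json to get a JSON file instead.
<out.fastq>: the subsampled FASTQ, 100000 reads by default. This file is written uncompressed; giving it a .gz suffix does not produce a compressed file, so it must be compressed separately.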
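- -sampleSize=N: number of reads to sample; the default is 100000 (100K).
- -seed=N: seed used for random sampling. For paired-end data, running both files with the same seed keeps the sampled reads paired.
- -smallOk: if the input FASTQ contains fewer reads than the requested sample size, write the entire input FASTQ instead of failing.
- -json: write <out.stats> in JSON format.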
Use /dev/null as a placeholder for any output file you do not need.
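The options above can be combined for paired-end data. The following is a sketch under stated assumptions rather than a recommended pipeline: the function name subsample_pair and the FASTQ_STATS_BIN variable are hypothetical, and it assumes both mate files list the same reads in the same order, so that an identical seed selects matching reads. Statistics are discarded via /dev/null and each sample is gzip-compressed afterwards, since the tool writes it uncompressed.

```shell
# Path to the binary; set FASTQ_STATS_BIN if it lives elsewhere.
FASTQ_STATS_BIN=${FASTQ_STATS_BIN:-./fastqStatsAndSubsample}

# Usage: subsample_pair R1.fastq.gz R2.fastq.gz N SEED OUT_PREFIX
# Writes OUT_PREFIX_R1.fastq.gz and OUT_PREFIX_R2.fastq.gz.
subsample_pair() {
    r1=$1; r2=$2; n=$3; seed=$4; prefix=$5
    for mate in 1 2; do
        if [ "$mate" = 1 ]; then input=$r1; else input=$r2; fi
        # Same sample size and seed for both mates; stats go to /dev/null.
        "$FASTQ_STATS_BIN" "$input" /dev/null "${prefix}_R${mate}.fastq" \
            -sampleSize="$n" -seed="$seed" -smallOk || return 1
        # The sample is written as plain text, so compress it as a separate step.
        gzip -f "${prefix}_R${mate}.fastq" || return 1
    done
}
```

A call might look like subsample_pair SRR12495872_R1.fastq.gz SRR12495872_R2.fastq.gz 100000 0 sub, where the R2 file name is assumed for illustration.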
$ ./fastqStatsAndSubsample
fastqStatsAndSubsample v2 - Go through a fastq file doing sanity checks and collecting stats
and also producing a smaller fastq out of a sample of the data. The fastq input may be
compressed with gzip or bzip2.
Paired-end samples: run on both files, the seed is fixed so it will chose the paired reads
usage:
fastqStatsAndSubsample in.fastq out.stats out.fastq
options:
-sampleSize=N - default 100000
-seed=N - Use given seed for random number generator. Default 0.
-smallOk - Not an error if less than sampleSize reads. out.fastq will be entire in.fastq
-json - out.stats will be in json rather than text format
Use /dev/null for out.fastq and/or out.stats if not interested in these outputs