Keep Learning

Learning Spark, CarbonData, Alluxio, and more, and a contributor to them. GitHub: https://github.com/xubo245. Feel free to get in touch on WeChat: 601450868!

[Original] Genomic Data Processing 80: DataProcessing for disease

1. Code: /** * @author xubo * more code: https://github.com/xubo245/SparkLearning * more blog: http://blog.csdn.net/xubo245 */ package org.gcdss.cli.disease import java.text.SimpleDate

2017-12-26 00:00:36 396

[Original] Genomic Data Processing 79: Linking from VCF to OMIM

1. Data: // var vcfFile = "file/callDisease/input/small.vcf" // var dbSnp2omimFile = "file/callDisease/input/omimFilter9Text.txt" // var omimFile = "file/callDisease/input/genemap.txt" Modified small data:

2017-12-26 00:00:09 1591

[Original] Genomic Data Processing 78: Reading a VCF with Different Methods Gives Different Results

1. Methods 1 and 2: val path2 = "hdfs://219.219.220.149:9000/xubo/callVariant/vcf/smallAnno2Adam.vcf" val anno2adam = sc.loadParquetVariantAnnotations(path2) println("anno2adam:") anno2adam.foreach(println) val ann

2017-12-25 23:58:46 1316

[Original] Genomic Data Processing 77: Extracting One Chromosome's Data from a VCF File

1. Code: /** * @author xubo */ package org.gcdss.cli.vcf import org.apache.spark.{SparkConf, SparkContext} /** * Created by xubo on 2016/5/23. */ object extractGRCH38chr20vcf { def main(args: Array

2017-12-25 23:58:02 6632
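The post's own Spark/Scala code is truncated in this listing, so the following is only a minimal Python sketch of the general idea: keep the VCF header lines and the data lines whose first (CHROM) column matches the wanted chromosome. The function name `extract_chromosome` and the sample records are hypothetical and are not taken from the post.

```python
def extract_chromosome(vcf_lines, chrom):
    """Keep header lines plus data lines whose CHROM column equals chrom.

    Assumes tab-separated VCF text lines; this is an illustration,
    not the author's Spark implementation.
    """
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            # Meta-information (##...) and the column header (#CHROM...) are kept.
            kept.append(line)
        elif line.split("\t", 1)[0] == chrom:
            kept.append(line)
    return kept


# Hypothetical sample records, for illustration only.
sample_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t14397\t.\tCTGT\tC\t139\tPASS\t.",
    "20\t61795\t.\tG\tT\t50\tPASS\t.",
]
chr20_lines = extract_chromosome(sample_vcf, "20")
```

Comparing the whole first column, rather than testing a string prefix, avoids matching chromosome "2" against "20".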

[Original] Genomic Data Processing 76: Reading FASTA from HDFS and Counting Records

Reading FASTA-format data: First run: hadoop@Master:~/xubo/project/load/loadfastqFromHDFSfastaAndCount$ ./load.sh start:1 run time:25101 ms *************end************* hadoop@Master:~/xubo/project/load/loadfastqFromHD

2017-12-25 23:57:22 1041
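The post reads its data from HDFS with Spark, and that code is not shown in this listing. As a rough local illustration only, counting FASTA records amounts to counting header lines, which start with `>`. The name `count_fasta_records` and the sample lines below are hypothetical.

```python
def count_fasta_records(lines):
    """Count FASTA records by counting header lines (those starting with '>')."""
    return sum(1 for line in lines if line.startswith(">"))


# Hypothetical sample data, for illustration only.
sample_fasta = [
    ">seq1 example",
    "ACGTACGT",
    "ACGT",
    ">seq2 example",
    "GGGGCCCC",
]
record_count = count_fasta_records(sample_fasta)
```

Because the function accepts any iterable of lines, an open file handle can be passed directly without loading the whole file into memory.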

[Original] Genomic Data Processing 75: Reading a VCF File from HDFS and Saving It as an ADAM Parquet File (Success)

1. Reference: package org.bdgenomics.adam.cli class FlattenSuite extends ADAMFunSuite { val loader = Thread.currentThread().getContextClassLoader val inputPath = loader.getResource("small.vcf").getPath val outp

2017-12-25 23:56:16 399

[Original] Genomic Data Processing 74: Reading a VCF File from HDFS and Saving It as an ADAM Parquet File (Problematic)

1. small.vcf: no record 2. Read: 5 load time:3287 ms {"variant": {"variantErrorProbability": 139, "contig": {"contigName": "1", "contigLength": null, "contigMD5": null, "referenceURL": null, "assembly": null, "sp

2017-12-25 23:55:42 1107

[Original] Genomic Data Processing 73: Reading a FASTA File from HDFS and Saving It as an ADAM Parquet File

1. GRCH38chr14: hadoop@Master:~/xubo/project/load$ ./load.sh start:1 SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J

2017-12-25 23:54:33 757

[Original] Genomic Data Processing 72: GATK Installed Successfully

1. Download: git clone https://github.com/broadgsa/gatk-protected.git 2. Install: git checkout 3.5 mvn clean package -DskipTests 3. Installation succeeded: [INFO] Reactor Summary: [INFO] [INFO] GATK Root ...............................

2017-12-20 00:45:07 7540

[Original] Genomic Data Processing 71: Extracting chr14 from GRCH38

1. Get the start and end line numbers: cat GCA_000001405.15_GRCh38_full_analysis_set.fna |grep -i -n '>' 2. Extract chr14: cat GCA_000001405.15_GRCh38_full_analysis_set.fna |head -32835035|tail -1529197 >GRCH38ch14.fasta 3. Tidy up: hadoop@Mc

2017-12-20 00:44:33 2316
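The two shell steps in this entry (list header line numbers with `grep -n '>'`, then cut a line range with `head -N | tail -M`) can be mirrored in a few lines of Python. This is a sketch under stated assumptions, not the author's procedure: the helper names `header_line_numbers` and `head_tail` and the toy FASTA lines are hypothetical.

```python
from collections import deque
from itertools import islice


def header_line_numbers(lines):
    """Mirror `grep -n '>'`: 1-based line numbers of lines containing '>'."""
    return [(number, line.rstrip("\n"))
            for number, line in enumerate(lines, start=1)
            if ">" in line]


def head_tail(lines, head_n, tail_m):
    """Mirror `head -head_n | tail -tail_m`.

    Takes the first head_n lines, then keeps the last tail_m of those.
    Only tail_m lines are held in memory at any time.
    """
    return list(deque(islice(lines, head_n), maxlen=tail_m))


# Hypothetical toy file: three records, the middle one spanning lines 4..6.
toy_fasta = [
    ">chr13", "AAAA", "CCCC",
    ">chr14", "GGGG", "TTTT",
    ">chr15", "ACGT",
]
headers = header_line_numbers(toy_fasta)
# chr14 starts at line 4 and chr15 at line 7, so chr14 ends at line 6
# and spans 6 - 4 + 1 = 3 lines.
chr14_lines = head_tail(toy_fasta, 6, 3)
```

By the same arithmetic, `head -32835035 | tail -1529197` in the entry above selects lines 31305839 through 32835035 of the input file (32835035 - 1529197 + 1 = 31305839).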

[Original] Genomic Data Processing 70: Picard Installation Did Not Succeed

1. Download: https://github.com/broadinstitute/picard.git 2. Install: hadoop@Master:~/xubo/tools/picard$ ant clone-htsjdk Buildfile: /home/hadoop/xubo/tools/picard/build.xml clone-htsjdk: [exec] Cloning into 'hts

2017-12-20 00:44:12 1478

[Original] Genomic Data Processing 69: Installing and Using bowtie

1. Download: hadoop@Master:~/xubo/tools$ git clone https://github.com/BenLangmead/bowtie2.git Cloning into 'bowtie2'... remote: Counting objects: 7503, done. remote: Total 7503 (delta 0), reused 0 (delta 0),

2017-12-20 00:43:48 4057

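[Original] Genomic Data Processing 68: avocado's Configuration File Cannot Be Read from HDFS by Default

(1) Setting the configuration file to an HDFS path causes a problem: val configFile = "hdfs://219.219.220.149:9000/xubo/avocado/avocado-sample-configs/basic.properties" Error: hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem/BWAMEMSparkAll/test$ .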

2017-12-20 00:43:23 804

[Original] Genomic Data Processing 67: bwa Index Build Time

Two runs, chromosome 1 of GRCH38: hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem/bwaindex$ bwa index GRCH38chr1L3556522.fasta [bwa_index] Pack FASTA... 2.50 sec [bwa_index] Construct BWT for the packed sequen

2017-12-20 00:42:58 4186 2

[Original] Genomic Data Processing 66: Running avocado on a Cluster

1. Biggest problem: it keeps reporting errors: hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem/BWAMEMSparkAll$ ./GcdssCallVariant2.sh start:fqFile:hdfs://219.219.220.149:9000/xubo/avocado/NA12878_snp_A2G_chr20_225058.sam

2017-12-20 00:42:36 786

[Original] Genomic Data Processing 65: Log of bwa Processing 500bp and 1000bp Reads

xubo@xubo:~/xubo/data/alignment/cs-bwamem$ bwa aln bwaindex/GRCH38chr1L3556522.fasta g38l500N10000.fq >g38l500N10000.bwa.sai [bwa_aln] 17bp reads: max_diff = 2 [bwa_aln] 38bp reads: max_diff = 3 [bwa_a

2017-12-20 00:41:52 842

[Original] Genomic Data Processing 64: Log of bwamem Processing 500bp and 1000bp Reads

xubo@xubo:~/xubo/data/alignment/cs-bwamem$ bwa mem bwaindex/GRCH38chr1L3556522.fasta g38l500N10000.fq >g38l500N10000.bwamem.sam [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::process] read 10000 se

2017-12-20 00:40:20 1559

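[Original] Genomic Data Processing 63: snap Processing Reads Longer Than 400bp After Changing the Default Setting

After changing 400 to 4000 in Read.h, snap runs, but the alignment hit rate is very low. bwamem, by contrast, does very well; the next post has the log. xubo@xubo:~/xubo/data/alignment/cs-bwamem$ snap-aligner single snapindex/ g38l500N10000.fq -o g38l500N10000.snap1.sam Welcome to SNAP version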

2017-12-20 00:39:06 808

[Original] Genomic Data Processing 62: snap Cannot Process Reads Longer Than 400bp by Default

With both 500bp and 1000bp reads, snap fails to process the input: xubo@xubo:~/xubo/data/alignment/cs-bwamem$ snap-aligner single snapindex/ g38l500N10000.fq -o g38l500N10000.snap1.sam Welcome to SNAP version 1.0beta.23. Loading index from

2017-12-18 23:52:35 522

[Original] Genomic Data Processing 61: Running cs-bwamem in IDEA on Single-End Data (One 100bp Read)

Code: package cs.ucla.edu.bwaspark import java.text.SimpleDateFormat import java.util.Date import cs.ucla.edu.bwaspark.FastMap._ import cs.ucla.edu.bwaspark.commandline.{BWAMEMCommand,

2017-12-18 23:52:13 595

[Original] Genomic Data Processing 60: Running bwa on Single-End Data (10 Million 100bp Reads)

First run: hadoop@Master:~/cloud/adam/xubo/data/cs-bwamem$ bwa aln GRCH38BWAindex/GRCH38chr1L3556522.fasta g38L100c10000000Nhs20.fq > bwa/g38L100c10000000Nhs20.bwase1.sai [bwa_aln] 17bp reads: max_diff

2017-12-18 23:51:59 694

[Original] Genomic Data Processing 59: Running snap on Single-End Data (10 Million 100bp Reads)

Log: hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ snap-aligner single snap/snapindex g38L100c10000000Nhs20.fq -o snap/g38L100c10000000Nhs20.snap.sam Welcome to SNAP version 1.0beta.

2017-12-18 23:50:20 682

[Original] Genomic Data Processing 58: Running snap on Paired-End Data (10 Million Pairs of 100bp Reads)

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ snap-aligner index GRCH38BWAindex/GRCH38chr1L3556522.fasta snapindex Welcome to SNAP version 1.0beta.23. Hash table slack 0.300000 L

2017-12-18 23:48:13 936

[Original] Genomic Data Processing 57: Running BWA-MEM on Single-End Data (10 Million 100bp Reads)

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ bwa mem GRCH38BWAindex/GRCH38chr1L3556522.fasta g38L100c10000000Nhs20.fq > g38L100c10000000Nhs20.bwamem.sam [M::bwa_idx_load_from_disk] rea

2017-12-18 23:47:44 1067

[Original] Genomic Data Processing 56: Running bwa on Paired-End Data (10 Million 100bp Reads)

(1) pair1.fq to sai: bwa aln GRCH38BWAindex/GRCH38chr1L3556522.fasta g38L100c10000000Nhs20Paired1.fq >g38L100c10000000Nhs20Paired1.sai pair1 log: hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem

2017-12-18 01:17:02 1537

[Original] Genomic Data Processing 55: cs-bwamem Installation Log (IDEA Maven, Not via pl)

The pom under the project: D:\1win7\java\jdk\bin\java "-Dmaven.home=D:\1win7\idea\IntelliJ IDEA Community Edition 15.0.4\plugins\maven\lib\maven3" "-Dclassworlds.conf=D:\1win7\idea\IntelliJ IDEA Community Editio

2017-12-18 01:15:19 548

[Original] Genomic Data Processing 54: Running bwa-mem on Paired-End Data (10 Million 100bp Reads)

Command: hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ bwa mem GRCH38BWAindex/GRCH38chr1L3556522.fasta g38L100c10000000Nhs20Paired1.fq g38L100c10000000Nhs20Paired2.fq >g38L100c10000000Nhs20Pai

2017-12-18 01:14:46 2017 1

[Original] Genomic Data Processing 53: Running the Cluster Version of cs-bwamem on Paired-End Data (10 Million 100bp Reads)

art: art_illumina -ss HS20 -i GRCH38BWAindex/GRCH38chr1L3556522.fna -p -l 100 -m 200 -s 10 -c 10000000 -o g38L100c10000000Nhs20Paired2.fastq Upload to HDFS; spark-submit --class cs.ucla.edu.bwaspark.B

2017-12-18 01:13:54 779

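[Original] JNI Learning 1: A Roundup of Resources

More code: https://github.com/xubo245/JNILearning 1. Books: Core Java, Volume II, Chapter 12, Native Methods 2. Baidu Cloud: JNI in Plain Terms, Lecture 1 (JNI overview, writing your first JNI program); Itcast (传智播客) Java training course, JNI lectures 1-7 3. Open-source project: https://github.com/xubo245/HelloWorldJNIwithRegisterNatives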

2017-04-11 19:46:25 939

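[Original] Spark Problem 14: Spark Stage Retry

More code: https://github.com/xubo245 Genomic data processing series: SparkBWA 1. Explanation 1.1 Summary: the Lost executor problem that occurs when the number of partitions exceeds the number of nodes has been reported to SparkBWA, https://github.com/citiususc/SparkBWA/issues/35 In addition, temporary files in tmp are not deleted, and the stage retry is still unresolved 2. Log: full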

2017-03-06 10:48:37 5373

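[Original] Spark Problem 13: Total size of serialized results of 30 tasks (2.0 GB) is bigger than spark.driver.maxResul

More code: https://github.com/xubo245/SparkLearning Spark ecosystem: learning Alluxio. Versions: alluxio (tachyon) 0.7.1, spark-1.5.2, hadoop-2.6.0 1. Problem description: when cs-bwamem writes its output to a local SAM file, the file is too large and the job fails. The driver's default maxResultSize is too small, which raises an error 2. Run log: hadoop@M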

2017-03-06 10:47:43 8363

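[Original] Spark Problem 12: kryoserializer Shuffle Size Too Small, Causing Overflow

More code: https://github.com/xubo245/SparkLearning Spark ecosystem: learning Alluxio. Versions: alluxio (tachyon) 0.7.1, spark-1.5.2, hadoop-2.6.0 1. Problem description 1.1 A serialization shuffle overflow occurs when running cs-bwamem, mainly because the SAM output must be written locally and the file is fairly large. The default is: spark.kryoserial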

2017-03-06 10:47:30 5118 2

[Original] Spark Problem 11: Broadcast Failure

More code: https://github.com/xubo245/SparkLearning Spark ecosystem: learning Alluxio. Versions: alluxio (tachyon) 0.7.1, spark-1.5.2, hadoop-2.6.0 1. Problem description: "Error cleaning broadcast 7" appears at run time 2. Run log: 17/02/28 08:28:48 ERROR ContextCleaner: Er

2017-03-06 10:47:20 6151 1

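[Original] Spark Problem 10: Insufficient Node Disk Space During a Spark Run Causes Errors

More code: https://github.com/xubo245/SparkLearning Spark ecosystem: learning Alluxio. Versions: alluxio (tachyon) 0.7.1, spark-1.5.2, hadoop-2.6.0 1. Problem description 1.1 Summary: a script was written to run multiple applications, and after a dozen or so had run, an error was reported. org.apache.spark.SparkException: Job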

2017-03-06 10:47:08 2911

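[Original] Spark Problem 9: Solving the Problem of Spark Calling C Through JNI

More code: https://github.com/xubo245/SparkLearning Spark ecosystem: learning Alluxio. Versions: alluxio (tachyon) 0.7.1, spark-1.5.2, hadoop-2.6.0 1. Problem description 1.1 Description: when Scala calls C through JNI and the job is submitted with spark-submit, an error occurs: no JNIparasail in java.library.pat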

2017-03-06 10:46:51 3359 6

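[Original] Spark Problem 8: Worker Lost

More code: https://github.com/xubo245/SparkLearning Spark ecosystem: learning Alluxio. Versions: alluxio (tachyon) 0.7.1, spark-1.5.2, hadoop-2.6.0 1. Problem description 1.1 First occurrence: seven of the eight nodes were dead and all their workers were lost. The cause is unknown and no other logs were found. [3] describes a similar problem; the guess is that growth of the history may be the cause hadoo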

2017-03-06 10:46:39 1551

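[Original] Spark Problem 7: How to Make Nodes Run a Specified Number of Cores

More code: https://github.com/xubo245/SparkLearning Spark ecosystem: learning Alluxio. Versions: alluxio (tachyon) 0.7.1, spark-1.5.2, hadoop-2.6.0 1. Problem description 1.1 With the code in 1 and 2, one node runs two cores 1.2 ## Code hadoop@Master:~/disk2/xubo/project/alignment/S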

2017-03-06 10:46:18 2234

[Original] Spark Problem 6: appport Uses 90% CPU After Spark Loses an Executor

More code: https://github.com/xubo245/SparkLearning Spark ecosystem: learning Alluxio. Versions: alluxio (tachyon) 0.7.1, spark-1.5.2, hadoop-2.6.0 1. Problem description 1.1 Spark lost executor hadoop@Master:~/disk2/xubo/project/alignment/SparkSW/Spark

2017-03-06 10:46:04 980

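[Original] Spark Problem 5: ERROR LiveListenerBus SparkListenerBus has already stopped

More code: https://github.com/xubo245/SparkLearning Spark ecosystem: learning Alluxio. Versions: alluxio (tachyon) 0.7.1, spark-1.5.2, hadoop-2.6.0 1. Problem description 1.1 Description: uniref is split by sequence length into [0,100), [100,)…; sparkSW then reads it from HDFS and from Alluxio separately, and the performance is analyzed. During the run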

2017-03-06 10:44:19 13601 4

[Original] Spark Problem 4: Executor Lost

More code: https://github.com/xubo245/SparkLearning Spark ecosystem: learning Alluxio. Versions: alluxio (tachyon) 0.7.1, spark-1.5.2, hadoop-2.6.0 1. Problem description 1.1 An executor on Mcnode1 was lost: ExecutorLostFailure (executor 2 lost) 1.2 http://Master

2017-03-06 10:44:09 1228

CarbonData Learning Resources

A collection of Apache CarbonData learning material, including videos, documents, and files.

2018-11-22

opencv 3.4.1 jar

opencv-341.jar, for invoking OpenCV. You can add it to your project.

2018-05-16

Advanced Shell Script Programming

Advanced Shell Script Programming

2016-03-15

2015 China Software Developer White Paper

2016-01-12

neo4j-javadocs-2.3.1-javadoc.jar

neo4j-javadocs-2.3.1-javadoc.jar neo4j 2.3.1 API

2015-11-26

neo4j-enterprise-2.3.1-unix.tar.gz

neo4j-enterprise-2.3.1-unix.tar.gz, downloaded from the official website

2015-11-25

neo4j-enterprise-2.3.0-M03-unix.tar.gz

neo4j-enterprise-2.3.0-M03-unix.tar.gz, downloaded from the official website

2015-11-25

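Champion's Defense Slides (PPT) from the Fund Inflow and Outflow Prediction Competition

Champion's defense slides (PPT) from the Fund Inflow and Outflow Prediction competition, Alibaba Cloud Tianchi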

2015-09-09

redis-3.0.4 Installation Package

redis-3.0.4.tar.gz, the redis-3.0.4 installation package, downloaded from the official website

2015-09-09

JDK.API.7_English.chm

JDK.API.7_English.chm Java™ Platform, Standard Edition 7 API Specification This document is the API specification for the Java™ Platform, Standard Edition.

2015-08-24

Java 2 SE 6 Documentation.chm

Java 2 SE 6 Documentation.chm Java™ SE 6 Platform at a Glance. This document covers the Java™ Platform, Standard Edition 6 JDK. Its product version number is 6 and developer version number is 1.6.0, as described in Platform Name and Version Numbers. For information on a feature of the JDK, click on a component in the diagram below.

2015-08-24

JavaSE中文API.chm (Java SE API, Chinese Edition)

JavaSE中文API.chm Java™ 2 Platform Standard Edition 5.0 API Specification. This document is the API specification for Java 2 Platform Standard Edition 5.0.

2015-08-24

JDK API 1.7, English Edition, with Index

java, JDK API 1.7 English edition with index, English, Index, Java™ Platform, Standard Edition 7 API Specification

2015-08-24

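Complete collection of written-test and interview questions from Microsoft, Google, Baidu, Tencent, and other major companies (.rar)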

2015-08-20

Collection of 10 Classic Books on Algorithms

2015-08-20

Eighty written-test and interview questions from Baidu, Rensou (人搜), Alibaba, Tencent, Huawei, Xiaomi, and Sogou (.pdf)

2015-08-20

Color Space Conversion in MATLAB

Color space conversion in MATLAB: RGB, HSV, YIQ, NTSC

2014-04-14

isrgb.m,matlab

isrgb.m matlab rgb function y = isrgb(x) %ISRGB Return true for RGB image. % FLAG = ISRGB(A) returns 1 if A is an RGB truecolor image and % 0 otherwise. % % ISRGB uses these criteria to determine if A is an RGB image: % % - If A is of class double, all values must be in the range % [0,1], and A must be M-by-N-by-3. % % - If A is of class uint8 or uint16, A must be M-by-N-by-3. % % Note that a four-dimensional array that contains multiple RGB % images returns 0, not 1. % % Class Support % ------------- % A can be of class uint8, uint16, or double. If A is of % class logical it is considered not to be RGB. % % See also ISBW, ISGRAY, ISIND. % Copyright 1993-2003 The MathWorks, Inc. % $Revision: 1.15.4.2 $ $Date: 2003/08/23 05:52:55 $ wid = sprintf('Images:%s:obsoleteFunction',mfilename); str1= sprintf('%s is obsolete and may be removed in the future.',mfilename); str2 = 'See product release notes for more information.'; warning(wid,'%s\n%s',str1,str2); y = size(x,3)==3; if y if isa(x, 'logical') y = false; elseif isa(x, 'double') % At first just test a small chunk to get a possible quick negative m = size(x,1); n = size(x,2); chunk = x(1:min(m,10),1:min(n,10),:); y = (min(chunk(:))>=0 && max(chunk(:))<=1); if y y = (min(x(:))>=0 && max(x(:))<=1); end end end

2014-03-27

C Header File Pack (include)

C header file pack (include): stdio.h, stdlib.h, and others

2013-10-18

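Numerical Methods Lab: the Gauss_Seidel and Runge_Kutta Methods

Numerical methods lab documentation. PB10210016 Xu Bo. Lab requirements: Program 15 on page 208 of the second edition; Program 20 on page 208 of the second edition, changing second order to fourth order and solving the second one. Lab environment: operating system: Windows 8 64-bit; compiler: Code::Blocks, version 10.05, 32-bit. Submission time: before the exam. Lab notes: Gauss_Seidel: the data file is on the left. For convenient repeated testing, the data in the txt file can be copied into the exe to run. See the figure above for the input format; the figure above shows one correct output. Runge_Kutta: the data file is on the left. For convenient repeated testing, the data in the txt file can be copied into the exe to run. See the figure above for the input format; the figure above shows one correct output. Attachments: Program 15: Gauss_Seidel code, runnable exe, input data file, and run screenshots. Program 20: Runge_Kutta code, runnable exe, input data file, and run screenshots. Reflections: this lab gave me a deeper understanding of the Gauss_Seidel and Runge_Kutta methods along with hands-on experience running them, and programming them clarified the data going in and out of each step of the methods. Overall I gained a lot. We should write more programs like these, and I hope they can be put on a web page where entering data produces the result. PB10210016 Xu Bo 2013.5.28. For the code, contact QQ: 601450868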

2013-10-17
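The lab's own programs are not included on this page, and the textbook programs it refers to are not reproduced here. As a rough, textbook-style illustration of the two methods it names, the following Python sketch shows a basic Gauss-Seidel iteration and a classical fourth-order Runge-Kutta step loop. The function names, the fixed iteration count, and the test problems are all assumptions made for this example.

```python
import math


def gauss_seidel(a, b, iterations=50):
    """Solve a x = b by Gauss-Seidel iteration, starting from the zero vector.

    Assumes a is square with nonzero diagonal entries; convergence is
    guaranteed for, e.g., strictly diagonally dominant matrices.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            # Use the newest available values of the other unknowns.
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / a[i][i]
    return x


def rk4(f, t0, y0, h, steps):
    """Integrate y' = f(t, y) from (t0, y0) with classical fourth-order Runge-Kutta."""
    t, y = t0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


# Hypothetical test problems: 4x + y = 1, 2x + 3y = 2 has solution (0.1, 0.6);
# y' = y with y(0) = 1 has y(1) = e.
solution = gauss_seidel([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0])
y_at_one = rk4(lambda t, y: y, 0.0, 1.0, 0.1, 10)
```

A production solver would stop on a residual or step-size tolerance rather than a fixed iteration count; the fixed count keeps the sketch short.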
