Keep Learning

Learning Spark, CarbonData, Alluxio, and more, and a contributor to these projects. GitHub: https://github.com/xubo245. Feel free to contact me on WeChat: 601450868!

[Original] Spark Problem 3: SparkException: Error notifying standalone scheduler's driver endpoint

More code: https://github.com/xubo245/SparkLearning. Series: Learning Alluxio in the Spark Ecosystem. Versions: alluxio (tachyon) 0.7.1, spark-1.5.2, hadoop-2.6.0. 1. Problem description. 1.1 Running alluxioHDFS.sh produced an error: hadoop@Master:~/disk2/xubo/project/alignment/SparkSW/

2017-03-06 10:43:56 1755

[Original] Spark Problem 2: hosts settings for downloading HDFS files on Windows

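More code: https://github.com/xubo245/SparkLearning. Series: Learning Alluxio in the Spark Ecosystem. Versions: alluxio (tachyon) 0.7.1, spark-1.5.2, hadoop-2.6.0. 1. Problem description: opening HDFS from Windows runs into a problem. HDFS is usually configured with hosts entries, so when you access or download HDFS files through a browser, the redirect points to the hostname rather than the IP, whereas W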

2017-03-06 10:43:10 970

[Original] Spark Problem 1: Reading a reference sequence in ADAM format fails with "empty max"

Cause: the wrong loading method was used:
// val rdd = sc.loadParquetContigFragments(args(0))
Solution:
val rdd = sc.loadSequence(args(0))
Run log: hadoop@Master:~/xubo/project/load/loadfastaFromHDFSAdamAndCount$ ./loadGRCH38chr14.sh start

2017-03-06 10:42:49 1890

[Original] Notes on a first attempt at Ray, the project under development at RISELab

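Ray is a project under development at RISELab. It has not been released yet (as of 20170208; I first heard about it on 20170124). I wanted to try it because I had heard that both its architecture and its performance are much improved. Spark, which I studied earlier, was developed by AMPLab, RISELab's predecessor. sudo apt-get update failed with: Err http://archive.ubuntukylin.com:10006 trusty Release.gpg Unable to co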

2017-02-08 21:42:41 1642

[Original] Bookmarked: cluster deployment and configuration tools

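Recently, while chatting with engineers from Ant Financial, a question came to mind and I made a point of asking them. With thousands of servers, how do you deploy quickly, covering both the operating system and the software? Installing the OS on each machine one by one is certainly tedious and time-consuming. I had asked some friends before but never got a good answer. My senior labmate Wangxuan recommended two tools, Ansible and Puppet, so I am bookmarking them here. I only looked them up briefly. Ansible is a newer automation and operations tool written in Python that draws on many earlier operations tools (puppet, cfe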

2017-02-08 20:37:24 1204

[Original] Pitfalls I ran into using Alibaba Cloud E-MapReduce

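I needed to run experiments to verify the scalability of my system, but the lab did not have enough machines, so I used commercial servers. Choosing between AWS and Alibaba Cloud, I picked Alibaba Cloud. After finishing the experiments, I summarized the pitfalls I hit along the way. My experiments mainly concern a distributed sequence alignment system (DSA: Distributed Sequence Alignment System) and test the scalability of its algorithms. Since I was measuring performance, I chose dedicated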

2017-01-24 08:13:53 7462

[Original] Fixing an IDEA and Maven configuration problem on Ubuntu: -Dmaven.multiModuleProjectDirectory system property is not set

1 Problem description: when using IDEA 13.0 with maven 3.3.9 on Ubuntu, mvn package fails: /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Dmaven.home=/home/xubo/cloud/apache-maven-3.3.9 -Dclassworlds.conf=/home/xubo/cloud/apache-maven-3.3.9

2017-01-22 21:06:31 2265 2

[Original] Fundamentals 1: Hashing (Hash)

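1. Understanding. 1.1 Basic concepts: 哈希 is a phonetic transliteration of "hash", and the term is also translated as 散列 ("scattering"). A hash can be understood as a kind of mapping: T[hash(x)] = x, i.e. the hash function converts a value x into an index of the table T, and the value is written at that position. For example, with division hashing, 10 % 9 leaves remainder 1, so 10 is stored in T[1]. On average, hashing is faster than searching a linked list, and it uses less space than a direct-address array: if the set of keys is K and the key universe is U, space drops from O(|U|) to O(|K|). Hashing suits scenarios where the set of numbers, compared with the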
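The division-hashing example can be sketched in a few lines of Python. This is a minimal illustration, not the post's own code. It assumes collisions are resolved by chaining, because the excerpt does not say which collision strategy the post uses. The class name `DivisionHashTable` and the table size m = 9 are hypothetical choices made to match the 10 % 9 example.

```python
# A minimal sketch of a hash table using the division method, h(k) = k mod m,
# with collisions resolved by chaining (an assumption; the post may differ).
# DivisionHashTable and m = 9 are illustrative names/values, not from the post.

class DivisionHashTable:
    def __init__(self, m=9):
        # m slots; every integer key maps to an index in 0..m-1
        self.m = m
        self.slots = [[] for _ in range(m)]

    def _hash(self, key):
        # Division method: the remainder of key / m is the slot index
        return key % self.m

    def insert(self, key):
        bucket = self.slots[self._hash(key)]
        if key not in bucket:
            bucket.append(key)

    def contains(self, key):
        return key in self.slots[self._hash(key)]

    def delete(self, key):
        bucket = self.slots[self._hash(key)]
        if key in bucket:
            bucket.remove(key)


if __name__ == "__main__":
    table = DivisionHashTable(m=9)
    table.insert(10)            # 10 % 9 = 1, so 10 goes to slot 1
    print(table.slots[1])       # [10]
    table.insert(19)            # 19 % 9 = 1 as well: a collision, chained
    print(table.slots[1])       # [10, 19]
    print(table.contains(28))   # False
```

The table stores m buckets plus one entry per stored key, so its space is O(m + |K|). Choosing m proportional to |K| keeps it at O(|K|), independent of the size of the key universe U.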

2017-01-15 21:10:33 971

[Original] ADAM Learning 27: Fixing a serialization problem

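1. Problem
1.1 Description: after reading a FASTQ file, any transformation that follows, such as collect or reading record attributes, fails with a "not serializable" error.
1.2 Problem code:
package org.dsw.core

import org.apache.spark.{SparkContext, SparkConf}
import org.bdgenomics.adam.rdd.ADAMContext._

/**
 * Created by xu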

2016-12-16 17:29:55 1157

[Original] Learning Alluxio in the Spark Ecosystem 25 -- Comparing Spark read times from HDFS and Alluxio

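More code: https://github.com/xubo245/SparkLearning. Series: Learning Alluxio in the Spark Ecosystem. Versions: alluxio-1.3.0 (tachyon), spark-1.5.2, hadoop-2.6.0. 1. Explanation: I wanted to analyze how much Alluxio speeds things up and found that Alluxio shows a long-tail effect: some tasks take especially long, so it has no clear advantage over HDFS. 2. Code: #~/cloud/allux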

2016-12-15 18:42:05 935 3

[Original] Learning Alluxio in the Spark Ecosystem 24 -- Comparing line counts on data read from HDFS and from Alluxio

More code: https://github.com/xubo245/SparkLearning. Series: Learning Alluxio in the Spark Ecosystem. Versions: alluxio-1.3.0 (tachyon), spark-1.5.2, hadoop-2.6.0. 1. Explanation: measure whether Alluxio or HDFS is faster. 2. Code: 2.1 hdfs for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14

2016-12-15 18:41:37 1165

[Original] Learning Alluxio in the Spark Ecosystem 23 -- Solving the data locality problem with alluxio-0.7.1

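More code: https://github.com/xubo245/SparkLearning. Series: Learning Alluxio in the Spark Ecosystem. Versions: alluxio-1.3.0 (tachyon), spark-1.5.2, hadoop-2.6.0. 1. Explanation: spark-1.5.2 and alluxio-1.3.0 do not support data locality by default, so data is transferred across nodes. 2. Solutions: 2.1 Method 1: use alluxio-0.7.1. 2.2 Method 2: set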

2016-11-16 19:53:55 755

[Original] Learning Alluxio in the Spark Ecosystem 22 -- count after saveAsTextFile to Alluxio (buggy)

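More code: https://github.com/xubo245/SparkLearning. Series: Learning Alluxio in the Spark Ecosystem. Versions: alluxio-1.3.0 (tachyon), spark-1.5.2, hadoop-2.6.0. 0. The following analysis is mainly based on spark-1.5.2 and alluxio-1.3.0, which do not support data locality by default, so data is transferred across nodes. 1. Explanation: the data is first loaded into memory, and then count runs into a problem, D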

2016-11-16 19:52:56 894

[Original] Learning Alluxio in the Spark Ecosystem 21 -- Question: why can't the master see D9, while worker nodes hold part of D9's data?

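More code: https://github.com/xubo245/SparkLearning. Series: Learning Alluxio in the Spark Ecosystem. Versions: alluxio-1.3.0 (tachyon), spark-1.5.2, hadoop-2.6.0. 1. Explanation. 1.1 Question: why can't the master see D9, while the worker nodes hold part of D9's data? The master also shows more memory in use than D7 alone accounts for, since D7 is only about 2G; D9 can be freed. 2. Screenshots. 2.1 maste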

2016-11-16 19:52:39 730

[Original] Learning Alluxio in the Spark Ecosystem 20 -- Question: why does the final count take about 30 s for each of D7-D9?

2016-11-16 19:51:51 1564

[Original] Learning Alluxio in the Spark Ecosystem 19 -- Alluxio performance improvement: analysis and experiment 1

2016-11-16 19:51:24 1264

[Original] Learning Alluxio in the Spark Ecosystem 18 -- Problem with the Alluxio worker on port 30000

More code: https://github.com/xubo245/SparkLearning. Series: Learning Alluxio in the Spark Ecosystem. Versions: alluxio-1.3.0 (tachyon), spark-1.5.2, hadoop-2.6.0. 1. Explanation (unresolved). 1.1 Problem: accessing http://mcnode6:30000/home returns: Problem accessing /home. Reason: Server

2016-11-16 19:50:45 739

[Original] Learning Alluxio in the Spark Ecosystem 17 -- Space Usage grows over repeated runs

2016-11-16 19:50:30 1141

[Original] Learning Alluxio in the Spark Ecosystem 16 -- Changing the block size of Alluxio files

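More code: https://github.com/xubo245/SparkLearning. Series: Learning Alluxio in the Spark Ecosystem. Versions: alluxio-1.3.0 (tachyon), spark-1.5.2, hadoop-2.6.0. 1. alluxio-1.3.0. 1.0 Default size: the default block size in alluxio-1.3.0 is 512M. Each of my nodes has only a little over 10 GB of memory, and it also has to run Spark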

2016-11-16 19:50:02 2117

[Original] Learning Alluxio in the Spark Ecosystem 15 -- Alluxio performance analysis and ways to speed it up

2016-11-16 19:49:32 1180

[Original] Learning Alluxio in the Spark Ecosystem 14 -- How Alluxio loads files into memory and how they are distributed

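More code: https://github.com/xubo245/SparkLearning. Series: Learning Alluxio in the Spark Ecosystem. Versions: alluxio-1.3.0 (tachyon), spark-1.5.2, hadoop-2.6.0. With alluxio-0.7.1, after uploading files with the copyFromLocal command, I found that D1Line.fasta through D6Line.fasta were all on one node, Mcnode1;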

2016-11-16 19:49:14 5193 3

[Original] Learning Alluxio in the Spark Ecosystem 13 -- Questions about starting Alluxio-1.3.0

More code: https://github.com/xubo245/SparkLearning. Series: Learning Alluxio in the Spark Ecosystem. Versions: alluxio-1.3.0 (tachyon), spark-1.5.2, hadoop-2.6.0. 1. Explanation. 1.1 alluxio.env.sh configuration: # The directory where a worker stores in-memory data. (Defaul

2016-11-16 19:48:34 924

[Original] Learning Alluxio in the Spark Ecosystem 12 -- Configuring Spark to use alluxio-1.3.0

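More code: https://github.com/xubo245/SparkLearning. Series: Learning Alluxio in the Spark Ecosystem. Versions: alluxio (tachyon) 0.7.1, spark-1.5.2, hadoop-2.6.0. 1. Explanation: once alluxio-0.7.1 is started, the Spark cluster can use it directly, but alluxio-1.3.0 needs extra configuration. 2. Code: 2.1 Download http://www.alluxi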

2016-11-10 14:55:00 1017

[Original] Learning Alluxio in the Spark Ecosystem 11 -- alluxio-1.3.0 cluster configuration

More code: https://github.com/xubo245/SparkLearning. Series: Learning Alluxio in the Spark Ecosystem. Versions: alluxio (tachyon) 0.7.1, spark-1.5.2, hadoop-2.6.0. For running Alluxio with Spark, see the next post. 1. Explanation. 1.1 Download: wget http://alluxio.org/downloads/files/1.3.0/alluxi

2016-11-10 14:54:03 1468

[Original] Learning Ganglia 2: Building Spark with Ganglia support

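More code: https://github.com/xubo245/SparkLearning. Series: Spark source code reading. Environment: spark-1.5.2, hadoop-2.6.0, scala-2.10.4, ganglia-3.6.1. 1. Understanding: besides its built-in UI, Spark also supports Ganglia. Edit $SPARK_HOME/conf/metrics.properties (if it does not exist, copy metrics.propert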

2016-11-09 15:37:16 1368

[Original] Learning Alluxio in the Spark Ecosystem 10 -- Solving the problem of the cluster not starting completely

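More code: https://github.com/xubo245/SparkLearning. Series: Learning Alluxio in the Spark Ecosystem. Versions: alluxio (tachyon) 0.7.1, spark-1.5.2, hadoop-2.6.0. 1. Explanation. 1.1 Problem description: a problem left unsolved in an earlier post: http://blog.csdn.net/xubo245/article/details/51325834 Details: hadoop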

2016-11-06 20:52:28 1186

[Original] Learning Spark 2, Part 2: Building spark-2.0.0 on Windows

More code: https://github.com/xubo245/SparkLearning. Series: learning the ML component of Spark. 1. Explanation. (1) Basic setup: scala-2.11.8, java 1.7, maven 3.3.9. (2) Download: github. (3) Set the JVM parameters: export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCode

2016-07-30 17:16:30 3724 3

[Original] Learning Spark 2, Part 1: Problems setting up the basic environment (Windows)

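More code: https://github.com/xubo245/SparkLearning. Version: Spark-2.0.0. 1. Explanation: I downloaded the release version from [2], opened it in IDEA, and ran mvn package, which failed. Problems encountered: main: [INFO] ------------------------------------------------------------------------ [INFO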

2016-07-30 17:15:08 74822 1

[Original] IDEA problem: cannot create a new Maven project

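1. Problem: when creating a new Maven project, IDEA stays stuck at "maven loading archetype list", and the created project has no Maven lifecycle, among other things. 2. Analysis: I had previously changed Maven's VM options. 3. Solution: 3.1 Delete the cache: the maven folder under .IdeaIC2016\system in the user directory. 3.2 Remove the VM options: -Xmx2g -XX:MaxPermSize=512M -XX:ReservedC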

2016-07-23 17:41:48 6078

[Original] Genomic Data Processing 70: Picard installation failed

1. Download: https://github.com/broadinstitute/picard.git 2. Install: hadoop@Master:~/xubo/tools/picard$ ant clone-htsjdk Buildfile: /home/hadoop/xubo/tools/picard/build.xml clone-htsjdk: [exec] Cloning into 'hts

2016-06-07 20:37:26 4161

[Original] Genomic Data Processing 52: Running cs-bwamem on a cluster (10 million 100 bp reads)

1. Generate simulated reads with ART: art_illumina -ss HS20 -i GRCH38BWAindex/GRCH38chr1L3556522.fna -l 100 -c 10000000 -o g38L100c10000000Nhs20 2. Upload to HDFS, specifying the number of partitions: spark-submit --class cs.ucla.edu.bwaspark.BWAMEMSpark --ma

2016-06-03 16:29:02 1327

[Original] Genomic Data Processing 51: Running cs-bwamem on a cluster*

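Changing the master from local to the cluster is all it takes. Cluster run results. Problem: when aligning 50 reads, bwa and snap each produce 50 records, but cs-bwamem produces 492, including many repeated records for reads 25 and 50 with different alignment positions. I don't know why. Log: D:\1win7\java\jdk\bin\java -Didea.launcher.port=7538 "-Didea.launcher.bin.path=D:\1win7\ide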

2016-06-03 14:21:19 901

[Original] Genomic Data Processing 50: Comparing cs-bwamem, bwa, snap, and bwa-mem against ART

Straight to the results: hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c50Nhs20.aln ##ART_Illumina read_length 100 @CM art_illumina -ss HS20 -i GRCH38chr1L3556522.fna -l 100 -c 50 -o G38L100c50Nh

2016-06-03 13:58:48 6417

[Original] Genomic Data Processing 49: cloud-scale-bwamem runs successfully

1. First generate data with ART (see the previous post). 2. Upload the FASTQ to HDFS: hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ spark-submit --class cs.ucla.edu.bwaspark.BWAMEMSpark --master local[2] /home/hadoop/xubo/tools/cloud-scal

2016-06-03 12:52:10 1400

[Original] Genomic Data Processing 48: ART usage examples

For the relevant parameters, see the previous post. 1. Example 1: hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ art_illumina -ss HS20 -i GRCH38chr1L3556522.fna -l 100 -f 20 -o G38L100F20Nhs20 ====================ART===================
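ART's -f option sets fold coverage, while the cs-bwamem cluster run (post 52) uses -c to set a read count. The two are related through the standard definition of sequencing coverage, C = N·L/G, where N is the number of reads, L is the read length, and G is the reference length. The sketch below is back-of-the-envelope arithmetic under stated assumptions, not ART's internal logic. It assumes single-end reads and a single-sequence reference. It also takes G = 3,556,522 bp, which is only an inference from the file name GRCH38chr1L3556522.fna. ART's own rounding may differ.

```python
# Back-of-the-envelope link between ART's -f (fold coverage) and -c (read count)
# via the standard coverage definition C = N * L / G.
# Assumptions: single-end reads, one reference sequence, and a reference length
# inferred from the file name GRCH38chr1L3556522.fna (not verified).

def reads_for_coverage(coverage, genome_len, read_len):
    """Approximate number of reads N needed to reach fold coverage C."""
    return int(coverage * genome_len / read_len)

def coverage_for_reads(n_reads, genome_len, read_len):
    """Approximate fold coverage C produced by N reads of length L."""
    return n_reads * read_len / genome_len

GENOME_LEN = 3_556_522  # assumed from the file name
READ_LEN = 100          # -l 100

if __name__ == "__main__":
    # Roughly how many reads -f 20 corresponds to
    print(reads_for_coverage(20, GENOME_LEN, READ_LEN))
    # Roughly what coverage -c 10000000 corresponds to
    print(round(coverage_for_reads(10_000_000, GENOME_LEN, READ_LEN), 1))
```

Under these assumptions, -f 20 corresponds to roughly 711 thousand reads. The 10-million-read cluster data set corresponds to a coverage of roughly 281x.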

2016-06-03 10:19:23 4413

[Original] Genomic Data Processing 47: ART, a simulator for generating sequencing reads

1. Concept: for details on the ART read simulator, see the paper [1] and the official website [2]. 2. Download: ART-bin-GreatSmokyMountains-04.17.16-Linux64.tgz http://www.niehs.nih.gov/research/resources/assets/docs/artbingreatsmokymountains041716linux64tgz.tgz 3. Configuration

2016-06-02 23:12:15 2625

[Original] Genomic Data Processing 46: Installing cloud-scale-bwamem (compile.pl works)

Version: https://github.com/ytchen0323/cloud-scale-bwamem/releases/tag/v0.2.1 1. Set the Spark path: <!--<systemPath>/cluster/spark/spark-1.3.1-bin-hadoop2.4/lib/spark-assembly-1.3.1-hadoop2.4.0.jar</syste

2016-06-02 19:00:42 1009

[Original] Genomic Data Processing 45: Installing cloud-scale-bwamem (compile.pl fails)

First copy the jar into the specified folder: hadoop@Master:~/xubo/tools/cloud-scale-bwamem/src/main/alphadata$ sudo mkdir -p /curr/pengwei/github/cloud-scale-bwamem/target/ [sudo] password for hadoop: hadoop@Master:~/xubo/tools/clo

2016-06-02 10:19:10 947

[Original] Genomic Data Processing 44: Installing cloud-scale-bwamem

cloud-scale-bwamem implements the BWA-MEM algorithm on cloud environments such as Spark to speed up whole-genome alignment. 1. Download: git clone https://github.com/ytchen0323/cloud-scale-bwamem.git 2. Build: mvn clean package 3. Build succeeds: [INFO] -----------------------------------------

2016-06-02 09:37:57 1195

[Original] Genomic Data Processing 43: mango 503 error

HTTP ERROR: 503 Problem accessing /. Reason: Service Unavailable Powered by Jetty:// For more details, see: https://github.com/bigdatagenomics/mango/issues/181

2016-05-30 20:35:02 594

CarbonData learning materials

A collection of Apache CarbonData learning materials, including videos, documents, files, and more.

2018-11-22

opencv 3.4.1 jar

opencv-341.jar. To invoke OpenCV, add this jar to your project.

2018-05-16

Advanced Shell Script Programming

Advanced Shell Script Programming

2016-03-15

2015 China Software Developer White Paper

2016-01-12

neo4j-javadocs-2.3.1-javadoc.jar

neo4j-javadocs-2.3.1-javadoc.jar neo4j 2.3.1 API

2015-11-26

neo4j-enterprise-2.3.1-unix.tar.gz

neo4j-enterprise-2.3.1-unix.tar.gz, downloaded from the official website

2015-11-25

neo4j-enterprise-2.3.0-M03-unix.tar.gz

neo4j-enterprise-2.3.0-M03-unix.tar.gz, downloaded from the official website

2015-11-25

Winning team's presentation slides (PPT) for the Fund Inflow and Outflow Forecasting Competition

Winning team's final presentation slides (PPT) for the Fund Inflow and Outflow Forecasting Competition, Alibaba Cloud Tianchi

2015-09-09

redis-3.0.4 installation package

redis-3.0.4.tar.gz, the redis-3.0.4 installation package, downloaded from the official website

2015-09-09

JDK.API.7_English.chm

JDK.API.7_English.chm Java™ Platform, Standard Edition 7 API Specification This document is the API specification for the Java™ Platform, Standard Edition.

2015-08-24

Java 2 SE 6 Documentation.chm

Java 2 SE 6 Documentation.chm: Java™ SE 6 Platform at a Glance. This document covers the Java™ Platform, Standard Edition 6 JDK. Its product version number is 6 and developer version number is 1.6.0, as described in Platform Name and Version Numbers. For information on a feature of the JDK, click on a component in the diagram below.

2015-08-24

JavaSE中文API.chm (Java SE API documentation in Chinese)

JavaSE中文API.chm: Java™ 2 Platform Standard Edition 5.0 API Specification. This document is the API specification for the Java 2 Platform Standard Edition 5.0.

2015-08-24

JDK API 1.7, English edition, with index

java, JDK API 1.7 English edition with index, English, Index, Java™ Platform, Standard Edition 7 API Specification

2015-08-24

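Complete collection of written-test and interview questions from Microsoft, Google, Baidu, Tencent, and other major companies (.rar)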

2015-08-20

A collection of 10 classic algorithm books

2015-08-20

Eighty written-test and interview questions from Baidu, Rensou (人搜), Alibaba, Tencent, Huawei, Xiaomi, and Sogou (.pdf)

2015-08-20

Color space conversion (MATLAB)

Color space conversion in MATLAB: RGB, HSV, YIQ, NTSC

2014-04-14

isrgb.m (MATLAB)

isrgb.m matlab rgb

    function y = isrgb(x)
    %ISRGB Return true for RGB image.
    %   FLAG = ISRGB(A) returns 1 if A is an RGB truecolor image and
    %   0 otherwise.
    %
    %   ISRGB uses these criteria to determine if A is an RGB image:
    %
    %   - If A is of class double, all values must be in the range
    %     [0,1], and A must be M-by-N-by-3.
    %
    %   - If A is of class uint8 or uint16, A must be M-by-N-by-3.
    %
    %   Note that a four-dimensional array that contains multiple RGB
    %   images returns 0, not 1.
    %
    %   Class Support
    %   -------------
    %   A can be of class uint8, uint16, or double. If A is of
    %   class logical it is considered not to be RGB.
    %
    %   See also ISBW, ISGRAY, ISIND.

    %   Copyright 1993-2003 The MathWorks, Inc.
    %   $Revision: 1.15.4.2 $ $Date: 2003/08/23 05:52:55 $

    wid = sprintf('Images:%s:obsoleteFunction',mfilename);
    str1= sprintf('%s is obsolete and may be removed in the future.',mfilename);
    str2 = 'See product release notes for more information.';
    warning(wid,'%s\n%s',str1,str2);

    y = size(x,3)==3;
    if y
       if isa(x, 'logical')
          y = false;
       elseif isa(x, 'double')
          % At first just test a small chunk to get a possible quick negative
          m = size(x,1);
          n = size(x,2);
          chunk = x(1:min(m,10),1:min(n,10),:);
          y = (min(chunk(:))>=0 && max(chunk(:))<=1);
          if y
             y = (min(x(:))>=0 && max(x(:))<=1);
          end
       end
    end

2014-03-27

C header files package (include)

C header files package (include): stdio.h, stdlib.h, etc.

2013-10-18

Numerical methods lab: the Gauss_Seidel and Runge_Kutta methods

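Numerical methods lab report. PB10210016, Xu Bo (徐波)
Requirements: Program 15 on page 208 of the 2nd edition, and Program 20 on page 208 of the 2nd edition, changed from second order to fourth order, solving the second problem.
Environment: OS: Windows 8 64-bit; IDE/compiler: Code::Blocks, version 10.05, 32-bit.
Submission time: before the exam.
Notes:
Gauss_Seidel: the data file is on the left. To make repeated testing easier, you can copy the data from the txt file into the exe and run it. See the figure above for the input format; the figure above shows one correct output.
Runge_Kutta: the data file is on the left. To make repeated testing easier, you can copy the data from the txt file into the exe and run it. See the figure above for the input format; the figure above shows one correct output.
Attachments:
Program 15: Gauss_Seidel source code, runnable exe, input data file, and run screenshots.
Program 20: Runge_Kutta source code, runnable exe, input data file, and run screenshots.
Reflections: this lab gave me a deeper understanding of the Gauss_Seidel and Runge_Kutta methods and hands-on experience running them. Writing the programs also made the input and output data of each step of the methods much clearer. Overall I learned a lot. We should write more programs like these, and I hope they can be put on a web page so that entering data produces the results directly.
PB10210016 Xu Bo, 2013.5.28. For the code, contact QQ: 601450868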
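The lab's own programs are not included in this listing. As a hedged illustration of the two techniques it names, here are textbook Python versions of Gauss-Seidel iteration for Ax = b and the classical fourth-order Runge-Kutta method for y' = f(x, y). The lab asks for fourth order. These sketches are not the original Code::Blocks programs. The function names, tolerances, and test problems are assumptions chosen for illustration.

```python
# Textbook sketches (not the original lab code) of:
#  - Gauss-Seidel iteration for the linear system A x = b
#  - classical 4th-order Runge-Kutta (RK4) for the ODE y' = f(x, y)

def gauss_seidel(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Solve A x = b; convergence is guaranteed e.g. if A is strictly diagonally dominant."""
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n):
            # x is updated in place, so x[j] for j < i already holds the
            # new values from this sweep: this is what distinguishes Gauss-Seidel from Jacobi
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            new_xi = (b[i] - s) / A[i][i]
            max_change = max(max_change, abs(new_xi - x[i]))
            x[i] = new_xi
        if max_change < tol:
            return x
    raise RuntimeError("Gauss-Seidel did not converge")


def rk4(f, x0, y0, h, steps):
    """Advance y' = f(x, y) from (x0, y0) by `steps` RK4 steps of size h."""
    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h / 2 * k1)
        k3 = f(x + h / 2, y + h / 2 * k2)
        k4 = f(x + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        x += h
    return y


if __name__ == "__main__":
    # Strictly diagonally dominant system whose exact solution is x = [1, 2, 3]
    A = [[10.0, -1.0, 2.0],
         [-1.0, 11.0, -1.0],
         [2.0, -1.0, 10.0]]
    b = [14.0, 18.0, 30.0]
    print(gauss_seidel(A, b))
    # y' = y, y(0) = 1 has exact solution e^x; integrate to x = 1 with h = 0.1
    print(rk4(lambda x, y: y, 0.0, 1.0, 0.1, 10))
```

Diagonal dominance is a convenient sufficient condition for Gauss-Seidel to converge. It is not a necessary one. The RK4 test uses y' = y because its exact solution e^x makes the fourth-order accuracy easy to check.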

2013-10-17
