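[Original] svm-overview

The principles of SVM (support vector machines) (repost); Support Vector Machines (SVM), Part 1; Stanford Machine Learning, Lecture 8: Support Vector Machines (SVM); Implementing the SVM Algorithm Step by Step, Part 1; https://www.zhihu.com/question/21094489 ...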

2017-09-17 14:50:24 229
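
The entry only collects references. For orientation, here is the standard hard-margin (linear) SVM that these references start from. It assumes linearly separable training points $(x_i, y_i)$ with labels $y_i \in \{-1, +1\}$. Kernels and soft margins are left out:

```latex
$$\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^2 \quad\text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1,\ \ i = 1,\dots,n$$
$$f(x) = \operatorname{sign}(w^\top x + b)$$
```

The margin between the two supporting hyperplanes is $2/\lVert w\rVert$. Minimizing $\lVert w\rVert$ is therefore the same as maximizing the margin.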

[Original] PIC correct errors

https://book.douban.com/people/eaglex/annotation/3056375/ ; lib for SVM: https://www.csie.ntu.edu.tw/~cjlin/libsvm/

2017-09-07 17:37:02 336

[Original] random variable distribution

https://www.zybang.com/question/b867b0b455406ce15e84abf27553a9cf.html

2017-09-04 17:46:10 269

[Original] radial-basis function

ref: Radial Basis Function (RBF) kernel; Radial basis function (RBF) neural networks; Some notes on kernel functions; 7 Kernels; https://en.wikipedia.org/wiki/Radial_basis_function ; http://www.blogjava.net/zhenandaci/...

2017-09-04 16:59:03 1209
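
The references above all center on the Gaussian RBF kernel, K(x, y) = exp(-γ‖x - y‖²). Below is a minimal Scala sketch of that formula. The object name `RbfKernel` and the parameter name `gamma` are my own choices for illustration and do not come from any library.

```scala
object RbfKernel {
  // Squared Euclidean distance ||x - y||^2 between two equal-length vectors
  def squaredDistance(x: Array[Double], y: Array[Double]): Double = {
    require(x.length == y.length, "vectors must have the same dimension")
    x.zip(y).map { case (a, b) => (a - b) * (a - b) }.sum
  }

  // Gaussian RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2), with gamma > 0
  def kernel(x: Array[Double], y: Array[Double], gamma: Double): Double = {
    require(gamma > 0, "gamma must be positive")
    math.exp(-gamma * squaredDistance(x, y))
  }

  def main(args: Array[String]): Unit = {
    val x = Array(1.0, 2.0)
    val y = Array(2.0, 4.0)
    // ||x - y||^2 = 1 + 4 = 5, so K = exp(-0.5 * 5) = exp(-2.5)
    println(kernel(x, y, gamma = 0.5))
  }
}
```

K equals 1 when x = y and decays toward 0 as the points move apart. A larger γ makes it decay faster, so each training point's influence becomes more local.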

[Original] math-dot product and vector product

ref: https://wenku.baidu.com/view/cdd78d48a58da0116d17498f.html

2017-08-23 16:44:56 195
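
To make the two products concrete, here is a small Scala sketch for 3-D vectors. "Vector product" is read here as the cross product, which is defined in three dimensions. `Vec3` and `VectorProducts` are illustrative names only.

```scala
object VectorProducts {
  final case class Vec3(x: Double, y: Double, z: Double)

  // Dot (scalar) product: a · b = ax*bx + ay*by + az*bz, a scalar
  def dot(a: Vec3, b: Vec3): Double = a.x * b.x + a.y * b.y + a.z * b.z

  // Cross (vector) product: a × b, a vector perpendicular to both a and b
  def cross(a: Vec3, b: Vec3): Vec3 = Vec3(
    a.y * b.z - a.z * b.y,
    a.z * b.x - a.x * b.z,
    a.x * b.y - a.y * b.x
  )

  def main(args: Array[String]): Unit = {
    val i = Vec3(1, 0, 0)
    val j = Vec3(0, 1, 0)
    println(dot(i, j))   // 0.0: orthogonal unit vectors
    println(cross(i, j)) // Vec3(0.0,0.0,1.0): the unit vector along z
  }
}
```

The dot product returns a number (|a||b|cos θ). The cross product returns a vector whose length is |a||b|sin θ, and dot(a, cross(a, b)) is always 0.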

hbase write flow (byte level)

Here is the byte flow of a mutation:
level | format | usage
top (abstract, user facing) | [Put, Put…] | HTable#put(list)
encapsulation | [HLogKey, WALEdit] WALEdit: kv1, kv2 || v[t...

2017-08-14 17:04:18 99

hadoop-replication write flow

w: net write; r: net read (corresponding to a net write); wl: local write, i.e. fs write. Cost time: t4 - t0 = 1 client w + 2 DN w + 1 DN wl ≈ 3w + 1wl ≈ 1wl (assuming disk write is the bottleneck)

2017-08-14 17:00:54 100
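
Here is the same cost estimate written as one equation. $w$ is one network write and $w_l$ is one local (disk) write, as defined in the entry. Reading the three network writes as client → DN1 → DN2 → DN3 (replication factor 3, the HDFS default) is my interpretation, not something the entry states. The last step holds only under the entry's assumption that the disk write is the bottleneck, which I read as $w_l \gg 3w$:

```latex
$$t_4 - t_0 \approx \underbrace{w}_{\text{client}} + \underbrace{2w}_{\text{2 DN}} + \underbrace{w_l}_{\text{1 DN}} = 3w + w_l \approx w_l \quad \text{if } w_l \gg 3w$$
```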

[Original] bloom filters

wiki; CSDN blog; cnblog Java implementation

2017-06-09 17:35:00 121
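
The Java implementation linked above is not reproduced here. Instead, this is an independent minimal Scala sketch of a Bloom filter. It assumes an m-bit `java.util.BitSet` and k bit positions per item, derived by double hashing (h1 + i·h2) from `scala.util.hashing.MurmurHash3`. The class name `BloomFilter` and these choices are mine and are only illustrative.

```scala
import java.util.BitSet
import scala.util.hashing.MurmurHash3

// Minimal Bloom filter sketch: m bits, k hash positions per item
final class BloomFilter(m: Int, k: Int) {
  require(m > 0 && k > 0, "m and k must be positive")
  private val bits = new BitSet(m)

  // Derive k positions via double hashing: h1 + i * h2, mapped into [0, m)
  private def positions(item: String): Seq[Int] = {
    val h1 = MurmurHash3.stringHash(item, 0)
    val h2 = MurmurHash3.stringHash(item, h1)
    (0 until k).map(i => Math.floorMod(h1 + i * h2, m))
  }

  def add(item: String): Unit = positions(item).foreach(p => bits.set(p))

  // false => definitely absent; true => possibly present (false positives allowed)
  def mightContain(item: String): Boolean = positions(item).forall(p => bits.get(p))
}

object BloomFilterDemo {
  def main(args: Array[String]): Unit = {
    val filter = new BloomFilter(m = 1024, k = 3)
    Seq("hbase", "spark", "hadoop").foreach(filter.add)
    println(filter.mightContain("spark")) // always true once added
    println(filter.mightContain("flink")) // usually false; true would be a false positive
  }
}
```

There are never false negatives. After n insertions, the false-positive rate is roughly (1 - e^(-kn/m))^k, so m and k should be sized from the expected n.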

[Original] python course

python textbook

2017-06-07 13:40:13 368

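[Original] the foundation of information theory

Reference answers for each chapter of Fundamentals of Information Theory; BUPT information theory 2006 midterm exam answers (standard paper A); Information Theory (Jiang Dan) exercises; Information theory xxx32.14; BUPT information theory lecture slides 2; Worked examples: Fundamentals of Information Theory Tutorial, Chapter 2 (Li Yinong, Li Mei), Beijing University of Posts and Telecommunications Press ...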

2017-05-09 23:26:04 321

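[Original] algorithms design techniques and analysis

Answers to Algorithms Design Techniques and Analysis; Algorithms Design Techniques and Analysis: answers, course instructor He Quanbing (Department of Computer Science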

2017-05-09 23:23:37 436

[Original] algorithms abstract

todo. ref: Time and space complexity of algorithms: a summary; Why do people around me describe algorithm complexity with big O rather than big Θ?

2017-05-07 17:50:30 102
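
For the "big O vs. big Θ" question, the textbook definitions show the difference. O is only an upper bound, while Θ is a two-sided (tight) bound:

```latex
$$f(n) = O(g(n)) \iff \exists\, c > 0,\ n_0 \ \text{such that}\ 0 \le f(n) \le c\,g(n)\ \text{for all}\ n \ge n_0$$
$$f(n) = \Theta(g(n)) \iff \exists\, c_1, c_2 > 0,\ n_0 \ \text{such that}\ c_1\,g(n) \le f(n) \le c_2\,g(n)\ \text{for all}\ n \ge n_0$$
```

For example, insertion sort's running time over all inputs of size n is O(n²) but not Θ(n²), because on already sorted input it runs in linear time. This is one common reason people quote O: it stays true as an upper bound for every input.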

[Original] Countable and uncountable sets

http://blog.csdn.net/zdarks/article/details/46994925

2017-04-11 15:05:25 2314

[Original] How smart are search engines?

ref: https://www.seozac.com/seo/smart-blackhat/

2017-02-11 13:56:13 105

[Original] Information processing and probability theory in search engines

Information theory and maths used in search matching

2017-02-06 15:57:02 236

spark-broadcast in spark

Going through the code block below, we can draw some conclusions: val barr1 = sc.broadcast(arr1) //-broadcast an array with 1M int elements //-this is an embedded broadcast wrapped b...

2016-12-22 15:54:25 169

spark-storage/memory used in spark

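Access pattern in Spark storage. [1] So far we have seen how Spark uses JVM memory and what the execution slots on a cluster are. We have not yet covered the details of a task; those will come in another article. Basically, a task is Spark's unit of work and runs as a thread inside the executor's JVM process, which is also why Spark jobs start up quickly: in the JVM, start...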

2016-12-12 16:31:20 376

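[Original] spark-hive on spark

Overall design. The overall idea of Hive on Spark is to reuse as much of Hive's logical-layer functionality as possible. Starting from physical plan generation, it provides a complete Spark-specific implementation, such as SparkCompiler and SparkTask, so that Hive queries can run as Spark jobs. The main design principles are as follows. Minimize changes to Hive's existing code. This is the biggest difference from the earlier Shark design: Shark changed Hive too much...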

2016-12-06 15:04:03 142

[Original] spark-RDD vs DataFrame vs DataSet

 In summation, the choice of when to use RDD or DataFrame and/or Dataset seems obvious. While the former offers you low-level functionality and control, the latter allows custom view and structure...

2016-11-29 15:38:24 120

[Original] [spark-src-core] 8. trivial bug in spark standalone executor assignment

From [1] we know that Spark executes a job in two steps: a. launch executors, and b. have the driver assign tasks to those executors. So how are executors assigned to workers ...

2016-11-22 17:24:48 91
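
In standalone mode the Master decides how many cores each worker grants an application. The `spark.deploy.spreadOut` setting (default true) chooses between spreading an app across workers and consolidating it onto as few workers as possible. Below is a hypothetical, simplified Scala model of those two policies. It is not Spark's Master code: it ignores memory and per-executor core limits, and it does not reproduce the bug this post discusses. `CoreAssignment`, `spreadOut` and `consolidate` are illustrative names.

```scala
object CoreAssignment {
  // Hypothetical model: freeCores(i) = free cores on worker i,
  // result(i) = cores granted to the application on worker i.

  // "Spread out": hand out one core at a time, round-robin over workers
  def spreadOut(freeCores: Vector[Int], coresWanted: Int): Vector[Int] = {
    require(coresWanted >= 0, "coresWanted must be non-negative")
    val assigned = Array.fill(freeCores.length)(0)
    var remaining = math.min(coresWanted, freeCores.sum)
    var pos = 0
    while (remaining > 0) {
      if (assigned(pos) < freeCores(pos)) { // skip workers that are already full
        assigned(pos) += 1
        remaining -= 1
      }
      pos = (pos + 1) % freeCores.length
    }
    assigned.toVector
  }

  // "Consolidate": fill each worker completely before moving to the next one
  def consolidate(freeCores: Vector[Int], coresWanted: Int): Vector[Int] = {
    require(coresWanted >= 0, "coresWanted must be non-negative")
    var remaining = coresWanted
    freeCores.map { free =>
      val granted = math.min(free, remaining)
      remaining -= granted
      granted
    }
  }
}
```

With three workers of 4 free cores each and 6 cores wanted, spreading out gives 2/2/2, while consolidating gives 4/2/0.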

[spark-src-core] 7.1 application in spark-PageRank

The code paths below are all from Spark's examples, except for some comments added by me. val lines = ctx.textFile(args(0), 1) //-1 generate links of <src,targets> pair var links = li...

2016-11-03 15:59:12 118
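
Running the Spark example needs a SparkContext. To see what each iteration computes, here is a minimal sketch on plain Scala collections. It mirrors the logic of Spark's PageRank example as I understand it: links grouped per source, initial rank 1.0, each page sending rank/|targets| to its targets, and new rank = 0.15 + 0.85 * sum of contributions. `LocalPageRank` and `run` are hypothetical names and not part of Spark.

```scala
object LocalPageRank {
  // links: page -> outgoing targets (like the grouped <src, targets> pairs)
  def run(links: Map[String, Seq[String]], iterations: Int): Map[String, Double] = {
    var ranks: Map[String, Double] = links.map { case (page, _) => page -> 1.0 }
    for (_ <- 1 to iterations) {
      // only pages that currently have a rank contribute (like links.join(ranks))
      val contribs: Seq[(String, Double)] = links.toSeq.flatMap { case (page, targets) =>
        ranks.get(page).toSeq.flatMap(rank => targets.map(t => t -> rank / targets.size))
      }
      // sum contributions per target, then apply damping: 0.15 + 0.85 * sum
      ranks = contribs.groupBy(_._1).map { case (page, cs) =>
        page -> (0.15 + 0.85 * cs.map(_._2).sum)
      }
    }
    ranks
  }

  def main(args: Array[String]): Unit = {
    val links = Map("a" -> Seq("b", "c"), "b" -> Seq("c"), "c" -> Seq("a"))
    run(links, 10).toSeq.sortBy(_._1).foreach { case (page, rank) => println(s"$page has rank $rank") }
  }
}
```

As in the Spark version, a page that receives no contributions drops out of `ranks` after an iteration.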

[Original] [spark-src-core] 6. checkpoint in spark

As in other big data technologies, checkpointing is a well-known way to keep a snapshot of data and speed up failover, i.e. to restore the data to its most recent checkpoint state, so you will not need t...

2016-10-19 17:14:46 110

[spark-src-core] 5. big data techniques in spark

There are several nice techniques in Spark, e.g. on the user API side. Here we dive in to check how Spark implements them. 1. abstract (functions in RDD): group / function / feature / principl...

2016-10-12 17:48:38 83

[spark-src-core] 4.2 communications b/t certain kernel components

Several component entities run as daemons in Spark (standalone), so knowing what they do and how they work is necessary. Akka message flow (similar to TCP). note: register driver =R...

2016-09-27 12:26:41 102

[spark-src-core] 4.1 spark on yarn

As the official docs state, Spark is a computation framework, i.e. you can run it on any platform that supplies the resources to run it (e.g. YARN, Mesos). So in this cluster manager, all of Spark's daemons ar...

2016-09-27 12:16:42 137

[spark-src-core] 3.3 run spark in standalone(cluster) mode

Similar to the previous article, this one focuses on cluster mode. 1. issue command: ./bin/spark-submit --class org.apache.spark.examples.JavaWordCount --deploy-mode cluster --master spark://gzsw...

2016-09-19 12:30:17 265

[spark-src-core] 3.2 run spark in standalone (client) mode

1. startup command: ./bin/spark-submit --class org.apache.spark.examples.JavaWordCount --deploy-mode client --master spark://gzsw-02:7077 lib/spark-examples-1.4.1-hadoop2.4.0.jar hdfs://host02:/user...

2016-09-19 11:55:38 113

[spark-src-core] 3. run spark in cluster (local) mode

As you might guess, Spark has many deploy modes, e.g. standalone, YARN, Mesos, etc. Going a step further, the standalone mode can be divided into standalone and cluster (local) mode. The form...

2016-09-02 17:53:54 191

[spark-src-core] 2.5 core concepts in Spark

1. overview in wordcount. Memory tip: Job > Stage > RDD > Dependency. RDDs are linked by Dependencies. 2. terms: RDDs are associated through Dependency, i.e. a Dependency is a wrapper of an RDD....

2016-08-25 17:38:41 91
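
The memory tip "Job > Stage > RDD > Dependency" can be made concrete with a toy lineage. In Spark, a job is cut into stages at shuffle dependencies. The model below is hypothetical and simplified: `Node`, `NarrowDep` and `ShuffleDep` are my own names, not Spark's classes. It assumes a tree-shaped lineage with no shared parents, and it counts one final stage plus one extra stage per shuffle dependency. Following the entry, each dependency wraps its parent.

```scala
object LineageModel {
  // Hypothetical, simplified lineage model (not Spark's RDD/Dependency classes):
  // a dependency wraps its parent node, as the entry describes.
  sealed trait Dep { def parent: Node }
  final case class NarrowDep(parent: Node) extends Dep
  final case class ShuffleDep(parent: Node) extends Dep
  final case class Node(name: String, deps: Seq[Dep] = Nil)

  // Stages = the final stage + one more stage behind every shuffle dependency
  // (assumes a tree-shaped lineage, so no parent is counted twice)
  def countStages(last: Node): Int = {
    def shufflesBelow(n: Node): Int = n.deps.map {
      case NarrowDep(p)  => shufflesBelow(p)
      case ShuffleDep(p) => 1 + shufflesBelow(p)
    }.sum
    1 + shufflesBelow(last)
  }

  def main(args: Array[String]): Unit = {
    // wordcount-like lineage: textFile -> flatMap -> map -> reduceByKey
    val text   = Node("textFile")
    val words  = Node("flatMap", Seq(NarrowDep(text)))
    val pairs  = Node("map", Seq(NarrowDep(words)))
    val counts = Node("reduceByKey", Seq(ShuffleDep(pairs)))
    println(countStages(counts)) // one shuffle, so two stages
  }
}
```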

[spark-src-core] 2.4 communications b/t certain kernel components

1. data flow overview. note: - arrows here mean: a bold line is a data line ('w/o sender and receiver meanings', only data 'from-to'); - two ways to retrieve a task result: direct result and i...

2016-08-25 17:36:14 132

[spark-src-core] 2.3 shuffle in spark

1. flow: 1.1 shuffle abstract; 1.2 shuffle flow; 1.3 sort flow in shuffle; 1.4 data structure in memory. 2. core code paths: //SortShuffleWriter override def write(records: Iterat...

2016-08-25 16:31:09 111
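
In sort-based shuffle, each map task writes one data file with its records ordered by target partition, plus an index of where each partition starts, so that a reducer can fetch just its slice. Below is a simplified in-memory Scala sketch of that idea. It is not SortShuffleWriter's code: there is no spilling, map-side aggregation or serialization. The partition id uses a non-negative `hashCode` mod, as a hash partitioner does. `SortShuffleSketch`, `write` and `readPartition` are hypothetical names.

```scala
object SortShuffleSketch {
  // Partition id like a hash partitioner: non-negative key.hashCode mod numPartitions
  def partitionOf(key: Any, numPartitions: Int): Int =
    if (key == null) 0 else Math.floorMod(key.hashCode, numPartitions)

  // Simplified map-side write: order records by partition id (stable sort),
  // concatenate them as the "data file", and record where each partition starts.
  def write[K, V](records: Iterator[(K, V)], numPartitions: Int): (Vector[(K, V)], Vector[Int]) = {
    val sorted = records.toVector.sortBy { case (k, _) => partitionOf(k, numPartitions) }
    val counts = Array.fill(numPartitions)(0)
    sorted.foreach { case (k, _) => counts(partitionOf(k, numPartitions)) += 1 }
    // offsets(i) = start of partition i; offsets(numPartitions) = total record count
    val offsets = counts.scanLeft(0)(_ + _).toVector
    (sorted, offsets)
  }

  // What the reducer for partition p would fetch from this map output
  def readPartition[K, V](data: Vector[(K, V)], offsets: Vector[Int], p: Int): Vector[(K, V)] =
    data.slice(offsets(p), offsets(p + 1))

  def main(args: Array[String]): Unit = {
    val (data, offsets) = write(Iterator("a" -> 1, "b" -> 2, "c" -> 3, "a" -> 4), 2)
    println(offsets)
    (0 until 2).foreach(p => println(s"partition $p: ${readPartition(data, offsets, p)}"))
  }
}
```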

[spark-src-core] 2.2 job submitted flow for local mode-part II

In this section we will verify how Spark collects data from the previous stage into the next stage (result task). Figure: after finishing the ShuffleMapTask computation (i.e. post-processing). note: the l...

2016-08-25 11:23:42 170

[Original] [spark-src-core] 2.2 job submitted flow for local mode-part I

Now we will dive into Spark internals using the simple example below (wordcount; later articles will reference this one by default): sparkConf.setMaster("local[2]") //-local[*] by default//leib-c...

2016-08-24 17:36:23 145

[Original] [spark-src-core] 2.1 relationships b/t misc spark shells

Similar to other open source projects, Spark has several shells, listed here. sbin: server-side shells. start-all.sh: starts all Spark daemons (i.e. start-master.sh, start-slav...

2016-06-01 16:01:36 103

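[Original] scala- Scala object comparison: ==, eq, ne vs. Java ==, equals()

If you want to check whether two objects are equal, you can use ==, or its opposite, != (this works for all objects, not just primitive types):
scala> 1 == 2
res24: Boolean = false
scala> 1 != 2
res25: Boolean = true
These operations apply to all...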

2016-04-22 15:08:50 242
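
Here is a small Scala sketch of the distinction in the title. In Scala, `==` is value equality: after a null check it delegates to `equals`. `eq`/`ne` are reference equality on objects, which is what Java's `==`/`!=` do for objects. The two strings are built with `new String` so that they are distinct objects. `EqualityDemo` is just an illustrative name.

```scala
object EqualityDemo {
  // Two distinct String objects with the same contents
  val a: String = new String("spark")
  val b: String = new String("spark")

  def main(args: Array[String]): Unit = {
    println(a == b)      // true: == delegates to equals (value equality)
    println(a.equals(b)) // true: same as Java's equals()
    println(a eq b)      // false: reference equality, like Java's == on objects
    println(a ne b)      // true: negation of eq

    val s: String = null
    println(s == "spark") // false: Scala's == is null-safe, no NullPointerException
  }
}
```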

[spark-src-core] given SPARK_PRINT_LAUNCH_COMMAND to output more details

With both the environment variable 'SPARK_PRINT_LAUNCH_COMMAND' and --verbose enabled, the command output by spark-submit.sh is more detailed: hadoop@GZsw04:~/spark/spark-1.4.1-bin-hado...

2016-04-19 12:19:13 174

[Original] scala- type conversion (classOf, asInstanceOf, isInstanceOf)

ref: Scala object to Class; Scala type casting

2016-04-14 15:28:50 127
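
Here is a minimal Scala sketch of the three operations in the title. `classOf[T]` gives the runtime `java.lang.Class` of T (like Java's `T.class`). `isInstanceOf[T]` tests the runtime type. `asInstanceOf[T]` casts and throws `ClassCastException` on a mismatch. The `Shape` types are made up for illustration, and the pattern-match version shows the more idiomatic alternative.

```scala
object TypeConversionDemo {
  sealed trait Shape
  final case class Circle(r: Double) extends Shape
  final case class Square(side: Double) extends Shape

  // classOf[T]: the runtime Class object of T, like Java's Circle.class
  val circleClass: Class[Circle] = classOf[Circle]

  // isInstanceOf checks the runtime type; asInstanceOf performs the cast
  // (asInstanceOf alone would throw ClassCastException for a Square)
  def radiusOrZero(s: Shape): Double =
    if (s.isInstanceOf[Circle]) s.asInstanceOf[Circle].r else 0.0

  // Idiomatic alternative: a type-safe pattern match does the check and the cast together
  def radiusOrZeroMatch(s: Shape): Double = s match {
    case c: Circle => c.r
    case _         => 0.0
  }

  def main(args: Array[String]): Unit = {
    val shapes: List[Shape] = List(Circle(2.0), Square(3.0))
    println(shapes.map(radiusOrZero))             // List(2.0, 0.0)
    println(Circle(1.0).getClass == circleClass)  // true
  }
}
```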

[spark-src] 1-overview

What is Spark? "Apache Spark™ is a fast and general engine for large-scale data processing. ... Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk." as stated in apache spa...

2016-03-20 16:20:31 129

[spark-src]-source reading

Based on: spark-1.4.1, hadoop-2.5.2. Going from simple to complex and following the working-flow principle, we take these steps: 1. [spark-src] spark overview 2. [spark-src] core from ...

2016-03-20 15:06:25 108

free talk-intelligent period prediction is undergoing

Google AlphaGo vs Lee on 'the game of Go' VS back in Guangzhou, back in action. Cheers

2016-03-16 10:15:15 64
