leibnitz09 - CSDN Blog

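[Original] svm-overview

Principles of the SVM (support vector machine) (repost); Support Vector Machine SVM (Part 1); Stanford Machine Learning, Lecture 8: Support Vector Machines (SVM); Implementing the SVM Algorithm Step by Step (Part 1); https://www.zhihu.com/question/21094489 ...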

2017-09-17 14:50:24 284

[Original] PIC correct errors

https://book.douban.com/people/eaglex/annotation/3056375/ lib for svm: https://www.csie.ntu.edu.tw/~cjlin/libsvm/

2017-09-07 17:37:02 389

[Original] random variable distribution

https://www.zybang.com/question/b867b0b455406ce15e84abf27553a9cf.html

2017-09-04 17:46:10 325

[Original] radial-basis function

ref: Radial Basis Kernel Function (Radial Basis Function), RBF; Some notes on radial basis function neural networks and kernel functions; 7 Kernels; https://en.wikipedia.org/wiki/Radial_basis_function http://www.blogjava.net/zhenandaci/...

2017-09-04 16:59:03 1285

[Original] math-dot product and vector product

ref: https://wenku.baidu.com/view/cdd78d48a58da0116d17498f.html

2017-08-23 16:44:56 234

hbase write flow (byte level)

Here is the byte flow of a mutation (level / format / usage): top (abstract, user facing) / [Put,Put…] / HTable#put(list); encapsulation / [HLogKey,WALEdit] WALEdit:kv1,kv2 || v[t...

2017-08-14 17:04:18 137

hadoop-replication written flow

w: net write; r: net read (corresponds to a net write); wl: write locally, i.e. fs write. Cost time: t4 - t0 = 1 client w + 2 DN w + 1 DN wl ~ 3w + 1wl ~ 1wl (assuming disk write is the bottleneck)

2017-08-14 17:00:54 128
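
The cost estimate in the hadoop replication entry above can be sketched as a small calculation. This is only an illustration under stated assumptions: the object name, the function name and the sample timings are hypothetical, and generalizing the 3-replica count to a `replicas` parameter is an assumption that the original post does not make.

```scala
// Hypothetical helper for illustration; not taken from the original post.
object ReplicationCost {
  // w  : time for one network write (client -> DN, or DN -> DN)
  // wl : time for one local (fs) write on a DataNode
  // With 3 replicas the entry counts 1 client w + 2 DN w + 1 DN wl = 3w + 1wl.
  def pipelineTime(w: Double, wl: Double, replicas: Int = 3): Double =
    replicas * w + wl
}
```

With the assumed timings `w = 1.0` and `wl = 100.0`, `ReplicationCost.pipelineTime(1.0, 100.0)` evaluates to 103.0, which is close to `wl` alone, matching the entry's approximation when the disk write is the bottleneck.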

[Original] bloom filters

wiki; csdn blog; cnblog; java implementation

2017-06-09 17:35:00 149

[Original] python course

python textbook

2017-06-07 13:40:13 410

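[Original] the foundation of information theory

Fundamentals of Information Theory: reference answers for each chapter; BUPT Information Theory 2006 midterm exam answers, standard paper A; Information Theory (Jiang Dan); information exercises; Information Theory xxx32.14 ------------ BUPT Information Theory courseware 2 ------------ Examples: A Course in the Fundamentals of Information Theory, Chapter 2 (by Li Yinong and Li Mei), Beijing University of Posts and Telecommunications Press ...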

2017-05-09 23:26:04 354

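[Original] algorithms design techniques and analysis

Algorithms: Design Techniques and Analysis, answers; Algorithms: Design Techniques and Analysis - answers; instructor He Quanbing (Department of Computer Science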

2017-05-09 23:23:37 491

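[Original] algorithms abstract

todo ref: Time complexity and space complexity of algorithms: a summary; Why does everyone around me describe algorithm complexity with big O notation rather than big Θ?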

2017-05-07 17:50:30 130

[Original] Countable and uncountable sets

http://blog.csdn.net/zdarks/article/details/46994925

2017-04-11 15:05:25 2384

[Original] How smart are search engines?

ref:https://www.seozac.com/seo/smart-blackhat/

2017-02-11 13:56:13 141

[Original] Information processing and probability theory in search engines

info theory and maths used in search match

2017-02-06 15:57:02 288

spark-broadcast in spark

Going through the code block below, we can draw some conclusions: val barr1 = sc.broadcast(arr1) //-broadcast an array with 1M int elements //-this is an embedded broadcast wrapped b...

2016-12-22 15:54:25 206

spark-storage/memory used in spark

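access pattern in spark storage [1] So far we have seen how Spark uses JVM memory and what the execution slots on a cluster are. We have not yet discussed the details of tasks, which will be covered in another article. Basically, a task is a unit of work in Spark that runs as a thread inside the executor's JVM process. This is also why Spark jobs start up quickly: in the JVM, start...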

2016-12-12 16:31:20 436

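[Original] spark-hive on spark

Overall design: The overall design approach of Hive on Spark is to reuse Hive's logical-layer functionality as much as possible. Starting from physical plan generation, it provides a complete set of Spark-specific implementations, such as SparkCompiler and SparkTask, so that Hive queries can be executed as Spark jobs. The main design principles follow. Minimize changes to Hive's existing code. This is the biggest difference from the earlier Shark design. Shark changed Hive too much...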

2016-12-06 15:04:03 182

[Original] spark-RDD vs DataFrame vs DataSet

In summation, the choice of when to use RDD or DataFrame and/or Dataset seems obvious. While the former offers you low-level functionality and control, the latter allows custom view and structure...

2016-11-29 15:38:24 156

[Original] [spark-src-core] 8. trivial bug in spark standalone executor assignment

Yep, from [1] we know that Spark executes a job in two steps: a. launching executors, and b. having the driver assign tasks to those executors. So how are executors assigned to workers ...

2016-11-22 17:24:48 128

[spark-src-core] 7.1 application in spark-PageRank

The code below is all from Spark's example, apart from some comments added by me. val lines = ctx.textFile(args(0), 1) //-1 generate links of <src,targets> pair var links = li...

2016-11-03 15:59:12 146

[Original] [spark-src-core] 6. checkpoint in spark

As in other big data technologies, checkpointing is a well-known way to keep a snapshot of data to speed up failover, i.e. to restore the data to its most recent checkpoint state, so you will not need t...

2016-10-19 17:14:46 153

[spark-src-core] 5. big data techniques in spark

There are several nice techniques in Spark, e.g. on the user API side. Here we will dive in and check how Spark implements them. 1. abstract (functions in RDD): group, function, feature, principl...

2016-10-12 17:48:38 118

[spark-src-core] 4.2 communications b/t certain kernel components

Several component entities run as daemons in Spark (standalone), so knowing what they do and how they work is necessary. akka msg flow, similar to tcp. note: register driver =R...

2016-09-27 12:26:41 132

[spark-src-core] 4.1 spark on yarn

As the official statement says, Spark is a computation framework, i.e. you can run it on anything that supplies a platform (e.g. YARN, Mesos). So in this cluster manager, all of Spark's daemons ar...

2016-09-27 12:16:42 165

[spark-src-core] 3.3 run spark in standalone(cluster) mode

Similar to the previous article, this one focuses on cluster mode. 1. issue command: ./bin/spark-submit --class org.apache.spark.examples.JavaWordCount --deploy-mode cluster --master spark://gzsw...

2016-09-19 12:30:17 310

[spark-src-core] 3.2 run spark in standalone(client) mode

1. startup command: ./bin/spark-submit --class org.apache.spark.examples.JavaWordCount --deploy-mode client --master spark://gzsw-02:7077 lib/spark-examples-1.4.1-hadoop2.4.0.jar hdfs://host02:/user...

2016-09-19 11:55:38 148

[spark-src-core] 3. run spark in cluster(local) mode

Yep, just as you guessed, there are many deploy modes in Spark, e.g. standalone, YARN, Mesos, etc. Going a step further, the standalone mode can be divided into standalone and cluster(local) modes. The form...

2016-09-02 17:53:54 229

[spark-src-core] 2.5 core concepts in Spark

1. overview in wordcount - memory tips: Job > Stage > Rdd > Dependency. RDDs are linked by Dependencies. 2. terms - RDD is associated by Dependency, i.e. Dependency is a wrapper of RDD....

2016-08-25 17:38:41 121

[spark-src-core] 2.4 communications b/t certain kernel components

1 data flow overview. note: - the arrows here mean: a bold line is a data line 'w/o sender and receiver meanings' but only with data 'from-to' - two ways to retrieve a task result: direct result and i...

2016-08-25 17:36:14 165

[spark-src-core] 2.3 shuffle in spark

1. flow 1.1 shuffle abstract 1.2 shuffle flow 1.3 sort flow in shuffle 1.4 data structure in mem 2. core code paths //SortShuffleWriter override def write(records: Iterat...

2016-08-25 16:31:09 140

[spark-src-core] 2.2 job submitted flow for local mode-part II

In this section, we will verify how Spark collects data from the previous stage into the next stage (result task) after finishing the ShuffleMapTask computation (i.e. post-processing). note: the l...

2016-08-25 11:23:42 200

[Original] [spark-src-core] 2.2 job submitted flow for local mode-part I

Now we will dive into Spark internals using the simple example below (wordcount; later articles will reference this one by default): sparkConf.setMaster("local[2]") //-local[*] by default //leib-c...

2016-08-24 17:36:23 186

[Original] [spark-src-core] 2.1 relationships b/t misc spark shells

Similar to other open source projects, Spark has several shell scripts, listed here. sbin: server side shells. start-all.sh: start all the Spark daemons (i.e. start-master.sh,start-slav...

2016-06-01 16:01:36 137

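[Original] scala- Scala object comparison: ==, eq, ne vs. Java ==, equals()

If you want to compare two objects to see whether they are equal, you can use either == or its inverse !=. (This works for all objects, not just primitive types.) scala> 1 == 2 res24: Boolean = false scala> 1 != 2 res25: Boolean = true These operations, for all...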

2016-04-22 15:08:50 270
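
The operators named in the Scala comparison entry above can be contrasted in a short sketch. The `Point` case class and the `EqualityDemo` object are hypothetical names introduced only for illustration and do not come from the original post.

```scala
// Hypothetical class used only for illustration.
final case class Point(x: Int, y: Int)

object EqualityDemo {
  // Two distinct instances that hold the same field values.
  val a = Point(1, 2)
  val b = Point(1, 2)

  // == is null-safe and delegates to equals, which a case class
  // defines structurally (field by field).
  val valueEqual: Boolean = a == b

  // eq compares references, like Java's == on objects.
  val sameReference: Boolean = a eq b

  // ne is the negation of eq.
  val differentReference: Boolean = a ne b
}
```

Here `valueEqual` is true, `sameReference` is false and `differentReference` is true, because `a` and `b` are equal by value but are separate objects. In Java the value comparison would be written `a.equals(b)`, while `a == b` would compare references.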

[spark-src-core] given SPARK_PRINT_LAUNCH_COMMAND to output more details

With both the environment variable 'SPARK_PRINT_LAUNCH_COMMAND' and --verbose enabled, the Spark command output by spark-submit.sh is more detailed: hadoop@GZsw04:~/spark/spark-1.4.1-bin-hado...

2016-04-19 12:19:13 194

[Original] scala- type conversion (classOf, asInstanceOf, isInstanceOf)

ref: Converting a scala object to Class; Scala type casting

2016-04-14 15:28:50 166

[spark-src] 1-overview

what is "Apache Spark™ is a fast and general engine for large-scale data processing....Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk." stated in apache spa...

2016-03-20 16:20:31 161

[spark-src]-source reading

Based on: spark-1.4.1, hadoop-2.5.2. Working from simple to complex and following the working flow, we proceed in these steps: 1. [spark-src] spark overview 2. [spark-src] core from ...

2016-03-20 15:06:25 139

free talk-intelligent period prediction is undergoing

google AlphaGo vs Lee on 'the game of go' VS Back in Guangzhou, back into the fray, cheers

2016-03-16 10:15:15 104
