
Original: kudu - impala

Set the number of partitions equal to the number of cores in the cluster.
Kudu optimizes SQL when the predicate uses =, <=, <, >, >=, BETWEEN, or IN, but not for !=, LIKE, or any other predicate type.

2018-08-12 15:54:49 257

Repost: cassandra, hbase and mongodb

cassandra: AP system, weak consistency, write-heavy workloads, high availability, good for online use
hbase: CP system, good support for batch analytics, not typical for online use
mongodb,...

2018-07-27 11:36:58 247

Translation: Java offheap memory

- MappedByteBuffer
public void copyFile(String filename, String srcpath, String destpath) throws IOException {
    File source = new File(srcpath + "/" + filename);
    File dest = new File(destpath + "/...

2018-07-18 09:11:15 335
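
The copyFile excerpt above is cut off, so the following is only a minimal sketch of a file copy through a MappedByteBuffer, not the author's code. The class name MappedCopy and the Path-based signature are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not taken from the original post.
final class MappedCopy {
    // Copies src to dest by mapping the source file into off-heap memory.
    static long copy(Path src, Path dest) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dest,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            // One mapping is limited to Integer.MAX_VALUE bytes;
            // larger files would need several chunked mappings.
            MappedByteBuffer buf = in.map(FileChannel.MapMode.READ_ONLY, 0, size);
            long written = 0;
            while (buf.hasRemaining()) {
                written += out.write(buf);
            }
            return written;
        }
    }
}
```

The mapped region lives outside the Java heap, so the file content is not copied into a heap byte array before being written out.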

Original: kafka

- use cases
  log collection
  message system
  user activity
  stream processing
  event sourcing
- design
  kafka broker leader: multiple brokers contend to become leader by creating an ephemeral node in zookeeper. only on...

2018-07-16 17:40:38 212

Repost: Spark Trouble Shooting and Performance Tuning

jjjjj

2018-07-11 10:32:30 145

Repost: JVM trouble shooting

- JPS, TOP and JSTACK
  jps to find java info, like class name, parameters of main, JVM arguments, pid: jps -m -l
  top to find the most CPU-bound thread: top -Hp pid
  jstack to dump thread stacks: jstac...

2018-07-10 18:01:33 216

Original: review list

devops
springboot and microservices
gmlparser, design pattern
persistable queue, java volatile, atomicity and concurrency, mockito
volatile is not atomic, happens-before, happen-after for memory visibilit...

2018-06-25 11:58:10 314
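
The note "volatile is not atomic" can be illustrated with a small sketch. The class CounterDemo and its field names are hypothetical. A volatile int guarantees visibility, but `++` is still a separate read, add and write, so concurrent increments can be lost, whereas AtomicInteger performs the increment as one atomic operation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not taken from the original post.
final class CounterDemo {
    // Visible to all threads, but ++ on it is a non-atomic read-modify-write.
    static volatile int volatileCount = 0;
    // incrementAndGet() is a single atomic operation.
    static final AtomicInteger atomicCount = new AtomicInteger();

    static void run(int threads, int perThread) throws InterruptedException {
        volatileCount = 0;
        atomicCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    volatileCount++;               // updates may be lost under contention
                    atomicCount.incrementAndGet(); // never loses an update
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join(); // join gives a happens-before edge, so final values are visible here
        }
    }
}
```

After `CounterDemo.run(4, 10000)`, atomicCount is always 40000, while volatileCount may end up lower on a multi-core machine.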

Translation: Software Design

- Design Principles
  Open-closed: open for extension, closed for modification
  Liskov substitution: any subclass can be used wherever its base class is expected
  Demeter: the principle of least knowledge
  Interface segregation, p...

2018-06-01 12:19:49 336
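
As a rough illustration of the open-closed and Liskov substitution principles listed above, the sketch below uses hypothetical names (Shape, Rect, Circle, AreaCalculator) that do not come from the original post. A new shape can be added without modifying AreaCalculator, and any Shape implementation can stand in where a Shape is expected.

```java
import java.util.List;

// The abstraction that callers depend on.
interface Shape {
    double area();
}

final class Rect implements Shape {
    private final double w, h;
    Rect(double w, double h) { this.w = w; this.h = h; }
    public double area() { return w * h; }
}

// Added later as an extension; no existing class had to change.
final class Circle implements Shape {
    private final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

final class AreaCalculator {
    // Closed for modification: works for any current or future Shape.
    static double total(List<? extends Shape> shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }
}
```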

Original: submit spark code to yarn

- configure spark to submit code to remote yarn
val sparkConf = new SparkConf()
  .setAppName(s"Bulk Import $manualNbr")
  .setMaster("yarn")
  .set("spark.submit.deployMode", "client") // ...

2018-05-27 16:11:37 247

Repost: compile spark source code

Change scala version to the scala version on your machine: ./dev/change-scala-version.sh <version>
Shutdown zinc: ./build/zinc-<version>/bin/zinc -shutdown
Compile Spark: ./build/mvn -Pyarn ...

2018-05-25 18:12:01 187

Translation: LSM Log-Structured Merge-Tree

- Sequential access is better than random access -> WAL, append updates to the log
- Memstore in memory for quick lookup -> Memstore flushes data to a store file when it reaches the threshold
- Merge multiple...

2018-05-12 19:43:51 141
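
The write path described above (append to the WAL, update the memstore, flush at a threshold) can be sketched in memory as follows. TinyLsm and all of its members are hypothetical, the "files" are plain in-memory maps, and the merge (compaction) step is left out because the excerpt is truncated at that point.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory sketch of an LSM write/read path.
final class TinyLsm {
    private final int flushThreshold;
    // Stand-in for an append-only log file on disk.
    private final List<String> wal = new ArrayList<>();
    // Sorted in-memory buffer for quick lookups.
    private TreeMap<String, String> memstore = new TreeMap<>();
    // Flushed, immutable sorted runs; newest first.
    private final Deque<TreeMap<String, String>> storeFiles = new ArrayDeque<>();

    TinyLsm(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String key, String value) {
        wal.add(key + "=" + value);   // 1. sequential append to the log
        memstore.put(key, value);     // 2. update the in-memory sorted map
        if (memstore.size() >= flushThreshold) {
            flush();                  // 3. flush when the threshold is reached
        }
    }

    private void flush() {
        storeFiles.addFirst(memstore); // freeze the memstore as a new store file
        memstore = new TreeMap<>();
        wal.clear();                   // flushed entries no longer need the log
    }

    String get(String key) {
        String v = memstore.get(key);
        if (v != null) {
            return v;
        }
        for (TreeMap<String, String> run : storeFiles) { // newest run wins
            v = run.get(key);
            if (v != null) {
                return v;
            }
        }
        return null;
    }

    int storeFileCount() {
        return storeFiles.size();
    }
}
```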

Translation: B tree vs B+ tree

- B tree (key+data in every node), O(log_d(n))
  d is the degree of the tree
  h is the height of the tree, h <= log_d((n+1)/2)
  a non-leaf node has n-1 keys and n pointers, d <= n <= 2d
  all leaves are at the same height
  nodes...

2018-05-12 19:03:07 255

Translation: HBase MapReduce

- Data Locality, block placement policy: the first copy is written to the data node where the region server runs.
- TableInputFormat: divides the table at region boundaries by start row and end row
static class ...

2018-05-12 15:54:01 124

Translation: HBase Filters, Counters & Coprocessors

- Filter -> FilterBase. setFilter(filter) method on Get and Scan
- CompareFilter: operator + comparator, matched data is kept
CompareFilter(CompareOp valueCompareOp, WritableByteArrayComparable valu...

2018-05-12 12:02:29 129

Translation: HBase Region Split

- Split Policy (ConstantSizeRegionSplitPolicy, IncreasingToUpperBoundRegionSplitPolicy, SteppingSplitPolicy)
- Split Point: the first row of the center block of the biggest file of the store
- Split Workflo...

2018-05-09 17:55:27 153

Translation: HBase Concept

- Data Model: a sparse, distributed, persistent multidimensional sorted map
  (row:string, column:string, time:int64) -> string // both key and value are uninterpreted bytes
  Row
  single row read and update i...

2018-05-08 21:23:51 142

Translation: Java GC

young generation and old generation. 1 eden and 2 survivor spaces.
minor GC: mark and copy, from eden and one survivor to the other survivor
full GC: mark, sweep and compact generations
both will stop th...

2018-05-07 17:55:39 141

Translation: bloom filter

- space-efficient lookup for a fixed number of static elements.
- may have, definitely does not have
  n: number of elements
  k: number of hash functions, k = (m/n)*ln2
  m: number of bits, >= n*lg(1/E)*lg(e)
  E: expec...

2018-05-07 13:07:58 108
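
The sizing rules above can be tried out with a small sketch. The BloomFilter class below is hypothetical, and its way of deriving k bit positions from String.hashCode() is an ad hoc assumption chosen for brevity, not a recommended hash scheme. It sizes m from n and the target false-positive rate E, then picks k = (m/n)*ln2.

```java
import java.util.BitSet;

// Hypothetical sketch; the hashing scheme is an assumption for illustration.
final class BloomFilter {
    private final BitSet bits;
    private final int m; // number of bits
    private final int k; // number of hash functions

    BloomFilter(int n, double e) {
        double ln2 = Math.log(2);
        // m >= n * lg(1/E) * lg(e), which equals n * ln(1/E) / (ln 2)^2
        this.m = Math.max(1, (int) Math.ceil(n * Math.log(1.0 / e) / (ln2 * ln2)));
        // k = (m/n) * ln 2, rounded to the nearest integer
        this.k = Math.max(1, (int) Math.round((double) m / n * ln2));
        this.bits = new BitSet(m);
    }

    // Derives the i-th bit position from two values based on hashCode().
    private int index(String item, int i) {
        int h1 = item.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, m);
    }

    void add(String item) {
        for (int i = 0; i < k; i++) {
            bits.set(index(item, i));
        }
    }

    // false means "definitely not present"; true means "may be present".
    boolean mightContain(String item) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }

    int bitCount() { return m; }
    int hashCount() { return k; }
}
```

For n = 1000 and E = 0.01 this gives roughly 9.6 bits per element and 7 hash functions.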

Translation: spark - Running on Cluster

- package spark app (maven)
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.3</version>

2018-05-05 09:21:14 191

Translation: spark - Tuning and Debugging Spark

- submit application (a SparkConf object cannot be changed after SparkContext creation)
method 1
bin/spark-submit \
  --class com.example.MyApp \
  --master local[4] \
  --name "My Spark App" \
  --conf spark.ui.port=...

2018-05-04 18:42:03 144

Translation: spark - Advanced Spark Programming

- Accumulator
val blankLines = new LongAccumulator
sc.register(blankLines)
use accumulators inside transformations for debugging purposes only: because of speculative tasks, the count is not accurate. But in action, the accum...

2018-05-03 20:04:33 210

Translation: spark - Loading and Saving Data

- File Formats
  Text File
  sc.textFile, load a text file
  sc.wholeTextFiles, load multiple files (filename, entire content) under the specified dir
  JSON
  sc.textFile.map to JSON object (people.add(mapper.readValue...

2018-05-03 18:03:54 124

Translation: scala notes (7) - Advanced Type and Implicit

- advanced types
singleton type
def setTitle(title: String): this.type = { ...; this } // for subtypes
def set(obj: Title.type): this.type = { useNextArgAs = obj; this } // take object as parameter, no ...

2018-04-29 22:48:12 123

Translation: scala notes (6) - Annotation, Future and Type Parameter

- Annotation
class MyContainer[@specialized T]
def country: String @Localized
@Test(timeout = 0, expected = classOf[org.junit.Test.None])
def testSomeFeature() { ... }
Java annotation can be mixed with Sc...

2018-04-27 15:34:52 118

Translation: spark - Pair RDD (Key/Value Pairs)

- Create Pair RDD
from regular RDD by calling map function.
val pairs = lines.map(x => (x.split(" ")(0), x))
transformation on Pair RDD (data: {(1,2),(3,4),(3,6)})
reduceByKey => {(1,2), (3,10)}
grou...

2018-04-27 10:24:24 354
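
To make the reduceByKey result above concrete, here is a sketch that reproduces the same semantics with plain Java collections. It does not use the Spark API; PairOps and its reduceByKey method are hypothetical helpers that only mimic what the transformation computes on a single machine.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

// Hypothetical local emulation; this is not Spark's reduceByKey.
final class PairOps {
    // Merges all values that share a key using the given associative function.
    static <K extends Comparable<K>, V> Map<K, V> reduceByKey(
            List<? extends Map.Entry<K, V>> pairs, BinaryOperator<V> f) {
        Map<K, V> out = new TreeMap<>();
        for (Map.Entry<K, V> p : pairs) {
            // First value for a key is stored as-is; later ones are combined with f.
            out.merge(p.getKey(), p.getValue(), f);
        }
        return out;
    }
}
```

With the sample data {(1,2),(3,4),(3,6)} and `Integer::sum` as the function, the values for key 3 are combined into 10 and key 1 keeps its single value 2.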

Translation: scala notes (5) - pattern and case class

- Pattern and Case Class
ch match {
  case _ if Character.isDigit(ch) => ..
  case '+' => ...
  case _ => ...
}
prefix match { case "0" | "0x" | "0X" => ... }
case variable should be lowercase....

2018-04-26 12:08:49 96

Translation: scala notes (4) - collection

- Collection
Array is the equivalent of a Java array; it is mutable in terms of value updates, but not size
sequence
Vector is the immutable equivalent of ArrayBuffer, which is an indexed sequence with fast random acces...

2018-04-25 18:04:21 113

Translation: scala notes (3) - Files & Regular Expression, Trait, Operation and Function

- Files & Regular Expressions
read from file, url and string, remember to close source
val source = Source.fromFile("myfile.txt", "UTF-8")
val source1 = Source.fromURL("http://horstmann.com", "UTF-8...

2018-04-25 11:14:26 120

Translation: scala notes (2) - Class, Object, Package & Import and Inheritance

- Class
class Counter {
    private var value = 0 // You must initialize the field, otherwise it's abstract class.
    def increment() { value += 1 } // Methods are public by default
    def current() ...

2018-04-24 19:05:24 135

Translation: scala notes (1) - Basic, Control & Function, Array and Map & Tuple

- Basics
val greeting: String = null
val xmax, ymax = 100 // both are set
String -> StringOps // intersect, sorted...
Int -> RichInt // 1.to(10)
primitive -> Rich*
BigInt & BigDecimal // * can be...

2018-04-24 12:02:34 110

Translation: Programming with RDD

- Passing functions to Spark (be careful with references to the containing object, which needs to be serializable)
class SearchFunctions(val query: String) {
  def isMatch(s: String): Boolean = {
    s.contains(...

2018-04-23 18:51:18 81

Translation: scala type parameters

- type bounds
class Pair[T <: Comparable[T]](val first: T, val second: T) {
  def smaller = if (first.compareTo(second) < 0) first else second // compareTo
}
class Pair[T](val first: T, val seco

2018-04-23 18:23:37 647

Repost: MapReduce Features

- Counters (values are definitive only once the job has successfully completed)
  Task Counters
  Filesystem Counters
  Job Counters (maintained only in the application master, so they don't need to be sent across the network; mainly about t...

2018-04-22 19:52:21 73

Translation: MapReduce Types and Formats

- types
map: (K1, V1) → list(K2, V2)
combiner: (K2, list(V2)) → list(K2, V2)
reduce: (K2, list(V2)) → list(K3, V3)
- partition (HashPartitioner)
public abstract class Partitioner<KEY, VALUE> {
  public ...

2018-04-21 19:45:47 86
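
The Partitioner excerpt above is truncated, so the sketch below only illustrates the hash-partitioning idea in plain Java without depending on Hadoop. SimpleHashPartitioner is a hypothetical stand-in class; the expression itself follows the well-known approach of masking the sign bit before taking the modulus.

```java
// Hypothetical stand-in; it does not extend Hadoop's Partitioner.
final class SimpleHashPartitioner {
    // Maps a key to a partition in [0, numPartitions).
    static <K> int getPartition(K key, int numPartitions) {
        // Masking with Integer.MAX_VALUE clears the sign bit, so a negative
        // hashCode() can never produce a negative partition number.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

Equal keys always land in the same partition, which is what lets one reducer see every value for a given key.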

Translation: MapReduce Workflow

check output folder
calculate splits
application master gets progress and completion reports from tasks. it also requests containers for map tasks and reduce tasks. it starts container by the nodemanage...

2018-04-21 16:13:32 294

Translation: MapReduce Application

- Configuration
conf.addDefaultResource, conf.addResource, configuration overridden
<property><name>fs.defaultFS</name><value>file:/// or hdfs://namenode</value></pr...

2018-04-21 11:22:59 228

Translation: Hadoop I/O

- checksum: CRC-32C, computed for every 512 bytes
  write: the last datanode of the pipeline verifies the checksum
  read: block verification on client read
  RawLocalFileSystem: to disable checksums
- compression, (default is ...

2018-04-20 15:11:40 92
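
The per-chunk checksum idea above can be sketched with the JDK's own CRC-32C implementation (java.util.zip.CRC32C, available since Java 9). ChunkChecksums and its method names are hypothetical, and this is not Hadoop's code; it only shows one checksum per fixed-size chunk and how a reader can locate a corrupt chunk.

```java
import java.util.zip.CRC32C;

// Hypothetical helper showing per-chunk CRC-32C verification.
final class ChunkChecksums {
    // Computes one CRC-32C value per chunk of bytesPerChecksum bytes.
    static long[] checksums(byte[] data, int bytesPerChecksum) {
        int chunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int off = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - off);
            CRC32C crc = new CRC32C();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Returns the index of the first chunk whose checksum differs, or -1 if all match.
    static int firstCorruptChunk(byte[] data, int bytesPerChecksum, long[] expected) {
        long[] actual = checksums(data, bytesPerChecksum);
        if (actual.length != expected.length) {
            return Math.min(actual.length, expected.length);
        }
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != expected[i]) {
                return i;
            }
        }
        return -1;
    }
}
```

Keeping the chunk small means a single flipped byte invalidates only one chunk, so the reader knows which part of the block to fetch again from another replica.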

Translation: YARN (Yet Another Resource Negotiator) - Cluster Manager

- what is yarn
- Yarn application run
- Resource requests
  all requests up front (Spark) or dynamic requests (MapReduce: map task requests are made up front, but reduce task requests are dynamic)
- application lifes...

2018-04-19 17:24:24 209

Translation: HDFS

- suitable for
  very large data sizes: terabytes, petabytes
  write once, read many times
  handling node failure without noticeable interruption
- not suitable for applications that need
  low-latency data access, HBa...

2018-04-19 14:51:12 231

Original: Map

          get    containsKey   next
HashMap   O(1)   O(1)          O(h/n)

Map key to array index to bring complexity down to O(1) (constant time).
resize when the number of entries >= threshold (= table size * load fact...

2018-04-19 13:52:36 161 1
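
The key-to-index mapping and the resize threshold above can be sketched as follows. HashIndexDemo is a hypothetical class; the bit-spreading step mirrors the one used by OpenJDK 8's HashMap, and the index computation assumes the table capacity is a power of two.

```java
// Hypothetical sketch of how a hash table picks a bucket and decides when to grow.
final class HashIndexDemo {
    // Folds the high 16 bits into the low 16 bits so that small tables
    // still benefit from the upper part of the hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a key; capacity must be a power of two.
    static int indexFor(Object key, int capacity) {
        return spread(key.hashCode()) & (capacity - 1);
    }

    // Number of entries at which the table is resized.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }
}
```

With the default capacity of 16 and load factor of 0.75, the threshold is 12 entries. Lookup stays O(1) on average because computing the bucket index is a constant-time operation on the hash code.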

mongodb architecture

latest mongodb architecture guide

2018-07-27
