Translation: scala notes (7) - Advanced Type and Implicit
- advanced types
singleton type
def setTitle(title: String): this.type = { ...; this } // for subtypes
def set(obj: Title.type): this.type = { useNextArgAs = obj; this } // take object as parameter, no ...
2018-04-29 22:48:12 117
Translation: scala notes (6) - Annotation, Future and Type Parameter
- Annotation
class MyContainer[@specialized T]
def country: String @Localized
@Test(timeout = 0, expected = classOf[org.junit.Test.None])
def testSomeFeature() { ... }
Java annotation can be mixed with Sc...
2018-04-27 15:34:52 114
Translation: spark - Pair RDD (Key/Value Pairs)
- Create Pair RDD
from a regular RDD by calling the map function.
val pairs = lines.map(x => (x.split(" ")(0), x))
transformations on a Pair RDD (data: {(1,2),(3,4),(3,6)})
reduceByKey => {(1,2), (3,10)}
grou...
2018-04-27 10:24:24 344
Translation: scala notes (5) - pattern and case class
- Pattern and Case Class
ch match { case _ if Character.isDigit(ch) => ... case '+' => ... case _ => ... }
prefix match { case "0" | "0x" | "0X" => ... }
variables in case patterns should start with a lowercase letter....
2018-04-26 12:08:49 94
Translation: scala notes (4) - collection
- Collection
Array is the equivalent of a Java array; its elements can be updated, but its size is fixed.
sequence
Vector is the immutable equivalent of ArrayBuffer, an indexed sequence with fast random acces...
2018-04-25 18:04:21 111
Translation: scala notes (3) - Files & Regular Expression, Trait, Operation and Function
- Files & Regular Expressions
read from a file, URL or string; remember to close the source
val source = Source.fromFile("myfile.txt", "UTF-8")
val source1 = Source.fromURL("http://horstmann.com", "UTF-8...
2018-04-25 11:14:26 116
Translation: scala notes (2) - Class, Object, Package & Import and Inheritance
- Class
class Counter {
private var value = 0 // You must initialize the field; otherwise the class must be declared abstract.
def increment() { value += 1 } // Methods are public by default
def current() ...
2018-04-24 19:05:24 130
Translation: scala notes (1) - Basic, Control & Function, Array and Map & Tuple
- Basics
val greeting: String = null
val xmax, ymax = 100 // both are set
String -> StringOps // intersect, sorted...
Int -> RichInt // 1.to(10)
primitive -> Rich*
BigInt & BigDecimal // * can be...
2018-04-24 12:02:34 109
Translation: Programming with RDD
- Passing functions to Spark (be careful with references to the containing object, which then needs to be serializable)
class SearchFunctions(val query: String) {
def isMatch(s: String): Boolean = {
s.contains(...
2018-04-23 18:51:18 80
Translation: scala type parameters
- type bounds
class Pair[T <: Comparable[T]](val first: T, val second: T) {
def smaller = if (first.compareTo(second) < 0) first else second // compareTo
}
class Pair[T](val first: T, val seco
2018-04-23 18:23:37 644
Repost: MapReduce Features
- Counters (values are definitive only once the job has successfully completed)
Task Counters
Filesystem Counters
Job Counters (maintained only in the application master, so they need not be sent across the network; mainly about t...
2018-04-22 19:52:21 73
Translation: MapReduce Types and Formats
- types
map: (K1, V1) → list(K2, V2)
combiner: (K2, list(V2)) → list(K2, V2)
reduce: (K2, list(V2)) → list(K3, V3)
- partition (HashPartitioner)
public abstract class Partitioner<KEY, VALUE> {
public ...
2018-04-21 19:45:47 83
Translation: MapReduce Workflow
check output folder
calculate splits
The application master receives progress and completion reports from tasks. It also requests containers for map tasks and reduce tasks. It starts containers through the nodemanage...
2018-04-21 16:13:32 290
Translation: MapReduce Application
- Configuration
conf.addDefaultResource, conf.addResource, configuration overridden
<property><name>fs.defaultFS</name><value>file:/// or hdfs://namenode</value></pr...
2018-04-21 11:22:59 225
Translation: Hadoop I/O
- checksum: CRC-32C, computed for every 512 bytes
write: the last datanode of the pipeline verifies the checksum
read: the client verifies blocks as it reads them
RawLocalFileSystem: used to disable checksums
- compression, (default is ...
2018-04-20 15:11:40 90
Translation: YARN (Yet Another Resource Negotiator) - Cluster Manager
- what is YARN
- YARN application run
- Resource requests
all requests up front (Spark) or dynamic requests (MapReduce: map task requests are made up front, but reduce task requests are dynamic)
- application lifes...
2018-04-19 17:24:24 203
Translation: HDFS
- suitable for: very large data sizes (terabytes, petabytes); write once, read many times; handling node failure without noticeable interruption
- not suitable for applications that need: low-latency data access, HBa...
2018-04-19 14:51:12 228
Original: Map
          get    containsKey    next
HashMap   O(1)   O(1)           O(h/n)
Map the key to an array index to bring the complexity down to O(1) (constant time).
resize when the number of entries >= threshold (= table size * load fact...
2018-04-19 13:52:36 158 1
Translation: hadoop general
- schema on read vs RDBMS schema on write
- data flow
- splits: the split size tends to match the HDFS block size, so that a split does not span two nodes, which would hurt data locality
data locality: same node ->...
2018-04-18 11:20:51 141
Translation: Maximum Flow
- Residual Graph
- Max-flow min-cut
- Ford-Fulkerson, O(E*f), where f is the value of the maximum flow
- Edmonds-Karp, O(VE^2), uses BFS to find the augmenting path, p, in the residual graph Gf. p is shortest
2018-04-17 19:23:34 119
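For the Edmonds-Karp note above, a minimal sketch may help show how BFS picks each augmenting path in the residual graph. The function name `edmonds_karp` and the dict-of-dicts graph layout are assumptions made for this illustration, and the sketch assumes the source and sink are distinct.

```python
from collections import deque


def edmonds_karp(capacity, s, t):
    """Return the max-flow value from s to t (assumes s != t).

    capacity is a hypothetical dict-of-dicts: capacity[u][v] = edge capacity.
    """
    # Residual capacities; reverse edges start at 0 unless they already exist.
    residual = {u: dict(adj) for u, adj in capacity.items()}
    for u, adj in capacity.items():
        for v in adj:
            residual.setdefault(v, {}).setdefault(u, 0)

    flow = 0
    while True:
        # BFS finds an augmenting path with the fewest edges.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in residual.get(u, {}).items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path left

        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push the bottleneck along the path and update reverse edges.
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
```

Calling `edmonds_karp({"s": {"a": 3, "b": 2}, "a": {"t": 2}, "b": {"t": 3}}, "s", "t")` pushes 2 units along each of the two paths.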
Translation: shortest path of all vertex pairs
- repeated squaring, O(n^3*lgn): a path of at most m edges is a path of at most m-1 edges plus one more edge; the shortest path is the choice of k with minimum weight in L(m)
l(m, i, j) = min(l(m-1, i, k) + w(k, j)), 1<=k<=n, L(1) = W
L...
2018-04-16 17:44:20 164
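The recurrence in the entry above can be sketched as a min-plus matrix product that is squared repeatedly. The names `extend` and `all_pairs_shortest_paths` are chosen for this illustration, and the sketch assumes W has 0 on the diagonal, infinity where there is no edge, and no negative cycles.

```python
INF = float("inf")


def extend(L, W):
    """Min-plus product: result[i][j] = min over k of L[i][k] + W[k][j]."""
    n = len(L)
    return [[min(L[i][k] + W[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def all_pairs_shortest_paths(W):
    """Repeated squaring: L(2m) = L(m) * L(m), stopping once m >= n - 1."""
    n = len(W)
    L, m = W, 1
    while m < n - 1:
        L = extend(L, L)
        m *= 2
    return L
```

Each `extend` call costs O(n^3) and the loop runs about lg(n) times, which matches the O(n^3*lgn) bound quoted in the entry.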
Translation: single-source shortest path in directed graph
two common operations: vertex initialisation and relax
vertex initialisation -> set the key of each vertex to the max value; set its predecessor to null; set the key of s (start vertex) to 0;
relax(u,v,w) -> set ke...
2018-04-16 16:04:24 202
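The two operations named in the entry above might look like the following sketch. The relax step shown is the textbook one (the excerpt is cut off before its definition), and the Bellman-Ford driver is only one possible caller, added here as an assumption to make the example runnable.

```python
import math


def initialize_single_source(vertices, s):
    """Set every key to infinity and every predecessor to None; key[s] = 0."""
    key = {v: math.inf for v in vertices}
    pred = {v: None for v in vertices}
    key[s] = 0
    return key, pred


def relax(u, v, w, key, pred):
    """If going through u shortens the path to v, record the improvement."""
    if key[u] + w < key[v]:
        key[v] = key[u] + w
        pred[v] = u


def bellman_ford(vertices, edges, s):
    """Assumed driver: relax every edge |V| - 1 times (no negative-cycle check)."""
    key, pred = initialize_single_source(vertices, s)
    for _ in range(len(vertices) - 1):
        for u, v, w in edges:
            relax(u, v, w, key, pred)
    return key, pred
```

Dijkstra's algorithm would use the same two helpers and differ only in the order in which edges are relaxed.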
Translation: minimum spanning tree (connect all vertices in an undirected graph)
both are greedy algorithms
- Kruskal's algorithm, O(ElgV)
make a set for each vertex; each set contains only that vertex
sort G.E in non-descending order by edge weight.
for each edge(u,v), if they are not in sam...
2018-04-13 16:27:35 108
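The Kruskal steps listed above can be sketched with a small union-find structure. The final step (merge the two sets and keep the edge) is the standard continuation and is assumed here because the excerpt is truncated; the `(u, v, w)` edge layout is also an assumption.

```python
def kruskal(vertices, edges):
    """Return MST edges as (u, v, w); edges is an iterable of (u, v, w) tuples."""
    parent = {v: v for v in vertices}  # make-set: each vertex is its own set

    def find(v):
        # Walk up to the set representative, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    mst = []
    # Process edges in non-descending order of weight.
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:  # endpoints are in different sets, so no cycle is formed
            parent[ru] = rv
            mst.append((u, v, w))
    return mst
```

For a connected graph the result holds exactly |V| - 1 edges.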
Translation: graph
- G = (V, E)
- BFS: uses a FIFO queue to process all adjacent vertices. It generates only one tree. It can be used to find the shortest (unweighted) path between two vertices.
- DFS, process it with start time ...
2018-04-13 12:28:29 136
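As a small illustration of the BFS note above, the sketch below computes unweighted shortest-path distances with a FIFO queue. The adjacency-list dict and the name `bfs_distances` are assumptions made for this example.

```python
from collections import deque


def bfs_distances(graph, source):
    """Return {vertex: number of edges on a shortest path from source}."""
    dist = {source: 0}
    queue = deque([source])  # FIFO queue of discovered, unprocessed vertices
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in dist:  # first discovery gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist
```

Vertices that cannot be reached from the source simply do not appear in the result.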
Translation: quick sort
- quick sort
- randomized quick sort: select an index at random and exchange its value with r, the last element of the array.
2018-04-13 10:45:25 111
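The randomized variant described above can be sketched as follows. The partition scheme (Lomuto, pivot at the end) is an assumption, since the entry only describes the random swap with the last position r.

```python
import random


def partition(A, p, r):
    """Partition A[p..r] around the pivot A[r]; return the pivot's final index."""
    pivot = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1


def randomized_quicksort(A, p=0, r=None):
    """Sort A[p..r] in place, choosing each pivot at random."""
    if r is None:
        r = len(A) - 1
    if p < r:
        k = random.randint(p, r)   # random index in [p, r]
        A[k], A[r] = A[r], A[k]    # move the chosen value to the end
        q = partition(A, p, r)
        randomized_quicksort(A, p, q - 1)
        randomized_quicksort(A, q + 1, r)
```

Picking the pivot at random makes the O(n^2) worst case unlikely for any fixed input, such as an already sorted array.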
Translation: heap, heap sort, priority queue
- heap backed by an array (1-based indexing). left child is A[2i], right child is A[2i+1]. a parent's value is no smaller than its children's, assuming a max-heap
- heap core algorithm: max-heapify(A, i), O(lg(n))
- build heap from ...
2018-04-13 10:34:30 106
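A minimal sketch of max-heapify follows. Python lists are 0-based, so the children of index i are taken as 2i+1 and 2i+2 rather than the 2i and 2i+1 used in the entry above; the `build_max_heap` helper is the usual bottom-up construction and is an assumption, since the excerpt is cut off there.

```python
def max_heapify(A, i, heap_size):
    """Sift A[i] down until the subtree rooted at i is a max-heap. O(lg n)."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2  # 0-based child indices
        largest = i
        if left < heap_size and A[left] > A[largest]:
            largest = left
        if right < heap_size and A[right] > A[largest]:
            largest = right
        if largest == i:
            return
        A[i], A[largest] = A[largest], A[i]
        i = largest


def build_max_heap(A):
    """Assumed bottom-up build: heapify every non-leaf node, last one first."""
    for i in range(len(A) // 2 - 1, -1, -1):
        max_heapify(A, i, len(A))
```

After `build_max_heap(A)`, the largest element sits at `A[0]`, which is what heap sort and a max-priority queue both rely on.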
Original: greedy algorithm
- optimal substructure
- recursive algorithm
- only one sub-problem is generated after a selection
differences between greedy algorithm and dynamic programming
- generate different numbe
2018-04-12 16:23:14 174
Original: Dynamic Programming
- optimal substructure
- make a selection to generate different sub-problems; choose an optimal selection
- overlapping sub-problems
- solve sub-problem
2018-04-12 16:15:16 113
Original: counting-sort
prerequisite: the n element values are between 0 and k.
sort:
an auxiliary array holds how many elements are less than the current value i, where i belongs to [0, k]
2018-04-11 17:01:08 115
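The auxiliary-array idea above can be sketched like this. The names `count` and `less` are chosen for this illustration; `less[i]` holds the number of elements strictly smaller than i, which is exactly the output position of the first element equal to i.

```python
def counting_sort(A, k):
    """Return a sorted copy of A, assuming every value is an int in [0, k]."""
    count = [0] * (k + 1)
    for x in A:
        count[x] += 1

    # less[i] = how many elements are strictly less than i.
    less = [0] * (k + 1)
    for i in range(1, k + 1):
        less[i] = less[i - 1] + count[i - 1]

    out = [None] * len(A)
    for x in A:              # left-to-right keeps equal values in input order
        out[less[x]] = x
        less[x] += 1
    return out
```

The running time is O(n + k), so the approach only pays off when k is not much larger than n.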
Repost: red-black tree
- bh(x) <= 2lg(n+1)
- insert
case 1: z and z.p are red. z.uncle's color is red -> set colors of z.p and z.uncle to black. set z.p.p to red. make z.p.p the new z
case 2: z and z.p are red. z.uncle's ...
2018-04-08 12:16:24 90
Original: perfect hashing on static data set
- prime number, p, such that every static element, k, in the data set belongs to [0, p-1]
- random number, a, belongs to [1, p-1]
- random number, b, belongs to [0, p-1]
- the output table has length m
we ha...
2018-04-07 13:42:49 168
Original: insertion sort with binary search applied on ordered array
insertion sort with binary search applied on the ordered array => O(nlogn) comparisons (element shifts remain O(n^2) in the worst case)
2018-04-03 09:39:01 258
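A short sketch of the idea above: the already sorted prefix is searched with binary search, then the tail of the prefix is shifted to make room. The standard-library `bisect_right` is used for the search; the function name `binary_insertion_sort` is an assumption.

```python
from bisect import bisect_right


def binary_insertion_sort(A):
    """Sort A in place and return it."""
    for i in range(1, len(A)):
        x = A[i]
        # Binary search in the sorted prefix A[0:i]; bisect_right keeps it stable.
        pos = bisect_right(A, x, 0, i)
        # Shift A[pos:i] one slot to the right, then drop x into the gap.
        A[pos + 1:i + 1] = A[pos:i]
        A[pos] = x
    return A
```

The search costs O(log n) comparisons per element, but the shift is still linear in the worst case, so the overall running time stays quadratic on arrays.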