[Original] Miscellaneous notes
For HBase and modern hardware, the number would be more like 10 to 1,000 regions per server, but each between 1 GB and 2 GB in size.
Setting the timestamp for the deletes has the effect of only matching the exact cell, that is, the matching column and valu
2021-05-03 18:51:16 104
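[Original] MapReduce optimization
The Impala process needs 16 GB of memory, the HBase Region Server process needs 12-16 GB of RAM, and the operating system needs 6.4 GB. yarn.nodemanager.resource.memory-mb defaults to 8 GB, and each node can be configured differently. Allocating one to two containers per disk and per CPU core is a good idea; for example, a node with 12 disks and 12 CPU cores would be allocated 20 containers. As for how to set yarn.scheduler.minimum-allocation-mb: in general, if a node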
2021-04-24 05:30:59 222
[Original] scala--collections
val x = Vector(1, 2, 3)
x.sum // 6
x.filter(_ > 1) // Vector(2, 3)
x.map(_ * 2) // Vector(2, 4, 6)
x.takeWhile(_ < 3) // Vector(1, 2)
immutable: ArraySeq is an IndexedSeq backed by an array; random lookup is fast, while write operations take linear time.
LazyList belongs to L
2021-04-06 21:02:11 87
[Original] scala--methods
def isBetween(a: Int, x: Int, y: Int) = a >= x && a <= y
The return type can be omitted, but it is best to write it.
val convert1to5 = new PartialFunction[Int, String] {
  val nums = Array("one", "two", "three", "four", "five")
  def apply(i: Int) = nums(i-1)
  def isDefinedAt(i: Int) = i &g
2021-04-03 19:00:43 155
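[Original] scala--classes and enums
Primary constructor:
Visibility                                  Accessor?   Mutator?
var                                         Yes         Yes
val                                         Yes         No
no var or val                               No          No
Adding the private keyword to var or val    No          No
Inside the class itself, the fields are always accessible regardless of the rules above (only var fields can be reassigned).
Case class constructor parameters are val by default when nothing is specified, for example:
case class Person(name: String)
scala> val p = Person("Dale Cooper")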
2021-04-01 10:44:29 90
[Original] scala--control structures
for i <- List(1, 2, 3) do println(i)
for
  i <- 1 to 10
  if i > 3
  if i < 6
do
  println(i)
val listOfInts = for
  i <- 1 to 10
  if i > 3
  if i < 6
yield
  i * 10
def isTrue(a: Matchable): Boolean = a match
  case 0 | "" => false
  case _ => tru
2021-03-31 13:09:03 84
[Original] scala--Numbers and Dates
"1".toByte // Byte = 1
"1".toShort // Short = 1
"1".toInt // Int = 1
"1".toLong // Long = 1
"1".toFloat // Float = 1.0
"1".toDouble // Double = 1.0
"hello!".toInt // java.lang.NumberFormatException
"1".toByteOption // Option[Byte
2021-03-30 00:02:51 203
[Original] scala--String
val h = "Hello"
f"'$h%s'" // 'Hello'
f"'$h%10s'" // '     Hello'
f"'$h%-10s'" // 'Hello     '
2021-03-28 12:15:46 575
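[Original] spark-rdd
In most scenarios you should use DataFrames; the Structured APIs or SQL perform better. Some advanced operations do not fit there, however, and in those cases the low-level RDD API can be used. There are three scenarios for the low-level API: you want to control the physical distribution of the data yourself, you need to maintain legacy code written with RDDs, or you want to do custom shared-variable manipulation. Used correctly, a Partitioner can greatly improve performance and stability; custom partitioning to control the physical distribution of data is an important reason to use RDDs. All Spark code is compiled down to RDDs. The entry point of the RDD API is spark.sparkContext, where spark is a SparkSession instance. RDD code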
2021-03-07 02:26:42 92 1
[Original] Java 7 high concurrency 6
Creating a thread executor
import java.util.Date;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
public class Task implements Runnable {
    private Date initDate;
    private String na
2021-02-27 09:21:23 91
[Original] Java 7 high concurrency 5
Phaser
import java.io.File;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
public class FileSearch implements Runnable {
    private String initPath;
    p
2021-02-26 05:33:34 148
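[Original] SQL optimization
Indexes. A database index works on much the same principle as the index of a dictionary. If there is not much data, querying without an index is faster; otherwise querying with an index is faster. An index is stored separately from its table, and its main purpose is to improve data retrieval performance. Creating or dropping an index does not affect the data in the table, although dropping an index may degrade retrieval performance. Indexes take up disk space. Like a dictionary index, the entries in an index are sorted alphabetically, and each entry points to one or more corresponding locations in the data table. Indexes are used in WHERE clauses; if no index matches the WHERE condition, a full table scan is performed. Because the entries in an index are ordered, the database looks up index data with something like a binary search, and then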
2021-02-25 22:53:00 200
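As a rough illustration of the ordered-lookup idea in the SQL optimization excerpt above, here is a minimal Java sketch. The class name, keys, and row positions are all hypothetical, and a sorted array with binary search only stands in for the index; real database indexes are usually tree structures rather than flat arrays.

```java
import java.util.Arrays;

final class IndexLookupSketch {
    // Hypothetical "index": keys kept in sorted order, each paired with a row position.
    static final String[] KEYS = {"alice", "bob", "carol", "dave", "erin"};
    static final int[] ROW_POSITIONS = {40, 7, 19, 3, 25};

    // Ordered lookup: binary search over the sorted keys, O(log n) comparisons.
    static int lookupWithIndex(String key) {
        int i = Arrays.binarySearch(KEYS, key);
        return i >= 0 ? ROW_POSITIONS[i] : -1;
    }

    // Without an index every row has to be inspected: O(n) comparisons.
    static int fullScan(String[] rows, String key) {
        for (int pos = 0; pos < rows.length; pos++) {
            if (rows[pos].equals(key)) {
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(lookupWithIndex("carol"));
        System.out.println(fullScan(new String[] {"dave", "carol"}, "carol"));
    }
}
```

The two methods return the same kind of answer (a row position, or -1 when the key is absent); the difference is only in how many comparisons are needed as the data grows.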
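[Original] Java 7 high concurrency 4
Semaphore
A semaphore has a counter that controls the number of available resources. counter > 0 means a resource can be accessed; counter = 0 means no resource is available and the thread blocks. semaphore.acquire() decrements the counter by 1, and semaphore.release() increments it by 1.
public class PrintQueue {
    private final Semaphore semaphore;
    public PrintQueue(){
        semaphore=new Semaphor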
2021-02-23 18:18:08 68
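The counter behavior described in the semaphore excerpt above can be observed directly. The sketch below is not the PrintQueue class from the post (which is cut off); the class and method names are hypothetical, and it uses tryAcquire() for the second attempt so that the example reports the empty counter instead of blocking.

```java
import java.util.concurrent.Semaphore;

final class SemaphoreCounterSketch {
    // Returns {permits before, permits while held, second attempt succeeded (1/0), permits after}.
    static int[] permitsTrace() {
        Semaphore semaphore = new Semaphore(1);          // counter starts at 1
        int before = semaphore.availablePermits();
        try {
            semaphore.acquire();                         // counter - 1
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int held = semaphore.availablePermits();
        boolean second = semaphore.tryAcquire();         // counter is 0: fails without blocking
        semaphore.release();                             // counter + 1
        int after = semaphore.availablePermits();
        return new int[] {before, held, second ? 1 : 0, after};
    }

    public static void main(String[] args) {
        for (int value : permitsTrace()) {
            System.out.println(value);
        }
    }
}
```

A plain acquire() at the point of the second attempt would block the calling thread until another thread called release().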
[Original] Java 7 high concurrency 3
Generating threads with a thread factory class
public class MyThreadFactory implements ThreadFactory {
    private int counter;
    private String name;
    private List stats;
    public MyThreadFactory(String name){
        counter=0;
        this.name=name;
        stats=new ArrayList<>();
    }
    @Override
    public Th
2021-02-23 03:34:00 246
[Original] Java 7 high concurrency 2
Sleeping and resuming a thread
public class FileClock implements Runnable {
    @Override
    public void run() {
        for (int i = 0; i < 10; i++) {
            System.out.printf("%s\n", new Date());
            try {
                TimeUnit.SECONDS.sleep(1);
            } catch (InterruptedException e) {
                System.out.printf("The FileCloc
2021-02-22 17:44:07 111 1
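[Original] Java 7 high concurrency 1
Two ways to create a thread:
1. Extend the Thread class and override the run() method.
2. Create a class that implements the Runnable interface, then create a Thread object with an instance of that class as its argument.
Example:
public class Calculator implements Runnable {
    private int number;
    public Calculator(int number) {
        this.number=number;
    }
    @Override
    public void run() {
        for (int i=1; i<=10; i++){
            S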
2021-02-22 16:11:18 114
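The two approaches listed in the thread-creation excerpt above can be put side by side. This is a minimal sketch with hypothetical class names, not the Calculator example from the post (which is cut off); each thread just records which approach created it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class ThreadCreationSketch {
    // Way 1: extend Thread and override run().
    static final class ExtendedThread extends Thread {
        private final List<String> log;
        ExtendedThread(List<String> log) { this.log = log; }
        @Override public void run() { log.add("extends Thread"); }
    }

    // Way 2: implement Runnable and hand the instance to a Thread.
    static final class RunnableTask implements Runnable {
        private final List<String> log;
        RunnableTask(List<String> log) { this.log = log; }
        @Override public void run() { log.add("implements Runnable"); }
    }

    // Starts one thread of each kind, waits for both, and returns what they recorded.
    static List<String> runBoth() {
        List<String> log = new CopyOnWriteArrayList<>(); // safe for concurrent adds
        Thread first = new ExtendedThread(log);
        Thread second = new Thread(new RunnableTask(log));
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runBoth());
    }
}
```

The order of the two entries in the returned list is not guaranteed, since the threads run concurrently.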
[Original] kafka Exactly Once Semantics
Chapter 5. Exactly Once Semantics
A note for Early Release readers
With Early Release ebooks, you get books in their earliest form (the authors' raw and unedited content as they write) so you can take advantage of these technologies long before the official
2021-02-18 22:09:46 335
[Original] Adding nodes to the cluster
1. ssh to the Namenode and edit the file hdfs-site.xml to add the following property to it:
<property>
  <name>dfs.hosts</name>
  <value>/home/hadoop/includes</value>
  <final>true</final>
</property>
2. Make sure the
2021-02-18 18:14:09 184
[Original] Hadoop syntax
Decommissioning nodes
1. ssh to the Namenode and edit the file hdfs-site.xml by adding the following property to it:
<property>
  <name>dfs.hosts.exclude</name>
  <value>/home/hadoop/excludes</value>
  <final>true</final>
&l
2021-02-18 18:05:23 259
[Original] Java regular expression syntax
The expression [01] matches a 0 or a 1 character.
/[0-7]/ The range 0-7 is equivalent to 01234567.
0123456789abcdefABCDEF is equivalent to 0-9a-fA-F.
/[^0-9a-fA-F]/
\d stands for a digit ([0-9]).
\D stands for a nondigit character ([^0-9]).
\s stands for
2021-02-18 15:21:39 50
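The character classes in the regular expression excerpt above can be tried out with java.util.regex. This is a small sketch; the class and helper names are hypothetical, and each pattern is matched against a single-character input so that a whole-string match tests exactly one character class.

```java
import java.util.regex.Pattern;

final class CharClassSketch {
    // True when the whole input matches the regex (here: a single character class).
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[01]", "1"));          // 1 is in the set {0, 1}
        System.out.println(matches("[0-7]", "8"));         // 8 is outside the range 0-7
        System.out.println(matches("[^0-9a-fA-F]", "g"));  // g is not a hex digit
        System.out.println(matches("\\d", "5"));           // \d is a digit
        System.out.println(matches("\\D", "x"));           // \D is a nondigit
    }
}
```

Note that in Java source the backslash itself has to be escaped, so the regex \d is written as the string literal "\\d".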