November 2017_anoperA

Original YARN--Core components

Component relationship diagram (from the Hadoop YARN home page): Container At the fundamental level, a container is a collection of physical resources such as RAM, CPU cores, and disks on a single node. There can be multiple containe

2017-11-30 11:22:21 415

The fundamental idea of YARN is to split up the functionalities of resource management and job scheduling/monitoring into separate daemons. The idea is to have a global ResourceManager (RM) and per-app

2017-11-29 16:58:46 168

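Original MR--Annotated MapReduce Program: MaxTemperature

This program was developed against Hadoop 2.7.4 and is runnable. The weather data can be obtained from NCDC or from the website of the book Hadoop: The Definitive Guide.
public class MaxTemperature {
    public static class MaxTemperatureMapper extends Mapper<Object, Text, Text, IntWritable> {
        //weather temperature 9999,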

2017-11-28 21:19:53 283

Original MR--Annotated MapReduce Program: WordCount

This program was developed against Hadoop 2.7.4 and is runnable.
public class WordCount {
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable>{
        private final static IntWritable one = new IntWritable(1

2017-11-28 21:15:13 371
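
The WordCount job excerpted above follows the usual map, shuffle, reduce flow. As a rough illustration only, here is a plain-Python sketch of that flow. It does not use the Hadoop API, and the names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical. It assumes the mapper emits a `(word, 1)` pair per token, which the `one = new IntWritable(1` field in the excerpt suggests but the truncated text does not confirm.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated token (assumed mapper behavior)."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle(pairs))
```

Calling `word_count(["a b a", "b c"])` counts `a` and `b` twice and `c` once. A real job distributes these phases across nodes; the sketch only mirrors the data flow.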

Original Process Synchronization 03--Peterson's Algorithm

Introduction to Peterson's Algorithm (Wikipedia) Peterson’s algorithm (or Peterson’s solution) is a concurrent programming algorithm for mutual exclusion that allows two or more processes to share a single-use resour

2017-11-28 21:01:44 1994

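Original Process Synchronization 02--The Critical Section Problem

The critical section problem: Suppose there are n processes (P1, P2, …, Pn). As shown in the figure, each process has a critical section in which it may modify shared data (variables, files, database tables, and so on). The requirement is that while any one process is executing in its critical section, no other process may execute in its own. Formal definition: Mutual exclusion: When a thread is executing in its critical section, n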

2017-11-28 15:26:21 2456
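
To make the mutual exclusion requirement above concrete, here is a minimal Python sketch in which several threads update one shared counter and a `threading.Lock` guards the critical section. Using a lock is one common way to enforce mutual exclusion and is an assumption here, not necessarily the approach the post goes on to describe. The names `worker` and `run` are hypothetical.

```python
import threading

counter = 0                  # shared variable modified inside the critical section
lock = threading.Lock()      # guards the critical section


def worker(iterations):
    """Increment the shared counter, entering the critical section once per step."""
    global counter
    for _ in range(iterations):
        with lock:           # entry section: wait until no other thread holds the lock
            counter += 1     # critical section
        # leaving the with-block is the exit section: the lock is released


def run(num_threads=4, iterations=10000):
    """Start the workers, wait for them all, and return the final counter value."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(iterations,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Because only one thread at a time can hold the lock, `run(4, 1000)` returns exactly 4000.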

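Original Process Synchronization--The Producer-consumer Problem

From Wikipedia: The producer-consumer problem, also known as the bounded-buffer problem, is a classic example of a multi-thread synchronization problem. It describes what can go wrong at run time when two threads, the so-called "producer" and "consumer", share a fixed-size buffer. The producer's main job is to generate a certain amount of data, put it into the buffer, and then repeat. Meanwhile, the consumer also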

2017-11-28 14:29:09 3513
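
The bounded-buffer setup described above can be sketched in Python with the standard library's `queue.Queue`, whose `put` blocks when the buffer is full and whose `get` blocks when it is empty. This is a minimal sketch with one producer and one consumer. The function name `run_producer_consumer`, the sentinel convention and the parameter names are assumptions made for illustration.

```python
import queue
import threading


def run_producer_consumer(n_items=10, buffer_size=3):
    """Pass n_items integers from a producer to a consumer through a fixed-size buffer."""
    buffer = queue.Queue(maxsize=buffer_size)   # the shared bounded buffer
    consumed = []
    sentinel = None                             # marks the end of production

    def producer():
        for item in range(n_items):
            buffer.put(item)                    # blocks while the buffer is full
        buffer.put(sentinel)

    def consumer():
        while True:
            item = buffer.get()                 # blocks while the buffer is empty
            if item is sentinel:
                break
            consumed.append(item)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed
```

With a single producer and a single consumer the queue preserves order, so the consumer sees the items in the order they were produced.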

Original MR--InputSplit

InputSplit Ref InputSplit represents the data to be processed by an individual Mapper. Typically, it presents a byte-oriented view on the input and is the responsibility of RecordReader

2017-11-27 15:28:14 292

Original MR--RecordReader

RecordReader Ref The record reader breaks the data into key/value pairs for input to the Mapper. Summary:

2017-11-27 15:25:14 261

Original MR--InputFormat

Hadoop 2.7.4 InputFormat summary:

2017-11-27 15:22:05 424

Original A Collection of IntelliJ IDEA Tips

Using Ctrl + J: Live templates let you insert frequently-used or custom code constructs into your source code file quickly, efficiently, and accurately. With live templates you can insert code quickly, for example

2017-11-27 14:53:03 355

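Original MR--Configuration

Hadoop 2.7.4 API–Configuration. Features: 1. Provides methods for configuring Hadoop parameters. 2. Parameters can be configured in three ways: resource files (Resources), final parameters (Final Parameters), and variable expansion (Variable Expansion). 3. Parameters are read with conf.get([parameterName]). 4. Deprecated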

2017-11-25 20:19:54 810

Original MR-Job

/*The job submitter's view of the Job. It allows the user to configure the job, submit it, control its execution, and query the state. The set methods only work until the job is submitted, afterwards

2017-11-25 20:17:14 351

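Original Java-Serialize and Deserialize

Java provides a mechanism called object serialization, which writes an object's type information and the data it carries to a file so that another program can read it back unchanged. An object can therefore be persisted at site A and then sent to site B for use. The relevant classes are ObjectInputStream and ObjectOutputStream. For objects of a class to be persistable, the class must implement Serializ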

2017-11-25 20:04:30 554

Original Stackoverflow--Can you explain the concept of streams?

The word “stream” has been chosen because it represents (in real life) a very similar meaning to what we want to convey when we use it. Let’s forget about the backing store for a little, and start think

2017-11-25 18:05:57 259

Original MR--Text

This class stores text using standard UTF8 encoding. It provides methods to serialize, deserialize, and compare texts at byte level. The type of length is integer and is serialized using zero-compresse

2017-11-25 17:40:41 247

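Original Developing MapReduce with IDEA + Hadoop 2.7.4

Environment: 1. IDEA 2016 2. Hadoop 2.7.4. Because Hadoop is large, I added it directly as a local dependency. For smaller jars, I chose to use the Maven repository: pom.xml <dependencies>  <dep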

2017-11-24 17:08:18 687

Original Dynamic Programming 01--Rod Cutting (2)

Top-down traversal with memoization
def memoized_cut_rod_aux(p, n, r):
    ''' Carries a revenue table to reduce repeated computation '''
    if r[n-1]>=0:
        return [r[n-1], r]
    if n==0:
        return [0, r]
    else:
        q = 0
        for i in

2017-11-24 14:01:24 278
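
Since the excerpt above is cut off, here is a self-contained sketch of top-down rod cutting with a memo table, in the spirit of the post but not a reconstruction of the author's code. The name `memoized_cut_rod` and the convention that `prices[i]` is the price of a rod of length `i + 1` are assumptions. The sketch checks the zero-length base case before consulting the table.

```python
def memoized_cut_rod(prices, n):
    """Return the maximum revenue for a rod of length n.

    prices[i] is the price of a piece of length i + 1 (assumed convention),
    so prices must contain at least n entries.
    """
    memo = [-1] * (n + 1)          # memo[k] < 0 means "not computed yet"

    def best(length):
        if length == 0:            # base case: nothing left to sell
            return 0
        if memo[length] >= 0:      # reuse a previously computed revenue
            return memo[length]
        # Try every length for the first piece and solve the remainder recursively.
        q = max(prices[i - 1] + best(length - i) for i in range(1, length + 1))
        memo[length] = q
        return q

    return best(n)
```

With the memo table each sub-length is solved once, so the running time is quadratic in `n` rather than exponential as in the plain recursive version.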

Original Dynamic Programming 01--Rod Cutting (1)

# -*- coding: utf-8 -*-
import random
def getp(n):
    ''' Build a price table of length n '''
    p = [0 for i in range(n)]
    for i in range(n):
        p[i] = random.randint(1, n)
    return p
def cut_rod(p, n)

2017-11-24 13:36:32 264

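Original Scala Programming 04--Arrays

//Instantiate with a type parameter: an Int array of length 3
val nums = new Array[Int](3)
//Assignment uses parentheses, not square brackets
nums(0) = 1
//Explicitly declared type
val greetings:Array[String] = new Array[String](3)
//Automatically inferred type
val strs = Array("1", "Hello")
val ints = Array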

2017-11-23 13:37:39 235

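Original Scala Programming 03--Functions

Structure of a Scala method: Scala method example: Summary: 1. In Java, a function's return type is called the "return type"; in Scala it is called the "result type". 2. In Java, a function that returns nothing is declared "void"; in Scala it is "Unit". In fact, Scala maps Java's void to Unit. 3. The literal form of a function definition differs from Java's.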

2017-11-23 12:37:17 173

Original Scala Programming 02--Variables

val s = "Hello World"
val i = 2
val f = 3.2
var msg = "Hello, Smith"
msg = "Hi, Smith"
Summary: 1. A var can be reassigned; a val cannot. Prefer val and avoid var where possible. 2. Scala can infer types automatically.

2017-11-23 12:32:08 159

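Original Setting Up a Scala + IDEA Development Environment

What do you need? 1. The Scala plugin for IDEA, mainly for editing Scala code. 2. A Scala installation, for running Scala. 1. Install Scala: go to the Scala home page and download Scala for Windows, not any other version. Scala download page. After downloading and installing, type scala in cmd:
> scala
Welcome to Scala 2.12.4 (Java HotSpo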

2017-11-22 20:44:46 3788 1

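Original MySQL Stored Procedures (1)--Introduction

MySQL has supported stored procedures only since version 5.0. Advantages of stored procedures: Performance: a procedure is compiled once and stored on the MySQL server, so later calls only need to pass parameters, which reduces network traffic. MySQL's implementation differs slightly: it caches stored procedures per connection, which lowers the transfer cost because a call needs only its parameters. Reusability: if several applications need the same stored procedure, they can call it directly instead of each writing its own routine. Security: an administrator can grant appropriate privileges to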

2017-11-22 10:36:26 209

Original Common Spark Errors

scala> val lines = sc.textFile("README.md")
scala> lines.count()
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://192.168.32.132:9000/user/walter/README.md
    at org.apa

2017-11-20 15:31:34 804

Original Creating an External Partitioned Table in Hive

drop table if exists employee;
create external table employee (
    name string,
    salary float,
    subordinates array<string>,
    deductions map<string, float>,
    add

2017-11-17 10:42:36 676

Original Vim Tips

Entering control characters

2017-11-11 00:08:35 171

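Original Dynamic Programming--Matrix-Chain Multiplication

1. Multiplying two matrices
def matrix_multipy(A, B):
    ''' The product is an [A.rows, B.cols] matrix, equivalent to applying B.cols linear weightings to the A.rows vectors '''
    if not A.shape[1]==B.shape[0]: # the vectors in A must have the same dimension as the vectors in B
        print("error!")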

2017-11-10 11:13:54 199
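
The two-matrix multiplication step above, including its dimension check, can be sketched without any array library. The version below works on plain nested lists rather than the `.shape`-bearing arrays the excerpt uses, and it raises an error instead of printing one. Both choices, along with the name `matrix_multiply`, are assumptions made for this illustration.

```python
def matrix_multiply(A, B):
    """Multiply two matrices given as lists of rows.

    The result has len(A) rows and len(B[0]) columns; the number of columns
    of A must equal the number of rows of B.
    """
    if not A or not B or len(A[0]) != len(B):
        raise ValueError("incompatible dimensions: columns of A must equal rows of B")
    rows, inner, cols = len(A), len(B), len(B[0])
    # Entry (i, j) is the dot product of row i of A with column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]
```

Each product costs `rows * inner * cols` scalar multiplications, which is the quantity a matrix-chain ordering tries to minimize across the whole chain.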

Original SecureCRT Keyboard Shortcuts

Ctrl+A|Alt+A  move the cursor to the start of the line
Ctrl+E  move the cursor to the end of the line

2017-11-10 11:10:52 546

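Original source and export in the Shell

comm.sh
AGE=18
export AGE
Run the source command in the current shell:
$ source comm.sh
$ echo $AGE
This prints the value of AGE. source executes the contents of comm.sh in the current shell, much like a macro definition in C. By default, a child shell of the current shell does not receive a copy of its variables. export marks variables so that when this shell spawns a new child shell, they are copied into the new shell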

2017-11-02 13:26:37 331

a1158375969's column