November 2013 (maixia24)


Original: Hadoop job.class source code analysis

The waitForCompletion() method submits the job to the cluster and waits for it to finish. /** * Submit the job to the cluster and wait for it to finish. * @param verbose print the progress to the user * @return true if the job succeeded

2013-11-28 23:03:44 840
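The submit-then-wait behavior described for waitForCompletion() can be sketched in plain Java without any Hadoop dependency. This is only an illustration under stated assumptions: `SketchJob`, its fields, and the 10 ms polling interval are hypothetical stand-ins, not Hadoop's `Job` API.

```java
/** Hypothetical stand-in for a cluster job; this is not Hadoop's Job class. */
final class SketchJob {
    private final int pollsUntilDone; // assumption: the job completes after this many status checks
    private final boolean succeeds;   // assumption: the outcome is fixed up front for the demo
    private int polls;
    private boolean submitted;

    SketchJob(int pollsUntilDone, boolean succeeds) {
        this.pollsUntilDone = pollsUntilDone;
        this.succeeds = succeeds;
    }

    void submit() {
        submitted = true;
    }

    /** Each call counts as one status check against the pretend cluster. */
    boolean isComplete() {
        return submitted && ++polls >= pollsUntilDone;
    }

    boolean isSuccessful() {
        return succeeds;
    }

    /** Submits the job if needed, blocks until it finishes, and returns true on success. */
    boolean waitForCompletion(boolean verbose) {
        if (!submitted) {
            submit();
        }
        while (!isComplete()) {
            if (verbose) {
                System.out.println("job still running after " + polls + " checks");
            }
            try {
                Thread.sleep(10); // assumed polling interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return isSuccessful();
    }

    public static void main(String[] args) {
        System.out.println(new SketchJob(3, true).waitForCompletion(true));
    }
}
```

The boolean return value lets a driver program turn the job outcome into a process exit code.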

Original: An advanced look at Java serialization, Hadoop serialization, and Avro

http://www.ibm.com/developerworks/cn/java/j-lo-serial/
http://blog.csdn.net/yakihappy/article/details/3979373

2013-11-27 15:09:14 617

Repost: The interesting foo and bar

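http://www.cnblogs.com/felicity/archive/2010/11/30/1892100.html Whether it is Java, C++, or PHP, you keep running into examples like foo = bar. These two words are annoying: foo is not in the dictionary, the dictionary meaning of bar has no obvious link to programming, and it is even less clear how the mysterious foo relates to it. With the meaning unclear, learning always leaves a nagging feeling, so I looked them up and finally uncovered these two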

2013-11-27 13:46:36 799

Original: Lifecycle of Mapper and Reducer

/** * Called once at the start of the task. */ protected void setup(Context context) throws IOException, InterruptedException { // NOTHING } /*

2013-11-26 16:03:09 881
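The setup() hook quoted in the Mapper lifecycle entry is one part of a fixed call order: setup once, map once per record, cleanup once. A minimal sketch of that order in plain Java follows; `LifecycleSketch` and its `events` list are hypothetical names for illustration and do not reproduce Hadoop's `Mapper` class or its `Context`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical mini-mapper showing the setup / map / cleanup order; not Hadoop's Mapper. */
class LifecycleSketch {
    final List<String> events = new ArrayList<>();

    /** Called once before any record is processed. */
    protected void setup() {
        events.add("setup");
    }

    /** Called once for every input record. */
    protected void map(String record) {
        events.add("map:" + record);
    }

    /** Called once after the last record, even if map() threw. */
    protected void cleanup() {
        events.add("cleanup");
    }

    /** Drives the lifecycle in the order described above. */
    public void run(List<String> records) {
        setup();
        try {
            for (String record : records) {
                map(record);
            }
        } finally {
            cleanup();
        }
    }

    public static void main(String[] args) {
        LifecycleSketch mapper = new LifecycleSketch();
        mapper.run(Arrays.asList("a", "b"));
        System.out.println(mapper.events); // [setup, map:a, map:b, cleanup]
    }
}
```

Subclasses would override setup() and cleanup() for per-task work such as opening and closing a resource, and leave run() alone.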

Original: WritableComparator and RawComparator

IntWritable implements the WritableComparable interface, which is a subinterface of both Writable and java.lang.Comparable. package org.apache.hadoop.io; public interface WritableComparable<T> extends Writable, Comparable<T> {} Comparison of data types in MapReduce is

2013-11-26 14:49:27 3322
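The point of a raw comparator, as named in the WritableComparator / RawComparator entry, is to compare keys in their serialized byte form without first building objects. The sketch below shows the idea for 4-byte big-endian ints in plain Java. `RawIntCompareSketch` and its method names are hypothetical; the code does not use Hadoop's `RawComparator` interface.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: compare serialized ints in place, without deserializing into objects. */
final class RawIntCompareSketch {

    /** Serializes an int as 4 big-endian bytes, the same layout DataOutput.writeInt produces. */
    static byte[] serialize(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    /** Reads a big-endian int directly from the byte array at the given offset. */
    static int readInt(byte[] bytes, int start) {
        return ((bytes[start] & 0xff) << 24)
                | ((bytes[start + 1] & 0xff) << 16)
                | ((bytes[start + 2] & 0xff) << 8)
                | (bytes[start + 3] & 0xff);
    }

    /** Compares two serialized ints located at offsets s1 and s2 of their buffers. */
    static int compare(byte[] b1, int s1, byte[] b2, int s2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    public static void main(String[] args) {
        byte[] a = serialize(-5);
        byte[] b = serialize(3);
        System.out.println(compare(a, 0, b, 0) < 0); // true
    }
}
```

Decoding the int before comparing matters here: a plain unsigned byte-by-byte comparison would order negative values after positive ones.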

Original: MultipleInputs source code analysis

/** * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file * distributed with this work for additional information * regarding

2013-11-25 14:22:40 1337

Original: Key/value format of DBInputFormat

DBInputFormat emits LongWritables containing the record number as key and DBWritables as value. Key type: LongWritable (the record number). Value type: DBWritable.

2013-11-25 13:56:38 810

Original: Differences between Hadoop serialization and Java serialization

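1. Java's serialization mechanism stores each class's information, such as the class name, the first time an object of that class appears; later objects of the same class carry a reference to that class. This wastes space.
2. When thousands upon thousands of objects (a figure of speech; there are usually far more) must be deserialized, Java serialization cannot reuse objects: every deserialization constructs a new object. In Hadoop's serialization mechanism, the deserialized object can be reused.
3. Implementing the mechanism yourself gives better control.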

2013-11-24 14:54:46 1862
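The space overhead and object reuse described in the serialization comparison can be observed with the JDK alone. The sketch below is an illustration under assumptions: `SerializationSketch` and `IntRecord` are hypothetical names, and `IntRecord` only imitates the write/readFields style of a Hadoop Writable rather than implementing Hadoop's interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationSketch {

    /** Hypothetical Writable-style record: writes only its field and can be refilled in place. */
    static final class IntRecord implements Serializable {
        int value;

        void write(DataOutputStream out) throws IOException {
            out.writeInt(value);
        }

        void readFields(DataInputStream in) throws IOException {
            value = in.readInt();
        }
    }

    /** Serializes one record with java.io serialization, which also writes class metadata. */
    static byte[] javaSerialize(int value) {
        IntRecord record = new IntRecord();
        record.value = value;
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(record);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Serializes records field by field: exactly 4 bytes per int, no class metadata. */
    static byte[] compactSerialize(int... values) {
        IntRecord record = new IntRecord();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int v : values) {
                record.value = v;
                record.write(out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Deserializes every record into the same reused instance and sums the values. */
    static long sumReusingOneObject(byte[] data) {
        IntRecord reused = new IntRecord();
        long sum = 0;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                reused.readFields(in); // overwrites the fields; no new object is allocated
                sum += reused.value;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("java.io bytes: " + javaSerialize(7).length);
        System.out.println("compact bytes: " + compactSerialize(7).length);
        System.out.println("sum: " + sumReusingOneObject(compactSerialize(1, 2, 3)));
    }
}
```

The trade-off is that the compact form carries no type information, so the reader must already know which record type to expect.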

Original: DistributedCache source code analysis

/** * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file * distributed with this work for additional information * regarding

2013-11-14 15:06:44 1885

Original: A study of the HBase Filter interface source code

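Classes that implement this interface: FilterBase, FilterList /** * @xiao Filter interface for rows and columns, applied directly on the RegionServer * Interface for row and column filters directly applied within the regionserver. * @xiao The following call sequence is expected * A filter can expect th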

2013-11-12 13:08:06 2025

Repost: How secondary sort works

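In the map phase, the InputFormat set with job.setInputFormatClass divides the input data set into small chunks called splits, and the InputFormat also provides a RecordReader implementation. This example uses TextInputFormat, whose RecordReader uses the byte offset of each line of text as the key and the text of that line as the value. This is why the custom Map takes that key/value pair as its input. It then calls the custom Map's m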

2013-11-12 09:57:09 1784
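The excerpt for the secondary sort entry stops before the sorting itself, so here is a minimal sketch of the general idea in plain Java: sort records on a composite (first, second) key, then group on the first part only, so each group's values arrive already ordered. `SecondarySortSketch` and `sortAndGroup` are hypothetical names; this is not the reposted article's code and it does not use Hadoop's partitioner or grouping comparator APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory illustration of secondary sort. */
final class SecondarySortSketch {

    /** Sorts (first, second) pairs by first then second, then groups the seconds by first. */
    static Map<Integer, List<Integer>> sortAndGroup(List<int[]> pairs) {
        List<int[]> sorted = new ArrayList<>(pairs);
        // Composite-key ordering: primary on element 0, secondary on element 1.
        sorted.sort(Comparator.<int[]>comparingInt(p -> p[0]).thenComparingInt(p -> p[1]));

        // Grouping looks only at the first element, so values keep their sorted order.
        Map<Integer, List<Integer>> groups = new LinkedHashMap<>();
        for (int[] pair : sorted) {
            groups.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<int[]> pairs = Arrays.asList(
                new int[] {20, 21}, new int[] {50, 51}, new int[] {20, 10},
                new int[] {50, 8}, new int[] {20, 53});
        System.out.println(sortAndGroup(pairs)); // {20=[10, 21, 53], 50=[8, 51]}
    }
}
```

In a real job the sort happens in the shuffle rather than in memory, which is the reason the second field has to be part of the key.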

Repost: Getting a Hive table's CREATE TABLE statement

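http://www.alidata.org/archives/939 How to get a Hive table's CREATE TABLE statement. When developing with Hive, we often need the CREATE TABLE statement (DDL) of an existing Hive table, but Hive itself does not provide such a tool. To reconstruct the DDL we have to start from the metadata. As we know, Hive's metadata is not stored on HDFS but in a traditional RDBMS, typically MySQL or Derby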

2013-11-08 19:05:31 32423 9

Original: Hive: inserting data and displaying detailed table information

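Loading data into managed tables: because Hive has no row-level insert, update, or delete operations, the only way to put data into a table is a bulk load operation. Alternatively, you can write data into the correct directory with other tools.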

2013-11-08 10:02:36 9189

Original: HBase startrow

20100809041500_abc_xyz
20100809041500_abc_xyw
20100809041500_abc_xyc
*20100809041500_abd_xyz
*20100809041500_abd_xyw
start row = "20100809041500_abd"
end row = "20100809041500_abe"
scan.se

2013-11-05 15:50:55 3380
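The start row and end row in the HBase startrow entry select a key range that includes the start row and excludes the end row. The sketch below reproduces that range selection over the example row keys with a JDK sorted set. `RowRangeSketch` and `scan` are hypothetical names, and the code does not call the HBase client API; it relies on `String` ordering, which for these ASCII keys matches lexicographic byte order.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical in-memory illustration of a start-row / stop-row range scan. */
final class RowRangeSketch {

    /** Returns the keys in [startRow, stopRow): start inclusive, stop exclusive. */
    static SortedSet<String> scan(SortedSet<String> sortedKeys, String startRow, String stopRow) {
        return sortedKeys.subSet(startRow, stopRow);
    }

    public static void main(String[] args) {
        SortedSet<String> keys = new TreeSet<>(Arrays.asList(
                "20100809041500_abc_xyz",
                "20100809041500_abc_xyw",
                "20100809041500_abc_xyc",
                "20100809041500_abd_xyz",
                "20100809041500_abd_xyw"));
        // Prints [20100809041500_abd_xyw, 20100809041500_abd_xyz]
        System.out.println(scan(keys, "20100809041500_abd", "20100809041500_abe"));
    }
}
```

Choosing the stop row as the prefix with its last character incremented ("abd" to "abe") is what turns the range into a prefix match.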

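Hive was created to simplify writing MapReduce programs. Anyone who has done data analysis with MapReduce knows that many analysis programs share essentially the same flow and differ only in business logic. In such cases a user programming interface like Hive is needed. Hive itself neither stores nor computes data; it relies entirely on HDFS and MapReduce. Tables in Hive are purely logical: just table definitions and the like, that is, table metadata. Hive is implemented with SQL because everyone is familiar with SQL and the cost of switching is low. Pig, which serves a similar purpose,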

2013-11-05 11:14:26 21806 12

Original: HBase source code analysis: RowCounter

2013-11-04 18:18:53 3064

Original: Fixing the MySQL error "Host is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'"

Log in to MySQL as root and run the flush hosts command. Log in with: mysql -uroot

2013-11-04 11:00:02 2488

Original: Hadoop open source

Companies using Hadoop: http://five.rdaili.com/sohu.com.php?u=engl3zVky1NsNDDp3t9mshjqO8Mks29GbFYUjHdHJdhvzaNy&b=3 Hadoop open source projects: Cascalog: Abstraction for data processing on Hadoop. Mrjob: Dev

2013-11-03 21:28:21 713