Hadoop
dataee
Solution consulting
Big data processing
System architecture
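Hadoop: OutputFormat
Version: 2.2. Description: OutputFormat sets the format in which an MR job writes its results: how the output is written and where it goes. In other words, it defines the write rules. Class code: the abstract class defines: public abstract RecordWriter<K, V> getRecordWriter( TaskAttemptContext context) throws IOException, I...
2014-01-23 16:02:38 · 88 reads · 0 comments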
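To make the "how and where to write" split concrete, here is a minimal plain-Java sketch of the same shape. It does not use Hadoop. In Hadoop, getRecordWriter receives a TaskAttemptContext and the returned RecordWriter does the actual writes. In this sketch a StringBuilder stands in for the output destination so it runs without a cluster. All class names (SimpleRecordWriter, SimpleOutputFormat, TabSeparatedOutputFormat) are hypothetical, not Hadoop API.

```java
// Hypothetical stand-in for the RecordWriter<K, V> contract: decides HOW one record is written.
interface SimpleRecordWriter<K, V> {
    void write(K key, V value);
    void close();
}

// Hypothetical stand-in for OutputFormat<K, V>: hands out a writer bound to a destination (WHERE).
abstract class SimpleOutputFormat<K, V> {
    abstract SimpleRecordWriter<K, V> getRecordWriter(StringBuilder sink);
}

// Writes each pair as "key<TAB>value\n" into the in-memory sink.
class TabSeparatedOutputFormat<K, V> extends SimpleOutputFormat<K, V> {
    @Override
    SimpleRecordWriter<K, V> getRecordWriter(StringBuilder sink) {
        return new SimpleRecordWriter<K, V>() {
            @Override
            public void write(K key, V value) {
                sink.append(key).append('\t').append(value).append('\n');
            }

            @Override
            public void close() {
                // Nothing to flush: the sink is an in-memory StringBuilder.
            }
        };
    }
}
```

Keeping the factory (the format) separate from the writer mirrors the split in the excerpt: the format chooses the destination, and the writer encodes each record.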
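Hadoop: wordcount performance testing
Overview: use wordcount to performance-test Hadoop, analyzing and evaluating performance as the size of the counted data grows. Version (from bin/hadoop version): Hadoop 2.3.0-cdh5.0.0. Test steps: 1. Use randomtextwriter to generate a test set of a specified size. 2. Run wordcount: nohup bin/hadoop jar share/hadoop/map...
2014-04-28 11:39:27 · 414 reads · 0 comments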
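For reference, the computation being benchmarked has two steps. The map step emits (word, 1) per token, and the reduce step sums the counts per word. The single-process Java sketch below shows only that logic. It is not the Hadoop examples class. Whitespace tokenization and the class name WordCountSketch are assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {
    // "Map" step: split each line on whitespace; "reduce" step: sum the 1s per word.
    // A TreeMap keeps the result sorted by word, like sorted reducer output.
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) continue; // a blank line splits to [""]
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```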
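Hadoop: using RandomTextWriter
Purpose: RandomTextWriter mocks datasets for stress testing and similar work. MRv1 and MRv2 use different parameter settings, but the parameters mean the same things; MRv2 is used for the explanation here. To generate 100 GB of data: bin/hadoop jar share/hadoop/mapreduce2/hadoop-mapreduce-examples-xx.jar randomtextwriter -Dmapreduce.rand...
2014-04-15 17:39:30 · 748 reads · 1 comment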
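The core idea, "keep emitting random text until a target data size is reached", can be sketched in a few lines of plain Java. This is not RandomTextWriter's internal logic. The vocabulary, the 1 to 10 words-per-line range, and the class name RandomTextSketch are all hypothetical. A fixed seed makes the output reproducible between test runs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

class RandomTextSketch {
    // Hypothetical vocabulary; the real tool's word list is not reproduced here.
    private static final String[] WORDS = {"alpha", "beta", "gamma", "delta", "epsilon"};

    // Appends random lines until at least targetBytes UTF-8 bytes have been produced.
    static String generate(long targetBytes, long seed) {
        Random rnd = new Random(seed);
        StringBuilder out = new StringBuilder();
        long bytes = 0;
        while (bytes < targetBytes) {
            int n = 1 + rnd.nextInt(10); // 1..10 words per line
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < n; i++) {
                if (i > 0) line.append(' ');
                line.append(WORDS[rnd.nextInt(WORDS.length)]);
            }
            line.append('\n');
            bytes += line.toString().getBytes(StandardCharsets.UTF_8).length;
            out.append(line);
        }
        return out.toString();
    }
}
```

The output overshoots the target by at most one line, which is why the size is checked as a lower bound.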
Hadoop: mrbench
Requirement: to test a machine's MR execution performance, use mrbench. Syntax: bin/hadoop jar share/hadoop/mapreduceX/hadoop-test-XXX.jar mrbench [-baseDir <base DFS path for output/input, default is /benchmarks/MRBench>] [-...
2014-04-14 18:58:31 · 704 reads · 0 comments
Hadoop: YARN installation and deployment
Version: Hadoop 2.3.0-cdh5.0.0. Node layout: NameNode: compute-50-04; SecondaryNameNode: compute-50-04; ResourceManager: compute-50-03; NodeManager: compute-28-16, compute-28-17, compute-50-00, compute-...
2014-08-21 15:15:44 · 371 reads · 0 comments
YARN exception "we cannot start a localDataXceiverServer because libhadoop cannot": how to fix it
Version: Hadoop CDH5.0. Problem: when deploying Hadoop YARN, a RuntimeException is reported with the following message: java.lang.RuntimeException: Although a UNIX domain socket path is configured as /var/run/hadoop-hdfs/dn._PORT, we cannot start a loc...
2014-03-13 14:34:20 · 1428 reads · 0 comments
Hive: insert into and insert overwrite
Usage of insert into and insert overwrite: INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement; INSERT OVERWRITE TABLE tablename1 [PARTITION ...
2014-06-26 16:56:26 · 2853 reads · 0 comments
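The practical difference between the two statements is what happens to data already in the target table or partition. INSERT INTO appends to it, while INSERT OVERWRITE replaces it. The in-memory Java sketch below models only that behavior. The class name InsertSemanticsSketch and the "dt=..." partition spec are hypothetical illustrations, not Hive internals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InsertSemanticsSketch {
    // Partition spec (hypothetical, e.g. "dt=2014-06-26") -> rows stored in that partition.
    private final Map<String, List<String>> partitions = new HashMap<>();

    // INSERT INTO: keep the existing rows and append the new ones.
    void insertInto(String partition, List<String> rows) {
        partitions.computeIfAbsent(partition, p -> new ArrayList<>()).addAll(rows);
    }

    // INSERT OVERWRITE: discard the existing rows of the target partition, then write.
    void insertOverwrite(String partition, List<String> rows) {
        partitions.put(partition, new ArrayList<>(rows));
    }

    List<String> rows(String partition) {
        return partitions.getOrDefault(partition, new ArrayList<>());
    }
}
```

Note that the overwrite only touches the named partition. Other partitions keep their data.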
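Hive: partitions
Overview: a Hive partition can be thought of as a partition in an RDBMS; its purpose is to avoid full table scans at query time. Usage: creating a partitioned table: create EXTERNAL TABLE IF NOT EXISTS p(id STRING COMMENT 'id') partitioned by (seq int) STORED AS SEQUENCEFILE LOCATION 'hdfs:///...
2014-06-25 17:50:37 · 306 reads · 0 comments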
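Hive stores each partition value in its own subdirectory (for example seq=1, seq=2). A query that filters on the partition column therefore reads only the matching directory, not the whole table. The Java sketch below models that pruning with a counter of rows read. It reuses the seq column from the example above. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PartitionPruningSketch {
    // Each partition value (like a seq=N directory) holds its own rows.
    private final Map<Integer, List<String>> bySeq = new TreeMap<>();
    int rowsScanned = 0;

    void add(int seq, String id) {
        bySeq.computeIfAbsent(seq, k -> new ArrayList<>()).add(id);
    }

    // WHERE seq = ?: only the matching partition is read.
    List<String> selectWhereSeq(int seq) {
        List<String> rows = bySeq.getOrDefault(seq, new ArrayList<>());
        rowsScanned += rows.size();
        return rows;
    }

    // No partition filter: every partition must be read (a full scan).
    List<String> selectAll() {
        List<String> all = new ArrayList<>();
        for (List<String> rows : bySeq.values()) {
            rowsScanned += rows.size();
            all.addAll(rows);
        }
        return all;
    }
}
```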
Hive: internal and external tables
The Hive CREATE TABLE statement is as follows: CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name [(col_name data_type [COMMENT col_comment], ...)] [COMMENT table_comment] [PARTITIONED BY (col_name data_type [COM...
2014-06-24 17:43:11 · 112 reads · 0 comments
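The EXTERNAL keyword matters most when a table is dropped. For an internal (managed) table, Hive deletes both the metadata and the data. For an external table, it deletes only the metadata and leaves the data files in place. The Java sketch below models just that rule. The map and set stand in for the metastore and the data directories, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class TableDropSketch {
    final Map<String, Boolean> metastore = new HashMap<>(); // table -> isExternal
    final Set<String> dataDirs = new HashSet<>();            // directories still holding data

    void create(String table, boolean external) {
        metastore.put(table, external);
        dataDirs.add(table);
    }

    void drop(String table) {
        Boolean external = metastore.remove(table); // metadata is always removed
        if (external == null) return;               // unknown table: nothing to do
        if (!external) {
            dataDirs.remove(table);                 // internal (managed): data is deleted too
        }                                           // external: data directory is left in place
    }
}
```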
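Hadoop: TeraSort
Background: TeraSort is widely used to benchmark Hadoop performance, so how does it work? How it works: 1. The default IdentityMapper and IdentityReducer handle the job's input and output. 2. mapreduce.job.reduces determines the number of partitions. 3. Each partition reads mapreduce.terasort.partitions.sample/mapreduce...
2014-06-24 11:17:44 · 413 reads · 0 comments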
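The reason sampling plus a fixed number of reduces yields a globally sorted result is range partitioning. Cut points are taken from a sorted sample, and each key goes to the range it falls in. Every reducer then holds a contiguous key range, so concatenating the sorted reducer outputs gives a total order. The Java sketch below is a simplified binary-search version of that idea. It is not TeraSort's actual partitioner code, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class TotalOrderSketch {
    // Pick numPartitions - 1 evenly spaced cut points from the sorted sample.
    static String[] splitPoints(List<String> sample, int numPartitions) {
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        String[] cuts = new String[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            cuts[i - 1] = sorted.get(i * sorted.size() / numPartitions);
        }
        return cuts;
    }

    // Keys below the first cut go to partition 0; keys at or above the last cut go to
    // the final partition. For distinct cut points this equals the number of cuts <= key.
    static int partition(String key, String[] cuts) {
        int pos = Arrays.binarySearch(cuts, key);
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }
}
```

With 4 partitions and a sample of "a" through "h", the cut points are "c", "e", "g", so "a" and "b" land in partition 0 and "h" in partition 3.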
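Distributed installation guide for Hadoop 2.7.2 on Ubuntu 12.04
1. System and version preparation. JDK: jdk-7u2-linux-i586. Hadoop: hadoop-2.7.0. Installation directories: /usr/local/jdk, /usr/local/hadoop. Nodes and IPs (/etc/hosts; note that the network must be restarted): 192.168.56.100 os.data0, 192.168.56.101 os.data1, 192.168.56.102 os.data...
2017-02-21 14:58:40 · 103 reads · 0 comments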