Hadoop
rocksword
Visualizing HBase Flushes And Compactions
I was looking in more detail at how HBase compactions work, and given my experience collecting metrics for Lily, and also inspired by this blog post on Lucene, I thought it would be nice to do s… (Reposted 2014-01-09 13:59:54 · 314 reads · 0 comments)
Why the days are numbered for Hadoop as we know it
Hadoop is everywhere. For better or worse, it has become synonymous with big data. In just a few years it has gone from a fringe technology to the de facto standard. Want to be big data or enterprise… (Reposted 2015-02-04 19:03:41 · 376 reads · 0 comments)
hadoop 2.2.X 配置参数说明:hdfs-site.xml
dfs.cluster.administrators = hdfs · dfs.block.access.token.enable = true · dfs.datanode.failed.volumes.tolerated = 0 · dfs.repl… (Reposted 2015-02-05 11:46:55 · 1822 reads · 0 comments)
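The run-together excerpt above is a flattened hdfs-site.xml; a sketch of how the named properties would normally appear in the file (values taken from the excerpt; the truncated dfs.repl… entry is left out):

```xml
<configuration>
  <!-- Users/groups granted HDFS administrator privileges -->
  <property>
    <name>dfs.cluster.administrators</name>
    <value>hdfs</value>
  </property>
  <!-- Require block access tokens for DataNode reads/writes -->
  <property>
    <name>dfs.block.access.token.enable</name>
    <value>true</value>
  </property>
  <!-- Volume failures a DataNode tolerates before shutting down -->
  <property>
    <name>dfs.datanode.failed.volumes.tolerated</name>
    <value>0</value>
  </property>
</configuration>
```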
Configuration Parameters: What can you just ignore
Configuring a Hadoop cluster is something akin to voodoo. There are a large number of variables in hadoop-default.xml that you can override in hadoop-site.xml. Some specify file paths on your system,… (Reposted 2015-02-06 10:47:30 · 297 reads · 0 comments)
LocalCache in hadoop MRv2 aka YARN
The old DistributedCache is deprecated in the new API of Hadoop 2.2.3. The suggested method is now Job.addCacheFile. By default, the cached file is added to a special folder on each slave node. (Reposted 2014-12-15 11:08:24 · 284 reads · 0 comments)
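A minimal sketch of the new call, assuming Hadoop 2.x on the classpath; the HDFS path and the #lookup fragment are hypothetical:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CacheFileExample {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "cache-demo");
        // Replaces the deprecated DistributedCache.addCacheFile(...).
        // The file is fetched into a local cache directory on each
        // slave node; the URI fragment (#lookup) becomes a symlink
        // name in the task's working directory.
        job.addCacheFile(new URI("hdfs:///shared/lookup.txt#lookup"));
    }
}
```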
Getting Started with ORC and HCatalog
ORC (Optimized Row Columnar) is a columnar file format optimized to improve the performance of Hive. Through the Hive metastore and HCatalog, reading, writing, and processing can also be accomplished b… (Reposted 2015-02-10 10:26:14 · 448 reads · 0 comments)
cluster_dispatcher.sh
Script (Original 2015-02-05 15:35:15 · 198 reads · 0 comments)
Determine YARN and MapReduce Memory Configuration Settings
This section describes how to configure YARN and MapReduce memory allocation settings based on the node hardware specifications. YARN takes into account all of the available compute resources on each… (Reposted 2015-03-20 13:54:46 · 198 reads · 0 comments)
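The post walks through sizing YARN containers from node hardware; a sketch of the properties involved, with hypothetical values for a node where roughly 40 GB is reserved for YARN (adjust to your hardware):

```xml
<!-- yarn-site.xml: memory YARN may hand out on this node (hypothetical 40 GB) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>40960</value>
</property>
<!-- smallest / largest single container YARN will allocate -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>40960</value>
</property>
<!-- mapred-site.xml: per-task container sizes; the JVM heap is set
     below the container limit to leave headroom -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1638m</value>
</property>
```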
Hadoop Interview Questions
Hadoop is a complex framework. Some interview questions can be really simple, like "How do you debug a performance issue or a long running job?", but difficult to answer on the spot if you are not prep… (Reposted 2015-03-20 14:34:39 · 200 reads · 0 comments)
Hadoop Performance Tuning Best Practices
I have been working on Hadoop in production for a while. Here are some of the performance tuning tips I learned from work. Many of my tasks saw performance improve by over 50% in general. Those guide li… (Reposted 2015-03-20 15:29:01 · 207 reads · 0 comments)
Apache Hadoop YARN: Avoiding 6 Time-Consuming "Gotchas"
Understanding some key differences between MR1 and MR2/YARN will make your migration much easier. Here at Cloudera, we recently finished a push to get Cloudera Enterprise 5 (containing CDH 5.0.0… (Reposted 2015-03-23 13:22:12 · 268 reads · 0 comments)
cluster_mgr.sh
Script (Original 2015-02-06 16:08:18 · 200 reads · 0 comments)
Musings on the Hadoop Craze in China: Taking a Dialectical View
At its core, Hadoop is just a storage model, one that happens to come with a computation model (MapReduce). It may feel unfamiliar at first, but Hadoop is only a foundational model. When database technology was on the rise, even FoxBase was hot property, yet history proved that what really delivered value was not the database but ERP. IBM's entire software stack barely touches applications; it is all infrastructure: DB2, WebSphere, Rational, Tivoli, Lotus,… (Reposted 2015-02-04 17:42:53 · 227 reads · 0 comments)
HOW TO FILTER RECORDS - PIG TUTORIAL EXAMPLES
Pig allows you to remove unwanted records based on a condition. The FILTER functionality is similar to the WHERE clause in SQL. The FILTER operator in Pig is used to remove unwanted records from the… (Reposted 2014-12-10 09:11:50 · 275 reads · 0 comments)
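The excerpt stops mid-sentence; a minimal FILTER sketch in the spirit of the tutorial, with a hypothetical users relation and file name:

```pig
-- Hypothetical input: comma-separated user records (name, age, city)
users  = LOAD 'users.txt' USING PigStorage(',')
         AS (name:chararray, age:int, city:chararray);

-- FILTER plays the role of SQL's WHERE: keep only matching records
adults = FILTER users BY age >= 18;

DUMP adults;
```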
Set Up Hadoop Multi-Node Cluster on CentOS 6
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. Our earlier article abo… (Reposted 2013-12-27 14:23:01 · 393 reads · 0 comments)
How to install snappy with HBase 0.96.x (and Hadoop 2.2.0..)
Almost a year ago I published some lines about Snappy installation on HBase 0.94.x. Since both Hadoop 2.2.0 and HBase 0.96.0 are now out, I have decided to install a new cluster with those 2 version… (Reposted 2014-01-10 17:34:08 · 1031 reads · 1 comment)
Hadoop FAQ
Q1: "The ServiceName: mapreduce.shuffle set in yarn.nodemanager.aux-services is invalid. The valid service name should only contain a-zA-Z0-9_ and can not start with numbers." A: Configure aux-serv… (Original 2014-05-08 17:27:57 · 5156 reads · 3 comments)
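The answer is cut off, but the standard fix for this error in Hadoop 2.x is to rename the aux-service in yarn-site.xml so it matches [a-zA-Z0-9_]+:

```xml
<!-- yarn-site.xml: the service name may only contain a-zA-Z0-9_ and
     must not start with a digit, so mapreduce.shuffle becomes
     mapreduce_shuffle -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
```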
Implementing a Custom InputFormat in Hadoop to Map Whole Files Individually
public class WholeFileTest { private static final Log logger = LogFactory.getLog(WholeFileTest.class); public static class mapper extends Mapper { protected void map(Text key, B… (Reposted 2014-05-21 12:03:55 · 348 reads · 0 comments)
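The excerpt's code is cut off; the usual shape of a whole-file InputFormat (a common pattern, not necessarily the author's exact code) marks files non-splittable and has the record reader emit each file's bytes as one record, so each map() call sees one whole file:

```java
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileInputFormat
        extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // never split a file across mappers
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new RecordReader<NullWritable, BytesWritable>() {
            private FileSplit fileSplit;
            private TaskAttemptContext ctx;
            private boolean processed = false;
            private final BytesWritable value = new BytesWritable();

            public void initialize(InputSplit s, TaskAttemptContext c) {
                fileSplit = (FileSplit) s;
                ctx = c;
            }

            public boolean nextKeyValue() throws IOException {
                if (processed) return false;
                // Read the entire file into one value
                byte[] contents = new byte[(int) fileSplit.getLength()];
                Path file = fileSplit.getPath();
                FSDataInputStream in =
                        file.getFileSystem(ctx.getConfiguration()).open(file);
                try {
                    IOUtils.readFully(in, contents, 0, contents.length);
                    value.set(contents, 0, contents.length);
                } finally {
                    IOUtils.closeStream(in);
                }
                processed = true;
                return true;
            }

            public NullWritable getCurrentKey() { return NullWritable.get(); }
            public BytesWritable getCurrentValue() { return value; }
            public float getProgress() { return processed ? 1.0f : 0.0f; }
            public void close() {}
        };
    }
}
```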
Operating HBase with the New Java API
Many of the examples online of operating HBase through the Java API use the old API. I replaced the deprecated calls with the new API. 1. Use HTableDescriptor tableDesc = new HTableDescriptor(TableName.valueOf(tableName)); instead of HTableDescriptor tableDesc = new HTabl… (Reposted 2014-05-29 14:54:06 · 2112 reads · 0 comments)
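To round out the truncated example, a minimal table-creation sketch with the 0.96-era API the post describes. The table name demo_table and family cf are placeholders; the code assumes an HBase client on the classpath and a reachable cluster:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateTableNewApi {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // New style: wrap the table name in a TableName instead of
        // passing a raw String/byte[] (deprecated in 0.96+)
        HTableDescriptor tableDesc =
                new HTableDescriptor(TableName.valueOf("demo_table"));
        tableDesc.addFamily(new HColumnDescriptor("cf"));
        admin.createTable(tableDesc);
        admin.close();
    }
}
```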
Hadoop task failed for Java heap space
Stdout logs: 12/12/02 16:31:45 INFO input.FileInputFormat: Total input paths to process : 1 | 12/12/02 16:31:45 INFO mapreduce.JobSubmitter: number of splits:13 | 12/12/02 16:31:45 WARN conf.Configuratio… (Original 2014-06-16 15:25:51 · 1342 reads · 0 comments)
How to Get the Geographic Location of an IP Address
An IP address used to mean nothing to us, other than where a machine sits in the cyber world, but it is now possible to find out where it is in the real world, using databases that are freely available.… (Reposted 2014-12-02 17:37:21 · 450 reads · 0 comments)
The Difference Between Pig and Hive
Allow me the rather idle exercise of comparing a plane with a train, since the two have no deep comparability: both are fast means of transport, but their scopes of use are entirely different. Likewise, Hive and Pig are both Hadoop projects and have much in common, yet Hive still carries a trace of a database, while Pig is essentially a tool (a scripting layer) over MapReduce. Both have their own expression languages whose aim is to simplify writing MapReduce, and for both, reading and writing data ultimately… (Reposted 2014-12-03 16:42:56 · 266 reads · 0 comments)
Hadoop Pig Loadfunc
Hadoop Pig is really quite well designed, and you can write UDFs. Almost every statistic starts by splitting the raw log and EXTRACTing the fields you want. The logs follow a basic pattern: "mac:50:A4:C8:D7:10:7D"|"build:5141bc99"|"network:mobile"|"version:2.4.1"|"id:taobao22935952431"| which is basically key, value… (Reposted 2014-12-04 16:32:30 · 325 reads · 0 comments)
When to Disable Speculative Execution
Background: below is the link from WikiMedia about what SE (speculative execution) is. In Hadoop, the following parameters control this setting, and they are true by default: mapred.… (Reposted 2015-01-29 17:51:28 · 219 reads · 0 comments)
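The parameter list is truncated at mapred.; for classic MapReduce, the two properties in question are the following, both true by default, so disabling speculative execution means setting them to false in mapred-site.xml:

```xml
<!-- mapred-site.xml: disable speculative execution (MR1-style names,
     as used in the post; both default to true) -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```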
Operate Hive by JDBC
New Java project. pom.xml (flattened excerpt): xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd" · 4.0.0 · com.an.antry · antry-hive · 0.0.1… (Original 2014-12-05 16:48:36 · 261 reads · 0 comments)
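Beyond the pom, the core of querying Hive over JDBC is a plain java.sql connection through the HiveServer2 driver; a sketch assuming hive-jdbc on the classpath and a HiveServer2 at a hypothetical localhost:10000:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        // HiveServer2 driver; host, port, database, and credentials
        // are hypothetical
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery("SHOW TABLES");
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
        conn.close();
    }
}
```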
cluster_status.sh
Script (Original 2015-02-05 16:41:42 · 231 reads · 0 comments)