Hadoop
ylzhjlinux
7 Tips for Improving MapReduce Performance
Since MapReduce and HDFS are complex distributed systems that run arbitrary user code, there’s no hard and fast set of rules to achieve optimal performance; instead, I tend to think of tuning a cl... | Original · 2014-05-15 15:32:28 · 118 views · 0 comments
Content based and collaborative filtering based recommendation and personalizati
References: https://github.com/pranab/sifarish | Original · 2015-01-21 15:53:59 · 156 views · 0 comments
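Compiling Hadoop 2.3.0 on 64-bit Ubuntu/CentOS
The libraries under ./lib/native in the official Hadoop 2.3.0 tarball are built only for 32-bit operating systems; installing on a 64-bit system reports errors and Hadoop fails to start, so it must be recompiled on 64-bit. 1. Environment: Hadoop 2.3.0, Ubuntu 12.04 64-bit 2. Follow these steps to recompile Hadoop: sudo a... | Original · 2014-03-13 17:38:49 · 92 views · 0 comments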
Hadoop 2.2.0 cluster error messages
1. When putting a local file into HDFS with #hadoop fs -put in.txt /test, there is an error message: hdfs.DFSClient: Exception in createBlockOutputStream java.net.NoRouteToHostException Solution... | Original · 2014-03-23 23:08:24 · 168 views · 0 comments
Add third-party jars to a job
When I submit a Java job (which includes some MapReduce jobs) in the Hue UI using the Oozie Editor, the third-party jars are not loaded correctly. 1. The only approach that worked for me is to build a fat jar wh... | Original · 2014-08-18 15:10:25 · 85 views · 0 comments
hadoop: data join exception
http://stackoverflow.com/questions/12956488/hadoop-nosuchmethodexception | Original · 2014-08-26 18:39:28 · 71 views · 0 comments
Hadoop: Output data to multiple directories
import java.io.IOException; import org.apache.commons.logging.Log; import org.apache.commons.logging.LogFactory; import org.apache.hadoop.io.NullWritable; import org.apache.hadoop.io.Text; im... | Original · 2014-09-01 12:47:02 · 82 views · 0 comments
Eclipse Hadoop Development Environment Setup
Environment: Ubuntu 12.04 1. Install Eclipse 2. Create a desktop shortcut for Eclipse a. Create an empty document named eclipse.xx b. Edit eclipse.xx as follows (avoid failing to open eclip... | Original · 2014-04-24 09:58:41 · 79 views · 0 comments
Hadoop: Integrating Hadoop Data with Oracle Parallel Processing
Reference: https://blogs.oracle.com/datawarehousing/entry/integrating_hadoop_data_with_o | Original · 2014-10-09 16:52:42 · 93 views · 0 comments
Hadoop: How to use two mappers to do different things
In my work, I ran into a situation where I wanted mapper A to read a file with two fields (questionId, questionTags) and output records in the format key: questionId, value: questionTags, while mapper B readi... | Original · 2014-10-10 10:30:35 · 74 views · 0 comments
The Hadoop Ecosystem Table
http://hadoopecosystemtable.github.io/ http://blog.andreamostosi.name/big-data/ https://github.com/youngwookim/awesome-hadoop | Original · 2014-11-10 15:28:24 · 89 views · 0 comments
MySQL Applier with Hadoop
MySQL Applier for Hadoop: Replication via the Hadoop Applier is implemented by connecting to the MySQL master and reading binary log events as soon as they are committed, and writing them into a fil... | Original · 2014-12-08 11:25:14 · 130 views · 0 comments
MySQL Applier For Hadoop: Real time data export from MySQL to HDFS
http://innovating-technology.blogspot.com/2013/04/mysql-hadoop-applier-part-1.html MySQL replication enables data to be replicated from one MySQL database server (the master) to one or more MySQL da... | Original · 2014-12-08 17:00:04 · 159 views · 0 comments
MySQL Applier For Hadoop: Implementation
http://innovating-technology.blogspot.com/2013/04/mysql-hadoop-applier-part-2.html This is a follow-up post, describing the implementation details of Hadoop Applier, and steps to configure and insta... | Original · 2014-12-08 17:15:34 · 193 views · 0 comments
MySQL Hadoop Applier: install and configure
1. Install and configure hadoop-2.6.0 ($HADOOP_HOME must be set). 2. Download the mysql-5.6.22.tar.gz source code from http://dev.mysql.com/downloads/mysql/ #tar xf mysql-5.6.22.tar.gz #cd mysql-5.6.... | Original · 2014-12-11 17:36:50 · 132 views · 0 comments
Is HDFS an append only file system? Then, how do people modify the files stored
HDFS is append only, yes. The short answer to your question is that, to modify any portion of a file that is already written, one must rewrite the entire file and replace the old file. "Even for a sin... | Original · 2014-12-17 17:22:14 · 90 views · 0 comments
Sqoop: truncate table prior to exporting data from HDFS
We are using Sqoop to export data from Hive to SQL Server. The new data is always appended to the existing data in SQL Server. Is it possible to truncate the SQL Server table via Sqoop before... | Original · 2015-01-06 17:18:02 · 531 views · 0 comments
Real-time Clickstream Analytics using Flume, Avro, Kite Morphlines and Impala
http://techkites.blogspot.com/2014/06/real-time-clickstream-analytics-using.html | Original · 2014-12-30 14:16:20 · 75 views · 0 comments
TopK problem in Hadoop
Some example code here: https://github.com/adamjshook/mapreducepatterns/tree/master/MRDP/src/main/java/mrdp https://github.com/adamjshook/mapreducepatterns/blob/maste... | Original · 2014-05-19 18:08:44 · 64 views · 0 comments
Number of Maps and Reduces
The number of map tasks for a given job is driven by the number of input splits and not by the mapred.map.tasks parameter. For each input split a map task is spawned. So, over the lifetime of a mapr... | Original · 2014-05-20 09:45:01 · 194 views · 0 comments
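As a rough illustration of the split-driven map count described in this excerpt, here is a minimal sketch in plain Java with no Hadoop dependency. It assumes a single non-empty input file and a fixed split size, and approximates the number of splits as the file size divided by the split size, rounded up. The class and method names are hypothetical, and real Hadoop input formats apply additional rules (for example, for the final partial split), so actual counts can differ.

```java
public class SplitCountSketch {
    // Hypothetical helper: one map task per input split, with the split count
    // approximated as ceil(fileSizeBytes / splitSizeBytes).
    // Simplification: empty files and Hadoop's handling of a small trailing
    // split are not modeled here.
    static long estimateMapTasks(long fileSizeBytes, long splitSizeBytes) {
        if (fileSizeBytes <= 0 || splitSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (fileSizeBytes + splitSizeBytes - 1) / splitSizeBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 1 GB file with 128 MB splits.
        System.out.println(estimateMapTasks(1024 * mb, 128 * mb));
        // A file slightly larger than one split.
        System.out.println(estimateMapTasks(130 * mb, 128 * mb));
    }
}
```

The point of the sketch is only that the map count follows from the input size and split size, not from a requested task count.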
Difference between 0 reducers and the identity reducer
0 reducers means the reduce step will be skipped and the mapper output will be the final output. Identity reducer means that shuffling/sorting will still take place. If you do not need sorting of map results -... | Original · 2014-05-20 15:38:36 · 97 views · 0 comments
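The distinction in this excerpt can be sketched without Hadoop. The following plain Java simulation is an illustration only, not Hadoop code: it assumes a single reducer, models the shuffle/sort phase with a sorted map, and uses hypothetical class and method names. With zero reducers the map output is written in emission order; with an identity reducer the same records come out grouped and sorted by key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReducerModeSketch {
    // Zero reducers: map output becomes the final output, in emission order.
    static List<String> mapOnly(List<String[]> mapOutput) {
        List<String> out = new ArrayList<>();
        for (String[] kv : mapOutput) {
            out.add(kv[0] + "\t" + kv[1]);
        }
        return out;
    }

    // Identity reducer: records still pass through shuffle/sort, modeled here
    // by a TreeMap that groups values under sorted keys. Values keep emission
    // order within a key in this model; Hadoop does not promise that.
    static List<String> identityReduce(List<String[]> mapOutput) {
        TreeMap<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : mapOutput) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            for (String v : e.getValue()) {
                out.add(e.getKey() + "\t" + v); // emit each record unchanged
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> mapOutput = Arrays.asList(
                new String[] {"b", "1"},
                new String[] {"a", "2"},
                new String[] {"b", "3"});
        System.out.println(mapOnly(mapOutput));
        System.out.println(identityReduce(mapOutput));
    }
}
```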
Chain MapReduce Jobs
References: http://stackoverflow.com/questions/2499585/chaining-multiple-mapreduce-jobs-in-hadoop https://developer.yahoo.com/hadoop/tutorial/module4.html#chaining ht... | Original · 2014-05-20 18:16:20 · 80 views · 0 comments
Solutions to common errors
1. When I create a Hive table in Hue, errors occur. Solution: #hadoop dfsadmin -safemode leave http://www.linkedin.com/groups/Creating-table-in-Hive-getting-4547204.S.225243871 2. error... | Original · 2014-05-26 11:09:19 · 88 views · 0 comments
Making Hadoop MapReduce Work with a Redis Cluster
Redis is a very cool open-source key-value store that can add instant value to your Hadoop installation. Since keys can contain strings, hashes, lists, sets and sorted sets, Redis can be used a... | Original · 2014-05-28 15:18:49 · 107 views · 0 comments
Hadoop: Configuration 1
hadoop-env.sh: JAVA_HOME must be set on the namenode and secondary namenodes, or start-dfs.sh will fail with errors. | Original · 2014-06-12 11:45:07 · 52 views · 0 comments
Hadoop 2.2.0 cluster install guide
Installing a Hadoop 2.2.0 cluster with 3 nodes (one for namenode/resourcemanager and secondary namenode, while the other two nodes are for datanode/nodemanager) 1. IP assignments 192.168.122.1 ... | Original · 2014-02-05 01:56:35 · 86 views · 0 comments
HDFS: API Introduction
References: http://blog.csdn.net/lastsweetop/article/details/9001467 | Original · 2014-06-17 15:27:31 · 76 views · 0 comments
Hadoop: Data Join
Reduce-side joining / repartitioned sort-merge join. Note: DataJoinReducerBase, on the other hand, is the workhorse of the datajoin package, and it simplifies our programming by performing a fu... | Original · 2014-06-30 15:12:12 · 110 views · 0 comments
Hadoop: High-Quality Blogs
http://www.cnblogs.com/zhangchaoyang/articles/2647905.html http://blog.pureisle.net/archives/1618.html http://www.csdn.net/article/2014-01-01/2817984-13-tools-let-hadoop-fly http://blog.mortar... | Original · 2014-07-01 15:01:29 · 119 views · 0 comments
In-Memory Hadoop Accelerator
https://gridgaintech.wordpress.com/2013/11/07/hadoop-100x-faster-how-we-did-it/ Almost two years ago, Dmitriy and I stood in front of a white board at GridGain’s office thinking: “How can we deli... | Original · 2014-12-19 15:02:29 · 230 views · 0 comments
data replication from different databases
tungsten-replicator-3.0.0-524-src | Original · 2014-12-22 10:22:15 · 114 views · 0 comments
open replicator
http://blog.csdn.net/menergy/article/details/17583823 | Original · 2014-12-22 20:35:00 · 152 views · 0 comments
Data ETL tools for the Hadoop ecosystem: Morphlines
When I use it, there is an error: java.lang.NoClassDefFoundError: org/kitesdk/morphline/api/MorphlineCompilationException at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Cla... | Original · 2014-12-25 11:39:45 · 228 views · 0 comments
Flume source using mysql-replication-listener to copy data from MySQL in real time
https://bitbucket.org/winebarrel/mysql-replication-listener http://flume.apache.org/FlumeUserGuide.html#a-simple-example https://www.cyberagent.co.jp/recruit/techreport/report/id=7474 https://do... | Original · 2014-12-18 11:46:06 · 122 views · 0 comments