Hadoop on nitrous.io

Prepare SSH

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

chmod 600 ~/.ssh/authorized_keys
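
Before starting the daemons, it can help to confirm that the key pair and authorized_keys ended up where the commands above put them, with the expected mode. The following is a minimal sketch, not part of Hadoop or OpenSSH: the function name check_ssh_setup is made up for illustration, and it assumes GNU stat (as found on Linux).

```shell
#!/bin/sh
# Hypothetical helper: sanity-check the files created by the SSH steps above.
# Takes the .ssh directory as its only argument.
check_ssh_setup() {
    dir=$1
    status=0
    # All three files should exist.
    for f in id_dsa id_dsa.pub authorized_keys; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            status=1
        fi
    done
    # authorized_keys should be readable and writable by the owner only (600).
    if [ -f "$dir/authorized_keys" ]; then
        mode=$(stat -c '%a' "$dir/authorized_keys")   # GNU stat syntax
        if [ "$mode" != "600" ]; then
            echo "bad mode $mode on $dir/authorized_keys (expected 600)"
            status=1
        fi
    fi
    # The public key line must appear verbatim in authorized_keys.
    if [ -f "$dir/id_dsa.pub" ] && [ -f "$dir/authorized_keys" ]; then
        if ! grep -qxF -e "$(cat "$dir/id_dsa.pub")" "$dir/authorized_keys"; then
            echo "public key not found in $dir/authorized_keys"
            status=1
        fi
    fi
    [ "$status" -eq 0 ] && echo "ok"
    return "$status"
}

# Example call (commented out so that sourcing this file has no side effects):
# check_ssh_setup "$HOME/.ssh"
```

If the check prints "ok", running ssh localhost should log in without asking for a passphrase.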

Configuration

vim /home/action/software/hadoop-1.2.1/conf/hadoop-env.sh

In hadoop-env.sh, set JAVA_HOME to the installed JDK:

export JAVA_HOME=/usr/lib/jvm/java-7-oracle
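
A wrong JAVA_HOME usually only shows up once the start scripts fail, so a quick check beforehand can save a round trip. This is a small sketch; check_java_home is a hypothetical name, and it only tests that bin/java exists and is executable under the given directory.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if DIR looks like a Java installation,
# that is, DIR/bin/java exists and is executable.
check_java_home() {
    if [ -x "$1/bin/java" ]; then
        echo "JAVA_HOME ok: $1"
    else
        echo "no executable bin/java under: $1" >&2
        return 1
    fi
}

# Example call (commented out; the path is the one used in this post):
# check_java_home /usr/lib/jvm/java-7-oracle
```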

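Run the following from the Hadoop directory (/home/action/software/hadoop-1.2.1) once the three configuration files listed below are in place. The daemons must be running before any "hadoop fs" command is issued, so start them first.

bin/hadoop namenode -format

bin/start-all.sh

bin/hadoop fs -mkdir input
bin/hadoop fs -put conf/*.xml input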

bin/hadoop jar hadoop-examples-1.2.1.jar grep input output 'dfs[a-z.]+'
bin/hadoop fs -rmr output

bin/hadoop jar hadoop-examples-1.2.1.jar wordcount input output
bin/hadoop fs -rmr output

bin/stop-all.sh
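
To get a feel for what the grep and wordcount examples above compute, the same counts can be approximated on the local filesystem with standard tools. This is only a rough stand-in, not how Hadoop implements the jobs: local_grep and local_wordcount are hypothetical names, grep -o is a GNU/BSD extension, and the output uses the "count value" layout of uniq -c rather than Hadoop's tab-separated files.

```shell
#!/bin/sh
# local_grep REGEX FILE...
# Count every distinct string matching REGEX, most frequent first,
# which is roughly what the "grep" example job reports.
local_grep() {
    regex=$1
    shift
    grep -ohE -e "$regex" "$@" | sort | uniq -c | sort -k1,1nr -k2
}

# local_wordcount FILE...
# Count whitespace-separated words, roughly what "wordcount" reports.
local_wordcount() {
    cat "$@" | tr -s '[:space:]' '\n' | sed '/^$/d' | sort | uniq -c
}

# Example calls (commented out; conf/ is the local Hadoop conf directory):
# local_grep 'dfs[a-z.]+' conf/*.xml
# local_wordcount conf/*.xml
```

Comparing this output with bin/hadoop fs -cat output/* before the output directory is removed is a simple way to check that the job ran over the files you expected.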

core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/action/tmp</value>
  </property>
</configuration>

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
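
Instead of editing each file by hand, the three files above could be written in one step. The sketch below copies the values from this post verbatim; write_pseudo_conf is a hypothetical helper name, and it overwrites any existing files of the same name in the target directory, so back up the originals first.

```shell
#!/bin/sh
# Hypothetical helper: write the three single-node config files into DIR.
# All values are the ones shown in this post.
write_pseudo_conf() {
    conf_dir=$1
    mkdir -p "$conf_dir" || return 1

    cat > "$conf_dir/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/action/tmp</value>
  </property>
</configuration>
EOF

    cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

    cat > "$conf_dir/mapred-site.xml" <<'EOF'
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
}

# Example call (commented out; the path is the install directory used above):
# write_pseudo_conf /home/action/software/hadoop-1.2.1/conf
```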

"hadoop.tmp.dir" must be set in core-site.xml; otherwise Hadoop throws an error.

Admin interfaces

http://localhost:50030/ - Hadoop admin interface (JobTracker)
http://localhost:50060/ - Hadoop TaskTracker status
http://localhost:50070/ - Hadoop DFS status (NameNode)

Reposted from: https://my.oschina.net/l1z2g9/blog/168100
