Hadoop Learning Notes - Day 4 - Pig Environment Setup

Installing and configuring Pig 0.12.0

1. Download pig-0.12.0.

2. Extract the archive directly and configure the environment variables:
export JAVA_HOME=/usr/java/jdk1.7.0_45
export HADOOP_HOME=/home/hdpuser/hadoop-2.2.0
export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$JAVA_HOME/jre/bin:/home/hdpuser/pig-0.12.0/bin:/home/hdpuser/apache-ant-1.9.2/bin
export PIG_HADOOP_VERSION=23
3. Run pig -x local to start Pig in local mode.
4. A simple load-file operation produced the following error:
WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
5. After consulting various resources, I took the following actions (some of which may have been unnecessary):
    a. Following RELEASE_NOTES.txt, I executed these steps:
1. Download pig-0.12.0.tar.gz
2. Unpack the file: tar -xzvf pig-0.12.0.tar.gz
3. Move into the installation directory: cd pig-0.12.0
4. To run pig without Hadoop cluster, execute the command below. This will
take you into an interactive shell called grunt that allows you to navigate
the local file system and execute Pig commands against the local files
    bin/pig -x local
5. To run on your Hadoop cluster, you need to set PIG_CLASSPATH environment
variable to point to the directory with your hadoop-site.xml file and then run
pig. The commands below will take you into an interactive shell called grunt
that allows you to navigate Hadoop DFS and execute Pig commands against it
export PIG_CLASSPATH=/hadoop/conf
    bin/pig
6. To build your own version of pig.jar run
    ant
7. To run unit tests run
    ant test
8. To build jar file with available user defined functions run commands below.
    cd contrib/piggybank/java
    ant
9. To build the tutorial:
    cd tutorial
    ant
10. To run tutorial follow instructions in http://wiki.apache.org/pig/PigTutorial
      b. The problem still persisted, however. I then came across the following suggestion online:
      ant clean jar-withouthadoop -Dhadoopversion=23. I was still unsure what this value should be for my Hadoop 2.2.0 environment (I had seen 18 used for Hadoop 0.18 and 20 for 0.20), but I tried it anyway, and the problem was solved. The rebuild is sketched below.
       What I still do not know is whether step (a) above actually had to be executed.
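
For reference, the rebuild I ran looks like the following (a minimal sketch; the Pig source directory matches the installation path used above, and -Dhadoopversion=23 is simply the value that worked here for Hadoop 2.2.0):
    # rebuild pig.jar without a bundled Hadoop, against the Hadoop 2.x (0.23-line) APIs
    cd /home/hdpuser/pig-0.12.0
    ant clean jar-withouthadoop -Dhadoopversion=23
The jar-withouthadoop target builds a pig.jar that does not bundle Hadoop itself and instead relies on the Hadoop jars already installed, here the hadoop-2.2.0 installation pointed to by HADOOP_HOME.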
6. Copy the /etc/passwd file to the current directory and run pig -x local (local mode).
7. Run the example from the official documentation (http://pig.apache.org/docs/r0.12.0/start.html):
grunt> A = load 'passwd' using PigStorage(':'); 
grunt> B = foreach A generate $0 as id; 
grunt> dump B; 

      The correct result was returned.
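
The same example can also be run non-interactively as a script. A minimal sketch of the script form (the file name id.pig and the output directory id.out are my own choices for illustration):
    -- id.pig: keep only the first (user name) field of each passwd line
    A = load 'passwd' using PigStorage(':');
    B = foreach A generate $0 as id;
    store B into 'id.out';
It can then be run in local mode with:
    pig -x local id.pig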

Testing that Pig works

Next I tested whether Pig works correctly in Hadoop (mapreduce) mode. For this test the cluster was switched back to pseudo-distributed mode; in fully distributed mode there were still errors, which remain unresolved and which I tentatively attribute to the cluster configuration.
1. Copy the sample file that ships with the Pig tutorial to HDFS:
  hadoop fs -copyFromLocal  /home/hdpuser/pig-0.12.0/tutorial/data/excite-small.log /user/hdpuser/excite-small.log
2. Run pig -x mapreduce to enter Pig's Hadoop (mapreduce) mode.
3. grunt> cd /user
4. grunt> cd hdpuser
5. Run ls to check that the copied file is present.
6. grunt> log = LOAD '/user/hdpuser/excite-small.log' AS (user:chararray, time:long, query:chararray);
7. grunt> lmt = LIMIT log 4;
8. grunt> DUMP lmt;
9. The result is returned correctly:
Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime   MedianReducetime        Alias Feature  Outputs
job_local1555930682_0002        1       1       0       0       0       0       0       0       0       0       lmt,log
job_local1844986210_0003        1       1       0       0       0       0       0       0       0       0       log             hdfs://localhost:9000/tmp/temp1201738014/tmp-313826598,


Input(s):
Successfully read 0 records from: "/user/hdpuser/excite-small.log"


Output(s):
Successfully stored 0 records in: "hdfs://localhost:9000/tmp/temp1201738014/tmp-313826598"


Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0


Job DAG:
job_local1555930682_0002        ->      job_local1844986210_0003,
job_local1844986210_0003




2013-12-08 04:45:37,849 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2013-12-08 04:45:37,852 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2013-12-08 04:45:37,852 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2013-12-08 04:45:37,852 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2013-12-08 04:45:37,860 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-12-08 04:45:37,860 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(2A9EABFB35F5B954,970916105432,+md foods +proteins)
(BED75271605EBD0C,970916001949,yahoo chat)
(BED75271605EBD0C,970916001954,yahoo chat)
(BED75271605EBD0C,970916003523,yahoo chat)
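
For completeness, the interactive steps above can also be put into a single script and submitted in mapreduce mode. A minimal sketch (the script name excite.pig and the output directory are my own choices; the input path is the file copied to HDFS in step 1, and STORE is used instead of DUMP so the result is written back to HDFS):
    -- excite.pig: load the tutorial log from HDFS and keep the first 4 records
    log = LOAD '/user/hdpuser/excite-small.log' AS (user:chararray, time:long, query:chararray);
    lmt = LIMIT log 4;
    STORE lmt INTO '/user/hdpuser/excite-limit-out';
Run it with:
    pig -x mapreduce excite.pig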

