Installing and configuring Pig 0.12.0
1. Download Pig 0.12.0
2. Unpack the archive and set the environment variables
export JAVA_HOME=/usr/java/jdk1.7.0_45
export HADOOP_HOME=/home/hdpuser/hadoop-2.2.0
export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$JAVA_HOME/jre/bin:/home/hdpuser/pig-0.12.0/bin:/home/hdpuser/apache-ant-1.9.2/bin
export PIG_HADOOP_VERSION=23
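Before starting Pig it can help to confirm that the directories named in these exports actually exist. The following is a minimal sketch and not part of the original setup: the function name check_dirs is hypothetical, and it only tests that each path is a directory, nothing more.

```shell
#!/bin/sh
# Hypothetical helper: report every argument that is not an existing directory.
# Returns 0 when all paths exist, 1 otherwise.
check_dirs() {
    rc=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"
            rc=1
        fi
    done
    return "$rc"
}

# Example call with the variables exported above:
# check_dirs "$JAVA_HOME" "$HADOOP_HOME" "$PIG_CLASSPATH" && echo "directories found"
```

A passing check only shows that the paths exist; it does not prove that the Hadoop configuration inside them is correct.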
3. Run pig -x local
4. A simple file load then failed with the following warning:
WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
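5. After consulting various sources, I did the following (some of these actions may have had no effect)
a. Following RELEASE_NOTES.txt, I went through these steps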
1. Download pig-0.12.0.tar.gz
2. Unpack the file: tar -xzvf pig-0.12.0.tar.gz
3. Move into the installation directory: cd pig-0.12.0
4. To run pig without Hadoop cluster, execute the command below. This will
take you into an interactive shell called grunt that allows you to navigate
the local file system and execute Pig commands against the local files
bin/pig -x local
5. To run on your Hadoop cluster, you need to set PIG_CLASSPATH environment
variable to point to the directory with your hadoop-site.xml file and then run
pig. The commands below will take you into an interactive shell called grunt
that allows you to navigate Hadoop DFS and execute Pig commands against it
export PIG_CLASSPATH=/hadoop/conf
bin/pig
6. To build your own version of pig.jar run
ant
7. To run unit tests run
ant test
8. To build jar file with available user defined functions run commands below.
cd contrib/piggybank/java
ant
9. To build the tutorial:
cd tutorial
ant
10. To run tutorial follow instructions in http://wiki.apache.org/pig/PigTutorial
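b. The problem was still there afterwards. I then found the following suggestion online:
ant clean jar-withouthadoop -Dhadoopversion=23. I was still unsure about it: my environment is Hadoop 2.2.0 and I did not know what this value should be. I had seen 18 used for 0.18 and 20 used for 0.20. I ran the command anyway to see what would happen, and the problem was solved.
What I still do not know is whether step a has to be carried out first.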
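The choice of the hadoopversion value can be written down as a small lookup. This is only a sketch under an assumption that is not confirmed by the notes above: that the Pig 0.12.0 build accepts 20 for Hadoop 0.20.x and 1.x, and 23 for Hadoop 0.23.x and 2.x. The function name pig_hadoopversion is hypothetical, and older values such as 18 are deliberately not covered.

```shell
# Hypothetical helper: map a Hadoop release string to the value passed to
# -Dhadoopversion when building Pig 0.12.0.
# Assumption: the build accepts 20 (Hadoop 0.20.x, 1.x) and 23 (Hadoop 0.23.x, 2.x).
pig_hadoopversion() {
    case "$1" in
        0.20.*|1.*) echo 20 ;;
        0.23.*|2.*) echo 23 ;;
        *) echo "unsupported Hadoop version: $1" >&2; return 1 ;;
    esac
}

# Example for the environment described here:
# ant clean jar-withouthadoop -Dhadoopversion="$(pig_hadoopversion 2.2.0)"
```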
6. Copy /etc/passwd to the current directory and run pig -x local (local mode)
7. Run the example from the official documentation (http://pig.apache.org/docs/r0.12.0/start.html):
grunt> A = load 'passwd' using PigStorage(':');
grunt> B = foreach A generate $0 as id;
grunt> dump B;
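The results were returned correctly.
Testing that Pig works
Next, test whether Pig works in Hadoop mode. For this I switched the cluster back to pseudo-distributed mode. In fully distributed mode there are still errors that I have not resolved; my first guess is that they are related to the cluster configuration.
1. Copy the data file that ships with the Pig tutorial to HDFS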
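One way to cross-check the DUMP output is to compute the same first field outside Pig. The sketch below is an illustration only; the function name first_field_as_tuples is hypothetical, and it assumes DUMP prints each single-field tuple as the value wrapped in parentheses, one per line.

```shell
# Hypothetical helper: print the first ':'-separated field of each line,
# wrapped in parentheses the way DUMP prints single-field tuples.
first_field_as_tuples() {
    cut -d: -f1 "$1" | sed 's/.*/(&)/'
}

# Example, run in the directory that holds the copied file:
# first_field_as_tuples passwd
```

Comparing this listing with what grunt prints gives a quick sanity check that PigStorage(':') split the lines as intended.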
hadoop fs -copyFromLocal /home/hdpuser/pig-0.12.0/tutorial/data/excite-small.log /user/hdpuser/excite-small.log
2. Run pig -x mapreduce to enter Pig's Hadoop mode
3. grunt> cd /user
4. grunt> cd hdpuser
5. Use ls to check that the copied file is there
6. grunt> log = LOAD '/user/hdpuser/excite-small.log' AS (user:chararray, time:long, query:chararray);
7. grunt> lmt = LIMIT log 4;
8. grunt> DUMP lmt;
9. The results were returned correctly
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_local1555930682_0002 1 1 0 0 0 0 0 0 0 0 lmt,log
job_local1844986210_0003 1 1 0 0 0 0 0 0 0 0 log hdfs://localhost:9000/tmp/temp1201738014/tmp-313826598,
Input(s):
Successfully read 0 records from: "/user/hdpuser/excite-small.log"
Output(s):
Successfully stored 0 records in: "hdfs://localhost:9000/tmp/temp1201738014/tmp-313826598"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local1555930682_0002 -> job_local1844986210_0003,
job_local1844986210_0003
2013-12-08 04:45:37,849 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2013-12-08 04:45:37,852 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2013-12-08 04:45:37,852 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2013-12-08 04:45:37,852 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2013-12-08 04:45:37,860 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-12-08 04:45:37,860 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(2A9EABFB35F5B954,970916105432,+md foods +proteins)
(BED75271605EBD0C,970916001949,yahoo chat)
(BED75271605EBD0C,970916001954,yahoo chat)
(BED75271605EBD0C,970916003523,yahoo chat)
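The LOAD in step 6 only works if the file really is at the HDFS path used in step 1, so a check before entering grunt can save a failed job. The following is a minimal sketch: hdfs_path_exists is a hypothetical function name, and HADOOP_CMD is an assumed override introduced here for illustration, not a variable that Hadoop or Pig reads. It relies on the hadoop fs -test -e command.

```shell
# Hypothetical helper: succeed only if the given HDFS path exists.
# HADOOP_CMD is an assumed override for the hadoop executable, used here
# so the function can be pointed at a different binary.
hdfs_path_exists() {
    "${HADOOP_CMD:-hadoop}" fs -test -e "$1"
}

# Example:
# hdfs_path_exists /user/hdpuser/excite-small.log || echo "file not found in HDFS"
```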