Problems running a Python script on Hive 0.8

Recently, running a Python script on Hive failed with the following problem. In the Hive CLI, the error output was:

hive>  from records                                  
    > select transform(year,temperature,quality)     
    > using 'python /user/hive/script/is_good_quality.py'    
    > as year,temperature;                               
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201112291016_0023, Tracking URL = http://10.200.187.26:50030/jobdetails.jsp?jobid=job_201112291016_0023
Kill Command = /opt/hadoop-0.20.205.0/libexec/../bin/hadoop job  -Dmapred.job.tracker=10.200.187.26:9001 -kill job_201112291016_0023
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2011-12-29 14:56:34,192 Stage-1 map = 0%,  reduce = 0%
2011-12-29 14:57:16,405 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201112291016_0023 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201112291016_0023_m_000002 (and more) from job job_201112291016_0023
Exception in thread "Thread-248" java.lang.RuntimeException: Error while reading from task log url
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:211)
at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:81)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at sun.net.NetworkClient.doConnect(NetworkClient.java:163)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:395)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:234)
at sun.net.www.http.HttpClient.New(HttpClient.java:307)
at sun.net.www.http.HttpClient.New(HttpClient.java:324)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:970)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:911)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:836)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1172)
at java.net.URL.openStream(URL.java:1010)
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
... 3 more
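For context, the script referenced in the query above, is_good_quality.py, is not shown in the post. A Hive TRANSFORM script reads rows from stdin as tab-separated fields and writes tab-separated rows back to stdout; a "Hive Runtime Error while closing operators" in ScriptOperator often means the script crashed or could not be launched on a task node. A hypothetical sketch, assuming the classic weather-records schema implied by the query (the field names and quality codes are assumptions, not taken from the post):

```python
#!/usr/bin/env python
# Hypothetical sketch of a Hive TRANSFORM script like is_good_quality.py.
# Hive pipes each input row to stdin as tab-separated fields and reads
# tab-separated output rows from stdout.
import sys

def process(line):
    # Split the tab-separated row into its three fields.
    year, temperature, quality = line.rstrip("\n").split("\t")
    # Keep only readings with a valid temperature and quality code
    # (assumed codes; adjust to your data).
    if temperature != "9999" and quality in ("0", "1", "4", "5", "9"):
        return "%s\t%s" % (year, temperature)
    return None

if __name__ == "__main__":
    for line in sys.stdin:
        out = process(line)
        if out is not None:
            print(out)
```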


In the Hadoop log file (/opt/hadoop-0.20.205.0/logs/hadoop-root-jobtracker-chenyi3.log), the error was:

2011-12-29 14:57:06,865 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201112291016_0023_m_000000_3: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hit error while closing ..
at org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:452)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
... 7 more


I tried a solution I had found earlier:

A few things I'd check for if I were debugging this:

1) Is the python file set to be executable (chmod +x file.py)

2) Make sure the python file is in the same place on all machines. Probably better - put the file in hdfs then you can use " using 'hdfs://path/to/file.py' " instead of a local path

3) Take a look at your job on the hadoop dashboard (http://master-node:9100), if you click on a failed task it will give you the actual java error and stack trace so you can see what actually went wrong with the execution

4) make sure python is installed on all the slave nodes! (I always overlook this one)

Hope that helps.....
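On point 2 above: rather than hard-coding a local path in the USING clause, the usual way to ship a script with a Hive job is ADD FILE, which distributes the file to every task node via the distributed cache so it can then be referenced by bare filename. A sketch, assuming the script is at /user/hive/script/is_good_quality.py on the machine running the CLI:

```sql
-- Ship the script with the job instead of relying on the same
-- local path existing on every node.
ADD FILE /user/hive/script/is_good_quality.py;

FROM records
SELECT TRANSFORM(year, temperature, quality)
USING 'is_good_quality.py'
AS year, temperature;
```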

Even after these checks, the job still failed, and for a while I was stuck with no solution (if any reader knows the answer, please let me know)……

After several days of effort, I finally solved the problem: the root cause was the configuration of /etc/hosts. See my other article, "Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.", for details.
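A common form of this /etc/hosts problem is the machine's hostname resolving to 127.0.0.1, so other nodes (and the Hive client fetching task logs, as in the "Connection refused" trace above) cannot reach its TaskTracker HTTP port. Every node should map each cluster hostname to its real address, roughly like this (only chenyi3 at 10.200.187.26 appears in the logs above; the other hostnames and addresses are illustrative):

```
127.0.0.1       localhost
10.200.187.26   chenyi3      # JobTracker node (from the logs above)
10.200.187.27   chenyi4      # slave (illustrative)
10.200.187.28   chenyi5      # slave (illustrative)
```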
