September 2015_zcc_0015


Repost: Local debugging methods for distributed programs written with Python + Hadoop Streaming

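A few points to keep in mind when writing Hadoop Streaming programs in Python. Use an iterator wherever one is possible, and avoid holding large amounts of stdin input in memory, since that severely degrades performance. Streaming does not split keys and values for you: what arrives is just a sequence of strings, so you have to call split() yourself in your code. Each line read from stdin appears to end with \n, so to be safe you generally need to use rstr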
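The points in the excerpt above can be sketched roughly as follows. This is a minimal illustration, not the original post's code: the function name `parse_lines`, the tab separator, and the sample lines are all assumptions made for the example. In an actual mapper the iterable passed in would be `sys.stdin`, which yields one line at a time.

```python
def parse_lines(lines, sep="\t"):
    """Lazily yield (key, value) pairs from an iterable of raw text lines.

    `lines` can be any iterable of strings (for a real mapper, sys.stdin),
    so the whole input is never held in memory at once.
    """
    for line in lines:
        line = line.rstrip("\n")  # drop the trailing newline, if present
        if not line:
            continue  # skip blank lines
        # Streaming hands over plain strings, so split key and value here.
        # maxsplit=1 keeps any further separators inside the value.
        parts = line.split(sep, 1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        yield key, value


# Hypothetical sample input standing in for stdin.
sample = ["a\t1\n", "b\t2\t3\n", "\n", "c"]
for key, value in parse_lines(sample):
    print(repr(key), repr(value))
```

Because `parse_lines` is a generator, it can be exercised locally with a small list of strings before being pointed at real input.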

2015-09-29 17:46:41 2123

Repost: Analysis of org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException when putting data into Hadoop

org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not replicated yet:/nnThroughputBenchmark/addblock/AddblockBenchDir0/AddblockBench0 at org.apache.hadoop.hdfs.server.namenode.FSN

2015-09-24 19:14:17 5513 3

Original: Encoding issues in Python

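Python represents strings internally as unicode, so an encoding conversion normally uses unicode as the intermediate form: first decode the string from its source encoding into unicode, then encode the unicode into the target encoding. decode converts a string in some other encoding into unicode; for example, str1.decode('gb2312') converts the gb2312-encoded string str1 into un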
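The decode-then-encode round trip described above can be sketched as below. The excerpt calls `decode` on a `str`, which is Python 2 behaviour; this sketch assumes Python 3, where `decode` lives on `bytes` and `encode` on `str`. The helper name `transcode` and the sample text are hypothetical.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from encoding `src` to encoding `dst` via Unicode."""
    text = data.decode(src)   # bytes in the source encoding -> Unicode str
    return text.encode(dst)   # Unicode str -> bytes in the target encoding


# Two CJK characters written as escapes so the example is ASCII-only.
original = "\u4e2d\u6587"
gb_bytes = original.encode("gb2312")
utf8_bytes = transcode(gb_bytes, "gb2312", "utf-8")

print(len(gb_bytes), len(utf8_bytes))
print(utf8_bytes.decode("utf-8") == original)
```

If the input bytes are not valid in the source encoding, `decode` raises `UnicodeDecodeError`, which is usually the first symptom of guessing the wrong source encoding.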

2015-09-23 15:55:20 786

Original: How to convert the newline-separated strings of a text file into an array in shell

#!/bin/sh
# First check whether a log of failed starts exists; if so, process it
if [ -f "./task_start_failed.log" ] && [ `ls -l task_start_failed.log | awk '{print $5}'` -gt 0 ]; then
# Store the IDs of the failed tasks in the array named array
array=(`cat

2015-09-17 20:57:11 10444

Original: Ways to pull files on Linux

1. Using the scp command
scp -P 22 -r /home/server Android@192.168.1.110:/opt
Uploads the local folder /home/server to the directory /opt on the remote server 192.168.1.110.
scp -P 22 -r android@192.168.1.110:/opt/docs /home
From the remote server 192.

2015-09-01 15:45:41 10295

aopalliance

AOP (aspect-oriented programming): this toolkit supports implementing, deploying, and maintaining aspect-oriented programs.

2012-11-20
