Original: Workflow experience: speeding up a program by modifying the environment
outputDefinition: type: "delimited" header: false fieldSeparator: "~" fieldDelimiter: "" recordSeparator: "\n" disableParallelism: false, compared with disableParallelism: true
2014-11-25 18:51:08 508
Original: UA work experience: common commands and directories
sudo -u unar -i: switch privileges via sudo
/home/unar/SVN/tags/Release_0.3.11/SignalHub: main working folder
2014-11-12 18:40:50 702
Original: Workflow experience: jar extract error
- IOException: No such file or directory
ERROR - RuntimeException: Unable to extract jar:file:/opt/vektor/lib/datarush-hadoop-cdh4-6.3.1-11.jar!/datarush-hadoop-cdh4/com.pervasive.datarush/hadoop-s
2014-11-12 18:39:06 542
Original: Screen operations
Start a screen: screen -S screen_name
Attach: screen -r screen_name
Detach: screen -d screen_name, or press Ctrl + a, then d, from inside the session
Close: screen -list
Output: There is a screen on: 23536.pts-0.wd
2014-10-24 14:51:27 466
Original: Changing which Vektor version runs
export VEKTOR_HOME=/San1/Vektor-2.1.0-SNAPSHOT/
PATH=$PATH:/San1/Vektor-2.1.0-SNAPSHOT/bin
2014-10-24 13:25:08 470
Original: Workflow experience: common errors
Error 1: Out of Memory
ERROR >>>
ERROR - OutOfMemoryError: unable to create new native thread
ERROR - DRException: Failed to start dataflow graph etl.fa_master_gen.fa_master - createView-phas
2014-10-10 15:49:13 1373
Original: Important workfile
101 server: 169.242.216.101
temp file /edge/home/channel/operatestdata/UI_out
operauser
Password1
hadoop fs -ls output_PA_SG_100pct/views/
/user/operauser/output_PA_SG_10
2014-09-28 16:41:07 530
Original: My own understanding of the Hadoop ecosystem
Avro: Avro is a data serialization framework that is useful in Hadoop and other systems. The framework allows one to define a schema that is language independent, so that data can be interchanged bet
2014-09-28 13:36:28 552
Original: Python read/write example
def change():
    output = open('helias2.csv','w')
    for line in open('helias.csv','r'):
        line_array = line.split('|')
        line_array[0] = 'E'+line_array[0].replace('-','_')
2014-09-23 18:02:17 400
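The snippet in this entry is cut off after the first field is rewritten. A minimal sketch of the same read/modify/write pattern follows. It assumes the remaining fields are written back unchanged, which the truncated original does not confirm. The `src_path` and `dst_path` parameters are added here for illustration; the original hard-codes `helias.csv` and `helias2.csv`.

```python
def change(src_path, dst_path):
    """Copy a '|'-separated file, rewriting only the first field of each line.

    The first field gets an 'E' prefix and its '-' characters become '_'.
    Writing the other fields back unchanged is an assumption, since the
    original snippet is truncated before its write step.
    """
    # 'with' closes both files even if an error occurs part-way through.
    with open(src_path, 'r') as src, open(dst_path, 'w') as dst:
        for line in src:
            fields = line.rstrip('\n').split('|')
            fields[0] = 'E' + fields[0].replace('-', '_')
            dst.write('|'.join(fields) + '\n')
```

Calling `change('helias.csv', 'helias2.csv')` would reproduce the file names used in the original snippet.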
Repost: Python text processing
First: import string
Then there are two ways to replace part of a string:
Method 1:
>>> a='...fuck...the....world............'
>>> b=a.replace('.',' ')
>>> print b
fuck the world
Method 2:
>>> a='...fuck...the....wo
2014-09-23 17:58:17 578
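The first method in this entry relies on `str.replace`, a built-in string method that works without importing `string`. A small Python 3 sketch of that replacement follows, with an optional whitespace-collapsing step. The helper names and the sample text are illustrative and not from the original post, and the collapsing step is an addition, not the truncated second method.

```python
def dots_to_spaces(text):
    # str.replace returns a new string; every '.' becomes one space.
    return text.replace('.', ' ')


def collapse_spaces(text):
    # split() with no argument splits on runs of whitespace and drops
    # leading/trailing whitespace, so joining with ' ' normalizes spacing.
    return ' '.join(text.split())


a = '...hello...the....world......'
b = dots_to_spaces(a)
c = collapse_spaces(b)
print(repr(b))
print(repr(c))
```

Note that `replace` alone keeps one space per dot, so the result has the same length as the input. The collapsing step is what produces single spaces between words.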
Original: Shell data processing
cut -d "|" -f 1,3,4,5 pv_account_summary.csv | sort -t "|" -k 1 > helias.csv
-f:
2014-09-23 17:52:51 527
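For readers more comfortable with Python than with `cut` and `sort`, the pipeline in this entry can be approximated as below. This is a sketch under two assumptions: every input line has at least five `|`-separated fields, and plain code-point ordering is acceptable (the real `sort` orders by the current locale, so results can differ unless `LC_ALL=C` is set). The function name `cut_and_sort` is hypothetical.

```python
def cut_and_sort(lines, fields=(1, 3, 4, 5), sep='|'):
    """Keep the given 1-based fields of each line, then sort the results.

    Mirrors `cut -d "|" -f 1,3,4,5 | sort -t "|" -k 1`: with `-k 1` and no
    end field, the sort key runs from field 1 to the end of the line, so
    sorting the whole reduced line is equivalent.
    """
    selected = []
    for line in lines:
        parts = line.rstrip('\n').split(sep)
        # cut numbers fields from 1; Python lists are indexed from 0.
        selected.append(sep.join(parts[i - 1] for i in fields))
    return sorted(selected)
```

To apply it to the file from the original command, one could pass `open('pv_account_summary.csv')` as `lines` and write the returned list to `helias.csv`, one item per line.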
Original: Vektor code library
- operation: readView
  view: performance_visibility.notification.notification.fa_notification
  as: fa_notification
- operation: union
  as: notification_union
2014-09-19 10:16:48 453
Original: Vektor run parameters
time vektor -Djava.io.tmpdir=/San1/VWM/user/hzhong/tmp -Xmx5G run --run-dependencies=true -e environment.yaml etl.fa_master_gen.fa_master | tee log0
Here, time is used to jis
2014-09-05 14:52:17 460
Original: Common problems with Vektor workflow on a single server
1. The run starts but nothing happens: the server's disk space may be full. It may also be the workplace
2014-08-18 11:55:13 419
Original: On Hadoop, speed can be improved by changing outputDefinition to avro
To optimise performance, once the results have been validated it is better to set the intermediate output format to avro. The setting goes in myenv.yaml:
outputDefinition: type: "avro" disableParalleli
2014-08-15 11:45:10 490