Screen configuration:
shell -$SHELL
vbell off
defscrollback 8192
#caption always "%?%F%{wb}%? %L=%-Lw%45>%{+b bw}%n%f* %t%{-}%+Lw%-0<"
# %?%F%{wb}%? : when the window has focus (%F), switch to color "wb": white background, blue foreground
caption always "%{wb} %-w%{+b bw}%n %t%{-}%+w"
#caption always "%{=b kR} %-w%{-b bg}%n %t%{-}%+w"
#escape ^Jj
-----------------------
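simHash: a hashing algorithm in which similar documents produce similar hash values, so the similarity of two simHash values reflects the similarity of the documents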
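A minimal sketch of the simHash idea noted above, assuming the document has already been split into tokens. The function names, the 64-bit width, the equal weight per token, and the use of MD5 as the per-token hash are all illustrative assumptions, not part of the original note.

```python
import hashlib


def simhash(tokens, bits=64):
    """Return a `bits`-wide fingerprint for an iterable of string tokens.

    `bits` is assumed to be a multiple of 8 and at most 128 (the MD5 digest size).
    """
    weights = [0] * bits
    for token in tokens:
        # Hash each token to a stable integer; MD5 is an arbitrary choice here.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A bit of the fingerprint is set when the votes for it are positive.
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Two documents are then compared with hamming_distance(simhash(tokens_a), simhash(tokens_b)); a small distance suggests similar content. The threshold for "small" has to be tuned on real data.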
Python: remove special Unicode characters (drops every character that GB2312 cannot encode)
s.encode('gb2312', 'ignore').decode('gb2312')
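A small sketch of how the expression above behaves, assuming Python 3 str input. The helper name strip_non_gb2312 and the sample string are made up for illustration.

```python
def strip_non_gb2312(s):
    """Drop every character that has no GB2312 encoding."""
    # 'ignore' makes encode() skip unencodable characters instead of raising.
    return s.encode("gb2312", "ignore").decode("gb2312")


# The emoji and the Thai letter are not in GB2312, so they are removed;
# ASCII and common simplified Chinese characters survive.
sample = "中文abc\U0001F600\u0e01"
cleaned = strip_non_gb2312(sample)
print(cleaned)
```

Note that this also removes legitimate text outside GB2312 (for example many traditional Chinese characters), so it is only suitable when the data is expected to be simplified Chinese plus ASCII.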
vim column editing:
Ctrl-V (Ctrl-Q on Windows), I, type the text, then press Esc twice
--------------
vim indentation:
set shiftwidth=4
set expandtab
set softtabstop=4
---------------
hadoop ant:
Succeeds in the Cygwin bash shell
Fails under mintty
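Find the process that owns a port:
lsof -i[46] [protocol][@hostname|hostaddr][:service|port]
46 --> IPv4 or IPv6
protocol --> tcp or udp
hostname --> Internet host name
hostaddr --> numeric IP address (IPv6 addresses go in brackets)
service --> a service name from /etc/services (more than one allowed, comma-separated)
port --> port number (more than one allowed, comma-separated)
Example: show what is currently using port 22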
# lsof -i :22
-----------------
Python: match any character
In a regular expression, '.' matches any character except a newline by default. If the DOTALL flag has been specified, this matches any character including a newline.
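A short illustration of the difference, using only the standard re module. The sample text and variable names are made up.

```python
import re

text = "first line\nsecond line"

# Without DOTALL, '.' stops at the newline.
default_match = re.match(r".*", text).group()

# With DOTALL, '.' also matches the newline, so '.*' consumes everything.
dotall_match = re.match(r".*", text, re.DOTALL).group()

# The inline flag (?s) is equivalent to passing re.DOTALL.
inline_match = re.match(r"(?s).*", text).group()

print(repr(default_match))
print(repr(dotall_match))
```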
-------------------------
shell: simple string replacement:
#!/bin/bash
a="111 222 333 444"
b="111 333"
null=""
for i in $b
do
a=${a//$i/$null}
done
c="$a"
echo $c
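In a script run from crontab, even running source ~/.bash_profile does not guarantee that PATH matches the login environment; the safest approach is to set PATH explicitly in the script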
-----------------
Install mssql support
-------------------------------------------
Two ways to do map-reduce in mongo
1.
./mongo localhost:12345/crawler -u 'username' -p 'password' --eval "printjson(db.secondhand.group({ key: {source:true}, initial: {count:0}, reduce: function(obj, prev){prev.count++;}}))"
---------
2.
m = function () {
emit(this.source, 1);
}
r = function (k, vals) {
count = 0;
vals.forEach(function (val) {count += val;});
return count;
}
now = new Date();
d = new Date(now.getFullYear(), now.getMonth(), now.getDate());
res = db.secondhand.mapReduce(m, r, {query: {addTime:{$gte:d}}, out: {inline:1}} )
-------------
Log statistics:
1. sort can order numerically (-n)
cat access.log.2011-10-07-08-00 |awk '{print $3}'|sort|uniq -c|sort -rn > visit_times.txt
2. sort by a specific column (-k, with -t to set the field separator)
cat map_reduce.result |awk -F': ' '$2>1{print$1"\t"$2 }'|sort -k 2 -t $'\t' -nr|more
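For comparison, the first pipeline above (awk '{print $3}' | sort | uniq -c | sort -rn) could be sketched in Python roughly as follows. The function name count_field and the sample lines are hypothetical. Unlike awk, this sketch skips lines that have fewer fields than requested instead of counting an empty value.

```python
from collections import Counter


def count_field(lines, index=2):
    """Count values of one whitespace-separated field (0-based, so 2 is awk's $3).

    Returns (value, count) pairs ordered by count, highest first.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > index:  # skip short lines (awk would count an empty string)
            counts[fields[index]] += 1
    return counts.most_common()


# Made-up sample lines; with a real log, pass the open file object instead.
sample_lines = [
    "a b /index.html",
    "a b /about.html",
    "a b /index.html",
    "short",
]
result = count_field(sample_lines)
print(result)
```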