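First post of 2020
In an earlier post, we compared the development efficiency and runtime efficiency of C++ and Python. Comparing Python with the Bash shell, however, has no general answer: the shell can implement relatively few complex features on its own, yet it can draw on a large and varied set of tools, so the two are basically not comparable. Still, for one specific problem, a quick comparison of shell and Python is quite interesting. This post describes such a problem.
First, a design exercise. Someone has a number of records of their own Go (the board game) games, each containing the date of the game, the black player, the white player, and the result. We want to extract from these records:
- Their own overall win rate, win rate as black, and win rate as white
- The overall win rate, win rate as black, and win rate as white for a given day (this could also be a date range, but to keep the problem simple we limit it to a single day)
- Every opponent who has played N or more games against them and whose win rate is above 50% (in other words, the opponents who are stronger than they are)
So how do we meet these three requirements? Think about it for a moment.
The first idea that comes to mind is to store the game records in a database and implement these features in SQL. That is indeed a good approach, and readers who work with databases may want to try it. But what if we have to write a program without SQL? (Comparing the efficiency of the SQL and non-SQL approaches some other time would be interesting too.) For a non-SQL implementation, Python is the obvious first choice because its development efficiency is the highest; if runtime efficiency matters most, C++ is an option. Surely nobody would think of using shell? True, implementing the third requirement in shell looks somewhat cumbersome, but for the first two requirements alone, shell is actually simpler than Python! The implementations come later; first, the design.
Whatever language you use, you first have to decide what a game record looks like. And since this is your own design, it can look however you want.
The conventional idea is probably something like this: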
<Date> <Black Name> <White Name> <Game Result>
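This layout has at least two drawbacks:
- One of the two names is always the record owner's own, so it is redundant and does not need to be listed
- The player names and <Game Result> are not easy to extract in this layout, because a player name may contain spaces.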
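To make the second drawback concrete, here is a minimal Python sketch. The sample line, the name "Me", and the use of black_win as the result value are all made up for illustration; they are assumptions, not data from a real record file.

```python
# A hypothetical record in the traditional layout:
# <Date> <Black Name> <White Name> <Game Result>
line = "20191230 Me AA BB 2020 black_win"

tokens = line.split()
date, result = tokens[0], tokens[-1]
name_tokens = tokens[1:-1]  # black and white names, run together

# Every split point between the name tokens is a possible reading.
candidates = [
    (" ".join(name_tokens[:i]), " ".join(name_tokens[i:]))
    for i in range(1, len(name_tokens))
]

for black, white in candidates:
    print("black=%r white=%r" % (black, white))
```

The date and the result can be taken from the two ends of the line, but nothing in the line itself says where the black player's name stops and the white player's name starts.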
So the record format needs to be revised. A better design is:
<Date> <Game Result> <Player Name>
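This removes the redundant name of the record owner, and because <Game Result> has a fixed form, the opponent's name is simply everything that follows it. But a new question arises: how do we know who played black and who played white?
The answer lies in the design of <Game Result>. It takes one of six values: black_win, black_lose, black_draw, white_win, white_lose, white_draw. Both the color and the outcome are those of the record owner, so the sides and the result can be read directly from the value. For example: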
20191230 black_win AA BB 2020
This means that on 2019-12-30 the record owner played black and beat the opponent named "AA BB 2020".
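As a rough illustration of why this layout is easy to process, the sketch below splits one record into its parts. The function name parse_record is hypothetical and is not part of the scripts in this post; it assumes every line follows the "<Date> <Game Result> <Player Name>" layout exactly.

```python
def parse_record(line):
    """Split one record into (date, color, outcome, opponent).

    Assumes the layout "<Date> <Game Result> <Player Name>".
    """
    # Split on whitespace at most twice, so spaces inside the name survive.
    date, result, opponent = line.strip().split(None, 2)
    # "black_win" -> ("black", "win"): the record owner's color and outcome.
    color, outcome = result.split("_")
    return date, color, outcome, opponent


record = parse_record("20191230 black_win AA BB 2020")
print(record)
```

Because the first two fields never contain spaces, limiting the split to two cuts is enough to keep the opponent's name intact.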
That completes the design. How would the first two requirements be written in shell?
#!/bin/bash
# Print the overall, black, and white win rates from the Go record file.
# Usage: ./goplayer.sh [DATE]    (DATE in YYYYMMDD form; omit it for all records)

GORECORD_FILE="GoRecording.txt"
PRINT_LINE="---------------------------"
GREP_CMD="grep -E"

# Count the records dated $1 (any date if $1 is empty) whose result field is $2.
# The pattern is anchored so that text inside a player name is never counted.
function CountResult()
{
    $GREP_CMD -c "^[[:space:]]*$1[0-9]*[[:space:]]+$2[[:space:]]" "$GORECORD_FILE"
}

# Print $1 / $2 as a percentage, or 0.00% when no games were played.
function CalcRate()
{
    python -c "print('%.2f%%' % ($1 * 100.0 / $2 if $2 else 0.0))"
}

function GetWinRate()
{
    DATE=$1
    if [ -n "$DATE" ]; then
        echo "$PRINT_LINE"
        echo "DATE: $DATE"
    fi
    BLACK_WIN=`CountResult "$DATE" black_win`
    WHITE_WIN=`CountResult "$DATE" white_win`
    BLACK_LOSE=`CountResult "$DATE" black_lose`
    WHITE_LOSE=`CountResult "$DATE" white_lose`
    BLACK_DRAW=`CountResult "$DATE" black_draw`
    WHITE_DRAW=`CountResult "$DATE" white_draw`
    GAMES_BLACK=$(($BLACK_WIN + $BLACK_LOSE + $BLACK_DRAW))
    GAMES_WHITE=$(($WHITE_WIN + $WHITE_LOSE + $WHITE_DRAW))
    TOTAL=$(($GAMES_BLACK + $GAMES_WHITE))
    BLACK_WIN_RATE=`CalcRate $BLACK_WIN $GAMES_BLACK`
    WHITE_WIN_RATE=`CalcRate $WHITE_WIN $GAMES_WHITE`
    WIN_RATE=`CalcRate $(($BLACK_WIN + $WHITE_WIN)) $TOTAL`
    echo "$PRINT_LINE"
    echo "Total Games: $TOTAL"
    echo "Win Rate: $WIN_RATE"
    echo "$PRINT_LINE"
    echo "Black Games: $GAMES_BLACK"
    echo "Black Win: $BLACK_WIN"
    echo "Black Lose: $BLACK_LOSE"
    echo "Black Draw: $BLACK_DRAW"
    echo "Black Win Rate: $BLACK_WIN_RATE"
    echo "$PRINT_LINE"
    echo "White Games: $GAMES_WHITE"
    echo "White Win: $WHITE_WIN"
    echo "White Lose: $WHITE_LOSE"
    echo "White Draw: $WHITE_DRAW"
    echo "White Win Rate: $WHITE_WIN_RATE"
    echo "$PRINT_LINE"
}

########################
# Main
########################
INPUT_DATE=$1
if [ -n "$INPUT_DATE" ]; then
    GetWinRate "$INPUT_DATE"
else
    GetWinRate
fi
Running the shell script gives:
$ ./goplayer.sh
---------------------------
Total Games: 96
Win Rate: 71.88%
---------------------------
Black Games: 54
Black Win: 46
Black Lose: 8
Black Draw: 0
Black Win Rate: 85.19%
---------------------------
White Games: 42
White Win: 23
White Lose: 19
White Draw: 0
White Win Rate: 54.76%
---------------------------
$ ./goplayer.sh 20191224
---------------------------
DATE: 20191224
---------------------------
Total Games: 26
Win Rate: 69.23%
---------------------------
Black Games: 17
Black Win: 13
Black Lose: 4
Black Draw: 0
Black Win Rate: 76.47%
---------------------------
White Games: 9
White Win: 5
White Lose: 4
White Draw: 0
White Win Rate: 55.56%
---------------------------
And how would all three requirements be written in Python? As follows:
# coding=utf-8
import re
import sys
import argparse

GO_RECORD_FILE = "GoRecording.txt"
SEP_LINE = "-" * 50
# Minimum number of games against an opponent (the N in the third requirement).
HIGHER_PLAYER_LIMITATION_GAMES = 3

# "<Date> <rest of line>", where <Date> is eight digits.
regex_date = re.compile(r'^\s*(\d{8})\s+(.+)')
# "<Game Result> <Player Name>"; group 1 is the opponent's name.
regex_black_win = re.compile(r'^black_win\s+(.+)')
regex_black_lose = re.compile(r'^black_lose\s+(.+)')
regex_black_draw = re.compile(r'^black_draw\s+(.+)')
regex_white_win = re.compile(r'^white_win\s+(.+)')
regex_white_lose = re.compile(r'^white_lose\s+(.+)')
regex_white_draw = re.compile(r'^white_draw\s+(.+)')

# Statistics over all records.
GLOBAL_DATA = {
    "black_win": 0,
    "black_lose": 0,
    "black_draw": 0,
    "white_win": 0,
    "white_lose": 0,
    "white_draw": 0,
    "games_total": 0,
    "games_black": 0,
    "games_white": 0,
    "win_rate": 0,
    "black_win_rate": 0,
    "white_win_rate": 0,
}
RECORDS_DICT = dict()  # play date -> statistics for that day
PLAYERS_DICT = dict()  # opponent name -> statistics against that opponent

# All of GLOBAL_DATA, RECORDS_DICT[play_date], and PLAYERS_DICT[player] will call this utility function.
def calc_win_rate(unit):
    unit["games_black"] = unit["black_win"] + unit["black_lose"] + unit["black_draw"]
    unit["games_white"] = unit["white_win"] + unit["white_lose"] + unit["white_draw"]
    unit["games_total"] = unit["games_black"] + unit["games_white"]
    if unit["games_total"] > 0:
        unit["win_rate"] = 1.0 * (unit["black_win"] + unit["white_win"]) / unit["games_total"]
    if unit["games_black"] > 0:
        unit["black_win_rate"] = 1.0 * unit["black_win"] / unit["games_black"]
    if unit["games_white"] > 0:
        unit["white_win_rate"] = 1.0 * unit["white_win"] / unit["games_white"]

def setup_statistics():
    global GLOBAL_DATA, RECORDS_DICT
    with open(GO_RECORD_FILE, "r") as INPUT_FILE:
        lines = INPUT_FILE.readlines()
    for line in lines:
        line = line.strip('\r\n')
        play_date = ""
        rest_line = ""
        items = regex_date.search(line)
        if items:
            play_date = items.group(1)
            rest_line = items.group(2)
            # First record seen for this date: start all counters at zero.
            if RECORDS_DICT.get(play_date) is None:
                RECORDS_DICT[play_date] = dict()
                RECORDS_DICT[play_date]['black_win'] = 0
                RECORDS_DICT[play_date]['black_lose'] = 0
                RECORDS_DICT[play_date]['black_draw'] = 0
                RECORDS_DICT[play_date]['white_win'] = 0
                RECORDS_DICT[play_date]['white_lose'] = 0
                RECORDS_DICT[play_date]['white_draw'] = 0
                RECORDS_DICT[play_date]['games_total'] = 0
                RECORDS_DICT[play_date]['games_black'] = 0
                RECORDS_DICT[play_date]['games_white'] = 0
                RECORDS_DICT[play_date]['win_rate'] = 0
                RECORDS_DICT[play_date]['black_win_rate'] = 0
                RECORDS_DICT[play_date]['white_win_rate'] = 0
        if rest_line.startswith("black_win"):
            GLOBAL_DATA["black_win"] += 1
            RECORDS_DICT[play_date]["black_win"] += 1
        elif rest_line.startswith("black_lose"):
            GLOBAL_DATA["black_lose"] += 1
            RECORDS_DICT[play_date]["black_lose"] += 1
        elif rest_line.startswith("black_draw"):
            GLOBAL_DATA["black_draw"] += 1
            RECORDS_DICT[play_date]["black_draw"] += 1
        elif rest_line.startswith("white_win"):
            GLOBAL_DATA["white_win"] += 1
            RECORDS_DICT[play_date]["white_win"] += 1
        elif rest_line.startswith("white_lose"):
            GLOBAL_DATA["white_lose"] += 1
            RECORDS_DICT[play_date]["white_lose"] += 1
        elif rest_line.startswith("white_draw"):
            GLOBAL_DATA["white_draw"] += 1
            RECORDS_DICT[play_date]["white_draw"] += 1
        else:
            pass  # Blank or malformed line: nothing to count
    calc_win_rate(GLOBAL_DATA)
    for record in RECORDS_DICT.values():
        calc_win_rate(record)

def display_result(data):
    print(SEP_LINE)
    print("Total Games: %d" % data["games_total"])
    print("Win Rate: %.2f%%" % (100 * data["win_rate"]))
    print("Black Games: %d" % data["games_black"])
    print("Black Win: %d" % data["black_win"])
    print("Black Lose: %d" % data["black_lose"])
    print("Black Draw: %d" % data["black_draw"])
    print("Black Win Rate: %.2f%%" % (100 * data["black_win_rate"]))
    print("White Games: %d" % data["games_white"])
    print("White Win: %d" % data["white_win"])
    print("White Lose: %d" % data["white_lose"])
    print("White Draw: %d" % data["white_draw"])
    print("White Win Rate: %.2f%%" % (100 * data["white_win_rate"]))

def setup_player_statistics():
    global PLAYERS_DICT

    # Count one game of type `mode` (e.g. "black_win") against `player`.
    def setup_player_data(player, mode):
        if PLAYERS_DICT.get(player) is None:
            PLAYERS_DICT[player] = dict()
            PLAYERS_DICT[player]["black_win"] = 0
            PLAYERS_DICT[player]["black_lose"] = 0
            PLAYERS_DICT[player]["black_draw"] = 0
            PLAYERS_DICT[player]["white_win"] = 0
            PLAYERS_DICT[player]["white_lose"] = 0
            PLAYERS_DICT[player]["white_draw"] = 0
            PLAYERS_DICT[player]["black_win_rate"] = 0
            PLAYERS_DICT[player]["white_win_rate"] = 0
            PLAYERS_DICT[player]["win_rate"] = 0
            PLAYERS_DICT[player]["games_total"] = 0
            PLAYERS_DICT[player]["games_black"] = 0
            PLAYERS_DICT[player]["games_white"] = 0
        PLAYERS_DICT[player][mode] += 1

    with open(GO_RECORD_FILE, "r") as INPUT_FILE:
        lines = INPUT_FILE.readlines()
    for line in lines:
        line = line.strip('\r\n')
        rest_line = None
        items = regex_date.search(line)
        if items:
            rest_line = items.group(2)
        player = ""
        if rest_line is not None:
            if rest_line.startswith("black_win"):
                name_items = regex_black_win.search(rest_line)
                player = name_items.group(1)
                setup_player_data(player, "black_win")
            elif rest_line.startswith("black_lose"):
                name_items = regex_black_lose.search(rest_line)
                player = name_items.group(1)
                setup_player_data(player, "black_lose")
            elif rest_line.startswith("black_draw"):
                name_items = regex_black_draw.search(rest_line)
                player = name_items.group(1)
                setup_player_data(player, "black_draw")
            elif rest_line.startswith("white_win"):
                name_items = regex_white_win.search(rest_line)
                player = name_items.group(1)
                setup_player_data(player, "white_win")
            elif rest_line.startswith("white_lose"):
                name_items = regex_white_lose.search(rest_line)
                player = name_items.group(1)
                setup_player_data(player, "white_lose")
            elif rest_line.startswith("white_draw"):
                name_items = regex_white_draw.search(rest_line)
                player = name_items.group(1)
                setup_player_data(player, "white_draw")
            else:
                pass  # Unknown result value: skip the record
    for pdata in PLAYERS_DICT.values():
        calc_win_rate(pdata)

def display_player_data(player):
    pdata = PLAYERS_DICT.get(player)
    if pdata is None:
        return
    print(SEP_LINE)
    print("Player: %s" % player)
    display_result(pdata)
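
def print_higher_player():
    global PLAYERS_DICT
    higher_players = list()
    for player, pdata in PLAYERS_DICT.items():
        # Statistics are kept from the record owner's side, so the opponent's
        # wins are our losses; draws count for neither side.
        opponent_wins = pdata["black_lose"] + pdata["white_lose"]
        if pdata["games_total"] >= HIGHER_PLAYER_LIMITATION_GAMES and 2 * opponent_wins > pdata["games_total"]:
            higher_players.append(player)
    # List the opponents we do worst against first.
    higher_players = sorted(higher_players, key=lambda name: PLAYERS_DICT[name]["win_rate"])
    for player in higher_players:
        display_player_data(player)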

def main():
    global GLOBAL_DATA
    parser = argparse.ArgumentParser(prog="python %s" % sys.argv[0])
    parser.add_argument("-m", "--mode", dest='mode', required=False,
                        choices=["win_rate", "higher_player"],
                        help="Specify the mode: 'win_rate', 'higher_player'")
    parser.add_argument("-d", "--date", dest='play_date', required=False,
                        help="Specify the date when playing the games")
    args = parser.parse_args()
    MODE, PLAY_DATE = args.mode, args.play_date
    if MODE is None:
        MODE = "win_rate"
    if MODE == "win_rate":
        setup_statistics()
        if PLAY_DATE:
            if RECORDS_DICT.get(PLAY_DATE):
                print(SEP_LINE)
                print("Play Date: %s" % PLAY_DATE)
                display_result(RECORDS_DICT[PLAY_DATE])
        else:
            display_result(GLOBAL_DATA)
    elif MODE == "higher_player":
        setup_player_statistics()
        print_higher_player()


if __name__ == "__main__":
    main()
Running the Python script gives:
$ python gorate.py
--------------------------------------------------
Total Games: 96
Win Rate: 71.88%
Black Games: 54
Black Win: 46
Black Lose: 8
Black Draw: 0
Black Win Rate: 85.19%
White Games: 42
White Win: 23
White Lose: 19
White Draw: 0
White Win Rate: 54.76%
$ python gorate.py -d 20191224
--------------------------------------------------
Play Date: 20191224
--------------------------------------------------
Total Games: 26
Win Rate: 69.23%
Black Games: 17
Black Win: 13
Black Lose: 4
Black Draw: 0
Black Win Rate: 76.47%
White Games: 9
White Win: 5
White Lose: 4
White Draw: 0
White Win Rate: 55.56%
$ python gorate.py -m higher_player
--------------------------------------------------
Player: 棋手50151
--------------------------------------------------
Total Games: 3
Win Rate: 0.00%
Black Games: 1
Black Win: 0
Black Lose: 1
Black Draw: 0
Black Win Rate: 0.00%
White Games: 2
White Win: 0
White Lose: 2
White Draw: 0
White Win Rate: 0.00%
--------------------------------------------------
Player: 棋手71902
--------------------------------------------------
Total Games: 4
Win Rate: 25.00%
Black Games: 2
Black Win: 1
Black Lose: 1
Black Draw: 0
Black Win Rate: 50.00%
White Games: 2
White Win: 0
White Lose: 2
White Draw: 0
White Win Rate: 0.00%
--------------------------------------------------
Player: 棋手78237
--------------------------------------------------
Total Games: 3
Win Rate: 33.33%
Black Games: 3
Black Win: 1
Black Lose: 2
Black Draw: 0
Black Win Rate: 33.33%
White Games: 0
White Win: 0
White Lose: 0
White Draw: 0
White Win Rate: 0.00%
--------------------------------------------------
Player: 棋手43200
--------------------------------------------------
Total Games: 3
Win Rate: 33.33%
Black Games: 1
Black Win: 1
Black Lose: 0
Black Draw: 0
Black Win Rate: 100.00%
White Games: 2
White Win: 0
White Lose: 2
White Draw: 0
White Win Rate: 0.00%
The shell code for the first two requirements is 59 lines;
the Python code for all three requirements is 267 lines.
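Finally, let's compare their runtime efficiency.
In Cygwin, with 96 records, for the first requirement (computing the overall win rate and the win rates as black and as white), averaged over 5 runs:
- Average run time of the Python script: 0.279 s
- Average run time of the shell script: 1.249 s
So the Python script is about 4.48 times as fast as the shell script here. Part of the gap is to be expected: on every run the shell script starts six grep processes and three Python interpreters.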
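The post does not say how the run times were measured. One possible way to take a five-run average is sketched below; the helper name average_runtime and the placeholder command are assumptions made for illustration, and this version needs Python 3 for time.perf_counter.

```python
import subprocess
import sys
import time


def average_runtime(cmd, runs=5):
    """Run cmd `runs` times and return the mean wall-clock time in seconds."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.call(cmd)
        total += time.perf_counter() - start
    return total / runs


if __name__ == "__main__":
    # Placeholder command (an assumption); substitute for example
    # ["python", "gorate.py"] or ["./goplayer.sh"].
    cmd = [sys.executable, "-c", "pass"]
    print("%.3f s" % average_runtime(cmd))
    # Ratio of the two averages reported above.
    print("%.2f" % (1.249 / 0.279))
```

Wall-clock time includes process start-up, which is the cost that matters most for scripts this small.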
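Meanwhile, the part of the Python code that covers the first two requirements comes to about 150 effective lines, while the shell script is just under 60. In terms of development efficiency, then, shell is about 2.5 times as good as Python here.
In truth, quoting a multiplier for either runtime efficiency or development efficiency does not mean much, even for this one specific problem. What the comparison does show fairly clearly is this: for some problems, shell offers higher development efficiency than Python, while Python is better than shell at solving complex problems.
Oh, and one more thing: how efficient would this be if it were written in C++ (Boost supports regular expressions)? I'll leave that question to anyone who is interested.
(End)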