Python vs. Shell, Round Two: Implementing a Few Statistics over Go Game Records

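First post of 2020.
In an earlier post we compared C++ and Python in terms of development efficiency and runtime efficiency. Comparing Python with Bash shell in general has no good answer: the shell can implement relatively few complex features on its own while drawing on a wide assortment of external tools, so the two are not really comparable. Still, for one specific problem, a small comparison of shell and Python turns out to be quite interesting. This post describes such a problem.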

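First, a design exercise. Someone has a number of records of their own Go games, each containing the date, the black player, the white player, and the result. From these records we want to extract: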

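  1. Your overall win rate, win rate as black, and win rate as white
  2. The overall, black, and white win rates for games played on a given day (this could also be a time span, but to keep things simple we start with a single day)
  3. All opponents who have played N or more games against you and whose win rate is above 50% (that is, the opponents who are stronger than you)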

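How would you meet these three requirements? Think about it for a moment.
The most obvious idea is to store the records in a database and implement these features in SQL. That is indeed a good approach, and readers who work with databases are welcome to try it. But suppose we must write a program without SQL. (Comparing the efficiency of the SQL and non-SQL approaches some day would be interesting too.) Without SQL, the first choice that comes to mind is Python, since it offers the highest development efficiency; for runtime efficiency, one could use C++. Surely nobody would think of using shell? True, implementing requirement 3 in shell looks somewhat troublesome, but if only the first two requirements are needed, shell is actually simpler than Python! The implementation comes later; first, the design.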
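
For readers curious about the SQL route, here is a minimal sketch using Python's built-in sqlite3 module. It is not part of the original comparison: the table name `games`, its four columns, the function name `win_rates`, and the sample rows are all assumptions made up for illustration.

```python
import sqlite3

def win_rates(rows):
    """Return (overall, as_black, as_white) win rates in percent.

    Each row is (play_date, my_color, outcome, opponent); this schema is
    hypothetical and only serves the illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE games (play_date TEXT, my_color TEXT, outcome TEXT, opponent TEXT)"
    )
    conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?)", rows)
    # In SQLite a comparison evaluates to 0 or 1, so SUM(...) counts the wins.
    query = "SELECT 100.0 * SUM(outcome = 'win') / COUNT(*) FROM games"
    overall = conn.execute(query).fetchone()[0]
    black = conn.execute(query + " WHERE my_color = 'black'").fetchone()[0]
    white = conn.execute(query + " WHERE my_color = 'white'").fetchone()[0]
    conn.close()
    return overall, black, white

# Made-up sample data: three wins out of four games.
SAMPLE_ROWS = [
    ("20191230", "black", "win", "AA BB 2020"),
    ("20191230", "white", "lose", "CC"),
    ("20191231", "black", "win", "CC"),
    ("20191231", "white", "win", "DD"),
]

print(win_rates(SAMPLE_ROWS))
```

Restricting the result to one day would only need an extra `play_date = ?` condition in the WHERE clause.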

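Whatever the language, you first have to decide what a game record looks like. That's right: we assume it looks however you want it to look.
The conventional idea is probably something like this: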

    <Date>  <Black Name>  <White Name>  <Game Result>

This format has at least two drawbacks:

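  1. One of the two names is redundant (it is always the record owner's own name) and need not be listed.
  2. The player names and <Game Result> are not easy to extract in this format, because player names may contain spaces.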

So the record format needs to be revised. A better design is:

    <Date> <Game Result> <Player Name>

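This removes the redundant own name, and because <Game Result> has a fixed format, the opponent's name is easy to extract. But a new question arises: how do we know who played black and who played white?
The answer lies in the design of <Game Result>. It takes one of six values: black_win, black_lose, black_draw, white_win, white_lose, white_draw. The protagonist's color and the outcome can be read directly from the value. For example: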

    20191230    black_win   AA BB 2020

This means that on 2019-12-30 the protagonist, playing black, beat the opponent named "AA BB 2020".
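
To make the benefit of this layout concrete, here is a small sketch of how one record line could be split into its parts with a single regular expression. The names `RECORD_RE` and `parse_record` are illustrative assumptions, not taken from the scripts in this post.

```python
import re

# One record: <Date> <Game Result> <Player Name>; the name may contain spaces,
# which is harmless because it is the last field on the line.
RECORD_RE = re.compile(r'^\s*(\d{8})\s+(black|white)_(win|lose|draw)\s+(.+?)\s*$')

def parse_record(line):
    """Split a record line into (date, color, outcome, opponent), or None if malformed."""
    match = RECORD_RE.match(line)
    if match is None:
        return None
    return match.groups()

print(parse_record("20191230    black_win   AA BB 2020"))
```

Because the first two fields never contain spaces, everything after them can be taken as the opponent's name without any quoting or escaping.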

That completes the design. How would the first two requirements be written in shell?

#!/bin/bash

GORECORD_FILE="GoRecording.txt"
PRINT_LINE="---------------------------"
GREP_CMD="grep -E"


function GetWinRate()
{
    DATE=$1
    if [ -n "$DATE" ]; then
        echo $PRINT_LINE
        echo "DATE: $DATE" 
    fi
    
    BLACK_WIN=`$GREP_CMD  -c "$DATE.*black_win"  $GORECORD_FILE`
    WHITE_WIN=`$GREP_CMD  -c "$DATE.*white_win"  $GORECORD_FILE`
    BLACK_LOSE=`$GREP_CMD -c "$DATE.*black_lose" $GORECORD_FILE`
    WHITE_LOSE=`$GREP_CMD -c "$DATE.*white_lose" $GORECORD_FILE`
    BLACK_DRAW=`$GREP_CMD -c "$DATE.*black_draw" $GORECORD_FILE`
    WHITE_DRAW=`$GREP_CMD -c "$DATE.*white_draw" $GORECORD_FILE`

    GAMES_BLACK=$(($BLACK_WIN + $BLACK_LOSE + $BLACK_DRAW))
    GAMES_WHITE=$(($WHITE_WIN + $WHITE_LOSE + $WHITE_DRAW))
    TOTAL=$(($GAMES_BLACK + $GAMES_WHITE))

    BLACK_WIN_RATE=`python -c "print('%.2f%%' % ($BLACK_WIN * 100.0 / $GAMES_BLACK))"`
    WHITE_WIN_RATE=`python -c "print('%.2f%%' % ($WHITE_WIN * 100.0 / $GAMES_WHITE))"`
    WIN_RATE=`python -c "print('%.2f%%' % (($BLACK_WIN + $WHITE_WIN) * 100.0 / $TOTAL))"`

    echo $PRINT_LINE
    echo "Total Games: $TOTAL"
    echo "Win Rate:    $WIN_RATE"
    echo $PRINT_LINE
    echo "Black Games: $GAMES_BLACK"
    echo "Black Win:   $BLACK_WIN"
    echo "Black Lose:  $BLACK_LOSE"
    echo "Black Draw:  $BLACK_DRAW"
    echo "Black Win Rate: $BLACK_WIN_RATE"
    echo $PRINT_LINE
    echo "White Games: $GAMES_WHITE"
    echo "White Win:   $WHITE_WIN"
    echo "White Lose:  $WHITE_LOSE"
    echo "White Draw:  $WHITE_DRAW"
    echo "White Win Rate: $WHITE_WIN_RATE"
    echo $PRINT_LINE
}

########################
#   Main 
########################

INPUT_DATE=$1

if [ -n "$INPUT_DATE" ]; then
    GetWinRate "$INPUT_DATE"
else
    GetWinRate 
fi 

Running the shell script produces:

$ ./goplayer.sh
---------------------------
Total Games: 96
Win Rate:    71.88%
---------------------------
Black Games: 54
Black Win:   46
Black Lose:  8
Black Draw:  0
Black Win Rate: 85.19%
---------------------------
White Games: 42
White Win:   23
White Lose:  19
White Draw:  0
White Win Rate: 54.76%
---------------------------

$ ./goplayer.sh 20191224
---------------------------
DATE: 20191224
---------------------------
Total Games: 26
Win Rate:    69.23%
---------------------------
Black Games: 17
Black Win:   13
Black Lose:  4
Black Draw:  0
Black Win Rate: 76.47%
---------------------------
White Games: 9
White Win:   5
White Lose:  4
White Draw:  0
White Win Rate: 55.56%
---------------------------
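
One thing to watch in the shell script: the three rate calculations divide by GAMES_BLACK, GAMES_WHITE, and TOTAL, so a date with no games of one color (or no games at all) makes the inline python call fail with a ZeroDivisionError. A possible guard, sketched here in Python, is shown below; the helper name `format_rate` and the "N/A" placeholder are assumptions, not part of the original script.

```python
def format_rate(wins, games):
    """Format wins/games as a percentage string, or 'N/A' when no games were played."""
    if games == 0:
        return "N/A"
    return "%.2f%%" % (wins * 100.0 / games)

print(format_rate(13, 17))  # the black games of 20191224 in the run above
print(format_rate(0, 0))    # a day without any games of that color
```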

And how does Python implement all three requirements? As follows:

# coding=utf-8

import re
import sys
import argparse


GO_RECORD_FILE = "GoRecording.txt"
SEP_LINE = "-" * 50 

HIGHER_PLAYER_LIMITATION_GAMES = 3

regex_date       = re.compile(r'^\s*(\d{8})\s+(.+)')
regex_black_win  = re.compile(r'^black_win\s+(.+)')
regex_black_lose = re.compile(r'^black_lose\s+(.+)')
regex_black_draw = re.compile(r'^black_draw\s+(.+)')
regex_white_win  = re.compile(r'^white_win\s+(.+)')
regex_white_lose = re.compile(r'^white_lose\s+(.+)')
regex_white_draw = re.compile(r'^white_draw\s+(.+)')


GLOBAL_DATA = {
    "black_win":  0,
    "black_lose": 0,
    "black_draw": 0,

    "white_win":  0,
    "white_lose": 0,
    "white_draw": 0,

    "games_total": 0,
    "games_black": 0,
    "games_white": 0,

    "win_rate":       0,
    "black_win_rate": 0,
    "white_win_rate": 0,
}

RECORDS_DICT = dict()
PLAYERS_DICT = dict()

# Utility shared by GLOBAL_DATA, RECORDS_DICT[play_date], and PLAYERS_DICT[player]: fills in game counts and win rates in place.
def calc_win_rate(unit):
    unit["games_black"] = unit["black_win"] + unit["black_lose"] + unit["black_draw"]
    unit["games_white"] = unit["white_win"] + unit["white_lose"] + unit["white_draw"]
    unit["games_total"] = unit["games_black"] + unit["games_white"]
    
    if unit["games_total"] > 0:
        unit["win_rate"] = 1.0 * (unit["black_win"] + unit["white_win"]) / unit["games_total"]
    if unit["games_black"] > 0:
        unit["black_win_rate"] = 1.0 * unit["black_win"] / unit["games_black"]
    if unit["games_white"] > 0:
        unit["white_win_rate"] = 1.0 * unit["white_win"] / unit["games_white"]


def setup_statistics():
    global GLOBAL_DATA, RECORDS_DICT 
    
    with open(GO_RECORD_FILE, "r") as INPUT_FILE:
        lines = INPUT_FILE.readlines()
        for line in lines:
            line = line.strip('\r\n')
            play_date = ""
            rest_line = ""
            
            items = regex_date.search(line)
            if items:
                play_date = items.group(1) 
                rest_line = items.group(2)
                
                if RECORDS_DICT.get(play_date) is None:
                    RECORDS_DICT[play_date] = dict()
                    RECORDS_DICT[play_date]['black_win']  = 0
                    RECORDS_DICT[play_date]['black_lose'] = 0
                    RECORDS_DICT[play_date]['black_draw'] = 0
                    RECORDS_DICT[play_date]['white_win']  = 0
                    RECORDS_DICT[play_date]['white_lose'] = 0
                    RECORDS_DICT[play_date]['white_draw'] = 0
                    RECORDS_DICT[play_date]['games_total'] = 0
                    RECORDS_DICT[play_date]['games_black'] = 0
                    RECORDS_DICT[play_date]['games_white'] = 0
                    RECORDS_DICT[play_date]['win_rate']       = 0
                    RECORDS_DICT[play_date]['black_win_rate'] = 0
                    RECORDS_DICT[play_date]['white_win_rate'] = 0
                
                if rest_line.startswith("black_win"):
                    GLOBAL_DATA["black_win"] += 1
                    RECORDS_DICT[play_date]["black_win"] += 1
                    
                elif rest_line.startswith("black_lose"):
                    GLOBAL_DATA["black_lose"] += 1
                    RECORDS_DICT[play_date]["black_lose"] += 1
                    
                elif rest_line.startswith("black_draw"):
                    GLOBAL_DATA["black_draw"] += 1
                    RECORDS_DICT[play_date]["black_draw"] += 1
                    
                elif rest_line.startswith("white_win"):
                    GLOBAL_DATA["white_win"] += 1
                    RECORDS_DICT[play_date]["white_win"] += 1
                    
                elif rest_line.startswith("white_lose"):
                    GLOBAL_DATA["white_lose"] += 1
                    RECORDS_DICT[play_date]["white_lose"] += 1
                    
                elif rest_line.startswith("white_draw"):
                    GLOBAL_DATA["white_draw"] += 1
                    RECORDS_DICT[play_date]["white_draw"] += 1
                
                else:
                    pass  # Unrecognized result value; the line is ignored
    
    calc_win_rate(GLOBAL_DATA)
    
    for record in RECORDS_DICT.values():
        calc_win_rate(record)
        

def display_result(data):
    print(SEP_LINE)
    print("Total Games:     %d" % data["games_total"])
    print("Win Rate:        %.2f%%" % (100 * data["win_rate"]))
    
    print("Black Games:     %d" % data["games_black"])
    print("Black Win:       %d" % data["black_win"])
    print("Black Lose:      %d" % data["black_lose"])
    print("Black Draw:      %d" % data["black_draw"])
    print("Black Win Rate:  %.2f%%" % (100 * data["black_win_rate"]))
    
    print("White Games:     %d" % data["games_white"])
    print("White Win:       %d" % data["white_win"])
    print("White Lose:      %d" % data["white_lose"])
    print("White Draw:      %d" % data["white_draw"])
    print("White Win Rate:  %.2f%%" % (100 * data["white_win_rate"]))


def setup_player_statistics():
    global PLAYERS_DICT
    
    def setup_player_data(player, mode):
        if PLAYERS_DICT.get(player) is None: 
            PLAYERS_DICT[player] = dict()
            PLAYERS_DICT[player]["black_win"]  = 0
            PLAYERS_DICT[player]["black_lose"] = 0
            PLAYERS_DICT[player]["black_draw"] = 0
            PLAYERS_DICT[player]["white_win"]  = 0
            PLAYERS_DICT[player]["white_lose"] = 0
            PLAYERS_DICT[player]["white_draw"] = 0
            
            PLAYERS_DICT[player]["black_win_rate"] = 0
            PLAYERS_DICT[player]["white_win_rate"] = 0
            PLAYERS_DICT[player]["win_rate"] = 0
            
            PLAYERS_DICT[player]["games_total"] = 0
            PLAYERS_DICT[player]["games_black"] = 0
            PLAYERS_DICT[player]["games_white"] = 0
            
        PLAYERS_DICT[player][mode] += 1
    
    with open(GO_RECORD_FILE, "r") as INPUT_FILE:
        lines = INPUT_FILE.readlines()
        for line in lines:
            line = line.strip('\r\n')
            rest_line = None
            
            items = regex_date.search(line)
            if items:
                rest_line = items.group(2)
            
            player = ""
            if rest_line is not None: 
                if rest_line.startswith("black_win"):
                    name_items = regex_black_win.search(rest_line)
                    player = name_items.group(1)
                    setup_player_data(player, "black_win")
                    
                elif rest_line.startswith("black_lose"):
                    name_items = regex_black_lose.search(rest_line)
                    player = name_items.group(1)
                    setup_player_data(player, "black_lose")
                    
                elif rest_line.startswith("black_draw"):
                    name_items = regex_black_draw.search(rest_line)
                    player = name_items.group(1)
                    setup_player_data(player, "black_draw")
                    
                elif rest_line.startswith("white_win"):
                    name_items = regex_white_win.search(rest_line)
                    player = name_items.group(1)
                    setup_player_data(player, "white_win")
                    
                elif rest_line.startswith("white_lose"):
                    name_items = regex_white_lose.search(rest_line)
                    player = name_items.group(1)
                    setup_player_data(player, "white_lose")
                    
                elif rest_line.startswith("white_draw"):
                    name_items = regex_white_draw.search(rest_line)
                    player = name_items.group(1)
                    setup_player_data(player, "white_draw")
                
                else:
                    pass  # Unrecognized result value; the line is ignored

    for pdata in PLAYERS_DICT.values():
        calc_win_rate(pdata)


def display_player_data(player):
    pdata = PLAYERS_DICT.get(player)
    if pdata is None:
        return 
        
    print(SEP_LINE)
    print("Player:          %s" % player)
    display_result(pdata)

def print_higher_player():
    global PLAYERS_DICT
    
    higher_players = list()
    for player, pdata in PLAYERS_DICT.items():
        if pdata["games_total"] >= HIGHER_PLAYER_LIMITATION_GAMES and pdata["win_rate"] < 0.5:
            higher_players.append(player)
    
    higher_players = sorted(higher_players, key=lambda p: PLAYERS_DICT[p]["win_rate"])
    for player in higher_players:
        display_player_data(player)


def main():
    global GLOBAL_DATA
    
    parser = argparse.ArgumentParser(prog="python %s" % sys.argv[0])
    
    parser.add_argument("-m", "--mode", dest='mode', required=False, 
                        choices=["win_rate", "higher_player"],
                        help="Specify the mode: 'win_rate', 'higher_player'")
                        
    parser.add_argument("-d", "--date", dest='play_date', required=False,
                        help="Specify the date when playing the games")

    args = parser.parse_args()
    
    MODE, PLAY_DATE = args.mode, args.play_date
    
    if MODE is None:
        MODE = "win_rate" 
    
    if MODE == "win_rate":
        setup_statistics()
        if PLAY_DATE:
            if RECORDS_DICT.get(PLAY_DATE):
                print(SEP_LINE)
                print("Play Date:       %s" % PLAY_DATE)
                display_result(RECORDS_DICT[PLAY_DATE])
        else:
            display_result(GLOBAL_DATA)
            
    elif MODE == "higher_player":
        setup_player_statistics()
        print_higher_player()
    
    
if __name__ == "__main__":
    main()

Running the Python script produces:

$ python gorate.py
--------------------------------------------------
Total Games:     96
Win Rate:        71.88%
Black Games:     54
Black Win:       46
Black Lose:      8
Black Draw:      0
Black Win Rate:  85.19%
White Games:     42
White Win:       23
White Lose:      19
White Draw:      0
White Win Rate:  54.76%

$ python gorate.py -d 20191224
--------------------------------------------------
Play Date:       20191224
--------------------------------------------------
Total Games:     26
Win Rate:        69.23%
Black Games:     17
Black Win:       13
Black Lose:      4
Black Draw:      0
Black Win Rate:  76.47%
White Games:     9
White Win:       5
White Lose:      4
White Draw:      0
White Win Rate:  55.56%

$ python gorate.py -m higher_player
--------------------------------------------------
Player:          棋手50151
--------------------------------------------------
Total Games:     3
Win Rate:        0.00%
Black Games:     1
Black Win:       0
Black Lose:      1
Black Draw:      0
Black Win Rate:  0.00%
White Games:     2
White Win:       0
White Lose:      2
White Draw:      0
White Win Rate:  0.00%
--------------------------------------------------
Player:          棋手71902
--------------------------------------------------
Total Games:     4
Win Rate:        25.00%
Black Games:     2
Black Win:       1
Black Lose:      1
Black Draw:      0
Black Win Rate:  50.00%
White Games:     2
White Win:       0
White Lose:      2
White Draw:      0
White Win Rate:  0.00%
--------------------------------------------------
Player:          棋手78237
--------------------------------------------------
Total Games:     3
Win Rate:        33.33%
Black Games:     3
Black Win:       1
Black Lose:      2
Black Draw:      0
Black Win Rate:  33.33%
White Games:     0
White Win:       0
White Lose:      0
White Draw:      0
White Win Rate:  0.00%
--------------------------------------------------
Player:          棋手43200
--------------------------------------------------
Total Games:     3
Win Rate:        33.33%
Black Games:     1
Black Win:       1
Black Lose:      0
Black Draw:      0
Black Win Rate:  100.00%
White Games:     2
White Win:       0
White Lose:      2
White Draw:      0
White Win Rate:  0.00%
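
A note on requirement 3: the script selects opponents against whom the protagonist's own win rate is below 50%. Once draws appear in the records, that is not quite the same as "the opponent's win rate is above 50%". A compact sketch of the stricter reading, counting each of the protagonist's losses as a win for the opponent, might look like this. The function name, the in-memory (result, opponent) pairs, and the sample data are assumptions for illustration only.

```python
from collections import defaultdict

def stronger_opponents(records, min_games=3):
    """Return opponents with at least min_games games whose own win rate exceeds 50%.

    records is an iterable of (result, opponent) pairs, where result is one of
    the six values such as 'black_win' or 'white_lose', seen from the
    protagonist's side.
    """
    games = defaultdict(int)
    opponent_wins = defaultdict(int)
    for result, opponent in records:
        games[opponent] += 1
        if result.endswith("_lose"):  # the protagonist lost, so the opponent won
            opponent_wins[opponent] += 1
    rates = {name: opponent_wins[name] / float(games[name]) for name in games}
    qualified = [name for name in games
                 if games[name] >= min_games and rates[name] > 0.5]
    # Strongest opponent first.
    return sorted(qualified, key=lambda name: rates[name], reverse=True)

# Made-up sample: P2 drew twice, so the protagonist's win rate against P2 is
# only 25%, yet P2's own win rate is also only 25%.
SAMPLE = [
    ("black_lose", "P1"), ("white_lose", "P1"), ("white_lose", "P1"),
    ("black_win", "P2"), ("black_draw", "P2"), ("white_lose", "P2"), ("white_draw", "P2"),
    ("black_lose", "P3"),
]
print(stronger_opponents(SAMPLE))
```

With records that contain no draws, as in the runs shown above, both readings select the same opponents.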

The shell code implementing the first two requirements is 59 lines;
the Python code implementing all three requirements is 267 lines.

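Finally, let's compare their runtime efficiency.
Under Cygwin, with 96 records, for requirement 1 (computing the overall, black, and white win rates), averaged over 5 runs: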

  • Average running time of the Python script: 0.279 seconds
  • Average running time of the shell script: 1.249 seconds
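
The post does not say how these timings were taken. One possible way to average several runs, sketched here for Python 3, is shown below; the helper name `average_runtime` is an assumption, and the command in the example is only a placeholder to be replaced with the script under test.

```python
import subprocess
import sys
import time

def average_runtime(command, runs=5):
    """Run command `runs` times and return the mean wall-clock time in seconds."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the script's output so printing does not clutter the measurement.
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        total += time.perf_counter() - start
    return total / runs

if __name__ == "__main__":
    # Placeholder command; substitute e.g. ["python", "gorate.py"] or ["./goplayer.sh"].
    print("%.3f s" % average_runtime([sys.executable, "-c", "pass"]))
```

Measured this way, the time includes process startup, which matters for a script that itself launches several helper processes.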

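So Python is roughly 4.48 times as fast as shell here. (The shell script starts six grep processes and three Python interpreters on every run, which probably accounts for much of the gap.)
On the other hand, the Python code for the first two requirements comes to about 150 effective lines, while the shell script is just under 60 lines. In development efficiency, shell is about 2.5 times as efficient as Python.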
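
Line counts depend heavily on coding style, of course. As a rough illustration, the first two requirements can be written more compactly in Python with collections.Counter. This is a sketch under my own assumptions and not the script measured above: the names `count_results` and `summarize` and the sample lines are invented for the example.

```python
import re
from collections import Counter

RECORD_RE = re.compile(r'^\s*(\d{8})\s+((?:black|white)_(?:win|lose|draw))\s')

def count_results(lines, date=None):
    """Count each of the six result values, optionally restricted to one date."""
    counts = Counter()
    for line in lines:
        match = RECORD_RE.match(line)
        if match and date in (None, match.group(1)):
            counts[match.group(2)] += 1
    return counts

def summarize(counts):
    """Map 'total', 'black', 'white' to (games, win rate in percent or None)."""
    summary = {}
    groups = (("total", ("black", "white")), ("black", ("black",)), ("white", ("white",)))
    for key, colors in groups:
        wins = sum(counts[c + "_win"] for c in colors)
        games = sum(counts[c + "_" + o] for c in colors for o in ("win", "lose", "draw"))
        summary[key] = (games, 100.0 * wins / games if games else None)
    return summary

# Made-up sample lines in the <Date> <Game Result> <Player Name> format.
SAMPLE_LINES = [
    "20191224    black_win   AA BB 2020",
    "20191224    white_lose  CC",
    "20191230    black_win   AA BB 2020",
]
print(summarize(count_results(SAMPLE_LINES)))
print(summarize(count_results(SAMPLE_LINES, date="20191224")))
```

Reading the real file would just mean passing the open file object as `lines`.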

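In truth, quoting multiples for either runtime or development efficiency does not mean much, even for this specific problem. What the comparison does show fairly clearly is this: for some problems, shell offers higher development efficiency than Python, while Python is better than shell at solving complex problems.
Oh, and how efficient would this be in C++ (Boost supports regular expressions)? That question is left to anyone interested.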

(End)
