使用python编写数据检索脚本

日常工作中,时常会遇到因数据错误而引起的bug,这个问题,有时候会很头疼,费尽心力排查原因,最后发现非逻辑性问题,仅是由一个数据填错而引发的,真是觉得耗费时间,一次两次还能接受,但是对于一个数据量很大的工程来说,让你多次排查类似的错误,真是一件体力活。为了不再干这件苦差事,遂决定用python写一个数据检索的脚本,把错误提前暴露出来,解放了自己也提高了团队的工作效率。

环境
1. python3.6
2. win7 64位

python语法及函数使用请参照官方文档
python官方文档
python基础教程

定义windows控制台下的标准输出句柄和颜色宏
设置文本颜色,自定义格式化输出
脚本名称:data_review.py

import codecs
import ctypes
import sys

# 标准输出句柄
STD_INPUT_HANDLE    = -10
STD_OUTPUT_HANDLE   = -11
STD_ERROR_HANDLE    = -12

# 前景色
FOREGROUND_BLACK        = 0x0
FOREGROUND_BLUE         = 0x01
FOREGROUND_GREEN        = 0x02
FOREGROUND_RED          = 0x04
FOREGROUND_YELLOW       = 0x0e
FOREGROUND_INTENSITY    = 0x08

# 背景色
BACKGROUND_BLUE         = 0x10
BACKGROUND_GREEN        = 0x20
BACKGROUND_RED          = 0x40
BACKGROUND_YELLOW       = 0xe0
BACKGROUND_INTENSITY    = 0x80

# get handle
std_out_handle = ctypes.windll.kernel32.GetStdHandle(STD_OUTPUT_HANDLE)

def set_text_color(color, handle = std_out_handle):
    ctypes.windll.kernel32.SetConsoleTextAttribute(handle, color)

def reset_color():
    set_text_color(FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_BLUE)

def print_red_msg(msg):
    set_text_color(FOREGROUND_RED | FOREGROUND_INTENSITY)
    print(msg)
    reset_color()

def print_green_msg(msg):
    set_text_color(FOREGROUND_GREEN | FOREGROUND_INTENSITY)
    print(msg)
    reset_color()

def print_blue_msg(msg):
    set_text_color(FOREGROUND_BLUE | FOREGROUND_INTENSITY)
    print(msg)
    reset_color()

def print_yellow_msg(msg):
    set_text_color(FOREGROUND_YELLOW | FOREGROUND_INTENSITY)
    print(msg)
    reset_color()

定义数据字典
以任务表为例

# 数据字典
Quest_Dict              = {}

定义数据检索函数
这里需要留意codecs.open的使用,参考官方文档说明

codecs.open(filename, mode='r', encoding=None, errors='strict', buffering=1)

Open an encoded file using the given mode and return an instance of StreamReaderWriter, providing transparent encoding/decoding. The default file mode is 'r', meaning to open the file in read mode.

Note: Underlying encoded files are always opened in binary mode. No automatic conversion of '\n' is done on reading and writing. The mode argument may be any binary mode acceptable to the built-in open() function; the 'b' is automatically added.

encoding specifies the encoding which is to be used for the file. Any encoding that encodes to and decodes from bytes is allowed, and the data types supported by the file methods depend on the codec used.

errors may be given to define the error handling. It defaults to 'strict' which causes a ValueError to be raised in case an encoding error occurs.

buffering has the same meaning as for the built-in open() function. It defaults to line buffered.

对于codecs.open所使用的文件编码格式参照官方文档 7.2.3. Standard Encodings 中所列出的。我们这里文件使用的带BOM的UTF8的编码格式。注意 utf_8 和 utf_8_sig 的区别。

7.2.7. encodings.utf_8_sig — UTF-8 codec with BOM signature

This module implements a variant of the UTF-8 codec: On encoding a UTF-8 encoded BOM will be prepended to the UTF-8 encoded bytes. For the stateful encoder this is only done once (on the first write to the byte stream). For decoding an optional UTF-8 encoded BOM at the start of the data will be skipped.

如何检索数据

def data_review_func():
    # start data review
    print_green_msg('start data review ... ...')

    # 检测结果
    bRet = True

    ############################################################## 
    # 建立Quest字典
    ############################################################## 
    try:
        fd = codecs.open('./data/Quest.csv','r','utf_8_sig')
    except Exception as err:
        print_red_msg(str(err))
        return

    try:
        for line in fd:
            szQuestID,szData = line.split(',',1)
            key_ID = int(szQuestID)
            if key_ID <= 0:
                print_red_msg('Quest.csv: invaild index for -1')
                continue
            if key_ID in Quest_Dict:
                print_yellow_msg('Quest.csv: {0} is repeat'.format(key_ID))
            else:
                Quest_Dict[key_ID] = szData
    except Exception as err:
        print_red_msg('Quest.csv: ' + str(err))
        bRet = False
    finally:
        fd.close()

# 在RandQuest.csv中检索任务ID的有效性
    try:
        fd = codecs.open('./data/RandQuest.csv','r','utf_8_sig')
    except Exception as err:
        print_red_msg(str(err))
        return

    try:
        for line in fd:
            szDataList = line.split(',',-1)
            for index in range(4,20,2):
                if int(szDataList[index]) <= 0:
                    continue
                if not int(szDataList[index]) in Quest_Dict:
                    print_red_msg('RandQuest.csv: Quest ID {0} is not found in Quest.csv'.format(szDataList[index]))
                    bRet = False
    except Exception as err:
        print_red_msg('RandQuest.csv: ' + str(err))
        bRet = False
    finally:
        fd.close()

    # check over
    if bRet:
        print_green_msg('everything is OK !')
    else:
        print_red_msg('there is something wrong !')

    return

相关语法和函数说明参照
python官方文档
python基础教程

在data_review.py末尾添加执行函数

# 检查数据表
data_review_func()

window环境 可以新建一个bat脚本用于执行python命令,bat脚本内容如下

python data_review.py 

pause

双击data_review.bat即开始执行data_review.py脚本

数据有误的测试效果
第一次检测结果

修改之后的测试效果
第二次检测结果

ps: 当输出的log很多时,终端无法一次性显示完全时,可以用管道符一页一页显示,具体做法:修改data_review.bat

python data_review.py | more

pause

输出结果按space键翻页 但log的文本颜色会被重置 (⊙o⊙)…

也可以将输出的log重定向到文件中,如:

python data_review.py >data_review_ret.txt

@echo off
echo.
echo 检查结果已保存在dat_review_ret.txt中
echo.

pause
  • 0
    点赞
  • 5
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值