BigchainDB Source Code Analysis (1): Command-Line Argument and Configuration File Parsing

lwyeluo

Article link: https://blog.csdn.net/lwyeluo/article/details/74157303

BigchainDB version:

BigchainDB (1.0.0rc1)
bigchaindb-driver (0.3.1)

Command-line argument parsing

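Running whereis shows that the bigchaindb executable is /usr/local/bin/bigchaindb. This file calls bigchaindb.commands.bigchaindb.main(). The call re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0]) is a string substitution: if sys.argv[0] ends with -script.py, -script.pyw, or .exe, that suffix is removed.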

root@bigchain:~# whereis bigchaindb
bigchaindb: /usr/local/bin/bigchaindb
root@bigchain:~# cat /usr/local/bin/bigchaindb 
#!/usr/bin/python3

# -*- coding: utf-8 -*-
import re
import sys

from bigchaindb.commands.bigchaindb import main

if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
    sys.exit(main())
root@bigchain:~#
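To make the effect of that substitution concrete, here is a minimal sketch that applies the same pattern to a few sample values of sys.argv[0]. The helper name strip_launcher_suffix and the sample paths are made up for illustration; only the regular expression comes from the wrapper script.

```python
import re

# Same pattern as in the generated wrapper script.
SUFFIX = r'(-script\.pyw?|\.exe)?$'


def strip_launcher_suffix(argv0):
    """Hypothetical helper: drop a trailing launcher suffix, if there is one."""
    return re.sub(SUFFIX, '', argv0)


print(strip_launcher_suffix('bigchaindb-script.py'))
print(strip_launcher_suffix('bigchaindb-script.pyw'))
print(strip_launcher_suffix('bigchaindb.exe'))
print(strip_launcher_suffix('/usr/local/bin/bigchaindb'))  # no suffix, unchanged
```

The suffixes only appear for launchers generated on some platforms, so on the Linux host shown above the substitution leaves sys.argv[0] as it is.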

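main simply calls utils.start. Its first argument is the parser returned by create_parser, which uses the argparse module to define the command-line arguments the script accepts, such as configure and start (called subcommands), and stores the parsed subcommand in args.command.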

def main():
    utils.start(create_parser(), sys.argv[1:], globals())

def create_parser():
    parser = argparse.ArgumentParser(
        description='Control your BigchainDB node.',
        parents=[utils.base_parser])

    # all the commands are contained in the subparsers object,
    # the command selected by the user will be stored in `args.command`
    # that is used by the `main` function to select which other
    # function to call.
    subparsers = parser.add_subparsers(title='Commands',
                                       dest='command')

    # parser for writing a config file
    config_parser = subparsers.add_parser('configure',
                                          help='Prepare the config file '
                                               'and create the node keypair')

    ...

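Besides the arguments defined in create_parser, commands/utils.py defines a few more options that can be parsed: -c to specify the configuration file, -l to set the log level, -y to answer yes to prompts by default, and -v to show version information.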

base_parser = argparse.ArgumentParser(add_help=False, prog='bigchaindb')

base_parser.add_argument('-c', '--config',
                         help='Specify the location of the configuration file '
                              '(use "-" for stdout)')

...

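With the parser defined, the first step of utils.start is to call parse_args and make sure that args.command, which holds the subcommand (configure, start, and so on), is set. If it is not set, the command line contained no subcommand, so the help text is printed and the program exits.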

def start(parser, argv, scope):

    args = parser.parse_args(argv)

    if not args.command:
        parser.print_help()
        raise SystemExit()
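The interplay between the parent parser, the subparsers, and args.command can be tried out in isolation. The following is a minimal sketch using only argparse; the program name demo and the two subcommands are placeholders chosen for illustration and do not reproduce the full BigchainDB parser.

```python
import argparse

# Shared options live in a parent parser created with add_help=False,
# so the child parser can add its own -h/--help without a conflict.
base_parser = argparse.ArgumentParser(add_help=False, prog='demo')
base_parser.add_argument('-c', '--config', help='path to a config file')


def create_parser():
    parser = argparse.ArgumentParser(description='Demo CLI.',
                                     parents=[base_parser])
    # The name of the chosen subcommand ends up in args.command.
    subparsers = parser.add_subparsers(title='Commands', dest='command')
    subparsers.add_parser('configure', help='write a config file')
    subparsers.add_parser('start', help='start the node')
    return parser


parser = create_parser()

args = parser.parse_args(['-c', 'demo.json', 'start'])
print(args.command, args.config)

# Without a subcommand args.command is None, which is the case
# the `if not args.command` check in start() catches.
args = parser.parse_args([])
print(args.command)
```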

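The third parameter of start is scope, and the argument passed at the call site is globals(). globals is a Python built-in function that returns the namespace of the current module as a dictionary, including its functions, classes, imported modules, and module-level variables and constants. After parsing the command-line arguments, start looks up the function to call for the subcommand. The function name is built by replacing every '-' in the subcommand string with '_' and prepending run_. So when bigchaindb start is executed, args.command is start and func is run_start. If no such function exists in the module, a NotImplementedError is raised.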

After that, start also sets the value of multiprocess according to the command-line arguments, and then calls func.

func = scope.get('run_' + args.command.replace('-', '_'))
if not func:
    raise NotImplementedError('Command `{}` not yet implemented'.
                              format(args.command))

...

return func(args)
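The name-based dispatch can be sketched on its own as follows. The resolve helper and the two stub functions are hypothetical, and an explicit dictionary stands in for the globals() namespace that the real code passes as scope.

```python
def run_start(args):
    return 'starting'


def run_export_my_pubkey(args):
    return 'exporting'


# Stands in for the dictionary that globals() would return in the real module.
SCOPE = {
    'run_start': run_start,
    'run_export_my_pubkey': run_export_my_pubkey,
}


def resolve(command, scope):
    """Map a subcommand name to the run_* function found in scope."""
    func = scope.get('run_' + command.replace('-', '_'))
    if not func:
        raise NotImplementedError(
            'Command `{}` not yet implemented'.format(command))
    return func


print(resolve('start', SCOPE)(None))
print(resolve('export-my-pubkey', SCOPE)(None))
```

One consequence of this design is that adding a subcommand only takes a new subparser plus a run_* function with the matching name; no dispatch table has to be maintained.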

Executing a subcommand

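Taking bigchaindb start as an example, utils.start calls run_start(), which lives in commands.bigchaindb. This function has two decorators. That means that when the function is defined, the name run_start is rebound as run_start = configure_bigchaindb(start_logging_process(run_start)), so a call to run_start(args) passes through both wrappers before the real run_start(args) runs. For examples of decorators, see the blog post at http://www.cnblogs.com/SeasonLee/articles/1719444.html; just note that configure_bigchaindb is invoked first.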

@configure_bigchaindb
@start_logging_process
def run_start(args):
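The order in which stacked decorators run is easy to get wrong, so here is a minimal sketch with two placeholder decorators, outer and inner, that play the roles of configure_bigchaindb and start_logging_process. All names are illustrative.

```python
calls = []


def outer(func):
    def wrapper(args):
        calls.append('outer')
        return func(args)
    return wrapper


def inner(func):
    def wrapper(args):
        calls.append('inner')
        return func(args)
    return wrapper


# Equivalent to: run = outer(inner(run))
@outer
@inner
def run(args):
    calls.append('run')


run(None)
print(calls)  # ['outer', 'inner', 'run']
```

The decorator written closest to def is applied first, which makes it the innermost wrapper; at call time the topmost decorator's wrapper therefore executes first.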

Let's read the code of the two decorators first, and then look at run_start itself.

Configuring BigchainDB

configure_bigchaindb is located in commands.utils.

def configure_bigchaindb(command):

    @functools.wraps(command)
    def configure(args):
        try:
            print(">>> enter configure")
            config_from_cmdline = {
                'log': {
                    'level_console': args.log_level,
                    'level_logfile': args.log_level,
                },
                'server': {'loglevel': args.log_level},
            }
        except AttributeError:
            config_from_cmdline = None
        bigchaindb.config_utils.autoconfigure(
            filename=args.config, config=config_from_cmdline, force=True)
        command(args)

    return configure

The command passed in here can be viewed as run_start already wrapped by the start_logging_process decorator. So the last statement of configure, command(args), is equivalent to executing:

@start_logging_process
def run_start(args):
    ...

run_start(args)

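In other words, start_logging_process runs first, and then the real run_start. As for the @functools.wraps(command) decorator on configure, its purpose is to make sure that attributes of the original function are not overwritten by the wrapper function. In the example below, which does not use it, the __name__ of add has become run; functools.wraps keeps the original function's attributes intact.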

>>> def test(func):
...     def run(x1, x2):
...         print("run>>")
...         return func(x1, x2)
...     return run
...
>>> @test
... def add(x1, x2):
...     print("x1+x2=%d" % (x1 + x2))
...
>>> add(1, 2)
run>>
x1+x2=3
>>> print(add.__name__)
run

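Now for the body of configure. It calls config_utils.autoconfigure; the first argument is the configuration file path given on the command line, and the second is a dictionary describing the log level.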

def autoconfigure(filename=None, config=None, force=False):

    # start with the current configuration
    newconfig = bigchaindb.config

    # update configuration from file
    try:
        newconfig = update(newconfig, file_config(filename=filename))
    except FileNotFoundError as e:
        if filename:
            raise
        else:
            logger.info('Cannot find config file `%s`.' % e.filename)

    # override configuration with env variables
    newconfig = env_config(newconfig)

    if config:
        newconfig = update(newconfig, config)

    set_config(newconfig)  # sets bigchaindb.config

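The function first sets newconfig to the current configuration (the defaults are defined in bigchaindb/__init__.py), then calls update to merge the JSON from the configuration file into newconfig. file_config uses json.load to load the JSON in the configuration file. The update function is shown below; it recursively walks the JSON from the configuration file and copies each key/value pair into newconfig.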

def update(d, u):
    for k, v in u.items():
        if isinstance(v, collections.Mapping):
            r = update(d.get(k, {}), v)
            d[k] = r
        else:
            d[k] = u[k]
    return d
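The layering that autoconfigure performs with update can be sketched with plain dictionaries. The configuration values below are invented for illustration. The sketch also imports Mapping from collections.abc, because the collections.Mapping alias used in the excerpt above is no longer available in Python 3.10 and later.

```python
from collections.abc import Mapping


def update(d, u):
    """Recursively merge mapping u into dict d and return d."""
    for k, v in u.items():
        if isinstance(v, Mapping):
            d[k] = update(d.get(k, {}), v)
        else:
            d[k] = v
    return d


# Illustrative values only.
defaults = {'database': {'host': 'localhost', 'port': 27017},
            'log': {'level_console': 'info'}}
from_file = {'database': {'host': 'db.example.com'}}
from_cmdline = {'log': {'level_console': 'debug'}}

config = update(defaults, from_file)    # file values override defaults
config = update(config, from_cmdline)   # command-line values override both
print(config)
```

Because the merge is recursive, the file only needs to mention the keys it changes: database.port keeps its default although the file overrides database.host. Note that update modifies its first argument in place.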

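autoconfigure then updates newconfig from the environment variables and from the log-level dictionary passed in as an argument, in that order, and finally installs newconfig as the configuration used by the current bigchaindb instance.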

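Let's look at env_config first. The call chain is env_config -> map_leafs -> _inner. _inner works with two variables: func, which refers to load_from_env, and mapping, which refers to a deep copy of newconfig. Like update above, _inner recursively walks newconfig and calls load_from_env on each leaf to reassign it. The first argument of that call is the value stored under a key in newconfig, and the second is path, a list of keys describing where that value is located.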

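Suppose newconfig contains {'database': {'host': 'localhost'}}. Then load_from_env is called with 'localhost' and ['database', 'host']. The function body builds the environment variable name from path and calls os.environ.get to read that variable and update newconfig; if the environment variable does not exist, the original value is kept.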

CONFIG_PREFIX = 'BIGCHAINDB'
CONFIG_SEP = '_'

def env_config(config):

    def load_from_env(value, path):
        var_name = CONFIG_SEP.join([CONFIG_PREFIX] + list(map(lambda s: s.upper(), path)))
        return os.environ.get(var_name, value)

    return map_leafs(load_from_env, config)

def map_leafs(func, mapping):

    def _inner(mapping, path=None):
        if path is None:
            path = []

        for key, val in mapping.items():
            if isinstance(val, collections.Mapping):
                _inner(val, path + [key])
            else:
                mapping[key] = func(val, path=path+[key])

        return mapping

    return _inner(copy.deepcopy(mapping))

Let's look at how the environment variable name is derived from path, that is, the statement:

var_name = CONFIG_SEP.join([CONFIG_PREFIX] + list(map(lambda s: s.upper(), path)))

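A lambda is a small anonymous function; lambda s: s.upper() takes a string s and returns it in upper case. map(func, seq) applies func to every item of the sequence seq, so [CONFIG_PREFIX] + list(map(lambda s: s.upper(), path)) yields the elements of path in upper case with CONFIG_PREFIX inserted at the front. join then turns that list into a string, with CONFIG_SEP between adjacent elements. So when load_from_env is called with 'localhost' and ['database', 'host'], the corresponding environment variable is BIGCHAINDB_DATABASE_HOST.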
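The name construction and the fallback to the original value can be checked with a short sketch. The env_var_name helper and the extra environ parameter are assumptions made here so that the example can run against a fake environment instead of the real os.environ.

```python
import os

CONFIG_PREFIX = 'BIGCHAINDB'
CONFIG_SEP = '_'


def env_var_name(path):
    """Build the environment variable name for a config path."""
    return CONFIG_SEP.join([CONFIG_PREFIX] + [s.upper() for s in path])


def load_from_env(value, path, environ=None):
    """Return the override found in environ, or the original value."""
    environ = os.environ if environ is None else environ
    return environ.get(env_var_name(path), value)


# A fake environment with a single override (illustrative value).
fake_env = {'BIGCHAINDB_DATABASE_HOST': 'db.example.com'}

print(env_var_name(['database', 'host']))
print(load_from_env('localhost', ['database', 'host'], fake_env))
print(load_from_env(27017, ['database', 'port'], fake_env))  # no override
```

Values read from the environment are always strings, which is why a later type-conversion step is needed before the configuration is used.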

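At this point autoconfigure has the updated newconfig, and the last statement, set_config(newconfig), installs it as the current configuration. It uses update_types, which in turn uses map_leafs to walk newconfig and convert its values to the types of the config defined in bigchaindb/__init__.py. Configuration is done! The final configuration is stored in bigchaindb.config.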

def set_config(config):
    # Deep copy the default config into bigchaindb.config
    bigchaindb.config = copy.deepcopy(bigchaindb._config)
    # Update the default config with whatever is in the passed config
    update(bigchaindb.config, update_types(config, bigchaindb.config))
    bigchaindb.config['CONFIGURED'] = True
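The source of update_types is not shown in this post, so the following is only a rough sketch of the idea of casting values to the type of the corresponding default. The coerce_like helper and the sample values are hypothetical and are not taken from BigchainDB.

```python
def coerce_like(reference_value, value):
    """Hypothetical helper: cast value to the type of reference_value."""
    try:
        return type(reference_value)(value)
    except (TypeError, ValueError):
        # Leave the value alone if it cannot be converted.
        return value


# Illustrative values: the port arrives as a string, e.g. from an env variable.
defaults = {'database': {'host': 'localhost', 'port': 27017}}
from_env = {'database': {'host': 'db.example.com', 'port': '28015'}}

typed = {
    section: {key: coerce_like(defaults[section][key], val)
              for key, val in values.items()}
    for section, values in from_env.items()
}
print(typed)
```

A step like this matters mostly for values that came from environment variables, since those are strings even when the default is a number.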

Starting logging

Once configuration is complete, start_logging_process is invoked. It calls setup_logging and then the real run_start. For logging, bigchaindb uses a publisher/subscriber structure.

def start_logging_process(command):

    @functools.wraps(command)
    def start_logging(args):
        from bigchaindb import config
        setup_logging(user_log_config=config.get('log'))
        command(args)
    return start_logging

def setup_logging(*, user_log_config=None):
    setup_pub_logger()
    setup_sub_logger(user_log_config=user_log_config)

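setup_pub_logger sets up the publisher: it creates a SocketHandler pointed at DEFAULT_SOCKET_LOGGING_HOST and DEFAULT_SOCKET_LOGGING_PORT and attaches it to the root logger, so that log records are sent over a socket.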

def setup_pub_logger():
    dictConfig(PUBLISHER_LOGGING_CONFIG)
    socket_handler = logging.handlers.SocketHandler(
        DEFAULT_SOCKET_LOGGING_HOST, DEFAULT_SOCKET_LOGGING_PORT)
    socket_handler.setLevel(logging.DEBUG)
    logger = logging.getLogger()
    logger.addHandler(socket_handler)

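setup_sub_logger listens on port DEFAULT_TCP_LOGGING_PORT for log records and handles them using the settings under the log key of the configuration. This also means that starting two bigchaindb instances on the same node would try to open DEFAULT_TCP_LOGGING_PORT twice, which fails with an "address already in use" error.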

def setup_sub_logger(*, user_log_config=None):
    server = LogRecordSocketServer()
    with server:
        server_proc = Process(
            target=server.serve_forever,
            kwargs={'log_config': user_log_config},
        )
        server_proc.start()

class LogRecordSocketServer(ThreadingTCPServer):

    allow_reuse_address = True

    def __init__(self,
                 host='localhost',
                 port=logging.handlers.DEFAULT_TCP_LOGGING_PORT,
                 handler=LogRecordStreamHandler):
        super().__init__((host, port), handler)

    def serve_forever(self, *, poll_interval=0.5, log_config=None):
        sub_logging_config = create_subscriber_logging_config(
            user_log_config=log_config)
        dictConfig(sub_logging_config)
        try:
            super().serve_forever(poll_interval=poll_interval)
        except KeyboardInterrupt:
            pass
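The port conflict described above can be reproduced with plain sockets. This is a simplified sketch: it uses an OS-chosen port instead of DEFAULT_TCP_LOGGING_PORT, the try_listen helper is hypothetical, and it does not model the allow_reuse_address option of the server class.

```python
import socket


def try_listen(port, host='127.0.0.1'):
    """Return a listening socket bound to (host, port), or None if taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock
    except OSError:
        # Typically "address already in use".
        sock.close()
        return None


first = try_listen(0)          # port 0 lets the OS choose a free port
port = first.getsockname()[1]
second = try_listen(port)      # same port again, as a second instance would
print(port, second is None)
first.close()
```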

`run_start`

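Finally we reach the real run_start. Ignoring steps such as key generation, the function really only calls _run_init() and processes.start(). The former initializes the backend database: it creates the database, creates the tables, and creates the genesis block.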

@configure_bigchaindb
@start_logging_process
def run_start(args):
    ...

    try:
        _run_init()
    except DatabaseAlreadyExists:
        pass
    except KeypairNotFoundException:
        sys.exit(CANNOT_START_KEYPAIR_NOT_FOUND)

    ...

    processes.start()

def _run_init():
    # Try to access the keypair, throws an exception if it does not exist
    b = bigchaindb.Bigchain()

    schema.init_database(connection=b.connection)

    b.create_genesis_block()
    logger.info('Genesis block created.')

processes.start then starts the block, vote, stale, and election processes in turn, followed by the web API and the WebSocket server.

def start():

    events_queue = setup_events_queue()

    # start the processes
    logger.info('Starting block')
    block.start()

    logger.info('Starting voter')
    vote.start()

    logger.info('Starting stale transaction monitor')
    stale.start()

    logger.info('Starting election')
    election.start(events_queue=events_queue)

    # start the web api
    app_server = server.create_server(bigchaindb.config['server'])
    p_webapi = mp.Process(name='webapi', target=app_server.run)
    p_webapi.start()

    logger.info('WebSocket server started')
    p_websocket_server = mp.Process(name='ws',
                                    target=websocket_server.start,
                                    args=(events_queue,))
    p_websocket_server.start()

    # start message
    logger.info(BANNER.format(bigchaindb.config['server']['bind']))

The logic of the database and of these processes will be traced through the source in the next post.
