Swift Source Code Analysis----swift-account-replicator (1)

Thanks to everyone who supports this blog; discussion and exchange are always welcome. Given my limited ability and time, mistakes are hard to avoid, and corrections are appreciated!

If you repost this article, please keep the author information.
Blog: http://blog.csdn.net/gaoxingnengjisuan
Email: dong.liu@siat.ac.cn

PS: I have not logged into the blog recently, so I missed many readers' comments; my apologies! Also, I am rarely on QQ, so please reach me by email instead.


Overview:

Replicates the data of specified (account) partitions to the specified nodes, keeping the replicas of the data in sync.
Since once=True is passed here, the framework calls the run_once method defined by the daemon base class Daemon,
which ultimately ends up invoking the run_once method of the Replicator class.
Note: keeping account data in sync between replicas essentially means copying database files of the form object_file = /srv/node/node['device']/accounts/partition/suffix/hsh****.db (the sketch below shows how such a path is composed).
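
To make that path concrete, here is a minimal, hypothetical sketch (not code taken from the replicator itself, and assuming a standard Swift hash prefix/suffix configuration) of how such a database path is composed from the devices root, a partition, and the hash of the account name, using the hash_path and storage_directory helpers from swift.common.utils; the device name, partition number, and account used here are made up.

import os

from swift.common.utils import hash_path, storage_directory

# Hypothetical example values -- not taken from the replicator.
root = '/srv/node'        # devices root (self.root in the replicator)
device = 'sdb1'           # a device belonging to this machine
account = 'AUTH_test'     # an account name

name_hash = hash_path(account)      # md5-based hash of the account name
partition = 100000                  # normally obtained via ring.get_part(account)
db_dir = storage_directory('accounts', partition, name_hash)
db_file = os.path.join(root, device, db_dir, name_hash + '.db')

# db_file now looks like:
#   /srv/node/sdb1/accounts/100000/<suffix>/<hash>/<hash>.db
# which is the object_file layout the replicator walks over.
print(db_file)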


Source code analysis:

Below is the main execution flow of this part of the code; the more important parts have been annotated with comments.

from swift.account.replicator import AccountReplicator
from swift.common.utils import parse_options
from swift.common.daemon import run_daemon

if __name__ == '__main__':
    conf_file, options = parse_options(once=True)
    run_daemon(AccountReplicator, conf_file, **options)
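
As noted in the overview, the once=True returned by parse_options is what steers execution into run_once. The snippet below is a simplified, illustrative sketch of that dispatch (not the actual swift.common.daemon source; config parsing, logging setup, and privilege dropping are omitted): the Daemon base class inspects the once flag and calls either run_once or run_forever, and AccountReplicator inherits its run_once from db_replicator.Replicator.

class Daemon(object):
    """Sketch of the daemon base class dispatch; heavily simplified."""

    def __init__(self, conf):
        self.conf = conf

    def run_once(self, *args, **kwargs):
        raise NotImplementedError('run_once not implemented')

    def run_forever(self, *args, **kwargs):
        raise NotImplementedError('run_forever not implemented')

    def run(self, once=False, **kwargs):
        # once=True -> a single pass (run_once); otherwise loop forever
        if once:
            self.run_once(**kwargs)
        else:
            self.run_forever(**kwargs)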

class AccountReplicator(db_replicator.Replicator):
    server_type = 'account'
    brokerclass = AccountBroker
    datadir = DATADIR
    default_port = 6002

class Replicator(Daemon)----def run_once(self, *args, **kwargs):
     """
     实现复制指定分区数据到指定节点(用以实现数据副本之间的同步);
     数据类型可能是account或container或object;
     """
        
    # Initialize the per-run statistics;
    # self.stats = {'attempted': 0, 'success': 0, 'failure': 0, 'ts_repl': 0,
    #              'no_change': 0, 'hashmatch': 0, 'rsync': 0, 'diff': 0,
    #              'remove': 0, 'empty': 0, 'remote_merge': 0,
    #              'start': time.time(), 'diff_capped': 0}
    self._zero_stats()
    dirs = []
    ips = whataremyips()
    if not ips:
        self.logger.error(_('ERROR Failed to get my own IPs?'))
        return
        
    # Walk the devices on the ring;
    for node in self.ring.devs:
        if (node and node['replication_ip'] in ips and node['replication_port'] == self.port):
            if self.mount_check and not ismount(os.path.join(self.root, node['device'])):
                self.logger.warn(_('Skipping %(device)s as it is not mounted') % node)
                continue
                
            # Remove stale files in the tmp directory (older than reclaim_age);
            unlink_older_than(
                os.path.join(self.root, node['device'], 'tmp'),
                time.time() - self.reclaim_age)
                
            datadir = os.path.join(self.root, node['device'], self.datadir)
            if os.path.isdir(datadir):
                dirs.append((datadir, node['id']))
        
    self.logger.info(_('Beginning replication run'))
    for part, object_file, node_id in roundrobin_datadirs(dirs):
            
        # _replicate_object: replicate the data of the given partition to the
        # target nodes (to keep the replicas in sync). Roughly:
        #     get all nodes the partition maps to (one node per replica);
        #     check whether node_id is among those nodes (i.e. the db belongs here);
        #     loop over the peer nodes and replicate the data to each of them;
        #     compare sync points and hashes to decide whether the copies are
        #     in sync, i.e. whether the replication succeeded.
        self.cpool.spawn_n(self._replicate_object, part, object_file, node_id)
    self.cpool.waitall()
    self.logger.info(_('Replication run OVER'))
    self._report_stats()

1. for node in self.ring.devs: fetch all devices from the ring, and for each one do the following:
Use the local IP addresses to pick out the devices that belong to this machine and are mounted, and store each device's datadir = /srv/node/node['device']/accounts together with node['id'] as an element of the list dirs.
Note: this step simply collects the devices local to this machine and fixes the directory /srv/node/node['device']/accounts (which corresponds to accounts).
2. Iterate over every file object_file under node['device']/accounts (paths of the form object_file = /srv/node/node['device']/accounts/partition/suffix/hsh****.db, i.e. the .db files under the individual partitions of the account data), and call _replicate_object to replicate the local data of each partition to the target nodes (to keep the replicas in sync); a simplified sketch of this round-robin directory walk follows.
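
The iteration in step 2 is driven by roundrobin_datadirs(dirs). Below is a simplified, self-contained sketch of that behaviour (not the real swift.common.db_replicator implementation): it walks each (datadir, node_id) pair collected in step 1, yields a (partition, db_file, node_id) tuple for every .db file it finds, and interleaves the datadirs so that no single device is drained before the others.

import os


def walk_datadir(datadir, node_id):
    """Yield (partition, db_file, node_id) for every .db file under datadir."""
    for partition in os.listdir(datadir):
        part_path = os.path.join(datadir, partition)
        if not os.path.isdir(part_path):
            continue
        for dirpath, dirnames, filenames in os.walk(part_path):
            for name in filenames:
                if name.endswith('.db'):
                    yield partition, os.path.join(dirpath, name), node_id


def roundrobin_datadirs_sketch(dirs):
    """dirs is the list of (datadir, node_id) tuples built in run_once()."""
    iters = [walk_datadir(datadir, node_id) for datadir, node_id in dirs]
    while iters:
        for it in list(iters):          # take one db from each device in turn
            try:
                yield next(it)
            except StopIteration:
                iters.remove(it)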


Turning to step 2, here is the implementation of _replicate_object:

def _replicate_object(self, partition, object_file, node_id):
    """
    复制指定分区数据到指定节点(用以实现数据副本之间的同步),具体步骤如下;
    获取指定分区所在的所有节点nodes(一个分区可能对应多个节点,因为可能有多个副本);
    判断node_id是否在nodes的范围之内(这是合理的);
    循环实现数据到各个目标节点上(的分区)的复制操作;
    通过比较同步点和哈希值来判断复制后的两个版本是否是同步的,即复制操作是否成功;

    object_file = /srv/node/node['device']/accounts/partition/suffix/hsh****.db
    """
    start_time = now = time.time()
    self.logger.debug(_('Replicating db %s'), object_file)
    self.stats['attempted'] += 1
    self.logger.increment('attempts')
    shouldbehere = True
        
    try:
        broker = self.brokerclass(object_file, pending_timeout=30)
        broker.reclaim(now - self.reclaim_age, now - (self.reclaim_age * 2))
        # Fetch the replication-related metadata of this database;
        info = broker.get_replication_info()
        full_info = broker.get_info()
        bpart = self.ring.get_part(full_info['account'], full_info.get('container'))
        if bpart != int(partition):
            partition = bpart
            # Important to set this false here since the later check only
            # checks if it's on the proper device, not partition.
            shouldbehere = False
            name = '/' + quote(full_info['account'])
            if 'container' in full_info:
                name += '/' + quote(full_info['container'])
            self.logger.error(
                'Found %s for %s when it should be on partition %s; will '
                'replicate out and remove.' % (object_file, name, bpart))
    except (Exception, Timeout) as e:
        if 'no such table' in str(e):
            self.logger.error(_('Quarantining DB %s'), object_file)
            quarantine_db(broker.db_file, broker.db_type)
        else:
            self.logger.exception(_('ERROR reading db %s'), object_file)
        self.stats['failure'] += 1
        self.logger.increment('failures')
        return

    # The db is considered deleted if the delete_timestamp value is greater
    # than the put_timestamp, and there are no objects.
    delete_timestamp = 0
    try:
        delete_timestamp = float(info['delete_timestamp'])
    except ValueError:
        pass
    put_timestamp = 0
    try:
        put_timestamp = float(info['put_timestamp'])
    except ValueError:
        pass
    if delete_timestamp < (now - self.reclaim_age) and delete_timestamp > put_timestamp and info['count'] in (None, '', 0, '0'):
        if self.report_up_to_date(full_info):
            self.delete_db(object_file)
        self.logger.timing_since('timing', start_time)
        return
    responses = []
        
    # Get all nodes that the given partition maps to (one node per replica);
    nodes = self.ring.get_part_nodes(int(partition))
    if shouldbehere:
        shouldbehere = bool([n for n in nodes if n['id'] == node_id])
    # See Footnote [1] for an explanation of the repl_nodes assignment.
    i = 0
    while i < len(nodes) and nodes[i]['id'] != node_id:
        i += 1
    repl_nodes = nodes[i + 1:] + nodes[:i]
    more_nodes = self.ring.get_more_nodes(int(partition))
        
    # Replicate the data to (the partition on) each target node;
    for node in repl_nodes:
        success = False
            
        # _repl_to_node: replicate the database file to the given node;
        #     open a connection to the target partition;
        #     issue an HTTP REPLICATE request;
        #     read the response to that request;
        #     compare sync points and hashes to decide whether the two
        #     replicas are in sync, i.e. whether the replication succeeded;
        #     return True immediately on success;
        try:
            success = self._repl_to_node(node, broker, partition, info)
        except DriveNotMounted:
            repl_nodes.append(more_nodes.next())
            self.logger.error(_('ERROR Remote drive not mounted %s'), node)
        except (Exception, Timeout):
            self.logger.exception(_('ERROR syncing %(file)s with node %(node)s'),
                                   {'file': object_file, 'node': node})
        self.stats['success' if success else 'failure'] += 1
        self.logger.increment('successes' if success else 'failures')
        responses.append(success)
    if not shouldbehere and all(responses):
        # If the db shouldn't be on this node and has been successfully
        # synced to all of its peers, it can be removed.
        self.delete_db(object_file)
    self.logger.timing_since('timing', start_time)

2.1. Determine the partition that the file object_file = /srv/node/node['device']/accounts/partition/suffix/hsh****.db belongs to.
2.2. Get all the nodes (nodes) that this partition maps to (a partition may map to several nodes, one per replica).
2.3. Loop over all replica nodes (excluding the local node) and call _repl_to_node to copy the local data to each of them (the sketch below shows how that peer list is ordered).
2.4. Once the sync to every peer node has completed, and the data is judged to no longer belong on this node, delete the local database file.
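
The peer list used in step 2.3 (repl_nodes in the code above) deserves a closer look: the node list returned by the ring is rotated so that it starts just after the local node, and the local node itself is dropped. Here is a tiny self-contained sketch of that logic, with made-up node ids:

def peers_for(nodes, node_id):
    """Rotate the ring's node list to start after the local node and drop it."""
    i = 0
    while i < len(nodes) and nodes[i]['id'] != node_id:
        i += 1
    # nodes[i] is the local node (if present); everything else is a peer
    return nodes[i + 1:] + nodes[:i]


nodes = [{'id': 3}, {'id': 7}, {'id': 9}]
print(peers_for(nodes, 7))   # [{'id': 9}, {'id': 3}]
print(peers_for(nodes, 5))   # local id not on the list -> every node is a peer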


The next post will continue the analysis of swift-account-replicator.
