Python in 64 Moves, Move 27: Distributed Locks and Group Management, Part 2: Load Balancing with tooz

This article describes how to use the tooz library in Python for distributed locking and group management, focusing on its application to load balancing. It walks through the coordination configuration and source code of the Ceilometer component, and explains how the coordination group uses consistent hashing to balance message processing so that data is spread evenly across multiple services. Ceilometer configuration, source code analysis, and the concrete load-balancing implementation are covered.

Distributed locks and group management in Python: a series
I have recently been dealing with problems around distributed locks.
Based on the source code of the relevant OpenStack components, the official tooz documentation,
and my own experience using these components, I want to organize this material.

It is split into four parts:
Distributed locks and group management, part 1: Introduction to tooz
Distributed locks and group management, part 2: Load balancing with tooz
Distributed locks and group management, part 3: Distributed locks with tooz
Distributed locks and group management, part 4: tooz source code analysis
This article covers part 2.

1 Introduction
In the Ceilometer source code (Newton release), at least the compute and notification services can be
configured to use a feature called coordination.
A coordination group's main practical purpose is load balancing.
Judging from the Ceilometer source code, load balancing here means that message processing is spread
evenly across multiple services.
This differs from HAProxy's round-robin scheduling: the implementation relies on consistent hashing.
A hash value is computed from an attribute of the incoming message, and an index is obtained by taking
that hash modulo the length of a pre-initialized list of oslo_messaging notifiers.
The notifier that actually sends the message is notifiers[index], and different services listen on
different topics (assigned through consistent hashing), i.e. different queues.
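
To make the selection step concrete, here is a minimal sketch of hash-modulo dispatch. The attribute used for hashing ('resource_id') and the way the notifiers list is built are illustrative assumptions, not Ceilometer's actual code.

# Minimal sketch of hash-modulo notifier selection (illustrative only).
import hashlib

def pick_notifier(notifiers, message, attr='resource_id'):
    """Pick one notifier by hashing a single attribute of the message."""
    value = str(message.get(attr, '')).encode('utf-8')
    index = int(hashlib.md5(value).hexdigest(), 16) % len(notifiers)
    return notifiers[index]

# Messages with the same attribute value always map to the same notifier,
# so each downstream service sees a stable subset of the traffic.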

2 Coordination configuration in Ceilometer
Looking at ceilometer.conf, the coordination group can be configured as shown below. The backend_url
under [coordination] is in fact a tooz driver URL; Redis, Memcached and other backends are supported.

[compute]
workload_partitioning = true
[coordination]
backend_url = redis://redis.openstack.svc.cluster.local:6379/
[notification]
messaging_urls = rabbit://rabbitmq:vut8mvvS@rabbitmq.openstack.svc.cluster.local:5672/
workload_partitioning = true

Set backend_url according to your own deployment.
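
For reference, the backend_url is handed straight to tooz. A minimal sketch of using such a URL directly, assuming a reachable Redis backend (the URL and member id are placeholders):

# Minimal sketch: backend_url is a tooz driver URL (placeholders below).
from tooz import coordination

coordinator = coordination.get_coordinator(
    'redis://redis.openstack.svc.cluster.local:6379/', b'ceilometer-agent-1')
coordinator.start()
coordinator.heartbeat()   # keep this member's registration alive
# ... create/join groups, acquire locks, etc. ...
coordinator.stop()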


3 Source code analysis of coordination in the ceilometer-compute service
Main entry point: the __init__ method in ceilometer/agent/manager.py
3.1 The __init__ method is as follows:
class AgentManager(service_base.PipelineBasedService):

    def __init__(self, namespaces=None, pollster_list=None, worker_id=0):
        namespaces = namespaces or ['compute', 'central']
        pollster_list = pollster_list or []
        group_prefix = cfg.CONF.polling.partitioning_group_prefix
        self._inspector = virt_inspector.get_hypervisor_inspector()
        self.nv = nova_client.Client()
        self.rpc_server = None

        # features of using coordination and pollster-list are exclusive, and
        # cannot be used at one moment to avoid both samples duplication and
        # samples being lost
        if pollster_list and cfg.CONF.coordination.backend_url:
            raise PollsterListForbidden()

        super(AgentManager, self).__init__(worker_id)

        def _match(pollster):
            """Find out if pollster name matches to one of the list."""
            return any(fnmatch.fnmatch(pollster.name, pattern) for
                       pattern in pollster_list)

        if type(namespaces) is not list:
            namespaces = [namespaces]

        # we'll have default ['compute', 'central'] here if no namespaces will
        # be passed
        extensions = (self._extensions('poll', namespace).extensions
                      for namespace in namespaces)
        # get the extensions from pollster builder
        extensions_fb = (self._extensions_from_builder('poll', namespace)
                         for namespace in namespaces)
        if pollster_list:
            extensions = (moves.filter(_match, exts)
                          for exts in extensions)
            extensions_fb = (moves.filter(_match, exts)
                             for exts in extensions_fb)

        self.extensions = list(itertools.chain(*list(extensions))) + list(
            itertools.chain(*list(extensions_fb)))

        if self.extensions == []:
            raise EmptyPollstersList()

        discoveries = (self._extensions('discover', namespace).extensions
                       for namespace in namespaces)
        self.discoveries = list(itertools.chain(*list(discoveries)))
        self.polling_periodics = None

        self.partition_coordinator = coordination.PartitionCoordinator()
        self.heartbeat_timer = utils.create_periodic(
            target=self.partition_coordinator.heartbeat,
            spacing=cfg.CONF.coordination.heartbeat,
            run_immediately=True)

        # Compose coordination group prefix.
        # We'll use namespaces as the basement for this partitioning.
        namespace_prefix = '-'.join(sorted(namespaces))
        self.group_prefix = ('%s-%s' % (namespace_prefix, group_prefix)
                             if group_prefix else namespace_prefix)

        self.notifier = oslo_messaging.Notifier(
            messaging.get_transport(),
            driver=cfg.CONF.publisher_notifier.telemetry_driver,
            publisher_id="ceilometer.polling")

        self._keystone = None
        self._keystone_last_exception = None

Analysis:
3.1.1) self.partition_coordinator = coordination.PartitionCoordinator()
initializes the partition coordinator, i.e. the coordination group handle.
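
Under the hood, PartitionCoordinator wraps a tooz coordinator. A minimal sketch of the group create/join sequence it ultimately performs (group name, member id and URL are placeholders):

# Minimal sketch of the tooz group workflow wrapped by PartitionCoordinator
# (group name, member id and URL are placeholders).
from tooz import coordination

coord = coordination.get_coordinator('redis://127.0.0.1:6379/', b'agent-1')
coord.start()
request = coord.create_group(b'compute-central')
try:
    request.get()                    # tooz group calls are asynchronous
except coordination.GroupAlreadyExist:
    pass                             # another agent created it first
coord.join_group(b'compute-central').get()
members = coord.get_members(b'compute-central').get()
print(sorted(members))               # every live agent in the group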

3.1.2)
     self.heartbeat_timer = utils.create_periodic(
            target=self.partition_coordinator.heartbeat,
            spacing=cfg.CONF.coordination.heartbeat,
            run_immediately=True)
This periodically calls the coordinator's heartbeat, which the backend uses to decide whether the service is still alive.
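
A minimal sketch of such a heartbeat loop. Ceilometer uses utils.create_periodic (futurist) for this; the plain thread below is only to illustrate the idea and assumes the started tooz coordinator coord from the sketch in 3.1.1:

# Illustrative heartbeat loop; Ceilometer actually uses utils.create_periodic.
import threading

def run_heartbeat(coordinator, spacing, stop_event):
    while not stop_event.is_set():
        coordinator.heartbeat()      # tell the backend we are still alive
        stop_event.wait(spacing)     # sleep until the next beat or stop

stop = threading.Event()
threading.Thread(target=run_heartbeat, args=(coord, 1.0, stop),
                 daemon=True).start()
# ... later: stop.set() to end the loop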

3.2) Next, the run method of the AgentManager class.
Its content is as follows:
    def run(self):
        """Start RPC server and handle realtime query."""
        super(AgentManager, self).run()
        self.polling_manager = pipeline.setup_polling()
        self.join_partitioning_groups()
        self.start_polling_tasks()
        self.init_pipeline_refresh()

Analysis:
3.2.1)
self.polling_manager = pipeline.setup_polling()
This calls the following function in ceilometer/pipeline.py:
def setup_polling():
    """Setup polling manager according to yaml config file."""
    cfg_file = cfg.CONF.pipeline_cfg_file
    return PollingManager(cfg_file)

3.2.2)
self.join_partitioning_groups()
The code is as follows:
    def join_partitioning_groups(self):
        self.groups = set([self.construct_group_id(d.obj.group_id)
                          for d in self.discoveries])
        # let each set of statically-defined resources have its own group
        static_resource_groups = set(
            [self.construct_group_id(utils.hash_of_set(p.resources))
             for p in self.polling_manager.sources
             if p.resources
             ])
        self.groups.update(static_resource_groups)

        if not self.groups and self.partition_coordinator.is_active():
            self.partition_coordinator.stop()
            self.heartbeat_timer.stop()

        if self.groups and not self.partition_coordinator.is_active():
            self.partition_coordinator.start()
            utils.spawn_thread(self.heartbeat_timer.start)

        for group in self.groups:
            self.partition_coordinator.join_group(group)

Analysis:
3.2.2.1) self.discoveries comes from
        discoveries = (self._extensions('discover', namespace).extensions
                       for namespace in namespaces)
        self.discoveries = list(itertools.chain(*list(discoveries)))
where:
       namespaces = namespaces or ['compute', 'central']
ceilometer/setup.cfg defines the following discovery entry points (a sketch of how they are loaded follows the list):
ceilometer.discover.compute =
    local_instances = ceilometer.compute.discovery:InstanceDiscovery

ceilometer.discover.central =
    endpoint = ceilometer.agent.discovery.endpoint:EndpointDiscovery
    tenant = ceilometer.agent.discovery.tenant:TenantDiscovery
    lb_pools = ceilometer.network.services.discovery:LBPoolsDiscovery
    lb_vips = ceilometer.network.services.discovery:LBVipsDiscovery
    lb_members = ceilometer.network.services.discovery:LBMembersDiscovery
    lb_listeners = ceilometer.network.services.discovery:LBListenersDiscovery
    lb_loadbalancers = ceilometer.network.services.discovery:LBLoadBalancersDiscovery
    lb_health_probes = ceilometer.network.services.discovery:LBHealthMonitorsDiscovery
    vpn_services    = ceilometer.network.services.discovery:VPNServicesDiscovery
    ipsec_connections  = ceilometer.network.services.discovery:IPSecConnectionsDiscovery
    fw_services = ceilometer.network.services.discovery:FirewallDiscovery
    fw_policy = ceilometer.network.services.discovery:FirewallPolicyDiscovery
    tripleo_overcloud_nodes = ceilometer.hardware.discovery:NodesDiscoveryTripleO
    fip_services = ceilometer.network.services.discovery:FloatingIPDiscovery
    images = ceilometer.image.discovery:ImagesDiscovery
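
These entry points are enumerated through stevedore. A minimal sketch of loading one namespace (Ceilometer's _extensions helper wraps something similar; the exact arguments here are illustrative):

# Minimal sketch of loading the 'ceilometer.discover.compute' entry points
# with stevedore (arguments are illustrative).
from stevedore import extension

mgr = extension.ExtensionManager(
    namespace='ceilometer.discover.compute',
    invoke_on_load=True)
for ext in mgr.extensions:
    # e.g. 'local_instances' -> an InstanceDiscovery instance
    print(ext.name, type(ext.obj).__name__)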


3.2.3) Analysis of the start_polling_tasks method.
The code is as follows:
    def start_polling_tasks(self):
        # allow time for coordination if necessary
        delay_start = self.partition_coordinator.is_active()

        # set shuffle time before polling task if necessary
        delay_polling_time = random.randint(
            0, cfg.CONF.shuffle_time_before_polling_task)

        data = self.setup_polling_tasks()

        # One thread per polling tasks is enough
        self.polling_periodics = periodics.PeriodicWorker.create(
            [], executor_factory=lambda:
            futures.ThreadPoolExecutor(max_workers=len(data)))

        for interval, polling_task in data.items():
            delay_time = (interval + delay_polling_time if delay_start
                          else delay_polling_time)

            @periodics.periodic(spacing=interval, run_immediately=False)
            def task(running_task):
                self.interval_task(running_task)

            utils.spawn_thread(utils.delayed, delay_time,
                               self.polling_periodics.add, task, polling_task)

        if data:
            # Don't start useless threads if no task will run
            utils.spawn_thread(self.polling_periodics.start, allow_empty=True)
Analysis:
3.2.3.1) setup_polling_tasks builds a dictionary mapping each polling interval to the polling tasks for that interval, and a futurist periodic worker then runs those tasks at their interval (a standalone sketch follows the interval_task code below).
Each periodic task calls the interval_task method, whose content is as follows:
    def interval_task(self, task):
        # NOTE(sileht): remove the previous keystone client
        # and exception to get a new one in this polling cycle.
        self._keystone = None
        self._keystone_last_exception = None

        task.poll_and_notify()
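
For comparison, a standalone sketch of scheduling a callable with futurist periodics, the mechanism used by start_polling_tasks above (the 60-second spacing and the task body are examples):

# Standalone futurist periodics sketch (spacing and task body are examples).
from concurrent import futures
from futurist import periodics

@periodics.periodic(spacing=60, run_immediately=False)
def poll_task():
    print('polling...')

worker = periodics.PeriodicWorker.create(
    [], executor_factory=lambda: futures.ThreadPoolExecutor(max_workers=1))
worker.add(poll_task)
# worker.start() blocks, so real services run it in a separate thread,
# which is what utils.spawn_thread does in the Ceilometer code above.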

3.2.3.2)
interval_task calls task.poll_and_notify, which is implemented as follows:
    def poll_and_notify(self):
        """Polling sample and notify."""
        cache = {}
        discovery_cache = {}
        poll_history = {}
        for source_name in self.pollster_matches:
            for pollster in self.pollster_matches[source_name]:
                key = Resources.key(source_name, pollster)
                candidate_res = list(
                    self.resources[key].get(discovery_cache))
                if not candidate_res and pollster.obj.default_discovery:
                    candidate_res = self.manager.discover(
                        [pollster.obj.default_discovery], discovery_cache)

                # Remove duplicated resources and black resources. Using
                # set() requires well defined __hash__ for each resource.
                # Since __eq__ is defined, 'not in' is safe here.
                polling_resources = []
                black_res = self.resources[key].blacklist
                history = poll_history.get(pollster.name, [])
                for x in candidate_res:
                    if x not in history:
                        history.append(x)
                        if x not in black_res:
                            polling_resources.append(x)
                poll_history[pollster.name] = history

                # If no resources, skip for this pollster
                if not polling_resources:
                    p_context = 'new ' if history else ''
                    LOG.info(_("Skip pollster %(name)s, no %(p_context)s"
                               "resources found this cycle"),
                             {'name': pollster.name, 'p_context': p_context})
                    continue

                LOG.info(_("Polling pollster %(poll)s in the context of "
                           "%(src)s"),
                         dict(poll=pollster.name, src=source_name))
                try:
                    polling_timestamp = timeutils.utcnow().isoformat()
                    samples = pollster.obj.get_samples(
                        manager=self.manager,
                        cache=cache,
                        resources=polling_resources
                    )
                    sample_batch = []

                    # filter None in samples
                    samples = [s for s in samples if s is not None]
                    # TODO(chao.ma), debug it
                    if samples:
                        metric = pollster.name

                    for sample in samples:
                        # Note(yuywz): Unify the timestamp of polled samples
                        sample.set_timestamp(polling_timestamp)
                        sample_dict = (
                            publisher_utils.meter_message_from_counter(
                                sample, self._telemetry_secret
                            ))
                        if self._batch:
                            sample_batch.append(sample_dict)
                        else:
                            self._send_notification([sample_dict])

                    if sample_batch:
                        self._send_notification(sample_batch)

                except plugin_base.PollsterPermanentError as err:
                    LOG.error(_(
                        'Prevent pollster %(name)s for '
                        'polling source %(source)s anymore!')
                        % ({'name': pollster.name, 'source': source_name}))
                    self.resources[key].blacklist.extend(err.fail_res_list)
                except Exception as err:
                    LOG.warning(_(
                        'Continue after error from %(name)s: %(error)s')
                        % ({'name': pollster.name, 'error': err}),
                        exc_info=True)

Analysis:
3.2.3.2.1)
                    candidate_res = self.manager.discover(
                        [pollster.obj.default_discovery], discovery_cache)
This calls the discover method.

3.2.3.2.2) The discover method is as follows:

    def discover(self, discovery=None, discovery_cache=None):
        resources = []
        discovery = discovery or []
        for url in discovery:
            if discovery_cache is not None and url in discovery_cache:
                resources.extend(discovery_cache[url])
                continue
            name, param = self._parse_discoverer(url)
            discoverer = self._discoverer(name)
            if discoverer:
                try:
                    if discoverer.KEYSTONE_REQUIRED_FOR_SERVICE:
                        service_type = getattr(
                            cfg.CONF.service_types,
                            discoverer.KEYSTONE_REQUIRED_FOR_SERVICE)
                        if not keystone_client.get_service_catalog(
                                self.keystone).get_endpoints(
                                    service_type=service_type):
                            LOG.warning(_LW('Skipping %(name)s, '
                                            '%(service_type)s service '
                                            'is not registered in keystone'),
                                        {'name': name,
                                         'service_type': service_type})
                            continue

                    discovered = discoverer.discover(self, param)
                    partitioned = self.partition_coordinator.extract_my_subset(
                        self.construct_group_id(discoverer.group_id),
                        discovered)
                    resources.extend(partitioned)
                    if discovery_cache is not None:
                        discovery_cache[url] = partitioned
                except ka_exceptions.ClientException as e:
                    LOG.error(_LE('Skipping %(name)s, keystone issue: '
                                  '%(exc)s'), {'name': name, 'exc': e})
                except Exception as err:
                    LOG.exception(_LE('Unable to discover resources: %s'), err)
            else:
                LOG.warning(_LW('Unknown discovery extension: %s'), name)
        return resources

Analysis:
1) The call site is
self.manager.discover(
                        [pollster.obj.default_discovery], discovery_cache)
so the discovery parameter is [pollster.obj.default_discovery].
The key part is:
                    discovered = discoverer.discover(self, param)
                    partitioned = self.partition_coordinator.extract_my_subset(
                        self.construct_group_id(discoverer.group_id),
                        discovered)
                    resources.extend(partitioned)
Note: the exact contents of discovered and partitioned depend on the deployment and are easiest to inspect by adding debug logging. Conceptually, extract_my_subset keeps only the items whose hash maps to this member's position among the sorted group members, roughly as sketched below.
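
A conceptual sketch (simplified illustration, not the exact code in ceilometer/coordination.py):

# Keep only the resources whose hash maps to this member's position in the
# sorted member list (simplified illustration).
import hashlib
import struct

def extract_my_subset(my_id, members, resources):
    members = sorted(members)
    my_index = members.index(my_id)
    mine = []
    for resource in resources:
        digest = hashlib.md5(str(resource).encode('utf-8')).digest()
        bucket = struct.unpack('I', digest[:4])[0] % len(members)
        if bucket == my_index:
            mine.append(resource)
    return mine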

2) In any case, all collected samples are sent to the message queue through the following method of the
PollingTask class in ceilometer/agent/manager.py:

    def _send_notification(self, samples):
        self.manager.notifier.sample(
            {},
            'telemetry.polling',
            {'samples': samples}
        )

So in principle the samples still end up in the
notifications.sample queue.
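
A minimal sketch of publishing such a sample with oslo.messaging, assuming a reachable RabbitMQ (transport URL and driver are placeholders):

# Minimal sketch of emitting a 'telemetry.polling' sample with oslo.messaging
# (transport URL and driver are placeholders).
from oslo_config import cfg
import oslo_messaging

transport = oslo_messaging.get_notification_transport(
    cfg.CONF, url='rabbit://guest:guest@127.0.0.1:5672/')
notifier = oslo_messaging.Notifier(
    transport,
    driver='messagingv2',
    publisher_id='ceilometer.polling')
notifier.sample({}, 'telemetry.polling', {'samples': []})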


3.2.4) Analysis of self.init_pipeline_refresh
The code lives in the PipelineBasedService(cotyledon.Service) class in ceilometer/service_base.py:

    def init_pipeline_refresh(self):
        """Initializes pipeline refresh state."""
        self.clear_pipeline_validation_status()

        if (cfg.CONF.refresh_pipeline_cfg or
                cfg.CONF.refresh_event_pipeline_cfg):
            self.refresh_pipeline_periodic = utils.create_periodic(
                target=self.refresh_pipeline,
                spacing=cfg.CONF.pipeline_polling_interval)
            utils.spawn_thread(self.refresh_pipeline_periodic.start)
Analysis:
With the default configuration this method effectively does nothing, because neither refresh_pipeline_cfg nor refresh_event_pipeline_cfg is enabled by default.

3.3) A concrete discoverer
Look at the InstanceDiscovery class in ceilometer/compute/discovery.py.
Its content is as follows:
class InstanceDiscovery(plugin_base.DiscoveryBase):
    def __init__(self):
        super(InstanceDiscovery, self).__init__()
        self.nova_cli = nova_client.Client()
        self.last_run = None
        self.instances = {}
        self.expiration_time = cfg.CONF.compute.resource_update_interval
        self.cache_expiry = cfg.CONF.compute.resource_cache_expiry
        self.last_cache_expire = None

    def discover(self, manager, param=None):
        """Discover resources to monitor."""
        secs_from_last_update = 0
        utc_now = timeutils.utcnow(True)
        secs_from_last_expire = 0
        if self.last_run:
            secs_from_last_update = timeutils.delta_seconds(
                self.last_run, utc_now)
        if self.last_cache_expire:
            secs_from_last_expire = timeutils.delta_seconds(
                self.last_cache_expire, utc_now)

        instances = []
        # NOTE(ityaptin) we update make a nova request only if
        # it's a first discovery or resources expired
        if not self.last_run or secs_from_last_update >= self.expiration_time:
            try:
                if secs_from_last_expire < self.cache_expiry and self.last_run:
                    # since = self.last_run.isoformat()
                    pass
                else:
                    # since = None
                    self.instances.clear()
                    self.last_cache_expire = utc_now

                # since = self.last_run.isoformat() if self.last_run else None
                # FIXME(ccz): Remove parameter last_run from nova_list query.
                # Using changes-since cannot list those instances which just
                # changes volume attachment and that will affect the discovery
                # of volumes under telemetry.
                # Original Code:
                # instances = self.nova_cli.instance_get_all_by_host(
                #     cfg.CONF.host, since)
                instances = self.nova_cli.instance_get_all_by_host(
                    cfg.CONF.host)
                self.last_run = utc_now
            except Exception:
                # NOTE(zqfan): instance_get_all_by_host is wrapped and will log
                # exception when there is any error. It is no need to raise it
                # again and print one more time.
                return []

        for instance in instances:
            if getattr(instance, 'OS-EXT-STS:vm_state', None) in ['deleted',
                                                                  'error']:
                self.instances.pop(instance.id, None)
            else:
                self.instances[instance.id] = instance

        return self.instances.values()

    @property
    def group_id(self):
        if cfg.CONF.compute.workload_partitioning:
            return cfg.CONF.host
        else:
            return None

Analysis:
Note the group_id property: when compute.workload_partitioning is enabled it returns the current host name.

Summary:
For the ceilometer-compute service:
the group name comes from the group_id property of the InstanceDiscovery class in
ceilometer/compute/discovery.py, which returns the name of the current compute node,
e.g. compute-node-2.domain.tld.
Each compute node therefore forms its own group, so no matter how the group members are partitioned,
this ceilometer-compute agent always ends up processing all the VMs on its own node.
In other words, no real load balancing is achieved for the ceilometer-compute service itself (see the
short check after the group_id property below).

@property
def group_id(self):
    if cfg.CONF.compute.workload_partitioning:
        return cfg.CONF.host
    else:
        return None
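
With only one member in the group (the compute node itself), the hypothetical extract_my_subset sketch from section 3.2.3.2 above returns everything the agent discovered:

# Single-member group: every resource hashes to index 0, so nothing is
# filtered out (uses the hypothetical extract_my_subset sketch above).
resources = ['instance-a', 'instance-b', 'instance-c']
assert extract_my_subset('compute-node-2.domain.tld',
                         ['compute-node-2.domain.tld'],
                         resources) == resources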

References:
https://specs.openstack.org/openstack/ceilometer-specs/specs/kilo/notification-coordiation.html
https://github.com/openstack/ceilometer-specs/blob/master/specs/juno/central-agent-partitioning.rst


4 Source code analysis of coordination in the ceilometer-notification service
4.1 Main entry point
The run method in ceilometer/notification.py.
The code is as follows:

class NotificationService(service_base.PipelineBasedService):
    """Notification service.

    When running multiple agents, additional queuing sequence is required for
    inter process communication. Each agent has two listeners: one to listen
    to the main OpenStack queue and another listener(and notifier) for IPC to
    divide pipeline sink endpoints. Coordination should be enabled to have
    proper active/active HA.
    """

    NOTIFICATION_NAMESPACE = 'ceilometer.notification'
    NOTIFICATION_IPC = 'ceilometer-pipe'
    def run(self):
        super(NotificationService, self).run()
        self.shutdown = False
        self.periodic = None
        self.partition_coordinator = None
        self.coord_lock = threading.Lock()

        self.listeners = []

        # NOTE(kbespalov): for the pipeline queues used a single amqp host
        # hence only one listener is required
        self.pipeline_listener = None

        self.pipeline_manager = pipeline.setup_pipeline()

        self.event_pipeline_manager = pipeline.setup_event_pipeline()

        self.transport = messaging.get_transport()

        if cfg.CONF.notification.workload_partitioning:
            self.group_id = self.NOTIFICATION_NAMESPACE
            self.partition_coordinator = coordination.PartitionCoordinator()
            self.partition_coordinator.start()
        else:
            # FIXME(sileht): endpoint uses the notification_topics option
            # and it should not because this is an oslo_messaging option
            # not a ceilometer. Until we have something to get the
            # notification_topics in another way, we must create a transport
            # to ensure the option has been registered by oslo_messaging.
            messaging.get_notifier(self.transport, '')
            self.group_id = None

        self.pipe_manager = self._get_pipe_manager(self.transport,
                                                   self.pipeline_manager)
        self.event_pipe_manager = self._get_event_pipeline_manager(
            self.transport)

        self._configure_main_queue_listeners(self.pipe_manager,
                                             self.event_pipe_manager)

        if cfg.CONF.notification.workload_partitioning:
            # join group after all manager set up is configured
            self.partition_coordinator.join_group(self.group_id)
            self.partition_coordinator.watch_group(self.group_id,
                                                   self._refresh_agent)

            @periodics.periodic(spacing=cfg.CONF.coordination.heartbeat,
                                run_immediately=True)
            def heartbeat():
                self.partition_coordinator.heartbeat()

            @periodics.periodic(spacing=cfg.CONF.coordination.check_watchers,
                                run_immediately=True)
            def run_watchers():
                self.partition_coordinator.run_watchers()

            self.periodic = periodics.PeriodicWorker.create(
                [], executor_factory=lambda:
                futures.ThreadPoolExecutor(max_workers=10))
            self.periodic.add(heartbeat)
            self.periodic.add(run_watchers)

            utils.spawn_thread(self.periodic.start)

            # configure pipelines after all coordination is configured.
            with self.coord_lock:
                self._configure_pipeline_listener()

        if not cfg.CONF.notification.disable_non_metric_meters:
            LOG.warning(_LW('Non-metric meters may be collected. It is highly '
                            'advisable to disable these meters using '
                            'ceilometer.conf or the pipeline.yaml'))

        self.init_pipeline_refresh()

Analysis:
4.1.1)
In the run method above, if cfg.CONF.notification.workload_partitioning is enabled, then
            self.group_id = self.NOTIFICATION_NAMESPACE
so every ceilometer-notification agent joins the same group ('ceilometer.notification'), watches it for membership changes (watch_group with the _refresh_agent callback), and periodically runs heartbeat and run_watchers. This is where the load balancing of notification processing actually happens.
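
A minimal sketch of the tooz watch pattern this relies on (group name, member id, URL and callback are placeholders):

# Minimal sketch of the tooz group-watch pattern used by the notification
# service (group name, member id, URL and callback are placeholders).
from tooz import coordination

coord = coordination.get_coordinator('redis://127.0.0.1:6379/', b'notif-1')
coord.start()
try:
    coord.create_group(b'ceilometer.notification').get()
except coordination.GroupAlreadyExist:
    pass
coord.join_group(b'ceilometer.notification').get()

def refresh(event):
    # re-partition the pipeline listeners when members join or leave
    print('membership changed:', event)

coord.watch_join_group(b'ceilometer.notification', refresh)
coord.watch_leave_group(b'ceilometer.notification', refresh)
coord.run_watchers()   # call periodically, like the run_watchers task above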
   
