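The previous post analyzed the source code of the neutron WSGI application. This post covers another core piece of functionality, RPC, and also looks at the concurrency model neutron uses.
Picking up from the code in the previous post: once the WSGI server has been started, the rpc workers are started.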
neutron/server/wsgi_eventlet.py:
def eventlet_wsgi_server():
    neutron_api = service.serve_wsgi(service.NeutronApiService)
    start_api_and_rpc_workers(neutron_api)
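As the code shows, neutron's default concurrency model is built on eventlet. eventlet is a concurrent networking library; underneath it relies mainly on the epoll mechanism for non-blocking I/O, and it provides coroutines on top of that. eventlet itself is not the focus of this post (a dedicated post on it will come later). For now it is enough to know that it is what provides the concurrency.
A word on the concurrency model: neutron combines multiple processes with GreenPool. The WSGI app and RPC each fork their own child processes, and inside each child process an eventlet GreenPool is used to raise throughput. You can think of it as a thread pool, although what it actually holds are GreenThreads, that is, coroutines. The process is explained in detail below.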
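The shape of this model can be sketched with the standard library alone. This is only an illustration under stated assumptions: ThreadPoolExecutor stands in for eventlet.GreenPool, multiprocessing with the "fork" start method stands in for the forking done by neutron, and the names handle_request and run_service are hypothetical.

```python
import multiprocessing
from concurrent.futures import ThreadPoolExecutor


def handle_request(n):
    """Stand-in for the work done per request inside one worker process."""
    return n * n


def run_service(name, requests, results):
    """Runs in a child process: serve all requests through a small pool."""
    # ThreadPoolExecutor stands in for eventlet.GreenPool (assumption: only
    # the "pool inside each process" shape matters for this sketch).
    with ThreadPoolExecutor(max_workers=4) as pool:
        answers = list(pool.map(handle_request, requests))
    results.put((name, answers))


def main():
    # "fork" mirrors how the children are created in the article (POSIX only).
    ctx = multiprocessing.get_context("fork")
    results = ctx.Queue()
    children = [
        ctx.Process(target=run_service, args=("api", [1, 2, 3], results)),
        ctx.Process(target=run_service, args=("rpc", [4, 5], results)),
    ]
    for child in children:
        child.start()
    # Read the results before joining so a full queue cannot block a child.
    collected = dict(results.get(timeout=10) for _ in children)
    for child in children:
        child.join()
    return collected


if __name__ == "__main__":
    print(main())
```

Each service gets its own process, so a crash or a blocking call in one does not stall the other, while the pool inside each process lets it overlap many requests.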
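The class diagram below shows the relationships among the classes involved in starting the WSGI service:
The diagram is best read alongside the serve_wsgi function; following the code and the diagram together makes things clearer: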
neutron/service.py:
def serve_wsgi(cls):

    try:
        service = cls.create()
        service.start()
    except Exception:
        with excutils.save_and_reraise_exception():
            LOG.exception(_LE('Unrecoverable error: please check log '
                              'for details.'))

    return service

This creates a NeutronApiService object (the class diagram shows that it is a subclass of WsgiService) and then calls WsgiService's start method to start the service:
service = cls.create()
service.start()
The start method uses the Loader from oslo_service.wsgi to load the WSGI app; this Loader in turn uses paste.deploy, covered in the previous post, to load the app.
Next, look at the start method of the parent class, WsgiService:
neutron/service.py:
class WsgiService(object):
    """Base class for WSGI based services.

    For each api you define, you must also define these flags:
    :<api>_listen: The address on which to listen
    :<api>_listen_port: The port on which to listen

    """

    def __init__(self, app_name):
        self.app_name = app_name
        self.wsgi_app = None

    def start(self):
        self.wsgi_app = _run_wsgi(self.app_name)

The start method calls _run_wsgi, which in turn reaches run_wsgi_app:
def run_wsgi_app(app):
    server = wsgi.Server("Neutron")
    server.start(app, cfg.CONF.bind_port, cfg.CONF.bind_host,
                 workers=_get_api_workers())
    LOG.info(_LI("Neutron service started, listening on %(host)s:%(port)s"),
             {'host': cfg.CONF.bind_host, 'port': cfg.CONF.bind_port})
    return server

Together with the class diagram, this shows that WsgiService creates a neutron.wsgi::Server object, which internally uses eventlet.GreenPool as its pool of GreenThreads, and then calls that object's start method.
neutron/wsgi.py:
def start(self, application, port, host='0.0.0.0', workers=0):
    """Run a WSGI server with the given application."""
    self._host = host
    self._port = port
    backlog = CONF.backlog

    self._socket = self._get_socket(self._host,
                                    self._port,
                                    backlog=backlog)

    self._launch(application, workers)

def _launch(self, application, workers=0):
    service = WorkerService(self, application, self.disable_ssl)
    if workers < 1:
        # The API service should run in the current process.
        self._server = service
        # Dump the initial option values
        cfg.CONF.log_opt_values(LOG, logging.DEBUG)
        service.start()
        systemd.notify_once()
    else:
        # dispose the whole pool before os.fork, otherwise there will
        # be shared DB connections in child processes which may cause
        # DB errors.
        api.dispose()
        # The API service runs in a number of child processes.
        # Minimize the cost of checking for child exit by extending the
        # wait interval past the default of 0.01s.
        self._server = common_service.ProcessLauncher(cfg.CONF,
                                                      wait_interval=1.0)
        self._server.launch_service(service, workers=workers)
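- From the class diagram and the _launch code, the Server object wraps the application in a WorkerService and then uses the ProcessLauncher object provided by oslo_service.service to start that WorkerService.
Wrapping it in a WorkerService:
service = WorkerService(self, application, self.disable_ssl)
Calling ProcessLauncher to run the service:
self._server = common_service.ProcessLauncher(cfg.CONF,
                                              wait_interval=1.0)
self._server.launch_service(service, workers=workers)
- The main job of ProcessLauncher is to fork as many child processes as workers specifies and then start the WorkerService in each of them.
Child processes are created to run the service according to the number of workers: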
def launch_service(self, service, workers=1):
    """Launch a service with a given number of workers.

    :param service: a service to launch, must be an instance of
           :class:`oslo_service.service.ServiceBase`
    :param workers: a number of processes in which a service
           will be running
    """
    _check_service_base(service)
    wrap = ServiceWrapper(service, workers)

    LOG.info(_LI('Starting %d workers'), wrap.workers)
    while self.running and len(wrap.children) < wrap.workers:
        self._start_child(wrap)
- While starting, the WorkerService uses the Server object's GreenPool to spawn a GreenThread that calls eventlet.wsgi.server to run our app. This is what finally brings the WSGI app up and exposes the RESTful API.
neutron/wsgi.py:
The start method calls pool.spawn on self._service, which is the Server object, to run the Server's _run method:
class WorkerService(worker.NeutronWorker):
    def start(self):
        super(WorkerService, self).start()
        # When api worker is stopped it kills the eventlet wsgi server which
        # internally closes the wsgi server socket object. This server socket
        # object becomes not usable which leads to "Bad file descriptor"
        # errors on service restart.
        # Duplicate a socket object to keep a file descriptor usable.
        dup_sock = self._service._socket.dup()
        if CONF.use_ssl and not self._disable_ssl:
            dup_sock = sslutils.wrap(CONF, dup_sock)
        self._server = self._service.pool.spawn(self._service._run,
                                                self._application,
                                                dup_sock)

self._service._run is the Server's _run method:
def _run(self, application, socket):
    """Start a WSGI server in a new green thread."""
    eventlet.wsgi.server(socket, application,
                         max_size=self.num_threads,
                         log=LOG,
                         keepalive=CONF.wsgi_keep_alive,
                         socket_timeout=self.client_socket_timeout)

- By default workers is configured as 1, so one child process is created to serve the RESTful API, and the eventlet.wsgi.server in that child process ends up running inside a GreenThread.
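The comment in WorkerService.start explains why the listening socket is duplicated before it is handed to the WSGI server. A minimal sketch of that effect with plain standard-library sockets follows; it does not involve eventlet, and the function name is hypothetical. Closing the duplicate, which is what stopping the worker's server amounts to, leaves the original descriptor open and still listening.

```python
import socket


def original_survives_closed_duplicate():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(5)
    listener.settimeout(5)
    port = listener.getsockname()[1]

    # The worker would serve on a duplicate; stopping the worker closes it.
    dup_sock = listener.dup()
    dup_sock.close()

    # The original descriptor is still open and still accepts connections.
    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    conn, _addr = listener.accept()
    conn.sendall(b"ok")
    reply = client.recv(2)

    for sock in (conn, client, listener):
        sock.close()
    return reply


if __name__ == "__main__":
    print(original_survives_closed_duplicate())
```

Had the server been given the original socket object instead, closing it would have invalidated the only descriptor, and a restart would hit the "Bad file descriptor" error the comment mentions.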
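From the WSGI server startup above we now know that neutron runs its services as processes plus GreenPool; the RPC service described next uses the same architecture. We also know that the key object, ProcessLauncher, starts services by creating processes. A service started by ProcessLauncher must be a subclass of oslo_service.service::ServiceBase and implement the start method.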
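To make the launcher's role concrete, here is a toy stand-in written with os.fork. It is a sketch under assumptions, not the oslo.service implementation: MiniLauncher and HelloService are hypothetical names, and signal handling and respawning of dead children are left out. It keeps only the loop seen in launch_service: fork until the number of children reaches workers, and run the service's start method in each child.

```python
import os


class HelloService:
    """Hypothetical service: the launcher only relies on start()."""

    def start(self):
        # A real service would begin serving here; this one just returns.
        pass


class MiniLauncher:
    """Toy stand-in for ProcessLauncher (POSIX only)."""

    def __init__(self):
        self.children = set()

    def launch_service(self, service, workers=1):
        while len(self.children) < workers:
            pid = os.fork()
            if pid == 0:
                # Child: run the service, then exit immediately so that it
                # never returns into the parent's code path.
                status = 0
                try:
                    service.start()
                except BaseException:
                    status = 1
                finally:
                    os._exit(status)
            # Parent: remember the child and keep forking.
            self.children.add(pid)

    def wait(self):
        """Block until every child has exited; return their exit codes."""
        codes = []
        for pid in sorted(self.children):
            _pid, status = os.waitpid(pid, 0)
            codes.append(os.WEXITSTATUS(status))
        self.children.clear()
        return codes


if __name__ == "__main__":
    launcher = MiniLauncher()
    launcher.launch_service(HelloService(), workers=2)
    print(launcher.wait())
```

The parent never calls start itself; it only tracks child pids and waits on them, which matches the role the main neutron-server process plays.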
With that groundwork in place, the RPC startup process is easy to analyze.
def start_api_and_rpc_workers(neutron_api):
    pool = eventlet.GreenPool()

    api_thread = pool.spawn(neutron_api.wait)

    try:
        neutron_rpc = service.serve_rpc()
    except NotImplementedError:
        LOG.info(_LI("RPC was already started in parent process by "
                     "plugin."))
    else:
        rpc_thread = pool.spawn(neutron_rpc.wait)

        plugin_workers = service.start_plugin_workers()
        for worker in plugin_workers:
            pool.spawn(worker.wait)

        # api and rpc should die together.  When one dies, kill the other.
        rpc_thread.link(lambda gt: api_thread.kill())
        api_thread.link(lambda gt: rpc_thread.kill())

    pool.waitall()
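The main process uses a GreenPool to run the wait methods of neutron_api and neutron_rpc, then calls waitall to wait for the two GreenThreads to finish. In practice this means the main process simply waits for the WSGI API and RPC child processes to exit. The link calls make sure that if either the rpc or the api service dies, the other is stopped as well.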
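The "die together" wiring can be tried out without eventlet. The sketch below is an analogy only: it uses asyncio from the standard library, whose tasks are also cooperatively scheduled, with add_done_callback standing in for GreenThread.link and cancel standing in for GreenThread.kill. The function names and the sleep durations are made up for the illustration.

```python
import asyncio


async def wait_for_child(name, seconds):
    """Stand-in for launcher.wait(): returns once the 'child' is gone."""
    await asyncio.sleep(seconds)
    return name


async def supervise():
    api = asyncio.create_task(wait_for_child("api", 0.05))
    rpc = asyncio.create_task(wait_for_child("rpc", 60))

    # add_done_callback plays the role of GreenThread.link, and cancel
    # plays the role of GreenThread.kill (an analogy, not the same API).
    api.add_done_callback(lambda _task: rpc.cancel())
    rpc.add_done_callback(lambda _task: api.cancel())

    # Like pool.waitall(): return only once both waiters have finished.
    return await asyncio.gather(api, rpc, return_exceptions=True)


def run():
    api_result, rpc_result = asyncio.run(supervise())
    return api_result, isinstance(rpc_result, asyncio.CancelledError)


if __name__ == "__main__":
    print(run())
```

Here the "api" waiter finishes first, its callback cancels the "rpc" waiter, and the supervisor returns right away instead of waiting out the remaining 60 seconds.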
We focus on how neutron_rpc is created:
neutron/service.py:
def serve_rpc():
    plugin = manager.NeutronManager.get_plugin()
    service_plugins = (
        manager.NeutronManager.get_service_plugins().values())

    if cfg.CONF.rpc_workers < 1:
        cfg.CONF.set_override('rpc_workers', 1)

    # If 0 < rpc_workers then start_rpc_listeners would be called in a
    # subprocess and we cannot simply catch the NotImplementedError.  It is
    # simpler to check this up front by testing whether the plugin supports
    # multiple RPC workers.
    if not plugin.rpc_workers_supported():
        LOG.debug("Active plugin doesn't implement start_rpc_listeners")
        if 0 < cfg.CONF.rpc_workers:
            LOG.error(_LE("'rpc_workers = %d' ignored because "
                          "start_rpc_listeners is not implemented."),
                      cfg.CONF.rpc_workers)
        raise NotImplementedError()

    try:
        # passing service plugins only, because core plugin is among them
        rpc = RpcWorker(service_plugins)
        # dispose the whole pool before os.fork, otherwise there will
        # be shared DB connections in child processes which may cause
        # DB errors.
        LOG.debug('using launcher for rpc, workers=%s', cfg.CONF.rpc_workers)
        session.dispose()
        launcher = common_service.ProcessLauncher(cfg.CONF, wait_interval=1.0)
        launcher.launch_service(rpc, workers=cfg.CONF.rpc_workers)
        if (cfg.CONF.rpc_state_report_workers > 0 and
                plugin.rpc_state_report_workers_supported()):
            rpc_state_rep = RpcReportsWorker([plugin])
            LOG.debug('using launcher for state reports rpc, workers=%s',
                      cfg.CONF.rpc_state_report_workers)
            launcher.launch_service(
                rpc_state_rep, workers=cfg.CONF.rpc_state_report_workers)

        return launcher
    except Exception:
        with excutils.save_and_reraise_exception():
            LOG.exception(_LE('Unrecoverable error: please check log for '
                              'details.'))
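plugin = manager.NeutronManager.get_plugin()

NeutronManager was also mentioned in the previous post. Its main job is to load and initialize the right plugins, such as Ml2Plugin, based on the configuration file. Here its class method get_plugin() is called to obtain the configured core plugin; going through the class method ensures that NeutronManager behaves as a singleton. plugin is therefore "Ml2Plugin".
service_plugins = (
    manager.NeutronManager.get_service_plugins().values())

Next, all service_plugins are fetched. As the previous post explained, this ends up returning the following 6 plugin instances: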
'neutron.plugins.ml2.plugin.Ml2Plugin'
'neutron.services.network_ip_availability.plugin.NetworkIPAvailabilityPlugin'
'neutron.services.auto_allocate.plugin.Plugin'
'neutron.services.timestamp.timestamp_plugin.TimeStampPlugin'
'neutron.services.tag.tag_plugin.TagPlugin'
'neutron.services.l3_router.l3_router_plugin.L3RouterPlugin'
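if cfg.CONF.rpc_workers < 1:
    cfg.CONF.set_override('rpc_workers', 1)

Next the configured rpc_workers value is read; any value below 1 is raised to 1, so at least one worker is always used. As seen above, this number determines how many child processes ProcessLauncher later starts to provide the service.
if not plugin.rpc_workers_supported():
    LOG.debug("Active plugin doesn't implement start_rpc_listeners")
    if 0 < cfg.CONF.rpc_workers:
        LOG.error(_LE("'rpc_workers = %d' ignored because "
                      "start_rpc_listeners is not implemented."),
                  cfg.CONF.rpc_workers)
    raise NotImplementedError()

Then it checks whether the core plugin (Ml2Plugin here) implements the start_rpc_listeners method. If it does not, an error is logged and NotImplementedError is raised.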
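The comment in serve_rpc gives the reason for checking up front: once the listeners are started inside a forked child, the parent can no longer catch the NotImplementedError. One way such a capability check can be written is to test whether a subclass has overridden the base method. The sketch below uses hypothetical classes (PluginBase, RpcPlugin, PlainPlugin) and is not claimed to be neutron's own implementation of rpc_workers_supported.

```python
class PluginBase:
    """Hypothetical base class; the default hook is unimplemented."""

    def start_rpc_listeners(self):
        raise NotImplementedError()

    def rpc_workers_supported(self):
        # Supported only if a subclass replaced the default method.
        return (type(self).start_rpc_listeners
                is not PluginBase.start_rpc_listeners)


class RpcPlugin(PluginBase):
    def start_rpc_listeners(self):
        return ["rpc-server"]


class PlainPlugin(PluginBase):
    pass


def check_before_fork(plugin):
    """Fail in the parent, where the caller can still catch the error."""
    if not plugin.rpc_workers_supported():
        raise NotImplementedError("start_rpc_listeners is not implemented")
    return True


if __name__ == "__main__":
    print(check_before_fork(RpcPlugin()))
    try:
        check_before_fork(PlainPlugin())
    except NotImplementedError as exc:
        print("rejected:", exc)
```

The check costs nothing and never calls the hook itself, so no listener is started in the parent as a side effect.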
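rpc = RpcWorker(service_plugins)

An RpcWorker is then created. It plays the same role as the neutron.wsgi:WorkerService discussed above: it also inherits from NeutronWorker, a subclass of ServiceBase, and overrides the start method so that it can be handed to ProcessLauncher to run. Its start method is therefore the key code for starting the service: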
neutron/service.py:
class RpcWorker(worker.NeutronWorker):
    """Wraps a worker to be handled by ProcessLauncher"""
    start_listeners_method = 'start_rpc_listeners'

    def __init__(self, plugins):
        self._plugins = plugins
        self._servers = []

    def start(self):
        super(RpcWorker, self).start()
        for plugin in self._plugins:
            if hasattr(plugin, self.start_listeners_method):
                try:
                    servers = getattr(plugin, self.start_listeners_method)()
                except NotImplementedError:
                    continue
                self._servers.extend(servers)
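As you can see, it iterates over all service_plugins, that is, the 6 plugins listed above, checks whether each one implements the "start_rpc_listeners" method, and calls it if so. That is all RpcWorker does. The plugins' start_rpc_listeners methods carry out the actual RPC functionality, mainly by consuming messages from specifically named MQ queues.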
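The dispatch loop in RpcWorker.start can be exercised on its own. The sketch below uses made-up plugin classes and a free function, collect_servers, purely for illustration. It shows the three cases the loop handles: a plugin that implements the hook, a plugin that lacks the attribute entirely, and a plugin whose hook raises NotImplementedError.

```python
class CorePlugin:
    """Hypothetical plugin that implements the hook."""

    def start_rpc_listeners(self):
        return ["core-rpc"]


class TagLikePlugin:
    """Hypothetical plugin with no RPC hook at all."""


class InheritedStubPlugin:
    """Hypothetical plugin whose hook is only an unimplemented stub."""

    def start_rpc_listeners(self):
        raise NotImplementedError()


def collect_servers(plugins, method_name):
    """Call method_name on every plugin that really provides it."""
    servers = []
    for plugin in plugins:
        if hasattr(plugin, method_name):
            try:
                started = getattr(plugin, method_name)()
            except NotImplementedError:
                # The attribute exists but is only a stub: skip the plugin.
                continue
            servers.extend(started)
    return servers


if __name__ == "__main__":
    plugins = [CorePlugin(), TagLikePlugin(), InheritedStubPlugin()]
    print(collect_servers(plugins, "start_rpc_listeners"))
```

Because the method name is looked up as a string, the same loop works for any hook a plugin may or may not provide; only the name changes.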
launcher = common_service.ProcessLauncher(cfg.CONF, wait_interval=1.0)
launcher.launch_service(rpc, workers=cfg.CONF.rpc_workers)

This way ProcessLauncher creates workers child processes (1 by default) to provide the RPC service, while the actual RPC functionality is left to each plugin's "start_rpc_listeners" method.
if (cfg.CONF.rpc_state_report_workers > 0 and
        plugin.rpc_state_report_workers_supported()):
    rpc_state_rep = RpcReportsWorker([plugin])
    LOG.debug('using launcher for state reports rpc, workers=%s',
              cfg.CONF.rpc_state_report_workers)
    launcher.launch_service(
        rpc_state_rep, workers=cfg.CONF.rpc_state_report_workers)

Then it checks whether rpc_state_report_workers is configured. If so, that many additional child processes are started to run RpcReportsWorker. This worker also derives from ServiceBase and overrides the start method. The actual RPC work is left to the plugin's 'start_rpc_state_reports_listener' method.
plugin_workers = service.start_plugin_workers()
for worker in plugin_workers:
    pool.spawn(worker.wait)

def start_plugin_workers():
    launchers = []
    # NOTE(twilson) get_service_plugins also returns the core plugin
    for plugin in manager.NeutronManager.get_unique_service_plugins():
        # TODO(twilson) Instead of defaulting here, come up with a good way to
        # share a common get_workers default between NeutronPluginBaseV2 and
        # ServicePluginBase
        for plugin_worker in getattr(plugin, 'get_workers', tuple)():
            print("Plugin start_worker", plugin, plugin_worker)
            launcher = common_service.ProcessLauncher(cfg.CONF)
            launcher.launch_service(plugin_worker)
            launchers.append(launcher)
    return launchers

Finally, the 'get_workers' method of every plugin is called. This method lets a plugin define its own ServiceBase to provide custom services; any such ServiceBase is likewise handed to ProcessLauncher, which creates a process to start it.
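With that, the whole of neutron is up. Both rpc and wsgi are wrapped in classes derived from ServiceBase and handed to ProcessLauncher, which creates processes to start them, and hook methods let plugins define any extra services they need. With the default configuration there end up being 3 child processes, providing the wsgi api, rpc and rpc_state_reports services respectively.
The main process, which waits through GreenPool for all child processes to exit:
neutron 1348 1 0 16:02 ? 00:00:24 /usr/bin/python /usr/bin/neutron-server --config-file=/etc/neutron/neutron.conf --config-file=/etc/neutron/plugins/ml2/ml2_conf.ini --log-file=/var/log/neutron/neutron-server.log
The 3 child processes, each providing a different service: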
neutron 3275 1348 0 16:03 ? 00:00:00 /usr/bin/python /usr/bin/neutron-server --config-file=/etc/neutron/neutron.conf --config-file=/etc/neutron/plugins/ml2/ml2_conf.ini --log-file=/var/log/neutron/neutron-server.log
neutron 3276 1348 0 16:03 ? 00:00:23 /usr/bin/python /usr/bin/neutron-server --config-file=/etc/neutron/neutron.conf --config-file=/etc/neutron/plugins/ml2/ml2_conf.ini --log-file=/var/log/neutron/neutron-server.log
neutron 3277 1348 0 16:03 ? 00:00:22 /usr/bin/python /usr/bin/neutron-server --config-file=/etc/neutron/neutron.conf --config-file=/etc/neutron/plugins/ml2/ml2_conf.ini --log-file=/var/log/neutron/neutron-server.log