The impact of nova placement on scheduling (by quqi99)

Author: Zhang Hua (张华)  Published: 2020-09-17

nova cell v2

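nova cell v2 splits the nova database into three (nova, nova_api and nova_cell0). Instance data is stored only in the database of the cell that hosts the instance, while shared data lives in nova_api. Three tables in nova_api (nova_api.host_mappings, nova_api.instance_mappings and nova_api.cell_mappings) let nova-api go from an instance to its cell_id and from there to that cell's DB and MQ connection details, so nova-api can operate on the cell's DB and MQ directly. This lets nova-compute scale horizontally to more physical nodes. In addition, the nova-api node no longer needs the nova-cell service; nova-api and nova-scheduler are enough.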
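
The lookup chain described above can be sketched in a few lines of Python. This is only an illustrative model, not nova code: the dictionaries stand in for the nova_api.instance_mappings and nova_api.cell_mappings tables, `find_cell_for_instance` is a hypothetical helper, and the cell ids and connection strings are placeholders assumed for the example.

```python
# Illustrative model of the cell v2 lookup; not nova code.
# The dicts stand in for nova_api.instance_mappings / nova_api.cell_mappings.
INSTANCE_MAPPINGS = {
    "4039ed4e-d0a1-46ba-99a5-68bc84421b42": 2,  # instance_uuid -> cell_id
}

CELL_MAPPINGS = {
    # cell_id -> connection info (ids and URLs are assumed placeholders)
    1: {"name": "cell0", "transport_url": "none:///",
        "database_connection": "mysql+pymysql://nova:***@db/nova_cell0"},
    2: {"name": "cell1", "transport_url": "rabbit://nova:***@mq:5672/",
        "database_connection": "mysql+pymysql://nova:***@db/nova"},
}


def find_cell_for_instance(instance_uuid):
    """Return the cell record (MQ + DB info) for an instance, or None."""
    cell_id = INSTANCE_MAPPINGS.get(instance_uuid)
    if cell_id is None:
        return None
    return CELL_MAPPINGS.get(cell_id)


cell = find_cell_for_instance("4039ed4e-d0a1-46ba-99a5-68bc84421b42")
print(cell["name"], cell["transport_url"])
```

An instance with no mapping row yields None in this model, which is roughly the situation that nova-manage cell_v2 map_instances is meant to repair.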

mysql> pager less -S
PAGER set to 'less -S'
mysql> show tables;
mysql> select instance_uuid,cell_id from instance_mappings;
| instance_uuid                        | cell_id |
| 4039ed4e-d0a1-46ba-99a5-68bc84421b42 |       2 |

mysql> select instance_uuid,cell_id from nova_api.instance_mappings;

mysql> select * from host_mappings;
| created_at          | updated_at | id | cell_id | host                                |
| 2020-09-16 06:18:26 | NULL       |  1 |       2 | juju-3ba760-ceilometer-15.cloud.sts |

mysql> select transport_url,name,database_connection from cell_mappings;
| transport_url                                                                                            | name  | database_connection                                                         |
| none:///                                                                                                 | cell0 | mysql+pymysql://nova:4Hjrdj5yMTkG6V9nxNpqrfVdhtJ5Tnww@ |
| rabbit://nova:wSz5LjscfBqKnhVWKBZnrXdwS5Kz6TByz9jKfm2xKHbCRYPPSbcnqFwPTnCp8VpP@ | cell1 | mysql+pymysql://nova:4Hjrdj5yMTkG6V9nxNpqrfVdhtJ5Tnww@       |


openstack server delete 2ebf1b2d-f679-4265-9c4b-71420dace71a
No server with a name or ID of 2ebf1b2d-f679-4265-9c4b-71420dace71a


sudo nova-manage cell_v2 list_cells
sudo nova-manage cell_v2 map_instances --cell_uuid <cell-id-from-above>
openstack server delete 2ebf1b2d-f679-4265-9c4b-71420dace71a


sudo nova-manage cell_v2 discover_hosts --verbose

NOTE: After emptying the nova_api.host_mappings table, running 'nova-manage cell_v2 discover_hosts --verbose' on ncc (nova-cloud-controller) always reports 'Found 0 unmapped computes in cell'. To recover, first delete the service with 'openstack compute service delete', then restart nova-compute, and finally run 'nova-manage cell_v2 discover_hosts --verbose' again; it then reports 'Found 1 unmapped computes in cell'.

nova placement API

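The nova placement API was introduced in Newton, and nova-scheduler calls placement-api for scheduling. Its main job is to track the Inventory and Usage of each Resource Provider (compute node, external storage pool, external IP allocation pool, etc.). Since Pike, the Placement API must be enabled to help the nova-scheduler service select compute nodes, replacing the earlier RAMFilter, CoreFilter and DiskFilter. The main concepts are:

  • Resource Class: the type of resource. The placement API implements three standard resource classes by default (DISK_GB, MEMORY_MB and VCPU) and also provides an interface for custom resource classes.
  • Resource Provider: the object that actually provides resources, e.g. a compute node or a storage pool.
  • Inventory: the resources a provider owns, e.g. the vCPU, Disk and RAM inventories of a compute node.
  • Resource Allocation: the mapping between Resource Class, Resource Provider and Consumer. It records how much of each resource class a consumer uses.
  • Provider Aggregate: a grouping of resource providers, similar to a HostAggregate.
  • Traits: qualitative characteristics of a resource provider. Different providers may have different traits. A trait cannot be consumed, but some workflows need this information. For example, marking the available disk as SSD helps the scheduler match an instance boot request.
# Of course, the records in the compute_nodes table do not need to be deleted; the resource-update thread should refresh the dirty usage in them automatically (e.g. pinned_cpus)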
mysql> select * from placement.resource_providers;
| created_at          | updated_at          | id | uuid                                 | name                                | generation | can_host | root_provider_id | parent_provider_id |
| 2020-09-16 06:18:15 | 2020-09-16 10:19:33 |  1 | a7081054-ee03-44b8-ae21-f20e0535cfc1 | juju-3ba760-ceilometer-15.cloud.sts |         19 |     NULL |                1 |               NULL |

# for the field resource_class_id, 0 means VCPU, 1 means MEMORY_MB, 2 means DISK_GB
mysql> select * from placement.inventories;
| created_at          | updated_at | id | resource_provider_id | resource_class_id | total | reserved | min_unit | max_unit | step_size | allocation_ratio |
| 2020-09-16 06:18:15 | NULL       |  1 |                    1 |                 0 |     2 |        0 |        1 |        2 |         1 |               16 |
| 2020-09-16 06:18:15 | NULL       |  2 |                    1 |                 1 |  3944 |      512 |        1 |     3944 |         1 |              1.5 |
| 2020-09-16 06:18:15 | NULL       |  3 |                    1 |                 2 |    38 |        0 |        1 |       38 |         1 |                1 |
mysql> select * from placement.allocations;
| created_at          | updated_at | id | resource_provider_id | consumer_id                          | resource_class_id | used |
| 2020-09-16 08:45:44 | NULL       | 16 |                    1 | 64cb10fd-246f-4864-b06d-687d59c47c2c |                 2 |    1 |
| 2020-09-16 08:45:44 | NULL       | 17 |                    1 | 64cb10fd-246f-4864-b06d-687d59c47c2c |                 1 |   64 |
| 2020-09-16 08:45:44 | NULL       | 18 |                    1 | 64cb10fd-246f-4864-b06d-687d59c47c2c |                 0 |    1 |
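
As a rough sketch of how these tables fit together: placement treats a provider's capacity for a resource class as (total - reserved) * allocation_ratio, and usage is the sum of `used` over the allocations rows. The Python below recomputes both from the sample rows above. `capacity` and `usage_by_class` are hypothetical helper names, not placement functions.

```python
# Sketch only: recompute capacity and usage from the sample rows above.
RESOURCE_CLASSES = {0: "VCPU", 1: "MEMORY_MB", 2: "DISK_GB"}

# (resource_class_id, total, reserved, allocation_ratio) from placement.inventories
INVENTORIES = [(0, 2, 0, 16.0), (1, 3944, 512, 1.5), (2, 38, 0, 1.0)]

# (consumer_id, resource_class_id, used) from placement.allocations
ALLOCATIONS = [
    ("64cb10fd-246f-4864-b06d-687d59c47c2c", 2, 1),
    ("64cb10fd-246f-4864-b06d-687d59c47c2c", 1, 64),
    ("64cb10fd-246f-4864-b06d-687d59c47c2c", 0, 1),
]


def capacity(total, reserved, allocation_ratio):
    """Schedulable amount of one resource class on one provider."""
    return int((total - reserved) * allocation_ratio)


def usage_by_class(allocations):
    """Sum the `used` column per resource class name."""
    usage = {}
    for _consumer, rc_id, used in allocations:
        name = RESOURCE_CLASSES[rc_id]
        usage[name] = usage.get(name, 0) + used
    return usage


usage = usage_by_class(ALLOCATIONS)
for rc_id, total, reserved, ratio in INVENTORIES:
    name = RESOURCE_CLASSES[rc_id]
    cap = capacity(total, reserved, ratio)
    print(f"{name}: capacity={cap} used={usage.get(name, 0)}")
```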

Placement CLI

sudo apt install python3-osc-placement -y

$ openstack resource provider list
| uuid                                 | name                                | generation |
| a7081054-ee03-44b8-ae21-f20e0535cfc1 | juju-3ba760-ceilometer-15.cloud.sts |         19 |

$  openstack resource provider inventory list a7081054-ee03-44b8-ae21-f20e0535cfc1
| resource_class | allocation_ratio | min_unit | max_unit | reserved | step_size | total |
| VCPU           |             16.0 |        1 |        2 |        0 |         1 |     2 |
| MEMORY_MB      |              1.5 |        1 |     3944 |      512 |         1 |  3944 |
| DISK_GB        |              1.0 |        1 |       38 |        0 |         1 |    38 |

$ openstack resource provider usage show a7081054-ee03-44b8-ae21-f20e0535cfc1
| resource_class | usage |
| VCPU           |     3 |
| MEMORY_MB      |   192 |
| DISK_GB        |     3 |


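If Inventory and Allocation help a ResourceProvider manage quantities, traits help manage qualitative characteristics. For example, a user wants an 80G disk for an instance (quantity) but also requires it to be SSD (characteristic), so the storage ResourceProvider needs to be marked as SSD or not. Traits work like tags (https://github.com/openstack/os-traits).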
mysql> select * from resource_provider_traits where resource_provider_id = 2;
| created_at          | updated_at | trait_id | resource_provider_id |
| 2020-10-26 12:30:09 | NULL       |       59 |                    2 |

mysql> select * from traits;
| created_at          | updated_at | id  | name                                  |
| 2020-10-26 12:27:34 | NULL       |  59 | COMPUTE_DEVICE_TAGGING                |

1, The cloud deployer creates an aggregate representing all the compute nodes in row 1, racks 6 through 10:
AGG_UUID=`openstack aggregate create r1rck0610`
# for all compute nodes in the system that are in racks 6-10 in row 1...
openstack aggregate add host $AGG_UUID $HOSTNAME

2, The cloud deployer creates a ResourceProvider representing the NFS share:
RP_UUID=`openstack resource-provider create "/mnt/nfs/row1racks0610/" \

3, The cloud deployer updates the resource provider’s capacity of shared disk:
openstack resource-provider set inventory $RP_UUID \
    --resource-class=DISK_GB \
    --total=100000 --reserved=1000 \
    --min-unit=50 --max-unit=10000 --step-size=10 \

4, The cloud deployer adds the STORAGE_SSD trait
openstack resource-provider trait add $RP_UUID STORAGE_SSD

5, Scheduling based on traits - https://docs.openstack.org/ironic/queens/install/configure-nova-flavors.html
openstack --os-baremetal-api-version 1.37 baremetal node add trait \
nova flavor-key my-baremetal-flavor set trait:CUSTOM_TRAIT1=required
nova flavor-key my-baremetal-flavor set trait:HW_CPU_X86_VMX=required
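
The matching rule behind `trait:<NAME>=required` can be sketched as a simple subset test: a provider qualifies only if it exposes every required trait. The snippet is a simplified model in Python. `required_traits` and `provider_matches` are hypothetical names, and real placement also handles forbidden traits and nested providers, which are ignored here.

```python
def required_traits(extra_specs):
    """Collect trait names marked `required` in flavor extra specs."""
    prefix = "trait:"
    return {
        key[len(prefix):]
        for key, value in extra_specs.items()
        if key.startswith(prefix) and value == "required"
    }


def provider_matches(provider_traits, extra_specs):
    """True if the provider exposes every required trait."""
    return required_traits(extra_specs) <= set(provider_traits)


flavor = {"trait:CUSTOM_TRAIT1": "required", "trait:HW_CPU_X86_VMX": "required"}
print(provider_matches({"CUSTOM_TRAIT1", "HW_CPU_X86_VMX", "STORAGE_SSD"}, flavor))
print(provider_matches({"STORAGE_SSD"}, flavor))
```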

one bug

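For example, in the scenario described in https://bugs.launchpad.net/nova/+bug/1679750, an instance is created on hostA and then hostA dies. Deleting the VM at that point cannot remove the instance on the host (nova-compute is dead), so its records in the allocations table are never deleted. If hostA comes back, nova-compute runs init_host->_complete_partial_deletion and then the following path: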

pre_start_hook -> update_available_resource -> nova/compute/manager.py#update_available_resource_for_node -> update_available_resource -> _update_available_resource -> _remove_deleted_instances_allocations

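When a nova-compute is removed, the service and compute_node table records are deleted, but the placement resource provider and host mapping records are not.
A dead nova-compute cannot clean up after itself, so the fix has nova-api delete the allocations records as well when it deletes the instances - https://review.opendev.org/#/c/580498/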

another bug

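Another bug: a customer did routine maintenance on a machine, which meant powering it off and bringing it back up. Afterwards they reported that an instance that used to be on that host had been rebuilt on a remote machine, and asked for the root cause. The following logs were found on the nova-api side; clearly the local-delete mechanism was triggered: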

var/log/nova/nova-api-os-compute.log:2022-06-09 00:06:13.699 68432 WARNING nova.compute.api [req-608d8025-c117-4dc6-9e0f-d2ee1da3f74e bf1b9f5bb95d409ca23a9ce477e94145 d730ab6a0a334ccea1577cd6b725a82d - f1f0b64a74f9408b8ba3506e6f4f6e67 f1f0b64a74f9408b8ba3506e6f4f6e67] [instance: 65472e49-a426-4b0c-8ed4-781c52f68d3d] instance's host llw-nfvi-az1-sv-com-02 is down, deleting from database
var/log/nova/nova-api-os-compute.log:2022-06-09 00:06:17.413 68432 INFO nova.scheduler.client.report [req-608d8025-c117-4dc6-9e0f-d2ee1da3f74e bf1b9f5bb95d409ca23a9ce477e94145 d730ab6a0a334ccea1577cd6b725a82d - f1f0b64a74f9408b8ba3506e6f4f6e67 f1f0b64a74f9408b8ba3506e6f4f6e67] Deleted allocation for instance 65472e49-a426-4b0c-8ed4-781c52f68d3d
var/log/nova/nova-api-os-compute.log:2022-06-09 00:06:17.498 68432 INFO nova.osapi_compute.wsgi.server [req-608d8025-c117-4dc6-9e0f-d2ee1da3f74e bf1b9f5bb95d409ca23a9ce477e94145 d730ab6a0a334ccea1577cd6b725a82d - f1f0b64a74f9408b8ba3506e6f4f6e67 f1f0b64a74f9408b8ba3506e6f4f6e67], "DELETE /v2.1/d730ab6a0a334ccea1577cd6b725a82d/servers/65472e49-a426-4b0c-8ed4-781c52f68d3d HTTP/1.1" status: 204 len: 405 time: 3.9009511

From the code analysis below, local_delete seems to be triggered only when a person or an API client deletes an instance:

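In _delete, nova-api has a local_delete mechanism. nova-api uses the heartbeat and service_down_time to decide whether a nova-compute service is DOWN (nova-compute may be dead, or alive but not working; both cases are caught by the heartbeat and service_down_time check). If is_local_delete=True and the cell is not empty, it calls _local_delete.
Only soft_delete, _delete_instance and delete call _delete, i.e. _delete is called only when an instance is being deleted. So local_delete is triggered only when someone explicitly deletes an instance and the is_local_delete=true condition holds. See also: https://bugs.launchpad.net/nova/+bug/1679750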

So an instance is deleted in one of two ways (instances in vm_states.SHELVED or vm_states.SHELVED_OFFLOADED are handled differently):
is_local_delete = True: use local_delete()
is_local_delete = False: use compute_rpcapi.terminate_instance()
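
A minimal sketch of that decision, assuming the heartbeat rule used by nova's database servicegroup driver (a service counts as up when its last heartbeat is no older than service_down_time, 60 seconds by default). The function names, the returned strings and the timestamps are illustrative only, and the shelved states are left out.

```python
from datetime import datetime, timedelta

SERVICE_DOWN_TIME = 60  # seconds; nova's default for service_down_time


def service_is_up(last_heartbeat, now, down_time=SERVICE_DOWN_TIME):
    """A service is up if its last heartbeat is recent enough."""
    return abs((now - last_heartbeat).total_seconds()) <= down_time


def choose_delete_path(last_heartbeat, now):
    """Pick the delete path the API would take for an instance on this host."""
    if service_is_up(last_heartbeat, now):
        return "compute_rpcapi.terminate_instance"
    return "local_delete"


# Hypothetical timestamps: the host went quiet long before the delete request.
delete_time = datetime(2022, 6, 9, 0, 6, 13)
print(choose_delete_path(delete_time - timedelta(hours=1), delete_time))
print(choose_delete_path(delete_time - timedelta(seconds=10), delete_time))
```

The point of the model is that nothing happens until a delete request arrives; the heartbeat age is only evaluated at that moment.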

In addition, the nova-compute side has logs like this:

./nova-compute.log:2022-06-08 23:01:08.651 19391 INFO nova.virt.libvirt.driver [req-d920209e-a181-41c9-8e57-1ada3850b81b 6e4b921233de47499c204ad695d893e5 f4d010bb96704bd7891351697103d4f5 - a473f51a33534c0fbf1febb216be04ba a473f51a33534c0fbf1febb216be04ba] [instance: 65472e49-a426-4b0c-8ed4-781c52f68d3d] Instance shutdown successfully after 23 seconds.
./nova-compute.log:2022-06-08 23:01:08.653 19391 INFO nova.virt.libvirt.driver [-] [instance: 65472e49-a426-4b0c-8ed4-781c52f68d3d] Instance destroyed successfully.
./nova-compute.log:2022-06-08 23:01:23.299 19391 INFO nova.compute.manager [-] [instance: 65472e49-a426-4b0c-8ed4-781c52f68d3d] VM Stopped (Lifecycle Event)
./nova-compute.log:2022-06-09 00:10:12.120 4572 ERROR oslo_messaging.rpc.server [req-681cc4a6-cad6-480b-bcba-392eed949412 bf1b9f5bb95d409ca23a9ce477e94145 d730ab6a0a334ccea1577cd6b725a82d - f1f0b64a74f9408b8ba3506e6f4f6e67 f1f0b64a74f9408b8ba3506e6f4f6e67] Exception during message handling: InstanceNotFound_Remote: Instance 65472e49-a426-4b0c-8ed4-781c52f68d3d could not be found.
./nova-compute.log:InstanceNotFound: Instance 65472e49-a426-4b0c-8ed4-781c52f68d3d could not be found.
./nova-compute.log:2022-06-09 00:10:12.120 4572 ERROR oslo_messaging.rpc.server InstanceNotFound_Remote: Instance 65472e49-a426-4b0c-8ed4-781c52f68d3d could not be found.
./nova-compute.log:2022-06-09 00:10:12.120 4572 ERROR oslo_messaging.rpc.server InstanceNotFound: Instance 65472e49-a426-4b0c-8ed4-781c52f68d3d could not be found.
./nova-compute.log:2022-06-09 00:15:03.899 4572 INFO nova.virt.libvirt.driver [req-7b2fd0cd-a807-425b-9df1-ed4a888d495b - - - - -] [instance: 65472e49-a426-4b0c-8ed4-781c52f68d3d] Deleting instance files /var/lib/nova/instances/65472e49-a426-4b0c-8ed4-781c52f68d3d_del
./nova-compute.log:2022-06-09 00:15:03.902 4572 INFO nova.virt.libvirt.driver [req-7b2fd0cd-a807-425b-9df1-ed4a888d495b - - - - -] [instance: 65472e49-a426-4b0c-8ed4-781c52f68d3d] Deletion of /var/lib/nova/instances/65472e49-a426-4b0c-8ed4-781c52f68d3d_del complete
./nova-compute.log:2022-06-09 00:15:05.010 4572 WARNING nova.virt.libvirt.driver [req-7b2fd0cd-a807-425b-9df1-ed4a888d495b - - - - -] Periodic task is updating the host stat, it is trying to get disk instance-00000116, but disk file was removed by concurrent operations such as resize.: OSError: [Errno 2] No such file or directory: '/var/lib/nova/instances/65472e49-a426-4b0c-8ed4-781c52f68d3d/disk.config'


  • 2022-06-08 23:01:23, the instance was stopped. This is probably when maintenance began and the host was shut down.
  • 2022-06-09 00:06:17, nova-api deleted the allocations for the instance - https://review.opendev.org/c/openstack/nova/+/580498/1/nova/compute/api.py#2107
  • 2022-06-09 00:10:04, nova-compute was restarted - 2022-06-09 00:10:04.220 4572 INFO nova.service [-] Starting compute node (version 17.0.13) - This is probably when maintenance ended and the host was started.
  • 2022-06-09 00:10:12, nova-compute reported 'InstanceNotFound_Remote' because the following path was triggered:
pre_start_hook -> update_available_resource -> nova/compute/manager.py#update_available_resource_for_node -> update_available_resource -> _update_available_resource -> _remove_deleted_instances_allocations
  • 2022-06-09 00:15:03, nova-compute started to delete the instance files /var/lib/nova/instances/65472e49-a426-4b0c-8ed4-781c52f68d3d_del

So what actually triggered local_delete? No periodic thread was found in nova-api that loops to check whether a host is down; the host-down check that triggers local_delete happens only when an instance is deleted.


NOTE: the tables resource_providers, inventories and allocations are in the placement db rather than nova_api
select * from nova_api.host_mappings;
select * from nova_api.cell_mappings;
select * from placement.resource_providers where name like '%xxx%';   -- xx.bos01.xxx (9)
select * from nova.compute_nodes where host like '%bagon%' or hypervisor_hostname like '%xxx%';
select * from placement.inventories where resource_provider_id in (select id from placement.resource_providers where name like '%xxx%');
select * from placement.allocations where resource_provider_id in (select id from placement.resource_providers where name like '%xxx%') order by consumer_id,resource_provider_id,resource_class_id;
select uuid, host, node, vcpus, memory_mb, vm_state, power_state, task_state, root_gb, ephemeral_gb, cell_name,deleted from nova.instances where uuid in (select consumer_id from placement.allocations where resource_provider_id in (select id from placement.resource_providers where name like '%xxx%')) order by uuid;

Update 2020-12-30 - another bug

"select numa_topology from nova.compute_nodes where hypervisor_hostname='cloud3.xxx.com'\G" shows that pinned_cpus on cell0 (NUMA cell 0) has used up all CPUs, so nova-scheduler cannot schedule any more instances and fails with "Filter NUMATopologyFilter returned 0 hosts".

pre_start_hook -> update_available_resource -> _update_available_resource -> _update_usage_from_instances -> _update_usage_from_instance -> _update_usage -> numa_usage_from_instance_numa

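The initial value of pinned_cpus comes from host_cell.pinned_cpus. Note in particular that host_cell.pinned_cpus is not taken straight from the database: the self._copy_resources(cn, resources) call below effectively resets host_cell.pinned_cpus to empty every time.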

def _init_compute_node(self, context, resources):
    # ...
    if nodename in self.compute_nodes:
        cn = self.compute_nodes[nodename]
        self._copy_resources(cn, resources)
        self._setup_pci_tracker(context, cn, resources)
        return False


def numa_usage_from_instance_numa(host_topology, instance_topology, free=False):
    for host_cell in host_topology.cells:
        new_cell = objects.NUMACell(
        # ...
            if free:
                if (instance_cell.cpu_thread_policy ==

free is determined by "free = sign == -1" (look carefully: two equals signs on the right, one on the left)

def _update_usage(self, usage, nodename, sign=1):
    free = sign == -1
    cn.numa_topology = hardware.numa_usage_from_instance_numa(
        host_numa_topology, instance_numa_topology, free)._to_json()

def _update_usage_from_instance():
    is_new_instance = uuid not in self.tracked_instances
    is_removed_instance = not is_new_instance and (is_removed or
        instance['vm_state'] in vm_states.ALLOW_RESOURCE_REMOVAL)
    if is_new_instance:
        sign = 1
    if is_removed_instance:
        sign = -1
    self._update_usage(self._get_usage_dict(instance, instance), nodename, sign=sign)
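
To make the sign logic concrete, here is a small model in Python. It is not the nova implementation: `apply_usage` and `rebuild_pinned_cpus` are hypothetical helpers that treat a NUMA cell's pinned_cpus as a plain set, pin on sign=1, unpin on sign=-1, and rebuild the set from empty the way a full update_available_resource pass would.

```python
def apply_usage(pinned_cpus, instance_cpus, sign=1):
    """Return a new pinned set after adding (sign=1) or freeing (sign=-1) CPUs."""
    free = sign == -1  # same idiom as in _update_usage
    if free:
        return pinned_cpus - instance_cpus
    return pinned_cpus | instance_cpus


def rebuild_pinned_cpus(tracked_instances):
    """Start from an empty set and re-add the pins of every live instance."""
    pinned = set()
    for cpus in tracked_instances.values():
        pinned = apply_usage(pinned, cpus, sign=1)
    return pinned


# Stale DB value claims every CPU is pinned; only one instance is really there.
stale = {0, 1, 2, 3}
live_instances = {"vm-a": {0, 1}}
print(sorted(stale), "->", sorted(rebuild_pinned_cpus(live_instances)))
print(sorted(apply_usage({0, 1, 2}, {2}, sign=-1)))
```

In this model a stale value cannot survive a rebuild, which is why a stale pinned_cpus in the database points at the rebuild not running at all.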

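So whenever update_available_resource runs, the dirty record is bound to be corrected. Since it was not corrected, update_available_resource must not have been running. The logs show the error below: placement was using an endpoint starting with http instead of https, which made the placement API unavailable. update_available_resource then failed when calling update_placement, so update_available_resource had not run since 2020-10-26. See https://bugs.launchpad.net/charm-nova-compute/+bug/1826382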

2020-10-26 15:43:34.459 1393 WARNING keystoneauth.discover [req-5dcdc394-2784-40d2-984c-54fe261f36f0 - - - - -] Failed to contact the endpoint at http://placement-int.xxx.com:8778 for discovery. Fallback to using that endpoint as the base url.
2020-10-26 15:43:34.463 1393 ERROR nova.compute.manager [req-5dcdc394-2784-40d2-984c-54fe261f36f0 - - - - -] Could not retrieve compute node resource provider 8bd4062b-84c7-4aab-ade7-31dc01695878 and therefore unable to error out any instances stuck in BUILDING state. Error: Failed to retrieve allocations for resource provider 8bd4062b-84c7-4aab-ade7-31dc01695878:

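For setting up a NUMA test environment see https://blog.csdn.net/quqi99/article/details/51993512. Note that defining isolcpus in grub does not stop nova from using those CPUs; nova has its own vcpu_pin_set option for that.

Another related post: https://zhhuabj.blog.csdn.net/article/details/50988089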

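Update 2021-12-14 - FQDN hostname test

Note: the root cause turned out to be that nova-compute was not restarted after the stale resource-provider record was deleted, so 'openstack compute service list' still showed the old host's record. Also remember to restart neutron-openvswitch-agent; 'openstack network agent list' can be used to confirm.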

1, delete compute_nodes record

delete from nova.services where host='juju-6ae090-focal2-9.cloud.sts';
delete from nova.compute_nodes where hypervisor_hostname='juju-6ae090-focal2-9.cloud.sts';

2, The records in the two tables nova.services and nova.compute_nodes will be recreated automatically after nova-compute restarts

juju ssh nova-compute/1 -- sudo systemctl restart nova-compute

3, run 'discover_hosts' command in nova-cloud-controller unit

root@juju-6ae090-focal2-8:/home/ubuntu# nova-manage cell_v2 discover_hosts --verbose
Found 2 cell mappings.
Skipping cell0 since it does not contain hosts.
Getting computes from cell 'cell1': 4473067d-4c91-459f-93ae-79cb1e1203c7
/usr/lib/python3/dist-packages/pymysql/cursors.py:170: Warning: (3719, "'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.")
  result = self._query(query)
Checking host mapping for compute host 'juju-6ae090-focal2-9.cloud.sts': 1a4600d5-f889-429e-bb67-00498f3166ab
Found 0 unmapped computes in cell: 4473067d-4c91-459f-93ae-79cb1e1203c7

4, check nova-compute service is there, and juju-6ae090-focal2-9.cloud.sts is in cell1

$ openstack compute service list |grep nova-compute
|  9 | nova-compute   | juju-6ae090-focal2-9.cloud.sts | nova     | enabled | up    | 2021-12-14T07:47:23.000000 |

# nova-manage cell_v2 list_hosts 
| Cell Name |              Cell UUID               |            Hostname            |
|   cell1   | 4473067d-4c91-459f-93ae-79cb1e1203c7 | juju-6ae090-focal2-9.cloud.sts |

5, create an instance for the test, then I hit a ResourceProviderCreationFailed exception in nova-compute.log

2021-12-14 07:50:41.044 24622 ERROR nova.compute.manager [req-b7b0a884-3e07-430d-bd1d-7933cf06befb - - - - -] Error updating resources for node juju-6ae090-focal2-9.cloud.sts.: nova.exception.ResourceProviderCreationFailed: Failed to create resource provider juju-6ae090-focal2-9.cloud.sts
2021-12-14 07:50:41.044 24622 ERROR nova.compute.manager nova.exception.ResourceProviderCreationFailed: Failed to create resource provider juju-6ae090-focal2-9.cloud.sts

6, It looks like the uuids in placement.resource_providers and nova.compute_nodes conflict

mysql> select id,uuid,name from placement.resource_providers where name='juju-6ae090-focal2-9.cloud.sts';
| id | uuid                                 | name                           |
|  2 | cd83ab34-5407-4c88-98bb-60afc100abbf | juju-6ae090-focal2-9.cloud.sts |
mysql> select id, hypervisor_hostname, host_ip, host, uuid from nova.compute_nodes where host='juju-6ae090-focal2-9.cloud.sts';
| id | hypervisor_hostname            | host_ip   | host                           | uuid                                 |
|  3 | juju-6ae090-focal2-9.cloud.sts | | juju-6ae090-focal2-9.cloud.sts | 1a4600d5-f889-429e-bb67-00498f3166ab |

so modify it.

update placement.resource_providers set uuid='1a4600d5-f889-429e-bb67-00498f3166ab' where name='juju-6ae090-focal2-9.cloud.sts';

7, then it works again.

select id, hypervisor_hostname, host_ip, host, uuid from nova.compute_nodes;
select * from nova_api.host_mappings;
openstack host list
openstack compute service list
openstack server show <id>
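
The uuid conflict found in step 6 can be spotted mechanically before touching the database. The sketch below, with hypothetical helper and variable names, pairs rows by hostname and reports any resource provider whose uuid differs from the compute node's uuid. The sample rows are the ones from step 6.

```python
def find_uuid_conflicts(compute_nodes, resource_providers):
    """Yield (name, compute_node_uuid, provider_uuid) where the uuids differ."""
    node_uuid_by_name = {
        node["hypervisor_hostname"]: node["uuid"] for node in compute_nodes
    }
    for provider in resource_providers:
        node_uuid = node_uuid_by_name.get(provider["name"])
        if node_uuid is not None and node_uuid != provider["uuid"]:
            yield provider["name"], node_uuid, provider["uuid"]


compute_nodes = [{
    "hypervisor_hostname": "juju-6ae090-focal2-9.cloud.sts",
    "uuid": "1a4600d5-f889-429e-bb67-00498f3166ab",
}]
resource_providers = [{
    "name": "juju-6ae090-focal2-9.cloud.sts",
    "uuid": "cd83ab34-5407-4c88-98bb-60afc100abbf",
}]

for name, node_uuid, rp_uuid in find_uuid_conflicts(compute_nodes, resource_providers):
    print(f"{name}: compute_nodes.uuid={node_uuid} resource_providers.uuid={rp_uuid}")
```

Only non-deleted compute_nodes rows should be fed in, since soft-deleted rows keep their old uuid.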


# TEST 4

1, create a dead record with 'openstack compute service delete 9', and also change its host from 'juju-6ae090-focal2-9.cloud.sts' to 'juju-6ae090-focal2-9' with 'update nova.compute_nodes set host='juju-6ae090-focal2-9' where id=3;'

mysql> select deleted_at, id, hypervisor_hostname, host_ip, host, uuid  from nova.compute_nodes where host like 'juju-6ae090-focal2-9%';
| deleted_at          | id | hypervisor_hostname            | host_ip   | host                           | uuid                                 |
| 2021-12-14 09:21:26 |  3 | juju-6ae090-focal2-9.cloud.sts | | juju-6ae090-focal2-9 | 1a4600d5-f889-429e-bb67-00498f3166ab |

2, restart nova-compute service to recreate nova.compute_nodes record automatically by 'juju ssh nova-compute/1 -- sudo systemctl restart nova-compute'

mysql> select deleted_at, id, hypervisor_hostname, host_ip, host, uuid  from nova.compute_nodes where host like 'juju-6ae090-focal2-9%';
| deleted_at          | id | hypervisor_hostname            | host_ip   | host                           | uuid                                 |
| NULL                |  4 | juju-6ae090-focal2-9.cloud.sts | | juju-6ae090-focal2-9.cloud.sts | 5ed16f18-f83a-4109-a8ef-74b8e5d81218 |
| 2021-12-14 09:21:26 |  3 | juju-6ae090-focal2-9.cloud.sts | | juju-6ae090-focal2-9 | 1a4600d5-f889-429e-bb67-00498f3166ab |

3, double-check that the uuids in placement.resource_providers and nova.compute_nodes are the same (both are 5ed16f18-f83a-4109-a8ef-74b8e5d81218).

mysql> select id,uuid,name from placement.resource_providers;
| id | uuid                                 | name                           |
| 52 | 5ed16f18-f83a-4109-a8ef-74b8e5d81218 | juju-6ae090-focal2-9.cloud.sts |

4, double confirm nova-compute service is there

$ openstack compute service list |grep nova-compute
| 10 | nova-compute   | juju-6ae090-focal2-9.cloud.sts | nova     | enabled | up    | 2021-12-14T09:36:36.000000 |

5, it works

# TEST 5

1, Let's change host to 

update nova.compute_nodes set host='juju-6ae090-focal2-9.cloud.sts' where id=3;
update nova.compute_nodes set host='juju-6ae090-focal2-9' where id=4;

mysql> select deleted_at, id, hypervisor_hostname, host_ip, host, uuid  from nova.compute_nodes where host like 'juju-6ae090-focal2-9%';
| deleted_at          | id | hypervisor_hostname            | host_ip   | host                           | uuid                                 |
| NULL                |  4 | juju-6ae090-focal2-9.cloud.sts | | juju-6ae090-focal2-9           | 5ed16f18-f83a-4109-a8ef-74b8e5d81218 |
| 2021-12-14 09:21:26 |  3 | juju-6ae090-focal2-9.cloud.sts | | juju-6ae090-focal2-9.cloud.sts | 1a4600d5-f889-429e-bb67-00498f3166ab |

2, change host to juju-6ae090-focal2-9 in nova_api.host_mappings as well

update nova_api.host_mappings set host='juju-6ae090-focal2-9' where host='juju-6ae090-focal2-9.cloud.sts';
mysql> select * from nova_api.host_mappings;
| created_at          | updated_at | id | cell_id | host                 |
| 2021-12-14 09:28:02 | NULL       |  2 |       2 | juju-6ae090-focal2-9 |

3, but the output of 'openstack compute service list' is still juju-6ae090-focal2-9.cloud.sts

$ openstack compute service list |grep nova-compute
| 10 | nova-compute   | juju-6ae090-focal2-9.cloud.sts | nova     | enabled | up    | 2021-12-14T09:51:26.000000 |

4, so it didn't work

5, change host to juju-6ae090-focal2-9 in both nova.conf and neutron.conf, then restart nova-compute and neutron-openvswitch-agent, now we have two nova-compute services.

# grep -r 'juju-6ae090-focal2-9' /etc/nova/
/etc/nova/nova.conf:host = juju-6ae090-focal2-9
# grep -r 'juju-6ae090-focal2-9' /etc/neutron/
/etc/neutron/neutron.conf:host = juju-6ae090-focal2-9

$ openstack compute service list |grep nova-compute
| 10 | nova-compute   | juju-6ae090-focal2-9.cloud.sts | nova     | enabled | down  | 2021-12-14T10:07:45.000000 |
| 11 | nova-compute   | juju-6ae090-focal2-9           | nova     | enabled | up    | 2021-12-14T10:09:08.000000 |

uuid is 5ed16f18-f83a-4109-a8ef-74b8e5d81218, so id=4 in nova.compute_nodes will be used.

mysql> select * from placement.resource_providers;
| created_at          | updated_at          | id | uuid                                 | name                           | generation | root_provider_id | parent_provider_id |
| 2021-12-14 09:28:01 | 2021-12-14 10:11:18 | 52 | 5ed16f18-f83a-4109-a8ef-74b8e5d81218 | juju-6ae090-focal2-9.cloud.sts |          8 |               52 |               NULL |
mysql> select deleted_at, id, hypervisor_hostname, host_ip, host, uuid  from nova.compute_nodes where host like 'juju-6ae090-focal2-9%';
| deleted_at          | id | hypervisor_hostname            | host_ip   | host                           | uuid                                 |
| NULL                |  4 | juju-6ae090-focal2-9.cloud.sts | | juju-6ae090-focal2-9           | 5ed16f18-f83a-4109-a8ef-74b8e5d81218 |
| NULL                |  5 | juju-6ae090-focal2-9.cloud.sts | | juju-6ae090-focal2-9.cloud.sts | d9898d66-182c-4494-93a2-95e656fc1001 |
| 2021-12-14 09:21:26 |  3 | juju-6ae090-focal2-9.cloud.sts | | juju-6ae090-focal2-9.cloud.sts | 1a4600d5-f889-429e-bb67-00498f3166ab |

6, it works as well.


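  • Both nova.conf and neutron.conf set host, e.g. host=juju-6ae090-focal2-9
  • When a VM is created, its cell id is found through instance_mappings, and the MQ and DB addresses are then found through cell_mappings (run 'nova-manage cell_v2 list_cells' on the ncc unit to get the cell info). The host, in turn, is found through host_mappings (yet running 'nova-manage cell_v2 discover_hosts --verbose' on the ncc unit always reports: Found 0 unmapped computes in cell)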
mysql> select instance_uuid,cell_id from nova_api.instance_mappings;
| instance_uuid                        | cell_id |
| 0be2dce7-c85b-4354-82ea-c08524382565 |       2 |
mysql> select * from nova_api.host_mappings;
| created_at          | updated_at | id | cell_id | host                 |
| 2021-12-14 11:05:45 | NULL       |  4 |       2 | juju-6ae090-focal2-9 |
  • Don't forget the service:
$ openstack compute service list |grep nova-compute
| 10 | nova-compute   | juju-6ae090-focal2-9.cloud.sts | nova     | enabled | down  | 2021-12-14T10:07:45.000000 |
| 12 | nova-compute   | juju-6ae090-focal2-9           | nova     | enabled | up    | 2021-12-14T11:10:31.000000 |

mysql> select * from nova.services where host like 'juju-6ae090-focal2-9%';
| created_at          | updated_at          | deleted_at          | id | host                           | binary       | topic   | report_count | disabled | deleted | disabled_reaso>
| 2021-12-14 11:05:33 | 2021-12-14 11:12:30 | NULL                | 12 | juju-6ae090-focal2-9           | nova-compute | compute |           42 |        0 |       0 | NULL          >
| 2021-12-14 10:08:02 | 2021-12-14 11:04:46 | 2021-12-14 11:04:48 | 11 | juju-6ae090-focal2-9           | nova-compute | compute |          339 |        0 |      11 | NULL          >
| 2021-12-14 09:28:00 | 2021-12-14 10:07:45 | NULL                | 10 | juju-6ae090-focal2-9.cloud.sts | nova-compute | compute |          238 |        0 |       0 | NULL          >
| 2021-12-14 07:44:38 | 2021-12-14 09:21:17 | 2021-12-14 09:21:26 |  9 | juju-6ae090-focal2-9.cloud.sts | nova-compute | compute |          580 |        0 |       9 | NULL          >
  • compute_nodes usually goes together with service; compute_nodes has a uuid, which links it to the resource_provider
mysql> select deleted_at, id, hypervisor_hostname, host_ip, host, uuid  from nova.compute_nodes where host like 'juju-6ae090-focal2-9%';
| deleted_at          | id | hypervisor_hostname            | host_ip   | host                           | uuid                                 |
| NULL                |  6 | juju-6ae090-focal2-9.cloud.sts | | juju-6ae090-focal2-9           | 1f39f93f-9ecf-4284-8001-af34c1d2f34c |
| 2021-12-14 11:04:48 |  4 | juju-6ae090-focal2-9.cloud.sts | | juju-6ae090-focal2-9           | 5ed16f18-f83a-4109-a8ef-74b8e5d81218 |
| NULL                |  5 | juju-6ae090-focal2-9.cloud.sts | | juju-6ae090-focal2-9.cloud.sts | d9898d66-182c-4494-93a2-95e656fc1001 |
| 2021-12-14 09:21:26 |  3 | juju-6ae090-focal2-9.cloud.sts | | juju-6ae090-focal2-9.cloud.sts | 1a4600d5-f889-429e-bb67-00498f3166ab |
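  • The resource_providers table is linked to nova.compute_nodes by uuid, and its id is then linked to tables such as allocations, so the name here does not actually matter. The name is juju-6ae090-focal2-9.cloud.sts because an earlier test used juju-6ae090-focal2-9.cloud.sts and the resource_provider was never deleted afterwards, so it is still the FQDN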
    mysql> select * from placement.resource_providers;
    | created_at | updated_at | id | uuid | name | generation | root_provider_id | parent_provider_id |
    | 2021-12-14 11:05:35 | 2021-12-14 11:09:29 | 54 | 1f39f93f-9ecf-4284-8001-af34c1d2f34c | juju-6ae090-focal2-9.cloud.sts | 3 | 54 | NULL |

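Update 2024-01-02

A VM has clearly been deleted but still shows up in openstack server list. That happens when nova/nova_cell0 hold a duplicate instance uuid (nova_cell0 stores instances that cannot be scheduled), possibly caused by a network problem. The DB can be repaired as follows: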

1. Double-check the cell mappings first using `select * from nova_api.cell_mappings\G` to confirm the mappings:
cell0 id=2 uuid=00000000-0000-0000-0000-000000000000
cell1 id=5 uuid=03e1129c-2952-4512-874b-e45bc8f280a2
2. Use the instance UUID to check which database contains an alive entry:
# select * from nova.instances where uuid='<-instance_uuid->' and deleted=0\G
# select * from nova_cell0.instances where uuid='<-instance_uuid->' and deleted=0\G
3. Modify the instance mapping:
If nova has an alive entry, let the instance map point to cell1.
If nova_cell0 has an alive entry, the mapping should point to cell0.
Confirm the current mapping with `select cell_id from nova_api.instance_mappings where instance_uuid='<-instance_uuid->'\G` command.
If the cell ID doesn't match, delete it and map instances again:
# delete from nova_api.instance_mappings where instance_uuid='<-instance_uuid->' [PLEASE EXECUTE THIS WITH CAUTION]
# nova-manage cell_v2 map_instances --cell_uuid <-cell_uuid->
4. After confirming that the instance maps to the desired cell, delete the instance with the command:
openstack server delete <-instance_uuid->
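
The decision in step 3 is small enough to write down as a function. This is a sketch with hypothetical names: `alive_in` is the set of databases where the step 2 queries returned a row with deleted=0, and the function returns which cell the mapping should point to, or None when the case needs manual inspection. It assumes, as in the steps above, that the nova database belongs to cell1.

```python
def target_cell(alive_in):
    """Map the databases holding a live row to the cell the mapping should use."""
    if alive_in == {"nova"}:
        return "cell1"
    if alive_in == {"nova_cell0"}:
        return "cell0"
    # No live row, or live rows in both databases: do not guess.
    return None


def needs_remap(alive_in, current_cell):
    """True if instance_mappings points at a different cell than it should."""
    wanted = target_cell(alive_in)
    return wanted is not None and wanted != current_cell


print(target_cell({"nova"}))
print(needs_remap({"nova_cell0"}, "cell1"))
```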


For example, on a machine with only 2 pCPUs, max_unit and total should both be 2

$ openstack resource provider inventory list df00a126-2390-4dca-b057-c4d3b443c545 |head -n 4
| resource_class | allocation_ratio | min_unit | max_unit | reserved | step_size | total | used |
| VCPU           |              4.0 |        1 |        2 |        0 |         1 |     2 |    3 |

>>> import libvirt
>>> conn = libvirt.open("qemu:///system")
>>> cpu_nums = conn.getCPUMap()[0]
>>> print(cpu_nums)

root@juju-99d74e-ovn-11:/home/ubuntu# nc 4444
> /usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py(7558)_get_vcpu_available()
-> online_cpus = self._host.get_online_cpus()
(Pdb) bt
-> result = function(*args, **kwargs)
-> result = func(*self.args, **self.kw)
-> return self.manager.periodic_tasks(ctxt, raise_on_error=raise_on_error)
-> return self.run_periodic_tasks(context, raise_on_error=raise_on_error)
-> task(self, context)
-> self._update_available_resource_for_node(context, nodename,
-> self.rt.update_available_resource(context, nodename,
-> resources = self.driver.get_available_resource(nodename)
-> data["vcpus"] = len(self._get_vcpu_available())
> /usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py(7558)_get_vcpu_available()
-> online_cpus = self._host.get_online_cpus()

(Pdb) l
743             :returns: set of online CPUs, raises libvirtError on error
744             """
745             cpus, cpu_map, online = self.get_connection().getCPUMap()
747  ->         online_cpus = set()
748             for cpu in range(cpus):
749                 if cpu_map[cpu]:
750                     online_cpus.add(cpu)
752             return online_cpus
(Pdb) p cpus
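
The loop in the listing above can be tried without a hypervisor. The sketch below mirrors that logic on a hand-written tuple in the same shape as the value returned by `getCPUMap()` (CPU count, per-CPU online flags, number online); the helper name and the sample data are made up for illustration.

```python
def online_cpus_from_map(cpu_map_result):
    """Collect the IDs of online CPUs from a getCPUMap()-style tuple."""
    cpus, cpu_map, _online = cpu_map_result
    return {cpu for cpu in range(cpus) if cpu_map[cpu]}


# Hypothetical host: 4 CPUs, CPU 2 is offline.
sample = (4, [True, True, False, True], 3)
online = online_cpus_from_map(sample)

# The backtrace shows data["vcpus"] = len(self._get_vcpu_available()),
# so the reported VCPU count follows the number of online CPUs.
vcpus = len(online)
print(sorted(online), vcpus)
```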


root@juju-99d74e-ovn-10:/home/ubuntu# grep -r '5a7a8e59-71a4-4624-a368-a024803600bd' /var/log/nova/nova-scheduler.log
2024-06-03 05:03:39.536 2881454 DEBUG nova.scheduler.manager [req-634c42c7-8800-497b-82a1-545b5589cdfe 0389d2c5def94e4ca14366aa7a7a5228 e8c92bf3cd804ae694ef49749daf6eea - 4c83f99642134191b11ad2139afd3497 4c83f99642134191b11ad2139afd3497] Starting to schedule for instances: ['5a7a8e59-71a4-4624-a368-a024803600bd'] select_destinations /usr/lib/python3/dist-packages/nova/scheduler/manager.py:141

root@juju-99d74e-ovn-10:/home/ubuntu# grep -r 'req-634c42c7-8800-497b-82a1-545b5589cdfe' /var/log/nova/nova-scheduler.log
2024-06-03 05:03:40.056 2881454 INFO nova.scheduler.manager [req-634c42c7-8800-497b-82a1-545b5589cdfe 0389d2c5def94e4ca14366aa7a7a5228 e8c92bf3cd804ae694ef49749daf6eea - 4c83f99642134191b11ad2139afd3497 4c83f99642134191b11ad2139afd3497] Got no allocation candidates from the Placement API. This could be due to insufficient resources or a temporary occurrence as compute nodes start up.

nova-scheduler calls the Placement API's /allocation_candidates endpoint. The same query can be tested quickly with the osc-placement CLI:

 openstack allocation candidate list --resource VCPU=3
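
For reference, the CLI call above ends up as a `GET /allocation_candidates?resources=VCPU:3` request against Placement. A minimal sketch of how that query string is put together follows; the helper name is hypothetical, and it only builds the path (no authentication, no microversion header, no HTTP call).

```python
from urllib.parse import urlencode


def allocation_candidates_path(resources, required_traits=()):
    """Build the path and query string for a Placement allocation candidates request."""
    query = {
        # Placement expects "CLASS:amount" pairs joined by commas.
        "resources": ",".join(
            f"{rc}:{amount}" for rc, amount in sorted(resources.items())
        )
    }
    if required_traits:
        query["required"] = ",".join(required_traits)
    # Keep ':' and ',' readable instead of percent-encoding them.
    return "/allocation_candidates?" + urlencode(query, safe=":,")


print(allocation_candidates_path({"VCPU": 3}))
```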

For 'openstack allocation candidate list --resource VCPU=3' to return candidates, set max_unit and total to 3 with the following command:

openstack resource provider inventory class set --allocation_ratio 4.0 --total 3 --max_unit 3 df00a126-2390-4dca-b057-c4d3b443c545 VCPU

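However, once an instance is created, this setting is overwritten by the following code path (instance_claim -> _update -> _update_to_placement -> update_provider_tree # _get_vcpu_available()). Does that mean a change like the following is needed?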

$ git diff ./nova/virt/libvirt/driver.py
diff --git a/nova/virt/libvirt/driver.py b/nova/virt/libvirt/driver.py
index b1851296ac..9404cb03c5 100644
--- a/nova/virt/libvirt/driver.py
+++ b/nova/virt/libvirt/driver.py
@@ -9314,9 +9314,9 @@ class LibvirtDriver(driver.ComputeDriver):
         # forbids reporting inventory with total=0
         if vcpus:
             result[orc.VCPU] = {
-                'total': vcpus,
+                'total': vcpus * ratios[orc.VCPU],
                 'min_unit': 1,
-                'max_unit': vcpus,
+                'max_unit': vcpus * ratios[orc.VCPU],
                 'step_size': 1,
                 'allocation_ratio': ratios[orc.VCPU],
                 'reserved': CONF.reserved_host_cpus,

The change above is probably not needed, because if you have a 10-CPU server with a cpu-allocation-ratio of 2, you can use 20 CPUs on that server. For example:

  • 20 VMs with 1 CPU
  • 10 VMs with 2 CPUs
  • 4 VMs with 5 CPUs
  • 2 VMs with 10 CPUs
    But you cannot allocate 1 VM with > 10 CPUs. In other words, a single VM cannot be given more CPUs than the total number of CPUs on a physical node. From that point of view, this behavior is not a bug.
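
The rule can be checked with a few lines of arithmetic. The sketch below assumes the usual Placement inventory semantics: capacity is (total - reserved) * allocation_ratio, and a single request must also lie between min_unit and max_unit and be a multiple of step_size. The function name and the sample inventories are made up for illustration.

```python
def can_allocate(inv, requested):
    """Check one request against one inventory record (simplified)."""
    capacity = (inv["total"] - inv["reserved"]) * inv["allocation_ratio"]
    return (
        inv["min_unit"] <= requested <= inv["max_unit"]  # per-request bounds
        and requested % inv["step_size"] == 0
        and inv["used"] + requested <= capacity          # overcommitted total
    )


# The 10-CPU server with an allocation ratio of 2 from the example above.
server = dict(total=10, reserved=0, allocation_ratio=2.0,
              min_unit=1, max_unit=10, step_size=1, used=0)
print(can_allocate(server, 10))   # a 10-CPU VM fits
print(can_allocate(server, 11))   # an 11-CPU VM exceeds max_unit

# The 2-pCPU host shown earlier: ratio 4.0, max_unit 2, 3 VCPUs already used.
small = dict(total=2, reserved=0, allocation_ratio=4.0,
             min_unit=1, max_unit=2, step_size=1, used=3)
print(can_allocate(small, 3))     # rejected because 3 > max_unit
```

This is why `--resource VCPU=3` returns no candidates on the 2-pCPU host even though the overcommitted capacity of 8 is not exhausted.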

20240904 - vGPU placement

GPU SR-IOV also distinguishes between a PF (the real GPU) and VFs (slices).

$ lspci | grep NVIDIA
25:00.0 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
25:03.6 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)

mdevs (mediated devices) are an intermediate layer that sits between the VM and the physical GPU.

sudo nvidia-smi
juju deploy ch:nova-compute-nvidia-vgpu --channel=yoga/stable
juju integrate nova-compute-nvidia-vgpu:nova-vgpu nova-compute:nova-vgpu
juju attach nova-compute-nvidia-vgpu nvidia-vgpu-software=./nvidia-vgpu-ubuntu-510_510.47.03_amd64.deb
juju exec -a nova-compute-nvidia-vgpu -- sudo reboot
juju config nova-compute-nvidia-vgpu vgpu-device-mappings="{'nvidia-610': ['0000:25:02.3', '0000:25:00.5']}"
$ juju run nova-compute-nvidia-vgpu/0 list-vgpu-types
  nvidia-604, 0000:25:02.3, NVIDIA A10-12A, num_heads=1, frl_config=60, framebuffer=12288M, max_resolution=1280x1024, max_instance=2

openstack resource provider list
openstack resource provider inventory list xxx
openstack flavor set <flavor-name> --property resources:VGPU=1
openstack resource provider allocation show <vm-uuid>
openstack trait create CUSTOM_VGPU_PLACEMENT
for uuid in $(openstack resource provider list | grep pci_0000 | grep -e 0000_25_00_4 -e 0000_25_00_5 | awk '{print $2}'); do
   openstack resource provider trait set --trait CUSTOM_VGPU_PLACEMENT $uuid
done
openstack flavor set 21e3177c-4879-429d-9f81-53199f38ec59 --property trait:CUSTOM_VGPU_PLACEMENT=required
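
The two flavor properties used above (`resources:VGPU=1` and `trait:CUSTOM_VGPU_PLACEMENT=required`) are what steer scheduling: the first becomes a resource request, the second a required trait. The sketch below shows that mapping in a simplified form; the function name is hypothetical and it ignores the other syntaxes nova accepts (numbered groups, forbidden traits and so on).

```python
def placement_request_from_extra_specs(extra_specs):
    """Split flavor extra specs into requested resources and required traits."""
    resources, required = {}, []
    for key, value in extra_specs.items():
        if key.startswith("resources:"):
            resources[key.split(":", 1)[1]] = int(value)
        elif key.startswith("trait:") and value == "required":
            required.append(key.split(":", 1)[1])
        # Any other extra spec is not part of the Placement request here.
    return resources, sorted(required)


specs = {
    "resources:VGPU": "1",
    "trait:CUSTOM_VGPU_PLACEMENT": "required",
    "hw:cpu_policy": "shared",  # unrelated property, ignored by this sketch
}
print(placement_request_from_extra_specs(specs))
```

Only resource providers that expose VGPU inventory and carry the CUSTOM_VGPU_PLACEMENT trait can then be returned as candidates for this flavor.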


[1] https://blog.csdn.net/jmilk/article/details/81264240
