Author: Zhang Hua  Published: 2016-07-22
Copyright: this article may be freely reposted; when reposting, please cite the original source and author with a hyperlink, together with this copyright notice
( http://blog.csdn.net/quqi99 )
Simulating NUMA Hardware with a VM
We may not have real NUMA hardware, but we can simulate it with a VM.
1, Enable KVM nested virtualization as described in link [1]; on Ubuntu KVM it should be enabled by default.
2, [Optional] The host network looks as follows; the only NIC, eth0, is plugged into br-phy.
auto br-phy
allow-ovs br-phy
iface br-phy inet static
pre-up /usr/bin/ovs-vsctl -- --may-exist add-br br-phy
pre-up /usr/bin/ovs-vsctl -- --may-exist add-port br-phy eth0
address 192.168.99.125
gateway 192.168.99.1
network 192.168.99.0
netmask 255.255.255.0
broadcast 192.168.99.255
ovs_type OVSBridge
ovs_ports eth0
#sudo ip -6 addr add 2001:2:3:4500:fa32:e4ff:febe:87cd/64 dev br-phy
iface br-phy inet6 static
pre-up modprobe ipv6
address 2001:2:3:4500:fa32:e4ff:febe:87cd
netmask 64
gateway 2001:2:3:4500::1
auto eth0
allow-br-phy eth0
iface eth0 inet manual
ovs_bridge br-phy
ovs_type OVSPort
Then make KVM (libvirt) use the Open vSwitch bridge:
sudo ovs-vsctl add-br br-phy
sudo virsh net-destroy default
sudo virsh net-edit default
<network>
<name>default</name>
<forward mode='bridge'/>
<bridge name='br-phy'/>
<virtualport type='openvswitch'/>
</network>
sudo virsh net-start default
sudo virsh net-autostart default
3, Install a VM with virt-install (virt-manager also works).
sudo apt-get install virt-viewer openvswitch-switch qemu-kvm libvirt-bin virt-manager virtinst virt-top python-libvirt
sudo virt-install \
--name openstack_demo \
--ram 8096 \
--vcpus 8 \
--file /images/kvm/openstack_demo.img \
--file-size 20 \
--cdrom /images/iso/ubuntu-20.04-legacy-server-amd64.iso
4, Edit the VM's topology into a NUMA topology; this must be done while the VM is shut down.
sudo virsh destroy openstack_demo
sudo virsh edit openstack_demo
<cpu mode='host-passthrough'>
<numa>
<cell id='0' cpus='0-3' memory='4096000'/>
<cell id='1' cpus='4-5' memory='2048000'/>
<cell id='2' cpus='6-7' memory='2048000'/>
</numa>
</cpu>
sudo virsh start openstack_demo
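The cell sizes in the XML above can be sanity-checked with a little arithmetic (the memory attributes are KiB, as libvirt expects); a plain Python sketch, not libvirt code:

```python
# <cell> memory attributes from the NUMA topology above, in KiB.
cells_kib = [4096000, 2048000, 2048000]   # cells 0, 1, 2
total_kib = sum(cells_kib)
print(total_kib)           # 8192000 KiB across the three cells
print(total_kib // 1024)   # 8000 MiB, i.e. within the RAM given at install time
```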
5, Enable huge page support inside the VM, since OpenStack's NUMA features require huge pages.
hua@demo:~$ cat /etc/default/grub |grep GRUB_CMDLINE_LINUX
GRUB_CMDLINE_LINUX_DEFAULT=""
GRUB_CMDLINE_LINUX="transparent_hugepage=never hugepagesz=2M hugepages=512 default_hugepagesz=2M"
# optionally also add isolcpus=0,1,2,3
cat << EOF | sudo tee -a /etc/fstab
nodev /mnt/huge hugetlbfs pagesize=2MB 0 0
EOF
sudo update-grub
sudo mkdir -p /mnt/huge
sudo reboot
#hua@demo:~$ sudo sysctl -w vm.nr_hugepages=512
#vm.nr_hugepages = 512
6, At this point we see the following (the lstopo command shows the same):
ubuntu@demo:~$ numactl --hardware
available: 3 nodes (0-2)
node 0 cpus: 0 1 2 3
node 0 size: 3846 MB
node 0 free: 3115 MB
node 1 cpus: 4 5
node 1 size: 1969 MB
node 1 free: 1542 MB
node 2 cpus: 6 7
node 2 size: 1966 MB
node 2 free: 1466 MB
node distances:
node 0 1 2
0: 10 20 20
1: 20 10 20
2: 20 20 10
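The distance table above reads as a matrix of relative access costs: 10 for a node's own memory, 20 for remote memory. A tiny sketch of that interpretation:

```python
# Node distance matrix from the numactl output above:
# 10 = local access, 20 = remote access (relative costs, not absolute units).
dist = [[10, 20, 20],
        [20, 10, 20],
        [20, 20, 10]]
# Each node's cheapest memory is its own:
nearest = [row.index(min(row)) for row in dist]
print(nearest)  # [0, 1, 2]
```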
ubuntu@demo:~$ grep Hugepagesize /proc/meminfo
Hugepagesize: 2048 kB
ubuntu@demo:~$ cat /proc/meminfo | grep Huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 512
HugePages_Free: 512
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 1048576 kB
ubuntu@demo:~$ sudo cat /sys/devices/system/node/node*/meminfo | grep -i huge
Node 0 AnonHugePages: 0 kB
Node 0 ShmemHugePages: 0 kB
Node 0 FileHugePages: 0 kB
Node 0 HugePages_Total: 171
Node 0 HugePages_Free: 171
Node 0 HugePages_Surp: 0
Node 1 AnonHugePages: 0 kB
Node 1 ShmemHugePages: 0 kB
Node 1 FileHugePages: 0 kB
Node 1 HugePages_Total: 171
Node 1 HugePages_Free: 171
Node 1 HugePages_Surp: 0
Node 2 AnonHugePages: 0 kB
Node 2 ShmemHugePages: 0 kB
Node 2 FileHugePages: 0 kB
Node 2 HugePages_Total: 170
Node 2 HugePages_Free: 170
Node 2 HugePages_Surp: 0
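The per-node totals above (171/171/170) follow directly from the boot parameters: hugepages=512 pages of 2 MiB are spread roughly round-robin across the three nodes. A quick sketch of the arithmetic (plain Python, not kernel code):

```python
PAGE_KIB = 2048   # default_hugepagesz=2M

def spread_pages(total_pages, nodes):
    """Round-robin distribution of the boot-time huge page pool (sketch)."""
    per_node = [total_pages // nodes] * nodes
    for i in range(total_pages % nodes):
        per_node[i] += 1
    return per_node

print(spread_pages(512, 3))   # [171, 171, 170], matching node*/meminfo
print(512 * PAGE_KIB)         # 1048576 kB, matching Hugetlb in /proc/meminfo
```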
Installing OpenStack
A pypi mirror also needs to be configured; see the devstack-for-development post (by quqi99) for details.
git clone git://github.com/openstack-dev/devstack.git
cd devstack
cat << EOF | sudo tee local.conf
[[local|localrc]]
#OFFLINE=True
GIT_BASE=http://git.trystack.cn
NOVNC_REPO=http://git.trystack.cn/kanaka/noVNC.git
SPICE_REPO=http://git.trystack.cn/git/spice/spice-html5.git
DEST=/home/ubuntu/openstack
DATA_DIR=\$DEST/data
SERVICE_DIR=\$DEST/status
DOWNLOAD_DEFAULT_IMAGES=False
IMAGE_URLS="http://download.cirros-cloud.net/0.5.1/cirros-0.5.1-x86_64-disk.img"
LOGFILE=\$DATA_DIR/logs/stack.log
VERBOSE=True
disable_service n-net
enable_service neutron q-svc q-dhcp q-l3 q-meta q-agt
MYSQL_PASSWORD=password
DATABASE_PASSWORD=password
SERVICE_TOKEN=password
SERVICE_PASSWORD=password
ADMIN_PASSWORD=password
RABBIT_PASSWORD=password
[[post-config|$NOVA_CONF]]
[DEFAULT]
firewall_driver=nova.virt.firewall.NoopFirewallDriver
[filter_scheduler]
enabled_filters=RamFilter,ComputeFilter,AvailabilityZoneFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,PciPassthroughFilter,NUMATopologyFilter
EOF
FORCE=yes ./stack.sh
sudo systemctl enable mysql.service
sudo systemctl enable rabbitmq-server.service
sudo systemctl enable apache2.service
. openrc admin
Installing and Configuring OpenStack inside the VM
1, Install OpenStack inside the VM with devstack (omitted here). Remember to add AggregateInstanceExtraSpecsFilter and NUMATopologyFilter to scheduler_default_filters in /etc/nova/nova.conf and restart the nova-scheduler process.
2, Configure aggregates
nova aggregate-create hpgs-aggr
nova aggregate-set-metadata hpgs-aggr hpgs=true
nova aggregate-create normal-aggr
nova aggregate-set-metadata normal-aggr hpgs=false
#Add one or more hosts to them:
nova aggregate-add-host hpgs-aggr demo
hua@demo:~$ nova aggregate-show hpgs-aggr
+----+-----------+-------------------+--------+-------------+
| Id | Name | Availability Zone | Hosts | Metadata |
+----+-----------+-------------------+--------+-------------+
| 1 | hpgs-aggr | - | 'demo' | 'hpgs=true' |
+----+-----------+-------------------+--------+-------------+
3, Configure the flavor
nova flavor-create --ephemeral 0 --swap 0 --rxtx-factor 1.0 --is-public True m1.numa2nodes b8c065ce-90c2-41f9-8d50-1d47a040b494 256 1 2
nova flavor-key m1.numa2nodes set aggregate_instance_extra_specs:hpgs=true #use the aggregate
nova flavor-key m1.numa2nodes set hw:mem_page_size=2048 #enable huge page backing
#nova flavor-key m1.numa2nodes set hw:cpu_policy='dedicated' hw:cpu_thread_policy='isolate'
#nova flavor-key m1.numa2nodes set hw:emulator_threads_policy='isolate' hypervisor_type='QEMU'
nova flavor-key m1.numa2nodes set hw:numa_nodes=2 #set numa_nodes; without the next two lines placement is automatic, with them it is manual
nova flavor-key m1.numa2nodes set hw:numa_mem.0=128 hw:numa_mem.1=128 hw:numa_mempolicy=strict
nova flavor-key m1.numa2nodes set hw:numa_cpus.0=0 hw:numa_cpus.1=1 hw:cpu_policy=dedicated
hua@demo:/bak/openstack/nova$ nova flavor-show m1.numa2nodes |grep extra_specs
| extra_specs | {"hw:cpu_policy": "dedicated", "hw:mem_page_size": "2048", "hw:numa_mempolicy": "strict", "hw:numa_mem.1": "128", "hw:numa_mem.0": "128", "hw:numa_nodes": "2", "aggregate_instance_extra_specs:hpgs": "true", "hw:numa_cpus.0": "0", "hw:numa_cpus.1": "1"} |
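A minimal sketch (not Nova's actual implementation) of how these extra specs carve the flavor into guest NUMA cells: with hw:numa_nodes=2 and no explicit hw:numa_mem/hw:numa_cpus, Nova splits the 256 MB of RAM and the 2 vCPUs evenly, which here gives the same layout as the manual specs above:

```python
def build_cells(vcpus, ram_mb, numa_nodes):
    """Even split of vCPUs and RAM across guest NUMA cells (sketch only)."""
    cells = []
    for cell in range(numa_nodes):
        lo = cell * vcpus // numa_nodes
        hi = (cell + 1) * vcpus // numa_nodes
        cells.append({'id': cell,
                      'cpus': list(range(lo, hi)),
                      'memory_mb': ram_mb // numa_nodes})
    return cells

print(build_cells(2, 256, 2))
# [{'id': 0, 'cpus': [0], 'memory_mb': 128},
#  {'id': 1, 'cpus': [1], 'memory_mb': 128}]
```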
4, Create an instance
NET_ID=$(neutron net-list |grep 'private' |awk '{print $2}')
nova boot --image cirros-0.5.1-x86_64-disk --flavor m1.numa2nodes --nic net-id=$NET_ID i1
5, Huge page usage after creating the instance; nodes 0 and 1 have each consumed 171-107=64 huge pages
hua@demo:/bak/openstack/nova$ sudo cat /sys/devices/system/node/node*/meminfo | grep -i huge
Node 0 AnonHugePages: 0 kB
Node 0 HugePages_Total: 171
Node 0 HugePages_Free: 107
Node 0 HugePages_Surp: 0
Node 1 AnonHugePages: 0 kB
Node 1 HugePages_Total: 171
Node 1 HugePages_Free: 107
Node 1 HugePages_Surp: 0
Node 2 AnonHugePages: 0 kB
Node 2 HugePages_Total: 170
Node 2 HugePages_Free: 170
Node 2 HugePages_Surp: 0
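The consumption above matches the flavor exactly; plain arithmetic (2 MiB per page, not Nova code):

```python
PAGE_MB = 2                  # hw:mem_page_size=2048 (KiB) => 2 MiB pages
used_pages = 171 - 107       # HugePages_Total - HugePages_Free on nodes 0 and 1
print(used_pages)            # 64 pages per node
print(used_pages * PAGE_MB)  # 128 MB per node, matching hw:numa_mem.0 / .1
```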
6, Database state
mysql> select numa_topology from instance_extra;
{
"nova_object.version": "1.2",
"nova_object.changes": [
"cells"
],
"nova_object.name": "InstanceNUMATopology",
"nova_object.data": {
"cells": [
{
"nova_object.version": "1.3",
"nova_object.changes": [
"cpu_topology",
"pagesize",
"cpuset",
"cpu_policy",
"memory",
"cpu_pinning_raw",
"id",
"cpu_thread_policy"
],
"nova_object.name": "InstanceNUMACell",
"nova_object.data": {
"pagesize": 2048,
"cpu_topology": {
"nova_object.version": "1.0",
"nova_object.changes": [
"cores",
"threads",
"sockets"
],
"nova_object.name": "VirtCPUTopology",
"nova_object.data": {
"cores": 1,
"threads": 1,
"sockets": 1
},
"nova_object.namespace": "nova"
},
"cpuset": [
0
],
"cpu_policy": "dedicated",
"memory": 128,
"cpu_pinning_raw": {
"0": 0
},
"id": 0,
"cpu_thread_policy": null
},
"nova_object.namespace": "nova"
},
{
"nova_object.version": "1.3",
"nova_object.changes": [
"cpu_topology",
"pagesize",
"cpuset",
"cpu_policy",
"memory",
"cpu_pinning_raw",
"id",
"cpu_thread_policy"
],
"nova_object.name": "InstanceNUMACell",
"nova_object.data": {
"pagesize": 2048,
"cpu_topology": {
"nova_object.version": "1.0",
"nova_object.changes": [
"cores",
"threads",
"sockets"
],
"nova_object.name": "VirtCPUTopology",
"nova_object.data": {
"cores": 1,
"threads": 1,
"sockets": 1
},
"nova_object.namespace": "nova"
},
"cpuset": [
1
],
"cpu_policy": "dedicated",
"memory": 128,
"cpu_pinning_raw": {
"1": 4
},
"id": 1,
"cpu_thread_policy": null
},
"nova_object.namespace": "nova"
}
]
},
"nova_object.namespace": "nova"
}
Other useful commands:
export MYSQL_PASSWORD=ChangeMe123
juju run --application mysql "mysql -u root -p$MYSQL_PASSWORD -e \"select * from nova.instances where host='juju-38b529-ovn-6.cloud.sts'\G\""
juju run --application mysql "mysql -u root -p$MYSQL_PASSWORD -e \"select * from nova.instance_extra where instance_uuid='0761d1de-7acd-4781-ae8b-f5ba864ab6ec'\G\""
juju run --application mysql "mysql -u root -p$MYSQL_PASSWORD -e \"select * from nova.compute_nodes where hypervisor_hostname='p2-bits-cloud-xxx.maas' or hypervisor_hostname='p2-bits-cloud-xxx.maas'\G\""
juju run --application mysql "mysql -u root -p$MYSQL_PASSWORD -e \"select * from nova_api.request_specs where instance_uuid='0761d1de-7acd-4781-ae8b-f5ba864ab6ec'\G\""
Generated instance domain XML
<domain type='kvm' id='1'>
<name>instance-00000001</name>
<uuid>3bddd8f6-ee18-4b44-a9e3-86ba2c6c25cc</uuid>
<metadata>
<nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
<nova:package version="14.0.0"/>
<nova:name>i1</nova:name>
<nova:creationTime>2016-07-22 03:46:03</nova:creationTime>
<nova:flavor name="m1.numa2nodes">
<nova:memory>256</nova:memory>
<nova:disk>1</nova:disk>
<nova:swap>0</nova:swap>
<nova:ephemeral>0</nova:ephemeral>
<nova:vcpus>2</nova:vcpus>
</nova:flavor>
<nova:owner>
<nova:user uuid="e6001bf33a174ffb9d3ad3e9ff47d059">admin</nova:user>
<nova:project uuid="f5a578104510494da0ecae0fb514a6f1">demo</nova:project>
</nova:owner>
<nova:root type="image" uuid="d8184047-f7b3-4622-83e1-fb9d7ede807f"/>
</nova:instance>
</metadata>
<memory unit='KiB'>262144</memory>
<currentMemory unit='KiB'>262144</currentMemory>
<memoryBacking>
<hugepages>
<page size='2048' unit='KiB' nodeset='0'/>
<page size='2048' unit='KiB' nodeset='1'/>
</hugepages>
</memoryBacking>
<vcpu placement='static'>2</vcpu>
<cputune>
<shares>2048</shares>
<vcpupin vcpu='0' cpuset='0'/>
<vcpupin vcpu='1' cpuset='4'/>
<emulatorpin cpuset='0,4'/>
</cputune>
<numatune>
<memory mode='strict' nodeset='0-1'/>
<memnode cellid='0' mode='strict' nodeset='0'/>
<memnode cellid='1' mode='strict' nodeset='1'/>
</numatune>
<resource>
<partition>/machine</partition>
</resource>
<sysinfo type='smbios'>
<system>
<entry name='manufacturer'>OpenStack Foundation</entry>
<entry name='product'>OpenStack Nova</entry>
<entry name='version'>14.0.0</entry>
<entry name='serial'>4d63962c-1942-0164-e7a7-fa97578f4e3a</entry>
<entry name='uuid'>3bddd8f6-ee18-4b44-a9e3-86ba2c6c25cc</entry>
<entry name='family'>Virtual Machine</entry>
</system>
</sysinfo>
<os>
<type arch='x86_64' machine='pc-i440fx-wily'>hvm</type>
<kernel>/opt/stack/data/nova/instances/3bddd8f6-ee18-4b44-a9e3-86ba2c6c25cc/kernel</kernel>
<initrd>/opt/stack/data/nova/instances/3bddd8f6-ee18-4b44-a9e3-86ba2c6c25cc/ramdisk</initrd>
<cmdline>root=/dev/vda console=tty0 console=ttyS0</cmdline>
<boot dev='hd'/>
<smbios mode='sysinfo'/>
</os>
<features>
<acpi/>
<apic/>
</features>
<cpu>
<topology sockets='2' cores='1' threads='1'/>
<numa>
<cell id='0' cpus='0' memory='131072' unit='KiB' memAccess='shared'/>
<cell id='1' cpus='1' memory='131072' unit='KiB' memAccess='shared'/>
</numa>
</cpu>
<clock offset='utc'>
<timer name='pit' tickpolicy='delay'/>
<timer name='rtc' tickpolicy='catchup'/>
<timer name='hpet' present='no'/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<devices>
<emulator>/usr/bin/kvm-spice</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' cache='none'/>
<source file='/opt/stack/data/nova/instances/3bddd8f6-ee18-4b44-a9e3-86ba2c6c25cc/disk'/>
<backingStore type='file' index='1'>
<format type='raw'/>
<source file='/opt/stack/data/nova/instances/_base/5b06ec6b6abd700935b24a454e8ce3461d050a9f'/>
<backingStore/>
</backingStore>
<target dev='vda' bus='virtio'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>
<disk type='file' device='cdrom'>
<driver name='qemu' type='raw' cache='none'/>
<source file='/opt/stack/data/nova/instances/3bddd8f6-ee18-4b44-a9e3-86ba2c6c25cc/disk.config'/>
<backingStore/>
<target dev='hdd' bus='ide'/>
<readonly/>
<alias name='ide0-1-1'/>
<address type='drive' controller='0' bus='1' target='0' unit='1'/>
</disk>
<controller type='usb' index='0'>
<alias name='usb'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
</controller>
<controller type='pci' index='0' model='pci-root'>
<alias name='pci.0'/>
</controller>
<controller type='ide' index='0'>
<alias name='ide'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
</controller>
<interface type='bridge'>
<mac address='fa:16:3e:0b:52:0b'/>
<source bridge='qbref7fef26-3e'/>
<target dev='tapef7fef26-3e'/>
<model type='virtio'/>
<alias name='net0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</interface>
<serial type='file'>
<source path='/opt/stack/data/nova/instances/3bddd8f6-ee18-4b44-a9e3-86ba2c6c25cc/console.log'/>
<target port='0'/>
<alias name='serial0'/>
</serial>
<serial type='pty'>
<source path='/dev/pts/20'/>
<target port='1'/>
<alias name='serial1'/>
</serial>
<console type='file'>
<source path='/opt/stack/data/nova/instances/3bddd8f6-ee18-4b44-a9e3-86ba2c6c25cc/console.log'/>
<target type='serial' port='0'/>
<alias name='serial0'/>
</console>
<memballoon model='virtio'>
<stats period='10'/>
<alias name='balloon0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</memballoon>
</devices>
<seclabel type='dynamic' model='apparmor' relabel='yes'>
<label>libvirt-3bddd8f6-ee18-4b44-a9e3-86ba2c6c25cc</label>
<imagelabel>libvirt-3bddd8f6-ee18-4b44-a9e3-86ba2c6c25cc</imagelabel>
</seclabel>
</domain>
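The NUMA-related pieces of the XML above can be cross-checked mechanically. A sketch using Python's standard xml.etree on a trimmed fragment copied from the dump (element names and values are as shown above):

```python
import xml.etree.ElementTree as ET

# Trimmed fragment of the generated domain XML above.
xml = """
<domain>
  <memory unit='KiB'>262144</memory>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='4'/>
  </cputune>
  <cpu>
    <numa>
      <cell id='0' cpus='0' memory='131072'/>
      <cell id='1' cpus='1' memory='131072'/>
    </numa>
  </cpu>
</domain>
"""
root = ET.fromstring(xml)
# Guest cell memory must sum to the domain memory (256 MiB here).
cells = root.findall('./cpu/numa/cell')
total = sum(int(c.get('memory')) for c in cells)
print(total)  # 262144, equal to <memory>
# vcpu 0 -> host cpu 0 (host node 0), vcpu 1 -> host cpu 4 (host node 1),
# consistent with the strict <memnode> bindings to host nodes 0 and 1.
pins = {p.get('vcpu'): p.get('cpuset') for p in root.findall('./cputune/vcpupin')}
print(pins)   # {'0': '0', '1': '4'}
```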
Update 2020-12-08
1, create a VM with 3 numa nodes according to doc [2]
<cpu mode='host-passthrough'>
<numa>
<cell id='0' cpus='0-3' memory='4096000'/>
<cell id='1' cpus='4-5' memory='2048000'/>
<cell id='2' cpus='6-7' memory='2048000'/>
</numa>
</cpu>
2, enable huge pages, and use 'isolcpus=0,1,2,3' to isolate the CPUs of numa node 0
Note: defining isolcpus in grub does not prevent nova from using these CPUs; nova has its own vcpu_pin_set option for that.
$ cat /proc/cmdline |grep isolcpu
BOOT_IMAGE=/boot/vmlinuz-5.4.0-26-generic root=/dev/mapper/vgdemo-root ro transparent_hugepage=never hugepagesz=2M hugepages=512 default_hugepagesz=2M isolcpus=0,1,2,3
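vcpu_pin_set (and its successor cpu_dedicated_set) takes a CPU list spec such as '0,1,2,3' or '4-7'. A hypothetical parser (not Nova's code) showing how such a spec expands:

```python
def parse_cpu_set(spec):
    """Expand a CPU list spec like '0-3' or '4,6-7' into a sorted list."""
    cpus = set()
    for part in spec.split(','):
        if '-' in part:
            lo, hi = part.split('-')
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)

print(parse_cpu_set('0,1,2,3'))  # [0, 1, 2, 3] - the isolcpus list above
print(parse_cpu_set('4-7'))      # [4, 5, 6, 7] - e.g. vcpu_pin_set=4-7
```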
3, set up the openstack test env.
4, create a test aggregate, a flavor, and a test vm
nova aggregate-create hpgs-aggr
nova aggregate-set-metadata hpgs-aggr hpgs=true
nova aggregate-create normal-aggr
nova aggregate-set-metadata normal-aggr hpgs=false
nova aggregate-add-host hpgs-aggr demo
nova aggregate-show hpgs-aggr
nova flavor-create --ephemeral 0 --swap 0 --rxtx-factor 1.0 --is-public True m1.numa2nodes b8c065ce-90c2-41f9-8d50-1d47a040b494 256 1 2
nova flavor-key m1.numa2nodes set aggregate_instance_extra_specs:hpgs=true
nova flavor-key m1.numa2nodes set hw:mem_page_size=2048
nova flavor-key m1.numa2nodes set hw:cpu_policy='dedicated' hw:cpu_thread_policy='isolate'
nova flavor-key m1.numa2nodes set hw:emulator_threads_policy='isolate' hypervisor_type='QEMU'
nova boot --image cirros-0.5.1-x86_64-disk --flavor m1.numa2nodes --nic net-id=$NET_ID i1
5, The instance i1 uses cpus 0 and 1 for its vCPUs, and cpu 2 for the emulator threads (emulatorpin)
$ ps -eLo psr,args |grep qemu- |grep -v 'grep' |awk '{print $1}' |sort -n |uniq
0
1
2
$ sudo virsh emulatorpin instance-00000001
emulator: CPU Affinity
----------------------------------
*: 2
$ sudo virsh vcpuinfo instance-00000001
VCPU: 0
CPU: 0
State: running
CPU time: 21.2s
CPU Affinity: y-------
VCPU: 1
CPU: 1
State: running
CPU time: 14.9s
CPU Affinity: -y------
$ taskset -apc `pidof qemu-system-x86_64`
pid 15014's current affinity list: 2
pid 15016's current affinity list: 2
pid 15022's current affinity list: 2
pid 15023's current affinity list: 0
pid 15025's current affinity list: 1
pid 15039's current affinity list: 2
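With hw:cpu_policy='dedicated' plus hw:emulator_threads_policy='isolate', the vCPU threads and the emulator/IO threads must land on disjoint host CPUs; the affinities in the taskset output above satisfy that (values copied from the output, not gathered live):

```python
# Affinity lists from the taskset output above.
emulator_cpus = {2}   # pids 15014, 15016, 15022, 15039
vcpu_cpus = {0, 1}    # pids 15023 (vcpu 0) and 15025 (vcpu 1)
# isolate policy => no overlap between emulator and vCPU placement:
print(vcpu_cpus & emulator_cpus)  # set()
```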
sudo virsh dumpxml instance-00000001
...
<vcpu placement='static'>2</vcpu>
<cputune>
<shares>2048</shares>
<vcpupin vcpu='0' cpuset='0'/>
<vcpupin vcpu='1' cpuset='1'/>
<emulatorpin cpuset='2'/>
</cputune>
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>qemu64</model>
<topology sockets='2' cores='1' threads='1'/>
<feature policy='require' name='x2apic'/>
<feature policy='require' name='hypervisor'/>
<feature policy='require' name='lahf_lm'/>
<feature policy='disable' name='svm'/>
<numa>
<cell id='0' cpus='0-1' memory='262144' unit='KiB' memAccess='shared'/>
</numa>
</cpu>
6, host's numa_topology
select numa_topology from nova_cell1.compute_nodes where host='demo' \G;
see https://paste.ubuntu.com/p/MjmMHZxS6s/
7, instance's numa_topology
select numa_topology from instance_extra where instance_uuid in (select uuid from instances where host='demo') \G;
see https://paste.ubuntu.com/p/CP2t2ghCfH/
Update 2022-05-19
See also: how introducing nova placement affects scheduling (by quqi99), and OpenStack's support for NUMA (by quqi99).
References
[1] http://docs.openstack.org/developer/devstack/guides/devstack-with-nested-kvm.html
[2] Testing NUMA related hardware setup with libvirt — nova 25.1.0.dev110 documentation