Why does a nova compute node report negative free disk space?

Note: this article is based on the Kilo release.

While running OpenStack I hit a case where a compute node reported negative available disk space. Let's walk through the code to see why.

In the nova-compute service that runs on each compute node, a periodic task named update_available_resource is responsible for collecting and reporting resource usage:
    @periodic_task.periodic_task
    def update_available_resource(self, context):
        """See driver.get_available_resource()

        Periodic process that keeps that the compute host's understanding of
        resource availability and usage in sync with the underlying hypervisor.

        :param context: security context
        """


Inside this function, the available resources are obtained through the ResourceTracker interface:
            rt = self._get_resource_tracker(nodename)
            rt.update_available_resource(context)

The ResourceTracker in turn calls the libvirt driver to collect the actual resource statistics:
    def update_available_resource(self, context):
        """Override in-memory calculations of compute node resource usage based
        on data audited from the hypervisor layer.

        Add in resource claims in progress to account for operations that have
        declared a need for resources, but not necessarily retrieved them from
        the hypervisor layer yet.
        """
        LOG.info(_LI("Auditing locally available compute resources for "
                     "node %(node)s"),
                 {'node': self.nodename})
        resources = self.driver.get_available_resource(self.nodename)

The function that gathers the resource statistics is defined in virt/libvirt/driver.py:
    def get_available_resource(self, nodename):
        """Retrieve resource information.

        This method is called when nova-compute launches, and
        as part of a periodic task that records the results in the DB.

        :param nodename: will be put in PCI device
        :returns: dictionary containing resource info
        """

        disk_info_dict = self._get_local_gb_info()
        data = {}

        # NOTE(dprince): calling capabilities before getVersion works around
        # an initialization issue with some versions of Libvirt (1.0.5.5).
        # See: https://bugzilla.redhat.com/show_bug.cgi?id=1000116
        # See: https://bugs.launchpad.net/nova/+bug/1215593

        # Temporary convert supported_instances into a string, while keeping
        # the RPC version as JSON. Can be changed when RPC broadcast is removed
        data["supported_instances"] = jsonutils.dumps(
            self._get_instance_capabilities())

        data["vcpus"] = self._get_vcpu_total()
        data["memory_mb"] = self._get_memory_mb_total()
        data["local_gb"] = disk_info_dict['total']
        data["vcpus_used"] = self._get_vcpu_used()
        data["memory_mb_used"] = self._get_memory_mb_used()
        data["local_gb_used"] = disk_info_dict['used']
        data["hypervisor_type"] = self._host.get_driver_type()
        data["hypervisor_version"] = self._host.get_version()
        data["hypervisor_hostname"] = self._host.get_hostname()
        # TODO(berrange): why do we bother converting the
        # libvirt capabilities XML into a special JSON format ?
        # The data format is different across all the drivers
        # so we could just return the raw capabilities XML
        # which 'compare_cpu' could use directly
        #
        # That said, arch_filter.py now seems to rely on
        # the libvirt drivers format which suggests this
        # data format needs to be standardized across drivers
        data["cpu_info"] = jsonutils.dumps(self._get_cpu_info())

        disk_free_gb = disk_info_dict['free']
        disk_over_committed = self._get_disk_over_committed_size_total()
        available_least = disk_free_gb * units.Gi - disk_over_committed
        data['disk_available_least'] = available_least / units.Gi

        data['pci_passthrough_devices'] = \
            self._get_pci_passthrough_devices()

        numa_topology = self._get_host_numa_topology()
        if numa_topology:
            data['numa_topology'] = numa_topology._to_json()
        else:
            data['numa_topology'] = None

        return data

Let's look at the disk-related part. First, the libvirt driver calls the following static method to obtain the total/free/used values, in gigabytes:
    @staticmethod
    def _get_local_gb_info():
        """Get local storage info of the compute node in GB.

        :returns: A dict containing:
             :total: How big the overall usable filesystem is (in gigabytes)
             :free: How much space is free (in gigabytes)
             :used: How much space is used (in gigabytes)
        """

        if CONF.libvirt.images_type == 'lvm':
            info = libvirt_utils.get_volume_group_info(CONF.libvirt.images_volume_group)
        else:
            info = libvirt_utils.get_fs_info(CONF.instances_path)

        for (k, v) in info.iteritems():
            info[k] = v / units.Gi  # NOTE: the values are converted to GB here!

        return info
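
One small detail about that conversion: units.Gi is 1024**3, and under Python 2 dividing two integers floors the result, so the dict ends up holding whole gigabytes. A quick illustration with an assumed byte count (not nova code):

    Gi = 1024 ** 3                  # same value as units.Gi
    free_bytes = 209043456000       # assumed value, roughly 194.7 GB
    print(free_bytes / Gi)          # 194 under Python 2: the fraction is dropped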

As _get_local_gb_info shows, if instances are stored on a filesystem rather than on LVM, the following function collects the numbers:
def get_fs_info(path):
    """Get free/used/total space info for a filesystem

    :param path: Any dirent on the filesystem
    :returns: A dict containing:

             :free: How much space is free (in bytes)
             :used: How much space is used (in bytes)
             :total: How big the filesystem is (in bytes)
    """
    hddinfo = os.statvfs(path)
    total = hddinfo.f_frsize * hddinfo.f_blocks
    free = hddinfo.f_frsize * hddinfo.f_bavail
    used = hddinfo.f_frsize * (hddinfo.f_blocks - hddinfo.f_bfree)
    return {'total': total,
            'free': free,
            'used': used}
The information returned by get_fs_info is essentially what the df command shows:
[root@host123 ~]# python
Python 2.7.5 (default, Feb 11 2014, 07:46:25)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-13)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> hddinfo = os.statvfs("/var/lib/nova")
>>> total = hddinfo.f_frsize * hddinfo.f_blocks
>>> free = hddinfo.f_frsize * hddinfo.f_bavail
>>> used = hddinfo.f_frsize * (hddinfo.f_blocks - hddinfo.f_bfree)
>>>
>>> print total/1024/1024/1024
254
>>> print free/1024/1024/1024
194
>>> print used/1024/1024/1024
46

[root@host123 ~]# df -h
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/vg_sys-lv_root    20G  3.6G   16G  20% /
devtmpfs                      11G     0   11G   0% /dev
tmpfs                         12G     0   12G   0% /dev/shm
tmpfs                         12G   83M   12G   1% /run
tmpfs                         12G     0   12G   0% /sys/fs/cgroup
/dev/sda1                    380M   96M  260M  27% /boot
/dev/mapper/vg_nova-lv_nova   255G   47G  195G   20% /var/lib/nova
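
Incidentally, this also explains why Used plus Avail does not add up to Size in the output above: df's "Avail" maps to f_bavail (space available to non-root users), while "Used" is f_blocks - f_bfree, so the filesystem's root-reserved blocks are counted in neither. A small illustration (not nova code; the path is the one used above):

    import os
    st = os.statvfs('/var/lib/nova')
    reserved = st.f_frsize * (st.f_bfree - st.f_bavail)   # blocks reserved for root
    print(reserved / 1024 / 1024 / 1024)                   # the "missing" gigabytes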


get_available_resource uses the total and used values as they are, but notice that free is not reported directly; instead it is turned into disk_available_least:
        disk_free_gb = disk_info_dict['free']
        disk_over_committed = self._get_disk_over_committed_size_total()
        available_least = disk_free_gb * units.Gi - disk_over_committed
        data['disk_available_least'] = available_least / units.Gi

As you can see, disk_over_committed is subtracted from the disk_free_gb that the operating system reports (the former is in bytes, the latter in GB, hence the conversion via units.Gi).
Let's look at how _get_disk_over_committed_size_total, also a member of the libvirt driver, computes that value:

    def _get_disk_over_committed_size_total(self):
        """Return total over committed disk size for all instances."""
        # Disk size that all instance uses : virtual_size - disk_size
        disk_over_committed_size = 0
        for dom in self._host.list_instance_domains():
            try:
                xml = dom.XMLDesc(0)
                disk_infos = jsonutils.loads(
                        self._get_instance_disk_info(dom.name(), xml))
                for info in disk_infos:
                    disk_over_committed_size += int(
                        info['over_committed_disk_size'])
            except ...  # (exception handling omitted here)
            # NOTE(gtt116): give other tasks a chance.
            greenthread.sleep(0)
        return disk_over_committed_size
It walks through every instance, reads its over_committed_disk_size, and adds them all up.
So some instances are apparently already over-committing disk; where does that over-commit come from?

For each instance, over_committed_disk_size is obtained by the following function:
    def _get_instance_disk_info(self, instance_name, xml,
                                block_device_info=None):
        block_device_mapping = driver.block_device_info_get_mapping(
            block_device_info)

        volume_devices = set()
        for vol in block_device_mapping:
            disk_dev = vol['mount_device'].rpartition("/")[2]
            volume_devices.add(disk_dev)

        disk_info = []
        doc = etree.fromstring(xml)
        disk_nodes = doc.findall('.//devices/disk')
        path_nodes = doc.findall('.//devices/disk/source')
        driver_nodes = doc.findall('.//devices/disk/driver')
        target_nodes = doc.findall('.//devices/disk/target')

        for cnt, path_node in enumerate(path_nodes):
            disk_type = disk_nodes[cnt].get('type')
            path = path_node.get('file') or path_node.get('dev')
            target = target_nodes[cnt].attrib['dev']

            if not path:
                LOG.debug('skipping disk for %s as it does not have a path',
                          instance_name)
                continue

            if disk_type not in ['file', 'block']:
                LOG.debug('skipping disk because it looks like a volume', path)
                continue

            if target in volume_devices:
                LOG.debug('skipping disk %(path)s (%(target)s) as it is a '
                          'volume', {'path': path, 'target': target})
                continue

            # get the real disk size or
            # raise a localized error if image is unavailable
            if disk_type == 'file':
                dk_size = int(os.path.getsize(path))
            elif disk_type == 'block':
                dk_size = lvm.get_volume_size(path)

            disk_type = driver_nodes[cnt].get('type')
            if disk_type == "qcow2":
                backing_file = libvirt_utils.get_disk_backing_file(path)
                virt_size = disk.get_disk_size(path)
                over_commit_size = int(virt_size) - dk_size
            else:
                backing_file = ""
                virt_size = dk_size
                over_commit_size = 0

            disk_info.append({'type': disk_type,
                              'path': path,
                              'virt_disk_size': virt_size,
                              'backing_file': backing_file,
                              'disk_size': dk_size,
                              'over_committed_disk_size': over_commit_size})
        return jsonutils.dumps(disk_info)

As an example, for a qcow2 image the over-commit size is virt_size minus dk_size:

[root@host123 ~]# ll -h /var/lib/nova/instances/109291c0-0bf0-412c-9e87-6ab01e16bc06/disk
-rw-r--r-- 1 root root  5.0G Feb 25 11:41 /var/lib/nova/instances/109291c0-0bf0-412c-9e87-6ab01e16bc06/disk

The actual size of the image file, dk_size, is 5.0G. Now look at the qcow2 details with qemu-img:

[root@host123 ~]#  qemu-img info /var/lib/nova/instances/109291c0-0bf0-412c-9e87-6ab01e16bc06/disk
image:  /var/lib/nova/instances/109291c0-0bf0-412c-9e87-6ab01e16bc06/disk
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 4.9G
cluster_size: 65536
backing file: /var/lib/nova/instances/_base/afd631de55a9b7026775a4a1ada098a9ae6888c7
Format specific information:
    compat: 0.10

The virtual size here (20G) minus the image's disk size is the over_commit_size, roughly 15G in this case.

Note that only qcow2 images receive this over-commit treatment; for every other disk type over_commit_size is 0.
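
If you want to reproduce that number by hand, a rough sketch like the following will do. This is a hypothetical helper, not part of nova; it assumes a qemu-img new enough to support --output=json:

    import json
    import os
    import subprocess

    def qcow2_over_commit(path):
        """Virtual size minus current file size of a qcow2 image, in bytes."""
        out = subprocess.check_output(['qemu-img', 'info', '--output=json', path])
        return json.loads(out)['virtual-size'] - os.path.getsize(path)

    # For the image above this is roughly 20G - 5.0G, i.e. about 15G.
    # qcow2_over_commit('/var/lib/nova/instances/109291c0-.../disk')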

We know that nova's scheduler uses disk_allocation_ratio in its DiskFilter to over-subscribe disk. That is a different concept from the over-commit here: disk_allocation_ratio is over-subscription as seen from the control node, which the compute node knows nothing about, whereas the over-commit discussed here is the compute node accounting for the thin provisioning of qcow2 images. The free space it finally reports has already subtracted the space every qcow2 image would occupy if fully grown, which is why the reported remaining space can be smaller than what you see with your own eyes.

If an administrator specifies the compute node explicitly at deployment time, the scheduling flow is bypassed and the virtual machine is forced onto that node, eating into space that the over-commit accounting had already set aside. The node may then end up reporting negative disk space, and as the instances' disks keep growing over time it may eventually run short of disk for real.
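
Putting hypothetical numbers on it: with the 194 GB of free space seen above and, say, 15 instances whose qcow2 disks could each still grow by about 15 GB, the reported value is already negative (these figures are assumptions, not taken from a real host):

    Gi = 1024 ** 3
    disk_free_gb = 194                    # free space reported by the OS, in GB
    disk_over_committed = 15 * 15 * Gi    # 15 instances x ~15 GB of potential growth
    disk_available_least = (disk_free_gb * Gi - disk_over_committed) / Gi
    print(disk_available_least)           # -31: this is what gets reported upward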



