Transforming Cumulative Ceilometer Stats to Gauges



I’ve been dabbling a bit more with OpenStack as of late. If you know me, you can likely guess my goal: figuring out how to ingest logs, monitor resources, etc. I’ve been trying to see how well Ceilometer, one of the core components of OpenStack that actually provides some of this functionality, would work. Initially, I was a bit bummed, but after fumbling around for a while, I am starting to see the light.


You see, the reason I almost abandoned the idea of using Ceilometer was that some of the “meters” it provides are, well, a bit nonsensical (IMHO).

For example, there’s network.outgoing.bytes, which is what you would expect… sort of. Turns out, this is a “cumulative” meter. In other words, this meter tells me the total number of bytes sent out a given Instance’s virtual NIC. Ceilometer has the following meter types:

  • Cumulative: Increasing over time (e.g.: instance hours)
  • Gauge: Discrete items (e.g.: floating IPs, image uploads) and fluctuating values (e.g.: disk I/O)
  • Delta: Changing over time (bandwidth)
 
Maybe I am naive, but it seems quite a bit more helpful to track this as a value over a given period… you know, so I can get a hint of how much bandwidth a given instance is using. In Ceilometer parlance, this would be a delta metric.
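To make the distinction concrete, here is a tiny illustration (made-up numbers, plain Python) of the same NIC viewed as a cumulative meter versus the per-interval deltas I actually want:

# Made-up byte counts for one virtual NIC, polled every 10 seconds.
cumulative = [150000, 162818, 170000]  # cumulative meter: an ever-growing total
deltas = [b - a for a, b in zip(cumulative, cumulative[1:])]
print(deltas)  # [12818, 7182] <- traffic per interval, which hints at bandwidth
# A gauge, for comparison, is just a point-in-time reading (e.g. # of floating IPs).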

I’ll take a quick aside and defend the fine folks working on Ceilometer on this one. Ceilometer was built initially to generate non-repudiable billing info. Technically, AFAIK, that is the project’s only goal – though it has morphed to gain things like an alarm framework.

So, now you can see why network.outgoing.bytes would be cumulative: so you can bill a customer for serving up torrents, etc.

Anyway, I can’t imagine that I’m the only person looking to get delta metrics out of cumulative ones, so I thought I’d document my way of getting there. Ultimately there might be a better way, YMMV, caveat emptor, covering my backside, yadda yadda.

Transformers to the Rescue!

… no, not that kind of Transformer. Lemme ‘splain. No, there is too much. Let me sum up.

Actually, before we start diving in, let’s take a quick tour of Ceilometer’s workflow.

The general steps to the Ceilometer workflow are:

Collect -> (Optionally) Transform -> Publish -> Store -> Read


Collect

There are two methods of collection:

  1. Services (Nova, Neutron, Cinder, Glance, Swift) push data into AMQP and Ceilometer slurps them down
  2. Agents/Notification Handlers poll APIs of the services

This is where our meter data comes from.

Transform/Publish

This is the focus of this post. Transforms are done via the “Pipeline.”

The flow for the Pipeline is:

Meter -> Transformer -> Publisher -> Receiver

  • Meter: Data/Event being collected
  • Transformer: (Optional) Take meters and output new data based on those meters
  • Publisher: How you want to push the data out of Ceilometer
    • To my knowledge, there are only two options: rpc:// (the default) and udp:// (which packs each sample with msgpack)
  • Receiver: This is the system outside Ceilometer that will receive what the Publisher sends (Logstash for me, at least at the present – will likely move to StatsD + Graphite later on)

Store

While tangential to this post, I won’t leave you wondering about the “Store” part of the pipeline. Here are the storage options:

  • Default: Embedded MongoDB
  • Optional:
    • SQL
    • HBase
    • DB2

Honorable Mention: Ceilometer APIs

Like pretty much everything else in OpenStack, Ceilometer has a suite of open APIs that can also be used to fetch metering data. I initially considered this route, but in the interest of efficiency (read: laziness), I opted to use the Publisher vs rolling my own code to call the APIs.
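(For the curious, the road not taken would look roughly like this. This is an untested sketch: the endpoint address and token are placeholders for your environment, though the /v2/meters/<name> resource and the counter_volume field are part of the v2 API as I understand it.)

import requests

# Placeholders: swap in your Ceilometer API endpoint and a Keystone token.
CEILOMETER = "http://192.168.255.147:8777"
TOKEN = "<keystone-token>"

resp = requests.get(
    CEILOMETER + "/v2/meters/network.outgoing.bytes",
    headers={"X-Auth-Token": TOKEN},
)
resp.raise_for_status()

# Each entry is one sample of the (cumulative) meter.
for sample in resp.json():
    print(sample["timestamp"], sample["counter_volume"])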

Working the Pipeline

There are two Transformers (at least that I see in the source):

  • Scaling
  • Rate of Change

In our case, we are interested in the latter, as it will give us the delta between two samples.
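Before we touch any config, here is the gist of the computation, at least as I understand it: take two samples of a cumulative meter, divide the change in value by the change in time, and optionally scale the result. A minimal sketch, not Ceilometer’s actual code:

def rate_of_change(prev, curr, scale=1.0):
    """prev and curr are (timestamp_seconds, cumulative_value) pairs."""
    (t0, v0), (t1, v1) = prev, curr
    return ((v1 - v0) / (t1 - t0)) * scale

# Two made-up network.outgoing.bytes samples taken 10 seconds apart:
print(rate_of_change((0, 150000), (10, 162818)))  # 1281.8 (bytes/second)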

To change/use a given Transformer, we need to create a new pipeline via /etc/ceilometer/pipeline.yaml.

Here is the default pipeline.yaml:

---
 -
     name: meter_pipeline
     interval: 600
     meters:
         - "*"
     transformers:
     publishers:
         - rpc://
 -
     name: cpu_pipeline
     interval: 600
     meters:
         - "cpu"
     transformers:
         - name: "rate_of_change"
           parameters:
               target:
                   name: "cpu_util"
                   unit: "%"
                   type: "gauge"
                   scale: "100.0 / (10**9 * (resource_metadata.cpu_number or 1))"
     publishers:
         - rpc://

The “cpu_pipeline” pipeline gives us a good example of what we will need:

  • A name for the pipeline
  • The interval (in seconds) that we want the pipeline triggered
  • Which meters we are interested in (“*” is a wildcard for everything, but you can also have an explicit list for when you want the same transformer to act on multiple meters)
  • The name of the transformation we want to use (scaling|rate_of_change)
  • Some parameters to do our transformation:
    • Name: Optionally used if you want to override the metric’s original name
    • Unit: Like Name, can be used to override the original unit (useful for things like converting network.*.bytes from B(ytes) to MB or GB)
    • Type: If you want to override the default type (remember they are: cumulative|gauge|delta)
    • Scale: A snippet of Python for when you want to scale the result in some way (would typically be used along with Unit)
      • Side note: This one seems to be required, as when I omitted it, I got the value of the cumulative metric. Please feel free to comment if I goobered something up there.

Looking at all of this, we can see that the cpu_pipeline, er, pipeline:

  1. Multiplies the number of vCPUs in the instance (resource_metadata.cpu_number) times 1 billion (10^9, or 10**9 in Python)
    • Note the “or 1”, which is a catch for when resource_metadata.cpu_number doesn’t exist
  2. Divides 100 by the result

The end result is a value that tells us how taxed the Instance is from a CPU standpoint, expressed as a percentage.
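Here is a rough worked example of that math with made-up numbers (the cpu meter is cumulative CPU time in nanoseconds, which is why the 10**9 shows up):

cpu_number = 2                     # vCPUs, i.e. resource_metadata.cpu_number
prev_ns, curr_ns = 4.0e12, 4.6e12  # two cumulative cpu-time samples, in ns
interval = 600                     # seconds between the samples

rate = (curr_ns - prev_ns) / interval        # 1e9 ns of CPU time per second
cpu_util = rate * (100.0 / (10**9 * cpu_number))
print(cpu_util)                              # 50.0, i.e. 50% across 2 vCPUs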

Bringing it all Home

Armed with this knowledge, here is what I came up with to get a delta metric out of the network.*.bytes metrics:

 -
     name: network_stats
     interval: 10
     meters:
         - "network.incoming.bytes"
         - "network.outgoing.bytes"
     transformers:
         - name: "rate_of_change"
           parameters:
               target:
                   type: "gauge"
                   scale: "1"
     publishers:
         - udp://192.168.255.149:31337

In this case, I’m taking the network.incoming.bytes and network.outgoing.bytes metrics and passing them through the “Rate of Change” transformer to spit a gauge out of what was previously a cumulative metric.

I could have taken (and likely will take) it a step further and used the scale parameter to change it from bytes to KB. For now, I am playing with OpenStack in a VM on my laptop, so the amount of traffic is small. After all, the difference between 1.1 and 1.4 in a histogram panel in Kibana isn’t very interesting looking :)
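(If/when I do, my understanding is it is just a tweak to the target stanza – unit: "KB" plus a scale snippet like "1.0 / 1024" – which would work out to:)

# Assumed B -> KB conversion applied to the sample volume shown below.
rate_bytes = 1281.7777777777778
print(rate_bytes * (1.0 / 1024))  # ~1.25, i.e. KB/s instead of B/s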

Oh, I forgot… the Publisher. Remember how I said the UDP Publisher uses msgpack to stuff its data in? It just so happens that Logstash has both a UDP input and a msgpack codec. As a result, my Receiver is Logstash – at least for now. Again, it would make a lot more sense to ship this through StatsD and use Graphite to visualize the data. But, even then, I can still use Logstash’s StatsD output for that. Decisions, decisions :)
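As an aside, if you want to eyeball what the UDP Publisher is emitting before wiring up Logstash, a few lines of Python will do. This sketch assumes the msgpack package is installed and that each datagram is a single msgpack-encoded sample (which matches what I am seeing):

import socket

import msgpack

# Listen on the same port the pipeline publishes to (udp://...:31337).
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 31337))

while True:
    data, addr = sock.recvfrom(65535)
    sample = msgpack.unpackb(data, raw=False)  # one sample per datagram
    print(sample.get("name"), sample.get("volume"), "from", addr[0])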

Since the data is in Logstash, that means I can use Kibana to make pretty charts with the data.

Here are the bits I added to my Logstash config to make this happen:

input {
  udp {
    port => 31337
    codec => msgpack
    type => ceilometer
  }
}

At that point, I get lovely input in Elasticsearch like the following:

{
  "_index": "logstash-2014.01.16",
  "_type": "logs",
  "_id": "CDPI8-ADSDCoPiqY9YqlEw",
  "_score": null,
  "_source": {
    "user_id": "21c98bfa03f14d56bb7a3a44285acf12",
    "name": "network.incoming.bytes",
    "resource_id": "instance-00000009-41a3ff24-f47e-4e29-86ce-4f50a61f78bd-tap30829cd9-5e",
    "timestamp": "2014-01-16T21:54:56Z",
    "resource_metadata": {
      "name": "tap30829cd9-5e",
      "parameters": {},
      "fref": null,
      "instance_id": "41a3ff24-f47e-4e29-86ce-4f50a61f78bd",
      "instance_type": "bfeabe24-08dc-4ea9-9321-1f7cf74b858b",
      "mac": "fa:16:3e:95:84:b8"
    },
    "volume": 1281.7777777777778,
    "source": "openstack",
    "project_id": "81ad9bf97d5f47da9d85569a50bdf4c2",
    "type": "gauge",
    "id": "d66f268c-7ef8-11e3-98cb-000c29785579",
    "unit": "B",
    "@timestamp": "2014-01-16T21:54:56.573Z",
    "@version": "1",
    "tags": [],
    "host": "192.168.255.147"
  },
 "sort": [
    1389909296573
  ]
}

Finally, I can point a Histogram panel in Kibana at the “volume” field for these documents and graph the result, like so:

[Screenshot: Kibana histogram panel graphing the volume field, 2014-01-16 at 4:30 PM]

OK, so maybe not as pretty as I sold it to be, but that’s the data’s fault – not the toolchain :) It will look much more interesting once I mirror this on a live system.

Hopefully someone out there on the Intertubes will find this useful and let me know if there’s a better way to get at this!



Original post: https://cjchand.wordpress.com/2014/01/16/transforming-cumulative-ceilometer-stats-to-gauges/