For our first experiments, we installed Prometheus on an Ubuntu box via apt so we could query the metrics exposed by that installation without needing a full-fledged Kubernetes cluster.
This was a good way to interact with a Prometheus deployment and allowed for fast iterations. However, since we were mostly used to the metrics exposed by Kubernetes API servers, we quickly needed tooling to browse what’s available in a given Prometheus deployment and to create a baseline cube definition from it.
The Platon CLI comes with commands to ease the discovery of data in your Prometheus deployment, so there is no need to leave the CLI to talk to the Prometheus API yourself.
Metrics Discovery
The first step is figuring out which metrics are even available in your Prometheus deployment. By default, Platon looks for metrics that have been available within the past hour, which seems to be a good timeframe for finding metrics that are updated regularly.
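If you want to cross-check what Platon reports, Prometheus itself exposes the list of known metric names through its HTTP API. This is just a convenient way to verify the discovery results (not necessarily how Platon implements them); jq is assumed to be installed for pretty-printing:
$ curl -s 'http://localhost:9090/api/v1/label/__name__/values' | jq '.data[:5]'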
$ platon list metrics -p http://localhost:9090
All Metrics:
+---------------------------+------------------------------+
| METRIC | DIMENSIONS |
+---------------------------+------------------------------+
| node_arp_entries | job, device, instance |
| node_boot_time_seconds | instance, job |
| node_cpu_seconds_total | job, mode, cpu, instance |
| node_memory_Active_bytes | instance, job |
| node_memory_MemFree_bytes | instance, job |
+---------------------------+------------------------------+
listing 648 metrics out of 648 found in Prometheus instance.
Note that this is only an excerpt of the available metrics (648 in this Prometheus instance), shown here to illustrate the idea.
What’s listed as Dimensions are all the labels exposed by the given metric. In the column-store format, each of these dimensions is represented by its own column. When new labels are discovered while Platon is running and synchronizing data, new columns are added to the table accordingly.
Listing the dimensions is important for figuring out which metrics have common dimensions that can be used to relate data in the same cube. The Time dimension is always shared across metrics (since they are time series), but the additional dimensions allow drilling down further into related data structures.
In the example above, you can see that the instance and job dimensions are shared across all the node_ metrics, so the data can be stored in the same cube with shared information in these dimensions:
Time | instance  | job | MemFree_bytes | cpu_seconds_total
1    | localhost | j1  | 8000000       | 60
2    | localhost | j1  | 7800000       | 70
3    | localhost | j1  | 7700000       | 80
While this is a very simplified example, it should give you an idea of how shared dimensions make it easy to create graphs and do data analysis across related metrics.
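You can also try such a join directly in PromQL before any cube exists. As a small sketch, assuming node_memory_MemTotal_bytes is also exported (it is part of the node exporter’s standard memory collector), this computes the fraction of free memory by matching series on the shared instance and job labels:
$ curl -sG 'http://localhost:9090/api/v1/query' \
    --data-urlencode 'query=node_memory_MemFree_bytes / on(instance, job) node_memory_MemTotal_bytes'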
Discovering Similar Metrics
Our simple installation of Prometheus already exposes more than 600 metrics; a default installation of OpenShift exposes thousands. Knowing all of these metrics well enough to make connections like the one in the simplified case above is virtually impossible. Platon helps you discover metrics related to the one you’re interested in by letting you filter metrics by the dimensions they expose:
$ platon list metrics -p http://localhost:9090 -d job -d instance
If you’re interested in figuring out which dimensions are available across the whole instance, or across a set of metrics, you can use the list dimensions command:
$ platon list dimensions -p http://localhost:9090 -m node_cpu_seconds_total -m node_memory_MemFree_bytes
All Dimensions:
cpu
instance
job
mode
4 dimensions found in Prometheus instance.
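As a cross-check, recent Prometheus versions can answer the same question directly via the labels endpoint, which accepts one match[] series selector per metric:
$ curl -sG 'http://localhost:9090/api/v1/labels' \
    --data-urlencode 'match[]=node_cpu_seconds_total' \
    --data-urlencode 'match[]=node_memory_MemFree_bytes'
Note that the result also includes the synthetic __name__ label, which Prometheus attaches to every series.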
Joining Time
To ensure the Time dimension can be joined between metrics, Platon queries all metrics using the same timeframe and interval, so Prometheus returns values for exactly the same timestamps.
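Prometheus’s range-query API makes this alignment visible: given the same start, end, and step parameters, every query is evaluated at exactly the same timestamps. A minimal sketch with illustrative Unix timestamps:
$ curl -sG 'http://localhost:9090/api/v1/query_range' \
    --data-urlencode 'query=node_memory_MemFree_bytes' \
    --data-urlencode 'start=1700000000' \
    --data-urlencode 'end=1700000300' \
    --data-urlencode 'step=60s'
Running a second query with identical start, end, and step yields values at the same timestamps, so the rows can be joined on the Time dimension.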
Creating a First Cube
So far, all the relationships between the data exist only in your head and are not yet manifested in a cube definition. The cube definition is what tells Platon which queries to run and how to relate the results so they can be stored in the column-store database.
To generate a baseline cube, you can use the generate cube command:
$ platon generate cube -p http://localhost:9090 -m node_cpu_seconds_total -m node_memory_MemFree_bytes
cubes:
- name: MyCube
  description: My Cube
  ttl: 5m0s
  scrape-interval: 1m0s
  queries:
  - name: node_cpu_seconds_total
    promql: node_cpu_seconds_total
    value: node_cpu_seconds_total
  - name: node_memory_MemFree_bytes
    promql: node_memory_MemFree_bytes
    value: node_memory_MemFree_bytes
  joined-labels:
  - instance
  - job
This is the input that Platon will later use to query Prometheus for the defined metrics.
Pay attention to the joined labels: just because two labels have the same name doesn’t mean they carry the same meaning and content. For example, the Kubernetes API server exposes instance in a different form than the node_ metrics shown above. The generated cube definition will therefore need some manual adjustment by the data modeler to ensure the resulting data cube contains the correct data and that seemingly related dimensions are actually related.
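As a sketch of what such an adjustment could look like, PromQL’s label_replace function can rewrite a label into a comparable form before the data is joined. The exact expression depends on your deployment; here we assume an instance label of the form host:port and strip the port:
$ curl -sG 'http://localhost:9090/api/v1/query' \
    --data-urlencode 'query=label_replace(node_memory_MemFree_bytes, "instance", "$1", "instance", "([^:]+):[0-9]+")'
Once the expression produces instance values that line up across metrics, it can replace the bare metric name in the promql field of the cube definition.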