Time-series data stored by Prometheus is in fact already multi-dimensional. You can add dimensions to your data by adding labels to the metrics you expose. Take the Kubernetes API server metric apiserver_request_total as an example. It exposes the count of requests that the API server received and enriches each data point with the following labels:
- code
- component
- dry_run
- group
- resource
- scope
- subresource
- verb
- version
Each of these labels can be seen as a dimension of the data point (the Prometheus documentation even calls them “dimensions”) that can be used to group the data when querying Prometheus later.
For example, if you were interested in the count of requests by verb, you could use the following PromQL query:
sum by(verb) (apiserver_request_total)
The query would return the number of requests sent for each of the verbs, aggregating over the other labels by sum. Executing the query at different timestamps will give you a trend over time.
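To make the semantics of that aggregation concrete, here is a minimal Python sketch of what `sum by(verb)` computes at a single timestamp. The sample data points and their values are made up for illustration; Prometheus of course evaluates this server-side.

```python
from collections import defaultdict

# Made-up sample of apiserver_request_total data points at one timestamp.
# Each point carries its label set plus the counter value.
samples = [
    ({"verb": "GET", "code": "200", "resource": "pods"}, 120.0),
    ({"verb": "GET", "code": "404", "resource": "pods"}, 3.0),
    ({"verb": "LIST", "code": "200", "resource": "nodes"}, 42.0),
]

def sum_by(label, points):
    """Sum counter values per value of one label, aggregating over all others."""
    totals = defaultdict(float)
    for labels, value in points:
        totals[labels[label]] += value
    return dict(totals)

print(sum_by("verb", samples))  # {'GET': 123.0, 'LIST': 42.0}
```

Grouping by a different label is just a matter of swapping the argument, e.g. `sum_by("code", samples)`.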
Time Series to Columns
While the time-series database is well suited to and flexible for collecting monitoring data from services, analytics use cases more complex than the one above quickly result in PromQL queries that are hard to write and hard to read.
There are tools out there that greatly help crafting and explaining PromQL queries, such as the Grafana query editor, or PromLens. However, choices are limited when it comes to visualizing data from Prometheus, the default being Grafana. Grafana is a great platform for visualizing and discussing data, but it still requires you to understand PromQL and the data you’re querying for deeper analysis.
Transforming the data into a column-based database, on the other hand, allows us to query it with a huge variety of analytics tools, both open source and commercial: Apache Superset, Metabase, or Tableau, to name just a few.
These tools have advanced query and dashboarding features that allow users to create the charts they need to review regularly with a few clicks. They also make it easy to slice and dice the data ad hoc. For example, when one of the charts shows unexpected results, drilling down into other dimensions is a matter of seconds. Column stores like ClickHouse are built for ad-hoc aggregations, so running the adjusted query will likely be fast, too.
To do that transformation, Platon treats every label of every Prometheus query defined in the Cube as a dimension, creating a dedicated column in the output table.
A table for the single metric shown above would have the following columns (data types in parentheses):
- Time (DATETIME)
- code (STRING)
- component (STRING)
- dry_run (STRING)
- group (STRING)
- resource (STRING)
- scope (STRING)
- subresource (STRING)
- verb (STRING)
- version (STRING)
- apiserver_request_total (FLOAT64)
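As a sketch of what such a table enables, the snippet below builds a table with these columns in SQLite and re-expresses the earlier PromQL aggregation as SQL. The DDL, types, and sample values are illustrative assumptions; the schema Platon actually emits (and its target database) may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One column per label, plus the timestamp and the metric value.
# "group" is a SQL keyword and must be quoted as a column name.
conn.execute("""
    CREATE TABLE apiserver_request_total (
        time TIMESTAMP,
        code TEXT, component TEXT, dry_run TEXT, "group" TEXT,
        resource TEXT, scope TEXT, subresource TEXT, verb TEXT, version TEXT,
        apiserver_request_total REAL
    )
""")
# Two made-up rows, as an update run might append them.
conn.executemany(
    "INSERT INTO apiserver_request_total (time, verb, code, apiserver_request_total)"
    " VALUES (?, ?, ?, ?)",
    [("2024-01-01T00:00:00Z", "GET", "200", 120.0),
     ("2024-01-01T00:00:00Z", "LIST", "200", 42.0)],
)
# The PromQL 'sum by(verb)' from earlier becomes a plain GROUP BY:
rows = conn.execute("""
    SELECT verb, SUM(apiserver_request_total)
    FROM apiserver_request_total
    GROUP BY verb
""").fetchall()
print(sorted(rows))  # [('GET', 120.0), ('LIST', 42.0)]
```

Drilling down into another dimension is just a different `GROUP BY` column, which is exactly the kind of ad-hoc query the BI tools above generate.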
When first querying Prometheus, Platon will create the table with these columns. During the next update, it will append rows for new values.
Since new labels can be added to a time series over time, Platon also verifies that the table schema covers all labels seen so far. If not, it adds columns for the new labels, giving the cube another dimension.
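That schema check can be sketched as follows, again against SQLite; the function name and migration details are hypothetical and only illustrate the idea of growing the table when a new label appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (time TIMESTAMP, verb TEXT, value REAL)")

def ensure_label_columns(conn, table, labels):
    """Add a column for any label not yet present in the table schema.

    Illustrative only; Platon's actual schema-migration logic is not shown here.
    """
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    for label in labels:
        if label not in existing:
            conn.execute(f'ALTER TABLE {table} ADD COLUMN "{label}" TEXT')

# A later scrape returns a previously unseen label 'dry_run' -> grow the schema.
ensure_label_columns(conn, "metrics", ["verb", "dry_run"])
cols = [row[1] for row in conn.execute("PRAGMA table_info(metrics)")]
print(cols)  # ['time', 'verb', 'value', 'dry_run']
```

Rows appended before the new label existed simply carry NULL in the new column.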
Querying another metric that exposes the same dimensions would be as easy as adding a column for the new metric, or JOINing two such tables. However, two different metrics frequently have more than one matching data point. JOINing two tables on partially matching label values leads to a Cartesian product of the matching rows, producing duplicate metric values and hence non-aggregatable results.
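The duplication problem is easy to reproduce. In this made-up example, a request-count metric varies by `code` while a latency metric varies by `quantile`; joining them on the shared labels multiplies the rows, so the counts can no longer be summed safely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE requests (time TEXT, verb TEXT, code TEXT, requests REAL);
    CREATE TABLE latency  (time TEXT, verb TEXT, quantile TEXT, latency REAL);
    -- one GET data point per 'code' on one side ...
    INSERT INTO requests VALUES ('t0', 'GET', '200', 100.0), ('t0', 'GET', '404', 5.0);
    -- ... and one per 'quantile' on the other
    INSERT INTO latency  VALUES ('t0', 'GET', '0.5', 0.01), ('t0', 'GET', '0.99', 0.2);
""")
rows = conn.execute("""
    SELECT r.code, l.quantile, r.requests
    FROM requests r JOIN latency l USING (time, verb)
""").fetchall()
# 2 x 2 matching rows -> 4 result rows: every request count now appears
# twice, so SUM(requests) would double-count (210 instead of 105).
print(len(rows))  # 4
```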
To circumvent this, Platon implements special logic to query different metrics and represent them in a single table, avoiding Cartesian products while still allowing the data to be shown side by side in a single visualization.
How this logic works in detail will be discussed in a follow-up blog post – so stay tuned!