Introduction To Prometheus Metrics And Their Types

Prometheus is an excellent tool for collecting metrics from your application in order to better understand how it is behaving. You will have four types of metrics to choose from when deciding how to publish metrics.

In this blog, we cover each of the four types in turn: counters, gauges, histograms, and summaries.

Counters

Counters are a basic method of tracking how frequently an event occurs within an application or service. They track values that only ever increase (i.e. monotonically increasing values), going back to zero only when the process restarts, and are exposed as time series. An example of a counter metric is http_requests_total, which reports the running total of HTTP requests to an application or service endpoint. At query time, the rate() function is applied to counters to calculate the average number of requests per second over a chosen time window.

Counters are running (cumulative) counts: the metric client library keeps an ever-increasing total of the number of events over the course of the application’s lifetime. Prometheus samples that total on a regular basis by scraping the metrics endpoint exposed by the client library.

Running counts are extremely dependable because each sample carries the total so far. A missed scrape loses only resolution, not events, so the total at a given point in time can still be closely approximated. If you want to aggregate many counters, you must first use the rate() function to get the per-second change of each series, and then aggregate the results with sum(). Taking the rate first lets Prometheus detect counter resets on each individual series.
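The rate-then-sum order can be illustrated with a small plain-Python sketch. It is a simplified stand-in for rate(), not Prometheus's implementation (the real function also extrapolates to the edges of the time window), and the sample values and the per_second_rate helper are made up for illustration.

```python
def per_second_rate(samples):
    """Approximate per-second increase of a counter over a window.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    A drop in value is treated as a counter reset (process restart).
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            increase += curr - prev
        else:
            # After a reset the counter restarts from zero, so the whole
            # current value counts as new increase.
            increase += curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Hypothetical scrapes of http_requests_total from two instances.
instance_a = [(0, 100), (15, 130), (30, 160), (45, 190), (60, 220)]
instance_b = [(0, 500), (15, 530), (30, 20), (45, 50), (60, 80)]  # restarted

# Correct order: rate per series, then sum.
rates = [per_second_rate(s) for s in (instance_a, instance_b)]
total_rate = sum(rates)

# Wrong order: sum the raw counters, then take the rate. The restart of
# instance_b now looks like a reset of the combined series.
summed = [(t, a + b) for (t, a), (_, b) in zip(instance_a, instance_b)]
wrong_rate = per_second_rate(summed)

print(f"rate then sum: {total_rate:.2f}/s, sum then rate: {wrong_rate:.2f}/s")
```

With these made-up samples the two orders disagree, because summing first hides which series actually reset.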

When to use counters?

Use a counter when:

You want to keep track of a value that only rises.
You want to be able to query how quickly the value is increasing (i.e. its rate) later on.

Use cases for counters

request count
error count
tasks completed

Gauges

Gauges are used to take measurements or snapshots of a metric at a specific point in time on a regular basis. Unlike a counter, a gauge has a value that can fluctuate arbitrarily over time, going up as well as down (e.g. CPU utilisation and temperature).

Gauges are useful when you want to query a metric that can go up or down but don’t need to know the rate at which it’s doing so. The rate() function should not be applied to gauges, because it assumes a value that only increases (i.e. a counter) and treats any decrease as a counter reset.
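As a rough illustration of gauge semantics, here is a minimal in-memory sketch. The Gauge class is a hypothetical stand-in written for this post, not the API of any Prometheus client library, although real client libraries offer similar operations for incrementing, decrementing, and setting a gauge.

```python
class Gauge:
    """Hypothetical in-memory gauge: a single value that can go up or down."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount

    def set(self, value):
        # A new measurement simply replaces the previous one.
        self.value = float(value)


in_progress = Gauge()
queue_size = Gauge()

in_progress.inc()   # request 1 starts
in_progress.inc()   # request 2 starts
in_progress.dec()   # request 1 finishes

queue_size.set(42)  # snapshot of the queue length
queue_size.set(17)  # a later snapshot overwrites the earlier one

print(in_progress.value, queue_size.value)
```

Only the current value is kept, which is why a scrape of a gauge is a snapshot rather than a running total.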

When to use gauges?

Use a gauge when:

You want to record a value that can go up or down.
You don’t need to query its rate.

Use cases for gauges

memory usage
number of requests in progress
queue size

Histograms

Histograms sort observations into pre-defined buckets by value and count how many observations fall into each bucket. If no buckets are specified, the Prometheus client library will use a set of default buckets (e.g. the Go client library uses .005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10). These buckets are used to track an attribute’s distribution across multiple events (e.g. event latency). Note that the default buckets can be overridden if more or different values are required, but doing so may result in an increase in costs and/or cardinality because each bucket has its own unique time series.

Histograms are known to be highly performant in general because they only require a count per bucket and can be accurately aggregated across time series and instances (provided they have the same buckets configured). This means you can accurately aggregate histograms across multiple instances or regions without generating additional time series for aggregate views (unlike computed percentile values with summaries).
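The reason bucket counts aggregate cleanly can be sketched in a few lines of plain Python. This is an illustration only: the observe_all and merge helpers and the latency values are invented for the example, and the bounds are the Go client defaults quoted above plus a final +Inf bucket.

```python
from bisect import bisect_left

# Upper bounds of the buckets, in seconds, with a catch-all +Inf bucket.
BOUNDS = [.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10, float("inf")]


def observe_all(values, bounds=BOUNDS):
    """Return cumulative counts: counts[i] = observations <= bounds[i]."""
    per_bucket = [0] * len(bounds)
    for v in values:
        # bisect_left finds the first bound that is >= v.
        per_bucket[bisect_left(bounds, v)] += 1
    cumulative, running = [], 0
    for count in per_bucket:
        running += count
        cumulative.append(running)
    return cumulative


def merge(*histograms):
    """Aggregate histograms that share the same bounds by adding counts."""
    return [sum(counts) for counts in zip(*histograms)]


# Hypothetical request latencies (seconds) seen by two instances.
instance_a = [0.004, 0.02, 0.03, 0.2, 0.7]
instance_b = [0.009, 0.04, 0.3, 1.5, 12.0]

merged = merge(observe_all(instance_a), observe_all(instance_b))
combined = observe_all(instance_a + instance_b)

print(merged == combined)
```

Adding the per-bucket counts of the two instances gives exactly the histogram that a single instance would have produced from all the observations, which only holds when both use the same bucket boundaries.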

The main disadvantage of histograms is that you must pre-define the boundary values for your histogram buckets. Because code modifications are required to change the buckets, you must consider the expected latency ranges and configure the buckets accordingly. Furthermore, if you want to read your histograms as percentiles or quantiles in order to better understand the distribution, you must use the histogram_quantile() function to estimate the desired quantile. The estimate can only be as precise as the bucket boundaries allow.
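To show why a quantile read from a histogram is an estimate, the sketch below interpolates linearly inside the bucket that contains the requested rank. It follows the same general idea as quantile estimation from buckets, but it is a simplified illustration and not Prometheus's code; the estimate_quantile helper, the bounds, and the counts are all assumptions made for the example.

```python
def estimate_quantile(q, bounds, cumulative):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative bucket counts.

    bounds: ascending bucket upper bounds, ending with +Inf.
    cumulative: cumulative[i] = number of observations <= bounds[i].
    Assumes observations are spread evenly inside each bucket and that
    at least one finite bucket exists.
    """
    total = cumulative[-1]
    if total == 0:
        return float("nan")
    rank = q * total
    # First bucket whose cumulative count reaches the rank.
    idx = next(i for i, c in enumerate(cumulative) if c >= rank)
    if bounds[idx] == float("inf"):
        # Nothing to interpolate towards: fall back to the highest
        # finite bound.
        return bounds[idx - 1]
    lower = bounds[idx - 1] if idx > 0 else 0.0
    below = cumulative[idx - 1] if idx > 0 else 0
    in_bucket = cumulative[idx] - below
    if in_bucket == 0:
        return lower
    return lower + (bounds[idx] - lower) * (rank - below) / in_bucket


# Hypothetical latency histogram (seconds) with 100 observations.
bounds = [0.1, 0.25, 0.5, 1.0, float("inf")]
cumulative = [50, 80, 95, 100, 100]

p50 = estimate_quantile(0.50, bounds, cumulative)
p90 = estimate_quantile(0.90, bounds, cumulative)
p99 = estimate_quantile(0.99, bounds, cumulative)

print(f"p50={p50:.3f}s p90={p90:.3f}s p99={p99:.3f}s")
```

Here the 90th percentile lands somewhere in the 0.25 to 0.5 second bucket, and the sketch can only guess where. Narrower buckets around the latencies you care about give tighter estimates.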

When to use histograms?

You want to take multiple measurements of a value in order to calculate averages or percentiles later.
You don’t care about exact values and are content with approximations.
You know the range of values ahead of time, so you can use the default bucket definitions or create your own.

Use cases for histograms

request duration
response size

Summaries

Summaries are similar to histograms in that they track attribute distributions over time, but they differ in that they expose quantile values directly (i.e. on the client side at collection time vs. on the Prometheus monitoring service at query time). They are most commonly used for monitoring latencies (e.g., P50, P90, P99) and are best suited for use cases requiring an accurate latency value or sample without the need for histogram bucket configuration.

Summaries are generally not recommended for use cases where a histogram can be used instead. This is due to the fact that quantiles cannot be aggregated and it can be difficult to determine which timeframe the quantiles cover. Keep in mind that this timeframe is defined independently by each client library (e.g. the Prometheus Go client library uses 10 minutes by default).

Once calculated, client-side quantiles cannot be merged with quantile values from other instances. This means that summaries cannot be aggregated across time series with any level of accuracy. The average of two P95 values, for example, does not equal the P95 for the combined set of values.
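A small worked example makes the P95 point concrete. It uses a simple nearest-rank percentile (one of several common definitions, chosen here for brevity) and made-up latencies for two hypothetical instances.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the
    data less than or equal to it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds.
fast = [10] * 95 + [20] * 5   # busy instance, 100 requests
slow = [200] * 10             # quiet instance, 10 requests

p95_fast = percentile(fast, 95)
p95_slow = percentile(slow, 95)
average_of_p95s = (p95_fast + p95_slow) / 2
true_p95 = percentile(fast + slow, 95)

print(p95_fast, p95_slow, average_of_p95s, true_p95)
```

Averaging the two per-instance values ignores how many requests each instance served, so the result differs from the P95 computed over all requests together.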

When to use summaries?

You want to take multiple measurements of a value in order to calculate averages or percentiles later.
You don’t care about exact values and are content with an approximation.
You can’t use histograms because you don’t know what the range of values will be.

Use cases for summaries

request duration
response size

Conclusion

The preceding overview introduces the primary Prometheus metric types, as well as some recommendations on when and how to use them. Metric type metadata may be used and propagated more strongly in future Prometheus versions, but for now, understanding the differences between types as a user is critical in order to build correct instrumentation and queries.
