Dr. Fabian Stäber is an engineering manager and monitoring enthusiast at Grafana. He is a member of the Prometheus open source project, where he maintains the Prometheus Java client library and the JMX exporter. At Grafana, Fabian focuses on application monitoring with OpenTelemetry.
Understanding how fast REST services respond is one of the key signals in application performance monitoring.
However, there is no established best practice for monitoring REST service performance. Popular Java metric libraries offer multiple algorithms and representations to choose from, each of which requires specific trade-offs and targets specific use cases.
Moreover, this is an area of active development: the Prometheus and OpenTelemetry communities are about to finalize the specification of sparse / exponential histograms, which will come with their own strengths and weaknesses.
In this talk we will give an overview of the algorithms implemented by the most popular Java metrics libraries, explain their implications for dashboards, SLOs, and alerts, and explore the trade-offs.
After the talk you will be able to choose the implementation that fits your use case, and you'll be able to better understand latency data on your monitoring dashboard.