Comments on: Integrated Alerting Design in Percona Monitoring and Management
https://www.percona.com/blog/integrated-alerting-design-in-percona-monitoring-and-management/
Sat, 17 Feb 2024 01:16:29 +0000

By: Laurent Indermühle
https://www.percona.com/blog/integrated-alerting-design-in-percona-monitoring-and-management/#comment-10973241
Wed, 30 Jun 2021 13:51:20 +0000

In the current state, PMM is useless for alerting.

The integrated alerts don’t work. And the only two tools I know of for looking at the logs are less than ideal: either you download a 120MB zip, or you run podman exec -it pmm-server bash and then navigate to /srv/logs.

What’s missing is:

– A button in “Notification Channels” to test the channel
– A button in “Alerts” to test the message sent
– A way to rename service_id, agents_id and node_id in the messages sent. The title should contain a service_name or node_name, not a long ID that no human can read.
– A history of the alerts (where do the silenced ones go?)
– A log of the subsystems accessible from the UI
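
To make the third item concrete, here is a minimal sketch of the title behaviour being asked for. It is not PMM code: the function `alert_title`, the alert name, and the label values are hypothetical, and it only assumes that an alert carries a set of labels in which service_name or node_name may appear next to service_id and node_id.

```python
def alert_title(alert_name: str, labels: dict[str, str]) -> str:
    """Build an alert title that a human can read at a glance.

    Hypothetical helper, not part of PMM. It assumes labels is a plain
    mapping of label names to values attached to the alert.
    """
    # Prefer human-readable names; fall back to IDs only if no name exists.
    for key in ("service_name", "node_name", "service_id", "node_id"):
        value = labels.get(key)
        if value:
            return f"{alert_name}: {value}"
    # No identifying label at all: show just the alert name.
    return alert_name


# Made-up label set for illustration only.
labels = {
    "service_id": "/service_id/example-id",
    "service_name": "mariadb-prod-1",
}
print(alert_title("Service down", labels))
```

With both labels present, the title uses the service name rather than the ID; the ID only shows up when nothing more readable is available.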

But what makes me say that integrated alerts don’t work is that one day I received some alerts, then nothing for days. Yet when I log in to the PMM UI, I see plenty of alerts that have fired.

Today I randomly stopped some MariaDB nodes. I never got any alerts. One alert registered in the UI, and the others didn’t appear in the UI at all.

An alert manager is a critical system that should never, ever fail silently. If the documentation is incomplete, the process of automatically copying rules from Prometheus to vmalert is obscure, and alerts are not always triggered, nobody will use that system.

And that would be a shame, because I love PMM2 so much. You did a very good job on the monitoring part. I really hope the alerting part catches up! 🙂
