One of the requests we get most often on the Percona Monitoring and Management (PMM) team is “Do you support alerting?” The answer to that question has always been “Yes,” but the feedback on how we offered it natively was that it was, well, not robust enough! We’ve been hard at work to change that and are excited to offer, starting with the newly released PMM version 2.3.0, a more dynamic alerting mechanism for your PMM installations: integration with Prometheus Alertmanager.

Prometheus Alertmanager

If you don’t know what Alertmanager is, you can read all about it on the Prometheus website, but the short version is that Alertmanager is a receiver, consolidator, and router of alerting messages that offers LOTS of flexibility when it comes to configuration. Back in my old days as a SysAdmin, the tools I used weren’t smart enough to deduplicate alerts, so I’d have my boss yelling, my coworkers emailing, and my phone (ok…BlackBerry) draining its battery vibrating to the same alert over and over until I could manage to put the alert in maintenance mode and the queue of alerts drained. Alertmanager is smart enough to deduplicate alerts so you don’t get 50 pages telling you the disk is 90% full before you can grow the volume or purge files. It’s also extremely easy to group alerts so that you don’t get separate alerts for ‘Application Down’, ‘MySQL Down’, ‘CPU|RAM|Disk: Unavail’, etc. because someone rebooted the DB server without putting monitoring in maintenance mode. Alertmanager also offers many native integrations so you can route alerts to email, SMS, PagerDuty, Slack, and more!
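
To make the grouping and routing ideas a little more concrete, here is a minimal sketch of an Alertmanager configuration file. It is an illustration only, not a configuration shipped with PMM: the receiver names, email addresses, SMTP host, and PagerDuty key are placeholders, and grouping on a node_name label assumes your alerts actually carry that label.

```yaml
# alertmanager.yml (sketch; all names and addresses below are placeholders)
global:
  smtp_smarthost: 'smtp.example.com:587'   # assumed mail relay
  smtp_from: 'alertmanager@example.com'

route:
  receiver: dba-email            # default receiver for anything not matched below
  # Alerts sharing these label values are batched into one notification,
  # so a rebooted DB server produces one message instead of several.
  group_by: ['node_name']        # assumes alerts carry a node_name label
  group_wait: 30s                # wait briefly so related alerts can join the group
  group_interval: 5m             # minimum gap between updates for the same group
  repeat_interval: 4h            # re-notify for still-firing alerts at most this often
  routes:
    - match:                     # newer Alertmanager releases prefer "matchers"
        severity: critical
      receiver: dba-pager

receivers:
  - name: dba-email
    email_configs:
      - to: 'dba-team@example.com'
  - name: dba-pager
    pagerduty_configs:
      - service_key: 'REPLACE_WITH_YOUR_INTEGRATION_KEY'
```

With something like this in place, warnings land in the team mailbox while anything labeled severity: critical pages whoever is on call.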

Now, this is our first iteration of Alertmanager support, so at this point you will need your own working Alertmanager installation that your PMM server can communicate with. The only other thing you’ll need is the set of rules you want to trigger alerts from. That’s basically it! You most likely already know how to create YAML-style rules, but for the curious, it looks something like this:

[Screenshot: example Alertmanager rule in YAML]

The above will trigger an alert to let you know which PostgreSQL instances monitored by PMM have been down for more than 5 minutes. Since this first pass targets experienced users, I’ll leave it to you to craft your own rules, but we’re really excited to be adding this sorely needed functionality!
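
If you can’t see the screenshot, a rule matching that description might look roughly like the sketch below. This is a reconstruction, not the original rule: it assumes the pg_up metric exposed by the PostgreSQL exporter, and the group name, alert name, severity value, and annotation text are made up for illustration.

```yaml
groups:
  - name: postgresql-alerts            # hypothetical group name
    rules:
      - alert: PostgreSQLDown          # hypothetical alert name
        expr: pg_up == 0               # assumes the exporter's pg_up metric
        for: 5m                        # must stay down for 5 minutes before firing
        labels:
          severity: critical           # illustrative value; use your own scheme
        annotations:
          summary: "PostgreSQL instance {{ $labels.instance }} has been down for more than 5 minutes"
```

The for clause is what keeps a brief restart from paging anyone: the expression has to hold continuously for the full five minutes before the alert is sent to Alertmanager.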

For more information, you can read our Alertmanager integration documentation and FAQs. Update your instance today and let us know what you think. We would love to hear your feedback!

9 Comments
Eugene

I’m trying to add the following in the Alertmanager rule section:

- alert: MongodbReplicationLag
expr: (avg by (cluster,environment,set)(mongodb_mongod_replset_member_optime_date{state="PRIMARY"}) - min by (cluster,environment,set) (mongodb_mongod_replset_member_optime_date{state="SECONDARY"})) > 120
for: 5m
labels:
severity: warning
source: pmmprod

Once I click the “Apply Alertmanager settings” button, an error pops up: Invalid Alert Manager rules.

What am I doing wrong here?

Eugene

The text in the comment above did not preserve the YAML indentation, which was correct originally.

Vadim Yalovets

Could you check the expr part directly in the Prometheus UI?
https:///prometheus/

elisetta1984

Hello.
If I don’t want to use the new Prometheus Alertmanager, is it still possible to use the Grafana Alerting feature? I can no longer find the Alert tab on the dashboard graph panel in PMM 2.3.0.
Thanks and Regards, Elisa

Sai

Very nice article.

Couple of questions —

1. Does this resolve the template variables issue that we have been facing up to Grafana 4.x?
2. If we have 250 mongo servers, how can we configure and send alerts only for those servers where there is an issue?

elisetta1984

Hello Steve.
Thanks for your explanations. I can indeed add an alert to a new dashboard.
But is there also a way to add an alert directly to an existing dashboard without creating a new one?
Thanks and Regards,
Elisa

Steve Hoffman

Sai, sorry for the delay…I never got an alert that there was a new post until today, so let me answer you first! This doesn’t technically resolve template variables…that’s an issue in Grafana, but I think I heard that 7.0 lays the foundation to resolve that! What the Alertmanager option does is let you do what you’re after: dynamically alert on systems that meet certain criteria. There are a ton of great Alertmanager recipes online that will let you set thresholds and alerting values on any parameters in Prometheus you like. CPU over a certain threshold for a certain period of time on a production-only system with more than 12GB of installed RAM…you can alert on that! Want to restrict that alert to only systems running mongo? No problem. We’re also working on integrating Alertmanager into PMM itself so you don’t have to have your own setup, but that’s not quite ready for production.
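
As a rough illustration of the CPU example above, a rule along these lines could work. Treat it as a sketch under assumptions rather than a tested recipe: it assumes node_exporter-style metrics (node_cpu_seconds_total, node_memory_MemTotal_bytes), assumes your series carry node_name and environment labels, and uses made-up names and thresholds.

```yaml
groups:
  - name: example-cpu-alerts                 # hypothetical group name
    rules:
      - alert: HighCPUOnLargeProdNode        # hypothetical alert name
        expr: |
          (1 - avg by (node_name) (
                 rate(node_cpu_seconds_total{mode="idle", environment="production"}[5m])
               )) > 0.90
          and on (node_name)
          node_memory_MemTotal_bytes > 12 * 1024 * 1024 * 1024
        for: 10m                             # illustrative duration
        labels:
          severity: warning
```

The first half of the expression computes CPU utilization per node from idle time; the and on (node_name) clause then keeps only nodes that also report more than 12GB of total memory. Narrowing it to mongo hosts would mean adding one more and on (node_name) clause against a MongoDB exporter metric, assuming those series carry the same label.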

Steve Hoffman

Elisa, unfortunately not. We use template variables in our dashboards to make them quite dynamic, and the built-in alerting capabilities of Grafana do not work with them. So when you create the copy of the dashboard, you’re actually stripping out the offending variables and alerting on specific instances.