Keeping an eye on puppet (updated)

Having been away from home for close to two months, I was very happy that I managed to stick a lot of my system configuration in puppet before leaving.

My puppet manifests live in a git repository to which the puppetmaster automatically synchronises every 15 minutes or so. The upshot is that I can make a configuration change in a git repository on my netbook anywhere in the world, push that change to the master repository and have it applied to the appropriate nodes within about half an hour. Lovely.
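
How the synchronising is wired up isn't shown here. One plausible arrangement, sketched purely as an assumption, is a cron job on the puppetmaster that fast-forwards a checkout of the manifests. The function name `sync_manifests`, the script path and `/etc/puppet` being a git checkout are all hypothetical.

```shell
#!/bin/sh
# Hypothetical sync helper, meant to be run from cron on the puppetmaster, e.g.
#   */15 * * * * /usr/local/sbin/sync-manifests
# The script path and the checkout location are assumptions, not from the post.

# Usage: sync_manifests CHECKOUT_DIR
sync_manifests() {
    checkout=$1

    if [ ! -d "$checkout/.git" ]; then
        echo "sync_manifests: $checkout is not a git checkout" >&2
        return 1
    fi

    # --ff-only refuses to merge, so a diverged checkout fails loudly
    # instead of quietly creating merge commits on the puppetmaster.
    git -C "$checkout" pull --ff-only --quiet
}
```

A wrapper script would call something like `sync_manifests /etc/puppet` and let cron mail any error output. With a 15 minute sync on top of the nodes' own polling interval, "about half an hour" is roughly the worst case you'd expect.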

The problem is that - for whatever reason - puppet on a node sometimes hangs and thus stops fetching updated configuration files. I only discover this when a configuration change still hasn't been applied even after waiting for an hour, which isn't a particularly good use of my time.

Happily I run a nagios server that does some basic monitoring, so all I need is a way to test the last time a node successfully connected to the puppetmaster. Turns out there is apparently not an easy way to do this without making puppet log to MySQL. At that point I can query MySQL to get status info, nice. To accomplish this, I set up puppet-dashboard, but although installation and configuration went fine, I found nothing was getting logged.

This was turning into somewhat of a palaver; getting some status info shouldn't be this hard. So back to the drawing board.

I was pointed at a nagios plugin that checks the system log on each puppet node, but that seems horribly inefficient: nagios needs to connect to each client, grep the log and find a date. I noticed at this point that the puppetmaster keeps node data in yaml format in a directory on the puppetmaster. Aha!

With some bash scripting, I now have a nagios plugin that runs on the puppetmaster. It checks the last activity time of each node that - as far as the puppetmaster is concerned - has a valid certificate and creates a single status string. Lovely.
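
As a rough illustration of the idea (not the actual plugin), here is a minimal sketch in plain shell. It assumes one `<node>.yaml` file per node whose modification time tracks the node's last contact, and unlike the plugin described above it makes no attempt to check which nodes hold a valid certificate. The function name `check_puppet_nodes` and the thresholds in minutes are made up for the example, and `find -mmin` is an extension that GNU and BSD find both provide rather than strict POSIX.

```shell
#!/bin/sh
# Sketch of a Nagios-style check that runs on the puppetmaster itself.
# Assumption: YAML_DIR holds one <node>.yaml per node, and each file's
# mtime reflects that node's last contact with the puppetmaster.

# Usage: check_puppet_nodes YAML_DIR WARN_MINUTES CRIT_MINUTES
check_puppet_nodes() {
    dir=$1 warn=$2 crit=$3

    if [ ! -d "$dir" ]; then
        echo "PUPPET UNKNOWN: $dir is not a directory"
        return 3
    fi

    total=0 warn_nodes="" crit_nodes=""
    for f in "$dir"/*.yaml; do
        [ -e "$f" ] || continue            # the glob matched nothing
        total=$((total + 1))
        node=$(basename "$f" .yaml)
        # find prints the file name only if it is older than the threshold
        if [ -n "$(find "$f" -mmin +"$crit")" ]; then
            crit_nodes="$crit_nodes $node"
        elif [ -n "$(find "$f" -mmin +"$warn")" ]; then
            warn_nodes="$warn_nodes $node"
        fi
    done

    if [ "$total" -eq 0 ]; then
        echo "PUPPET UNKNOWN: no node files found in $dir"
        return 3
    fi

    # Standard Nagios plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN
    if [ -n "$crit_nodes" ]; then
        state=CRITICAL rc=2
    elif [ -n "$warn_nodes" ]; then
        state=WARNING rc=1
    else
        state=OK rc=0
    fi

    # One status line for all nodes, so nagios needs only a single service check
    echo "PUPPET $state: $total nodes, stale:${crit_nodes:- none}, late:${warn_nodes:- none}"
    return "$rc"
}
```

A wrapper would call it along the lines of `check_puppet_nodes /var/lib/puppet/yaml/node 30 60; exit $?`, where the directory is a guess that will differ between installs. Reporting UNKNOWN when no node files are found is a deliberate choice, so that an empty or unreadable directory doesn't show up as OK.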

[UPDATE]

Rather than use (or keep using) the attached script, a newer, faster (lies), better, improved version is hosted on github at https://github.com/cafuego/check_puppetmaster. Use that version instead. Really.

 

Comments

I've implemented your script, thanks. However, you have a bug. Near the end of your script you set "ret=${STATUS_WARNING}", however you should have STATUS_WARN instead of STATUS_WARNING.

For me, the output of '/usr/sbin/puppetca -la' as the nagios user produces no output, meaning nagios always reported ok for 0 hosts. Not very useful, but if you specify the puppet ssl directory like '/usr/sbin/puppetca -la --ssldir /var/lib/puppet/ssl', then the output in nagios shows the correct number of hosts!

Oh, that's peculiar. I've got the ssldir specified in /etc/puppet/puppet.conf, do you? If not, it might not know where to look and thus want it on the commandline.

I just pushed a change to github that allows you to add extra CLI parameters for puppetca. See how you go :-)
