SCOM 2007 R2 Expired certs stop management server communication

I had a funny one at work the other day. We have a legacy SCOM 2007 R2 environment that just monitors our remaining Windows 2003 servers (I know, don’t get me started!). Up until recently we had some Windows 2000 servers in a firewalled domain that connected to SCOM via certificate authentication. These certs were SHA1 and had long expired, and I hadn’t given it much more thought. That was until a co-worker asked if they were still needed, as another project was behind schedule and he was having to extend the life of our SHA1 infrastructure. I checked SCOM and I couldn’t see any agents that used cert authentication, so I was happy for them to expire.

This obviously was an error. The 4 management servers (3 + 1 RMS) have certificates installed and configured for use by SCOM with the MomCertImport tool. About 1 second after these expired, the 4 management servers stopped talking to each other and we got about 100 “Unable to Connect” errors in the console. When I checked the Operations Manager logs on each server, I could see that they were failing to communicate. So I removed the cert from one of the servers and was instantly able to access the SCOM console on that management server (again, don’t ask why the console is on there); however, the management server itself still couldn’t connect. I restarted the server and it sprang back into life, but there were still errors in the event log. It was complaining that a certificate was specified but not available. So I found the MomCertImport tool and ran it from the command line with the /Remove switch. This, followed by another reboot (a services restart would probably have been OK), resolved the issue, so I repeated the process on the remaining management servers. I guess the takeaway is that in SCOM 2007, if you specify a certificate, SCOM will use it for all communications if it can, including between the management servers themselves.
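
Since the whole problem came down to nobody noticing an expiry date, here is a minimal sketch of the kind of check that would have flagged it in advance. It is only an illustration, not something I ran at the time: the server names, the dates and the `expiring_certs` helper are all made up, and in practice the "not after" dates would have to be read from the certificate store on each management server.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical inventory: (server name, certificate "not after" date in UTC).
# These names and dates are invented for illustration only.
CERTS = [
    ("MS01", datetime(2016, 3, 1, tzinfo=timezone.utc)),
    ("MS02", datetime(2017, 1, 1, tzinfo=timezone.utc)),
    ("RMS01", datetime(2016, 2, 10, tzinfo=timezone.utc)),
]


def expiring_certs(certs, now, warn_days=30):
    """Return (name, days_left) for certs expiring within warn_days, soonest first.

    Certificates that have already expired are included with a negative days_left.
    """
    cutoff = now + timedelta(days=warn_days)
    flagged = [
        (name, (not_after - now).days)
        for name, not_after in certs
        if not_after <= cutoff
    ]
    return sorted(flagged, key=lambda item: item[1])


if __name__ == "__main__":
    # Report against the current time; the output depends on when this is run.
    for name, days_left in expiring_certs(CERTS, datetime.now(timezone.utc)):
        state = "EXPIRED" if days_left < 0 else "expiring soon"
        print(f"{name}: {state} ({days_left} days)")
```

The point is to check every server that has a certificate configured, management servers included, not just the agents you can see using cert authentication in the console.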