The replication service on both mailbox servers slowly ramped up its CPU usage until each server hit 100%, which then caused all manner of nasty failures.
Throughout our troubled times we have:
- Looked at the Exchange 2010 configuration with a fine-toothed comb
- Had 3 "Exchange Experts" come in and check things out
- Raised a PSS call with Microsoft
- Raised a call with VMware (the servers are all virtual)
- Increased RAM and CPU to ESX limits (almost - we only have 96GB in our hosts)
- Tweaked registry settings
- Moved to pvSCSI instead of vSAS and back again (pvSCSI has issues in high-I/O environments! We need new pvSCSI drivers)
- Rebuilt a second set of Mailbox servers
The guys who rebuilt the environment failed to install the SCOM agent (at first). These boxes showed no signs of CPU creep. Once they realised the SCOM agent was missing, they quickly installed it. Suddenly we saw CPU ramping up, albeit slowly, over a period of 2-3 weeks.
So we turned off the agent and the CPU settled back down.
We are now looking at removing the Exchange 2010 management pack from our environment to see whether it, rather than the SCOM agent itself, is the cause.
More news when I know.
OK, so now we have an Exchange environment with NO live mailboxes, only a handful of test mailboxes. With the SCOM agent installed we see the rising CPU. Uninstall the SCOM agent and there is no rise.
The current situation is that we are not using SCOM to monitor our Exchange production servers. Not good. Microsoft don't seem to know what is going on either.
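For anyone wanting to put a number on this sort of slow creep instead of eyeballing graphs, here is a rough Python sketch. It fits a straight line (ordinary least squares) to CPU samples and reports the slope in percentage points per day, so a run with the agent can be compared against a run without it. The function name `cpu_trend_per_day`, the `(day, cpu_percent)` input format and all the numbers are made up for illustration; they are not measurements from our servers, and this is not something we ran as part of the investigation.

```python
from statistics import mean


def cpu_trend_per_day(samples):
    """Least-squares slope of CPU % against time in days.

    samples: list of (day, cpu_percent) pairs (hypothetical input format).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    days = [d for d, _ in samples]
    cpu = [c for _, c in samples]
    d_bar, c_bar = mean(days), mean(cpu)
    denom = sum((d - d_bar) ** 2 for d in days)
    if denom == 0:
        raise ValueError("samples must cover more than one point in time")
    return sum((d - d_bar) * (c - c_bar) for d, c in zip(days, cpu)) / denom


# Made-up illustrative data: a steady climb plus a little jitter,
# versus the same jitter around a flat baseline.
with_agent = [(d, 20 + 3.5 * d + (1 if d % 2 else -1)) for d in range(15)]
without_agent = [(d, 20 + (1 if d % 2 else -1)) for d in range(15)]

print(f"with agent:    {cpu_trend_per_day(with_agent):+.2f} % per day")
print(f"without agent: {cpu_trend_per_day(without_agent):+.2f} % per day")
```

A slope that stays clearly positive over a couple of weeks points to creep, while a slope near zero suggests ordinary noise. In practice the samples would come from whatever performance counter export you already have.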