A shared MySQL server in our Salt Lake City datacenter started returning ‘Too many open connections’ today around 12:24 PM MST.
The operations team began an immediate investigation and determined that the machine was out of disk space. Additional disk space was provisioned and the service was restarted. MySQL services on db1.slc1 returned at 12:33 PM MST.
This machine is monitored 24x7, and free disk space is checked on a regular schedule. It appears that although our monitoring software caught the error and listed the disk space issue as ‘CRITICAL’, it did not send an alarm notifying administrators. We are investigating and will update this post with additional details when they become available.
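For readers unfamiliar with this kind of check, here is a minimal sketch of a scheduled free-disk-space check. It is illustrative only and does not reproduce the monitoring software involved in this incident: the exit codes follow the common Nagios-style plugin convention, and the function names and the 20% / 10% thresholds are assumptions chosen for the example.

```python
import shutil

# Nagios-style status codes (a common convention, assumed for this example).
OK, WARNING, CRITICAL = 0, 1, 2


def classify_free_space(free_bytes, total_bytes, warn_pct=20.0, crit_pct=10.0):
    """Map free space to a status code. Thresholds are hypothetical."""
    free_pct = 100.0 * free_bytes / total_bytes
    if free_pct <= crit_pct:
        return CRITICAL
    if free_pct <= warn_pct:
        return WARNING
    return OK


def check_path(path="/"):
    """Check the filesystem containing `path` and return a status code."""
    usage = shutil.disk_usage(path)
    return classify_free_space(usage.free, usage.total)
```

A scheduler would call `check_path` at a fixed interval and hand the returned status to a separate notification step. As this incident shows, a correct CRITICAL result is only useful if that notification step actually fires.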
UPDATE (1:02 PM MST): Problems with monitoring were traced back to a misconfiguration. In an attempt to fix a packaging issue with our monitoring software, a configuration change was applied that inadvertently caused services marked as CRITICAL to be reported as OK. The incorrect configuration has been backed out and all infrastructure monitoring is now operating normally.
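One way to guard against this class of misconfiguration is a sanity check that rejects any mapping in which a state is reported as less severe than it was detected. The sketch below is a hypothetical illustration, not the actual configuration format or tooling in use here: the `state_map` dictionary and the `find_downgrades` helper are assumptions made for the example.

```python
# Severity ranking used to compare states (assumed ordering).
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


def find_downgrades(state_map):
    """Return (raw, reported) pairs where the reported state is less severe
    than the raw state detected by the check."""
    return [
        (raw, reported)
        for raw, reported in state_map.items()
        if SEVERITY[reported] < SEVERITY[raw]
    ]


# A faithful mapping, and one with the kind of fault described above.
good_map = {"OK": "OK", "WARNING": "WARNING", "CRITICAL": "CRITICAL"}
bad_map = {"OK": "OK", "WARNING": "WARNING", "CRITICAL": "OK"}
```

Running such a check before a configuration change is rolled out would flag `bad_map`, because `find_downgrades(bad_map)` returns a non-empty list.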