db2.slc1
Friday, November 12, 2010
Around 9:00 PM MST on November 11th, the shared database server db2.slc1 began to see rapid increases in disk utilization by the MySQL server. System administrators immediately began working on the problem and additional space was quickly provisioned.
Around 1:00 AM MST and again at around 4:40 AM, the MySQL server mysteriously began rejecting new connections. Though the problem has been temporarily resolved, systems engineers continue to research the issue.
We know that any downtime on our database servers can be extremely disruptive and we’re genuinely sorry about the outage. We’ll continue to update this site as we learn more.
Hardware failure in slc1
Tuesday, October 19, 2010
At 15:17:48, our automated monitoring system detected a problem with a server in our Salt Lake City, Utah data center which hosts a number of client containers. On-site administrators immediately responded by issuing a power-cycle command to the failed server. After a reboot, all containers were once again online by 15:31:38.
Engineers are investigating the cause of the issue. We know that even a few moments of downtime are extremely disruptive to our clients and we apologize for the outage.
lb1.slc1 upgrade
Monday, September 13, 2010
An upgrade was made to a load-balancer serving some sites in our Salt Lake City, Utah data center.
To ensure security for our users and allow them to pass the most stringent security audits, the Stackable Engineering team determined that an upgrade to the load-balancer infrastructure was necessary.
As part of the upgrade, all sites now sit behind nginx 0.7.67, the newest release in the 0.7 branch of the nginx web server software.
The upgrade was transparent to all sites behind the load-balancers and no traffic disruption was recorded.
db1.slc1 outage
Wednesday, September 8, 2010
A shared MySQL server in our Salt Lake City data center started returning ‘Too many open connections’ errors today around 12:24 PM MST.
The operations team began an immediate investigation and determined that the machine was out of disk space. Additional disk space was immediately provisioned and the service was restarted. MySQL services on db1.slc1 returned at 12:33 PM MST.
This machine is monitored 24x7 and a check for free disk space runs on a regular schedule. It appears that although our monitoring software caught the error and listed the disk space issue as ‘CRITICAL’, it did not send an alarm notifying administrators. We are investigating the missed alarm and will update this post with additional details when they become available.
UPDATE (1:02 PM MST): Problems with monitoring were traced back to a misconfiguration. In an attempt to fix a packaging issue with our monitoring software, a configuration was applied which inadvertently caused services which were marked as CRITICAL to be reported as OK. The incorrect configuration has been backed out and all infrastructure monitoring is now operating normally.
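For readers who run similar checks, here is a minimal sketch of a free-disk-space check with an explicit alerting rule. It is an illustration only, not our monitoring software or its configuration: the status codes follow the common OK/WARNING/CRITICAL convention, and the function names and the 20% and 10% thresholds are assumptions chosen for the example.

```python
import shutil

# Status codes in the common OK / WARNING / CRITICAL convention (assumed here).
OK, WARNING, CRITICAL = 0, 1, 2


def classify_free_space(free_bytes, total_bytes, warn_pct=20.0, crit_pct=10.0):
    """Map free space to a status. The thresholds are illustrative assumptions."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    free_pct = 100.0 * free_bytes / total_bytes
    if free_pct < crit_pct:
        return CRITICAL
    if free_pct < warn_pct:
        return WARNING
    return OK


def should_alert(status):
    """Anything other than OK should notify an administrator."""
    return status != OK


def check_path(path="/"):
    """Run the check against a real filesystem path."""
    usage = shutil.disk_usage(path)
    return classify_free_space(usage.free, usage.total)


if __name__ == "__main__":
    status = check_path("/")
    print("status:", status, "alert:", should_alert(status))
```

A small assertion such as `should_alert(classify_free_space(5, 100))` can be run after any monitoring change to confirm that a nearly full disk still raises an alert, which is the kind of regression described in the update above.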
Errors on new database creation
Tuesday, August 31, 2010
When creating new databases via the Control Panel or the API, an error is thrown: “Argument ‘databaseName’ failed validation: Database doesn’t exist”.
This error is spurious and can be safely ignored. The database is created and will work properly; however, the newly created username will not appear in the list of users for the database.
Engineers are investigating the issue.
UPDATE: This issue is only affecting a very small set of users who create databases with a username that’s already in use. [August 31st, 11:33 MST]
Postgres database creation
Thursday, August 19, 2010
At present, the creation of new Postgres databases via the Control Panel is failing. Existing databases are not affected.
UPDATE: Database creation was re-enabled at 12:30 PM MST.
Upgrade in progress
Wednesday, August 18, 2010
Stackable is currently upgrading our Control Panel to a new release version. During the next hour or so, the panel at control.stackable.com will be inaccessible. Please bear with us while we make these changes.
Sites and their services should not be affected by these upgrades.
UPDATE: This was completed at 11:51 AM MST. All services are back online. Check http://blog.stackable.com soon for a complete list of new features.
MySQL outage on db1.slc1
Wednesday, August 11, 2010
This morning at approximately 6:18 AM MST the MySQL server on db1.slc1.stackable.com stopped allowing new connections.
Our monitoring system interpreted this condition as the MySQL process not running and performed a full restart of the MySQL daemon.
It took several attempts before the process was fully restarted; MySQL returned to service on the machine at 6:35 AM MST.
The operations team is investigating the root cause of the outage.
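The restart attempts described above follow a general pattern: restart, verify, and retry a bounded number of times. The sketch below shows that pattern under stated assumptions. It is not the logic our monitoring system uses; `restart` and `is_healthy` are hypothetical callables supplied by the caller, and the attempt limit and delay are arbitrary example values.

```python
import time


def restart_until_healthy(restart, is_healthy, max_attempts=5,
                          delay_seconds=1.0, sleep=time.sleep):
    """Call restart() until is_healthy() returns True.

    Returns the number of the attempt that succeeded, or None if every
    attempt failed. Both callables are hypothetical placeholders for
    whatever restarts and probes the service.
    """
    for attempt in range(1, max_attempts + 1):
        restart()
        if is_healthy():
            return attempt
        if attempt < max_attempts:
            # Wait a little longer after each failed attempt.
            sleep(delay_seconds * attempt)
    return None


if __name__ == "__main__":
    # Simulated service that only comes up on the third restart.
    restarts = []
    result = restart_until_healthy(
        restart=lambda: restarts.append(1),
        is_healthy=lambda: len(restarts) >= 3,
        sleep=lambda seconds: None,  # skip real waiting in the demo
    )
    print("healthy after attempt:", result)
```

Returning `None` after the last attempt, rather than retrying forever, gives the caller a clear point at which to escalate to a person.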
Control Panel logins
Tuesday, July 13, 2010
From approximately 12:00 AM MST to 9:35 AM MST, some logins to https://control.stackable.com were failing with users unable to progress beyond the “Loading Account Details…” message.
This was an unscheduled outage. Monitoring for this service was mistakenly removed in a recent code commit. We’ve corrected the monitoring problem and continue to investigate the root cause of the incident.