January 2012
2 posts
Hardware Node Outage
For as-yet-unknown reasons, at approximately 3:02 AM MST one of the hardware nodes servicing our cluster started behaving erratically and negatively impacting the performance of customer containers on that hardware node. Systems administrators were immediately notified and the decision to reboot the affected hardware node was made at 4:43 AM. The hardware node fully completed a reboot at 5:13 AM...
Jan 17th
Hardware Node Outage
At approximately 2:46 PM MST one of the hardware nodes servicing our cluster unexpectedly experienced a kernel panic following a configuration change. Systems administrators on site were immediately notified and the node was brought back online at 2:59 PM MST. In total, 27 customers were affected. Customer containers on that hardware node proceeded to recover over the course of the next few...
Jan 11th
November 2011
2 posts
Upstream core router outage
One of our upstream provider’s core routers is currently experiencing significant routing issues, preventing us from contacting our DNS servers and degrading access to, from, and within our cluster. They are working on the problem now and we will update this status as more information becomes available. Thank you for your patience. Update 1:55 PM MST: Network service has been restored. We...
Nov 16th
DNS resolution failures
During the early-morning hours of November 11th, 2011, DNS resolution for the stackablehost.com domain ceased to work correctly. The most frequent issue that arose as a result was that customers depending on resolution of the stackablehost.com domain for inter-container requests such as MySQL or Postgres saw failures.  Due to a series of mistakes, the domain was not renewed by Stackable as it...
Nov 11th
October 2011
3 posts
Exim regression causing mail delivery to fail on...
A normal security update to the Exim mail delivery package on PHP containers has been rolled out; however, it has been discovered that in some cases mail delivery will fail when mail is sent via a PHP script which uses the system binaries to send mail. A temporary fix for this problem is being rolled out but may not be available to all customer containers immediately. If you continue...
Oct 27th
DNS server degradation
At approximately 12:15 PM MST on Sunday, October 23, 2011, the recursive name servers at our upstream provider experienced degraded performance as a result of a Denial of Service attack lasting until about 6:33 PM MST. Systems engineers at Stackable were notified quickly after the problem began and, in concert with our upstream provider, were able to identify and resolve the issue. Our provider...
Oct 25th
Scheduled service outage Wednesday morning
Stackable will be performing maintenance in its data center. If you are one of the customers expected to be affected, you have already been notified by email. We plan to begin this work on Wednesday, Oct 19, at 7:00 AM MDT and expect that all work will be concluded by 7:30 AM (30 minutes). We expect that these container(s) will be down for only a few minutes during this time frame. Stackable...
Oct 18th
September 2011
1 post
Control panel issues
The Stackable Control Panel is currently inaccessible. Engineers are working to correct the problem. If you have urgent requests, please use our Live Chat feature to connect with a technical support agent who can assist you or email help@stackable.com. All containers and sites are operating normally.  Update: As of 12:45 AM MST, the Stackable Control Panel is now functioning normally. We are...
Sep 6th
August 2011
1 post
Managed switch replacement
On Friday, August 26, at 5:00 PM MST, we will be replacing a faulty managed switch with a hot spare we have available for such situations. The faulty switch has not resulted in any problems or outages to date, but we have decided to replace it as a precaution. There will be no downtime as a result of this replacement. As always, our goal is to provide our customers with high-availability, reliable...
Aug 25th
July 2011
1 post
Scheduled service outage Friday morning
Stackable will be performing maintenance in its data center. If you are one of the customers expected to be affected, you have already been notified by email. We are beginning work at 12:01 AM MDT on Friday, July 8th, 2011, and don’t anticipate it taking longer than 10 minutes. During this time period, we will be shutting down the load balancer which directs traffic for some websites to...
Jul 7th
June 2011
4 posts
Scheduled service outage Sunday morning
On Sunday morning, Stackable will be performing maintenance in its data center which will affect 13 customers. If you are one of the customers expected to be affected, you have already been notified by email. We plan to begin this work on Sunday, Jul 3, at 9:00 AM MST and expect that all work will be concluded by 9:30 AM (30 minutes). Stackable does offer high-availability options to ensure that...
Jun 30th
Phones and Live Chat offline
As of 9:00 AM MST, we are unable to take support requests via phone or live chat. We are working to correct the issue.  In the interim, please direct all support requests to help@stackable.com and they will be handled promptly.  UPDATE: This issue was resolved at 9:41 AM MST.
Jun 22nd
Increased file system lag
Between 6:15 AM and 1:30 PM MST our back end storage file system experienced an unusually high amount of lag, causing some customers to experience slow page load times. Our systems administrators quickly identified the problem and worked as quickly as possible to resolve the issue. We are currently evaluating a variety of options to prevent this issue from occurring in the future. We are deeply sorry...
Jun 20th
Brief Network Outage
Today between 1:35 PM and 1:45 PM MST, the core routers which handle all XMission and Stackable traffic to and from the Internet experienced an overload which caused sporadic traffic outages during this time period.   XMission has redundant routers with connectivity to several Internet backbone providers. Currently only one of these is IPv6 capable. During an attempt to bring the second router...
Jun 7th
May 2011
1 post
VPS Outages
A small number of VPS containers in our SLC1 datacenter were offline on April 30 from 8:30 PM MST - 9:50 PM MST and again on May 1 from 10:44 AM MST - 11:24 AM MST. These issues have been identified and stem from problems we’ve had with our back end ZFS file system. A decision has been made to replace the Nexenta platform currently in use with a NetApp FAS2020 to serve as the storage...
May 3rd
April 2011
4 posts
VPS Outage
11 VPS (beta) containers in our SLC1 datacenter were offline from 9:04 PM MST to 10:45 PM MST, affecting a total of 8 customers. This outage was caused by the same issue which affected the same VPS containers on April 26, 2011. We are deeply sorry for any inconvenience this may have caused. Our engineers continue to investigate the root cause of this problem and are working to develop a solution.
Apr 26th
VPS outage
A small number of VPS containers in our SLC1 datacenter were offline from 11:03 AM MST to 11:25 AM MST.  Without warning, a storage server which provides disk access for VPS containers suddenly flipped one of its filesystems into read-only mode. Not only did this prevent containers from being able to write files, it also prevented containers from being started, stopped or restarted.  The hardware...
Apr 20th
Control Panel billing issues
For a short period of time over the weekend, customers were not able to access their billing page through the Stackable Control Panel. This issue was first brought to our attention on Sunday, April 17, at 1:02 PM MST and was resolved by 2:38 PM MST. We believe this issue was caused by an upgrade to our third-party billing system. We apologize for any inconvenience this may have caused.
Apr 18th
Control Panel issues
The Stackable Control Panel is not functioning correctly as of 9:00 AM MST, Monday April 11th. Users may experience high wait times and may be unable to upgrade/downgrade or purchase new containers during this period. We believe this issue to have been caused by an upgrade to our 3rd party billing software which was undertaken by our vendor a day earlier than was scheduled. We are working with...
Apr 11th
March 2011
1 post
Disc failure in slc1
At approximately 5:20 AM MST, a disc in a server in our SLC1 datacenter crashed, bringing a critical machine offline. A number of customer containers were affected and all provisioning capabilities to existing containers were brought offline. Systems engineers were immediately alerted to the problem and were on-site shortly after. The machine was brought fully back online by 7:11 AM MST and as of...
Mar 11th
November 2010
1 post
db2.slc1
Around 9:00 PM MST on November 11th, the shared database server db2.slc1 began to see rapid increases in disk utilization by the MySQL server. System administrators immediately began working on the problem and additional space was quickly provisioned.  Around 1:00 AM MST and again at around 4:40 AM, the MySQL server mysteriously began rejecting new connections. Though the problem has been...
Nov 12th
October 2010
1 post
Hardware failure in slc1
At 15:17:48, our automated monitoring system detected a problem with a server in our Salt Lake City, Utah data center which hosts a number of client containers. On-site administrators immediately responded by issuing a power-cycle command to the failed server. After a reboot, all containers were once again online by 15:31:38.  Engineers are investigating the cause of the issue. We know that even...
Oct 19th
September 2010
2 posts
lb1.slc1 upgrade
An upgrade was made to a load-balancer serving some sites in our Salt Lake City, Utah data center. In order to ensure security for our users and allow them to pass the most stringent security audits, the Stackable Engineering team determined that an upgrade was necessary to the load-balancer infrastructure.  As a part of the upgrade, all sites now sit behind the newest update to the 0.7 branch...
Sep 14th
db1.slc1 outage
A shared MySQL server in our Salt Lake City datacenter started returning ‘Too many open connections’ today around 12:24 PM MST.   The operations team began an immediate investigation and determined that the machine was out of disk space. Additional disk space was immediately provisioned and the service was restarted. MySQL services on db1.slc1 returned at 12:33 PM MST.  This machine is...
Sep 8th
August 2010
4 posts
Errors on new database creation
When creating new databases via the Control Panel or the API, an error is thrown: ‘Argument ‘databaseName’ failed validation: Database doesn’t exist’. This error is spurious and can be safely ignored. The database is created and will work properly; however, the newly created username will not appear in the list of users for the database.  Engineers are investigating...
Aug 31st
Postgres database creation
At present, the creation of new Postgres databases via the Control Panel is failing. Existing databases are not affected.  UPDATE: Database creation was re-enabled at 12:30 PM MST.
Aug 19th
Upgrade in progress
Stackable is currently upgrading our Control Panel to a new release version.  During the next hour or so, the panel at control.stackable.com will be inaccessible.  Please bear with us while we make these changes. Sites and their services should not be affected by these upgrades. UPDATE: This was completed at 11:51 AM MST. All services are back online. Check http://blog.stackable.com soon for a...
Aug 18th
MySQL outage on db1.slc1
This morning at approximately 6:18 AM MST the MySQL server on db1.slc1.stackable.com stopped allowing new connections. Our monitoring system detected this condition as being one in which the MySQL process wasn’t running and elected to perform a full restart on the MySQL daemon. It took several attempts before the process was fully restarted and MySQL was returned to service on the machine...
Aug 11th
July 2010
1 post
Control Panel logins
From approximately 12:00 AM MST to 9:35 AM MST, some logins to https://control.stackable.com were failing with users unable to progress beyond the “Loading Account Details…” message. This was an unscheduled outage. Monitoring for this service was errantly removed in a recent code commit. We’ve corrected the monitoring problem and continue to investigate the root cause of...
Jul 13th