Incident Details

This page contains all current information on the status of this incident.



Start

2020-05-13 08:30:13 AEST


End

2020-05-13 22:55:49 AEST


Severity

Major Outage


Current Status

Resolved


Services Impacted

- ServerMule Console
- VPS


Description

Current network issues are affecting some clients' services. A resolution is currently being worked on.

Update: 11:01
Work is continuing on resolving the ongoing issues. Please refrain from restarting any affected VPS services at this time.



Timeline

2020-05-14 13:31:46 AEST

The incident that took place on the 13th of May, 2020 was most likely due to a stuck process while the affected storage pool was undergoing automatic self-recovery. The incident affected some backend management systems as well as some customers hosted on the KVM platform.

Initially, the automatic self-recovery process gradually caused major slowness on certain systems hosted on that storage pool. As a result, affected VMs may not have been able to respond in a timely fashion, although they were still responding.

While we were in the process of recovering the backend management systems and identifying the cause, some scheduled backend tasks as well as customer-initiated/enabled actions (e.g. auto backup, reboot) were also started. These further worsened the situation and caused more delay to the recovery and investigation process.

Once our technicians were able to identify and mitigate the stuck process on the storage pool, the VMs started to return to normal, even though some VMs may still need a reboot to fully return to a normal state.

Going forward, we will be performing backend platform upgrades to mitigate this issue as much as possible. You will receive further updates on this in the future.

We would like to again apologise for any inconvenience caused. We appreciate your patience and understanding.

2020-05-13 22:55:49 AEST

The backend storage should be stable now. Please let us know if your VM is still not responding as we have no direct way of fully verifying if a VM is responding in relation to storage.

We would like to apologise for any inconvenience caused.

2020-05-13 20:37:38 AEST

Many customer VMs have recovered. We are still going through the list of affected VMs.

2020-05-13 19:20:41 AEST

We are seeing some gradual improvement for some customers, but due to the large amount of data and live system load (e.g. customer-generated backups), the mitigation may still take some time to take full effect.

2020-05-13 18:19:53 AEST

The outage is still being worked on.

2020-05-13 17:25:16 AEST

Unfortunately, we have not yet been able to identify the exact root cause of the storage lockup/slowness, as the customers who have been affected do not appear to fit any specific pattern.

2020-05-13 16:03:58 AEST

We are still trying to repair the storage, but the load is so high that any major action may cause it to lock up again. In the meantime, we are also trying out other recovery plans.

2020-05-13 14:28:06 AEST

We are attempting multiple possible workarounds to mitigate the situation, because different types of customers have been affected.

2020-05-13 13:42:44 AEST

The automatic recovery process on the affected storage is taking much longer than expected. We are working on preventing further load on that storage from any unnecessary sources.

2020-05-13 11:39:03 AEST

Please note that slowness due to high system load is still preventing the VPSes from responding in a timely fashion. We are still keeping an eye on the automatic recovery progress on the backend storage.

2020-05-13 11:12:48 AEST

Sincere apologies for the delay. Our technicians are still investigating the possible cause of this incident. Upon investigation, it seems there is a specific set of backend storage that is performing automatic data consistency maintenance, causing customers on certain KVM platforms to experience system slowness on their VPSes. For customers who have experienced this problem, we recommend that you avoid logging in to the VPS to make any major changes, and avoid rebooting or shutting down the server, as this may place more load on the potentially affected systems.

We will update this status once we have further information.

We would like to apologise for any inconvenience caused.