In a widely reported incident in May 2024, Australian pension fund UniSuper suffered a major outage caused by a misconfiguration made by Google operators. The error led to the deletion of UniSuper’s Private Cloud subscription on Google Cloud, leaving services unavailable to the fund’s 647,000 members for about two weeks.

Google traced the root cause to its own operators, who inadvertently left a parameter blank while deploying a Google Cloud VMware Engine (GCVE) Private Cloud for UniSuper. Because of the blank parameter, the Private Cloud defaulted to a fixed term, with automatic deletion at the end of that term. UniSuper had duplicated its environment across two geographies to protect against outages and data loss, but the deletion removed it in both, so this redundancy offered no protection and the downtime was greatly extended.

However, Google clarified that backups stored in Google Cloud Storage remained unaffected and were crucial for data restoration.
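
The failure mode described here, a blank parameter that silently fell back to a fixed term ending in automatic deletion, is a classic unsafe default. The following Python sketch is illustrative only: Google has not published its provisioning tooling, so the parameter name `term_days`, the 365-day default, and both functions are hypothetical. It contrasts a resolver that quietly applies a default with one that fails closed and demands an explicit choice.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional

# Hypothetical default term; not taken from any real provisioning tool.
DEFAULT_TERM_DAYS = 365


def resolve_expiry_unsafe(params: Mapping[str, str], now: datetime) -> datetime:
    """Unsafe: a missing or blank term silently becomes a fixed term."""
    raw = params.get("term_days") or DEFAULT_TERM_DAYS
    # Every deployment ends up with a deletion date, whether intended or not.
    return now + timedelta(days=int(raw))


def resolve_expiry(params: Mapping[str, str], now: datetime) -> Optional[datetime]:
    """Fail closed: require an explicit number of days or 'open-ended'.

    Returns the scheduled deletion time, or None when nothing should be
    scheduled for deletion.
    """
    raw = (params.get("term_days") or "").strip().lower()
    if not raw:
        raise ValueError("term_days is missing or blank; refusing to apply a default")
    if raw == "open-ended":
        return None
    days = int(raw)  # raises ValueError for non-numeric input
    if days <= 0:
        raise ValueError("term_days must be a positive integer or 'open-ended'")
    return now + timedelta(days=days)


now = datetime(2024, 5, 1, tzinfo=timezone.utc)
print(resolve_expiry_unsafe({"term_days": ""}, now))  # 2025-05-01 00:00:00+00:00
try:
    resolve_expiry({"term_days": ""}, now)
except ValueError as err:
    print(err)  # term_days is missing or blank; refusing to apply a default
```

Failing closed costs the operator one extra field; the unsafe version turns an empty input into a deletion date that nobody chose.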

The incident provides valuable lessons for companies of all sizes. Here are some key takeaways:

  1. Backups and Redundancies Are Vital:
    • Keep at least one backup offsite or with a separate, third-party provider.
    • UniSuper had duplicated its environment across two geographies, but the misconfiguration led to deletion in both. Its backups with another service provider helped minimize data loss.
    • Lesson: Diversify backup solutions and ensure geographical redundancy to mitigate risks.
  2. Retention Periods for Data:
    • Customer data should remain recoverable for a reasonable grace period after a subscription ends, and should not be deleted immediately unless the customer explicitly requests it.
    • Lesson: Review data retention and deletion policies, including those of your providers, to prevent accidental data loss.
  3. Disaster Recovery Planning:
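    • UniSuper’s outage underscores the need for robust Disaster Recovery (DR) planning. Organizations should plan for scenarios that are easy to dismiss, such as the loss of an entire cloud environment through the provider’s own error.
    • Lesson: Invest in comprehensive DR strategies, and test them regularly, to minimize downtime and data loss.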
  4. Avoid Relying Solely on a Single Cloud Provider:
    • Even reputable cloud providers like Google Cloud can experience misconfigurations or incidents.
    • Lesson: Distribute critical services across multiple providers to reduce dependency and risk.
  5. Transparency and Incident Response:
    • Google’s openness about the incident, including joint statements with UniSuper and a published post-incident review, is commendable. Transparency helps others learn from mistakes.
    • Lesson: Be transparent about incidents, conduct internal reviews, and take preventive measures.
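
The first two lessons can be turned into automated checks. The following Python sketch assumes a hypothetical inventory of backup copies and subscriptions; the class names, provider and region labels, the 24-hour freshness window, and the 30-day grace period are illustrative choices, not a standard or any provider's actual policy. `audit_backups` flags recent backups that lack provider or geographic diversity, and `may_purge` allows deletion of an ended subscription only after a grace period or on explicit customer request.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class BackupCopy:
    provider: str       # hypothetical label, e.g. "primary-cloud"
    region: str         # hypothetical region label
    taken_at: datetime


@dataclass(frozen=True)
class Subscription:
    ended_at: Optional[datetime]            # None while the subscription is active
    customer_requested_deletion: bool = False


def audit_backups(copies: Sequence[BackupCopy], now: datetime,
                  max_age: timedelta = timedelta(hours=24)) -> List[str]:
    """Lesson 1: recent backups should span providers and geographies.

    Returns a list of problems; an empty list means the check passed.
    """
    fresh = [c for c in copies if now - c.taken_at <= max_age]
    if not fresh:
        return ["no backup newer than max_age"]
    problems = []
    if len({c.provider for c in fresh}) < 2:
        problems.append("all recent backups are with a single provider")
    if len({c.region for c in fresh}) < 2:
        problems.append("all recent backups are in a single region")
    return problems


def may_purge(sub: Subscription, now: datetime,
              grace: timedelta = timedelta(days=30)) -> bool:
    """Lesson 2: never delete customer data immediately after a subscription ends."""
    if sub.ended_at is None:
        return False                        # active subscriptions are never purged
    if sub.customer_requested_deletion:
        return True                         # explicit customer instruction
    return now - sub.ended_at >= grace      # otherwise wait out the grace period


now = datetime(2024, 5, 1, tzinfo=timezone.utc)
copies = [
    BackupCopy("primary-cloud", "region-a", now - timedelta(hours=2)),
    BackupCopy("primary-cloud", "region-b", now - timedelta(hours=3)),
]
print(audit_backups(copies, now))  # ['all recent backups are with a single provider']
```

In the example, two copies in different regions of the same provider pass the geographic check but fail the provider check, which mirrors how duplication across two geographies did not survive a single provider-side deletion.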

