The incident impact: Marketplace Simulations web services were intermittently unavailable for some users, mostly in the LATAM and APAC regions. Unavailability in the US and Europe regions was significantly lower.
Marketplace Simulations web services uptime in the APAC region on August 30: 96.66%.
Marketplace Simulations web services uptime in the APAC region over the last 30 days: 99.89%.
Marketplace Simulations web services uptime in the LATAM region on August 30: 97.28%.
Marketplace Simulations web services uptime in the LATAM region over the last 30 days: 99.91%.
Marketplace Simulations web services uptime in the US&EU region on August 30: 99.86%.
Marketplace Simulations web services uptime in the US&EU region over the last 30 days: >99.99%.
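To put these percentages in terms of time, the short sketch below converts an uptime percentage into minutes of downtime. It assumes uptime is measured as a share of wall-clock time over a 24-hour day (1,440 minutes) and a 30-day window (43,200 minutes); the helper name `downtime_minutes` is hypothetical and is not part of any monitoring tool we use.

```python
# Hypothetical helper: converts an uptime percentage into minutes of downtime,
# assuming uptime is measured as a share of wall-clock time in the window.
DAY_MINUTES = 24 * 60               # minutes in one day (August 30)
MONTH_MINUTES = 30 * DAY_MINUTES    # minutes in the 30-day window

def downtime_minutes(uptime_percent: float, window_minutes: int) -> float:
    """Return the minutes of downtime implied by uptime_percent over window_minutes."""
    return window_minutes * (100.0 - uptime_percent) / 100.0

# Figures from this report: (region, August 30 uptime %, 30-day uptime %).
figures = [
    ("APAC", 96.66, 99.89),
    ("LATAM", 97.28, 99.91),
]

for region, day_uptime, month_uptime in figures:
    day = downtime_minutes(day_uptime, DAY_MINUTES)
    month = downtime_minutes(month_uptime, MONTH_MINUTES)
    print(f"{region}: ~{day:.0f} min on August 30, ~{month:.0f} min over 30 days")
```

Under these assumptions, August 30 accounts for roughly 48 minutes of downtime in APAC and 39 minutes in LATAM, close to each region's 30-day total. This is consistent with most of the month's downtime falling on that day, provided the 30-day window includes August 30. For US&EU, 99.86% corresponds to about 2 minutes on August 30, and >99.99% over 30 days to less than about 4.3 minutes.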
The actions taken to mitigate or resolve the incident: Marketplace web service monitoring alerted our sysadmins to the issue. Initial analysis showed that the issue was geographically widespread. However, the Marketplace web services themselves were functioning properly, so we suspected an issue with one of the Internet backbones. We contacted the Marketplace web hosting provider (Flexential.com). The Flexential on-call admin stated that the Flexential network team was seeing a widespread issue in the CenturyLink/Level3 network. We were told that the Flexential network team was rerouting inbound traffic to Flexential so that it no longer flowed through the CenturyLink/Level3 network.
The incident's root cause: We have not yet received a Root Cause Analysis (RCA) from CenturyLink/Level3 via Flexential. In the meantime, we are sharing a blog post by Cloudflare (https://blog.cloudflare.com/analysis-of-todays-centurylink-level-3-outage/) that provides a good summary of the observed event.
Follow-up actions taken to prevent the incident from happening again: We will investigate whether we need to purchase additional services to help minimize the impact of issues like the one experienced in the CenturyLink/Level3 network.