Marketplace web services access issue
Incident Report for Marketplace® Simulations
Postmortem

The incident impact: Marketplace Simulations web services were intermittently down for some users, mostly located in the LATAM and APAC regions. Unavailability in the US and Europe regions was significantly lower.

Marketplace Simulations web services uptime by region:

APAC: 96.66% on August 30; 99.89% over the last 30 days.
LATAM: 97.28% on August 30; 99.91% over the last 30 days.
US&EU: 99.86% on August 30; >99.99% over the last 30 days.
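To make these percentages easier to interpret, they can be converted into approximate minutes of downtime. The sketch below is illustrative only. It assumes that "on August 30" means a full 24-hour window and "last 30 days" means exactly 30 x 24 hours; the monitoring tools' actual measurement windows may differ. The helper name downtime_minutes is hypothetical.

```python
# Convert an uptime percentage into implied downtime minutes over a window.
# Assumption: a "day" is a full 24-hour window and "last 30 days" is exactly
# 30 x 24 hours; the monitoring tools' real windows may differ.

MINUTES_PER_DAY = 24 * 60


def downtime_minutes(uptime_percent: float, window_days: float) -> float:
    """Return the downtime, in minutes, implied by uptime_percent over window_days."""
    window_minutes = window_days * MINUTES_PER_DAY
    return (100.0 - uptime_percent) / 100.0 * window_minutes


# Figures from this report. The US&EU 30-day value (">99.99%") is only a
# bound, so it is left out of the 30-day calculation.
daily = {"APAC": 96.66, "LATAM": 97.28, "US&EU": 99.86}
monthly = {"APAC": 99.89, "LATAM": 99.91}

for region, uptime in daily.items():
    print(f"{region} on August 30: ~{downtime_minutes(uptime, 1):.0f} min down")
for region, uptime in monthly.items():
    print(f"{region} last 30 days: ~{downtime_minutes(uptime, 30):.0f} min down")
```

Under these assumptions, August 30 works out to roughly 48 minutes of downtime for APAC, 39 for LATAM, and 2 for US&EU. The 30-day figures (roughly 48 minutes for APAC and 39 for LATAM) are of similar size, which is consistent with August 30 accounting for most of the 30-day downtime, although rounding in the published percentages limits how precisely this can be read.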

The actions taken to mitigate or resolve the incident: Our Marketplace web service monitoring alerted the sysadmins to the issue. Initial analysis showed that the issue was geographically widespread. However, we had confirmed that the Marketplace web services themselves were functioning properly, so we suspected a problem with one of the Internet backbones. We contacted the Marketplace web hosting provider (Flexential.com). The Flexential on-call admin stated that the Flexential network team was seeing a widespread issue in the CenturyLink/Level3 network. We were told that the Flexential network team was rerouting inbound traffic to Flexential so that it no longer flowed through the CenturyLink/Level3 network.

The incident's root cause: At this point, we have not yet received a root cause analysis from CenturyLink/Level3 via Flexential. In the meantime, we are sharing a blog post by Cloudflare (https://blog.cloudflare.com/analysis-of-todays-centurylink-level-3-outage/), which provides a good summary of the observed event.

Follow-up actions taken to prevent the incident from happening again: We will investigate whether we need to purchase additional services to minimize the impact of issues like those experienced in the CenturyLink/Level3 network.

Posted Sep 02, 2020 - 17:45 EDT

Resolved
This incident has been resolved.
Posted Aug 30, 2020 - 11:08 EDT
Investigating
The issues in the CenturyLink (Level 3) network resurfaced between 9:05 am and 10:10 am EDT, affecting Marketplace web services. Users in some regions experienced intermittent problems accessing Marketplace web services. Since 10:10 am EDT, none of our monitoring agents in Pingdom and ThousandEyes have shown any problems. At this time, the Marketplace web services are operational.
Posted Aug 30, 2020 - 11:07 EDT
This incident affected: Marketplace® Simulations Web Services.