Application Service Disruption
2019-04-30 and 2019-05-02: Root Cause Analysis
Written by Matt Slosky

May 2, 2019

Summary of Impact
Between 14:37 and 16:15 CT on May 2, 2019, users experienced intermittent connectivity issues for the Avianis application and API services due to Microsoft network infrastructure issues.

Timeline (All times 24-hour Central US)

  • 05/02 14:37 - Users first reported issues running reports to Avianis. Avianis engineers investigated and escalated the issue to Microsoft.

  • 05/02 15:40 - The Microsoft status site was updated to indicate that Microsoft was investigating intermittent connectivity issues with its network infrastructure across all regions. The connectivity issues affected multiple Microsoft cloud services, including Office 365.

  • 05/02 16:15 - Microsoft restored network connectivity and all application functions resumed. Avianis engineers confirmed the issues had been resolved.

Preliminary Root Cause
Microsoft was experiencing intermittent network infrastructure issues across all of its data center regions. Engineers identified the underlying root cause as a name server delegation change that affected DNS resolution, resulting in downstream impact to various services.
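A broken name server delegation typically surfaces to applications as failed DNS lookups rather than as network timeouts. As an illustration only (not part of the Avianis codebase), a health check along these lines could be used to tell DNS-resolution failures apart from other connectivity problems; the function name and categories are hypothetical:

```python
import socket

def classify_connectivity_error(host, port=443, resolver=socket.getaddrinfo):
    """Attempt DNS resolution for `host` and return a coarse diagnosis.

    A socket.gaierror from getaddrinfo means the name could not be
    resolved, which is how a bad name server delegation would appear
    from the application's point of view.
    """
    try:
        resolver(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"       # name resolution failed
    except OSError:
        return "network-failure"   # some other network-level error
    return "dns-ok"                # name resolved successfully
```

During an incident like this one, a probe of this sort would report "dns-failure" for affected service hostnames while direct-IP connectivity might still succeed, pointing investigation toward DNS rather than routing.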


April 30, 2019

Summary of Impact
Between 12:00 and 15:00 CT on April 30, 2019, users experienced intermittent performance issues and potential timeouts within certain areas of the Avianis application due to abnormally high infrastructure loads.

Timeline (All times 24-hour Central US)

  • 04/30 12:00 - Avianis determined that multiple users were intermittently experiencing slower load times in the application. Auto-scale operations began to accommodate the elevated server load.

  • 04/30 13:05 - The Avianis Platform Engineering team escalated the issue to Microsoft for further diagnosis and mitigation.

  • 04/30 15:00 - The performance issues were mitigated and all customers were notified.

Preliminary Root Cause
The Avianis data servers experienced unusual loads that caused performance degradation across certain areas of the application that have more intensive data requirements. This caused intermittent timeouts for users in these areas.


The servers automatically scaled to accommodate the extra load while the Avianis Development team, working with Microsoft engineers, continued to investigate the cause of the higher traffic. All application functions were restored to normal performance levels.
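While auto-scaling absorbs sustained load on the server side, intermittent timeouts like those described above are commonly smoothed over on the client side with retries and exponential backoff. A minimal sketch of that pattern (the function, attempt counts, and delays are illustrative, not part of the Avianis API):

```python
import random
import time

def call_with_backoff(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run `operation`, retrying on TimeoutError with exponential backoff.

    The delay grows as base_delay * 2**attempt, with random jitter so
    that many clients retrying at once do not hammer an already-loaded
    server in lockstep. The final failure is re-raised to the caller.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

The jitter term is the important design choice here: without it, synchronized retries can themselves prolong the elevated load that caused the timeouts.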

Preventative Steps Taken

  • The Avianis data server resource pool has been increased to minimize impact on users if unusually high data loads are experienced in the future.

  • The Avianis Engineering team is performing analysis to determine whether there are refactoring opportunities in the more data-intensive areas of the application.

Ref: A-I01
