IQI Report - App Insights as the Source of Truth
Several similar functions exist within Optimizely, and they all need to use the same data as a single source of truth. They are:
Incident Management, where the primary data is App Insights.
Deep diagnostics done by Problem Management, where the primary data is App Insights.
The App Insights Dashboard.
IQI Health Check reporting, where the primary source of health information is CloudFlare (aside from Pingdom's availability information, which is used for legal reasons).
This inconsistency in source data causes problems: we cannot present the information consistently when the underlying sources are entirely different.
It also makes deviation possible, because CloudFlare adds network hops and can transform HTTP status codes, among other things. The information can therefore be altered in ways that do not help us.
If we want to add CloudFlare reporting, we should add targeted, CloudFlare-specific reports to the IQI report. These should identify discrepancies within the CloudFlare environment, typically computed as differentials between App Insights and CloudFlare, such as delays in CloudFlare responses that are not caused by backend systems.
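As a rough illustration of such a differential report, the sketch below compares each request as seen by CloudFlare and by App Insights. It assumes the two sources can be joined on a shared request identifier. Every field name, the threshold, and the sample values are hypothetical and chosen only for the example; they are not taken from either product's schema.

```python
from dataclasses import dataclass


@dataclass
class RequestPair:
    """One request as seen by both sources (all field names are hypothetical)."""
    request_id: str
    edge_ms: float      # total time reported at the CloudFlare edge
    origin_ms: float    # server-side duration reported by App Insights
    edge_status: int    # status code the client received from CloudFlare
    origin_status: int  # status code the application returned


def find_discrepancies(pairs, overhead_threshold_ms=500.0):
    """Return (slow_at_edge, status_changed) lists of request IDs."""
    slow_at_edge = []
    status_changed = []
    for p in pairs:
        # Time spent outside the backend: edge total minus origin duration.
        if p.edge_ms - p.origin_ms > overhead_threshold_ms:
            slow_at_edge.append(p.request_id)
        # The edge returned a different code than the application produced.
        if p.edge_status != p.origin_status:
            status_changed.append(p.request_id)
    return slow_at_edge, status_changed


# Illustrative data only, not real measurements.
sample = [
    RequestPair("a", 120.0, 100.0, 200, 200),
    RequestPair("b", 2100.0, 150.0, 200, 200),
    RequestPair("c", 300.0, 280.0, 502, 500),
]
slow, changed = find_discrepancies(sample)
```

With this sample, request "b" is flagged as slow at the edge while the backend was fast, and request "c" is flagged because the status code changed in transit. The main IQI figures would still come from App Insights; a report like this would only highlight where CloudFlare diverges from it.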
-
Currently, the IQI report offers information such as a breakdown of HTTP status codes. Unfortunately, that does not give the customer or partner anything they can act on directly, because it requires further investigation on their part to make it actionable.
We want to bridge the gap between the data and the ability to take action, and do that work on our side. The information we offer in the Dashboard is much closer to directly actionable, and in many cases it already is.
For example, on the Dashboard we offer an EpiCms database deadlock report. The customer or partner still needs to identify which queries are causing the deadlocks, but we could surface that information as well with some additional work.
Even so, this information is actionable. The same information surfaced on the IQI is simply a 500 error, which could be an aggregation of 10 different underlying issues and which the developer has no idea what to do with.
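To make the contrast concrete, here is a minimal sketch of how a single 500 count could be split by underlying cause. It assumes each failed request record carries some classification of the exception behind it. The record layout, the `exception_type` key, and the sample values are hypothetical and used only for illustration.

```python
from collections import Counter


def breakdown_500s(records):
    """Group HTTP 500 responses by their underlying cause, most frequent first."""
    counts = Counter(
        r["exception_type"] for r in records if r["status"] == 500
    )
    return counts.most_common()


# Illustrative records only; keys and values are assumptions for this example.
records = [
    {"status": 500, "exception_type": "DatabaseDeadlock"},
    {"status": 500, "exception_type": "DatabaseDeadlock"},
    {"status": 500, "exception_type": "DatabaseDeadlock"},
    {"status": 500, "exception_type": "Timeout"},
    {"status": 500, "exception_type": "Timeout"},
    {"status": 500, "exception_type": "NullReference"},
    {"status": 200, "exception_type": None},
]
causes = breakdown_500s(records)
```

A status-code breakdown would report these records as six 500 errors. The grouped view shows that half of them are deadlocks, which points the developer at something they can investigate.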