No amount of application performance assurance activity can guarantee 100% production availability or perfect performance, because some changes in production cannot be anticipated. For this reason, operations teams use Production Application Monitoring initiatives to keep an eye on applications on a 24 x 7 basis. When performance degrades for a critical application, however, the loss of revenue, reputational damage or other negative consequences can be detrimental to the business, so the problem needs to be resolved urgently. Without a clear process and suitable technology, most teams struggle for days or longer to identify and rectify the root cause of a problem, and teams from every technology stack usually get involved even though the problem ultimately lies in just one. A top-down approach is thus required: one that first identifies the fault domain, then produces high-level information proving the problem, and finally provides deep-dive diagnostic information to identify the specific code that is affected.

Application Performance Troubleshooting

Application Performance Troubleshooting disciplines that IT Ecology assists with include:

Fault Domain Isolation

Fault Domain Isolation is performed to isolate the problem domain (Client, Network or Server) when an issue arises. With this information available quickly, the relevant team can focus on pinpointing the problem in its own domain whilst other teams are freed up to continue with other operational tasks.
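
As a rough illustration of the idea, the sketch below compares a per-transaction timing breakdown against a known-good baseline and reports the domain whose response time grew the most. The `TransactionTiming` structure, the `isolate_fault_domain` function and all the figures are hypothetical; a real monitoring tool would collect such measurements itself and would apply more careful statistics than a single difference.

```python
from dataclasses import dataclass


@dataclass
class TransactionTiming:
    """Hypothetical per-transaction timing breakdown, in milliseconds."""
    client_ms: float   # time spent processing on the client
    network_ms: float  # time spent in transit between client and server
    server_ms: float   # time spent processing on the server side


def isolate_fault_domain(baseline: TransactionTiming,
                         current: TransactionTiming) -> str:
    """Return the domain whose time increased the most over the baseline."""
    increases = {
        "Client": current.client_ms - baseline.client_ms,
        "Network": current.network_ms - baseline.network_ms,
        "Server": current.server_ms - baseline.server_ms,
    }
    return max(increases, key=increases.get)


# Illustrative figures only: server time has grown far more than the rest.
baseline = TransactionTiming(client_ms=120, network_ms=80, server_ms=300)
degraded = TransactionTiming(client_ms=125, network_ms=95, server_ms=1400)
print(isolate_fault_domain(baseline, degraded))  # prints "Server"
```

With a result like this, the server team can start investigating while the client and network teams carry on with their normal work.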

Root-cause Analysis

Root-cause Analysis is performed once the fault domain of a problem has been isolated. If, for example, network congestion is the problem, a traffic analysis exercise can show the reason for the congestion. If the server farm is the cause, analysis of infrastructure and application statistics may give a clear indication of which server and which underlying resource is causing the slowdown.
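
For the server farm case, a minimal sketch of that kind of statistics analysis might simply look for the most heavily utilised resource across all servers. The server names, resource names and utilisation figures here are invented for illustration, and `find_constrained_resource` is a hypothetical helper, not part of any particular monitoring product.

```python
def find_constrained_resource(
    stats: dict[str, dict[str, float]],
) -> tuple[str, str, float]:
    """Return (server, resource, utilisation) for the highest utilisation."""
    return max(
        (
            (server, resource, value)
            for server, resources in stats.items()
            for resource, value in resources.items()
        ),
        key=lambda item: item[2],
    )


# Hypothetical utilisation figures, expressed as fractions of capacity.
farm_stats = {
    "app-01": {"cpu": 0.42, "memory": 0.55, "disk_io": 0.30},
    "app-02": {"cpu": 0.47, "memory": 0.58, "disk_io": 0.28},
    "db-01": {"cpu": 0.61, "memory": 0.70, "disk_io": 0.97},
}

server, resource, utilisation = find_constrained_resource(farm_stats)
print(f"{server}: {resource} at {utilisation:.0%}")
```

In practice, high utilisation alone does not prove cause, so a result like this is best treated as a pointer to where closer inspection should begin.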

Deep-Dive diagnostics

In today’s complex and highly virtualised environments, simple resource monitoring is not sufficient: it cannot drill down into the application code (e.g. a method or SQL call) to show which code is causing a significant resource constraint, or which code is most affected in terms of performance by resource issues. Deep-Dive diagnostics enable such deep inspection so that the correct resolution can be identified and applied quickly.
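
To give a feel for method-level inspection, the sketch below uses Python's standard `cProfile` and `pstats` modules to show which function in a request accounts for the most time. The functions `slow_query`, `render_page` and `handle_request` are hypothetical stand-ins. Commercial deep-dive diagnostics tools typically instrument running production applications in other ways, so this is only a small-scale analogy.

```python
import cProfile
import io
import pstats
import time


def slow_query():
    """Hypothetical stand-in for a slow SQL call."""
    time.sleep(0.2)


def render_page():
    """Hypothetical stand-in for cheap in-memory work."""
    return sum(range(1000))


def handle_request():
    """Hypothetical request handler that calls both functions."""
    slow_query()
    render_page()


# Profile a single request and collect per-function timings.
profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

# Report the five entries with the highest cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

In the resulting report, `slow_query` should appear near the top by cumulative time, which is the kind of pointer to specific code that resource-level monitoring cannot provide.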