Troubleshooting SQL and Code Challenges on Azure

Sometimes we are engaged to solve tough performance problems. We’ve all been in situations where the infrastructure team is blaming the developers, the developers are blaming the infrastructure, and the product or project manager is frustrated because nothing is getting solved.

Recently we were asked to provide a second set of eyes on a problem that had been causing delays on an important project. The SQL database was hosted on a Failover Cluster Instance (FCI) on Azure IaaS, the application was running in Azure App Services, and ultimately there were “random and sporadic” issues. Random and sporadic are every troubleshooter’s favorite words, of course, because we like repeatable and predictable… but that’s just not the way the world works!

We worked with their development team to implement Azure Application Insights to log the random and sporadic failures, so that we could find them quickly instead of hoping to reproduce the errors.

[Image: AppInsights.PNG]

Setting up Application Insights on a VM is as easy as:

[Image: AppInsights]

However, for setting it up against App Services, there is great documentation at docs.microsoft.com explaining how to use special tags in the code so Application Insights can monitor it.
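
To give a sense of what that looks like (a generic sketch, not the customer's code; the repository class and method names are made up for illustration), the Application Insights SDK exposes a TelemetryClient that the application can use to record traces and exceptions alongside the built-in request telemetry:

using System;
using Microsoft.ApplicationInsights;

public class OrderRepository
{
    // Entry point for custom Application Insights telemetry.
    // In ASP.NET Core this is typically injected via dependency injection.
    private readonly TelemetryClient _telemetry = new TelemetryClient();

    public void LoadOrders(string customerId)
    {
        try
        {
            _telemetry.TrackTrace($"Loading orders for customer {customerId}");
            // ... data access code would go here ...
        }
        catch (Exception ex)
        {
            // Record the real exception so it lands under the same operation_Id
            // as the request, instead of surfacing later as a generic connection error.
            _telemetry.TrackException(ex);
            throw;
        }
    }
}

With that in place, failures show up in the exceptions table and can be pulled up by operation_Id, exactly like the queries shown below.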

After some due diligence we configured memory threshold limitations on the SQL cluster (a sketch of the typical setting follows below) to eliminate several memory pressure events that were being triggered. We believed this would address a few of the memory management problems, but on continued investigation we identified additional memory issues that appeared to be the result of unhandled exceptions in the code.
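
On the memory side, the most common threshold of this kind is SQL Server's max server memory setting. A minimal sketch of applying it (the 24 GB value and connection string are placeholders for illustration; in practice the statements are usually run straight from SSMS rather than from code):

using System.Data.SqlClient;

public static class SqlMemoryConfig
{
    public static void CapMaxServerMemory(string connectionString)
    {
        // 'max server memory (MB)' caps SQL Server's memory consumption so the OS
        // and cluster services keep headroom; 24576 MB (24 GB) is an assumed value.
        const string sql = @"
            EXEC sys.sp_configure N'show advanced options', 1;
            RECONFIGURE;
            EXEC sys.sp_configure N'max server memory (MB)', 24576;
            RECONFIGURE;";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}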

Although the failed methods reported by the developers surfaced as connection errors, the underlying exceptions pointed to various logic issues (which manifested as closed connections in some cases). Three obfuscated examples are shown below.



Sample 1: Referenced database object does not exist; causes a 1.3-second delay

CUSTOMER-USE-APP-API Log Query:

// All telemetry for Operation ID: xxx-yyy1
union *
// Apply filters
| where timestamp > datetime("2019-03-05T19:38:26.111Z") and timestamp < datetime("2019-03-07T19:38:26.111Z")
| where operation_Id == "xxx-yyy1"

Failed Method

System.Data.SqlClient.SqlConnection.OnError

Exception
System.Data.SqlClient.SqlException (0x80131904): Invalid object name 'XXX.YYYY'.
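
One way to make this class of failure obvious in the telemetry (a sketch only; the class and query here are hypothetical, not the customer's code) is to catch the SqlException near the data access layer and log its error number before rethrowing, so that error 208, Invalid object name, shows up directly instead of a generic closed-connection symptom upstream:

using System.Collections.Generic;
using System.Data.SqlClient;
using Microsoft.ApplicationInsights;

public class ReportQueries
{
    private readonly TelemetryClient _telemetry = new TelemetryClient();

    public int CountRows(string connectionString)
    {
        try
        {
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand("SELECT COUNT(*) FROM XXX.YYYY;", connection))
            {
                connection.Open();
                return (int)command.ExecuteScalar();
            }
        }
        catch (SqlException ex)
        {
            // ex.Number 208 = invalid object name; logging it pinpoints the missing
            // object instead of leaving only a closed-connection symptom upstream.
            _telemetry.TrackException(ex, new Dictionary<string, string>
            {
                ["SqlErrorNumber"] = ex.Number.ToString()
            });
            throw;
        }
    }
}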

  

Sample 2: Unhandled deadlocked-resource exception; causes a 3.4-minute timeout

CUSTOMER-USE-APP-API Log Query:

// All telemetry for Operation ID: xxx-yyy2
union *
// Apply filters
| where timestamp > datetime("2019-03-03T23:05:13.481Z") and timestamp < datetime("2019-03-05T23:05:13.481Z")
| where operation_Id == "xxx-yyy2"

Failed Method

System.Data.SqlClient.SqlConnection.OnError

Exception

System.Data.SqlClient.SqlException (0x80131904): The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION.
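
This particular message usually means the transaction was already rolled back on the server (being chosen as a deadlock victim, error 1205, does exactly that) and the code then issued its own ROLLBACK anyway. A minimal sketch of more defensive handling, with a hypothetical SaveOrder method and table, looks like this:

using System.Data.SqlClient;

public static class OrderWriter
{
    public static void SaveOrder(string connectionString, int orderId)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (SqlTransaction transaction = connection.BeginTransaction())
            {
                try
                {
                    using (var command = new SqlCommand(
                        "UPDATE dbo.Orders SET Processed = 1 WHERE OrderId = @OrderId;",
                        connection, transaction))
                    {
                        command.Parameters.AddWithValue("@OrderId", orderId);
                        command.ExecuteNonQuery();
                    }
                    transaction.Commit();
                }
                catch (SqlException ex)
                {
                    // Error 1205 = deadlock victim: SQL Server has already rolled the
                    // transaction back, so only roll back here when it is not a deadlock
                    // and the transaction is still attached to a connection; otherwise a
                    // blind ROLLBACK raises "has no corresponding BEGIN TRANSACTION".
                    if (ex.Number != 1205 && transaction.Connection != null)
                    {
                        transaction.Rollback();
                    }
                    throw; // surface the real cause to the caller
                }
            }
        }
    }
}

For genuine deadlocks, the caller can then retry the operation with a short backoff rather than tying up the request for minutes.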

 

Sample 3: Circular reference during JSON serialization; causes a 1.3-second delay

CUSTOMER-USE-APP-API Log Query:

// All telemetry for Operation ID: xxx-yyy3
union *
// Apply filters
| where timestamp > datetime("2019-03-05T19:38:27.181Z") and timestamp < datetime("2019-03-07T19:38:27.181Z")
| where operation_Id == "xxx-yyy3"

Failed Method

Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.CheckForCircularReference
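
Json.NET raises this when an object graph refers back to itself, typically ORM entities with parent/child navigation properties. A small sketch of the usual mitigation (the Parent and Child types are made up for illustration) is to tell the serializer how to treat reference loops instead of letting CheckForCircularReference fail:

using System.Collections.Generic;
using Newtonsoft.Json;

public class Parent
{
    public string Name { get; set; }
    public List<Child> Children { get; set; } = new List<Child>();
}

public class Child
{
    public string Name { get; set; }
    public Parent Parent { get; set; } // back-reference that creates the loop
}

public static class SerializationExample
{
    public static string Serialize(Parent parent)
    {
        // Skip reference loops rather than throwing during serialization.
        var settings = new JsonSerializerSettings
        {
            ReferenceLoopHandling = ReferenceLoopHandling.Ignore
        };
        return JsonConvert.SerializeObject(parent, settings);
    }
}

Decorating the back-reference with [JsonIgnore], or returning flat DTOs without navigation loops, accomplishes the same thing.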

 

Ultimately, a neutral look by Infused Innovations as an outside party, paired with Azure Application Insights and our SQL expertise, helped move the project forward and put the focus back on the real areas of opportunity: the deployment of the SQL cluster, the structure and error trapping of the code, and the frameworks selected.