Got a reproducible bug during a performance test

Start the test with an estimate of 50-100 RPS per WFE and adjust from there. Things like network, AD, etc. will also be a factor.
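As a rough illustration of that starting estimate, here is a minimal sketch that turns the 50-100 RPS per WFE rule of thumb into a farm-wide starting range. The function name and the WFE count of 4 are made up for the example; the real target still has to be adjusted from measurements.

```python
def initial_rps_range(wfe_count, low_per_wfe=50, high_per_wfe=100):
    """Return a (low, high) starting RPS range for a load test.

    Uses the 50-100 RPS per WFE rule of thumb as defaults. This is only
    a starting point; network, AD, etc. will move the real numbers.
    """
    if wfe_count < 1:
        raise ValueError("wfe_count must be at least 1")
    return wfe_count * low_per_wfe, wfe_count * high_per_wfe


# Hypothetical farm with 4 WFEs (assumed value, not from the thread).
low, high = initial_rps_range(4)
print(f"Start the load test between {low} and {high} RPS, then adjust.")
```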

Always the same pages causing the issue (100% CPU on 8 cores just by pinging one page using only 40 users). But there aren’t really any crashes or leaks. None of the worker processes is recycled either. I ran PerfMon and PAL. What popped out was that the ASP.NET Request Execution Time was VERY high. Also, they have a lot of Deferred Procedure Calls when we load test. They use NIC teaming…. Well, and they love XSLT. It’s everywhere on those pages…

First step: collect some memory dumps using DebugDiag. You may want to set up two rules: one for crash dumps and another for performance.

If you are seeing lots of DPCs, you may have issues with some drivers, and the SharePoint problem is just a symptom of that. You may want to try the Tracelog utility to find out where the DPC events are coming from:

Example 15: Measuring DPC/ISR Time

You can measure the amount of time that a driver spends in deferred procedure calls (DPCs) and interrupt service routines (ISRs) by tracing these events in the Windows kernel. This information will help you to minimize the time the driver spends at higher IRQLs, making the driver and the system more efficient.

Microsoft recommends that DPCs should not run longer than 100 microseconds and ISRs should not run longer than 25 microseconds. For the most recent version of the Windows Logo Program System and Device Requirements, see the Windows Vista Logo Program web page.
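Once you have per-routine timings from a trace, checking them against those two limits is simple. The sketch below is only an illustration: the `(kind, driver, duration_us)` tuple layout, the function name, and the driver names are all made up here and are not the format Tracelog produces, so you would need to map your own trace output into it first.

```python
# Limits quoted above: DPCs <= 100 microseconds, ISRs <= 25 microseconds.
DPC_LIMIT_US = 100.0
ISR_LIMIT_US = 25.0


def find_slow_routines(samples):
    """Return the samples whose duration exceeds the limit for their kind.

    samples: iterable of (kind, driver, duration_us) tuples, where kind is
    "DPC" or "ISR". This layout is an assumption for the example only.
    """
    limits = {"DPC": DPC_LIMIT_US, "ISR": ISR_LIMIT_US}
    return [
        (kind, driver, duration_us)
        for kind, driver, duration_us in samples
        if duration_us > limits[kind]
    ]


# Made-up sample data with hypothetical driver names.
samples = [
    ("DPC", "example_nic.sys", 340.0),
    ("ISR", "example_nic.sys", 12.0),
    ("DPC", "example_disk.sys", 80.0),
    ("ISR", "example_other.sys", 31.5),
]

for kind, driver, duration_us in find_slow_routines(samples):
    print(f"{kind} in {driver} ran {duration_us} us, over the recommended limit")
```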

The procedure that is described in this section includes the following steps:

One guy had perf issues for my customer in the past, and after a lot of digging we found out that the teamed NICs in the DB servers were the problem. We broke the teaming, teamed the NICs back together, and that resolved the issue. They later updated their NIC drivers.