Hi,
I'll try to explain a problem we are experiencing with one of our applications.
Our application is a multi-AppDomain Windows Service built on .NET Framework 4. It uses only managed code libraries. The database provider for our application is Npgsql (PostgreSQL). Our architecture is based on a controller (main thread) that runs multiple jobs, each in its own application domain. These jobs do a lot of work, but mostly they use Npgsql to communicate with our database.
This service runs continuously on multiple computers (Windows 7 64-bit, Windows Server 2003 or Windows Server 2008).
The problem: sometimes the application hangs and our threads stop running. This happens after days, weeks or months, and on some installations it never happens at all.
I used Process Explorer to see what happens. One thread (17) runs continuously (about 90% of a CPU core), and its managed call stack (obtained with WinDbg) is always:
Child SP IP Call Site
0000000003ffe8a8 000007fef89c4efe [PrestubMethodFrame: 0000000003ffe8a8] System.Net.ContextAwareResult.Complete(IntPtr)
0000000003ffe910 000007ff02372045 System.Net.Sockets.Socket.ConnectCallback()
0000000003ffe990 000007ff02371edd System.Net.Sockets.Socket.RegisteredWaitCallback(System.Object, Boolean)
0000000003ffea10 000007fef7ef9fdc System.Threading._ThreadPoolWaitOrTimerCallback.PerformWaitOrTimerCallback(System.Object, Boolean)
0000000003ffec98 000007fef89e44c4 [GCFrame: 0000000003ffec98]
0000000003ffee70 000007fef89e44c4 [DebuggerU2MCatchHandlerFrame: 0000000003ffee70]
0000000003fff048 000007fef89e44c4 [ContextTransitionFrame: 0000000003fff048]
0000000003fff230 000007fef89e44c4 [DebuggerU2MCatchHandlerFrame: 0000000003fff230]
All the other threads are blocked waiting on the garbage collector.
0:017> !threads
ThreadCount: 30
UnstartedThread: 0
BackgroundThread: 11
PendingThread: 0
DeadThread: 15
Hosted Runtime: no
DBG ID OSID ThreadOBJ State PreEmptiveGC GCAllocContext Domain LockCount APT Exception
0 1 930 00000000004e41f0 6020 Enabled 0000000000000000:0000000000000000 00000000004dd100 0 STA
2 2 940 00000000004ea540 b220 Enabled 0000000010b9d338:0000000010b9f1e8 00000000004dd100 0 MTA (Finalizer)
5 7 974 000000000193e920 b020 Enabled 0000000010c75788:0000000010c75980 00000000004dd100 0 MTA
6 8 978 00000000019407b0 1220 Enabled 0000000010b9f2d0:0000000010ba11e8 00000000004dd100 0 Ukn
7 9 97c 0000000001979ed0 100a220 Enabled 0000000000000000:0000000000000000 00000000004dd100 0 MTA (Threadpool Worker)
8 a ad8 0000000001938a80 1000220 Enabled 0000000000000000:0000000000000000 00000000004dd100 0 Ukn (Threadpool Worker)
XXXX d 00000000035df4c0 19820 Enabled 0000000000000000:0000000000000000 00000000004dd100 0 Ukn
10 15 b8c 00000000035e24c0 b020 Enabled 0000000010bdb278:0000000010bdd1e8 000000000538ee00 0 MTA
11 1c 10b4 00000000055c4a20 b220 Enabled 0000000000000000:0000000000000000 000000000538ee00 1 MTA
XXXX 22 000000000543ff50 19820 Enabled 0000000000000000:0000000000000000 00000000004dd100 0 MTA
12 f 2b8 0000000006400dc0 1019220 Enabled 0000000010c412f8:0000000010c43258 00000000004dd100 0 Ukn (Threadpool Worker)
13 1d 140 000000000542b5b0 1009220 Enabled 0000000000000000:0000000000000000 00000000004dd100 0 MTA (Threadpool Worker)
14 10 12a8 0000000005440d70 1009220 Enabled 0000000010c01288:0000000010c031e8 00000000004dd100 0 MTA (Threadpool Worker)
15 12 990 000000000376d220 1009220 Enabled 0000000010bcb288:0000000010bcd1e8 00000000004dd100 0 MTA (Threadpool Worker)
16 5 9f4 000000000376d930 1009220 Enabled 0000000000000000:0000000000000000 00000000004dd100 0 MTA (Threadpool Worker)
17 21 e70 0000000005469770 8009222 Disabled 0000000010c4d2e0:0000000010c4f258 0000000005ef1660 0 MTA (Threadpool Completion Port)
XXXX 19 0000000006048940 19820 Enabled 0000000000000000:0000000000000000 00000000004dd100 0 Ukn
XXXX 14 00000000064f6b90 19820 Enabled 0000000000000000:0000000000000000 00000000004dd100 0 Ukn
XXXX 17 00000000035e32e0 19820 Enabled 0000000000000000:0000000000000000 00000000004dd100 0 Ukn
XXXX 3 00000000062ca410 19820 Enabled 0000000000000000:0000000000000000 00000000004dd100 0 Ukn
XXXX b 00000000062c9d00 19820 Enabled 0000000000000000:0000000000000000 00000000004dd100 0 Ukn
XXXX 18 00000000036d5250 19820 Enabled 0000000000000000:0000000000000000 00000000004dd100 0 MTA
XXXX 6 00000000036d6070 19820 Enabled 0000000000000000:0000000000000000 00000000004dd100 0 Ukn
XXXX 24 00000000036d5960 19820 Enabled 0000000000000000:0000000000000000 00000000004dd100 0 Ukn
XXXX c 00000000063460c0 19820 Enabled 0000000000000000:0000000000000000 00000000004dd100 0 MTA
XXXX 1f 00000000062cab20 19820 Enabled 0000000000000000:0000000000000000 00000000004dd100 0 MTA
XXXX 1e 00000000060a06f0 19820 Enabled 0000000000000000:0000000000000000 00000000004dd100 0 Ukn
XXXX 13 00000000062c95f0 19820 Enabled 0000000000000000:0000000000000000 00000000004dd100 0 MTA
XXXX e 00000000063467d0 19820 Enabled 0000000000000000:0000000000000000 00000000004dd100 0 Ukn
19 1a 1078 0000000006346ee0 b020 Disabled 0000000010c81a60:0000000010c81a70 0000000005ef1660 3 MTA (GC)
Thread 17 (PreEmptive GC Disabled, tagged Threadpool Completion Port) is the thread that runs continuously. Thread 19 is one of our own threads; it has requested a garbage collection.
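A small script can make output like the table above easier to scan. The sketch below is hypothetical Python (ThreadRow, parse_threads and blocking_candidates are names invented for this example, not part of SOS or WinDbg). It assumes the .NET 4 x64 !threads layout shown above: dead threads have XXXX in the first column and no OSID, and the PreEmptive GC column reads Enabled or Disabled. It lists the live threads still in cooperative mode (PreEmptive GC Disabled), which are the ones a GC suspension has to bring to a safe point. The thread performing the GC (19, tagged GC) is normally in cooperative mode itself, so the row that stands out is 17.

```python
from dataclasses import dataclass


@dataclass
class ThreadRow:
    dbg: str             # debugger thread number, "XXXX" for dead threads
    managed_id: str      # managed thread ID (hex)
    state: str           # thread state flags (hex), copied verbatim
    preemptive_gc: bool  # True if the PreEmptive GC column says "Enabled"
    domain: str          # AppDomain address
    lock_count: int      # "Lock Count" column
    apt: str             # apartment: STA, MTA or Ukn
    kind: str            # e.g. "Threadpool Worker", "GC"; "" if absent


def parse_threads(text):
    """Parse data rows of SOS !threads output (layout assumed as in the post)."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # The PreEmptive GC column is the only "Enabled"/"Disabled" token.
        mode_idx = next(
            (i for i, t in enumerate(tokens) if t in ("Enabled", "Disabled")), None
        )
        if mode_idx is None or len(tokens) < mode_idx + 5:
            continue  # header or summary line
        rows.append(ThreadRow(
            dbg=tokens[0],
            managed_id=tokens[1],
            state=tokens[mode_idx - 1],
            preemptive_gc=tokens[mode_idx] == "Enabled",
            # tokens[mode_idx + 1] is the GC alloc context, not needed here
            domain=tokens[mode_idx + 2],
            lock_count=int(tokens[mode_idx + 3]),
            apt=tokens[mode_idx + 4],
            kind=" ".join(tokens[mode_idx + 5:]).strip("()"),
        ))
    return rows


def blocking_candidates(rows):
    """Live threads still in cooperative mode (PreEmptive GC disabled)."""
    return [r for r in rows if r.dbg != "XXXX" and not r.preemptive_gc]


if __name__ == "__main__":
    sample = """\
   0    1   930 00000000004e41f0      6020 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 STA
XXXX    d       00000000035df4c0     19820 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 Ukn
  17   21   e70 0000000005469770   8009222 Disabled 0000000010c4d2e0:0000000010c4f258 0000000005ef1660     0 MTA (Threadpool Completion Port)
  19   1a  1078 0000000006346ee0      b020 Disabled 0000000010c81a60:0000000010c81a70 0000000005ef1660     3 MTA (GC)
"""
    for r in blocking_candidates(parse_threads(sample)):
        print(r.dbg, r.managed_id, r.kind, "lock count", r.lock_count)
```

Run against the full table, this reports only threads 17 and 19, and shows that both live in the same AppDomain (0000000005ef1660), which is one of the job domains rather than the default domain.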
As I understand it, the garbage collector is waiting for thread 17 (a managed thread) to be suspended before it can collect and release objects. Why doesn't this thread get suspended like the other ones? Because PreEmptive GC is Disabled, right?
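The reasoning above can be illustrated with a toy model of cooperative suspension. This is a simplified sketch under stated assumptions, not the CLR's actual implementation: ToyRuntime, safe_point, try_gc and the worker functions are hypothetical, and the real runtime can also hijack or redirect some threads that are running managed code. In the model, the collector may proceed only once every worker is either blocked in preemptive mode (registered as safe) or has parked itself at a safe point. A worker that spins in cooperative mode without ever polling keeps the collector waiting, which resembles what thread 17 (about 90% CPU, PreEmptive GC Disabled) appears to be doing.

```python
import threading
import time


class ToyRuntime:
    """Toy, single-collection model of cooperative GC suspension (not the CLR)."""

    def __init__(self):
        self.gc_pending = threading.Event()
        self.gc_finished = threading.Event()
        self.lock = threading.Lock()
        self.safe = set()  # workers the collector no longer has to wait for

    def mark_preemptive(self, name):
        # A thread blocked in native code or a wait counts as already safe.
        with self.lock:
            self.safe.add(name)

    def safe_point(self, name):
        # Cooperative-mode code polls here; it parks while a GC is pending.
        if self.gc_pending.is_set():
            with self.lock:
                self.safe.add(name)
            self.gc_finished.wait()

    def suspend_for_gc(self, workers, timeout):
        # Collector side: wait until every worker is safe, or give up.
        self.gc_pending.set()
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            with self.lock:
                if self.safe >= set(workers):
                    return True
            time.sleep(0.01)
        return False

    def finish_gc(self):
        self.gc_pending.clear()
        self.gc_finished.set()


def polling_worker(rt, name, stop):
    while not stop.is_set():
        rt.safe_point(name)  # cooperative code that reaches safe points
        time.sleep(0.001)


def blocked_worker(rt, name, stop):
    rt.mark_preemptive(name)  # like a thread waiting in native code
    stop.wait()


def spinning_worker(rt, name, stop):
    while not stop.is_set():  # cooperative mode, never polls a safe point
        pass


def try_gc(worker_fns, timeout=0.5):
    """Start the workers, attempt one suspension, then shut everything down."""
    rt, stop = ToyRuntime(), threading.Event()
    names = [f"t{i}" for i in range(len(worker_fns))]
    threads = [threading.Thread(target=fn, args=(rt, n, stop), daemon=True)
               for fn, n in zip(worker_fns, names)]
    for t in threads:
        t.start()
    ok = rt.suspend_for_gc(names, timeout)
    rt.finish_gc()
    stop.set()
    for t in threads:
        t.join()
    return ok
```

With try_gc([polling_worker, blocked_worker]) the suspension succeeds; adding spinning_worker makes it time out. The toy gives up after a timeout only so that it terminates; the real runtime keeps waiting, so in the scenario described above the GC thread, and every thread blocked behind it, would hang.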
So why can a managed thread stay in PreEmptive GC Disabled mode indefinitely?
I suspect the Npgsql library, because it is the only code in our program that uses asynchronous sockets. But I don't understand how async sockets could cause a deadlock that affects the GC.
Please tell me whether I'm right and what I have missed.
Your help is really appreciated!
David