We have the following situation:
- Server GC enabled
- COMPlus_InteropValidatePinnedObjects enabled
- .NET 4.6.1 was rolled out to 500+ machines over the weekend
- 600+ uploaded crashes within 4 days
We are seeing a very high volume of crashes with the following stack trace:
clr! | SVR::seg_mapping_table_segment_of |
clr! | SVR::gc_heap::find_segment |
clr! | SVR::GCHeap::NextObj |
clr! | StubHelpers::ValidateObjectInternal |
clr! | StubHelpers::ProcessByrefValidationList |
clr! | CNameSpace::GcStartWork |
clr! | SVR::gc_heap::garbage_collect |
clr! | SVR::gc_heap::gc_thread_function |
clr! | SVR::gc_heap::gc_thread_stub |
In nearly all of these crashes, I observe one or more threads in the middle of PInvoke calls. When inspecting the dump, the object address passed to seg_mapping_table_segment_of appears to be garbage. Some of the PInvoke calls come from our own code and are suspect; however, a significant number come from the .NET Framework itself. Is it possible that the Server GC optimizations in .NET 4.6/4.6.1 introduced a regression in this validation check?
Looking at the reference source, the following stands out:
- ProcessByrefValidationList() does not hold the lock that guards the entries/index
- StubHelpers::ValidateByref() takes that lock while adding entries and potentially growing the buffer
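
To make the concern concrete, here is a minimal C++ sketch of the pattern I would expect to be safe: the reader takes the same lock as the writer, so it can never walk a buffer that is being reallocated. This is not the CLR code. `ValidationList`, `Add`, `Process`, and `RunDemo` are hypothetical names, and a `std::vector` stands in for the real entry buffer. Whether the race can actually occur in the CLR depends on whether a thread can be inside ValidateByref while the GC thread runs ProcessByrefValidationList, which I have not confirmed.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the byref validation list; NOT the real CLR type.
class ValidationList {
public:
    // Writer side (analogous to StubHelpers::ValidateByref): takes the lock,
    // then appends, which may reallocate the underlying buffer.
    void Add(const void* addr) {
        std::lock_guard<std::mutex> guard(lock_);
        entries_.push_back(addr);
    }

    // Reader side (analogous to ProcessByrefValidationList): takes the SAME
    // lock, so it never observes a half-grown buffer or a stale count.
    // Returns the number of entries processed and then clears the list.
    std::size_t Process() {
        std::lock_guard<std::mutex> guard(lock_);
        std::size_t processed = 0;
        for (const void* addr : entries_) {
            assert(addr != nullptr);  // placeholder for the real validation
            ++processed;
        }
        entries_.clear();
        return processed;
    }

private:
    std::mutex lock_;
    std::vector<const void*> entries_;
};

// Runs several writer threads while the calling thread drains the list.
// Returns the total number of entries the reader processed.
std::size_t RunDemo(int writers, int perWriter) {
    static int dummy = 0;  // any valid address works for the sketch
    ValidationList list;
    std::size_t total = 0;
    std::vector<std::thread> threads;
    for (int w = 0; w < writers; ++w) {
        threads.emplace_back([&list, perWriter] {
            for (int i = 0; i < perWriter; ++i) list.Add(&dummy);
        });
    }
    // Drain concurrently with the writers, then once more after they finish.
    for (int i = 0; i < 100; ++i) total += list.Process();
    for (auto& t : threads) t.join();
    total += list.Process();
    return total;
}
```

With the lock held on both sides, `RunDemo(4, 1000)` accounts for every entry exactly once. If the lock in `Process` were removed, the loop could read `entries_` while `push_back` reallocates it, which is the kind of trashed pointer I am seeing passed to seg_mapping_table_segment_of.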