A Memory Leak Analysis of a .NET MES System at a Factory
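I. Background
1. The story
Last month a friend added me on WeChat asking for help. His program's memory kept climbing the longer it ran until it blew up, and he wanted to know how to fix it. Here is the screenshot he sent: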

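Judging from the chat, he was under quite a lot of pressure. By the way, this seems to be the third MES system I've analyzed, so .NET apparently is a real heavyweight in traditional factories...
Without further ado, let's dig in with Windbg.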
II. Windbg analysis
1. Managed or unmanaged?
First, look at the process's committed memory with !address -summary.
0:000> !address -summary

Mapping file section regions...
Mapping module regions...
Mapping PEB regions...
Mapping TEB and stack regions...
Mapping heap regions...
Mapping page heap regions...
Mapping other regions...
Mapping stack trace database regions...
Mapping activation context regions...
--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_PRIVATE                             971          e7d6b000 (   3.622 GB)  95.24%   90.56%
MEM_IMAGE                              1175           ac5d000 ( 172.363 MB)   4.43%    4.21%
MEM_MAPPED                               34            d08000 (  13.031 MB)   0.33%    0.32%
--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_COMMIT                             1806          edfd9000 (   3.719 GB)  97.77%   92.97%
MEM_FREE                                190           c920000 ( 201.125 MB)            4.91%
MEM_RESERVE                             374           56f7000 (  86.965 MB)   2.23%    2.12%
...
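As you can see, 3.719 GB is currently committed (MEM_COMMIT). The addresses are all 8 hex digits, so this is a 32-bit program, and the busy and free regions together add up to about 4 GB, which is the entire user address space a large-address-aware 32-bit process can get on 64-bit Windows. With only 201 MB left in MEM_FREE, the program is on the verge of crashing. Next, let's see how much of this is the managed heap, using the !eeheap -gc command.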
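To double-check these numbers, the hex sizes from !address -summary can be converted by hand. The small Python sketch below uses only the figures copied from the output above; the one assumption is that the GB/MB in that output are binary units (2^30 and 2^20 bytes), which the numbers themselves confirm. It shows MEM_COMMIT is 3.719 GB and that busy plus free regions come to just under 4 GB.

```python
# Sizes copied from the !address -summary output above (hex byte counts).
busy = {
    "MEM_PRIVATE": 0xE7D6B000,
    "MEM_IMAGE": 0x0AC5D000,
    "MEM_MAPPED": 0x00D08000,
}
MEM_FREE = 0x0C920000
MEM_COMMIT = 0xEDFD9000
MEM_RESERVE = 0x056F7000

GIB = 1024 ** 3  # !address prints binary units: 1 GB here = 2**30 bytes


def to_gib(size: int) -> float:
    """Convert a byte count to GiB, rounded to 3 decimals like !address."""
    return round(size / GIB, 3)


busy_total = sum(busy.values())
address_space = busy_total + MEM_FREE

# Every busy region is either committed or reserved.
assert busy_total == MEM_COMMIT + MEM_RESERVE

print(f"committed:   {to_gib(MEM_COMMIT)} GB")     # 3.719 GB
print(f"busy:        {to_gib(busy_total)} GB")
print(f"busy + free: {to_gib(address_space)} GB")  # 4.0 GB: the whole 32-bit space
```

The check that busy equals commit plus reserve is a handy way to make sure the columns were read correctly before drawing conclusions from them.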
0:000> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0xf35a90c0
generation 1 starts at 0xf33a1000
generation 2 starts at 0x01db1000
ephemeral segment allocation context: none
 segment     begin  allocated      size
 ...
 f7790000  f7791000  f8058854  0x8c7854(9205844)
f33a0000  f33a1000  f3ba6e84  0x805e84(8412804)
Large object heap starts at 0x02db1000
 segment     begin  allocated      size
02db0000  02db1000  0387e988  0xacd988(11327880)
Total Size:              Size: 0xdcab5ca8 (3702217896) bytes.
------------------------------
GC Heap Size:    Size: 0xdcab5ca8 (3702217896) bytes.
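The output shows the managed heap at 0xdcab5ca8 = 3,702,217,896 bytes (about 3.45 GB in the binary units !address uses), roughly 93% of all committed memory, so this is a relatively simple managed memory leak.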
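The "managed or unmanaged" call comes down to one ratio: GC heap size over committed memory. A minimal Python sketch, using only the two numbers from the outputs above:

```python
GC_HEAP = 0xDCAB5CA8     # "GC Heap Size" from !eeheap -gc
MEM_COMMIT = 0xEDFD9000  # MEM_COMMIT from !address -summary
GIB = 1024 ** 3          # binary GB, matching !address

share = GC_HEAP / MEM_COMMIT

print(f"GC heap: {GC_HEAP} bytes = {GC_HEAP / GIB:.2f} GB")
print(f"managed share of committed memory: {share:.1%}")
```

When the share is this high, the natural next step is to look at managed objects; when it is low, the native heaps or other private memory would be the place to dig instead.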
2. Exploring the managed heap
Inspecting the managed heap is easy; start with the all-in-one command !dumpheap -stat.
0:000> !dumpheap -stat
Statistics:
      MT    Count    TotalSize Class Name
...
04b045d0    67663     25711940 xxx.Product.Mes.DataStore.EF.MesDbContext
719f0100  3458387     41500644 System.Object
719f1b84   281492     42391384 System.Int32[]
0489adb0  2238394     44767880 xxx.Application.Features.FeatureChecker
71551e00  2238503     53724072 System.Collections.Generic.List`1[[System.String, mscorlib]]
07c473e0  5615923     67391076 System.Data.Entity.Core.Objects.Internal.ObjectQueryExecutionPlanFactory
07c68954  5683589     68203068 System.Data.Entity.Core.Common.Internal.Materialization.Translator
04c7e3a8  4042677     71990132 Castle.DynamicProxy.IInterceptor[]
014a80c0  3142755     80480594      Free
042ecd18  5869494     93911904 xxxx.Domain.Uow.UnitOfWorkInterceptor
096ed32c    67663     97164068 System.Collections.Generic.Dictionary`2+Entry[[System.Type, mscorlib],[System.Data.Entity.Internal.Linq.IInternalSetAdapter, EntityFramework]][]
0488edb0 12641117    151693404 xxx.Domain.Uow.AsyncLocalCurrentUnitOfWorkProvider
0488fa50 10769173    215383460 xxx.Domain.Uow.UnitOfWorkManager
07cc0fb0  5548261    355088704 System.Data.Entity.Core.Objects.EntitySqlQueryState
719efd60 11275964   1268805768 System.String
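Reading the tea leaves: !dumpheap -stat sorts by total size in ascending order, and the types that sink to the bottom are almost all EF-related; the strings are most likely held by these EF objects as well. There is also something very abnormal: there are more than 60,000 MesDbContext instances (67,663), which is odd for an object that is normally created per unit of work and then released. So let's sample a few of them and check what references them; the output is roughly the same each time: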
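With a full !dumpheap -stat listing running to thousands of lines, it can help to parse it and compute per-object averages. The Python sketch below is a hypothetical helper (parse_dumpheap_stat is not part of SOS) that reads the MT / Count / TotalSize / Class Name rows. Since !dumpheap -stat reports each object's own (shallow) size, it shows that a MesDbContext is only 380 bytes; the memory goes into what each context keeps reachable. Note also that the Dictionary<Type, IInternalSetAdapter> entry arrays in the listing number exactly 67,663, one per context.

```python
from typing import List, NamedTuple


class HeapStat(NamedTuple):
    mt: str
    count: int
    total_size: int
    class_name: str


def parse_dumpheap_stat(text: str) -> List[HeapStat]:
    """Parse data rows of `!dumpheap -stat`: MT, Count, TotalSize, Class Name."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 3)  # maxsplit=3: class names may contain spaces
        if len(parts) != 4:
            continue  # "Statistics:", "...", blank lines
        mt, count, size, name = parts
        try:
            rows.append(HeapStat(mt, int(count), int(size), name))
        except ValueError:
            continue  # the header row "MT Count TotalSize Class Name"
    return rows


# A few rows copied from the output above.
SAMPLE = """\
      MT    Count    TotalSize Class Name
04b045d0    67663     25711940 xxx.Product.Mes.DataStore.EF.MesDbContext
0488fa50 10769173    215383460 xxx.Domain.Uow.UnitOfWorkManager
07cc0fb0  5548261    355088704 System.Data.Entity.Core.Objects.EntitySqlQueryState
719efd60 11275964   1268805768 System.String
"""

stats = parse_dumpheap_stat(SAMPLE)
for s in sorted(stats, key=lambda s: s.total_size, reverse=True):
    avg = s.total_size / s.count
    print(f"{s.class_name:<55} {s.count:>9} objs {avg:7.1f} B/obj")
```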
0:000> !gcroot 17d2e438
HandleTable:
    014313c8 (pinned handle)
    -> 02dd9020 System.Object[]
    -> 0260abf4 System.Collections.Concurrent.ConcurrentDictionary`2[[System.Data.Entity.DbContext, EntityFramework],[System.Collections.Concurrent.ConcurrentDictionary`2[[System.String, mscorlib],[EntityFramework.DynamicFilters.DynamicFilterParameters, EntityFramework.DynamicFilters]], mscorlib]]
    -> b96074a4 System.Collections.Concurrent.ConcurrentDictionary`2+Tables[[System.Data.Entity.DbContext, EntityFramework],[System.Collections.Concurrent.ConcurrentDictionary`2[[System.String, mscorlib],[EntityFramework.DynamicFilters.DynamicFilterParameters, EntityFramework.DynamicFilters]], mscorlib]]
    -> 02fcddb0 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.Data.Entity.DbContext, EntityFramework],[System.Collections.Concurrent.ConcurrentDictionary`2[[System.String, mscorlib],[EntityFramework.DynamicFilters.DynamicFilterParameters, EntityFramework.DynamicFilters]], mscorlib]][]
    -> b955eecc System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.Data.Entity.DbContext, EntityFramework],[System.Collections.Concurrent.ConcurrentDictionary`2[[System.String, mscorlib],[EntityFramework.DynamicFilters.DynamicFilterParameters, EntityFramework.DynamicFilters]], mscorlib]]
    -> 17d2e438 xxx.DataStore.EF.MesDbContext
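The reference chain shows that these MesDbContext instances are all held by a ConcurrentDictionary, which is reached through the Object[] behind a pinned handle. On .NET Framework, pinned Object[] arrays like this are where the CLR keeps the values of reference-type static fields, which hints that the dictionary is a static field somewhere. Next, we need to find out how big this dictionary is, which the !objsize command can tell us.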
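The shape of this chain, a long-lived dictionary whose keys are context objects that nobody removes, is a classic leak: every context ever added stays reachable. The Python sketch below is only an analogue of that pattern, not the .NET code; FakeDbContext and handle_request are hypothetical stand-ins for MesDbContext and a unit of work. The weak-keyed cache is included purely for contrast and is not a claim about how EntityFramework.DynamicFilters should handle this.

```python
import gc
import weakref


class FakeDbContext:
    """Hypothetical stand-in for a DbContext; it only needs to be hashable."""


# Module-level (process-lifetime) cache keyed by context, similar in shape to
# the ConcurrentDictionary<DbContext, ...> seen in the !gcroot chain.
strong_cache = {}
# For contrast: a cache whose keys do not keep the contexts alive.
weak_cache = weakref.WeakKeyDictionary()


def handle_request(cache) -> None:
    ctx = FakeDbContext()                     # short-lived, per unit of work
    cache.setdefault(ctx, {})["filter"] = 42  # per-context parameters
    # ctx goes out of scope here, but nothing removes it from the cache


for _ in range(10_000):
    handle_request(strong_cache)
    handle_request(weak_cache)

gc.collect()
print(len(strong_cache))  # 10000: every context is still reachable
print(len(weak_cache))    # 0: entries disappear once their contexts are unreachable
```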
0:000> !objsize 0260abf4
e06d7363 Exception in c:\mysymbols\SOS_x86_x86_4.7.3701.00.dll\5F4FF1AE6f0000\SOS_x86_x86_4.7.3701.00.dll.objsize debugger extension.
      PC: 757ea842  VA: 022ce8f4  R/W: 19930520  Parameter: 7b9bb528
0:000> !DumpObj /d 02fcddb0
Name:        System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.Data.Entity.DbContext, EntityFramework],[System.Collections.Concurrent.ConcurrentDictionary`2[[System.String, mscorlib],[EntityFramework.DynamicFilters.DynamicFilterParameters, EntityFramework.DynamicFilters]], mscorlib]][]
MethodTable: 0973cb60
EEClass:     715c4fc0
Size:        573440(0x8c000) bytes
Array:       Rank 1, Number of elements 143357, Type CLASS (Print Array)
Fields:
None
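After a long wait, ugh, !objsize failed with an error in the end. Still, !DumpObj shows that the dictionary's Node[] has 143,357 elements. Strictly speaking, this Node[] is the dictionary's bucket array, so 143,357 is the number of buckets rather than the exact number of entries (the exact count is the sum of the m_countPerLock array in the dictionary's Tables object), but either way the dictionary is huge. Now comes the serious question: did my friend define this ConcurrentDictionary himself, or does it belong to a framework? So the next step is to find the class it belongs to.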
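Why read 143,357 as a bucket count? The .NET Framework reference source for ConcurrentDictionary starts with a default capacity of 31 buckets and, on each resize, takes twice the length plus one and then moves up by 2 until the value is divisible by none of 3, 5 or 7. The Python sketch below replays that rule (taken from my reading of the reference source, so treat it as an assumption) and shows that 143,357 is exactly what 12 resizes from the default 31 produce.

```python
def next_bucket_count(length: int) -> int:
    """Resize rule assumed from .NET Framework's ConcurrentDictionary.GrowTable."""
    new_length = length * 2 + 1
    # Stay odd and skip lengths divisible by 3, 5 or 7.
    while new_length % 3 == 0 or new_length % 5 == 0 or new_length % 7 == 0:
        new_length += 2
    return new_length


sizes = [31]  # assumed default capacity
for _ in range(12):
    sizes.append(next_bucket_count(sizes[-1]))

print(sizes)
print(sizes[-1])  # 143357, the Node[] length reported by !DumpObj
```

The bucket length only tells you how many times the table has grown, not how many entries it holds right now, which is why the exact count has to come from the per-lock counters.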
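3. Finding which class the dictionary belongs to
Finding the class that owns the dictionary is a bit tedious; I recorded a Bilibili episode dedicated to this, so check it out if you're interested.
In short, the overall approach is: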
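First, search the pinned Object[] (02dd9020) for the dictionary's address 0260abf4; the slot where it is stored is address1. Then search the whole memory for address1 itself; a hit inside code is address2.
address2 lies inside the body of a method that references the dictionary (the JIT-compiled code typically embeds the slot address directly in the instruction that reads the static field), so we can disassemble the method at address2, look up its EEClass, and finally get the name of the owning class.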
Let's try it for real.
First, check the size of the Object[].
0:000> !do 02dd9020
Name:        System.Object[]
MethodTable: 719f0154
EEClass:     715c4fc0
Size:        65532(0xfffc) bytes
Array:       Rank 1, Number of elements 16380, Type CLASS (Print Array)
Fields:
None
Finding address1
Search memory with s -d.
0:000> s -d 02dd9020 L?0xfffc 0260abf4
02de11a4  0260abf4 0260ad04 0260ad2c 08320d20  ..`...`.,.`. .2.
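02de11a4 is the address1 we're looking for. A quick explanation: -d searches for 32-bit (DWORD) values and -q for 64-bit (QWORD) values, and L?0xfffc sets the length of the search range to the Object[]'s size (0xfffc bytes).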
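What this search does can be modeled as: read the range as little-endian 32-bit values and report the address of every match. The toy Python model below is not how WinDbg is implemented; find_dword is a hypothetical helper, the buffer is a zero-filled 0xfffc-byte stand-in for the Object[] at 02dd9020, and only the four DWORDs from the output above are placed at 02de11a4. The scan then reports address1 = 02de11a4, i.e. offset 0x8184 into the array.

```python
import struct
from typing import List


def find_dword(memory: bytes, base: int, value: int) -> List[int]:
    """Return addresses of every 4-byte-aligned little-endian DWORD equal to value."""
    hits = []
    for offset in range(0, len(memory) - 3, 4):
        (dword,) = struct.unpack_from("<I", memory, offset)
        if dword == value:
            hits.append(base + offset)
    return hits


OBJ_ARRAY = 0x02DD9020   # the pinned System.Object[] from !gcroot
DICTIONARY = 0x0260ABF4  # the ConcurrentDictionary we want to locate

memory = bytearray(0xFFFC)  # stand-in for the Object[]'s 65532 bytes (zeros)
# Re-create the 16 bytes that `s -d` displayed at 02de11a4.
struct.pack_into("<4I", memory, 0x02DE11A4 - OBJ_ARRAY,
                 0x0260ABF4, 0x0260AD04, 0x0260AD2C, 0x08320D20)

address1 = find_dword(memory, OBJ_ARRAY, DICTIONARY)
print([f"{a:08x}" for a in address1])  # ['02de11a4']
```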
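Finding address2
Here I split the address 02de11a4 into its bytes in memory order, a4 11 de 02 (x86 is little-endian), and search byte by byte, otherwise you can fall into a trap: an address embedded in machine code need not be 4-byte aligned, as both hits below show.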
0:000> s -b 0 L?0xffffffff a4 11 de 02
0695d2f9  a4 11 de 02 e8 be 14 f9-6b b9 18 3c 34 70 e8 bc  ........k..<4p..
09e9438b  a4 11 de 02 39 09 e8 9a-11 af 67 8b f0 a1 bc 11  ....9.....g.....
The output shows two code regions that use the dictionary. Since the search covered the whole address space, I'll just pick the last hit, address2 = 09e9438b.
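The byte order and the alignment issue are easy to see in a few lines of Python. The sketch packs 02de11a4 as a little-endian DWORD (giving a4 11 de 02) and searches a made-up code fragment for it. The fragment (nop; 8b 0d + address, the x86 encoding of mov ecx,[address]; ret) and its base address 0x1000 are assumptions for illustration only, not the real bytes at 09e9438b. A byte-granular search, like s -b, finds the match at an unaligned address, while a scan that only checks 4-byte-aligned slots misses it.

```python
import struct
from typing import List

ADDRESS1 = 0x02DE11A4
needle = struct.pack("<I", ADDRESS1)  # little-endian: least significant byte first
print(needle.hex(" "))                # a4 11 de 02


def find_bytes(memory: bytes, base: int, pattern: bytes) -> List[int]:
    """Byte-granular search: a match may start at any offset."""
    hits, start = [], 0
    while (idx := memory.find(pattern, start)) != -1:
        hits.append(base + idx)
        start = idx + 1
    return hits


# Hypothetical code bytes: nop; mov ecx,dword ptr [02de11a4]; ret
code = bytes([0x90, 0x8B, 0x0D]) + needle + bytes([0xC3])
CODE_BASE = 0x1000  # made-up load address

byte_hits = find_bytes(code, CODE_BASE, needle)
aligned_hits = [CODE_BASE + off for off in range(0, len(code) - 3, 4)
                if code[off:off + 4] == needle]

print([hex(a) for a in byte_hits])  # ['0x1003'], not 4-byte aligned
print(aligned_hits)                 # []
```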
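Disassembling address2
Disassemble it with !U, then chain !name2ee (to get the MethodDesc), !dumpmd (to get the class) and !dumpclass (to print the class and its static fields).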
0:000> !U 09e9438b
Normal JIT generated code
EntityFramework.DynamicFilters.DynamicFilterExtensions.GetOrCreateScopedFilterParameters(System.Data.Entity.DbContext, System.String)
Begin 09e94320, size 1e1
09e94320 55              push    ebp
...
09e9433a 8bf1            mov     esi,ecx
09e9433c b95088ea09      mov     ecx,9EA8850h (MT: EntityFramework.DynamicFilters.DynamicFilterExtensions+<>c__DisplayClass71_0)
09e94341 e882ed5af7      call    014430c8 (JitHelp: CORINFO_HELP_NEWSFAST)
09e94346 8bf8            mov     edi,eax
09e94348 8d5704          lea     edx,[edi+4]
09e9434b e800a5a568      call    clr!JIT_WriteBarrierESI (728ee850)
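A quick sanity check that address2 really belongs to this method: !U reports Begin 09e94320 and size 1e1, so the method's code spans [09e94320, 09e94501). This Python sketch (in_method is a hypothetical helper) tests both hits from the byte search against that range.

```python
BEGIN = 0x09E94320  # "Begin" from !U
SIZE = 0x1E1        # "size" from !U
hits = [0x0695D2F9, 0x09E9438B]  # both hits from the `s -b` search


def in_method(addr: int) -> bool:
    """True if addr falls inside GetOrCreateScopedFilterParameters' JIT code."""
    return BEGIN <= addr < BEGIN + SIZE


for addr in hits:
    if in_method(addr):
        print(f"{addr:08x}: inside GetOrCreateScopedFilterParameters at +{addr - BEGIN:#x}")
    else:
        print(f"{addr:08x}: elsewhere (disassemble it separately with !U)")
```

The chosen hit sits 0x6b bytes into the method, so !U on it disassembles the method that actually touches the dictionary slot; the other hit belongs to some other piece of code.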
0:000> !name2ee *!EntityFramework.DynamicFilters.DynamicFilterExtensions.GetOrCreateScopedFilterParameters
Module:      0973aef4
Assembly:    EntityFramework.DynamicFilters.dll
Token:       0600005e
MethodDesc:  0973b8fc
Name:        EntityFramework.DynamicFilters.DynamicFilterExtensions.GetOrCreateScopedFilterParameters(System.Data.Entity.DbContext, System.String)
JITTED Code Address: 09e94320
0:000> !dumpmd 0973b8fc
Method Name:  EntityFramework.DynamicFilters.DynamicFilterExtensions.GetOrCreateScopedFilterParameters(System.Data.Entity.DbContext, System.String)
Class:        0974c7d8
MethodTable:  0973b938
mdToken:      0600005e
Module:       0973aef4
IsJitted:     yes
CodeAddr:     09e94320
Transparency: Critical
0:000> !dumpclass 0974c7d8
Class Name:      EntityFramework.DynamicFilters.DynamicFilterExtensions
mdToken:         02000006
File:            D:\xxx\Debug\EntityFramework.DynamicFilters.dll
Parent Class:    715415b0
Module:          0973aef4
Method Table:    0973b938
Vtable Slots:    4
Total Method Slots:  20
Class Attributes:    100181  Abstract,
Transparency:        Critical
NumInstanceFields:   0
NumStaticFields:     5
      MT    Field   Offset                 Type VT     Attr    Value Name
0973bfcc  400000d        c ....DynamicFilters]]  0   static 0260a9d4 _GlobalParameterValues
0973c3f4  400000e       10 ...ers]], mscorlib]]  0   static 0260abf4 _ScopedParameterValues
70343c18  400000f       14 ...tring, mscorlib]]  0   static 0260ad04 _PreventDisabledFilterConditions
71a34804  4000010       43       System.Boolean  1   static        1 _Initialized
05ec9adc  4000011       18 ...rsion, mscorlib]]  0   static 0260ad2c _OracleInstanceVersions
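Found it at last: the dictionary is the static field _ScopedParameterValues of EntityFramework.DynamicFilters.DynamicFilterExtensions, and its value 0260abf4 is exactly the dictionary from the !gcroot chain. The values that s -d showed right after address1, 0260ad04 and 0260ad2c, are this class's _PreventDisabledFilterConditions and _OracleInstanceVersions statics, which confirms that the pinned Object[] holds this class's static fields. So the dictionary belongs to the EntityFramework.DynamicFilters library (a separate assembly used together with EF), not to my friend's own code. The decompiled source is as follows: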

In the end, I took the 60,000+ MesDbContext instances and the _ScopedParameterValues dictionary with its 143,357 buckets to my friend, and he found a fix.



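III. Summary
According to my friend, the problem went away after he commented out the MesDbContext in a constructor. I'm not familiar with EF, so if you know more, feel free to leave a comment with your analysis.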
