Tencent second-round interview: where is ThreadLocal slower than FastThreadLocal?
Editor: 业余草
blog.csdn.net/mycs2012
Beating ThreadLocal: why is FastThreadLocal so fast?
1 Background and principle of FastThreadLocal
The JDK already has ThreadLocal, so why did netty build its own FastThreadLocal, and where does its speed come from?
The answer starts with the JDK ThreadLocal itself.

In Java, every thread has a ThreadLocalMap instance field (the map is not created if ThreadLocal is never used; it is created the first time the thread accesses a ThreadLocal variable). The map resolves hash collisions with linear probing: if the target slot is occupied, it keeps trying the following slots until it finds a free one and inserts the entry there. When collisions are frequent, this hurts performance.
FastThreadLocal (ftl below) uses a plain array and so avoids hash collisions entirely. Each FastThreadLocal instance is assigned an index when it is created; the indexes are handed out by an AtomicInteger, so every FastThreadLocal gets a unique one. A call to ftl.get() then reads the value straight from the array, in effect return array[index].
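The array-index idea can be illustrated with a toy version. This is only a minimal sketch, not netty's code: SlotThread, IndexedLocal and IndexedLocalDemo are hypothetical names, the starting capacity of 8 is arbitrary, and the sketch assumes get() and set() are only ever called on a SlotThread.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical toy thread that owns a plain array of slots.
class SlotThread extends Thread {
    Object[] slots = new Object[8]; // arbitrary starting capacity

    SlotThread(Runnable task) {
        super(task);
    }
}

// Hypothetical toy thread-local: each instance claims a unique index once.
final class IndexedLocal<T> {
    private static final AtomicInteger NEXT_INDEX = new AtomicInteger();
    private final int index = NEXT_INDEX.getAndIncrement();

    int index() {
        return index;
    }

    @SuppressWarnings("unchecked")
    T get() {
        // Assumes the caller runs on a SlotThread; no hashing, no probing.
        Object[] slots = ((SlotThread) Thread.currentThread()).slots;
        return index < slots.length ? (T) slots[index] : null;
    }

    void set(T value) {
        SlotThread thread = (SlotThread) Thread.currentThread();
        if (index >= thread.slots.length) {
            // Grow the array so that this index fits.
            thread.slots = Arrays.copyOf(thread.slots, Math.max(index + 1, thread.slots.length * 2));
        }
        thread.slots[index] = value;
    }
}

final class IndexedLocalDemo {
    // Uses two locals on one SlotThread and returns what that thread read back.
    static String run() {
        IndexedLocal<String> first = new IndexedLocal<>();
        IndexedLocal<String> second = new IndexedLocal<>();
        StringBuilder seen = new StringBuilder();
        SlotThread thread = new SlotThread(() -> {
            first.set("a");
            second.set("b");
            seen.append(first.get()).append(',').append(second.get());
        });
        thread.start();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.toString();
    }
}
```

Calling IndexedLocalDemo.run() returns "a,b": each local reads back its own slot, and the lookup is a single bounds check plus an array read.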

2 Source code analysis
The ftl implementation involves the classes InternalThreadLocalMap, FastThreadLocalThread and FastThreadLocal. Working bottom-up, we start with InternalThreadLocalMap.
InternalThreadLocalMap extends UnpaddedInternalThreadLocalMap, so we look at the parent class first.

2.1 Main fields of UnpaddedInternalThreadLocalMap
static final ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap = new ThreadLocal<InternalThreadLocalMap>();
static final AtomicInteger nextIndex = new AtomicInteger();
Object[] indexedVariables;
The indexedVariables array stores the ftl values, which are accessed directly by index. nextIndex assigns an index to each ftl instance when it is created. slowThreadLocalMap is used when the current thread is not a FastThreadLocalThread (ftlt below).
2.2 InternalThreadLocalMap
Main fields of InternalThreadLocalMap:
// Marks an array slot that has not been used yet
public static final Object UNSET = new Object();
/**
 * Records whether a cleaner has been registered for an ftl variable.
 * How BitSet works, briefly:
 * By default a BitSet is backed by a long[] array that starts with length 1, i.e. only long[0]; one long has 64 bits.
 * BitSet.set(1) sets the second bit of long[0] to true, i.e. 0000 0000 ... 0010 (64 bits), so long[0] == 2.
 * BitSet.get(1) returns true if the second bit is 1 and false if it is 0.
 * BitSet.set(64) sets the 65th bit; long[0] is no longer enough, so the array grows to include long[1].
 *
 * It stores something like {index: boolean} pairs and prevents a FastThreadLocal from starting its cleanup thread more than once.
 * Setting the bit at position index to true means this InternalThreadLocalMap has already started the cleanup thread for that FastThreadLocal.
 */
private BitSet cleanerFlags;
private InternalThreadLocalMap() {
    super(newIndexedVariableTable());
}
private static Object[] newIndexedVariableTable() {
    Object[] array = new Object[32];
    Arrays.fill(array, UNSET);
    return array;
}
This is straightforward: newIndexedVariableTable() creates an array of length 32, fills it with UNSET, and passes it to the parent class. The ftl values are then stored in this array. Note that the array holds the values themselves, not entries, which differs from the JDK ThreadLocal. We stop here with InternalThreadLocalMap; its other methods are covered together with ftl below.
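The BitSet behaviour described in the cleanerFlags comment can be checked directly with the JDK. The helper below is a small sketch; BitSetWords and wordsAfterSetting are hypothetical names, and only java.util.BitSet itself is real API.

```java
import java.util.BitSet;

final class BitSetWords {
    // Returns the backing 64-bit words of a BitSet after setting the given bits.
    static long[] wordsAfterSetting(int... bits) {
        BitSet flags = new BitSet();
        for (int bit : bits) {
            flags.set(bit);
        }
        return flags.toLongArray();
    }
}
```

wordsAfterSetting(1) yields a single word equal to 2, and wordsAfterSetting(1, 64) yields two words, the second equal to 1, which matches the comment: bit 64 no longer fits in long[0].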
2.3 ftlt implementation
To get the performance benefit of ftl, it must be used together with ftlt; otherwise it degrades to the JDK ThreadLocal. ftlt is simple; the key code is:
public class FastThreadLocalThread extends Thread {
    // This will be set to true if we have a chance to wrap the Runnable.
    private final boolean cleanupFastThreadLocals;
    private InternalThreadLocalMap threadLocalMap;
    // (constructors omitted)
    public final InternalThreadLocalMap threadLocalMap() {
        return threadLocalMap;
    }
    public final void setThreadLocalMap(InternalThreadLocalMap threadLocalMap) {
        this.threadLocalMap = threadLocalMap;
    }
}
The trick in ftlt is the threadLocalMap field: the class extends the Java Thread and holds its own InternalThreadLocalMap. When an ftlt thread later accesses an ftl variable, the value is read directly from that InternalThreadLocalMap.
2.4 ftl implementation
This analysis is based on netty-4.1.34. The version is stated explicitly because, in the cleanup code, this version has commented out the call to ObjectCleaner, which differs from earlier versions.
2.4.1 Fields and instantiation of ftl
private final int index;
public FastThreadLocal() {
    index = InternalThreadLocalMap.nextVariableIndex();
}
This simply assigns the index field. The static method that supplies the value lives in InternalThreadLocalMap:
public static int nextVariableIndex() {
    int index = nextIndex.getAndIncrement();
    if (index < 0) {
        // The counter has overflowed: undo the increment and fail
        nextIndex.decrementAndGet();
        throw new IllegalStateException("too many thread-local indexed variables");
    }
    return index;
}
So each ftl instance takes its index from a sequence that increases by 1, which keeps the array in InternalThreadLocalMap from growing abruptly.
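The overflow guard in nextVariableIndex() is easier to see with a counter that can be started near Integer.MAX_VALUE. The following is a sketch under that assumption: IndexAllocator is a hypothetical class with a configurable start value, whereas netty uses a single static counter that starts at 0.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical allocator with the same guard as nextVariableIndex().
final class IndexAllocator {
    private final AtomicInteger nextIndex;

    IndexAllocator(int start) {
        nextIndex = new AtomicInteger(start);
    }

    int next() {
        int index = nextIndex.getAndIncrement();
        if (index < 0) {
            // The int counter wrapped around; undo the increment so it stays negative.
            nextIndex.decrementAndGet();
            throw new IllegalStateException("too many thread-local indexed variables");
        }
        return index;
    }
}
```

An allocator started at 0 returns 0, 1, 2 and so on. One started at Integer.MAX_VALUE returns that value once and then throws IllegalStateException on every later call, because the counter has wrapped to a negative number.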
2.4.2 The get() method
public final V get() {
    InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get(); // 1
    Object v = threadLocalMap.indexedVariable(index); // 2
    if (v != InternalThreadLocalMap.UNSET) {
        return (V) v;
    }
    V value = initialize(threadLocalMap); // 3
    registerCleaner(threadLocalMap); // 4
    return value;
}
First, let us see how ***InternalThreadLocalMap.get()*** obtains the threadLocalMap:
=======================InternalThreadLocalMap=======================
public static InternalThreadLocalMap get() {
    Thread thread = Thread.currentThread();
    if (thread instanceof FastThreadLocalThread) {
        return fastGet((FastThreadLocalThread) thread);
    } else {
        return slowGet();
    }
}
private static InternalThreadLocalMap fastGet(FastThreadLocalThread thread) {
    InternalThreadLocalMap threadLocalMap = thread.threadLocalMap();
    if (threadLocalMap == null) {
        thread.setThreadLocalMap(threadLocalMap = new InternalThreadLocalMap());
    }
    return threadLocalMap;
}
Because FastThreadLocal only shows its performance advantage together with FastThreadLocalThread, we focus on fastGet. It reads the threadLocalMap directly from the ftlt thread; if there is none yet, it creates an InternalThreadLocalMap instance, sets it on the thread, and returns it.
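***threadLocalMap.indexedVariable(index)*** is simple: it reads the value from the array and returns it:
public Object indexedVariable(int index) {
    Object[] lookup = indexedVariables;
    return index < lookup.length ? lookup[index] : UNSET;
}
If the value obtained is not UNSET, it is a valid value and is returned directly. If it is UNSET, the variable is initialized.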
The ***initialize(threadLocalMap)*** method:
private V initialize(InternalThreadLocalMap threadLocalMap) {
    V v = null;
    try {
        v = initialValue();
    } catch (Exception e) {
        PlatformDependent.throwException(e);
    }
    threadLocalMap.setIndexedVariable(index, v); // 3-1
    addToVariablesToRemove(threadLocalMap, this); // 3-2
    return v;
}
「3-1」 Takes the initial value of the ftl and stores it in the threadLocalMap array; if the array is too short, it is expanded first and the value is then stored. The details are not covered here.
「3-2」 「addToVariablesToRemove(threadLocalMap, this)」 saves the ftl instance in a Set held in element 0 of the threadLocalMap's internal array. The code is not shown here.
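Steps 3-1 and 3-2 can be sketched with a toy table. This is a minimal sketch rather than netty's implementation: SlotTable and its methods are hypothetical names, the growth rule (double, or just enough to fit the index) is an assumption, and the identity-based Set is one reasonable choice for tracking instances.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical toy table; it mirrors the idea, not netty's exact code.
final class SlotTable {
    static final Object UNSET = new Object();
    private Object[] slots;

    SlotTable(int capacity) {
        slots = new Object[capacity];
        Arrays.fill(slots, UNSET);
    }

    int capacity() {
        return slots.length;
    }

    Object get(int index) {
        return index < slots.length ? slots[index] : UNSET;
    }

    // Step 3-1: stores a value, growing the array if needed.
    // Returns true if the slot was UNSET before.
    boolean set(int index, Object value) {
        if (index >= slots.length) {
            int oldLength = slots.length;
            // Assumption: plain doubling, or just enough to fit the index.
            slots = Arrays.copyOf(slots, Math.max(index + 1, oldLength * 2));
            Arrays.fill(slots, oldLength, slots.length, UNSET);
        }
        Object old = slots[index];
        slots[index] = value;
        return old == UNSET;
    }

    // Step 3-2: remembers every local that holds a value, in a Set kept in slot 0.
    // Slot 0 is therefore reserved; ordinary values must use indexes from 1 upward.
    @SuppressWarnings("unchecked")
    void track(Object local) {
        Object current = get(0);
        Set<Object> tracked;
        if (current == UNSET) {
            tracked = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
            set(0, tracked);
        } else {
            tracked = (Set<Object>) current;
        }
        tracked.add(local);
    }

    @SuppressWarnings("unchecked")
    int trackedCount() {
        Object current = get(0);
        return current == UNSET ? 0 : ((Set<Object>) current).size();
    }
}
```

The UNSET sentinel is what lets a stored null be told apart from a slot that was never written: after set(10, null), get(10) returns null rather than UNSET. Keeping the tracked instances in one Set means a later cleanup pass can visit every local that holds a value without scanning the whole array.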

The implementation of registerCleaner(threadLocalMap) in netty-4.1.34:
private void registerCleaner(final InternalThreadLocalMap threadLocalMap) {
    Thread current = Thread.currentThread();
    if (FastThreadLocalThread.willCleanupFastThreadLocals(current) || threadLocalMap.isCleanerFlagSet(index)) {
        return;
    }
    threadLocalMap.setCleanerFlag(index);
    // TODO: We need to find a better way to handle this.
    /*
    // We will need to ensure we will trigger remove(InternalThreadLocalMap) so everything will be released
    // and FastThreadLocal.onRemoval(...) will be called.
    ObjectCleaner.register(current, new Runnable() {
        @Override
        public void run() {
            remove(threadLocalMap);
            // It's fine to not call InternalThreadLocalMap.remove() here as this will only be triggered once
            // the Thread is collected by GC. In this case the ThreadLocal will be gone away already.
        }
    });
    */
}
Because the ObjectCleaner.register code is commented out in this version and the remaining logic is simple, we do not analyze it further. ObjectCleaner itself is outside the scope of this article.
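2.5 Performance degradation when ordinary threads use ftl
With **get()** analyzed, how **set(value)** works is easy to infer, so for brevity it is not analyzed separately. As noted above, ftl reaches its full performance only together with ftlt. On any other, ordinary thread it degrades to the JDK ThreadLocal case, because an ordinary thread does not contain a data structure like InternalThreadLocalMap. Next we look at how this degradation happens.
Start with the ***get()*** method of InternalThreadLocalMap: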
=======================InternalThreadLocalMap=======================
public static InternalThreadLocalMap get() {
    Thread thread = Thread.currentThread();
    if (thread instanceof FastThreadLocalThread) {
        return fastGet((FastThreadLocalThread) thread);
    } else {
        return slowGet();
    }
}
private static InternalThreadLocalMap slowGet() {
    // A static field of the parent class, of type JDK ThreadLocal; the InternalThreadLocalMap is fetched from it
    ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap = UnpaddedInternalThreadLocalMap.slowThreadLocalMap;
    InternalThreadLocalMap ret = slowThreadLocalMap.get();
    if (ret == null) {
        ret = new InternalThreadLocalMap();
        slowThreadLocalMap.set(ret);
    }
    return ret;
}
From ftl's point of view, the degraded path is: fetch the InternalThreadLocalMap from a JDK ThreadLocal variable, then read the value at the given array index from that InternalThreadLocalMap.
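The two-step degraded path can be reproduced with the JDK alone. The sketch below is illustrative only: SlowPathTable is a hypothetical class, and a bare Object[] of length 32 stands in for InternalThreadLocalMap.

```java
// Hypothetical stand-in for the slow path: a JDK ThreadLocal that holds the whole per-thread table.
final class SlowPathTable {
    // Plays the role of slowThreadLocalMap.
    private static final ThreadLocal<Object[]> SLOW = new ThreadLocal<>();

    static Object[] table() {
        Object[] table = SLOW.get(); // step 1: hash lookup in the JDK ThreadLocalMap
        if (table == null) {
            table = new Object[32];
            SLOW.set(table);
        }
        return table;
    }

    static Object read(int index) {
        return table()[index]; // step 2: direct array access
    }

    static void write(int index, Object value) {
        table()[index] = value;
    }
}
```

Every read or write pays for one JDK ThreadLocal lookup before it reaches the array, which is why an ordinary thread gets no speed benefit from ftl.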

3 Resource reclamation in ftl
netty provides three ways to reclaim ftl resources:
Automatic: when an ftlt runs a Runnable task wrapped by FastThreadLocalRunnable, the ftl values are cleaned up automatically once the task finishes.
Manual: both ftl and InternalThreadLocalMap provide a remove method, which users can call at a suitable time to remove values explicitly (sometimes they must, for example when a thread pool of ordinary threads uses ftl).
Automatic: a Cleaner is registered for each ftl of the current thread; when the thread object is no longer strongly reachable, the Cleaner thread reclaims that ftl for that thread. (netty recommends not using this approach whenever one of the other two is available, because it needs an extra thread, which costs resources, and the additional thread introduces some resource contention. In netty-4.1.34 the code that calls ObjectCleaner is commented out.)
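The first mechanism relies on a familiar pattern: wrap the task and run the cleanup in a finally block. The sketch below shows only that pattern. CleanupRunnable and CleanupDemo are hypothetical names, and the cleanup action is an arbitrary Runnable standing in for the ftl cleanup, not netty's FastThreadLocalRunnable itself.

```java
// Hypothetical wrapper: runs a task, then always runs a cleanup action.
final class CleanupRunnable implements Runnable {
    private final Runnable task;
    private final Runnable cleanup;

    CleanupRunnable(Runnable task, Runnable cleanup) {
        this.task = task;
        this.cleanup = cleanup;
    }

    @Override
    public void run() {
        try {
            task.run();
        } finally {
            cleanup.run(); // runs even if the task throws
        }
    }
}

final class CleanupDemo {
    // Returns the order of events when the wrapped task fails.
    static String runFailingTask() {
        StringBuilder log = new StringBuilder();
        Runnable wrapped = new CleanupRunnable(
                () -> {
                    log.append("task;");
                    throw new IllegalStateException("boom");
                },
                () -> log.append("cleanup;"));
        try {
            wrapped.run();
        } catch (IllegalStateException expected) {
            log.append("caught");
        }
        return log.toString();
    }
}
```

CleanupDemo.runFailingTask() returns "task;cleanup;caught": the cleanup runs before the exception reaches the caller. For the manual mechanism on a pool of ordinary threads, the same try/finally shape applies, with the remove call placed in the finally block of each task.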
4 How netty uses ftl
The most important use of ftl in netty is allocating ByteBuf. The basic approach: each thread is assigned a block of memory (a PoolArena). When a ByteBuf is needed, the thread first allocates from the PoolArena it holds, and falls back to global allocation if that fails. Because memory is limited, several threads may still hold the same PoolArena, but this approach minimizes resource contention between threads and improves efficiency.
The code is in PoolThreadLocalCache, an inner class of PooledByteBufAllocator:
final class PoolThreadLocalCache extends FastThreadLocal<PoolThreadCache> {
    @Override
    protected synchronized PoolThreadCache initialValue() {
        final PoolArena<byte[]> heapArena = leastUsedArena(heapArenas);
        final PoolArena<ByteBuffer> directArena = leastUsedArena(directArenas);
        Thread current = Thread.currentThread();
        if (useCacheForAllThreads || current instanceof FastThreadLocalThread) {
            // PoolThreadCache wraps the memory blocks held by each thread
            return new PoolThreadCache(
                    heapArena, directArena, tinyCacheSize, smallCacheSize, normalCacheSize,
                    DEFAULT_MAX_CACHED_BUFFER_CAPACITY, DEFAULT_CACHE_TRIM_INTERVAL);
        }
        // No caching so just use 0 as sizes.
        return new PoolThreadCache(heapArena, directArena, 0, 0, 0, 0, 0);
    }
}
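The name leastUsedArena suggests how threads end up sharing an arena: each new thread is attached to whichever arena currently serves the fewest threads. The sketch below illustrates that selection rule under stated assumptions. ToyArena and ArenaPicker are hypothetical, the counter of attached thread caches is an assumed bookkeeping field, and the helper is not synchronized (the initialValue() shown above is).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an arena; it only tracks how many thread caches use it.
final class ToyArena {
    final String name;
    final AtomicInteger threadCaches = new AtomicInteger();

    ToyArena(String name) {
        this.name = name;
    }
}

final class ArenaPicker {
    // Picks the arena with the fewest attached thread caches (the first one wins ties)
    // and counts the caller as attached to it.
    static ToyArena attachToLeastUsed(ToyArena[] arenas) {
        if (arenas == null || arenas.length == 0) {
            return null;
        }
        ToyArena min = arenas[0];
        for (int i = 1; i < arenas.length; i++) {
            if (arenas[i].threadCaches.get() < min.threadCaches.get()) {
                min = arenas[i];
            }
        }
        min.threadCaches.incrementAndGet();
        return min;
    }
}
```

With two arenas a and b, three successive attachments pick a, then b, then a again, so the third thread shares arena a with the first. This is the situation the text describes, where several threads hold the same PoolArena.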
For how netty's memory pool allocates memory, see my earlier article: "Interviewer: tell me about Netty memory management!".
