一次List對(duì)象去重失敗,引發(fā)對(duì)Java8中distinct()的思考

作者:puppylpg
blog.csdn.net/puppylpg/article/details/78556730
list的轉(zhuǎn)map的另一種猜想
Java8使用lambda表達(dá)式進(jìn)行函數(shù)式編程可以對(duì)集合進(jìn)行非常方便的操作。一個(gè)比較常見的操作是將list轉(zhuǎn)換成map,一般使用Collectors的toMap()方法進(jìn)行轉(zhuǎn)換。一個(gè)比較常見的問題是當(dāng)list中含有相同元素的時(shí)候,如果不指定取哪一個(gè),則會(huì)拋出異常。因此,這個(gè)指定是必須的。
當(dāng)然,使用toMap()的另一個(gè)重載方法,可以直接指定。這里,我們想討論的是另一種方法:在進(jìn)行轉(zhuǎn)map的操作之前,能不能使用distinct()先把list的重復(fù)元素過濾掉,然后轉(zhuǎn)map的時(shí)候就不用考慮重復(fù)元素的問題了。
使用distinct()給list去重
直接使用distinct(),失敗
package?example.mystream;
import?lombok.AllArgsConstructor;
import?lombok.Getter;
import?lombok.NoArgsConstructor;
import?lombok.ToString;
import?java.util.Arrays;
import?java.util.List;
import?java.util.Map;
import?java.util.stream.Collectors;
public?class?ListToMap?{
????@AllArgsConstructor
????@NoArgsConstructor
????@ToString
????private?static?class?VideoInfo?{
????????@Getter
????????String?id;
????????int?width;
????????int?height;
????}
????public?static?void?main(String?[]?args)?{
????????List?list?=?Arrays.asList(new?VideoInfo("123",?1,?2),
????????????????new?VideoInfo("456",?4,?5),?new?VideoInfo("123",?1,?2));
????????//?preferred:?handle?duplicated?data?when?toMap()
????????Map?id2VideoInfo?=?list.stream().collect(
????????????????Collectors.toMap(VideoInfo::getId,?x?->?x,
????????????????????????(oldValue,?newValue)?->?newValue)
????????);
????????System.out.println("No?Duplicated1:?");
????????id2VideoInfo.forEach((x,?y)?->?System.out.println("<"?+?x?+?",?"?+?y?+?">"));
????????//?handle?duplicated?data?using?distinct(),?before?toMap()
????????Map?id2VideoInfo2?=?list.stream().distinct().collect(
????????????????Collectors.toMap(VideoInfo::getId,?x?->?x)
????????);
????????System.out.println("No?Duplicated2:?");
????????id2VideoInfo2.forEach((x,?y)?->?System.out.println("<"?+?x?+?",?"?+?y?+?">"));
????}
}
list里總共有三個(gè)元素,其中有兩個(gè)我們認(rèn)為是重復(fù)的。第一種轉(zhuǎn)換是使用toMap()直接指定了對(duì)重復(fù)key的處理情況,因此可以正常轉(zhuǎn)換成map。而第二種轉(zhuǎn)換是想先對(duì)list進(jìn)行去重,然后再轉(zhuǎn)換成map,結(jié)果還是失敗了,拋出了IllegalStateException,所以distinct()應(yīng)該是失敗了。
No?Duplicated1:?
<123,?ListToMap.VideoInfo(id=123,?width=1,?height=2)>
<456,?ListToMap.VideoInfo(id=456,?width=4,?height=5)>
Exception?in?thread?"main"?java.lang.IllegalStateException:?Duplicate?key?ListToMap.VideoInfo(id=123,?width=1,?height=2)
????at?java.util.stream.Collectors.lambda$throwingMerger$0(Collectors.java:133)
????at?java.util.HashMap.merge(HashMap.java:1253)
????at?java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1320)
????at?java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
????at?java.util.stream.DistinctOps$1$2.accept(DistinctOps.java:175)
????at?java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
????at?java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
????at?java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
????at?java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
????at?java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
????at?java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
????at?example.mystream.ListToMap.main(ListToMap.java:79)
原因:distinct()依賴于equals()
查看distinct()的API,可以看到如下介紹:
Returns a stream consisting of the distinct elements (according to {@link Object#equals(Object)}) of this stream.
顯然,distinct()對(duì)對(duì)象進(jìn)行去重時(shí),是根據(jù)對(duì)象的equals()方法去處理的。如果我們的VideoInfo類不overrride超類Object的equals()方法,就會(huì)使用Object的。
但是Object的equals()方法只有在兩個(gè)對(duì)象完全相同時(shí)才返回true。而我們想要的效果是只要VideoInfo的id/width/height均相同,就認(rèn)為兩個(gè)videoInfo對(duì)象是同一個(gè)。所以我們比如重寫屬于videoInfo的equals()方法。
重寫equals()的注意事項(xiàng)
我們?cè)O(shè)計(jì)VideoInfo的equals()如下:
@Override
public?boolean?equals(Object?obj)?{
????if?(!(obj?instanceof?VideoInfo))?{
????????return?false;
????}
????VideoInfo?vi?=?(VideoInfo)?obj;
????return?this.id.equals(vi.id)
??????????&&?this.width?==?vi.width
??????????&&?this.height?==?vi.height;
}
這樣一來,只要兩個(gè)videoInfo對(duì)象的三個(gè)屬性都相同,這兩個(gè)對(duì)象就相同了。歡天喜地去運(yùn)行程序,依舊失?。hy?
《Effective Java》是本好書,連Java之父James Gosling都說,這是一本連他都需要的Java教程。在這本書中,作者指出,如果重寫了一個(gè)類的equals()方法,那么就必須一起重寫它的hashCode()方法!必須!沒有商量的余地!
必須使得重寫后的equals()滿足如下條件:
根據(jù)equals()進(jìn)行比較,相等的兩個(gè)對(duì)象,hashCode()的值也必須相同; 根據(jù)equals()進(jìn)行比較,不相等的兩個(gè)對(duì)象,hashCode()的值可以相同,也可以不同;
因?yàn)檫@是Java的規(guī)定,違背這些規(guī)定將導(dǎo)致Java程序運(yùn)行不再正常。
具體更多的細(xì)節(jié),建議大家讀讀原書,必定獲益匪淺。強(qiáng)烈推薦!
最終,我按照神書的指導(dǎo)設(shè)計(jì)VideoInfo的hashCode()方法如下:
@Override
public?int?hashCode()?{
???int?n?=?31;
???n?=?n?*?31?+?this.id.hashCode();
???n?=?n?*?31?+?this.height;
???n?=?n?*?31?+?this.width;
???return?n;
}
終于,distinct()成功過濾了list中的重復(fù)元素,此時(shí)使用兩種toMap()將list轉(zhuǎn)換成map都是沒問題的:
No?Duplicated1:?
<123,?ListToMap.VideoInfo(id=123,?width=1,?height=2)>
<456,?ListToMap.VideoInfo(id=456,?width=4,?height=5)>
No?Duplicated2:?
<123,?ListToMap.VideoInfo(id=123,?width=1,?height=2)>
<456,?ListToMap.VideoInfo(id=456,?width=4,?height=5)>
引申
既然說distinct()是調(diào)用equals()進(jìn)行比較的,那按照我的理解,list的3個(gè)元素至少需要比較3次吧。那是不是就調(diào)用了3次equals()呢?
在equals()中加入一句打印,這樣就可以知道了。加后的equals()如下:
@Override?
public?boolean?equals(Object?obj)?{
????if?(!?(obj?instanceof?VideoInfo))?{
????????return?false;
????}
????VideoInfo?vi?=?(VideoInfo)?obj;
????System.out.println("<===>?Invoke?equals()?==>?"?+?this.toString()?+?"?vs.?"?+?vi.toString());
????return?this.id.equals(vi.id)?&&?this.width?==?vi.width?&&?this.height?==?vi.height;
}
結(jié)果:
No?Duplicated1:?
<123,?ListToMap.VideoInfo(id=123,?width=1,?height=2)>
<456,?ListToMap.VideoInfo(id=456,?width=4,?height=5)>
<===>?Invoke?equals()?==>?ListToMap.VideoInfo(id=123,?width=1,?height=2)?vs.?ListToMap.VideoInfo(id=123,?width=1,?height=2)
No?Duplicated2:?
<123,?ListToMap.VideoInfo(id=123,?width=1,?height=2)>
<456,?ListToMap.VideoInfo(id=456,?width=4,?height=5)>
結(jié)果發(fā)現(xiàn)才調(diào)用了一次equals()。為什么不是3次呢?仔細(xì)想想,根據(jù)hashCode()進(jìn)行比較,hashCode()相同的情況就一次,就是list的第一個(gè)元素和第三個(gè)元素(都是VideoInfo(id=123, width=1, height=2))會(huì)出現(xiàn)hashCode()相同的情況。
所以我們是不是可以這么猜想:只有當(dāng)hashCode()返回的hashCode相同的時(shí)候,才會(huì)調(diào)用equals()進(jìn)行更進(jìn)一步的判斷。如果連hashCode()返回的hashCode都不同,那么可以認(rèn)為這兩個(gè)對(duì)象一定就是不同的了!
驗(yàn)證猜想:
更改hashCode()如下:
@Override
public?int?hashCode()?{
???return?1;
}
這樣一來,所有的對(duì)象的hashCode()返回值都是相同的。當(dāng)然,這樣搞是符合Java規(guī)范的,因?yàn)镴ava只規(guī)定equals()相同的對(duì)象的hashCode必須相同,但是不同的對(duì)象的hashCode未必會(huì)不同。
結(jié)果:
No?Duplicated1:?
<123,?ListToMap.VideoInfo(id=123,?width=1,?height=2)>
<456,?ListToMap.VideoInfo(id=456,?width=4,?height=5)>
<===>?Invoke?equals()?==>?ListToMap.VideoInfo(id=456,?width=4,?height=5)?vs.?ListToMap.VideoInfo(id=123,?width=1,?height=2)
<===>?Invoke?equals()?==>?ListToMap.VideoInfo(id=456,?width=4,?height=5)?vs.?ListToMap.VideoInfo(id=123,?width=1,?height=2)
<===>?Invoke?equals()?==>?ListToMap.VideoInfo(id=123,?width=1,?height=2)?vs.?ListToMap.VideoInfo(id=123,?width=1,?height=2)
No?Duplicated2:?
<123,?ListToMap.VideoInfo(id=123,?width=1,?height=2)>
<456,?ListToMap.VideoInfo(id=456,?width=4,?height=5)>
果然,equals()調(diào)用了三次!看來的確只有hashCode相同的時(shí)候才會(huì)調(diào)用equal()進(jìn)一步判斷兩個(gè)對(duì)象究竟是否相同;如果hashCode不相同,兩個(gè)對(duì)象顯然不相同。猜想是正確的。
結(jié)論
list轉(zhuǎn)map推薦使用toMap(),并且無論是否會(huì)出現(xiàn)重復(fù)的問題,都要指定重復(fù)后的取舍規(guī)則,不費(fèi)功夫但受益無窮;
對(duì)一個(gè)自定義的class使用distinct(),切記覆寫equals()方法;
覆寫equals(),一定要覆寫hashCode();
雖然設(shè)計(jì)出一個(gè)hashCode()可以簡單地讓其return 1,這樣并不會(huì)違反Java規(guī)定,但是這樣做會(huì)導(dǎo)致很多惡果。比如將這樣的對(duì)象存入hashMap的時(shí)候,所有的對(duì)象的hashCode都相同,最終所有對(duì)象都存儲(chǔ)在hashMap的同一個(gè)桶中,直接將hashMap惡化成了一個(gè)鏈表。從而O(1)的復(fù)雜度被整成了O(n)的,性能自然大大下降。
好書是程序猿進(jìn)步的階梯?!郀柣?。比如《Effecctive Java》。
最終參考程序:
package?example.mystream;
import?lombok.AllArgsConstructor;
import?lombok.Getter;
import?lombok.NoArgsConstructor;
import?lombok.ToString;
import?java.util.Arrays;
import?java.util.List;
import?java.util.Map;
import?java.util.stream.Collectors;
public?class?ListToMap?{
????@AllArgsConstructor
????@NoArgsConstructor
????@ToString
????private?static?class?VideoInfo?{
????????@Getter
????????String?id;
????????int?width;
????????int?height;
????????public?static?void?main(String?[]?args)?{
????????????System.out.println(new?VideoInfo("123",?1,?2).equals(new?VideoInfo("123",?1,?2)));
????????}
????????@Override
????????public?boolean?equals(Object?obj)?{
????????????if?(!(obj?instanceof?VideoInfo))?{
????????????????return?false;
????????????}
????????????VideoInfo?vi?=?(VideoInfo)?obj;
????????????return?this.id.equals(vi.id)
????????????????????&&?this.width?==?vi.width
????????????????????&&?this.height?==?vi.height;
????????}
????????/**
?????????*?If?equals()?is?override,?hashCode()?must?be?override,?too.
?????????*?1.?if?a?equals?b,?they?must?have?the?same?hashCode;
?????????*?2.?if?a?doesn't?equals?b,?they?may?have?the?same?hashCode;
?????????*?3.?hashCode?written?in?this?way?can?be?affected?by?sequence?of?the?fields;
?????????*?3.?2^5?-?1?=?31.?So?31?will?be?faster?when?do?the?multiplication,
?????????*??????because?it?can?be?replaced?by?bit-shifting:?31?*?i?=?(i?<5)?-?i.
?????????*?@return
?????????*/
????????@Override
????????public?int?hashCode()?{
????????????int?n?=?31;
????????????n?=?n?*?31?+?this.id.hashCode();
????????????n?=?n?*?31?+?this.height;
????????????n?=?n?*?31?+?this.width;
????????????return?n;
????????}
????}
????public?static?void?main(String?[]?args)?{
????????List?list?=?Arrays.asList(new?VideoInfo("123",?1,?2),
????????????????new?VideoInfo("456",?4,?5),?new?VideoInfo("123",?1,?2));
????????//?preferred:?handle?duplicated?data?when?toMap()
????????Map?id2VideoInfo?=?list.stream().collect(
????????????????Collectors.toMap(VideoInfo::getId,?x?->?x,
????????????????????????(oldValue,?newValue)?->?newValue)
????????);
????????System.out.println("No?Duplicated1:?");
????????id2VideoInfo.forEach((x,?y)?->?System.out.println("<"?+?x?+?",?"?+?y?+?">"));
????????//?handle?duplicated?data?using?distinct(),?before?toMap()
????????//?Note?that?distinct()?relies?on?equals()?in?the?object
????????//?if?you?override?equals(),?hashCode()?must?be?override?together
????????Map?id2VideoInfo2?=?list.stream().distinct().collect(
????????????????Collectors.toMap(VideoInfo::getId,?x?->?x)
????????);
????????System.out.println("No?Duplicated2:?");
????????id2VideoInfo2.forEach((x,?y)?->?System.out.println("<"?+?x?+?",?"?+?y?+?">"));
????}
}
再拓展
假設(shè)類是別人的,不能修改
以上,VideoInfo使我們自己寫的類,我們可以往里添加equals()和hashCode()方法。如果VideoInfo是我們引用的依賴中的一個(gè)類,我們無權(quán)對(duì)其進(jìn)行修改,那么是不是就沒辦法使用distinct()按照某些元素是否相同,對(duì)對(duì)象進(jìn)行自定義的過濾了呢?
使用wrapper
在stackoverflow的一個(gè)回答上,我們可以找到一個(gè)可行的方法:使用wrapper。
假設(shè)在一個(gè)依賴中(我們無權(quán)修改該類),VideoInfo定義如下:
@AllArgsConstructor
@NoArgsConstructor
@ToString
public?class?VideoInfo?{
????@Getter
????String?id;
????int?width;
????int?height;
}
使用剛剛的wrapper思路,寫程序如下(當(dāng)然,為了程序的可運(yùn)行性,還是把VideoInfo放進(jìn)來了,假設(shè)它就是不能修改的,不能為其添加任何方法):
package?example.mystream;
import?lombok.AllArgsConstructor;
import?lombok.Getter;
import?lombok.NoArgsConstructor;
import?lombok.ToString;
import?java.util.Arrays;
import?java.util.List;
import?java.util.Map;
import?java.util.stream.Collectors;
public?class?DistinctByWrapper?{
????private?static?class?VideoInfoWrapper?{
????????private?final?VideoInfo?videoInfo;
????????public?VideoInfoWrapper(VideoInfo?videoInfo)?{
????????????this.videoInfo?=?videoInfo;
????????}
????????public?VideoInfo?unwrap()?{
????????????return?videoInfo;
????????}
????????@Override
????????public?boolean?equals(Object?obj)?{
????????????if?(!(obj?instanceof?VideoInfo))?{
????????????????return?false;
????????????}
????????????VideoInfo?vi?=?(VideoInfo)?obj;
????????????return?videoInfo.id.equals(vi.id)
????????????????????&&?videoInfo.width?==?vi.width
????????????????????&&?videoInfo.height?==?vi.height;
????????}
????????@Override
????????public?int?hashCode()?{
????????????int?n?=?31;
????????????n?=?n?*?31?+?videoInfo.id.hashCode();
????????????n?=?n?*?31?+?videoInfo.height;
????????????n?=?n?*?31?+?videoInfo.width;
????????????return?n;
????????}
????}
????public?static?void?main(String?[]?args)?{
????????List?list?=?Arrays.asList(new?VideoInfo("123",?1,?2),
????????????????new?VideoInfo("456",?4,?5),?new?VideoInfo("123",?1,?2));
????????//?VideoInfo?--map()-->?VideoInfoWrapper?---->?distinct():?VideoInfoWrapper?--map()-->?VideoInfo
????????Map?id2VideoInfo?=?list.stream()
????????????????.map(VideoInfoWrapper::new).distinct().map(VideoInfoWrapper::unwrap)
????????????????.collect(
????????????????Collectors.toMap(VideoInfo::getId,?x?->?x,
????????????????????????(oldValue,?newValue)?->?newValue)
????????);
????????id2VideoInfo.forEach((x,?y)?->?System.out.println("<"?+?x?+?",?"?+?y?+?">"));
????}
}
/**
?*?Assume?that?VideoInfo?is?a?class?that?we?can't?modify
?*/
@AllArgsConstructor
@NoArgsConstructor
@ToString
class?VideoInfo?{
????@Getter
????String?id;
????int?width;
????int?height;
}
整個(gè)wrapper的思路無非就是構(gòu)造另一個(gè)類VideoInfoWrapper,把hashCode()和equals()添加到wrapper中,這樣便可以按照自定義規(guī)則對(duì)wrapper對(duì)象進(jìn)行自定義的過濾。
我們沒法自定義過濾VideoInfo,但是我們可以自定義過濾VideoInfoWrapper??!
之后要做的,就是將VideoInfo全部轉(zhuǎn)化為VideoInfoWrapper,然后過濾掉某些VideoInfoWrapper,再將剩下的VideoInfoWrapper轉(zhuǎn)回VideoInfo,以此達(dá)到過濾VideoInfo的目的。很巧妙!
使用“filter() + 自定義函數(shù)”取代distinct()
另一種更精妙的實(shí)現(xiàn)方式是自定義一個(gè)函數(shù):
????private?static??Predicate?distinctByKey(Function?super?T,?Object>?keyExtractor)? {
????????Map (輸入元素的類型是T及其父類,keyExtracctor是映射函數(shù),返回Object,整個(gè)傳入的函數(shù)的功能應(yīng)該是提取key的。distinctByKey函數(shù)返回的是Predicate函數(shù),類型為T。)
這個(gè)函數(shù)傳入一個(gè)函數(shù)(lambda),對(duì)傳入的對(duì)象提取key,然后嘗試將key放入concurrentHashMap,如果能放進(jìn)去,說明此key之前沒出現(xiàn)過,函數(shù)返回false;如果不能放進(jìn)去,說明這個(gè)key和之前的某個(gè)key重復(fù)了,函數(shù)返回true。
這個(gè)函數(shù)最終作為filter()函數(shù)的入?yún)?。根?jù)Java API可知filter(func)過濾的規(guī)則為:如果func為true,則過濾,否則不過濾。因此,通過“filter() + 自定義的函數(shù)”,凡是重復(fù)的key都返回true,并被filter()過濾掉,最終留下的都是不重復(fù)的。
最終實(shí)現(xiàn)的程序如下
package?example.mystream;
import?lombok.AllArgsConstructor;
import?lombok.Getter;
import?lombok.NoArgsConstructor;
import?lombok.ToString;
import?java.util.Arrays;
import?java.util.List;
import?java.util.Map;
import?java.util.concurrent.ConcurrentHashMap;
import?java.util.function.Function;
import?java.util.function.Predicate;
import?java.util.stream.Collectors;
public?class?DistinctByFilterAndLambda?{
????public?static?void?main(String[]?args)?{
????????List?list?=?Arrays.asList(new?VideoInfo("123",?1,?2),
????????????????new?VideoInfo("456",?4,?5),?new?VideoInfo("123",?1,?2));
????????//?Get?distinct?only
????????Map?id2VideoInfo?=?list.stream().filter(distinctByKey(vi?->?vi.getId())).collect(
????????????????Collectors.toMap(VideoInfo::getId,?x?->?x,
????????????????????????(oldValue,?newValue)?->?newValue)
????????);
????????id2VideoInfo.forEach((x,?y)?->?System.out.println("<"?+?x?+?",?"?+?y?+?">"));
????}
????/**
?????*?If?a?key?could?not?be?put?into?ConcurrentHashMap,?that?means?the?key?is?duplicated
?????*?@param?keyExtractor?a?mapping?function?to?produce?keys
?????*?@param??the?type?of?the?input?elements
?????*?@return?true?if?key?is?duplicated;?else?return?false
?????*/
????private?static??Predicate?distinctByKey(Function?super?T,?Object>?keyExtractor)? {
????????Map 