Java 排序遇到的神坑,我替你踩了!

Java技術(shù)棧
www.javastack.cn
關(guān)注閱讀更多優(yōu)質(zhì)文章
來源:https://blog.51cto.com/nxlhero/2515850
問題描述
一個(gè)開發(fā)人員寫了一段明顯有問題的排序代碼,大致如下:
import?java.util.ArrayList;
import?java.util.Collections;
import?java.util.Comparator;
import?java.util.HashMap;
public?class?Test?{
????public?static?void?main(String[]?args)?throws?InterruptedException?{
????????//測試數(shù)據(jù):List里放Map,按Map里的name字段排序
????????HashMap?a?=?new?HashMap();
????????a.put("name",?"二");
????????HashMap?b?=?new?HashMap();
????????b.put("name",?"一");
????????HashMap?c?=?new?HashMap();
????????c.put("name",?"一");
????????HashMap?d?=?new?HashMap();
????????d.put("name",?"四");
????????HashMap?e?=?new?HashMap();
????????e.put("name",?"二");
????????HashMap?f?=?new?HashMap();
????????f.put("name",?"三");
????????ArrayList>?list?=?new?ArrayList<>();
????????list.add(a);
????????list.add(b);
????????list.add(c);
????????list.add(d);
????????list.add(e);
????????list.add(f);
????????//排序:明顯有問題,因?yàn)橹环祷?1和0,也就是比較的時(shí)候永遠(yuǎn)是小于等于
????????Collections.sort(list,?new?Comparator>()?{
????????????@Override
????????????public?int?compare(HashMap?o1,?HashMap?o2)?{
????????????????String?n1?=?o1.get("name");
????????????????String?n2?=?o2.get("name");
????????????????if?(n1.equals("一"))?{
????????????????????return?-1;
????????????????}
????????????????if?(n1.equals("二")?&&?!n2.equals("一"))?{
????????????????????return?-1;
????????????????}
????????????????if?(n1.equals("三")?&&?!"一二".contains(n2))?{
????????????????????return?-1;
????????????????}
????????????????if?(n1.equals("四")?&&?!"一二三".contains(n2))?{
????????????????????return?-1;
????????????????}
????????????????return?0;
????????????}
????????});
????????for(HashMap?x?:?list)?{
????????????System.out.print(x.get("name"));
????????}
????}
}
按理這個(gè)排序是有問題的,但是不管怎么改變測試數(shù)據(jù),排序結(jié)果都是對的(測試數(shù)據(jù)量較小),上面代碼的輸出結(jié)果如下,用的jdk是1.7:
一一二二三四
但是,生產(chǎn)上是有問題的。
分析
Collections.sort,最終調(diào)用了Arrays.sort,在1.7中,Arrays.sort做了修改。
public?static??void?sort(T[]?a,?Comparator?super?T>?c)?{
????if?(c?==?null)?{
????????sort(a);
????}?else?{
????????if?(LegacyMergeSort.userRequested)
????????????legacyMergeSort(a,?c);
????????else
????????????TimSort.sort(a,?0,?a.length,?c,?null,?0,?0);
????}
}
如果配置了java.util.Arrays.useLegacyMergeSort這個(gè)參數(shù),那么就走老的LegacyMergeSort,否則就走新的TimSort。
我們在代碼里加上下面一句話,輸出結(jié)果就是亂序的,這符合預(yù)期。
System.setProperty("java.util.Arrays.useLegacyMergeSort",?"true");
檢查了一下生產(chǎn)上JVM的參數(shù),果然加了這個(gè)參數(shù)。關(guān)注公眾號(hào)Java技術(shù)棧回復(fù):JVM46,可以獲取一份 JVM?調(diào)優(yōu)教程。
但是為什么走TimSort的結(jié)果是對的呢?繼續(xù)分析TimSort的代碼,發(fā)現(xiàn)有一個(gè)特殊情況的處理:
//?If?array?is?small,?do?a?"mini-TimSort"?with?no?merges
if?(nRemaining?????int?initRunLen?=?countRunAndMakeAscending(a,?lo,?hi,?c);
????binarySort(a,?lo,?hi,?lo?+?initRunLen,?c);
????return;
}
也就是在數(shù)組小于32的時(shí)候,進(jìn)入這個(gè)里面,然后沒有歸并。那我們先來測試一下大于32的情況。
public?class?Test?{?
????public?static?void?main(String[]?args)?throws?InterruptedException?{????
????????ArrayList>?list?=?new?ArrayList<>();
????????String[]?xx?=?{"一","二","三","四"};
????????for(int?i?=?0;?i?35;?i++)?{
????????????HashMap?x?=?new?HashMap();
????????????x.put("name",?xx[(i+17)%4]);
????????????list.add(x);
????????}
????????Collections.sort(list,?new?Comparator>()?{
????????????@Override
????????????public?int?compare(HashMap?o1,?HashMap?o2)?{
????????????????String?n1?=?o1.get("name");
????????????????String?n2?=?o2.get("name");
????????????????if?(n1.equals("一"))?{
????????????????????return?-1;
????????????????}
????????????????if?(n1.equals("二")?&&?!n2.equals("一"))?{
????????????????????return?-1;
????????????????}
????????????????if?(n1.equals("三")?&&?!"一二".contains(n2))?{
????????????????????return?-1;
????????????????}
????????????????if?(n1.equals("四")?&&?!"一二三".contains(n2))?{
????????????????????return?-1;
????????????????}
????????????????return?0;
????????????}
????????});
????????for(HashMap?x?:?list)?{
????????????System.out.print(x.get("name"));
????????}
????}
}
這次果然翻車了。
一一一一二二二二二三三三三三四四四四一一一一二二二二三三三三四四四四四
我們通過代碼來看一下為什么小于32的時(shí)候排序成功了。
推薦閱讀:剛寫完排序算法,就被開除了…
首先,我們的比較函數(shù),只有在真正小于或者等于情況下返回了-1,其余情況返回了0,包括大于的情況也返回了0。
比如
| 兩個(gè)值 | 結(jié)果 |
|---|---|
| 一一 | -1 |
| 一二 | -1 |
| 三二 | 0 |
| 四四 | -1 |
| 三一 | 0 |
為了簡化,下面用阿拉伯?dāng)?shù)字代替
以211423為例,
if?(nRemaining?????int?initRunLen?=?countRunAndMakeAscending(a,?lo,?hi,?c);
????binarySort(a,?lo,?hi,?lo?+?initRunLen,?c);
????return;
}
第一步,是找到嚴(yán)格遞增或者遞減的最大長度,如果是升序,就不處理,降序的話,就reverse。
211423經(jīng)過處理后變成了112 423,最大遞減長度為3(因?yàn)?和1相比的結(jié)果為-1,所以也被當(dāng)作嚴(yán)格遞減),然后211被reverse成112
private?static??int?countRunAndMakeAscending(T[]?a,?int?lo,?int?hi,
????????????????????????????????????????????????Comparator?super?T>?c)?{
????assert?lo?????int?runHi?=?lo?+?1;
????if?(runHi?==?hi)
????????return?1;
????//?Find?end?of?run,?and?reverse?range?if?descending
????if?(c.compare(a[runHi++],?a[lo])?0)?{?//?Descending
????????while?(runHi?????????????runHi++;
????????reverseRange(a,?lo,?runHi);
????}?else?{??????????????????????????????//?Ascending
????????while?(runHi?=?0)
????????????runHi++;
????}
????return?runHi?-?lo;
}
接下來,從第四個(gè)位置開始,找到它的位置,移動(dòng)數(shù)據(jù),讓每一個(gè)數(shù)字找到合適的位置,具體的代碼如下:
private?static??void?binarySort(T[]?a,?int?lo,?int?hi,?int?start,
???????????????????????????????????Comparator?super?T>?c)?{
????assert?lo?<=?start?&&?start?<=?hi;
????if?(start?==?lo)
????????start++;
????for?(?;?start?????????T?pivot?=?a[start];
????????//?Set?left?(and?right)?to?the?index?where?a[start]?(pivot)?belongs
????????int?left?=?lo;
????????int?right?=?start;
????????assert?left?<=?right;
????????/*
?????????*?Invariants:
?????????*???pivot?>=?all?in?[lo,?left).
?????????*???pivot??all?in?[right,?start).
?????????*/
????????while?(left?????????????int?mid?=?(left?+?right)?>>>?1;
????????????if?(c.compare(pivot,?a[mid])?0)
????????????????right?=?mid;
????????????else
????????????????left?=?mid?+?1;
????????}
????????assert?left?==?right;
????????int?n?=?start?-?left;??//?The?number?of?elements?to?move
????????//?Switch?is?just?an?optimization?for?arraycopy?in?default?case
????????switch?(n)?{
????????????case?2:??a[left?+?2]?=?a[left?+?1];
????????????case?1:??a[left?+?1]?=?a[left];
?????????????????????break;
????????????default:?System.arraycopy(a,?left,?a,?left?+?1,?n);
????????}
????????a[left]?=?pivot;
????}
}
第一次:112 4 23, 在左邊找到合適4的位置,結(jié)果為1124 23
第二次:1124 2 3, 在左邊找到2合適的位置,結(jié)果11224 3
第三次:11224 3,在左邊找到3合適的位置,結(jié)果為112234,結(jié)束
但是在countRunAndMakeAscending這個(gè)函數(shù)里用到了>=0。我們看一下這種情況,也就是數(shù)組的開頭是遞增的時(shí)候,會(huì)用到>=0
private?static??int?countRunAndMakeAscending(T[]?a,?int?lo,?int?hi,
????????????????????????????????????????????????Comparator?super?T>?c)?{
????assert?lo?????int?runHi?=?lo?+?1;
????if?(runHi?==?hi)
????????return?1;
????//?Find?end?of?run,?and?reverse?range?if?descending
????if?(c.compare(a[runHi++],?a[lo])?0)?{?//?Descending
????????while?(runHi?????????????runHi++;
????????reverseRange(a,?lo,?runHi);
????}?else?{??????????????????????????????//?Ascending
????????while?(runHi?=?0)
????????????runHi++;
????}
????return?runHi?-?lo;
}
假設(shè)輸入的是1234123,前邊2和1相比結(jié)果是0,3和2也是0,4和3也是0,1和4是-1,所以最大遞增序列是1234,同時(shí)不用reverse,傳給下一個(gè)函數(shù)的輸入為1234 123,結(jié)果三次插入,結(jié)果也是對的。
總結(jié)
綜上分析可以得出結(jié)論,就是因?yàn)樵趈dk 1.7中,如果數(shù)組小于32個(gè)元素,加入對于小于的比較都是-1, 其他的都是0,那么結(jié)果是正確的,這是因?yàn)樗惴ū旧淼奶匦浴5谴笥?2時(shí),就不對了,會(huì)看到分段排好序了,這是因?yàn)闅w并的時(shí)候比較結(jié)果都是0,導(dǎo)致沒有做歸并。
其實(shí)sort的Comparator是有坑的,必須把所有情況都考慮周到,而且要滿足以下特性:
1 ) 自反性:x , y 的比較結(jié)果和 y , x 的比較結(jié)果相反。2 ) 傳遞性:x > y , y > z ,則 x > z 。3 ) 對稱性:x = y ,則 x , z 比較結(jié)果和 y , z 比較結(jié)果相同。
上面的Comparator如果要寫的對,應(yīng)該這么寫,把所有情況列出來,當(dāng)然也可以通過一些條件簡化,但是簡化的后果就是上面的結(jié)果,需要充分測試。
Collections.sort(list,?new?Comparator>()?{
????@Override
????public?int?compare(HashMap?o1,?HashMap?o2)?{
????????String?n1?=?o1.get("name");
????????String?n2?=?o2.get("name");
????????if?(n1.equals("一")?&&?n2.equals("一"))?{
????????????return?0;
????????}
????????if?(n1.equals("一")?&&?n2.equals("二"))?{
????????????return?-1;
????????}
????????if?(n1.equals("一")?&&?n2.equals("三"))?{
????????????return?-1;
????????}
????????if?(n1.equals("一")?&&?n2.equals("四"))?{
????????????return?-1;
????????}
????????if?(n1.equals("二")?&&?n2.equals("一"))?{
????????????return?1;
????????}
????????......
????}
});






關(guān)注Java技術(shù)棧看更多干貨


