淺談Trie字典樹
Trie字典樹,又被稱為前綴樹,一種可以高效進(jìn)行大量字符串的保存、統(tǒng)計(jì)、排序等的數(shù)據(jù)結(jié)構(gòu)

基本原理
對于字典樹而言,顧名思義其首先是一棵樹。然后將每個字符串拆分為若干個字符依次存儲到樹的各節(jié)點(diǎn)中。其有三個特殊的性質(zhì):
根節(jié)點(diǎn)不包含字符,除根節(jié)點(diǎn)外每一個節(jié)點(diǎn)都只包含一個字符 從根節(jié)點(diǎn)到某一節(jié)點(diǎn),將路徑上經(jīng)過的字符連接起來,就構(gòu)成了一個字符串 每個節(jié)點(diǎn)的所有子節(jié)點(diǎn)包含的字符都不相同
假設(shè)存在hi、dog、da三個字符串,這里通過Trie樹進(jìn)行保存存儲,則最終如下所示??梢钥吹絋rie樹的核心在于利用字符串的公共前綴減少查詢時間

而在字典樹的實(shí)現(xiàn)過程中,常見的功能包括向Trie樹中插入一個字符串、判斷Trie樹中是否存在某字符串、判斷Trie樹中是否存在指定字符串前綴等,與此同時還可以提供統(tǒng)計(jì)、排序等高級功能
實(shí)踐
實(shí)現(xiàn)Trie字典樹
學(xué)習(xí)過程中要善于理論聯(lián)系實(shí)際。故在介紹完Trie樹的基本原理后,我們來實(shí)現(xiàn)一個Trie樹。這里以LeetCode的第208題——實(shí)現(xiàn)Trie(前綴樹)為例
請你實(shí)現(xiàn) Trie 類:
Trie(): 初始化前綴樹對象 void insert(String word): 向前綴樹中插入字符串word boolean search(String word): 如果字符串word在前綴樹中,返回true(即在檢索之前已經(jīng)插入);否則返回false boolean startsWith(String prefix): 如果之前已經(jīng)插入的字符串word的前綴之一為prefix,返回true;否則返回 false Note: word 和 prefix 僅由小寫英文字母組成
可以看到對于本題而言,由于僅存在26個小寫英文字母。故對于子節(jié)點(diǎn)可以通過數(shù)組進(jìn)行保存,其中0~25索引 對應(yīng)于 a~z 字符。同時在節(jié)點(diǎn)增加一個標(biāo)識endFlag,用于指示當(dāng)前字符是否為字符串的最后一個字符。這樣查詢指定字符串時,如果其最后一個字符對應(yīng)節(jié)點(diǎn)的endFlag為true,則表示該字符串查詢成功。而對于查詢字符串前綴而言,只要我們能夠在Trie樹中查詢到相應(yīng)的節(jié)點(diǎn)存在,即可判定為查詢成功。Java實(shí)現(xiàn)如下所示
/**
?*?Trie字典樹
?*/
public?class?Trie?{
????/**
?????*?字典樹的根節(jié)點(diǎn)
?????*/
????private?TrieNode?root;
????public?Trie()?{
????????root?=?new?TrieNode();
????}
????/**
?????*?字典樹中插入字符串?word
?????*?@param?word
?????*/
????public?void?insert(String?word)?{
????????TrieNode?current?=?root;
????????char[]?chars?=?word.toCharArray();
????????for?(int?i=0;?i????????????char?ch?=?chars[i];
????????????int?index?=?calcIndex(ch);
????????????TrieNode[]?childs?=?current.getChilds();
????????????if(?childs[index]==null?)?{
????????????????childs[index]?=?new?TrieNode();
????????????}
????????????current?=?childs[index];
????????}
????????current.setEndFlag(?true?);
????}
????/**
?????*?判斷字符串是否存在于字典樹
?????*?@param?word
?????*?@return
?????*/
????public?boolean?search(String?word)?{
????????TrieNode?current?=?root;
????????char[]?chars?=?word.toCharArray();
????????for(int?i=0;?i????????????char?ch?=?chars[i];
????????????int?index?=?calcIndex(ch);
????????????TrieNode[]?childs?=?current.getChilds();
????????????if(?childs[index]==null?)?{
????????????????return?false;
????????????}
????????????current?=?childs[index];
????????}
????????return?current.isEndFlag();
????}
????/**
?????*?判斷前綴是否存在于字典樹中
?????*?@param?prefix
?????*?@return
?????*/
????public?boolean?startsWith(String?prefix)?{
????????TrieNode?current?=?root;
????????char[]?chars?=?prefix.toCharArray();
????????for(int?i=0;?i????????????char?ch?=?chars[i];
????????????int?index?=?calcIndex(ch);
????????????TrieNode[]?childs?=?current.getChilds();
????????????if(?childs[index]?==?null?)?{
????????????????return?false;
????????????}
????????????current?=?childs[index];
????????}
????????return?true;
????}
????/**
?????*?根據(jù)字符計(jì)算索引
?????*?0~25索引?對應(yīng)于?a~z?字符
?????*?@param?ch
?????*?@return
?????*/
????private?static?int?calcIndex(char?ch)?{
????????return?ch?-?'a';
????}
????/**
?????*?Trie字典樹節(jié)點(diǎn)
?????*/
????public?static?class?TrieNode?{
????????/**
?????????*?子節(jié)點(diǎn)數(shù)組,?0~25索引?對應(yīng)于?a~z?字符
?????????*/
????????private?TrieNode[]?childs;
????????/**
?????????*?當(dāng)前字符是否為字符串的最后一個字符
?????????*/
????????private?boolean?endFlag;
????????public?TrieNode()?{
????????????childs?=?new?TrieNode[26];
????????????endFlag?=?false;
????????}
????????public?TrieNode[]?getChilds()?{
????????????return?childs;
????????}
????????public?boolean?isEndFlag()?{
????????????return?endFlag;
????????}
????????public?void?setEndFlag(boolean?endFlag)?{
????????????this.endFlag?=?endFlag;
????????}
????}
}
前K個高頻單詞
學(xué)習(xí)過程中要善于理論聯(lián)系實(shí)際。這里以LeetCode的第692題——前K個高頻單詞為例
給定一個單詞列表words和一個整數(shù)k,返回前k個出現(xiàn)次數(shù)最多的單詞。返回的答案應(yīng)該按單詞出現(xiàn)頻率由高到低排序;如果不同的單詞有相同出現(xiàn)頻率,按字典順序排序
示例 1:
輸入: words = ["i", "love", "leetcode", "i", "love", "coding"], k = 2
輸出: ["i", "love"]?
解析: "i" 和 "love" 為出現(xiàn)次數(shù)最多的兩個單詞,均為2次。注意,按字母順序 "i" 在 "love" 之前
示例 2:
輸入: ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4
輸出: ["the", "is", "sunny", "day"]?
解析: "the", "is", "sunny" 和 "day" 是出現(xiàn)次數(shù)最多的四個單詞,出現(xiàn)次數(shù)依次為 4, 3, 2 和 1 次
注意:
1 <= words.length <= 500 1 <= words[i] <= 10 words[i]?由小寫英文字母組成 k 的取值范圍是?[1, 不同 words[i] 的數(shù)量]
本題關(guān)鍵在于統(tǒng)計(jì)字符串的數(shù)量、獲取Top K個字符串。前者可以先將字符串保存到字典樹中,同時為了方便統(tǒng)計(jì),我們對于樹的節(jié)點(diǎn)增加完整字符串、計(jì)數(shù)值的字段。后者可以對字典樹進(jìn)行DFS深度優(yōu)先遍歷,并結(jié)合最小堆獲取Top K個字符串。最后將最小堆中的元素進(jìn)行輸出即可。Java實(shí)現(xiàn)如下所示
public?class?Solution?{
????/**
?????*?獲取字符串?dāng)?shù)組中Top?K字符串列表
?????*?@param?words?字符串?dāng)?shù)組
?????*?@param?k?
?????*?@return
?????*/????
????public?List?topKFrequent(String[]?words,?int?k)? {
????????//?1.?遍歷字符串存儲到字典樹中
????????Trie?trie?=?new?Trie();
????????for(String?word?:?words)?{
????????????trie.insert(word);
????????}
????????//?2.?通過最小堆、DFS獲取Top?K個字符串
????????PriorityQueue?minHeap?=?new?PriorityQueue<>();
????????dfs(minHeap,?k,?trie.getRoot());
????????//?3.?將最小堆的元素按降序輸出
????????LinkedList?res?=?new?LinkedList<>();
????????while(?!minHeap.isEmpty()?)?{
????????????Trie.TrieNode?min?=?minHeap.poll();
????????????res.addFirst(?min.getStr()?);
????????}
????????return?res;
????}
????/**
?????*?DFS搜索字典樹
?????*?@param?minHeap
?????*?@param?k
?????*?@param?current
?????*/
????private?void?dfs(PriorityQueue?minHeap,?int?k,?Trie.TrieNode?current) ?{
????????if(?current==null)?{
????????????return;
????????}
????????//?字典樹當(dāng)前節(jié)點(diǎn)為完整的字符串
????????if(?current.getStr()!=null?)?{
????????????//?最小堆中元素的數(shù)量未達(dá)到K,?則直接加入
????????????if(?minHeap.size()?????????????????minHeap.offer(?current?);
????????????}?else?if(?current.compareTo(?minHeap.peek()?)>0?)?{????//?當(dāng)前節(jié)點(diǎn)比堆中最小的元素大,?則加入
????????????????//?將當(dāng)前節(jié)點(diǎn)加入最小堆
????????????????minHeap.offer(?current?);
????????????????//?將堆中最小的元素移出堆,?保證堆中數(shù)量始終為K
????????????????minHeap.poll();
????????????}
????????}
????????//?DFS搜索字典樹
????????for(int?i=0;?i<26;?i++)?{
????????????Trie.TrieNode[]?childs?=?current.getChilds();
????????????dfs(minHeap,?k,?childs[i]);
????????}
????}
}
/**
?*?Trie字典樹
?*/
class?Trie?{
????/**
?????*?字典樹的根節(jié)點(diǎn)
?????*/
????private?TrieNode?root;
????public?Trie()?{
????????root?=?new?TrieNode();
????}
????public?TrieNode?getRoot()?{
????????return?root;
????}
????/**
?????*?字典樹中插入字符串?word
?????*?@param?word
?????*/
????public?void?insert(String?word)?{
????????TrieNode?current?=?root;
????????char[]?chars?=?word.toCharArray();
????????for?(int?i=0;?i????????????char?ch?=?chars[i];
????????????int?index?=?calcIndex(ch);
????????????TrieNode[]?childs?=?current.childs;
????????????if(?childs[index]==null?)?{
????????????????childs[index]?=?new?TrieNode();
????????????}
????????????current?=?childs[index];
????????}
????????//?保存完整字符串信息
????????current.str?=?word;
????????//?計(jì)數(shù)值+1
????????current.count?+=?1;
????}
????/**
?????*?根據(jù)字符計(jì)算索引
?????*?0~25索引?對應(yīng)于?a~z?字符
?????*?@param?ch
?????*?@return
?????*/
????private?static?int?calcIndex(char?ch)?{
????????return?ch?-?'a';
????}
????/**
?????*?Trie字典樹節(jié)點(diǎn)
?????*/
????public?static?class?TrieNode?implements?Comparable<TrieNode>?{
????????/**
?????????*?子節(jié)點(diǎn)數(shù)組,?0~25索引?對應(yīng)于?a~z?字符
?????????*/
????????private?TrieNode[]?childs;
????????/**
?????????*?字符串
?????????*/
????????private?String?str;
????????/**
?????????*?計(jì)數(shù)值
?????????*/
????????private?int?count;
????????public?TrieNode()?{
????????????childs?=?new?TrieNode[26];
????????????str?=?null;
????????????count?=?0;
????????}
????????public?TrieNode[]?getChilds()?{
????????????return?childs;
????????}
????????public?String?getStr()?{
????????????return?str;
????????}
????????public?int?getCount()?{
????????????return?count;
????????}
????????@Override
????????public?int?compareTo(TrieNode?o)?{
????????????//?排序規(guī)則:?先比較字符串的頻率
????????????int?res?=?this.count?-?o.count;
????????????if(?res!=0?)?{
????????????????return?res;
????????????}
????????????//?排序規(guī)則:?頻率相同,?按字典序排序
????????????res?=?o.str.compareTo(this.str);
????????????return?res;
????????}
????}
} 