天天躁日日躁AAAAXXXX,69视频免费在线观看,午夜三区,男人色色天堂,精品欧美一级片大片AAAA级,久久黄色网,久热在线精品视频,久久免费美女操B视频

Trie樹簡介

在計(jì)算機(jī)科學(xué)中，trie，又稱前綴樹或字典樹，是一種有序樹，用于保存關(guān)聯(lián)數(shù)組，其中的鍵通常是字符串。與二叉查找樹不同，鍵不是直接保存在節(jié)點(diǎn)中，而是由節(jié)點(diǎn)在樹中的位置決定。一個(gè)節(jié)點(diǎn)的所有子孫都有相同的前綴，也就是這個(gè)節(jié)點(diǎn)對(duì)應(yīng)的字符串，而根節(jié)點(diǎn)對(duì)應(yīng)空字符串。一般情況下，不是所有的節(jié)點(diǎn)都有對(duì)應(yīng)的值，只有葉子節(jié)點(diǎn)和部分內(nèi)部節(jié)點(diǎn)所對(duì)應(yīng)的鍵才有相關(guān)的值。

Trie這個(gè)術(shù)語來自于retrieval。根據(jù)詞源學(xué)，trie的發(fā)明者Edward Fredkin把它讀作/?tri?/ "tree"。但是，其他作者把它讀作/?tra?/ "try"。

在圖示中，鍵標(biāo)注在節(jié)點(diǎn)中，值標(biāo)注在節(jié)點(diǎn)之下。每一個(gè)完整的英文單詞對(duì)應(yīng)一個(gè)特定的整數(shù)。Trie可以看作是一個(gè)確定有限狀態(tài)自動(dòng)機(jī)，盡管邊上的符號(hào)一般是隱含在分支的順序中的。Eg.一個(gè)保存了8個(gè)單詞的字典樹的結(jié)構(gòu)如下圖所示，8個(gè)單詞分別是：“A”，“to”，“tea”，“ted”，“ten”，“i” ，“in”，“inn”。

另外，單詞查找樹，Trie樹，是一種樹形結(jié)構(gòu)，是一種哈希樹的變種。典型應(yīng)用是用于統(tǒng)計(jì)，排序和保存大量的字符串（但不僅限于字符串），所以經(jīng)常被搜索引擎系統(tǒng)用于文本詞頻統(tǒng)計(jì)。它的優(yōu)點(diǎn)是：利用字符串的公共前綴來減少查詢時(shí)間，最大限度地減少無謂的字符串比較，查詢效率比哈希樹高。

Trie樹性質(zhì)

它有3個(gè)基本性質(zhì)：

根節(jié)點(diǎn)不包含字符，除根節(jié)點(diǎn)外每一個(gè)節(jié)點(diǎn)都只包含一個(gè)字符；
從根節(jié)點(diǎn)到某一節(jié)點(diǎn)，路徑上經(jīng)過的字符連接起來，為該節(jié)點(diǎn)對(duì)應(yīng)的字符串；
每個(gè)節(jié)點(diǎn)的所有子節(jié)點(diǎn)包含的字符都不相同。

基本操作

其基本操作有：查找、插入和刪除,當(dāng)然刪除操作比較少見。

實(shí)現(xiàn)方法

搜索字典項(xiàng)目的方法為：

(1)從根結(jié)點(diǎn)開始一次搜索；
(2) 取得要查找關(guān)鍵詞的第一個(gè)字母，并根據(jù)該字母選擇對(duì)應(yīng)的子樹并轉(zhuǎn)到該子樹繼續(xù)進(jìn)行檢索；
(3) 在相應(yīng)的子樹上，取得要查找關(guān)鍵詞的第二個(gè)字母,并進(jìn)一步選擇對(duì)應(yīng)的子樹進(jìn)行檢索。
(4) 迭代過程……
(5) 在某個(gè)結(jié)點(diǎn)處，關(guān)鍵詞的所有字母已被取出，則讀取附在該結(jié)點(diǎn)上的信息，即完成查找。其他操作類似處理

實(shí)現(xiàn) Trie (前綴樹)

關(guān)于Trie樹實(shí)現(xiàn)，可以移步看下LeetCode208. 實(shí)現(xiàn) Trie (前綴樹)

輸入
["Trie", "insert", "search", "search", "startsWith", "insert", "search"]
[[], ["apple"], ["apple"], ["app"], ["app"], ["app"], ["app"]]
輸出
[null, null, true, false, true, null, true]

解釋
Trie trie = new Trie();
trie.insert("apple");
trie.search("apple"); ? // 返回 True
trie.search("app"); ? ? // 返回 False
trie.startsWith("app"); // 返回 True
trie.insert("app");
trie.search("app"); ? ? // 返回 True

來源：力扣（LeetCode）
鏈接：https://leetcode-cn.com/problems/implement-trie-prefix-tree

具體實(shí)現(xiàn)如下：

class TrieNode(object):
 ? ?def __init__(self):
 ? ? ? ?"""
 ? ? ? ?Initialize your data structure here.
 ? ? ? ?"""
 ? ? ? ?self.data = {}
 ? ? ? ?self.is_word = False


class Trie:
 ? ?def __init__(self):
 ? ? ? ?"""
 ? ? ? ?Initialize your data structure here.
 ? ? ? ?"""
 ? ? ? ?self.root = TrieNode()

 ? ?def insert(self, word):
 ? ? ? ?"""
 ? ? ? ?Inserts a word into the trie.
 ? ? ? ?:type word: str
 ? ? ? ?:rtype: void
 ? ? ? ?"""
 ? ? ? ?node = self.root
 ? ? ? ?for chars in word:
 ? ? ? ? ? ?child = node.data.get(chars)
 ? ? ? ? ? ?if not child:
 ? ? ? ? ? ? ? ?node.data[chars] = TrieNode()
 ? ? ? ? ? ?node = node.data[chars]
 ? ? ? ?node.is_word = True

 ? ?def search(self, word):
 ? ? ? ?"""
 ? ? ? ?Returns if the word is in the trie.
 ? ? ? ?:type word: str
 ? ? ? ?:rtype: bool
 ? ? ? ?"""
 ? ? ? ?node = self.root
 ? ? ? ?for chars in word:
 ? ? ? ? ? ?node = node.data.get(chars)
 ? ? ? ? ? ?if not node:
 ? ? ? ? ? ? ? ?return False
 ? ? ? ?return node.is_word ?# 判斷單詞是否是完整的存在在trie樹中

 ? ?def startsWith(self, prefix):
 ? ? ? ?"""
 ? ? ? ?Returns if there is any word in the trie that starts with the given prefix.
 ? ? ? ?:type prefix: str
 ? ? ? ?:rtype: bool
 ? ? ? ?"""
 ? ? ? ?node = self.root
 ? ? ? ?for chars in prefix:
 ? ? ? ? ? ?node = node.data.get(chars)
 ? ? ? ? ? ?if not node:
 ? ? ? ? ? ? ? ?return False
 ? ? ? ?return True

 ? ?def get_start(self, prefix):
 ? ? ? ?"""
 ? ? ? ? ?Returns words started with prefix
 ? ? ? ? ?返回以prefix開頭的所有words
 ? ? ? ? ?如果prefix是一個(gè)word，那么直接返回該prefix
 ? ? ? ? ?:param prefix:
 ? ? ? ? ?:return: words (list)
 ? ? ? ?"""

 ? ? ? ?def get_key(pre, pre_node):
 ? ? ? ? ? ?word_list = []
 ? ? ? ? ? ?if pre_node.is_word:
 ? ? ? ? ? ? ? ?word_list.append(pre)
 ? ? ? ? ? ?for x in pre_node.data.keys():
 ? ? ? ? ? ? ? ?word_list.extend(get_key(pre + str(x), pre_node.data.get(x)))
 ? ? ? ? ? ?return word_list

 ? ? ? ?words = []
 ? ? ? ?if not self.startsWith(prefix):
 ? ? ? ? ? ?return words
 ? ? ? ?if self.search(prefix):
 ? ? ? ? ? ?words.append(prefix)
 ? ? ? ? ? ?return words
 ? ? ? ?node = self.root
 ? ? ? ?for chars in prefix:
 ? ? ? ? ? ?node = node.data.get(chars)
 ? ? ? ?return get_key(prefix, node)


if __name__ == '__main__':
 ? ?trie = Trie()

 ? ?print('trie.insert("apple"):', trie.insert("apple"))
 ? ?print('trie.insert("appal"):', trie.insert("appal"))
 ? ?print('trie.insert("appear"):', trie.insert("appear"))
 ? ?print('trie.insert("apply"):', trie.insert("apply"))
 ? ?print('trie.insert("appulse"):', trie.insert("appulse"))

 ? ?print('trie.search("apple"):', trie.search("apple")) ?# 返回 True
 ? ?print('trie.search("app"):', trie.search("app")) ?# 返回 False

 ? ?print('trie.startsWith("app"):', trie.startsWith("app")) ?# 返回 True
 ? ?print('trie.insert("app"):', trie.insert("app"))
 ? ?print('trie.search("app"):', trie.search("app"))

 ? ?print('trie.search("app"):', trie.get_start("app"))
 ? ?print('trie.search("ap"):', trie.get_start('ap'))

結(jié)果輸出如下：

F:\ProgramData\Anaconda3\python.exe F:/Projects/nlp-trie/main.py
trie.insert("apple"): None
trie.insert("appal"): None
trie.insert("appear"): None
trie.insert("apply"): None
trie.insert("appulse"): None
trie.search("apple"): True
trie.search("app"): False
trie.startsWith("app"): True
trie.insert("app"): None
trie.search("app"): True
trie.search("app"): ['app']
trie.search("ap"): ['app', 'apple', 'apply', 'appal', 'appear', 'appulse']

Process finished with exit code 0

應(yīng)用

輸入框提示/自動(dòng)補(bǔ)全：trie 常用于搜索提示。如當(dāng)輸入一個(gè)網(wǎng)址，可以自動(dòng)搜索出可能的選擇。當(dāng)沒有完全匹配的搜索結(jié)果，可以返回前綴最相似的可能。
字符串檢索、模糊匹配
文本預(yù)測(cè)、自動(dòng)完成，see also，拼寫檢查
在NLP中的應(yīng)用，主要有基于字典樹的文本分詞、短語提取、實(shí)體提取等

優(yōu)缺點(diǎn)

優(yōu)點(diǎn)：

可以最大限度地減少無謂的字符串比較，故可以用于詞頻統(tǒng)計(jì)和大量字符串排序。跟哈希表比較：

最壞情況時(shí)間復(fù)雜度比hash表好
沒有沖突，除非一個(gè)key對(duì)應(yīng)多個(gè)值（除key外的其他信息）
自帶排序功能（類似Radix Sort），中序遍歷trie可以得到排序。

缺點(diǎn)：

雖然不同單詞共享前綴，但其實(shí)trie是一個(gè)以空間換時(shí)間的算法。其每一個(gè)字符都可能包含至多字符集大小數(shù)目的指針。
如果數(shù)據(jù)存儲(chǔ)在外部存儲(chǔ)器等較慢位置，Trie會(huì)較hash速度慢（hash訪問O(1)次外存，Trie訪問O(樹高)）。
長的浮點(diǎn)數(shù)等會(huì)讓鏈變得很長?？捎胋itwise trie改進(jìn)。

時(shí)間復(fù)雜度

時(shí)間復(fù)雜度：創(chuàng)建時(shí)間復(fù)雜度為O(L)，查詢時(shí)間復(fù)雜度是O(logL)，查詢時(shí)間復(fù)雜度最壞情況下是O(L)，L是字符串的長度。

參考資料

字典樹百度百科
Trie wikipedia
Trie Python實(shí)現(xiàn)
Trie樹學(xué)習(xí)及python實(shí)現(xiàn)
python 實(shí)現(xiàn) trie(字典) 樹
Trie樹（字典樹）詳細(xì)知識(shí)點(diǎn)及其應(yīng)用


往期精彩回顧




適合初學(xué)者入門人工智能的路線及資料下載
(圖文+視頻)機(jī)器學(xué)習(xí)入門系列下載
中國大學(xué)慕課《機(jī)器學(xué)習(xí)》（黃海廣主講）
機(jī)器學(xué)習(xí)及深度學(xué)習(xí)筆記等資料打印
《統(tǒng)計(jì)學(xué)習(xí)方法》的代碼復(fù)現(xiàn)專輯
機(jī)器學(xué)習(xí)交流qq群955171419，加入微信群請(qǐng)掃碼：

【NLP】一種好用的樹結(jié)構(gòu)：Trie樹