<kbd id="afajh"><form id="afajh"></form></kbd>
<strong id="afajh"><dl id="afajh"></dl></strong>
    <del id="afajh"><form id="afajh"></form></del>
        1. <th id="afajh"><progress id="afajh"></progress></th>
          <b id="afajh"><abbr id="afajh"></abbr></b>
          <th id="afajh"><progress id="afajh"></progress></th>

          聽說GNN大有可為,從這篇開始學(xué)以致用

          共 10601字,需瀏覽 22分鐘

           ·

          2021-07-14 01:03

          作者:十方


          說到GNN,各位煉丹師會想到哪些算法呢?不管想到哪些算法,我們真正用到過哪些呢?確實我們可能都看過GNN相關(guān)論文,但是還是缺乏實戰(zhàn)經(jīng)驗的.特別是對于推薦系統(tǒng)而言,我們又該如何應(yīng)用這些模型呢?下面我們就從DeepWalk這篇論文開始,先講原理,再講實戰(zhàn),最后講應(yīng)用.


          GNN相關(guān)背景知識


          GNN的本質(zhì),是要學(xué)習(xí)網(wǎng)絡(luò)中每個節(jié)點的表達的,這些潛在的表達對圖中每個節(jié)點的“社交”關(guān)系進行了編碼,把離散值的節(jié)點編碼成稠密向量,后續(xù)可用于分類回歸,或者作為下游任務(wù)的特征.Deepwalk充分利用了隨機游走提取的“句子”,去學(xué)習(xí)句子中每個單詞的表達.Deepwalk原文就提到了在數(shù)據(jù)稀疏的情況下可以把F1-scores提升10%,在一些實驗中,能夠用更少的訓(xùn)練數(shù)據(jù)獲得了更好的效果.看下圖的例子:

          Deepwalk


          問題定義:先把問題定義為給社交網(wǎng)絡(luò)中的每個節(jié)點進行分類,圖可以表達為G=<V,E>,V就是圖上所有節(jié)點,E是所有邊.有一部分有l(wèi)abel的數(shù)據(jù)GL=(V,E,X,Y),X就是節(jié)點的特征,Y就是分類的label.在傳統(tǒng)機器學(xué)習(xí)算法中,我們都是直接學(xué)習(xí)(X,Y),并沒有充分利用節(jié)點間的依賴關(guān)系.Deepwalk可以捕捉圖上的拓撲關(guān)系,通過無監(jiān)督方法學(xué)習(xí)每個節(jié)點的特征,學(xué)到的圖特征和標(biāo)簽的分布是相互獨立的.


          隨機游走:選定一個根節(jié)點,“隨機”走出一條路徑,基于相鄰的節(jié)點必然相似,我們就可以用這種策略去挖掘網(wǎng)絡(luò)的社群信息.隨機游走很方便并行,可以同時提取一張圖上各個部分的信息.即使圖有小的改動,這些路徑也不需要重新計算.和word的出現(xiàn)頻率類似,通過隨機游走得到的節(jié)點訪問頻率都符合冪律分布,所以我們就可以用NLP相關(guān)方法對隨機游走結(jié)果做相似處理,如下圖所示:

          所以在隨機游走后,我們只需要通過下公式,學(xué)習(xí)節(jié)點向量即可:

          該公式就是skip-gram,通過某個節(jié)點學(xué)習(xí)它左右的節(jié)點.我們都知道skip-gram用于文本時的語料庫就是一個個句子,現(xiàn)在我們提取圖的句子.如下所示:

          算法很簡單,把所有節(jié)點順序打亂(加速收斂),然后遍歷這些節(jié)點隨機游走出序列,再通過skipgram算法去擬合每個節(jié)點的向量.如此反復(fù).注:這里的隨機是均勻分布去隨機.當(dāng)然有些圖會有些“副產(chǎn)物”,比如用戶瀏覽網(wǎng)頁的順序,可以直接輸入到模型.


          接下來我們看下deepwalks的核心代碼:

          # 代碼來源 # https://github.com/phanein/deepwalk# Random walkwith open(f, 'w') as fout:  for walk in graph.build_deepwalk_corpus_iter(G=G, # 圖                                               num_paths=num_paths, # 路徑數(shù)                                               path_length=path_length, # 路徑長度                                               alpha=alpha, #                                               rand=rand): #    fout.write(u"{}\n".format(u" ".join(v for v in walk)))
          class Graph(defaultdict):  """  Efficient basic implementation of nx  這里我們看到,  該類繼承defaultdict,  圖其實可以簡單的表示為dict,  key為節(jié)點,value為與之相連的節(jié)點  """   def __init__(self): super(Graph, self).__init__(list)
          def nodes(self): return self.keys()
          def adjacency_iter(self): return self.iteritems()
          def subgraph(self, nodes={}):   # 提取子圖 subgraph = Graph() for n in nodes: if n in self: subgraph[n] = [x for x in self[n] if x in nodes] return subgraph
          def make_undirected(self):    #因為是無向圖,所以v in self[u]并且 u in self[v] t0 = time()
          for v in list(self): for other in self[v]: if v != other: self[other].append(v) t1 = time() logger.info('make_directed: added missing edges {}s'.format(t1-t0))
          self.make_consistent() return self
          def make_consistent(self):    # 去重 t0 = time() for k in iterkeys(self): self[k] = list(sorted(set(self[k]))) t1 = time() logger.info('make_consistent: made consistent in {}s'.format(t1-t0))
          self.remove_self_loops()
          return self
          def remove_self_loops(self):    # 自已不會與自己有連接 removed = 0 t0 = time()
          for x in self: if x in self[x]: self[x].remove(x) removed += 1 t1 = time()
          logger.info('remove_self_loops: removed {} loops in {}s'.format(removed, (t1-t0))) return self
          def check_self_loops(self): for x in self: for y in self[x]: if x == y: return True return False
          def has_edge(self, v1, v2):    # 兩個節(jié)點是否有邊 if v2 in self[v1] or v1 in self[v2]: return True return False
          def degree(self, nodes=None):    # 節(jié)點的度數(shù) if isinstance(nodes, Iterable): return {v:len(self[v]) for v in nodes} else: return len(self[nodes])
          def order(self): "Returns the number of nodes in the graph" return len(self)
          def number_of_edges(self):  # 圖中邊的數(shù)量 "Returns the number of nodes in the graph" return sum([self.degree(x) for x in self.keys()])/2
          def number_of_nodes(self): # 圖中結(jié)點數(shù)量 "Returns the number of nodes in the graph" return self.order()
          # 核心代碼 def random_walk(self, path_length, alpha=0, rand=random.Random(), start=None): """ Returns a truncated random walk. path_length: Length of the random walk. alpha: probability of restarts. start: the start node of the random walk. """ G = self if start: path = [start] else: # Sampling is uniform w.r.t V, and not w.r.t E path = [rand.choice(list(G.keys()))]
          while len(path) < path_length: cur = path[-1] if len(G[cur]) > 0: if rand.random() >= alpha:          path.append(rand.choice(G[cur])) # 相鄰節(jié)點隨機選 else:          path.append(path[0]) # 有一定概率選擇回到起點 else: break return [str(node) for node in path]
          # TODO add build_walks in here
          def build_deepwalk_corpus(G, num_paths, path_length, alpha=0, rand=random.Random(0)): walks = []
          nodes = list(G.nodes())  # 這里和上面論文算法流程對應(yīng) for cnt in range(num_paths): # 外循環(huán),相當(dāng)于要迭代多少epoch    rand.shuffle(nodes) # 打亂nodes順序,加速收斂 for node in nodes: # 每個node都會產(chǎn)生一條路徑 walks.append(G.random_walk(path_length, rand=rand, alpha=alpha, start=node)) return walks
          def build_deepwalk_corpus_iter(G, num_paths, path_length, alpha=0, rand=random.Random(0)):  # 流式處理用 walks = []
          nodes = list(G.nodes())
          for cnt in range(num_paths): rand.shuffle(nodes) for node in nodes: yield G.random_walk(path_length, rand=rand, alpha=alpha, start=node)

          def clique(size): return from_adjlist(permutations(range(1,size+1)))# http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-pythondef grouper(n, iterable, padvalue=None): "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')" return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)
          def parse_adjacencylist(f): adjlist = [] for l in f: if l and l[0] != "#": introw = [int(x) for x in l.strip().split()] row = [introw[0]] row.extend(set(sorted(introw[1:]))) adjlist.extend([row]) return adjlist
          def parse_adjacencylist_unchecked(f): adjlist = [] for l in f: if l and l[0] != "#": adjlist.extend([[int(x) for x in l.strip().split()]]) return adjlist
          def load_adjacencylist(file_, undirected=False, chunksize=10000, unchecked=True):
          if unchecked: parse_func = parse_adjacencylist_unchecked convert_func = from_adjlist_unchecked else: parse_func = parse_adjacencylist convert_func = from_adjlist
          adjlist = []
          t0 = time() total = 0 with open(file_) as f: for idx, adj_chunk in enumerate(map(parse_func, grouper(int(chunksize), f))): adjlist.extend(adj_chunk) total += len(adj_chunk) t1 = time()
          logger.info('Parsed {} edges with {} chunks in {}s'.format(total, idx, t1-t0))
          t0 = time() G = convert_func(adjlist) t1 = time()
          logger.info('Converted edges to graph in {}s'.format(t1-t0))
          if undirected: t0 = time() G = G.make_undirected() t1 = time() logger.info('Made graph undirected in {}s'.format(t1-t0))
          return G

          def load_edgelist(file_, undirected=True): G = Graph() with open(file_) as f: for l in f: x, y = l.strip().split()[:2] x = int(x) y = int(y) G[x].append(y) if undirected: G[y].append(x) G.make_consistent() return G

          def load_matfile(file_, variable_name="network", undirected=True): mat_varables = loadmat(file_) mat_matrix = mat_varables[variable_name]
          return from_numpy(mat_matrix, undirected)

          def from_networkx(G_input, undirected=True): G = Graph()
          for idx, x in enumerate(G_input.nodes()): for y in iterkeys(G_input[x]): G[x].append(y)
          if undirected: G.make_undirected()
          return G

          def from_numpy(x, undirected=True): G = Graph()
          if issparse(x): cx = x.tocoo() for i,j,v in zip(cx.row, cx.col, cx.data): G[i].append(j) else: raise Exception("Dense matrices not yet supported.")
          if undirected: G.make_undirected()
          G.make_consistent() return G

          def from_adjlist(adjlist): G = Graph() for row in adjlist: node = row[0] neighbors = row[1:] G[node] = list(sorted(set(neighbors)))
          return G

          def from_adjlist_unchecked(adjlist): G = Graph() for row in adjlist: node = row[0] neighbors = row[1:] G[node] = neighbors
          return G

          至于skipgram,大家可以直接用gensim工具即可.

          from gensim.models import Word2Vecfrom gensim.models.word2vec import Vocab
          logger = logging.getLogger("deepwalk")
          class Skipgram(Word2Vec): """A subclass to allow more customization of the Word2Vec internals."""
          def __init__(self, vocabulary_counts=None, **kwargs):
          self.vocabulary_counts = None
          kwargs["min_count"] = kwargs.get("min_count", 0) kwargs["workers"] = kwargs.get("workers", cpu_count()) kwargs["size"] = kwargs.get("size", 128) kwargs["sentences"] = kwargs.get("sentences", None) kwargs["window"] = kwargs.get("window", 10) kwargs["sg"] = 1 kwargs["hs"] = 1
          if vocabulary_counts != None: self.vocabulary_counts = vocabulary_counts
          super(Skipgram, self).__init__(**kwargs)


          應(yīng)用


          在推薦場景中,無論是推薦商品還是廣告,用戶和item其實都可以通過點擊/轉(zhuǎn)化/購買等行為構(gòu)建二部圖,在此二部圖中進行隨機游走,學(xué)習(xí)每個節(jié)點的向量,在特定場景,缺乏特征和標(biāo)簽的情況下,可以通過user2user或者item2iterm的方式,很好的泛化到其他的標(biāo)簽.GNN提取的向量也可以用于下游雙塔召回模型或者排序模型.如果有社交網(wǎng)絡(luò),通過挖掘人與人直接的關(guān)系提取特征,供下游任務(wù)也是個不錯的選擇.當(dāng)然大家也可以嘗試在一些推薦比賽中用于豐富特征.

          往期精彩回顧




          本站qq群851320808,加入微信群請掃碼:
          瀏覽 45
          點贊
          評論
          收藏
          分享

          手機掃一掃分享

          分享
          舉報
          評論
          圖片
          表情
          推薦
          點贊
          評論
          收藏
          分享

          手機掃一掃分享

          分享
          舉報
          <kbd id="afajh"><form id="afajh"></form></kbd>
          <strong id="afajh"><dl id="afajh"></dl></strong>
            <del id="afajh"><form id="afajh"></form></del>
                1. <th id="afajh"><progress id="afajh"></progress></th>
                  <b id="afajh"><abbr id="afajh"></abbr></b>
                  <th id="afajh"><progress id="afajh"></progress></th>
                  天堂av一区在线观看 | 亚洲无码毛片基地 | 国产成人MV | 一区二区三区国产乱伦 | 三级片香港三级片久久久 |