
This article shows how to batch-insert nodes and relationships in Neo4j to speed up knowledge-graph construction.
Before getting to batch insertion, we first need to look at the problem of duplicate node creation.

Duplicate node creation

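In Neo4j, if we insert the same node more than once, the graph ends up with several duplicate nodes. This happens because CREATE always adds a new node, and Neo4j distinguishes nodes by an internal id that the system generates, not by their properties.
Let's create a node with name Google and address U.S., running the following Cypher (CQL) statement twice: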

create (company: Company{name: "Google", address: "U.S."});
create (company: Company{name: "Google", address: "U.S."});

As the figure below shows, the graph now contains two nodes that look identical:

In fact, the two nodes differ only in their id, which Neo4j generates internally.
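To see that the copies differ only in their id, one can group nodes by everything except the id. The sketch below assumes rows shaped like (id, label, properties), for instance from MATCH (n:Company) RETURN id(n), labels(n)[0], properties(n) (id(n) is deprecated in Neo4j 5 in favor of elementId(n)). The function name find_duplicates is hypothetical, and it assumes property values are hashable (strings, numbers), which holds here but not for list-valued properties.

```python
from collections import defaultdict


def find_duplicates(rows):
    """Group node ids whose label and properties are identical.

    rows: assumed list of (node_id, label, properties) tuples.
    """
    groups = defaultdict(list)
    for node_id, label, props in rows:
        # Everything except the system-generated id forms the grouping key.
        key = (label, tuple(sorted(props.items())))
        groups[key].append(node_id)
    # Keep only keys shared by more than one node.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# The two Google nodes from the CREATE example, with made-up ids 0 and 1.
rows = [
    (0, "Company", {"name": "Google", "address": "U.S."}),
    (1, "Company", {"name": "Google", "address": "U.S."}),
]
print(find_duplicates(rows))
```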

There are two ways to avoid creating duplicate nodes:

- use the MERGE command instead of CREATE;
- before creating a node, query the graph to check whether it already exists.
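The first option can be sketched as a small helper (our own, not part of py2neo) that builds a parameterized MERGE statement: it matches a node on a single key property and sets the remaining properties only when the node is created. Because a label cannot be passed as an ordinary query parameter in classic Cypher, the sketch validates the label and key and writes them into the query text. The names build_merge, key_value and props are illustrative.

```python
import re

# Identifiers allowed to be written directly into the query text.
_IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_merge(label, key, props):
    """Return (query, params) for a MERGE keyed on props[key] (hypothetical helper)."""
    if not _IDENT_RE.fullmatch(label) or not _IDENT_RE.fullmatch(key):
        raise ValueError(f"unsafe label or key: {label!r}, {key!r}")
    query = (
        f"MERGE (n:{label} {{{key}: $key_value}}) "
        "ON CREATE SET n += $props "
        "RETURN n"
    )
    params = {"key_value": props[key], "props": props}
    return query, params


query, params = build_merge("Company", "name", {"name": "Google", "address": "U.S."})
print(query)
```

With py2neo, the pair could then be run as graph.run(query, params). Since the statement matches on name, running it twice should leave a single Google node.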

Dataset

We use the adjacent administrative divisions dataset from OpenKG (the data has a few small problems that you will need to fix yourself), available at http://www.openkg.cn/dataset/xzqh. Part of the example graph we want to build is shown below:

The graph contains 2834 nodes in total (2801 city nodes and 33 province nodes) and 18807 relationships.
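The scripts below read data.json as a dictionary with a nodes list and a relations list, using label and name for nodes and subject, subject_type, predicate, object and object_type for relations. Since the raw dataset needs some cleaning, a quick sanity check before loading can help. This sketch uses a made-up miniature of that format (the region names and the predicates adjacent_to and belongs_to are hypothetical) and reports label counts, the number of relations, and any relation whose endpoint is missing from the node list.

```python
from collections import Counter

# Hypothetical miniature of data.json: the field names follow the scripts
# in this article, but the names and predicates are invented for illustration.
sample = {
    "nodes": [
        {"label": "province", "name": "Zhejiang"},
        {"label": "city", "name": "Hangzhou"},
        {"label": "city", "name": "Huzhou"},
    ],
    "relations": [
        {"subject": "Hangzhou", "subject_type": "city", "predicate": "adjacent_to",
         "object": "Huzhou", "object_type": "city"},
        {"subject": "Hangzhou", "subject_type": "city", "predicate": "belongs_to",
         "object": "Zhejiang", "object_type": "province"},
    ],
}


def summarize(data):
    """Count node labels and relations, and list relations with unknown endpoints."""
    label_counts = Counter(node["label"] for node in data["nodes"])
    known = {(node["label"], node["name"]) for node in data["nodes"]}
    dangling = [
        rel for rel in data["relations"]
        if (rel["subject_type"], rel["subject"]) not in known
        or (rel["object_type"], rel["object"]) not in known
    ]
    return label_counts, len(data["relations"]), dangling


label_counts, n_relations, dangling = summarize(sample)
print(label_counts, n_relations, dangling)
```

On the full, cleaned dataset one would expect 2801 city nodes, 33 province nodes and 18807 relations, with an empty dangling list.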

Creating nodes and relationships one at a time

First, we create the nodes and relationships one by one and measure how long it takes to build the graph. The example Python code is as follows:

# -*- coding: utf-8 -*-
import json
import time

from py2neo import Graph, Node, Relationship
from py2neo import NodeMatcher, RelationshipMatcher

# Connect to Neo4j
url = "http://localhost:7474"
username = "neo4j"
password = "password"
graph = Graph(url, auth=(username, password))
print("neo4j info: {}".format(str(graph)))

# Read the data
with open("data.json", "r", encoding="utf-8") as f:
    data_dict = json.load(f)
nodes = data_dict["nodes"]
relations = data_dict["relations"]

# Create nodes one at a time
s_time = time.time()
create_node_cnt = 0
node_matcher = NodeMatcher(graph)
for node in nodes:
    label = node["label"]
    name = node["name"]
    # One query to check whether the node already exists
    find_node = node_matcher.match(label, name=name).first()
    if find_node is None:
        attrs = {k: v for k, v in node.items() if k != "label"}
        new_node = Node(label, **attrs)
        # One more request to create it
        graph.create(new_node)
        create_node_cnt += 1
        print(f"create {create_node_cnt} nodes.")

# Create relationships one at a time
create_rel_cnt = 0
relation_matcher = RelationshipMatcher(graph)
for relation in relations:
    s_node, s_label = relation["subject"], relation["subject_type"]
    e_node, e_label = relation["object"], relation["object_type"]
    rel = relation["predicate"]
    # Two queries to look up the start and end nodes
    start_node = node_matcher.match(s_label, name=s_node).first()
    end_node = node_matcher.match(e_label, name=e_node).first()
    if start_node is not None and end_node is not None:
        # A third query to check whether the relationship already exists
        existing_rel = relation_matcher.match([start_node, end_node], r_type=rel).first()
        if existing_rel is None:
            graph.create(Relationship(start_node, rel, end_node))
            create_rel_cnt += 1
            print(f"create {create_rel_cnt} relations.")

# Print a summary
e_time = time.time()
print(f"create {create_node_cnt} nodes, create {create_rel_cnt} relations.")
print(f"cost time: {round((e_time-s_time)*1000, 4)}ms")

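Building the graph with this script took 802.1 seconds in total.
This is clearly slow. When creating nodes, the script first queries the graph to see whether each node exists and creates it only if it does not; when creating relationships, it first looks up both endpoint nodes and, if they exist but the relationship does not, creates the relationship. Throughout, it queries the graph and creates nodes and relationships one at a time, each with its own request, and these frequent round trips are where the time goes.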
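A rough count makes the cost concrete. The sketch below assumes every node and relationship is new and counts one request per matcher lookup and one per graph.create() call; py2neo may issue more than one HTTP request per call, so treat the result as an estimate, not a measurement. The function name estimate_round_trips is hypothetical.

```python
def estimate_round_trips(n_nodes, n_relations):
    """Rough count of requests made by the one-at-a-time script.

    Assumes all nodes and relationships are new, and one request per
    matcher lookup and per graph.create() call.
    """
    per_node = 1 + 1          # existence check + create
    per_relation = 2 + 1 + 1  # two node lookups + relationship check + create
    return n_nodes * per_node + n_relations * per_relation


total = estimate_round_trips(2834, 18807)
print(total)  # 80896
```

Spread over the measured 802.1 seconds, about 80,900 requests work out to roughly 10 ms each, which suggests that per-request overhead, not the writes themselves, dominates.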

Creating nodes and relationships in batches

By creating a subgraph (Subgraph), we can create nodes and relationships in batches, which speeds up graph construction. The Python code for batch creation is as follows:

# -*- coding: utf-8 -*-
import json
import time

from py2neo import Graph, Node, Relationship, Subgraph
from py2neo import RelationshipMatcher

# Connect to Neo4j
url = "http://localhost:7474"
username = "neo4j"
password = "password"
graph = Graph(url, auth=(username, password))
print("neo4j info: {}".format(str(graph)))

# Read the data
with open("data.json", "r", encoding="utf-8") as f:
    data_dict = json.load(f)
nodes = data_dict["nodes"]
relations = data_dict["relations"]

# Fetch the names of all city and province nodes already in the graph
cql = "match (n:province) return n.name as name;"
province_names = {_["name"] for _ in graph.run(cql).data()}
cql = "match (n:city) return n.name as name;"
city_names = {_["name"] for _ in graph.run(cql).data()}

# Collect the nodes that still need to be created
s_time = time.time()
create_node_cnt = 0
create_nodes = []
for node in nodes:
    label = node["label"]
    name = node["name"]
    if label == "city":
        existing_names = city_names
    elif label == "province":
        existing_names = province_names
    else:
        continue
    if name not in existing_names:
        attrs = {k: v for k, v in node.items() if k != "label"}
        create_nodes.append(Node(label, **attrs))
        # Remember the name so duplicates inside data.json are skipped too
        existing_names.add(name)
        create_node_cnt += 1

# Create the nodes in batches of batch_size
batch_size = 50
for i in range(0, len(create_nodes), batch_size):
    subgraph = Subgraph(create_nodes[i: i + batch_size])
    graph.create(subgraph)
    print(f"create {min(i + batch_size, len(create_nodes))} nodes")

# Load all city and province nodes once, keyed by name, then collect new relationships
cql = "match (n:province) return (n);"
province_nodes = [_["n"] for _ in graph.run(cql).data()]
cql = "match (n:city) return (n);"
city_nodes = [_["n"] for _ in graph.run(cql).data()]
city_dict = {_["name"]: _ for _ in city_nodes}
province_dict = {_["name"]: _ for _ in province_nodes}
create_rel_cnt = 0
create_relations = []
rel_matcher = RelationshipMatcher(graph)
for relation in relations:
    s_node, s_label = relation["subject"], relation["subject_type"]
    e_node, e_label = relation["object"], relation["object_type"]
    rel = relation["predicate"]
    start_node, end_node = None, None
    if s_label == "city":
        start_node = city_dict.get(s_node, None)
    if e_label == "city":
        end_node = city_dict.get(e_node, None)
    elif e_label == "province":
        end_node = province_dict.get(e_node, None)
    if start_node is not None and end_node is not None:
        # Still one query per relationship to check whether it already exists
        existing_rel = rel_matcher.match([start_node, end_node], r_type=rel).first()
        if existing_rel is None:
            create_relations.append(Relationship(start_node, rel, end_node))
            create_rel_cnt += 1

# Create the relationships in batches of batch_size
batch_size = 50
for i in range(0, len(create_relations), batch_size):
    subgraph = Subgraph(relationships=create_relations[i: i + batch_size])
    graph.create(subgraph)
    print(f"create {min(i + batch_size, len(create_relations))} relations")

# Print a summary
e_time = time.time()
print(f"create {create_node_cnt} nodes, create {create_rel_cnt} relations.")
print(f"cost time: {round((e_time-s_time)*1000, 4)}ms")
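Both batching loops in this script slice a list into fixed-size pieces. As a small sketch (the helper name chunked is our own, not part of py2neo), the slicing can be factored out. Stepping through start offsets never yields an empty trailing batch, which a loop over range(len(items) // batch_size + 1) does produce whenever the length is an exact multiple of the batch size.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


batches = list(chunked(list(range(100)), 50))
print([len(b) for b in batches])  # [50, 50]
```

In the script, the node loop could then read `for batch in chunked(create_nodes, batch_size): graph.create(Subgraph(batch))`, and the relationship loop likewise with `Subgraph(relationships=batch)`.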

On the first run, building the whole graph (2834 nodes and 18807 relationships created) took 95.1 seconds.
On a second run, the script created 0 nodes and 0 relationships but took 522.5 seconds.

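Note that most of this script's time goes into the lookups that guard against duplicates, not into the writes: building the nodes and relationships themselves is fast. Fetching all city and province nodes is necessary to avoid inserting duplicates, and the script still checks every candidate relationship with RelationshipMatcher, one query per relationship; on a re-run, when all of those relationships already exist, this per-relationship check is the likely reason the script is slower than on the first run. When the number of nodes is very large, other approaches should be considered, because querying all nodes is itself slow and memory-hungry.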
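One way to remove the per-relationship query, in the same spirit as fetching all nodes up front, is to fetch all existing relationships once and check candidates in memory. This is our own sketch, not part of the original script: FETCH_RELS and new_relations are hypothetical names, and it assumes each node has a single label and that names are unique within a label. It replaces about 18807 lookups with one query but holds every existing relationship in memory, so the caveat about very large graphs applies here too.

```python
# Fetch every existing relationship once, as (label, name, type, label, name).
# Assumes each node has one label and names are unique within a label.
FETCH_RELS = (
    "MATCH (a)-[r]->(b) "
    "RETURN labels(a)[0] AS s_label, a.name AS s, type(r) AS p, "
    "labels(b)[0] AS o_label, b.name AS o"
)


def new_relations(existing_rows, relations):
    """Return the relations from data.json that are not in the graph yet."""
    seen = {
        (row["s_label"], row["s"], row["p"], row["o_label"], row["o"])
        for row in existing_rows
    }
    fresh = []
    for rel in relations:
        key = (rel["subject_type"], rel["subject"], rel["predicate"],
               rel["object_type"], rel["object"])
        if key not in seen:
            seen.add(key)  # also skips duplicates within data.json itself
            fresh.append(rel)
    return fresh


# Made-up rows standing in for graph.run(FETCH_RELS).data()
existing_rows = [
    {"s_label": "city", "s": "Hangzhou", "p": "adjacent_to",
     "o_label": "city", "o": "Huzhou"},
]
belongs = {"subject": "Hangzhou", "subject_type": "city", "predicate": "belongs_to",
           "object": "Zhejiang", "object_type": "province"}
sample_relations = [
    {"subject": "Hangzhou", "subject_type": "city", "predicate": "adjacent_to",
     "object": "Huzhou", "object_type": "city"},
    belongs,
    dict(belongs),  # duplicate entry in the input
]
fresh = new_relations(existing_rows, sample_relations)
print(len(fresh))
```

In the batch script, existing_rows would come from graph.run(FETCH_RELS).data(), and the returned relations would then be turned into Relationship objects as before.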

Summary

This article showed how to create nodes and relationships in batches in Neo4j to speed up graph construction.
Written on July 15, 2021, in Pudong, Shanghai, on a sweltering summer day.

Neo4j Getting Started (Part 2): Batch Inserting Nodes and Relationships
