
I. Indexing Data

1. Defining Documents with Mappings

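A mapping contains the definitions of all fields in an index's documents and tells ES how to index the fields of a document. For example, if a field contains dates, you can define which date formats are acceptable. A mapping is similar in concept to the column definitions of a database table.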

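ES detects fields automatically and adjusts the mapping to fit the data. In production, however, you usually define your own mapping up front instead of relying on automatic field detection. To get the current mapping of a type's fields, send an HTTP GET request to the type's _mapping endpoint: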

curl '172.16.1.127:9200/get-together/_doc/_mapping?pretty'

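(1) Automatic mapping

ES can create the mapping automatically when a new document is indexed. For example, the following command automatically creates the my_index index and indexes a document with ID 1 that has two fields, name and date: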

curl -XPUT '172.16.1.127:9200/my_index/_doc/1?pretty' -H 'Content-Type: application/json' -d '{
  "name": "Late Night with Elasticsearch",
  "date": "2013-10-25T19:00"
}'

View the automatically generated mapping:

curl '172.16.1.127:9200/my_index/_doc/_mapping?pretty'

The result:

{
  "my_index" : {
    "mappings" : {
      "_doc" : {
        "properties" : {
          "date" : {
            "type" : "date"
          },
          "name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}

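It is often useful to index the same field in different ways for different purposes, and this is what multi-fields are for. For example, a string field can be mapped as a text field for full-text search and also as a keyword field for sorting or aggregations. The fields parameter in the mapping above lets one field carry several such settings within the same index. For string data, ES's dynamic mapping produces exactly this pair by default: a text field with a keyword sub-field (name.keyword here).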
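The Python sketch below makes these default rules concrete by approximating what dynamic mapping produces for a flat JSON document. It is not ES code. infer_field_mapping and infer_mapping are hypothetical helpers. The date check is a simple ISO-8601-like regex rather than ES's full date detection. Objects, numeric detection and dynamic templates are ignored. Mapping JSON floats to float reflects the ES 6.x defaults.

```python
import re

# Rough pattern for ISO-8601-like dates such as "2013-10-25" or "2013-10-25T19:00".
# Assumption: stands in for ES's default date detection.
ISO_LIKE_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d{1,3})?)?)?$")


def infer_field_mapping(value):
    """Roughly what dynamic mapping generates for one JSON value (simplified)."""
    if isinstance(value, list):
        # An array is mapped by the type of its elements; an empty array adds no field.
        return infer_field_mapping(value[0]) if value else None
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        if ISO_LIKE_DATE.match(value):
            return {"type": "date"}
        # Any other string: a text field plus a keyword sub-field.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    raise TypeError(f"not handled in this sketch: {value!r}")


def infer_mapping(doc):
    """Build a "properties" block for a flat document."""
    properties = {}
    for name, value in doc.items():
        field = infer_field_mapping(value)
        if field is not None:
            properties[name] = field
    return {"properties": properties}


print(infer_mapping({"name": "Late Night with Elasticsearch", "date": "2013-10-25T19:00"}))
```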

(2) Defining a new mapping manually

You can create an index and define its mapping before inserting any documents, much like creating a table:

# Create the index (if my_index still exists from the previous example, delete it first; otherwise ES returns resource_already_exists_exception)
curl -XPUT '172.16.1.127:9200/my_index?pretty'
# Define the mapping of the _doc type
curl -XPUT '172.16.1.127:9200/my_index/_mapping/_doc?pretty' -H 'Content-Type: application/json' -d '{
  "_doc": {
    "properties": {
      "date": {
        "type": "date"
      },
      "name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}'

After the index has been created, you can still extend its mapping, for example by adding a host field to my_index:

curl -XPUT '172.16.1.127:9200/my_index/_mapping/_doc?pretty' -H 'Content-Type: application/json' -d '{
  "_doc": {
    "properties": {
      "host": {
        "type": "text"
      }
    }
  }
}'

If you put a mapping on top of an existing one, ES merges the two. After the command above, for example, the mapping looks like this:

{
  "my_index" : {
    "mappings" : {
      "_doc" : {
        "properties" : {
          "date" : {
            "type" : "date"
          },
          "host" : {
            "type" : "text"
          },
          "name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}

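As you can see, the mapping now contains the two fields from the initial mapping plus the newly defined one. Adding new fields extends the initial mapping, and you can do this at any time. ES calls this mapping merging. You cannot, however, change the data type of an existing field: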

curl -XPUT '172.16.1.127:9200/my_index/_mapping/_doc?pretty' -H 'Content-Type: application/json' -d '{
  "_doc": {
    "properties": {
      "host": {
        "type": "long"
      }
    }
  }
}'

This returns the following error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "remote_transport_exception",
        "reason" : "[node126][172.16.1.126:9300][indices:admin/mapping/put]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "mapper [host] of different type, current_type [text], merged_type [long]"
  },
  "status" : 400
}

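Changing a field's type would mean that ES has to reindex the data. Ideally, a correct mapping only ever needs additions and never modifications. To define such a mapping, let's look at the data types ES offers for fields.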

2. Core Data Types

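(1) Strings

If you are indexing text, the field should be of type text, and there are many options for how it is analyzed. Parsing the text, transforming it and breaking it into basic elements makes searches more relevant. In ES this process is called analysis. To see the basics of analysis, index a document into my_index with the following command: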

curl -XPUT '172.16.1.127:9200/my_index/_doc/1?pretty' -H 'Content-Type: application/json' -d '{
  "name": "Late Night with Elasticsearch",
  "date": "2013-10-25T19:00"
}'

Once the document is indexed, search the name field for the word late:

curl '172.16.1.127:9200/my_index/_doc/_search?pretty' -H 'Content-Type: application/json' -d '{
  "query": {
    "query_string": {
      "default_field": "name",
      "query": "late"
    }
  }
}'

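The search finds the "Late Night with Elasticsearch" document. Analysis is what connects the string "late" with "Late Night with Elasticsearch". As Figure 1 shows, when "Late Night with Elasticsearch" is indexed, the default analyzer splits the string into words and converts each word to lowercase.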

Figure 1 After the default analyzer breaks the string into terms, subsequent searches match against those terms

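Analysis produces four terms: late, night, with and elasticsearch. The query string goes through the same process. Because the term late produced by the query matches the term late produced by the document, document 1 matches the search. This is loosely like SQL's where lower(name) like concat('%',lower('late'),'%'), with one difference: terms must match as whole words. A query for lat would therefore not match, whereas the LIKE pattern would.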
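The analysis and matching steps above can be modeled in a few lines of Python. This is a simplified sketch. The hypothetical analyze() splits on anything that is not an ASCII letter or digit and lowercases the tokens, whereas the real standard analyzer uses Unicode text segmentation. matches() assumes query_string's default OR behavior, where any shared term is enough.

```python
import re


def analyze(text):
    """Simplified analyzer: split on non-alphanumeric characters, then lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def matches(query, field_value):
    """True if any analyzed query term equals any analyzed field term."""
    indexed_terms = set(analyze(field_value))
    return any(term in indexed_terms for term in analyze(query))


print(analyze("Late Night with Elasticsearch"))
print(matches("late", "Late Night with Elasticsearch"))
print(matches("lat", "Late Night with Elasticsearch"))
```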

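A term is one word of the text and the basic unit of search. If you want a strict match on a whole field, like SQL's where name = 'late', the entire field value must be treated as a single term. ES does not analyze the keyword sub-field of a text field. Instead, it indexes the whole string as one term. So the following query returns no documents: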

curl '172.16.1.127:9200/my_index/_doc/_search?pretty' -H 'Content-Type: application/json' -d '{
  "query": {
    "term": {
      "name.keyword": "late"
    }
  }
}'

An exact match on the whole string, however, returns document 1:

curl '172.16.1.127:9200/my_index/_doc/_search?pretty' -H 'Content-Type: application/json' -d '{
  "query": {
    "term": {
      "name.keyword": "Late Night with Elasticsearch"
    }
  }
}'
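A keyword field can be modeled the same way. The sketch below assumes the ignore_above value of 256 from the mapping shown earlier, so longer strings are simply not indexed into the keyword sub-field. keyword_terms and term_query are hypothetical names. The point is that the whole, unanalyzed string is the only term, so the comparison is exact and case-sensitive.

```python
def keyword_terms(value, ignore_above=256):
    """A keyword field indexes the whole string as one term (or nothing if too long)."""
    return [] if len(value) > ignore_above else [value]


def term_query(term, field_value):
    """A term query is not analyzed: it must equal an indexed term exactly."""
    return term in keyword_terms(field_value)


title = "Late Night with Elasticsearch"
print(term_query("late", title))
print(term_query("Late Night with Elasticsearch", title))
```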

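(2) Numbers

Numeric types can be floating-point or integer. If you don't need decimals, choose byte, short, integer or long. If you do need decimals, choose float or double. These types correspond to Java's primitive data types, and the choice affects both the size of the index and the range of values you can index. For example, a long takes 64 bits while a short takes only 16, but a short can only store numbers from -32768 to 32767.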
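These ranges follow directly from the bit widths, because the types are two's-complement Java types: an n-bit type holds -2^(n-1) to 2^(n-1)-1. The small Python sketch below computes the ranges and picks the narrowest type for a set of sample values. smallest_type_for is a hypothetical helper, not part of ES.

```python
# Bit widths of the ES integer types (they mirror Java's primitive types).
INTEGER_BITS = {"byte": 8, "short": 16, "integer": 32, "long": 64}


def value_range(type_name):
    """Two's-complement range of an n-bit type: -2^(n-1) .. 2^(n-1) - 1."""
    bits = INTEGER_BITS[type_name]
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_type_for(values):
    """Hypothetical helper: the narrowest integer type that can hold every value."""
    for name in ("byte", "short", "integer", "long"):
        low, high = value_range(name)
        if all(low <= v <= high for v in values):
            return name
    raise ValueError("values exceed the range of long")


for name in INTEGER_BITS:
    print(name, value_range(name))
print(smallest_type_for([100, -5]), smallest_type_for([40000]))
```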

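If you don't know the range of your integers or the precision of your floating-point numbers, it is safer to let ES detect the mapping automatically. It assigns long to integer values, and in the 6.x versions used in this article it assigns float to floating-point values. The index may become larger and slower because long takes more space than the narrower integer types, but ES is far less likely to hit out-of-range errors during indexing.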

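(3) Dates

The date type stores dates and times. It works like this: you usually provide a string that represents the date, such as 2013-10-25T19:00. ES parses the string and stores it in the Lucene index as a long. That long is the number of milliseconds elapsed between 1970-01-01 00:00:00 UTC and the given time.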
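The conversion ES performs can be reproduced in Python to see which number is actually stored. This sketch assumes the string carries no time zone and therefore treats it as UTC, which is what ES does for such values. to_epoch_millis is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(value, fmt="%Y-%m-%dT%H:%M"):
    """Milliseconds since 1970-01-01T00:00:00Z; a value without a zone is read as UTC."""
    parsed = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    # Dividing two timedeltas gives an exact integer count of milliseconds.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


print(to_epoch_millis("2013-10-25T19:00"))
```

For 2013-10-25T19:00 this prints 1382727600000.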

When searching, you still supply date strings. ES parses them and works with the numbers, because numbers are faster to store and process than strings.

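The format of date strings is defined with the format option. By default ES parses ISO 8601 timestamps: the default format of a date field is strict_date_optional_time||epoch_millis, so epoch milliseconds are accepted as well. To specify a date format with the format option, you have two choices: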

Use one of the predefined date formats. See https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html#built-in-date-formats
Define a custom format.

curl -XPUT '172.16.1.127:9200/my_index/_mapping/_doc?pretty' -H 'Content-Type: application/json' -d '{
  "properties": {
    "next_event": {
      "type": "date",
      "format": "MMM dd yyyy"
    }
  }
}'

The next_event field uses a custom date format. In the Joda-style patterns used by ES 6.x, dd is the day of the month and yyyy the year. DD would mean the day of the year, which does not fit a value like "Oct 25 2013". Other date fields are not defined explicitly and are detected automatically:

curl -XPUT '172.16.1.127:9200/my_index/_doc/1?pretty' -H 'Content-Type: application/json' -d '{
  "name": "Elasticsearch News",
  "first_occurence": "2011-04-03",
  "next_event": "Oct 25 2013"
}'

View the mapping:

curl '172.16.1.127:9200/my_index/_doc/_mapping?pretty'

The result:

{
  "my_index" : {
    "mappings" : {
      "_doc" : {
        "properties" : {
          "date" : {
            "type" : "date"
          },
          "first_occurence" : {
            "type" : "date"
          },
          "host" : {
            "type" : "text"
          },
          "name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "next_event" : {
            "type" : "date",
            "format" : "MMM dd yyyy"
          }
        }
      }
    }
  }
}
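Why does next_event need an explicit format? A value like "Oct 25 2013" does not look like an ISO 8601 date, so automatic detection would not treat it as a date. The Python sketch below illustrates this under some assumptions. It uses datetime.fromisoformat as a rough stand-in for ES's default date detection. It parses with strptime's "%b %d %Y", which is only Python's approximate equivalent of the ES pattern "MMM dd yyyy", because the two pattern languages differ. Both helper names are hypothetical.

```python
from datetime import date, datetime


def looks_like_iso_date(value):
    """Rough stand-in for automatic date detection: ISO-8601-like strings only."""
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


def parse_next_event(value: str) -> date:
    """Parse with an explicit pattern, as the custom "format" does in the mapping.
    %b matches English month abbreviations such as "Oct" in the default C locale."""
    return datetime.strptime(value, "%b %d %Y").date()


doc = {"name": "Elasticsearch News", "first_occurence": "2011-04-03", "next_event": "Oct 25 2013"}
for field, value in doc.items():
    print(field, looks_like_iso_date(value))
print(parse_next_event(doc["next_event"]))
```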

(4) Booleans

The boolean type stores true/false values from documents, for example:

curl -XPUT '172.16.1.127:9200/my_index/_doc/1?pretty' -H 'Content-Type: application/json' -d '{
  "name": "Broadcasted Elasticsearch News",
  "downloadable": true
}'

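The downloadable field is automatically mapped as boolean and stored in the Lucene index as T and F. As with dates, ES parses the value supplied in the source document and converts true and false to T and F, respectively.

(5) Arrays

All core types support arrays. Without changing the mapping, you can use either a single value or an array: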

curl -XPUT '172.16.1.127:9200/blog/posts/1?pretty' -H 'Content-Type: application/json' -d '{
  "tags": ["first", "initial"]
}'
curl -XPUT '172.16.1.127:9200/blog/posts/2?pretty' -H 'Content-Type: application/json' -d '{"tags": "second"}'
curl '172.16.1.127:9200/blog/_mapping/posts?pretty'

The result:

{
  "blog" : {
    "mappings" : {
      "posts" : {
        "properties" : {
          "tags" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}

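As you can see, the mapping doesn't define an array. It defines the core type. Internally, Lucene treats a single value and an array in essentially the same way: how many terms are indexed in the field depends entirely on how many values you supply.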
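Here is a minimal Python sketch of that idea. It assumes the same kind of simplified analyzer as a stand-in for the standard one. index_field, a hypothetical helper, wraps a single value in a list. One value and an array therefore go through the same code path and simply contribute more or fewer terms to the same field.

```python
import re


def analyze(text):
    """Simplified analyzer: split on non-alphanumeric characters, then lowercase."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def index_field(value):
    """Terms for one field, whether the document supplied one value or an array."""
    values = value if isinstance(value, list) else [value]
    terms = []
    for v in values:
        terms.extend(analyze(v))
    return terms


print(index_field(["first", "initial"]))  # document 1
print(index_field("second"))              # document 2
```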

3. Multi-fields

Arrays let you index several items with the same settings, whereas multi-fields let you index the same item several times with different settings. For example:

curl -XPUT '172.16.1.127:9200/blog/_mapping/posts?pretty' -H 'Content-Type: application/json' -d '{
  "posts": {
    "properties": {
      "tags": {
        "type": "text",
        "index": true,
        "fields": {
          "verbatim": {
            "type": "text",
            "index": false
          }
        }
      }
    }
  }
}'

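You can upgrade a single field to a multi-field without first reindexing the data. Note, however, that a new sub-field is only populated for documents indexed after the change. Existing documents must be reindexed, for example with _update_by_query, before they appear in it. The reverse is not possible: once a sub-field exists, you cannot remove it: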

curl -XPUT '172.16.1.127:9200/blog/_mapping/posts?pretty' -H 'Content-Type: application/json' -d '{
  "posts": {
    "properties": {
      "tags": {
        "type": "text"
      }
    }
  }
}'
curl '172.16.1.127:9200/blog/_mapping/posts?pretty'

The result:

{
  "blog" : {
    "mappings" : {
      "posts" : {
        "properties" : {
          "tags" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              },
              "verbatim" : {
                "type" : "text",
                "index" : false
              }
            }
          }
        }
      }
    }
  }
}

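Because ES only merges mappings when you modify them, the verbatim sub-field is not removed. Note also that with "index": false the verbatim sub-field is not searchable at all. For exact-match searches, a keyword sub-field such as the automatically created tags.keyword is the usual choice.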
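The merge rules in this section can be summarized as a small Python function: new fields and sub-fields are added, existing ones are never removed, and an existing field's type cannot change. This is an illustrative model on plain dicts, not ES's implementation. merge_mapping and MappingConflict are hypothetical names, and sub-field type conflicts are not checked. The conflict message mimics the error shown earlier.

```python
import copy


class MappingConflict(Exception):
    """Raised when an update tries to change the type of an existing field."""


def merge_mapping(current, update):
    """Merge `update` into a copy of `current`: add new fields and sub-fields,
    keep everything that already exists, reject type changes."""
    merged = copy.deepcopy(current)
    props = merged.setdefault("properties", {})
    for name, new_def in update.get("properties", {}).items():
        old_def = props.get(name)
        if old_def is None:
            props[name] = copy.deepcopy(new_def)  # a new field is simply added
            continue
        if old_def["type"] != new_def["type"]:
            raise MappingConflict(
                f"mapper [{name}] of different type, "
                f"current_type [{old_def['type']}], merged_type [{new_def['type']}]"
            )
        # Existing sub-fields are kept and new ones are added.
        new_fields = new_def.get("fields", {})
        if new_fields:
            old_fields = old_def.setdefault("fields", {})
            for sub, sub_def in new_fields.items():
                old_fields.setdefault(sub, copy.deepcopy(sub_def))
    return merged


base = {"properties": {"date": {"type": "date"}, "name": {"type": "text"}}}
print(merge_mapping(base, {"properties": {"host": {"type": "text"}}}))
```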

4. Predefined Fields

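Predefined fields differ from the fields you define yourself in three ways:

You usually don't populate predefined fields yourself. ES does.
Their names reveal what they do.
They always start with an underscore (_).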

(1) _source

The _source field stores the original document in its original format. GET requests and searches return the _source JSON:

curl '172.16.1.127:9200/get-together/_doc/1?pretty'

The result:

{
  "_index" : "get-together",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3,
  "found" : true,
  "_source" : {
    "relationship_type" : "group",
    "name" : "Denver Clojure",
    "organizer" : [
      "Daniel",
      "Lee"
    ],
    "description" : "Group of Clojure enthusiasts from Denver who want to hack on code together and learn more about Clojure",
    "created_on" : "2012-06-15",
    "tags" : [
      "clojure",
      "denver",
      "functional programming",
      "jvm",
      "java"
    ],
    "members" : [
      "Lee",
      "Daniel",
      "Mike"
    ],
    "location_group" : "Denver, Colorado, USA"
  }
}

When searching, you can ask ES to return only specific fields instead of the whole _source:

curl -XGET '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{
  "query": {
    "terms": {
      "_id": [
        "1"
      ]
    }
  },
  "_source": [
    "name",
    "organizer"
  ]
}'

The result:

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "organizer" : [
            "Daniel",
            "Lee"
          ],
          "name" : "Denver Clojure"
        }
      }
    ]
  }
}

This works much like the following SQL:

select name, organizer from get-together where id=1;
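Source filtering itself is simple to picture. The Python sketch below keeps only the requested top-level fields. filter_source is a hypothetical helper, and it assumes plain field names, while ES also accepts wildcards and dotted paths in _source.

```python
def filter_source(source, includes):
    """Return only the listed top-level fields of a _source document."""
    return {field: source[field] for field in includes if field in source}


source = {
    "relationship_type": "group",
    "name": "Denver Clojure",
    "organizer": ["Daniel", "Lee"],
    "created_on": "2012-06-15",
}
print(filter_source(source, ["name", "organizer"]))
```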

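(2) _all

Where _source stores all of the information, _all indexes all of it. The _all field concatenates the values of all fields into one big string, separated by spaces, then analyzes and indexes that string without storing it. This means you can search on it but cannot retrieve it. _all lets you search for a value in a document without knowing which field contains it. Note that _all is deprecated as of ES 6.0: it cannot be enabled on indices created in 6.x, and it was removed in 7.0. The copy_to mapping parameter lets you build a custom catch-all field instead.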

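In older versions, a query without a field name searches _all by default. In the 6.x version used here, such a query instead searches all eligible fields, as set by index.query.default_field (default *), which has a similar effect. Either way, the two commands below are equivalent and return the same result: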

curl '172.16.1.127:9200/get-together/_search?q=elasticsearch&pretty'
curl -XGET '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{
  "query": {
    "query_string": {
      "query": "elasticsearch"
    }
  }
}'
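To illustrate what the _all field did, here is a Python model of a catch-all field. Every value is joined with spaces and the combined string is analyzed, so its terms can be searched, but the string itself is never returned. The analyzer is the same kind of simplified stand-in used for illustration, and all_field_terms and search_all are hypothetical names. In 6.x the same effect would come from copy_to or from searching all eligible fields.

```python
import re


def analyze(text):
    """Simplified analyzer: split on non-alphanumeric characters, then lowercase."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def all_field_terms(source):
    """Join every field value with spaces and analyze the result (indexed, not stored)."""
    parts = []
    for value in source.values():
        values = value if isinstance(value, list) else [value]
        parts.extend(str(v) for v in values)
    return set(analyze(" ".join(parts)))


def search_all(query, source):
    """Match a query against the catch-all terms without naming any field."""
    terms = all_field_terms(source)
    return any(term in terms for term in analyze(query))


doc = {"name": "Denver Clojure", "tags": ["clojure", "jvm"], "created_on": "2012-06-15"}
print(search_all("jvm", doc), search_all("elasticsearch", doc))
```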

(3) _index, _type and _id

ES uses these three fields to identify an individual document. The ID can be supplied manually:

curl -XPUT '172.16.1.127:9200/manual_id/_doc/1st?pretty' -H 'Content-Type: application/json' -d '{
  "name": "Elasticsearch Denver"
}'

The ID appears in the response:

{
  "_index" : "manual_id",
  "_type" : "_doc",
  "_id" : "1st",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

ES can also generate a unique ID automatically:

curl -XPOST '172.16.1.127:9200/logs/_doc/?pretty' -H 'Content-Type: application/json' -d '{
  "message": "I have an automatic id"
}'

The automatically generated ID appears in the response:

{
  "_index" : "logs",
  "_type" : "_doc",
  "_id" : "iEbXOmgBWHJVyzwYQ9ho",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

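Besides _id and _type, ES also stores the name of the index in the document. You can see _index in search results and GET responses, and you can also search on it: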

curl '172.16.1.127:9200/_search?q=_index:get-together&pretty'
curl '172.16.1.127:9200/_search?q=_index:blog&pretty'

II. Updating Data

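There are two ways to update a document in ES. The first is to PUT a different document to the same place (index, type and ID), which is functionally similar to SQL's replace into. The second is to use the update API, for example to perform the equivalent of this SQL: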

update get-together set organizer='Roy' where id=2;

Figure 2 Updating a document means retrieving it, processing it and reindexing it, so that the previous document is overwritten

As Figure 2 shows, ES performs the following steps (top to bottom):

Retrieve the existing document from the _source field.
Apply the specified changes.
Delete the old document and index the new one in its place.

1. Using the Update API

(1) Sending a partial document

curl -XPOST '172.16.1.127:9200/get-together/_doc/2/_update?pretty' -H 'Content-Type: application/json' -d '{
  "doc": {
    "organizer": "Roy"
  }
}'

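This command sets each field specified under doc to the given value. It ignores the fields' previous values and whether those fields existed before. If the document as a whole does not exist, the update fails with a document_missing_exception.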

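(2) Using upsert

To handle updates of documents that don't exist yet, you can use upsert. The word comes from relational databases and combines update and insert. If the document to update doesn't exist, the upsert section of the JSON supplies an initial document to index: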

curl -XPOST '172.16.1.127:9200/get-together/_doc/2/_update?pretty' -H 'Content-Type: application/json' -d '{
  "doc": {
    "organizer": "Roy"
  },
  "upsert": {
    "name": "Elasticsearch Denver",
    "organizer": "Roy"
  }
}'
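The combined behavior of a partial update and upsert can be sketched as an in-memory model in Python. Everything here is hypothetical: the store layout, update and DocumentMissing. The model follows the three steps from Figure 2 and the rules above. It merges only top-level fields, whereas ES merges nested objects in the partial document recursively.

```python
import copy


class DocumentMissing(Exception):
    """Raised when a partial update targets a document that does not exist."""


def update(store, doc_id, doc, upsert=None):
    """Model of the update flow: fetch _source, apply `doc`, reindex as a new version.
    `store` maps doc_id -> {"_version": int, "_source": dict} (hypothetical layout)."""
    existing = store.get(doc_id)
    if existing is None:
        if upsert is None:
            raise DocumentMissing(doc_id)
        # Document missing: index the upsert body as the initial document.
        store[doc_id] = {"_version": 1, "_source": copy.deepcopy(upsert)}
        return store[doc_id]
    new_source = copy.deepcopy(existing["_source"])
    # Top-level merge: fields in `doc` overwrite, all other fields are kept.
    new_source.update(copy.deepcopy(doc))
    store[doc_id] = {"_version": existing["_version"] + 1, "_source": new_source}
    return store[doc_id]


store = {}
update(store, "2", {"organizer": "Roy"},
       upsert={"name": "Elasticsearch Denver", "organizer": "Roy"})  # creates version 1
print(update(store, "2", {"organizer": "Lee"}))                      # merges, version 2
```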

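(3) Updating documents with a script

An update script has three important elements:

The default scripting language is painless.
An update retrieves the existing document's _source, modifies it and reindexes the new document, so the script modifies fields in _source. Use ctx._source to refer to _source and ctx._source[field name] to refer to a specific field.
If you need variables, define them separately as parameters under params, apart from the script itself. Scripts have to be compiled, and once compiled they are cached. If you run the same script many times with different parameters, it is compiled only once, and later runs take the existing script from the cache. This is faster than running a different script each time, because every different script must be compiled. The idea is similar to bind variables and soft parsing in Oracle.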

curl -XPUT '172.16.1.127:9200/online-shop/_doc/1?pretty' -H 'Content-Type: application/json' -d '{
  "caption": "Learning Elasticsearch",
  "price": 15
}'
curl -XPOST '172.16.1.127:9200/online-shop/_doc/1/_update?pretty' -H 'Content-Type: application/json' -d '{
  "script": {
    "source": "ctx._source.price += params.price_diff",
    "params": {
      "price_diff": 10
    }
  }
}'
curl -XGET '172.16.1.127:9200/online-shop/_doc/1?pretty'

The result:

{
  "_index" : "online-shop",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "_source" : {
    "caption" : "Learning Elasticsearch",
    "price" : 25
  }
}

The price has been changed to 25.
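The benefit of params can be illustrated with a toy cache keyed by script source. In this Python sketch, Python source strings stand in for painless and the built-in compile() stands in for script compilation. ScriptCache is a hypothetical class. Changing only params reuses the cached compilation, while baking the value into the source forces a new compilation for every distinct string.

```python
class ScriptCache:
    """Hypothetical model: compiled scripts are cached by their source text."""

    def __init__(self):
        self.compiled = {}
        self.compilations = 0

    def run(self, source, ctx, params=None):
        if source not in self.compiled:
            self.compilations += 1  # only an unseen source string is compiled
            self.compiled[source] = compile(source, "<script>", "exec")
        exec(self.compiled[source], {"ctx": ctx, "params": params or {}})
        return ctx


cache = ScriptCache()
ctx = {"_source": {"caption": "Learning Elasticsearch", "price": 15}}

# Same source, different params: compiled once.
script = "ctx['_source']['price'] += params['price_diff']"
cache.run(script, ctx, {"price_diff": 10})
cache.run(script, ctx, {"price_diff": 5})

# Value baked into the source: every distinct string is compiled again.
for diff in (1, 2):
    cache.run(f"ctx['_source']['price'] += {diff}", ctx)

print(ctx["_source"]["price"], cache.compilations)
```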

2. Concurrency Control with Versions

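ES has no notion of transactions. However, an update first retrieves the document and then modifies it, so concurrent updates suffer from what the database world calls the "second lost update" problem. As Figure 3 shows, while one update has retrieved the original document and is modifying it, another update may reindex the same document. Without concurrency control, the second reindex overwrites the changes made by the first update.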

Figure 3 Without concurrency control, changes can be lost

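ES uses the document's _version field for concurrency control. It applies optimistic locking to prevent second lost updates, an idea similar to Oracle 11g's Row Version. In theory, the code below reproduces the flow in Figure 3. Unfortunately, ES 6.4.3 uses painless as its scripting language, and painless does not support Thread.sleep, so this code fails: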

# Not runnable on ES 6.4.3 (painless has no Thread.sleep); kept for illustration only
# curl -XPOST 'localhost:9200/online-shop/shirts/1/_update' -d '{"script": "Thread.sleep(10000); ctx._source.price = 2"}' &
# curl -XPOST 'localhost:9200/online-shop/shirts/1/_update' -d '{"script": "ctx._source.caption = \"Knowing Elasticsearch\""}'

The following commands demonstrate what the version does:

curl -XGET "172.16.1.127:9200/online-shop/_doc/1?version=2&pretty"
curl -XPOST '172.16.1.127:9200/online-shop/_doc/1/_update?pretty' -H 'Content-Type: application/json' -d '{
  "script": "ctx._source.caption = \"Knowing Elasticsearch\""
}'
curl -XGET "172.16.1.127:9200/online-shop/_doc/1?version=2&pretty"

The last command asks for version 2 of a document that has since been updated, so it returns the following error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "version_conflict_engine_exception",
        "reason" : "[_doc][1]: version conflict, current version [3] is different than the one provided [2]",
        "index_uuid" : "b6z8mwmRQ1ambP9g5rv9vQ",
        "shard" : "3",
        "index" : "online-shop"
      }
    ],
    "type" : "version_conflict_engine_exception",
    "reason" : "[_doc][1]: version conflict, current version [3] is different than the one provided [2]",
    "index_uuid" : "b6z8mwmRQ1ambP9g5rv9vQ",
    "shard" : "3",
    "index" : "online-shop"
  },
  "status" : 409
}

Figure 4 shows the flow once versions are used to control concurrency.

Figure 4 Controlling concurrency with versions prevents one update from overwriting another

When a version conflict occurs, the retry_on_conflict parameter tells ES to retry automatically:

curl -XPOST '172.16.1.127:9200/online-shop/_doc/1/_update?retry_on_conflict=3&pretty' -H 'Content-Type: application/json' -d '{
  "script": "ctx._source.price = 2"
}'
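Optimistic locking and retry_on_conflict can be modeled as a compare-and-set on the version. In this Python sketch, VersionedStore, update_with_retry and the before_write hook are hypothetical names. A write succeeds only if the version it read is still current. On a conflict, the update re-reads the latest document and reapplies its change, up to retry_on_conflict extra times.

```python
class VersionConflict(Exception):
    pass


class VersionedStore:
    """Hypothetical single-document store with optimistic locking."""

    def __init__(self, source):
        self.source, self.version = dict(source), 1

    def get(self):
        return dict(self.source), self.version

    def index(self, source, expected_version):
        # Compare-and-set: succeed only if nobody wrote since we read.
        if expected_version != self.version:
            raise VersionConflict(
                f"current version [{self.version}] is different than "
                f"the one provided [{expected_version}]"
            )
        self.source, self.version = dict(source), self.version + 1


def update_with_retry(store, change, retry_on_conflict=3, before_write=None):
    """Read, modify, write; on a conflict re-read and retry. Returns retries used."""
    for attempt in range(retry_on_conflict + 1):  # first try + N retries
        source, version = store.get()
        change(source)
        if before_write is not None:
            before_write(store, attempt)  # hook to simulate a concurrent writer
        try:
            store.index(source, expected_version=version)
            return attempt
        except VersionConflict:
            if attempt == retry_on_conflict:
                raise  # out of retries: surface the conflict


store = VersionedStore({"caption": "Learning Elasticsearch", "price": 25})


def concurrent_caption_change(s, attempt):
    if attempt == 0:  # another client updates the document once, mid-flight
        source, version = s.get()
        source["caption"] = "Knowing Elasticsearch"
        s.index(source, version)


retries = update_with_retry(store, lambda src: src.update(price=2),
                            before_write=concurrent_caption_change)
print(retries, store.get())
```

In this run the first attempt conflicts with the concurrent caption change. The retry then applies price = 2 on top of the new caption, so neither change is lost.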

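Another way to update a document is to skip the update API and index a new document at the same index, type and ID. This overwrites the existing document, and the version can still be used for concurrency control. To do so, set the version parameter in the HTTP request. For example, if the current version is 4, the reindex request is: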

curl -XPUT "172.16.1.127:9200/online-shop/_doc/1?version=4&pretty" -H 'Content-Type: application/json' -d '{
  "caption": "I Know about Elasticsearch Versioning",
  "price": 5
}'

If the version is no longer 4 at the time of the update, the operation fails with a version conflict exception.

III. Deleting Data

1. Deleting Documents

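When you delete one or more documents, ES only marks them as deleted, so they no longer show up in search results. Later, ES removes them from the index completely and asynchronously.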

curl -XDELETE '172.16.1.127:9200/online-shop/_doc/1?pretty'

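Deletes can also use versions for concurrency control, but there is a special case. Once a document is deleted, it no longer exists, so an update could easily recreate it even though it shouldn't (assuming the update's version is lower than the delete's). To prevent this, ES keeps the deleted document's version for a while, which lets it reject updates whose version is lower than the delete's. This period is 60 seconds by default and can be changed with index.gc_deletes.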
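The tombstone rule can also be sketched in Python. TombstoneStore is a hypothetical model that uses explicit timestamps instead of a real clock. It assumes external-style versioning, where every write carries its own version and that version must be higher than anything the store has seen. gc_deletes defaults to 60 seconds, as described above.

```python
class TombstoneStore:
    """Hypothetical model: a delete leaves a (version, time) tombstone that is
    kept for gc_deletes seconds and used to reject stale writes."""

    def __init__(self, gc_deletes=60.0):
        self.gc_deletes = gc_deletes
        self.docs = {}        # doc_id -> (version, source)
        self.tombstones = {}  # doc_id -> (version, deleted_at)

    def delete(self, doc_id, version, now):
        self.docs.pop(doc_id, None)
        self.tombstones[doc_id] = (version, now)

    def index(self, doc_id, source, version, now):
        """Accept the write only if its version is newer than anything known."""
        tomb = self.tombstones.get(doc_id)
        if tomb is not None and now - tomb[1] <= self.gc_deletes:
            if version <= tomb[0]:
                return False  # stale: older than a delete we still remember
        current = self.docs.get(doc_id)
        if current is not None and version <= current[0]:
            return False
        self.docs[doc_id] = (version, source)
        self.tombstones.pop(doc_id, None)
        return True


store = TombstoneStore()
store.index("1", {"price": 15}, version=5, now=0)
store.delete("1", version=6, now=10)
print(store.index("1", {"price": 2}, version=5, now=20))   # within 60 s of the delete
print(store.index("1", {"price": 2}, version=5, now=100))  # tombstone already expired
```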

You can also query an index for documents and delete the matches:

curl -X POST "172.16.1.127:9200/my_index/_delete_by_query?pretty" -H 'Content-Type: application/json' -d '{
  "query": {
    "query_string": {
      "query": "elasticsearch"
    }
  }
}'

2. Deleting Indices

# Delete one index
curl -XDELETE "172.16.1.127:9200/blog?pretty"
# Delete several indices
curl -XDELETE "172.16.1.127:9200/my_index,manual_id?pretty"

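Deleting an index is fast because it mostly amounts to removing the files that belong to the index's shards, and removing files from the file system is much faster than deleting individual documents. Databases behave the same way in terms of execution time: drop table is usually much faster than delete. Individually deleted documents, by contrast, are only marked as deleted. They are physically removed when segments are merged, that is, when several small Lucene segments are combined into one larger segment.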

3. Closing Indices

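Instead of deleting indices, you can close them. Once an index is closed, you can't read or write its data through ES. This is useful with streaming data such as application logs: you can close old indices to free ES resources without deleting them, in case you need them later.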

# Close the index
curl -XPOST '172.16.1.127:9200/logs/_close?pretty'
# Open the index
curl -XPOST '172.16.1.127:9200/logs/_open?pretty'

Once an index is closed, its only trace in ES's memory is its metadata, such as the index name and the locations of its shards. You can reopen a closed index and search it again.

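Copyright notice:

This article was compiled by 大數據技術與架構 (Big Data Technology & Architecture) with the exclusive authorization of the original author. Reposting without the original author's permission will be treated as infringement.

Editor | 冷眼丶

WeChat official account | import_bigdata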


Learning Elasticsearch by Analogy, "Outclass Your Peers" Series: Operations
