A Walkthrough of Elasticsearch Search Syntax · Part 1

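Preface
Search is the heart of Elasticsearch, and the query syntax is the heart of search, so today we will walk through some of that syntax. There are two topics: ordinary queries, that is, retrieving data, and aggregations, which correspond to grouping in traditional SQL.
Let's look at how each of these works in practice.
Search syntax
Full-text search
A full-text search matches any document that contains at least one of the words in the search text and returns it in the results. Word order does not matter: containing a word is enough to match.
The search expression below finds every document whose about field contains go or reading.
Search request: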
curl -X GET "localhost:9200/megacorp/employee/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query" : {
"match" : {
"about" : "go reading"
}
}
}
'
Response:
{
"took" : 220,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 7,
"relation" : "eq"
},
"max_score" : 0.9161128,
"hits" : [
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 0.9161128,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 15,
"about" : "I love to go reading",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "6",
"_score" : 0.9161128,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to reading go",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "7",
"_score" : 0.8500352,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to reading and go",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "8",
"_score" : 0.8500352,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go and reading",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "5",
"_score" : 0.6706225,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to reading",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 0.2761543,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to climbing go rock",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "3",
"_score" : 0.2761543,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [
"sports",
"music"
]
}
}
]
}
}
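The results show that both go reading and reading go were matched with the same score, which is also the highest score. Documents containing only go or only reading were matched as well, but the document containing only reading scored higher than the ones containing only go.
I have not yet looked into exactly how this score is calculated (by default, Elasticsearch 5.0 and later rank with BM25, which weighs term frequency, how rare a term is across the index, and field length). For now it is enough to know that a higher _score means a better match.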
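To make the "containing any word is a match" rule concrete, here is a small offline sketch in Python. It is only a simulation under stated assumptions: the about values are copied from the responses in this post, and tokenize (lowercase plus whitespace split) and match_any are hypothetical helpers standing in for Elasticsearch's analyzer and match query. It does not compute scores.

```python
# Offline simulation of "match" semantics; not Elasticsearch code.
# The about values are copied from the sample documents in this post.
ABOUT = {
    1: "I love to go reading",
    2: "I love to climbing go rock",
    3: "I love to go rock climbing",
    4: "I love to rock climbing",
    5: "I love to reading",
    6: "I love to reading go",
    7: "I love to reading and go",
    8: "I love to go and reading",
    9: "I love to do something",
}


def tokenize(text):
    """Assumed stand-in for the analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def match_any(query, docs):
    """Return ids of documents sharing at least one token with the query."""
    wanted = set(tokenize(query))
    return [doc_id for doc_id, text in docs.items()
            if wanted & set(tokenize(text))]


matched = match_any("go reading", ABOUT)
print(matched)
```

The simulation selects the same seven documents as the response above; documents 4 and 9 contain neither word and are left out.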
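Phrase search
Compared with full-text search, phrase search is more precise. As noted above, a full-text search splits the search text into words and matches any of them, which makes it the looser of the two. A phrase search only matches documents that contain the complete phrase, with the words adjacent and in the same order. The request below therefore matches only documents containing go reading: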
curl -X GET "localhost:9200/megacorp/employee/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query" : {
"match_phrase" : {
"about" : "go reading"
}
}
}
'
Response:
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.1097687,
"hits" : [
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 1.1097687,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 15,
"about" : "I love to go reading",
"interests" : [
"sports",
"music"
]
}
}
]
}
}
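The results contain only the document that includes the phrase go reading; documents containing just one of the words were not matched. This confirms that a phrase search must match the whole phrase.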
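The difference from a full-text search can be sketched offline as a check for consecutive tokens. This is a simplified simulation, not how Elasticsearch implements match_phrase internally: tokenize and match_phrase are hypothetical helpers, and the about values are copied from the sample documents in this post.

```python
# Offline simulation of phrase matching; not Elasticsearch code.
ABOUT = {
    1: "I love to go reading",
    2: "I love to climbing go rock",
    3: "I love to go rock climbing",
    4: "I love to rock climbing",
    5: "I love to reading",
    6: "I love to reading go",
    7: "I love to reading and go",
    8: "I love to go and reading",
    9: "I love to do something",
}


def tokenize(text):
    """Assumed stand-in for the analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def match_phrase(phrase, docs):
    """Return ids of documents where the phrase tokens appear adjacent and in order."""
    words = tokenize(phrase)
    n = len(words)
    hits = []
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        # Slide a window of n tokens over the text and compare it to the phrase.
        if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            hits.append(doc_id)
    return hits


print(match_phrase("go reading", ABOUT))
print(match_phrase("rock climbing", ABOUT))
```

Document 6 (reading go) and document 8 (go and reading) contain both words but not as an adjacent, ordered pair, so the simulation rejects them just as the real query did.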
Highlighted search
Highlighting simply marks up the matched text in the results. By default, each matched term is wrapped in an em tag:
curl -X GET "localhost:9200/megacorp/employee/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
},
"highlight": {
"fields" : {
"about" : {}
}
}
}
'
Response:
{
"took" : 116,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.8434994,
"hits" : [
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "4",
"_score" : 1.8434994,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to rock climbing",
"interests" : [
"sports",
"music"
]
},
"highlight" : {
"about" : [
"I love to <em>rock</em> <em>climbing</em>"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "3",
"_score" : 1.7099125,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [
"sports",
"music"
]
},
"highlight" : {
"about" : [
"I love to go <em>rock</em> <em>climbing</em>"
]
}
}
]
}
}
The highlight markup can of course be customized:
curl -X GET "localhost:9200/megacorp/employee/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
},
"highlight": {
"pre_tags" : ["<p style=\"color:red\">"],
"post_tags" : ["</p>"],
"fields" : {
"about" : {}
}
}
}
'
Response:
{
...
"hits" : {
...
"hits" : [
{
...
},
"highlight" : {
"about" : [
"I love to <p style=\"color:red\">rock</p> <p style=\"color:red\">climbing</p>"
]
}
},
{
...
},
"highlight" : {
"about" : [
"I love to go <p style=\"color:red\">rock</p> <p style=\"color:red\">climbing</p>"
]
}
}
]
}
}
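The effect of pre_tags and post_tags can be pictured with a short offline sketch. It is a rough approximation under stated assumptions: highlight is a hypothetical helper that wraps every whitespace-separated word found in the search text, whereas the real highlighter works from the analyzed query and has many more options.

```python
# Offline approximation of highlighting; not Elasticsearch code.
def highlight(text, phrase, pre="<em>", post="</em>"):
    """Wrap each word of text that also occurs in phrase with pre/post tags."""
    terms = {word.lower() for word in phrase.split()}
    return " ".join(
        (pre + word + post) if word.lower() in terms else word
        for word in text.split()
    )


# Default tags, as in the first highlight response.
default_markup = highlight("I love to go rock climbing", "rock climbing")
print(default_markup)

# Custom tags, mirroring the pre_tags/post_tags values used in the request.
custom_markup = highlight(
    "I love to rock climbing",
    "rock climbing",
    pre='<p style="color:red">',
    post="</p>",
)
print(custom_markup)
```

Only the wrapping strings change between the two calls, which is all the pre_tags and post_tags settings change in the responses above.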
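Data analysis
Now for some simple statistics. The expression below groups the documents by age and counts each group (all_interests is just the label given to the aggregation; the grouping field is age). In Elasticsearch terminology this is an aggregation, and it is similar to GROUP BY in traditional SQL: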
curl -X GET "localhost:9200/megacorp/employee/_search?pretty" -H 'Content-Type: application/json' -d'
{
"aggs": {
"all_interests": {
"terms": { "field": "age" }
}
}
}
'
Response:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 9,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 15,
"about" : "I love to go reading",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to climbing go rock",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to rock climbing",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to reading",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "6",
"_score" : 1.0,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to reading go",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "7",
"_score" : 1.0,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to reading and go",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "8",
"_score" : 1.0,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go and reading",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "9",
"_score" : 1.0,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to do something",
"interests" : [
"sports",
"music"
]
}
}
]
},
"aggregations" : {
"all_interests" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 25,
"doc_count" : 8
},
{
"key" : 15,
"doc_count" : 1
}
]
}
}
}
The response shows 8 documents with age 25 and 1 document with age 15.
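For readers who think in GROUP BY, the bucket counts can be reproduced offline with a counter. This is only a sketch: the ages are copied from the nine sample documents, and terms_agg is a hypothetical helper that mimics the shape of the buckets array, ordered by count from largest to smallest as in the response.

```python
from collections import Counter

# Ages of the nine sample documents, keyed by _id (copied from the response above).
AGES = {1: 15, 2: 25, 3: 25, 4: 25, 5: 25, 6: 25, 7: 25, 8: 25, 9: 25}


def terms_agg(values):
    """Group equal values and count them, largest group first."""
    counts = Counter(values)
    return [{"key": key, "doc_count": count}
            for key, count in counts.most_common()]


buckets = terms_agg(AGES.values())
print(buckets)
```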
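Combining query and aggs
An aggs clause can appear alongside a query, in which case the aggregation runs only over the documents the query matches, much like adding a WHERE clause to select name, count(*) from user group by name: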
curl -X GET "localhost:9200/megacorp/employee/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query" : {
"match" : {
"about" : "go reading"
}
},
"aggs": {
"all_interests": {
"terms": { "field": "age" }
}
}
}
'
Response:
{
"took" : 10,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 7,
"relation" : "eq"
},
"max_score" : 1.1097689,
"hits" : [
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 1.1097689,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 15,
"about" : "I love to go reading",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "6",
"_score" : 1.1097689,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to reading go",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "7",
"_score" : 1.0293508,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to reading and go",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "8",
"_score" : 1.0293508,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go and reading",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "5",
"_score" : 0.7753851,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to reading",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 0.36634043,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to climbing go rock",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "3",
"_score" : 0.36634043,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [
"sports",
"music"
]
}
}
]
},
"aggregations" : {
"all_interests" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 25,
"doc_count" : 6
},
{
"key" : 15,
"doc_count" : 1
}
]
}
}
}
In the response, aggregations holds the aggregation results. buckets is the list of groups, and each bucket is one row of the aggregated result.
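The drop from 8 to 6 in the age 25 bucket follows from the query filtering the documents before they are grouped. A minimal offline sketch of that filter-then-group order, assuming whitespace tokenization and using the hypothetical helpers matches and terms_agg on data copied from the sample documents:

```python
from collections import Counter

# (age, about) for each sample document, keyed by _id.
DOCS = {
    1: (15, "I love to go reading"),
    2: (25, "I love to climbing go rock"),
    3: (25, "I love to go rock climbing"),
    4: (25, "I love to rock climbing"),
    5: (25, "I love to reading"),
    6: (25, "I love to reading go"),
    7: (25, "I love to reading and go"),
    8: (25, "I love to go and reading"),
    9: (25, "I love to do something"),
}


def matches(query, text):
    """Assumed match semantics: at least one shared whitespace token."""
    return bool(set(query.lower().split()) & set(text.lower().split()))


def terms_agg(values):
    """Group equal values and count them, largest group first."""
    return [{"key": key, "doc_count": count}
            for key, count in Counter(values).most_common()]


# Step 1: the query selects documents. Step 2: the aggregation groups only those.
matched_ages = [age for age, about in DOCS.values() if matches("go reading", about)]
buckets = terms_agg(matched_ages)
print(buckets)
```

Documents 4 and 9 are both age 25 and match neither word, which accounts for the two documents missing from that bucket.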
More aggregation usage
Several aggregations can also be combined in one request. Here we add an average calculation to the aggregation above:
curl -X GET "localhost:9200/megacorp/employee/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query" : {
"match" : {
"about" : "go reading"
}
},
"aggs": {
"all_interests": {
"terms": { "field": "age" }
},
"avg_age" : {
"avg" : { "field" : "age" }
}
}
}
'
Response:
{
"took" : 16,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 7,
"relation" : "eq"
},
"max_score" : 1.1097689,
"hits" : [
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 1.1097689,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 15,
"about" : "I love to go reading",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "6",
"_score" : 1.1097689,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to reading go",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "7",
"_score" : 1.0293508,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to reading and go",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "8",
"_score" : 1.0293508,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go and reading",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "5",
"_score" : 0.7753851,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to reading",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 0.36634043,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to climbing go rock",
"interests" : [
"sports",
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "3",
"_score" : 0.36634043,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [
"sports",
"music"
]
}
}
]
},
"aggregations" : {
"avg_age" : {
"value" : 23.571428571428573
},
"all_interests" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 25,
"doc_count" : 6
},
{
"key" : 15,
"doc_count" : 1
}
]
}
}
}
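The avg_age value can be checked by hand: the query matches seven documents, one aged 15 and six aged 25, so the average is 165 / 7. A short sketch of that arithmetic, with the ages copied from the hits above:

```python
# Ages of the seven documents matched by the query (ids 1, 6, 7, 8, 5, 2, 3).
matched_ages = [15, 25, 25, 25, 25, 25, 25]

# avg aggregation: sum of the field values divided by the number of documents.
avg_age = sum(matched_ages) / len(matched_ages)
print(avg_age)
```

Both aggregations are computed over the same seven matched documents, which is why avg_age is lower than 25 even though most buckets' worth of data sits at 25.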
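Summary
To be honest, Elasticsearch's search syntax really is complicated. Going by the chapter layout of the official documentation, the introductory material on search ends here, yet I feel as if I have only just started and still lack a full picture of Elasticsearch. Looking at what remains, it seems the real work starts now: in-depth search, dealing with human language, aggregations, geolocation, data modeling, and monitoring are all still ahead.
I admit my earlier understanding of Elasticsearch was too shallow. It really is a body of knowledge in its own right (rather than a language), and in some sense Elasticsearch has redefined search, so the thing to do is keep studying. Just get on with it!
It is the weekend, so I let myself off a little and binge-watched shows for half the day, only starting to put this material together toward evening. Still, I think that counts as reasonably disciplined, and the task got done. Tomorrow I need to start earlier. Keep going!
Finally, a preview of tomorrow: besides the planned Elasticsearch post, I also need to finish the July updates, so there will be quite a lot to cover.
That's all, good night everyone!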
- END -