
          Implementing Site-Wide Full-Text Search with Elasticsearch


          2021-11-08 14:13

          Source: blog.csdn.net/weixin_44671737/article/details/114456257

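          • Abstract
          • 1 Technology selection
            • 1.1 Elasticsearch
            • 1.2 Spring Boot
            • 1.3 IK analyzer
          • 2 Environment setup
          • 3 Project architecture
          • 4 Results
            • 4.1 Search page
            • 4.2 Search results page
          • 5 Code implementation
            • 5.1 The entity to be indexed
            • 5.2 Client configuration
            • 5.3 Business logic
            • 5.4 External API
            • 5.5 Pages
          • 6 Summary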


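          Abstract

          As a company accumulates more and more data, finding the right information quickly becomes a hard problem. Computer science has a dedicated field, information retrieval (IR), that studies how to obtain information. Search engines such as Baidu belong to this field. Building a search engine from scratch is very difficult, yet search matters to every company, so developers can instead build an in-site search engine on top of an existing open-source project. This article builds such an information retrieval project with Elasticsearch.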

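          1 Technology selection

          • The search engine service uses Elasticsearch
          • The external web service is built with Spring Boot Web

          1.1 Elasticsearch

          Elasticsearch is a search server based on Lucene. It provides a distributed, multi-tenant full-text search engine with a RESTful web interface. Elasticsearch is developed in Java and was released as open source under the terms of the Apache License; it is a popular enterprise search engine. It is widely used in cloud computing and offers near-real-time search that is stable, reliable, fast, and easy to install and use.

          Official clients are available in Java, .NET (C#), PHP, Python, Apache Groovy, Ruby, and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene. [1]

          The most common open-source search engines today are Elasticsearch and Solr, both built on Lucene. Elasticsearch is the more heavyweight of the two and performs better in distributed environments; which one to choose depends on the business scenario and the data volume. When the data volume is small, there is no need for a Lucene-based search service at all; querying a relational database is enough.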

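          1.2 Spring Boot

          Spring Boot makes it easy to create stand-alone, production-grade Spring based Applications that you can “just run”. [2]

          Spring Boot is now the mainstream choice for web development. Its advantages go beyond development: it also does well in deployment and operations, and the Spring ecosystem is so influential that mature solutions exist for most needs.

          1.3 IK analyzer

          Out of the box, Elasticsearch does not segment Chinese text into words, so a Chinese analysis plugin must be installed. Chinese word segmentation is the foundation of Chinese information retrieval. This project uses IK: download it and place it in the plugins directory of the Elasticsearch installation.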
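To make the role of word segmentation concrete, here is a minimal sketch of dictionary-based segmentation in plain Java. It is not IK's actual algorithm: the class `ToySegmenter`, its two methods, and the four-word dictionary are assumptions made for illustration, and a space-free ASCII string stands in for Chinese text. The coarse mode (forward maximum matching) loosely mirrors the idea behind `ik_smart`, and the fine mode loosely mirrors `ik_max_word`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy segmenter; illustrates the idea only, not IK's implementation.
class ToySegmenter {
    private final Set<String> dictionary;
    private final int maxWordLength;

    ToySegmenter(Set<String> dictionary) {
        this.dictionary = dictionary;
        int max = 1;
        for (String word : dictionary) {
            max = Math.max(max, word.length());
        }
        this.maxWordLength = max;
    }

    // Coarse mode: at each position take the longest dictionary word,
    // falling back to a single character when nothing matches.
    List<String> coarse(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxWordLength);
            String match = text.substring(i, i + 1);
            for (int j = end; j > i; j--) {
                String candidate = text.substring(i, j);
                if (dictionary.contains(candidate)) {
                    match = candidate;
                    break;
                }
            }
            tokens.add(match);
            i += match.length();
        }
        return tokens;
    }

    // Fine mode: emit every dictionary word found at every start offset.
    List<String> fine(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            int end = Math.min(text.length(), i + maxWordLength);
            for (int j = i + 1; j <= end; j++) {
                String candidate = text.substring(i, j);
                if (dictionary.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        ToySegmenter segmenter = new ToySegmenter(
                new HashSet<>(Arrays.asList("full", "text", "search", "fulltext")));
        System.out.println(segmenter.coarse("fulltextsearch")); // [fulltext, search]
        System.out.println(segmenter.fine("fulltextsearch"));   // [full, fulltext, text, search]
    }
}
```

Indexing with the fine-grained tokens and searching with the coarse ones is the same split the mapping in section 5.1 makes between `analyzer` and `search_analyzer`.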

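          2 Environment setup

          Elasticsearch and, optionally, Kibana must be installed, along with the IK analysis plugin.

          • Install Elasticsearch from the Elasticsearch official website. The author uses version 7.5.1.
          • Download the IK plugin from the IK plugin GitHub page. Make sure the IK plugin version matches your Elasticsearch version.
          • In the plugins directory of the Elasticsearch installation, create a directory named ik and unzip the downloaded plugin into it. Elasticsearch loads the plugin automatically on startup.
          [Image]
          • Create the Spring Boot project: IDEA -> New Project -> Spring Initializr
          [Image]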

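          3 Project architecture

          • Fetch the data and tokenize it with the IK analysis plugin
          • Store the data in the Elasticsearch engine
          • Retrieve the stored data through Elasticsearch queries
          • Expose the service externally through the Elasticsearch Java client
          [Image]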
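The store-then-retrieve flow above rests on an inverted index, the structure Lucene (and therefore Elasticsearch) uses for full-text lookup. The following is a rough sketch only, assuming simple ASCII word splitting instead of IK and a hypothetical `TinyInvertedIndex` class; it has no scoring, persistence, or distribution.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory inverted index: token -> ids of documents containing it.
class TinyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // "Store" step: tokenize the text and record the document id under each token.
    void index(int docId, String text) {
        for (String token : tokenize(text)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // "Retrieve" step: a document matches if it contains ANY query token (OR semantics).
    Set<Integer> search(String query) {
        Set<Integer> result = new TreeSet<>();
        for (String token : tokenize(query)) {
            result.addAll(postings.getOrDefault(token, Collections.emptySet()));
        }
        return result;
    }

    // Lowercase and split on non-word characters. \W is ASCII-only by default in Java,
    // so this stand-in tokenizer is unsuitable for Chinese text.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.index(1, "Elasticsearch full-text search");
        index.index(2, "Spring Boot web service");
        System.out.println(index.search("spring search")); // [1, 2]
    }
}
```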

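          4 Results

          4.1 Search page

          A simple search box similar to Baidu's is enough.

          [Image]

          4.2 Search results page

          [Image]

          The first search result links to one of the author's own blog posts. To avoid data copyright issues, everything stored in the Elasticsearch engine is the author's own blog data.

          [Image]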

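          5 Code implementation

          5.1 The entity to be indexed

          The entity class below is defined from the basic fields of a blog post. The key field is each post's url: viewing a retrieved article means jumping to that url.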

          package com.lbh.es.entity;

          import com.fasterxml.jackson.annotation.JsonIgnore;

          import javax.persistence.*;

          /**
           * Index definition used for this entity (index name matches ArticleService.ARTICLE_INDEX):
           * PUT article
           * {
           * "mappings":
           * {"properties":{
           * "author":{"type":"text"},
           * "content":{"type":"text","analyzer":"ik_max_word","search_analyzer":"ik_smart"},
           * "title":{"type":"text","analyzer":"ik_max_word","search_analyzer":"ik_smart"},
           * "createDate":{"type":"date","format":"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"},
           * "url":{"type":"text"}
           * } },
           * "settings":{
           *     "index":{
           *       "number_of_shards":1,
           *       "number_of_replicas":2
           *     }
           *   }
           * }
           * ---------------------------------------------------------------------------------------------------------------------
           * Copyright(c)[email protected]
           * @author liubinhao
           * @date 2021/3/3
           */

          @Entity
          @Table(name = "es_article")
          public class ArticleEntity {
              // Database primary key; hidden from Jackson-serialized API responses
              @Id
              @JsonIgnore
              @GeneratedValue(strategy = GenerationType.IDENTITY)
              private long id;
              @Column(name = "author")
              private String author;
              @Column(name = "content",columnDefinition="TEXT")
              private String content;
              @Column(name = "title")
              private String title;
              // Stored as a string; must match the date format declared in the mapping above
              @Column(name = "createDate")
              private String createDate;
              // Link to the original post; search results jump to this address
              @Column(name = "url")
              private String url;

              public String getAuthor() {
                  return author;
              }

              public void setAuthor(String author) {
                  this.author = author;
              }

              public String getContent() {
                  return content;
              }

              public void setContent(String content) {
                  this.content = content;
              }

              public String getTitle() {
                  return title;
              }

              public void setTitle(String title) {
                  this.title = title;
              }

              public String getCreateDate() {
                  return createDate;
              }

              public void setCreateDate(String createDate) {
                  this.createDate = createDate;
              }

              public String getUrl() {
                  return url;
              }

              public void setUrl(String url) {
                  this.url = url;
              }
          }
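Because createDate is a plain String in the entity while the mapping declares the format `yyyy-MM-dd HH:mm:ss||yyyy-MM-dd`, values that do not follow one of those two patterns would be rejected at indexing time. The sketch below uses `java.time` to produce and pre-check such strings. The class `CreateDateFormats` and its method names are assumptions for illustration, and Elasticsearch's own date parser may differ in edge cases.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper mirroring the two patterns declared in the createDate mapping.
class CreateDateFormats {
    static final DateTimeFormatter DATE_TIME = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    static final DateTimeFormatter DATE_ONLY = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // Format a timestamp using the first (date and time) pattern.
    static String format(LocalDateTime time) {
        return time.format(DATE_TIME);
    }

    // Return true when the value parses with either pattern, tried in mapping order.
    static boolean isAccepted(String value) {
        try {
            LocalDateTime.parse(value, DATE_TIME);
            return true;
        } catch (DateTimeParseException ignored) {
            // fall through to the date-only pattern
        }
        try {
            LocalDate.parse(value, DATE_ONLY);
            return true;
        } catch (DateTimeParseException ignored) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(format(LocalDateTime.of(2021, 3, 3, 9, 5, 0))); // 2021-03-03 09:05:00
        System.out.println(isAccepted("2021-03-03"));                      // true
        System.out.println(isAccepted("2021/03/03"));                      // false
    }
}
```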

          5.2 Client configuration

          The Elasticsearch client is configured in Java.

          package com.lbh.es.config;

          import org.apache.http.HttpHost;
          import org.elasticsearch.client.RestClient;
          import org.elasticsearch.client.RestClientBuilder;
          import org.elasticsearch.client.RestHighLevelClient;
          import org.springframework.beans.factory.annotation.Value;
          import org.springframework.context.annotation.Bean;
          import org.springframework.context.annotation.Configuration;

          import java.util.ArrayList;
          import java.util.List;

          /**
           * Copyright(c)[email protected]
           * @author liubinhao
           * @date 2021/3/3
           */

          @Configuration
          public class EsConfig {

              // Protocol used to reach the cluster, e.g. http or https
              @Value("${elasticsearch.schema}")
              private String schema;
              // Comma-separated list of host:port pairs
              @Value("${elasticsearch.address}")
              private String address;
              @Value("${elasticsearch.connectTimeout}")
              private int connectTimeout;
              @Value("${elasticsearch.socketTimeout}")
              private int socketTimeout;
              @Value("${elasticsearch.connectionRequestTimeout}")
              private int tryConnTimeout;
              @Value("${elasticsearch.maxConnectNum}")
              private int maxConnNum;
              @Value("${elasticsearch.maxConnectPerRoute}")
              private int maxConnectPerRoute;

              @Bean
              public RestHighLevelClient restHighLevelClient() {
                  // Split the address list into individual hosts
                  List<HttpHost> hostLists = new ArrayList<>();
                  String[] hostList = address.split(",");
                  for (String addr : hostList) {
                      String host = addr.split(":")[0];
                      String port = addr.split(":")[1];
                      hostLists.add(new HttpHost(host, Integer.parseInt(port), schema));
                  }
                  // Convert to an HttpHost array
                  HttpHost[] httpHost = hostLists.toArray(new HttpHost[]{});
                  // Build the connection object
                  RestClientBuilder builder = RestClient.builder(httpHost);
                  // Timeout settings for the asynchronous HTTP client
                  builder.setRequestConfigCallback(requestConfigBuilder -> {
                      requestConfigBuilder.setConnectTimeout(connectTimeout);
                      requestConfigBuilder.setSocketTimeout(socketTimeout);
                      requestConfigBuilder.setConnectionRequestTimeout(tryConnTimeout);
                      return requestConfigBuilder;
                  });
                  // Connection pool limits for the asynchronous HTTP client
                  builder.setHttpClientConfigCallback(httpClientBuilder -> {
                      httpClientBuilder.setMaxConnTotal(maxConnNum);
                      httpClientBuilder.setMaxConnPerRoute(maxConnectPerRoute);
                      return httpClientBuilder;
                  });
                  return new RestHighLevelClient(builder);
              }

          }
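The address handling in the configuration assumes every entry has the form host:port; an entry without a port makes `addr.split(":")[1]` throw an ArrayIndexOutOfBoundsException. As a sketch of a more tolerant parser in plain Java, the hypothetical `AddressParser` below trims whitespace and falls back to a default port. The class, the `HostPort` holder, and the sample addresses are assumptions, not part of the original project.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: parses "host1:9200, host2" into host/port pairs.
class AddressParser {

    // Simple immutable holder for one parsed entry.
    static final class HostPort {
        final String host;
        final int port;

        HostPort(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    static List<HostPort> parse(String address, int defaultPort) {
        List<HostPort> result = new ArrayList<>();
        for (String entry : address.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty entries such as a trailing comma
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon < 0) {
                // No port given: fall back to the caller-supplied default
                result.add(new HostPort(trimmed, defaultPort));
            } else {
                String host = trimmed.substring(0, colon);
                int port = Integer.parseInt(trimmed.substring(colon + 1));
                result.add(new HostPort(host, port));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 9200 is Elasticsearch's default HTTP port; the host names are made up.
        for (HostPort hp : parse("127.0.0.1:9200, es2.local", 9200)) {
            System.out.println(hp.host + " -> " + hp.port);
        }
    }
}
```

Each parsed pair could then be passed to `new HttpHost(host, port, schema)` exactly as the configuration class does.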

          5.3 Business logic

          This service covers index management and article retrieval. Articles can be searched by title, content, and author.

          package com.lbh.es.service;

          import com.google.gson.Gson;
          import com.lbh.es.entity.ArticleEntity;
          import com.lbh.es.repository.ArticleRepository;
          import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
          import org.elasticsearch.action.get.GetRequest;
          import org.elasticsearch.action.get.GetResponse;
          import org.elasticsearch.action.index.IndexRequest;
          import org.elasticsearch.action.index.IndexResponse;
          import org.elasticsearch.action.search.SearchRequest;
          import org.elasticsearch.action.search.SearchResponse;
          import org.elasticsearch.action.support.master.AcknowledgedResponse;
          import org.elasticsearch.client.RequestOptions;
          import org.elasticsearch.client.RestHighLevelClient;
          import org.elasticsearch.client.indices.CreateIndexRequest;
          import org.elasticsearch.client.indices.CreateIndexResponse;
          import org.elasticsearch.common.settings.Settings;
          import org.elasticsearch.common.xcontent.XContentType;
          import org.elasticsearch.index.query.QueryBuilders;
          import org.elasticsearch.search.SearchHit;
          import org.elasticsearch.search.builder.SearchSourceBuilder;
          import org.springframework.stereotype.Service;

          import javax.annotation.Resource;
          import java.io.IOException;

          import java.util.*;

          /**
           * Copyright(c)[email protected]
           * @author liubinhao
           * @date 2021/3/3
           */

          @Service
          public class ArticleService {

              private static final String ARTICLE_INDEX = "article";

              @Resource
              private RestHighLevelClient client;
              @Resource
              private ArticleRepository articleRepository;

              public boolean createIndexOfArticle(){
                  Settings settings = Settings.builder()
                          .put("index.number_of_shards", 1)
                          .put("index.number_of_replicas", 1)
                          .build();
                  // Mapping sent to Elasticsearch (all fields sit inside "properties"):
                  // {"properties":{"author":{"type":"text"},
                  // "content":{"type":"text","analyzer":"ik_max_word","search_analyzer":"ik_smart"}
                  // ,"title":{"type":"text","analyzer":"ik_max_word","search_analyzer":"ik_smart"}
                  // ,"createDate":{"type":"date","format":"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"}
                  // ,"url":{"type":"text"}
                  // }}
                  String mapping = "{\"properties\":{\"author\":{\"type\":\"text\"},\n" +
                          "\"content\":{\"type\":\"text\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"}\n" +
                          ",\"title\":{\"type\":\"text\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"}\n" +
                          ",\"createDate\":{\"type\":\"date\",\"format\":\"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd\"}\n" +
                          ",\"url\":{\"type\":\"text\"}\n" +
                          "}}";
                  CreateIndexRequest indexRequest = new CreateIndexRequest(ARTICLE_INDEX)
                          .settings(settings).mapping(mapping,XContentType.JSON);
                  CreateIndexResponse response = null;
                  try {
                      response = client.indices().create(indexRequest, RequestOptions.DEFAULT);
                  } catch (IOException e) {
                      e.printStackTrace();
                  }
                  if (response!=null) {
                      System.err.println(response.isAcknowledged() ? "success" : "default");
                      return response.isAcknowledged();
                  } else {
                      return false;
                  }
              }

              // Deletes the whole article index, not a single document
              public boolean deleteArticle(){
                  DeleteIndexRequest request = new DeleteIndexRequest(ARTICLE_INDEX);
                  try {
                      AcknowledgedResponse response = client.indices().delete(request, RequestOptions.DEFAULT);
                      return response.isAcknowledged();
                  } catch (IOException e) {
                      e.printStackTrace();
                  }
                  return false;
              }

              public IndexResponse addArticle(ArticleEntity article){
                  // Note: Gson does not honor Jackson's @JsonIgnore, so the id field is serialized too
                  Gson gson = new Gson();
                  String s = gson.toJson(article);
                  // Create the index request object
                  IndexRequest indexRequest = new IndexRequest(ARTICLE_INDEX);
                  // Document content
                  indexRequest.source(s,XContentType.JSON);
                  // Send the HTTP request through the client
                  IndexResponse re = null;
                  try {
                      re = client.index(indexRequest, RequestOptions.DEFAULT);
                  } catch (IOException e) {
                      e.printStackTrace();
                  }
                  return re;
              }

              // Copies every article from MySQL into the index, one request per article
              public void transferFromMysql(){
                  articleRepository.findAll().forEach(this::addArticle);
              }

              public List<ArticleEntity> queryByKey(String keyword){
                  // Restrict the search to the article index instead of searching every index
                  SearchRequest request = new SearchRequest(ARTICLE_INDEX);
                  /*
                   * Create the search parameter object: SearchSourceBuilder.
                   * Compared with matchQuery, multiMatchQuery targets multiple fields: with a single
                   * field name it behaves like matchQuery; with several, such as field1 and field2,
                   * a document matches if either field1 or field2 contains the text.
                   */

                  SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

                  searchSourceBuilder.query(QueryBuilders
                          .multiMatchQuery(keyword, "author","content","title"));
                  request.source(searchSourceBuilder);
                  List<ArticleEntity> result = new ArrayList<>();
                  try {
                      SearchResponse search = client.search(request, RequestOptions.DEFAULT);
                      for (SearchHit hit:search.getHits()){
                          Map<String, Object> map = hit.getSourceAsMap();
                          ArticleEntity item = new ArticleEntity();
                          item.setAuthor((String) map.get("author"));
                          item.setContent((String) map.get("content"));
                          item.setTitle((String) map.get("title"));
                          item.setUrl((String) map.get("url"));
                          result.add(item);
                      }
                      return result;
                  } catch (IOException e) {
                      e.printStackTrace();
                  }
                  // Return an empty list rather than null so callers need no null check
                  return Collections.emptyList();
              }

              public ArticleEntity queryById(String indexId){
                  GetRequest request = new GetRequest(ARTICLE_INDEX, indexId);
                  GetResponse response = null;
                  try {
                      response = client.get(request, RequestOptions.DEFAULT);
                  } catch (IOException e) {
                      e.printStackTrace();
                  }
                  if (response!=null&&response.isExists()){
                      Gson gson = new Gson();
                      return gson.fromJson(response.getSourceAsString(),ArticleEntity.class);
                  }
                  return null;
              }
          }
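For reference, the multi-field search in queryByKey corresponds to a `multi_match` query in Elasticsearch's JSON query DSL. The sketch below builds that request body by hand in plain Java so the shape is visible without the client library. `MultiMatchBody` and its minimal `escape` helper are assumptions for illustration; in real code a JSON library or the client's own builders would be the safer choice.

```java
// Hypothetical helper that prints the JSON body equivalent to
// QueryBuilders.multiMatchQuery(keyword, "author", "content", "title").
class MultiMatchBody {

    static String build(String keyword, String... fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"query\":{\"multi_match\":{\"query\":\"")
          .append(escape(keyword))
          .append("\",\"fields\":[");
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(escape(fields[i])).append('"');
        }
        return sb.append("]}}}").toString();
    }

    // Minimal JSON string escaping: quotes, backslashes, and control characters.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints {"query":{"multi_match":{"query":"es","fields":["author","content","title"]}}}
        System.out.println(build("es", "author", "content", "title"));
    }
}
```

A body like this could be pasted into Kibana's Dev Tools as a `GET article/_search` request to try queries before wiring them into the service.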

          5.4 External API

          This is the same as writing any web application with Spring Boot.

          package com.lbh.es.controller;

          import com.lbh.es.entity.ArticleEntity;
          import com.lbh.es.service.ArticleService;
          import org.elasticsearch.action.index.IndexResponse;
          import org.springframework.web.bind.annotation.*;

          import javax.annotation.Resource;
          import java.util.List;

          /**
           * Copyright(c)[email protected]
           * @author liubinhao
           * @date 2021/3/3
           */

          @RestController
          @RequestMapping("article")
          public class ArticleController {

              @Resource
              private ArticleService articleService;

              // Create the article index
              @GetMapping("/create")
              public boolean create(){
                  return articleService.createIndexOfArticle();
              }

              // Delete the article index
              @GetMapping("/delete")
              public boolean delete() {
                  return articleService.deleteArticle();
              }

              // Index a single article sent as JSON in the request body
              @PostMapping("/add")
              public IndexResponse add(@RequestBody ArticleEntity article){
                  return articleService.addArticle(article);
              }

              // Copy all articles from MySQL into Elasticsearch
              @GetMapping("/transfer")
              public String transfer(){
                  articleService.transferFromMysql();
                  return "successful";
              }

              // Full-text search across author, content, and title
              @GetMapping("/query")
              public List<ArticleEntity> query(String keyword){
                  return articleService.queryByKey(keyword);
              }
          }
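When calling the query endpoint from another program, the keyword has to be URL-encoded, which matters in particular for Chinese search terms and for characters such as `&`. A minimal sketch in plain Java follows; the class `QueryUrl` is hypothetical, and the base URL `http://localhost:8080` assumes Spring Boot's default port with no context path.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that builds the URL for GET /article/query?keyword=...
class QueryUrl {

    static String build(String baseUrl, String keyword) {
        try {
            // URLEncoder applies form encoding: spaces become '+', other
            // reserved or non-ASCII characters become %XX UTF-8 escapes.
            return baseUrl + "/article/query?keyword=" + URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this is not expected to happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints http://localhost:8080/article/query?keyword=full+text
        System.out.println(build("http://localhost:8080", "full text"));
    }
}
```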

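          5.5 Pages

          The pages use Thymeleaf, mainly because the author is really not a front-end developer and knows only a little basic HTML5, so this is just a simple page that is good enough for a demo.

          Search page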


          <html lang="en" xmlns:th="http://www.thymeleaf.org">
          <head>
              <meta charset="UTF-8" />
              <meta name="viewport" content="width=device-width, initial-scale=1.0" />
              <title>YiyiDu</title>
              <style class="input/css">
                  .input {
                      width: 33%;
                      height: 45px;
                      vertical-align: top;
                      box-sizing: border-box;
                      border: 2px solid rgb(207, 205, 205);
                      border-right: 2px solid rgb(62, 88, 206);
                      border-bottom-left-radius: 10px;
                      border-top-left-radius: 10px;
                      outline: none;
                      margin: 0;
                      display: inline-block;
                      background: url(/static/img/camera.jpg) no-repeat 0 0;
                      background-position: 565px 7px;
                      background-size: 28px;
                      padding-right: 49px;
                      padding-top: 10px;
                      padding-bottom: 10px;
                      line-height: 16px;
                  }
              </style>
              <style class="button/css">
                  .button {
                      height: 45px;
                      width: 130px;
                      vertical-align: middle;
                      text-indent: -8px;
                      padding-left: -8px;
                      background-color: rgb(62, 88, 206);
                      color: white;
                      font-size: 18px;
                      outline: none;
                      border: none;
                      border-bottom-right-radius: 10px;
                      border-top-right-radius: 10px;
                      margin: 0;
                      padding: 0;
                  }
              </style>
          </head>
          <body>
              <div style="font-size: 0px;">
                  <div align="center" style="margin-top: 0px;">
                      <img src="../static/img/yyd.png" th:src = "@{/static/img/yyd.png}"  alt="一億度" width="280px" class="pic" />
                  </div>

                  <div align="center">
                      <!-- Submits the keyword as a GET parameter -->
                      <form action="/home/query">
                          <input type="text" class="input" name="keyword" />
                          <input type="submit" class="button" value="一億度下" />
                      </form>
                  </div>
              </div>
          </body>
          </html>



          Search results page


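          <html lang="en" xmlns:th="http://www.thymeleaf.org">
          <head>
              <link rel="stylesheet" href="https://cdn.staticfile.org/twitter-bootstrap/4.3.1/css/bootstrap.min.css">
              <meta charset="UTF-8">
              <title>xx-manager</title>
          </head>
          <body>
          <header th:replace="search.html"></header>
          <div class="container my-2">
              <ul th:each="article : ${articles}">
                  <a th:href="${article.url}"><li th:text="${article.author}+${article.content}"></li></a>
              </ul>
          </div>
          <footer th:replace="footer.html"></footer>
          </body>
          </html>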


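          6 Summary

          Writing code at work and then more code and blog posts after work, the author spent two days studying Elasticsearch, and it turned out to be quite interesting. The foundations of IR are still largely statistical, which is why search engines like Elasticsearch perform well on large data sets. The author often struggles to pick a topic for hands-on posts like this one, so interesting ideas are welcome and will be turned into practical projects.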
