Deep Dive into Loki, the Log Management Powerhouse: From Basics to Hands-On Practice
Contents
1 Loki
1.1 Introduction
1.2 How Loki Works
1.2.1 Log parsing format
1.2.2 Log collection architecture
1.2.3 Loki deployment modes
1.3 Server-Side Deployment
1.3.1 All-in-one deployment mode
1.3.1.1 Kubernetes deployment
1.3.1.2 Creating the ConfigMap
1.3.1.3 Creating persistent storage
1.3.1.4 Creating the application
1.3.1.5 Verifying the deployment
1.3.2 Bare-metal deployment
1.4 Promtail Deployment
1.4.1 Kubernetes deployment
1.4.1.1 Creating the configuration file
1.4.1.2 Creating the DaemonSet manifest
1.4.1.3 Creating the promtail application
1.4.2 Bare-metal deployment
1.5 Data Source
1.6 Other Client Configurations
1.6.1 Logstash as a log collection client
1.7 Helm Installation
1.8 Troubleshooting
1.8.1 502 Bad Gateway
1.8.2 Ingester not ready: instance xx:9095 in state JOINING
1.8.3 too many unhealthy instances in the ring
1.8.4 Data source connected
1 Loki
1.1 Introduction
Loki is a lightweight log collection and analysis application. It uses promtail as the agent to fetch log content and push it to Loki for storage; you then add Loki as a data source in Grafana to display and query the logs.
Official documentation: https://grafana.com/docs/loki/latest/
Loki's persistent storage supports five backends: azure, gcs, s3, swift, and local, of which s3 and local are the most commonly used. It also supports many log collection clients; the most popular ones, such as logstash and fluentbit, are on the officially supported list.
Advantages:
- Broad client support: Promtail, Fluent Bit, Fluentd, Vector, Logstash, and the Grafana Agent
- Promtail, the preferred agent, can extract logs from many sources, including local log files, systemd, Windows event logs, Docker logging drivers, and more
- No log format requirements: JSON, XML, CSV, logfmt, and unstructured text all work
- Query logs with the same syntax used to query metrics
- Dynamically filter and transform log lines at query time
- Easily compute the metrics you need from your logs
- Minimal indexing at ingestion means you can slice and dice logs dynamically at query time, answering new questions as they come up
- Cloud-native support; scrapes data Prometheus-style
A quick comparison of log collection stacks:
| Stack | Components installed | Strengths |
|---|---|---|
| ELK/EFK | elasticsearch, logstash, kibana, filebeat, kafka/redis | Custom grok regexes can parse complex log content; dashboards offer rich visualizations |
| Loki | grafana, loki, promtail | Small resource footprint; native Grafana support; fast queries |
1.2 How Loki Works
1.2.1 Log parsing format
Loki parses logs index-first. The index consists of the timestamp plus a subset of the pod's labels (other labels include filename, containers, and so on); everything else is the log content. In a query such as {app="loki",namespace="kube-public"}, those label matchers are the index.
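As a minimal sketch of this split between index and content, assuming logcli is installed and Loki is reachable at the service address used later in this article:
export LOKI_ADDR=http://loki:3100
# index lookup: select streams by their labels
logcli query '{app="loki",namespace="kube-public"}'
# index lookup plus a line filter over the (non-indexed) log content
logcli query '{app="loki",namespace="kube-public"} |= "error"'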
1.2.2 Log collection architecture
In practice, the official recommendation is to run promtail as the agent, deployed as a DaemonSet on the Kubernetes worker nodes, to collect logs. You can also use any of the other log collectors mentioned above; their configuration is covered at the end of this article.
1.2.3 Loki deployment modes
Loki is built from a number of component microservices, five in all. A cache can be added on top of them to hold data and speed up queries. With the data in shared storage and the memberlist_config section configured to share state between instances, Loki can scale out horizontally without limit.
Once memberlist_config is set up, queries locate data in a round-robin fashion. For convenience, the official build compiles all the microservices into a single binary, controlled by the -target command-line flag, which accepts all, read, and write; pick a mode at deployment time based on your log volume.
- all (read-write mode): after the service starts, a single node serves both queries and writes.
- read/write (separated mode): the query-frontend forwards query traffic to the read nodes. Read nodes keep the querier, ruler, and frontend; write nodes keep the distributor and ingester.
- microservices mode: each process is started in a different role via configuration parameters, and each process runs its target role's service.
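As a hedged sketch of how the -target flag selects a mode (the config file path is the one used in section 1.3; the flag values are the ones named above):
# read-write mode: one process serves both paths
loki -config.file=/etc/loki/loki-all.yaml -target=all
# read path only: querier, ruler, query-frontend
loki -config.file=/etc/loki/loki-all.yaml -target=read
# write path only: distributor, ingester
loki -config.file=/etc/loki/loki-all.yaml -target=write
The table below summarizes the role of each microservice component.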
| Component | Function |
|---|---|
| Distributor | Validates incoming data; sorts data; consistent hashing; QPS limiting; forwarding; data replication to prevent loss |
| Ingester | Timestamp ordering; filesystem support; write-ahead log (WAL) |
| Query frontend | Entry point for query operations, issuing queries to the backing store; query-queueing prevents OOM on large queries; query-split breaks up large queries and aggregates the results |
| Querier | Queries logs from the backing store using the LogQL language |
| Cache | Caches queried logs for later use; if the cached data is incomplete, the missing parts are re-queried |
1.3 Server-Side Deployment
Before deploying, make sure you have a Kubernetes cluster ready.
| Application | Image |
|---|---|
| loki | grafana/loki:2.5.0 |
| promtail | grafana/promtail:2.5.0 |
1.3.1 All-in-one deployment mode
1.3.1.1 Kubernetes deployment
The program downloaded from GitHub ships without a configuration file, so one has to be prepared in advance. A complete all-in-one configuration file is provided here, with parts of it tuned.
The configuration file contents are as follows:
auth_enabled: false
target: all
ballast_bytes: 20480
server:
  grpc_listen_port: 9095
  http_listen_port: 3100
  graceful_shutdown_timeout: 20s
  grpc_listen_address: "0.0.0.0"
  grpc_listen_network: "tcp"
  grpc_server_max_concurrent_streams: 100
  grpc_server_max_recv_msg_size: 4194304
  grpc_server_max_send_msg_size: 4194304
  http_server_idle_timeout: 2m
  http_listen_address: "0.0.0.0"
  http_listen_network: "tcp"
  http_server_read_timeout: 30s
  http_server_write_timeout: 20s
  log_source_ips_enabled: true
  # if http_path_prefix is changed, every log push must include that prefix in the URL
  # http_path_prefix: "/"
  register_instrumentation: true
  log_format: json
  log_level: info
distributor:
  ring:
    heartbeat_timeout: 3s
    kvstore:
      prefix: collectors/
      store: memberlist
      # requires a pre-existing consul cluster
      # consul:
      #   http_client_timeout: 20s
      #   consistent_reads: true
      #   host: 127.0.0.1:8500
      #   watch_burst_size: 2
      #   watch_rate_limit: 2
querier:
  engine:
    max_look_back_period: 20s
    timeout: 3m0s
  extra_query_delay: 100ms
  max_concurrent: 10
  multi_tenant_queries_enabled: true
  query_ingester_only: false
  query_ingesters_within: 3h0m0s
  query_store_only: false
  query_timeout: 5m0s
  tail_max_duration: 1h0s
query_scheduler:
  max_outstanding_requests_per_tenant: 2048
  grpc_client_config:
    max_recv_msg_size: 104857600
    max_send_msg_size: 16777216
    grpc_compression: gzip
    rate_limit: 0
    rate_limit_burst: 0
    backoff_on_ratelimits: false
    backoff_config:
      min_period: 50ms
      max_period: 15s
      max_retries: 5
  use_scheduler_ring: true
  scheduler_ring:
    kvstore:
      store: memberlist
      prefix: "collectors/"
    heartbeat_period: 30s
    heartbeat_timeout: 1m0s
    # defaults to the name of the first network interface
    # instance_interface_names
    # instance_addr: 127.0.0.1
    # defaults to server.grpc-listen-port
    instance_port: 9095
frontend:
  max_outstanding_per_tenant: 4096
  querier_forget_delay: 1h0s
  compress_responses: true
  log_queries_longer_than: 2m0s
  max_body_size: 104857600
  query_stats_enabled: true
  scheduler_dns_lookup_period: 10s
  scheduler_worker_concurrency: 15
query_range:
  align_queries_with_step: true
  cache_results: true
  parallelise_shardable_queries: true
  max_retries: 3
  results_cache:
    cache:
      enable_fifocache: false
      default_validity: 30s
      background:
        writeback_buffer: 10000
      redis:
        endpoint: 127.0.0.1:6379
        timeout: 1s
        expiration: 0s
        db: 9
        pool_size: 128
        password: 1521Qyx6^
        tls_enabled: false
        tls_insecure_skip_verify: true
        idle_timeout: 10s
        max_connection_age: 8h
ruler:
  enable_api: true
  enable_sharding: true
  alertmanager_refresh_interval: 1m
  disable_rule_group_label: false
  evaluation_interval: 1m0s
  flush_period: 3m0s
  for_grace_period: 20m0s
  for_outage_tolerance: 1h0s
  notification_queue_capacity: 10000
  notification_timeout: 4s
  poll_interval: 10m0s
  query_stats_enabled: true
  remote_write:
    config_refresh_period: 10s
    enabled: false
  resend_delay: 2m0s
  rule_path: /rulers
  search_pending_for: 5m0s
  storage:
    local:
      directory: /data/loki/rulers
    type: local
  sharding_strategy: default
  wal_cleaner:
    period: 240h
    min_age: 12h0m0s
  wal:
    dir: /data/loki/ruler_wal
    max_age: 4h0m0s
    min_age: 5m0s
    truncate_frequency: 1h0m0s
  ring:
    kvstore:
      store: memberlist
      prefix: "collectors/"
    heartbeat_period: 5s
    heartbeat_timeout: 1m0s
    # instance_addr: "127.0.0.1"
    # instance_id: "miyamoto.en0"
    # instance_interface_names: ["en0","lo0"]
    instance_port: 9500
    num_tokens: 100
ingester_client:
  pool_config:
    health_check_ingesters: false
    client_cleanup_period: 10s
    remote_timeout: 3s
  remote_timeout: 5s
ingester:
  autoforget_unhealthy: true
  chunk_encoding: gzip
  chunk_target_size: 1572864
  max_transfer_retries: 0
  sync_min_utilization: 3.5
  sync_period: 20s
  flush_check_period: 30s
  flush_op_timeout: 10m0s
  chunk_retain_period: 1m30s
  chunk_block_size: 262144
  chunk_idle_period: 1h0s
  max_returned_stream_errors: 20
  concurrent_flushes: 3
  index_shards: 32
  max_chunk_age: 2h0m0s
  query_store_max_look_back_period: 3h30m30s
  wal:
    enabled: true
    dir: /data/loki/wal
    flush_on_shutdown: true
    checkpoint_duration: 15m
    replay_memory_ceiling: 2GB
  lifecycler:
    ring:
      kvstore:
        store: memberlist
        prefix: "collectors/"
      heartbeat_timeout: 30s
      replication_factor: 1
    num_tokens: 128
    heartbeat_period: 5s
    join_after: 5s
    observe_period: 1m0s
    # interface_names: ["en0","lo0"]
    final_sleep: 10s
    min_ready_duration: 15s
storage_config:
  boltdb:
    directory: /data/loki/boltdb
  boltdb_shipper:
    active_index_directory: /data/loki/active_index
    build_per_tenant_index: true
    cache_location: /data/loki/cache
    cache_ttl: 48h
    resync_interval: 5m
    query_ready_num_days: 5
    index_gateway_client:
      grpc_client_config:
  filesystem:
    directory: /data/loki/chunks
chunk_store_config:
  chunk_cache_config:
    enable_fifocache: true
    default_validity: 30s
    background:
      writeback_buffer: 10000
    redis:
      endpoint: 192.168.3.56:6379
      timeout: 1s
      expiration: 0s
      db: 8
      pool_size: 128
      password: 1521Qyx6^
      tls_enabled: false
      tls_insecure_skip_verify: true
      idle_timeout: 10s
      max_connection_age: 8h
    fifocache:
      ttl: 1h
      validity: 30m0s
      max_size_items: 2000
      max_size_bytes: 500MB
  write_dedupe_cache_config:
    enable_fifocache: true
    default_validity: 30s
    background:
      writeback_buffer: 10000
    redis:
      endpoint: 127.0.0.1:6379
      timeout: 1s
      expiration: 0s
      db: 7
      pool_size: 128
      password: 1521Qyx6^
      tls_enabled: false
      tls_insecure_skip_verify: true
      idle_timeout: 10s
      max_connection_age: 8h
    fifocache:
      ttl: 1h
      validity: 30m0s
      max_size_items: 2000
      max_size_bytes: 500MB
  cache_lookups_older_than: 10s
# compacts index fragments
compactor:
  shared_store: filesystem
  shared_store_key_prefix: index/
  working_directory: /data/loki/compactor
  compaction_interval: 10m0s
  retention_enabled: true
  retention_delete_delay: 2h0m0s
  retention_delete_worker_count: 150
  delete_request_cancel_period: 24h0m0s
  max_compaction_parallelism: 2
  # compactor_ring:
frontend_worker:
  match_max_concurrent: true
  parallelism: 10
  dns_lookup_duration: 5s
# runtime_config: nothing configured here
# runtime_config:
common:
  storage:
    filesystem:
      chunks_directory: /data/loki/chunks
      rules_directory: /data/loki/rulers
  replication_factor: 3
  persist_tokens: false
  # instance_interface_names: ["en0","eth0","ens33"]
analytics:
  reporting_enabled: false
limits_config:
  ingestion_rate_strategy: global
  ingestion_rate_mb: 100
  ingestion_burst_size_mb: 18
  max_label_name_length: 2096
  max_label_value_length: 2048
  max_label_names_per_series: 60
  enforce_metric_name: true
  max_entries_limit_per_query: 5000
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  creation_grace_period: 20m0s
  max_global_streams_per_user: 5000
  unordered_writes: true
  max_chunks_per_query: 200000
  max_query_length: 721h
  max_query_parallelism: 64
  max_query_series: 700
  cardinality_limit: 100000
  max_streams_matchers_per_query: 1000
  max_concurrent_tail_requests: 10
  ruler_evaluation_delay_duration: 3s
  ruler_max_rules_per_rule_group: 0
  ruler_max_rule_groups_per_tenant: 0
  retention_period: 700h
  per_tenant_override_period: 20s
  max_cache_freshness_per_query: 2m0s
  max_queriers_per_tenant: 0
  per_stream_rate_limit: 6MB
  per_stream_rate_limit_burst: 50MB
  max_query_lookback: 0
  ruler_remote_write_disabled: false
  min_sharding_lookback: 0s
  split_queries_by_interval: 10m0s
  max_line_size: 30mb
  max_line_size_truncate: false
  max_streams_per_user: 0
# The memberlist block configures gossip, used to discover and connect
# the distributors, ingesters, and queriers.
# The configuration is shared by all three components, ensuring a single shared ring.
# Once at least one join_members entry is defined, a memberlist-type kvstore is
# automatically configured for the distributor, ingester, and ruler rings.
memberlist:
  randomize_node_name: true
  stream_timeout: 5s
  retransmit_factor: 4
  join_members:
  - 'loki-memberlist'
  abort_if_cluster_join_fails: true
  advertise_addr: 0.0.0.0
  advertise_port: 7946
  bind_addr: ["0.0.0.0"]
  bind_port: 7946
  compression_enabled: true
  dead_node_reclaim_time: 30s
  gossip_interval: 100ms
  gossip_nodes: 3
  gossip_to_dead_nodes_time: 3
  # join:
  leave_timeout: 15s
  left_ingesters_timeout: 3m0s
  max_join_backoff: 1m0s
  max_join_retries: 5
  message_history_buffer_bytes: 4096
  min_join_backoff: 2s
  # node_name: miyamoto
  packet_dial_timeout: 5s
  packet_write_timeout: 5s
  pull_push_interval: 100ms
  rejoin_interval: 10s
  tls_enabled: false
  tls_insecure_skip_verify: true
schema_config:
  configs:
  - from: "2020-10-24"
    index:
      period: 24h
      prefix: index_
    object_store: filesystem
    schema: v11
    store: boltdb-shipper
    chunks:
      period: 168h
    row_shards: 32
table_manager:
  retention_deletes_enabled: false
  retention_period: 0s
  throughput_updates_disabled: false
  poll_interval: 3m0s
  creation_grace_period: 20m
  index_tables_provisioning:
    provisioned_write_throughput: 1000
    provisioned_read_throughput: 500
    inactive_write_throughput: 4
    inactive_read_throughput: 300
    inactive_write_scale_lastn: 50
    enable_inactive_throughput_on_demand_mode: true
    enable_ondemand_throughput_mode: true
    inactive_read_scale_lastn: 10
    write_scale:
      enabled: true
      target: 80
      # role_arn:
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    inactive_write_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    read_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    inactive_read_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
  chunk_tables_provisioning:
    enable_inactive_throughput_on_demand_mode: true
    enable_ondemand_throughput_mode: true
    provisioned_write_throughput: 1000
    provisioned_read_throughput: 300
    inactive_write_throughput: 1
    inactive_write_scale_lastn: 50
    inactive_read_throughput: 300
    inactive_read_scale_lastn: 10
    write_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    inactive_write_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    read_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    inactive_read_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
tracing:
  enabled: true
Notes:
- ingester.lifecycler.ring.replication_factor should be 1 for a single instance.
- ingester.lifecycler.min_ready_duration is 15s: after startup, the instance takes 15 seconds by default before its state turns to ready.
- memberlist.node_name can be left unset; it defaults to the current hostname.
- memberlist.join_members is a list; with multiple instances, add each node's hostname/IP address. In k8s, this can be a Service bound to the StatefulSet.
- query_range.results_cache.cache.enable_fifocache is recommended to be false; I set it to true here.
- instance_interface_names is a list defaulting to ["en0","eth0"]; set it to the appropriate NIC names as needed, though it usually requires no special configuration.
1.3.1.2 Creating the ConfigMap
Write the content above into a file, loki-all.yaml, and load it into the k8s cluster as a ConfigMap. You can create it with the following command:
kubectl create configmap --from-file ./loki-all.yaml loki-all
You can then confirm from the command line that the ConfigMap was created.
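For example (the ConfigMap name loki-all comes from the command above):
kubectl get configmap loki-all
kubectl describe configmap loki-all   # shows the embedded loki-all.yaml key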
1.3.1.3 Creating persistent storage
Data in k8s needs to be persisted. The logs Loki collects are business-critical, so they must survive container restarts.
That calls for a PV and PVC. The backing store can be nfs, glusterfs, hostPath, azureDisk, cephfs, or any of the roughly twenty supported types; lacking a suitable environment here, I used hostPath.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: loki
  namespace: default
spec:
  hostPath:
    path: /glusterfs/loki
    type: DirectoryOrCreate
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteMany
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: loki
  namespace: default
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  volumeName: loki
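A minimal sketch of applying and checking the storage objects, assuming the manifest above was saved as loki-pv-pvc.yaml (hypothetical file name):
kubectl apply -f loki-pv-pvc.yaml
# the PVC should report STATUS "Bound" before the StatefulSet is created
kubectl get pv,pvc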
1.3.1.4 Creating the application
With the k8s StatefulSet manifest ready, the application can be created directly in the cluster.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: loki
  name: loki
  namespace: default
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  selector:
    matchLabels:
      app: loki
  template:
    metadata:
      annotations:
        prometheus.io/port: http-metrics
        prometheus.io/scrape: "true"
      labels:
        app: loki
    spec:
      containers:
      - args:
        - -config.file=/etc/loki/loki-all.yaml
        image: grafana/loki:2.5.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: http-metrics
            scheme: HTTP
          initialDelaySeconds: 45
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: loki
        ports:
        - containerPort: 3100
          name: http-metrics
          protocol: TCP
        - containerPort: 9095
          name: grpc
          protocol: TCP
        - containerPort: 7946
          name: memberlist-port
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: http-metrics
            scheme: HTTP
          initialDelaySeconds: 45
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          requests:
            cpu: 500m
            memory: 500Mi
          limits:
            cpu: 500m
            memory: 500Mi
        securityContext:
          readOnlyRootFilesystem: true
        volumeMounts:
        - mountPath: /etc/loki
          name: config
        - mountPath: /data
          name: storage
      restartPolicy: Always
      securityContext:
        fsGroup: 10001
        runAsGroup: 10001
        runAsNonRoot: true
        runAsUser: 10001
      serviceAccount: loki
      serviceAccountName: loki
      volumes:
      - emptyDir: {}
        name: tmp
      - name: config
        configMap:
          name: loki-all
      - persistentVolumeClaim:
          claimName: loki
        name: storage
---
kind: Service
apiVersion: v1
metadata:
  name: loki-memberlist
  namespace: default
spec:
  ports:
  - name: loki-memberlist
    protocol: TCP
    port: 7946
    targetPort: 7946
  selector:
    app: loki
---
kind: Service
apiVersion: v1
metadata:
  name: loki
  namespace: default
spec:
  ports:
  - name: loki
    protocol: TCP
    port: 3100
    targetPort: 3100
  selector:
    app: loki
In the manifest above I added some pod-level security policies; there are also cluster-level PodSecurityPolicy controls, which help keep a single vulnerability from bringing down the entire cluster.
1.3.1.5 Verifying the deployment
Once the pod shows the Running state, check via the API whether the distributor is working properly; only when it reports Active will log streams be distributed to the ingesters normally.
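A hedged way to run those checks from the command line, assuming kubectl access and the loki Service defined in the manifest above:
kubectl get pods -l app=loki                  # wait for STATUS Running
kubectl port-forward svc/loki 3100:3100 &
curl http://127.0.0.1:3100/ready              # prints "ready" once the instance is up
curl http://127.0.0.1:3100/ring               # ring status page; the instance should show ACTIVE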
1.3.2 Bare-metal deployment
Place the loki binary under the system's /bin/ directory, prepare the grafana-loki.service control file, and reload the system's service list.
[Unit]
Description=Grafana Loki Log Ingester
Documentation=https://grafana.com/logs/
After=network-online.target
[Service]
ExecStart=/bin/loki --config.file /etc/loki/loki-all.yaml
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID
[Install]
WantedBy=multi-user.target
Reload the systemd unit list, after which the system can manage the service directly:
systemctl daemon-reload
# start the service
systemctl start grafana-loki
# stop the service
systemctl stop grafana-loki
# reload the application
systemctl reload grafana-loki
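You will likely also want the service to come up at boot, plus a quick health check; the /ready endpoint is served on the http_listen_port (3100) from the config above:
systemctl enable grafana-loki                 # start at boot
systemctl status grafana-loki
curl http://127.0.0.1:3100/ready              # prints "ready" once startup finishes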
1.4 Promtail Deployment
Deploying the log collection client also requires a configuration file, created following the same steps as for the server side. The difference is that the client needs to push the log content to the server.
1.4.1 Kubernetes deployment
1.4.1.1 Creating the configuration file
server:
  log_level: info
  http_listen_port: 3101
clients:
- url: http://loki:3100/loki/api/v1/push
positions:
  filename: /run/promtail/positions.yaml
scrape_configs:
- job_name: kubernetes-pods
  pipeline_stages:
  - cri: {}
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels:
    - __meta_kubernetes_pod_controller_name
    regex: ([0-9a-z-.]+?)(-[0-9a-f]{8,10})?
    action: replace
    target_label: __tmp_controller_name
  - source_labels:
    - __meta_kubernetes_pod_label_app_kubernetes_io_name
    - __meta_kubernetes_pod_label_app
    - __tmp_controller_name
    - __meta_kubernetes_pod_name
    regex: ^;*([^;]+)(;.*)?$
    action: replace
    target_label: app
  - source_labels:
    - __meta_kubernetes_pod_label_app_kubernetes_io_instance
    - __meta_kubernetes_pod_label_release
    regex: ^;*([^;]+)(;.*)?$
    action: replace
    target_label: instance
  - source_labels:
    - __meta_kubernetes_pod_label_app_kubernetes_io_component
    - __meta_kubernetes_pod_label_component
    regex: ^;*([^;]+)(;.*)?$
    action: replace
    target_label: component
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_node_name
    target_label: node_name
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - action: replace
    replacement: $1
    separator: /
    source_labels:
    - namespace
    - app
    target_label: job
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_name
    target_label: pod
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_container_name
    target_label: container
  - action: replace
    replacement: /var/log/pods/*$1/*.log
    separator: /
    source_labels:
    - __meta_kubernetes_pod_uid
    - __meta_kubernetes_pod_container_name
    target_label: __path__
  - action: replace
    regex: true/(.*)
    replacement: /var/log/pods/*$1/*.log
    separator: /
    source_labels:
    - __meta_kubernetes_pod_annotationpresent_kubernetes_io_config_hash
    - __meta_kubernetes_pod_annotation_kubernetes_io_config_hash
    - __meta_kubernetes_pod_container_name
    target_label: __path__
Create a ConfigMap from the content above, using the same method as before.
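For example, assuming the configuration above was saved as promtail.yaml; the key name must match the -config.file path (/etc/promtail/promtail.yaml) used by the DaemonSet below:
kubectl create configmap promtail --from-file=promtail.yaml=./promtail.yaml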
1.4.1.2 Creating the DaemonSet manifest
Promtail is a stateless application and needs no persistent storage; it only has to be deployed into the cluster. As before, prepare the DaemonSet manifest.
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: promtail
  namespace: default
  labels:
    app.kubernetes.io/instance: promtail
    app.kubernetes.io/name: promtail
    app.kubernetes.io/version: 2.5.0
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: promtail
      app.kubernetes.io/name: promtail
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: promtail
        app.kubernetes.io/name: promtail
    spec:
      volumes:
      - name: config
        configMap:
          name: promtail
      - name: run
        hostPath:
          path: /run/promtail
      - name: containers
        hostPath:
          path: /var/lib/docker/containers
      - name: pods
        hostPath:
          path: /var/log/pods
      containers:
      - name: promtail
        image: docker.io/grafana/promtail:2.5.0
        args:
        - '-config.file=/etc/promtail/promtail.yaml'
        ports:
        - name: http-metrics
          containerPort: 3101
          protocol: TCP
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        volumeMounts:
        - name: config
          mountPath: /etc/promtail
        - name: run
          mountPath: /run/promtail
        - name: containers
          readOnly: true
          mountPath: /var/lib/docker/containers
        - name: pods
          readOnly: true
          mountPath: /var/log/pods
        readinessProbe:
          httpGet:
            path: /ready
            port: http-metrics
            scheme: HTTP
          initialDelaySeconds: 10
          timeoutSeconds: 1
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 5
        imagePullPolicy: IfNotPresent
        securityContext:
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: false
          allowPrivilegeEscalation: false
      restartPolicy: Always
      serviceAccountName: promtail
      serviceAccount: promtail
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
1.4.1.3 Creating the promtail application
kubectl apply -f promtail.yaml
After running the command above, the service is created. Next, add the DataSource in Grafana and look at the data.
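A few hedged checks that the DaemonSet is healthy, using the labels from the manifest above:
kubectl get daemonset promtail
kubectl get pods -l app.kubernetes.io/name=promtail
kubectl logs daemonset/promtail --tail=20     # look for push errors against the loki URL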
1.4.2 Bare-metal deployment
For a bare-metal deployment, the configuration file above needs only a small change: update the clients address. Store the file under /etc/loki/, for example:
clients:
- url: http://ipaddress:port/loki/api/v1/push
Add a start-on-boot systemd unit; place the service file at /usr/lib/systemd/system/loki-promtail.service with the following content:
[Unit]
Description=Grafana Loki Log Ingester
Documentation=https://grafana.com/logs/
After=network-online.target
[Service]
ExecStart=/bin/promtail --config.file /etc/loki/loki-promtail.yaml
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID
[Install]
WantedBy=multi-user.target
Start it the same way as in the server-side deployment section above.
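For a quick sanity check (the HTTP port 3101 comes from the server block of the promtail config above):
systemctl daemon-reload
systemctl start loki-promtail
systemctl enable loki-promtail
curl http://127.0.0.1:3101/targets            # promtail's targets page; shows the discovered log files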
1.5 Data Source
Add the data source in Grafana: Grafana -> Setting -> DataSources -> AddDataSource -> Loki
Note the HTTP URL: whichever namespace the application or service is deployed in, you must use its FQDN, in the format ServiceName.namespace. If it is in the default namespace with port 3100, fill in http://loki:3100. The reason for writing the service name rather than an IP address is that the DNS server inside the k8s cluster resolves this address automatically.
You can then search the log data from Grafana.
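To verify the connection outside Grafana, you can query Loki's label API directly; a minimal sketch, run from any pod or host that can resolve the Service name:
curl http://loki:3100/loki/api/v1/labels
# a healthy setup returns JSON such as {"status":"success","data":["app","namespace",...]}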
1.6 Other Client Configurations
1.6.1 Logstash as a log collection client
After starting Logstash, install the Loki output plugin with the command below; once it is installed, add the loki section to logstash's output.
bin/logstash-plugin install logstash-output-loki
Add the configuration and test it.
The complete set of logstash options is documented in the official LogstashConfigFile reference:
output {
  loki {
    [url => "" | default = none | required=true]
    [tenant_id => string | default = nil | required=false]
    [message_field => string | default = "message" | required=false]
    [include_fields => array | default = [] | required=false]
    [batch_wait => number | default = 1(s) | required=false]
    [batch_size => number | default = 102400(bytes) | required=false]
    [min_delay => number | default = 1(s) | required=false]
    [max_delay => number | default = 300(s) | required=false]
    [retries => number | default = 10 | required=false]
    [username => string | default = nil | required=false]
    [password => secret | default = nil | required=false]
    [cert => path | default = nil | required=false]
    [key => path | default = nil| required=false]
    [ca_cert => path | default = nil | required=false]
    [insecure_skip_verify => boolean | default = false | required=false]
  }
}
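As a concrete, hedged instance of the option schema above (only url is required; the other values simply restate the listed defaults, and the loki address is the Service used throughout this article):
output {
  loki {
    url => "http://loki:3100/loki/api/v1/push"
    batch_wait => 1
    batch_size => 102400
    retries => 10
  }
}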
Alternatively, use logstash's http output module, configured as follows:
output {
  http {
    format => "json"
    http_method => "post"
    content_type => "application/json"
    connect_timeout => 10
    url => "http://loki:3100/loki/api/v1/push"
    message => '{"message":"%{message}"}'
  }
}
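Note that Loki's push endpoint expects a specific JSON shape, {"streams":[{"stream":{...},"values":[[timestamp,line],...]}]}, so the message body above may need adjusting to match. A hedged smoke test of the endpoint with curl (timestamp in nanoseconds since the epoch):
curl -s -X POST http://loki:3100/loki/api/v1/push \
  -H 'Content-Type: application/json' \
  -d '{"streams":[{"stream":{"job":"curl-test"},"values":[["'"$(date +%s%N)"'","hello from curl"]]}]}'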
1.7 Helm Installation
If you want a simpler install, use Helm. Helm wraps all of the installation steps into a chart, simplifying deployment.
For people who want to understand k8s in detail, Helm is less suitable: since it runs automatically once packaged, the k8s administrator does not see how the components depend on one another, which can lead to blind spots.
Without further ado, here is the Helm installation.
Add the repo source:
helm repo add grafana https://grafana.github.io/helm-charts
Update the sources:
helm repo update
Deploy with the default configuration:
helm upgrade --install loki grafana/loki-simple-scalable
Or with a custom namespace:
helm upgrade --install loki --namespace=loki grafana/loki-simple-scalable
Or with custom settings:
helm upgrade --install loki grafana/loki-simple-scalable --set "key1=val1,key2=val2,..."
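To confirm the release came up, a couple of follow-up checks (use the namespace from the variant you chose):
helm list -n loki
kubectl get pods -n loki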
1.8 Troubleshooting
1.8.1 502 Bad Gateway
The loki address is filled in incorrectly.
In k8s, a wrong address causes the 502. Check whether the loki address takes one of the following forms:
http://LokiServiceName
http://LokiServiceName.namespace
http://LokiServiceName.namespace:ServicePort
If grafana and loki are on different nodes, also check inter-node network connectivity and firewall policies.
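A hedged way to test name resolution and reachability from inside the cluster; the busybox image tag and the grafana Deployment name are assumptions, so adjust them to your environment:
kubectl run dns-test --rm -it --image=busybox:1.35 --restart=Never -- nslookup loki.default
kubectl exec deploy/grafana -- wget -qO- http://loki.default:3100/ready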
1.8.2 Ingester not ready: instance xx:9095 in state JOINING
Wait patiently for a while: in allInOne mode the program needs some time to start up.
1.8.3 too many unhealthy instances in the ring
Set ingester.lifecycler.replication_factor to 1; this error comes from that setting being wrong. It declares multiple replication targets at startup, but only one instance is actually deployed, so this message appears when labels are queried.
1.8.4 Data source connected
Data source connected, but no labels received. Verify that Loki and Promtail is configured properly
Possible causes:
- promtail cannot deliver the collected logs to loki; check whether promtail's output is healthy
- promtail sent logs before loki was ready, so loki never received them; to re-ingest the logs, delete the positions.yaml file (use find to locate it, see the sketch below)
- promtail is ignoring the target log files, or cannot start properly because of a configuration error
- promtail cannot find the log files at the specified location
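For the positions-file case, a sketch of the bare-metal fix (the path comes from positions.filename in your promtail config; /run/promtail in the example above):
find / -name positions.yaml 2>/dev/null       # locate the positions file
rm /run/promtail/positions.yaml
systemctl restart loki-promtail               # promtail re-reads the logs from scratch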
Source: https://www.cnblogs.com/jingzh/p/17998082