Deep Dive into Loki, the Log Management Powerhouse: From Basics to Hands-On Practice
Contents
1 Loki
1.1 Introduction
1.2 How Loki Works
1.2.1 Log parsing format
1.2.2 Log collection architecture
1.2.3 Loki deployment modes
1.3 Server-Side Deployment
1.3.1 All-in-one deployment mode
1.3.1.1 Kubernetes deployment
1.3.1.2 Creating the ConfigMap
1.3.1.3 Creating persistent storage
1.3.1.4 Creating the application
1.3.1.5 Verifying the deployment
1.3.2 Bare-metal deployment
1.4 Promtail Deployment
1.4.1 Kubernetes deployment
1.4.1.1 Creating the configuration file
1.4.1.2 Creating the DaemonSet manifest
1.4.1.3 Creating the promtail application
1.4.2 Bare-metal deployment
1.5 Data Source
1.6 Other Client Configurations
1.6.1 Logstash as a log collection client
1.7 Helm Installation
1.8 Troubleshooting
1.8.1 502 Bad Gateway
1.8.2 Ingester not ready: instance xx:9095 in state JOINING
1.8.3 too many unhealthy instances in the ring
1.8.4 Data source connected
1 Loki
1.1 Introduction
Loki is a lightweight log collection and analysis application. It uses promtail as the agent to fetch log content and push it to Loki for storage; you then add Loki as a data source in Grafana to display and query the logs.
Official documentation: https://grafana.com/docs/loki/latest/
Loki's persistent storage supports five backends: azure, gcs, s3, swift, and local, of which s3 and local are the most commonly used. It also supports many log collection clients; the most popular ones, such as logstash and fluentbit, are on the officially supported list.
Advantages:
- Broad client support: Promtail, Fluent Bit, Fluentd, Vector, Logstash, and the Grafana Agent
- Promtail, the preferred agent, can extract logs from many sources, including local log files, systemd, Windows event logs, Docker logging drivers, and more
- No log format requirements: JSON, XML, CSV, logfmt, and unstructured text all work
- Query logs with the same syntax used to query metrics
- Dynamically filter and transform log lines at query time
- Easily compute the metrics you need from your logs
- Minimal indexing at ingestion means you can slice and dice logs dynamically at query time, answering new questions as they come up
- Cloud-native support; scrapes data Prometheus-style
A quick comparison of log collection stacks:
| Stack | Components installed | Strengths |
|---|---|---|
| ELK/EFK | elasticsearch, logstash, kibana, filebeat, kafka/redis | Custom grok regexes can parse complex log content; dashboards offer rich visualizations |
| Loki | grafana, loki, promtail | Small resource footprint; native Grafana support; fast queries |
1.2 How Loki Works
1.2.1 Log parsing format
Loki parses logs index-first. The index consists of the timestamp plus a subset of the pod's labels (other labels include filename, containers, and so on); everything else is the log content. In a query such as {app="loki",namespace="kube-public"}, those label matchers are the index.
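As a minimal sketch of this split between index and content, assuming logcli is installed and Loki is reachable at the service address used later in this article:
export LOKI_ADDR=http://loki:3100
# index lookup: select streams by their labels
logcli query '{app="loki",namespace="kube-public"}'
# index lookup plus a line filter over the (non-indexed) log content
logcli query '{app="loki",namespace="kube-public"} |= "error"'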
1.2.2 Log collection architecture
In practice, the official recommendation is to run promtail as the agent, deployed as a DaemonSet on the Kubernetes worker nodes, to collect logs. You can also use any of the other log collectors mentioned above; their configuration is covered at the end of this article.
1.2.3 Loki deployment modes
Loki is built from a number of component microservices, five in all. A cache can be added on top of them to hold data and speed up queries. With the data in shared storage and the memberlist_config section configured to share state between instances, Loki can scale out horizontally without limit.
Once memberlist_config is set up, queries locate data in a round-robin fashion. For convenience, the official build compiles all the microservices into a single binary, controlled by the -target command-line flag, which accepts all, read, and write; pick a mode at deployment time based on your log volume.
- all (read-write mode): after the service starts, a single node serves both queries and writes.
- read/write (separated mode): the query-frontend forwards query traffic to the read nodes. Read nodes keep the querier, ruler, and frontend; write nodes keep the distributor and ingester.
- microservices mode: each process is started in a different role via configuration parameters, and each process runs its target role's service.
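As a hedged sketch of how the -target flag selects a mode (the config file path is the one used in section 1.3; the flag values are the ones named above):
# read-write mode: one process serves both paths
loki -config.file=/etc/loki/loki-all.yaml -target=all
# read path only: querier, ruler, query-frontend
loki -config.file=/etc/loki/loki-all.yaml -target=read
# write path only: distributor, ingester
loki -config.file=/etc/loki/loki-all.yaml -target=write
The table below summarizes the role of each microservice component.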
| Component | Function |
|---|---|
| Distributor | Validates incoming data; sorts data; consistent hashing; QPS limiting; forwarding; data replication to prevent loss |
| Ingester | Timestamp ordering; filesystem support; write-ahead log (WAL) |
| Query frontend | Entry point for query operations, issuing queries to the backing store; query-queueing prevents OOM on large queries; query-split breaks up large queries and aggregates the results |
| Querier | Queries logs from the backing store using the LogQL language |
| Cache | Caches queried logs for later use; if the cached data is incomplete, the missing parts are re-queried |
1.3 Server-Side Deployment
Before deploying, make sure you have a Kubernetes cluster ready.
| Application | Image |
|---|---|
| loki | grafana/loki:2.5.0 |
| promtail | grafana/promtail:2.5.0 |
1.3.1 All-in-one deployment mode
1.3.1.1 Kubernetes deployment
The program downloaded from GitHub ships without a configuration file, so one has to be prepared in advance. A complete all-in-one configuration file is provided here, with parts of it tuned.
The configuration file contents are as follows:
auth_enabled: false
target: all
ballast_bytes: 20480
server:
  grpc_listen_port: 9095
  http_listen_port: 3100
  graceful_shutdown_timeout: 20s
  grpc_listen_address: "0.0.0.0"
  grpc_listen_network: "tcp"
  grpc_server_max_concurrent_streams: 100
  grpc_server_max_recv_msg_size: 4194304
  grpc_server_max_send_msg_size: 4194304
  http_server_idle_timeout: 2m
  http_listen_address: "0.0.0.0"
  http_listen_network: "tcp"
  http_server_read_timeout: 30s
  http_server_write_timeout: 20s
  log_source_ips_enabled: true
  # if http_path_prefix is changed, every log push must include that prefix in the URL
  # http_path_prefix: "/"
  register_instrumentation: true
  log_format: json
  log_level: info
distributor:
  ring:
    heartbeat_timeout: 3s
    kvstore:
      prefix: collectors/
      store: memberlist
      # requires a pre-existing consul cluster
      # consul:
      #   http_client_timeout: 20s
      #   consistent_reads: true
      #   host: 127.0.0.1:8500
      #   watch_burst_size: 2
      #   watch_rate_limit: 2
querier:
  engine:
    max_look_back_period: 20s
    timeout: 3m0s
  extra_query_delay: 100ms
  max_concurrent: 10
  multi_tenant_queries_enabled: true
  query_ingester_only: false
  query_ingesters_within: 3h0m0s
  query_store_only: false
  query_timeout: 5m0s
  tail_max_duration: 1h0s
query_scheduler:
  max_outstanding_requests_per_tenant: 2048
  grpc_client_config:
    max_recv_msg_size: 104857600
    max_send_msg_size: 16777216
    grpc_compression: gzip
    rate_limit: 0
    rate_limit_burst: 0
    backoff_on_ratelimits: false
    backoff_config:
      min_period: 50ms
      max_period: 15s
      max_retries: 5
  use_scheduler_ring: true
  scheduler_ring:
    kvstore:
      store: memberlist
      prefix: "collectors/"
    heartbeat_period: 30s
    heartbeat_timeout: 1m0s
    # defaults to the name of the first network interface
    # instance_interface_names
    # instance_addr: 127.0.0.1
    # defaults to server.grpc-listen-port
    instance_port: 9095
frontend:
  max_outstanding_per_tenant: 4096
  querier_forget_delay: 1h0s
  compress_responses: true
  log_queries_longer_than: 2m0s
  max_body_size: 104857600
  query_stats_enabled: true
  scheduler_dns_lookup_period: 10s
  scheduler_worker_concurrency: 15
query_range:
  align_queries_with_step: true
  cache_results: true
  parallelise_shardable_queries: true
  max_retries: 3
  results_cache:
    cache:
      enable_fifocache: false
      default_validity: 30s
      background:
        writeback_buffer: 10000
      redis:
        endpoint: 127.0.0.1:6379
        timeout: 1s
        expiration: 0s
        db: 9
        pool_size: 128
        password: 1521Qyx6^
        tls_enabled: false
        tls_insecure_skip_verify: true
        idle_timeout: 10s
        max_connection_age: 8h
ruler:
  enable_api: true
  enable_sharding: true
  alertmanager_refresh_interval: 1m
  disable_rule_group_label: false
  evaluation_interval: 1m0s
  flush_period: 3m0s
  for_grace_period: 20m0s
  for_outage_tolerance: 1h0s
  notification_queue_capacity: 10000
  notification_timeout: 4s
  poll_interval: 10m0s
  query_stats_enabled: true
  remote_write:
    config_refresh_period: 10s
    enabled: false
  resend_delay: 2m0s
  rule_path: /rulers
  search_pending_for: 5m0s
  storage:
    local:
      directory: /data/loki/rulers
    type: local
  sharding_strategy: default
  wal_cleaner:
    period: 240h
    min_age: 12h0m0s
  wal:
    dir: /data/loki/ruler_wal
    max_age: 4h0m0s
    min_age: 5m0s
    truncate_frequency: 1h0m0s
  ring:
    kvstore:
      store: memberlist
      prefix: "collectors/"
    heartbeat_period: 5s
    heartbeat_timeout: 1m0s
    # instance_addr: "127.0.0.1"
    # instance_id: "miyamoto.en0"
    # instance_interface_names: ["en0","lo0"]
    instance_port: 9500
    num_tokens: 100
ingester_client:
  pool_config:
    health_check_ingesters: false
    client_cleanup_period: 10s
    remote_timeout: 3s
  remote_timeout: 5s
ingester:
  autoforget_unhealthy: true
  chunk_encoding: gzip
  chunk_target_size: 1572864
  max_transfer_retries: 0
  sync_min_utilization: 3.5
  sync_period: 20s
  flush_check_period: 30s
  flush_op_timeout: 10m0s
  chunk_retain_period: 1m30s
  chunk_block_size: 262144
  chunk_idle_period: 1h0s
  max_returned_stream_errors: 20
  concurrent_flushes: 3
  index_shards: 32
  max_chunk_age: 2h0m0s
  query_store_max_look_back_period: 3h30m30s
  wal:
    enabled: true
    dir: /data/loki/wal
    flush_on_shutdown: true
    checkpoint_duration: 15m
    replay_memory_ceiling: 2GB
  lifecycler:
    ring:
      kvstore:
        store: memberlist
        prefix: "collectors/"
      heartbeat_timeout: 30s
      replication_factor: 1
    num_tokens: 128
    heartbeat_period: 5s
    join_after: 5s
    observe_period: 1m0s
    # interface_names: ["en0","lo0"]
    final_sleep: 10s
    min_ready_duration: 15s
storage_config:
  boltdb:
    directory: /data/loki/boltdb
  boltdb_shipper:
    active_index_directory: /data/loki/active_index
    build_per_tenant_index: true
    cache_location: /data/loki/cache
    cache_ttl: 48h
    resync_interval: 5m
    query_ready_num_days: 5
    index_gateway_client:
      grpc_client_config:
  filesystem:
    directory: /data/loki/chunks
chunk_store_config:
  chunk_cache_config:
    enable_fifocache: true
    default_validity: 30s
    background:
      writeback_buffer: 10000
    redis:
      endpoint: 192.168.3.56:6379
      timeout: 1s
      expiration: 0s
      db: 8
      pool_size: 128
      password: 1521Qyx6^
      tls_enabled: false
      tls_insecure_skip_verify: true
      idle_timeout: 10s
      max_connection_age: 8h
    fifocache:
      ttl: 1h
      validity: 30m0s
      max_size_items: 2000
      max_size_bytes: 500MB
  write_dedupe_cache_config:
    enable_fifocache: true
    default_validity: 30s
    background:
      writeback_buffer: 10000
    redis:
      endpoint: 127.0.0.1:6379
      timeout: 1s
      expiration: 0s
      db: 7
      pool_size: 128
      password: 1521Qyx6^
      tls_enabled: false
      tls_insecure_skip_verify: true
      idle_timeout: 10s
      max_connection_age: 8h
    fifocache:
      ttl: 1h
      validity: 30m0s
      max_size_items: 2000
      max_size_bytes: 500MB
  cache_lookups_older_than: 10s
# compacts index fragments
compactor:
  shared_store: filesystem
  shared_store_key_prefix: index/
  working_directory: /data/loki/compactor
  compaction_interval: 10m0s
  retention_enabled: true
  retention_delete_delay: 2h0m0s
  retention_delete_worker_count: 150
  delete_request_cancel_period: 24h0m0s
  max_compaction_parallelism: 2
  # compactor_ring:
frontend_worker:
  match_max_concurrent: true
  parallelism: 10
  dns_lookup_duration: 5s
# runtime_config: nothing configured here
# runtime_config:
common:
  storage:
    filesystem:
      chunks_directory: /data/loki/chunks
      rules_directory: /data/loki/rulers
  replication_factor: 3
  persist_tokens: false
  # instance_interface_names: ["en0","eth0","ens33"]
analytics:
  reporting_enabled: false
limits_config:
  ingestion_rate_strategy: global
  ingestion_rate_mb: 100
  ingestion_burst_size_mb: 18
  max_label_name_length: 2096
  max_label_value_length: 2048
  max_label_names_per_series: 60
  enforce_metric_name: true
  max_entries_limit_per_query: 5000
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  creation_grace_period: 20m0s
  max_global_streams_per_user: 5000
  unordered_writes: true
  max_chunks_per_query: 200000
  max_query_length: 721h
  max_query_parallelism: 64
  max_query_series: 700
  cardinality_limit: 100000
  max_streams_matchers_per_query: 1000
  max_concurrent_tail_requests: 10
  ruler_evaluation_delay_duration: 3s
  ruler_max_rules_per_rule_group: 0
  ruler_max_rule_groups_per_tenant: 0
  retention_period: 700h
  per_tenant_override_period: 20s
  max_cache_freshness_per_query: 2m0s
  max_queriers_per_tenant: 0
  per_stream_rate_limit: 6MB
  per_stream_rate_limit_burst: 50MB
  max_query_lookback: 0
  ruler_remote_write_disabled: false
  min_sharding_lookback: 0s
  split_queries_by_interval: 10m0s
  max_line_size: 30mb
  max_line_size_truncate: false
  max_streams_per_user: 0
# The memberlist block configures gossip, used to discover and connect
# the distributors, ingesters, and queriers.
# The configuration is shared by all three components, ensuring a single shared ring.
# Once at least one join_members entry is defined, a memberlist-type kvstore is
# automatically configured for the distributor, ingester, and ruler rings.
memberlist:
  randomize_node_name: true
  stream_timeout: 5s
  retransmit_factor: 4
  join_members:
  - 'loki-memberlist'
  abort_if_cluster_join_fails: true
  advertise_addr: 0.0.0.0
  advertise_port: 7946
  bind_addr: ["0.0.0.0"]
  bind_port: 7946
  compression_enabled: true
  dead_node_reclaim_time: 30s
  gossip_interval: 100ms
  gossip_nodes: 3
  gossip_to_dead_nodes_time: 3
  # join:
  leave_timeout: 15s
  left_ingesters_timeout: 3m0s
  max_join_backoff: 1m0s
  max_join_retries: 5
  message_history_buffer_bytes: 4096
  min_join_backoff: 2s
  # node_name: miyamoto
  packet_dial_timeout: 5s
  packet_write_timeout: 5s
  pull_push_interval: 100ms
  rejoin_interval: 10s
  tls_enabled: false
  tls_insecure_skip_verify: true
schema_config:
  configs:
  - from: "2020-10-24"
    index:
      period: 24h
      prefix: index_
    object_store: filesystem
    schema: v11
    store: boltdb-shipper
    chunks:
      period: 168h
    row_shards: 32
table_manager:
  retention_deletes_enabled: false
  retention_period: 0s
  throughput_updates_disabled: false
  poll_interval: 3m0s
  creation_grace_period: 20m
  index_tables_provisioning:
    provisioned_write_throughput: 1000
    provisioned_read_throughput: 500
    inactive_write_throughput: 4
    inactive_read_throughput: 300
    inactive_write_scale_lastn: 50
    enable_inactive_throughput_on_demand_mode: true
    enable_ondemand_throughput_mode: true
    inactive_read_scale_lastn: 10
    write_scale:
      enabled: true
      target: 80
      # role_arn:
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    inactive_write_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    read_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    inactive_read_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
  chunk_tables_provisioning:
    enable_inactive_throughput_on_demand_mode: true
    enable_ondemand_throughput_mode: true
    provisioned_write_throughput: 1000
    provisioned_read_throughput: 300
    inactive_write_throughput: 1
    inactive_write_scale_lastn: 50
    inactive_read_throughput: 300
    inactive_read_scale_lastn: 10
    write_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    inactive_write_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    read_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    inactive_read_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
tracing:
  enabled: true
Notes:
- ingester.lifecycler.ring.replication_factor should be 1 for a single instance.
- ingester.lifecycler.min_ready_duration is 15s: after startup, the instance takes 15 seconds by default before its state turns to ready.
- memberlist.node_name can be left unset; it defaults to the current hostname.
- memberlist.join_members is a list; with multiple instances, add each node's hostname/IP address. In k8s, this can be a Service bound to the StatefulSet.
- query_range.results_cache.cache.enable_fifocache is recommended to be false; I set it to true here.
- instance_interface_names is a list defaulting to ["en0","eth0"]; set it to the appropriate NIC names as needed, though it usually requires no special configuration.
1.3.1.2 Creating the ConfigMap
Write the content above into a file, loki-all.yaml, and load it into the k8s cluster as a ConfigMap. You can create it with the following command:
kubectl create configmap --from-file ./loki-all.yaml loki-all
You can then confirm from the command line that the ConfigMap was created.
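For example (the ConfigMap name loki-all comes from the command above):
kubectl get configmap loki-all
kubectl describe configmap loki-all   # shows the embedded loki-all.yaml key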
1.3.1.3 Creating persistent storage
Data in k8s needs to be persisted. The logs Loki collects are business-critical, so they must survive container restarts.
That calls for a PV and PVC. The backing store can be nfs, glusterfs, hostPath, azureDisk, cephfs, or any of the roughly twenty supported types; lacking a suitable environment here, I used hostPath.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: loki
  namespace: default
spec:
  hostPath:
    path: /glusterfs/loki
    type: DirectoryOrCreate
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteMany
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: loki
  namespace: default
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  volumeName: loki
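A minimal sketch of applying and checking the storage objects, assuming the manifest above was saved as loki-pv-pvc.yaml (hypothetical file name):
kubectl apply -f loki-pv-pvc.yaml
# the PVC should report STATUS "Bound" before the StatefulSet is created
kubectl get pv,pvc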
1.3.1.4 Creating the application
With the k8s StatefulSet manifest ready, the application can be created directly in the cluster.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: loki
  name: loki
  namespace: default
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  selector:
    matchLabels:
      app: loki
  template:
    metadata:
      annotations:
        prometheus.io/port: http-metrics
        prometheus.io/scrape: "true"
      labels:
        app: loki
    spec:
      containers:
      - args:
        - -config.file=/etc/loki/loki-all.yaml
        image: grafana/loki:2.5.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: http-metrics
            scheme: HTTP
          initialDelaySeconds: 45
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: loki
        ports:
        - containerPort: 3100
          name: http-metrics
          protocol: TCP
        - containerPort: 9095
          name: grpc
          protocol: TCP
        - containerPort: 7946
          name: memberlist-port
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: http-metrics
            scheme: HTTP
          initialDelaySeconds: 45
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          requests:
            cpu: 500m
            memory: 500Mi
          limits:
            cpu: 500m
            memory: 500Mi
        securityContext:
          readOnlyRootFilesystem: true
        volumeMounts:
        - mountPath: /etc/loki
          name: config
        - mountPath: /data
          name: storage
      restartPolicy: Always
      securityContext:
        fsGroup: 10001
        runAsGroup: 10001
        runAsNonRoot: true
        runAsUser: 10001
      serviceAccount: loki
      serviceAccountName: loki
      volumes:
      - emptyDir: {}
        name: tmp
      - name: config
        configMap:
          name: loki-all
      - persistentVolumeClaim:
          claimName: loki
        name: storage
---
kind: Service
apiVersion: v1
metadata:
  name: loki-memberlist
  namespace: default
spec:
  ports:
  - name: loki-memberlist
    protocol: TCP
    port: 7946
    targetPort: 7946
  selector:
    app: loki
---
kind: Service
apiVersion: v1
metadata:
  name: loki
  namespace: default
spec:
  ports:
  - name: loki
    protocol: TCP
    port: 3100
    targetPort: 3100
  selector:
    app: loki
In the manifest above I added some pod-level security policies; there are also cluster-level PodSecurityPolicy controls, which help keep a single vulnerability from bringing down the entire cluster.
1.3.1.5 Verifying the deployment
Once the pod shows the Running state, check via the API whether the distributor is working properly; only when it reports Active will log streams be distributed to the ingesters normally.
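A hedged way to run those checks from the command line, assuming kubectl access and the loki Service defined in the manifest above:
kubectl get pods -l app=loki                  # wait for STATUS Running
kubectl port-forward svc/loki 3100:3100 &
curl http://127.0.0.1:3100/ready              # prints "ready" once the instance is up
curl http://127.0.0.1:3100/ring               # ring status page; the instance should show ACTIVE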
1.3.2 Bare-metal deployment
Place the loki binary under the system's /bin/ directory, prepare the grafana-loki.service control file, and reload the system's service list.
[Unit]
Description=Grafana Loki Log Ingester
Documentation=https://grafana.com/logs/
After=network-online.target
[Service]
ExecStart=/bin/loki --config.file /etc/loki/loki-all.yaml
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID
[Install]
WantedBy=multi-user.target
Reload the systemd unit list, after which the system can manage the service directly:
systemctl daemon-reload
# start the service
systemctl start grafana-loki
# stop the service
systemctl stop grafana-loki
# reload the application
systemctl reload grafana-loki
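You will likely also want the service to come up at boot, plus a quick health check; the /ready endpoint is served on the http_listen_port (3100) from the config above:
systemctl enable grafana-loki                 # start at boot
systemctl status grafana-loki
curl http://127.0.0.1:3100/ready              # prints "ready" once startup finishes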
1.4 Promtail Deployment
Deploying the log collection client also requires a configuration file, created following the same steps as for the server side. The difference is that the client needs to push the log content to the server.
1.4.1 Kubernetes deployment
1.4.1.1 Creating the configuration file
server:
  log_level: info
  http_listen_port: 3101
clients:
- url: http://loki:3100/loki/api/v1/push
positions:
  filename: /run/promtail/positions.yaml
scrape_configs:
- job_name: kubernetes-pods
  pipeline_stages:
  - cri: {}
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels:
    - __meta_kubernetes_pod_controller_name
    regex: ([0-9a-z-.]+?)(-[0-9a-f]{8,10})?
    action: replace
    target_label: __tmp_controller_name
  - source_labels:
    - __meta_kubernetes_pod_label_app_kubernetes_io_name
    - __meta_kubernetes_pod_label_app
    - __tmp_controller_name
    - __meta_kubernetes_pod_name
    regex: ^;*([^;]+)(;.*)?$
    action: replace
    target_label: app
  - source_labels:
    - __meta_kubernetes_pod_label_app_kubernetes_io_instance
    - __meta_kubernetes_pod_label_release
    regex: ^;*([^;]+)(;.*)?$
    action: replace
    target_label: instance
  - source_labels:
    - __meta_kubernetes_pod_label_app_kubernetes_io_component
    - __meta_kubernetes_pod_label_component
    regex: ^;*([^;]+)(;.*)?$
    action: replace
    target_label: component
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_node_name
    target_label: node_name
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - action: replace
    replacement: $1
    separator: /
    source_labels:
    - namespace
    - app
    target_label: job
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_name
    target_label: pod
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_container_name
    target_label: container
  - action: replace
    replacement: /var/log/pods/*$1/*.log
    separator: /
    source_labels:
    - __meta_kubernetes_pod_uid
    - __meta_kubernetes_pod_container_name
    target_label: __path__
  - action: replace
    regex: true/(.*)
    replacement: /var/log/pods/*$1/*.log
    separator: /
    source_labels:
    - __meta_kubernetes_pod_annotationpresent_kubernetes_io_config_hash
    - __meta_kubernetes_pod_annotation_kubernetes_io_config_hash
    - __meta_kubernetes_pod_container_name
    target_label: __path__
Create a ConfigMap from the content above, using the same method as before.
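For example, assuming the configuration above was saved as promtail.yaml; the key name must match the -config.file path (/etc/promtail/promtail.yaml) used by the DaemonSet below:
kubectl create configmap promtail --from-file=promtail.yaml=./promtail.yaml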
1.4.1.2 Creating the DaemonSet manifest
Promtail is a stateless application and needs no persistent storage; it only has to be deployed into the cluster. As before, prepare the DaemonSet manifest.
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: promtail
  namespace: default
  labels:
    app.kubernetes.io/instance: promtail
    app.kubernetes.io/name: promtail
    app.kubernetes.io/version: 2.5.0
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: promtail
      app.kubernetes.io/name: promtail
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: promtail
        app.kubernetes.io/name: promtail
    spec:
      volumes:
      - name: config
        configMap:
          name: promtail
      - name: run
        hostPath:
          path: /run/promtail
      - name: containers
        hostPath:
          path: /var/lib/docker/containers
      - name: pods
        hostPath:
          path: /var/log/pods
      containers:
      - name: promtail
        image: docker.io/grafana/promtail:2.5.0
        args:
        - '-config.file=/etc/promtail/promtail.yaml'
        ports:
        - name: http-metrics
          containerPort: 3101
          protocol: TCP
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        volumeMounts:
        - name: config
          mountPath: /etc/promtail
        - name: run
          mountPath: /run/promtail
        - name: containers
          readOnly: true
          mountPath: /var/lib/docker/containers
        - name: pods
          readOnly: true
          mountPath: /var/log/pods
        readinessProbe:
          httpGet:
            path: /ready
            port: http-metrics
            scheme: HTTP
          initialDelaySeconds: 10
          timeoutSeconds: 1
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 5
        imagePullPolicy: IfNotPresent
        securityContext:
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: false
          allowPrivilegeEscalation: false
      restartPolicy: Always
      serviceAccountName: promtail
      serviceAccount: promtail
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
1.4.1.3 Creating the promtail application
kubectl apply -f promtail.yaml
After running the command above, the service is created. Next, add the DataSource in Grafana and look at the data.
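A few hedged checks that the DaemonSet is healthy, using the labels from the manifest above:
kubectl get daemonset promtail
kubectl get pods -l app.kubernetes.io/name=promtail
kubectl logs daemonset/promtail --tail=20     # look for push errors against the loki URL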
1.4.2 Bare-metal deployment
For a bare-metal deployment, the configuration file above needs only a small change: update the clients address. Store the file under /etc/loki/, for example:
clients:
- url: http://ipaddress:port/loki/api/v1/push
Add a start-on-boot systemd unit; place the service file at /usr/lib/systemd/system/loki-promtail.service with the following content:
[Unit]
Description=Grafana Loki Log Ingester
Documentation=https://grafana.com/logs/
After=network-online.target
[Service]
ExecStart=/bin/promtail --config.file /etc/loki/loki-promtail.yaml
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID
[Install]
WantedBy=multi-user.target
Start it the same way as in the server-side deployment section above.
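For a quick sanity check (the HTTP port 3101 comes from the server block of the promtail config above):
systemctl daemon-reload
systemctl start loki-promtail
systemctl enable loki-promtail
curl http://127.0.0.1:3101/targets            # promtail's targets page; shows the discovered log files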
1.5 Data Source
Add the data source in Grafana: Grafana -> Setting -> DataSources -> AddDataSource -> Loki
Note the HTTP URL: whichever namespace the application or service is deployed in, you must use its FQDN, in the format ServiceName.namespace. If it is in the default namespace with port 3100, fill in http://loki:3100. The reason for writing the service name rather than an IP address is that the DNS server inside the k8s cluster resolves this address automatically.
You can then search the log data from Grafana.
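To verify the connection outside Grafana, you can query Loki's label API directly; a minimal sketch, run from any pod or host that can resolve the Service name:
curl http://loki:3100/loki/api/v1/labels
# a healthy setup returns JSON such as {"status":"success","data":["app","namespace",...]}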
1.6 Other Client Configurations
1.6.1 Logstash as a log collection client
After starting Logstash, install the Loki output plugin with the command below; once it is installed, add the loki section to logstash's output.
bin/logstash-plugin install logstash-output-loki
Add the configuration and test it.
The complete set of logstash options is documented in the official LogstashConfigFile reference:
output {
  loki {
    [url => "" | default = none | required=true]
    [tenant_id => string | default = nil | required=false]
    [message_field => string | default = "message" | required=false]
    [include_fields => array | default = [] | required=false]
    [batch_wait => number | default = 1(s) | required=false]
    [batch_size => number | default = 102400(bytes) | required=false]
    [min_delay => number | default = 1(s) | required=false]
    [max_delay => number | default = 300(s) | required=false]
    [retries => number | default = 10 | required=false]
    [username => string | default = nil | required=false]
    [password => secret | default = nil | required=false]
    [cert => path | default = nil | required=false]
    [key => path | default = nil| required=false]
    [ca_cert => path | default = nil | required=false]
    [insecure_skip_verify => boolean | default = false | required=false]
  }
}
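As a concrete, hedged instance of the option schema above (only url is required; the other values simply restate the listed defaults, and the loki address is the Service used throughout this article):
output {
  loki {
    url => "http://loki:3100/loki/api/v1/push"
    batch_wait => 1
    batch_size => 102400
    retries => 10
  }
}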
Alternatively, use logstash's http output module, configured as follows:
output {
  http {
    format => "json"
    http_method => "post"
    content_type => "application/json"
    connect_timeout => 10
    url => "http://loki:3100/loki/api/v1/push"
    message => '{"message":"%{message}"}'
  }
}
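Note that Loki's push endpoint expects a specific JSON shape, {"streams":[{"stream":{...},"values":[[timestamp,line],...]}]}, so the message body above may need adjusting to match. A hedged smoke test of the endpoint with curl (timestamp in nanoseconds since the epoch):
curl -s -X POST http://loki:3100/loki/api/v1/push \
  -H 'Content-Type: application/json' \
  -d '{"streams":[{"stream":{"job":"curl-test"},"values":[["'"$(date +%s%N)"'","hello from curl"]]}]}'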
1.7 Helm Installation
If you want a simpler install, use Helm. Helm wraps all of the installation steps into a chart, simplifying deployment.
For people who want to understand k8s in detail, Helm is less suitable: since it runs automatically once packaged, the k8s administrator does not see how the components depend on one another, which can lead to blind spots.
Without further ado, here is the Helm installation.
Add the repo source:
helm repo add grafana https://grafana.github.io/helm-charts
Update the sources:
helm repo update
Deploy with the default configuration:
helm upgrade --install loki grafana/loki-simple-scalable
Or with a custom namespace:
helm upgrade --install loki --namespace=loki grafana/loki-simple-scalable
Or with custom settings:
helm upgrade --install loki grafana/loki-simple-scalable --set "key1=val1,key2=val2,..."
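To confirm the release came up, a couple of follow-up checks (use the namespace from the variant you chose):
helm list -n loki
kubectl get pods -n loki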
1.8 Troubleshooting
1.8.1 502 Bad Gateway
The loki address is filled in incorrectly.
In k8s, a wrong address causes the 502. Check whether the loki address takes one of the following forms:
http://LokiServiceName
http://LokiServiceName.namespace
http://LokiServiceName.namespace:ServicePort
If grafana and loki are on different nodes, also check inter-node network connectivity and firewall policies.
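A hedged way to test name resolution and reachability from inside the cluster; the busybox image tag and the grafana Deployment name are assumptions, so adjust them to your environment:
kubectl run dns-test --rm -it --image=busybox:1.35 --restart=Never -- nslookup loki.default
kubectl exec deploy/grafana -- wget -qO- http://loki.default:3100/ready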
1.8.2 Ingester not ready: instance xx:9095 in state JOINING
Wait patiently for a while: in allInOne mode the program needs some time to start up.
1.8.3 too many unhealthy instances in the ring
Set ingester.lifecycler.replication_factor to 1; this error comes from that setting being wrong. It declares multiple replication targets at startup, but only one instance is actually deployed, so this message appears when labels are queried.
1.8.4 Data source connected
Data source connected, but no labels received. Verify that Loki and Promtail is configured properly
Possible causes:
- promtail cannot deliver the collected logs to loki; check whether promtail's output is healthy
- promtail sent logs before loki was ready, so loki never received them; to re-ingest the logs, delete the positions.yaml file (use find to locate it, see the sketch below)
- promtail is ignoring the target log files, or cannot start properly because of a configuration error
- promtail cannot find the log files at the specified location
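For the positions-file case, a sketch of the bare-metal fix (the path comes from positions.filename in your promtail config; /run/promtail in the example above):
find / -name positions.yaml 2>/dev/null       # locate the positions file
rm /run/promtail/positions.yaml
systemctl restart loki-promtail               # promtail re-reads the logs from scratch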
Source: https://www.cnblogs.com/jingzh/p/17998082