The new system needs to run as a cluster of several servers. To make it easier to view and manage the servers and the project logs, I am trying out a log management platform built from the Elastic family: Filebeat + Logstash + Elasticsearch + Kibana.
Logstash can collect logs by itself and can also parse them, but the Logstash service is fairly heavy on server resources, so Filebeat is often used for collection instead.
Filebeat is a lightweight log shipper. According to figures circulating online, a running Logstash instance occupies around 480 MB of memory while a running Filebeat instance occupies only about 40 MB, so Filebeat is clearly much easier on server resources. A common pattern described online is therefore to install Filebeat on each of the production servers in the distributed cluster and Logstash on one separate server: Filebeat collects the logs on its own server and ships them to Logstash, which parses them and then stores them in Elasticsearch.
Filebeat can of course also send logs straight to Elasticsearch without going through Logstash. My understanding is that routing them through Logstash first is done so the log content can be parsed once and normalized into a standard format before being stored in Elasticsearch.
Since this is a development environment, and so that other developers can easily reproduce the same environment, I use Docker locally and already have Elasticsearch running in Docker. While testing Filebeat I will use that Dockerized Elasticsearch as the data store for now, and run Filebeat in Docker as well. For the eventual server deployment, Elasticsearch will be a managed cloud Elasticsearch, which cuts maintenance cost and leaves little to worry about in terms of service stability, while Filebeat will be installed separately on each server.
Filebeat download page: https://www.elastic.co/cn/products/beats/filebeat
docker-compose.yml
# compose file format version
version: "3"
# service definitions
services:
  logstash:
    depends_on:
      - elasticsearch
    volumes:
      - ./logstash.conf:/config/logstash.conf
    image: logstash:5.6.5
    restart: always
    command: "/usr/share/logstash/bin/logstash -f /config/logstash.conf"
  filebeat:
    depends_on:
      - logstash
    image: docker.elastic.co/beats/filebeat:5.6.5
    volumes:
      - ./filebeat.yml:/usr/share/filebeat/filebeat.yml
      - ./log:/tmp
    restart: always
    privileged: true
  nginx:
    volumes:
      - ./log/:/var/log/nginx
      - ./nginx.conf:/etc/nginx/nginx.conf
    image: nginx
    ports:
      - "888:80"
  elasticsearch:
    image: elasticsearch:5.6.5
    restart: always
    volumes:
      - ./jvm.options:/etc/elasticsearch/jvm.options
  kibana:
    ports:
      - "5601:5601"
    environment:
      ELASTICSEARCH_URL: "http://elasticsearch:9200"
    depends_on:
      - elasticsearch
    image: kibana:5.6.5
    restart: always
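Once the configuration files described below are in place next to docker-compose.yml, the whole stack can be brought up from that directory in the usual way; these are standard docker-compose commands, nothing project-specific:

# start all five containers in the background
docker-compose up -d

# confirm that every service is running
docker-compose ps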
In addition, prepare a few configuration files (the directory layout I am assuming is sketched after this list):
- filebeat.yml
The Filebeat configuration file; it defines which log files to read and where to send them. In this test we read nginx's access.log and output it to Logstash, which then parses the log lines and submits them to Elasticsearch for storage.
- jvm.options
The JVM configuration file for Elasticsearch. I found this file online and have not yet looked into what each setting actually does.
- logstash.conf
The Logstash configuration file. This is where the logs submitted by Filebeat are parsed, and where the parsed data is written to the designated index of the designated Elasticsearch.
- nginx.conf
The configuration file for nginx in Docker. To keep the test self-contained, a dedicated nginx container is started for this test, with its log directory mounted outside the container.
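All of the volume mounts in the compose file use relative paths, so every file is expected to sit next to docker-compose.yml. The layout assumed here (log/ is the shared nginx log directory, mounted into both the nginx and filebeat containers):

.
├── docker-compose.yml
├── filebeat.yml
├── jvm.options
├── log/
├── logstash.conf
└── nginx.conf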
nginx.conf
user  nginx;
worker_processes  1;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" $request_time "$upstream_addr" $upstream_response_time '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;

    #gzip  on;

    include /etc/nginx/conf.d/*.conf;
}
The main thing to note in the nginx configuration is the definition of the log output format; the Logstash configuration below parses each log record against exactly this format.
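For reference, here is a made-up example of a line this main format produces (the values are invented; only the field order matters, since it is what the grok pattern below is written against):

172.18.0.1 [20/Apr/2018:08:15:30 +0000] "GET / HTTP/1.1" 200 612 "-" 0.002 "-" - "Mozilla/5.0 (Macintosh)" "-"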
logstash.conf
# Sample Logstash configuration for creating a simple
# Beats -> Logstash -> Elasticsearch pipeline.
input {
    beats {
        port => 5044
    }
}
filter {
    if "nginx-access" in [tags] {
        grok {
            match => {
                "message" => "^%{IPV4:remote_addr} \[%{HTTPDATE:timestamp}\] \"%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}\" %{INT:status} %{INT:body_bytes_sent} \"%{NOTSPACE:http_referer}\" %{NUMBER:request_time} \"%{DATA:upstream_addr}%{DATA:upstream_port}\" %{DATA:upstream_response_time} \"%{DATA:http_user_agent}\" \"%{NOTSPACE:http_x_forwarded_for}\""
            }
            remove_field => ["message"]
        }
    }
    if "nginx-error" in [tags] {
        grok {
            match => {
                "message" => "^%{SYSLOGPROG:ues} %{TIME:time} \[error\] %{DATA:code_code}: %{DATA:code_num} %{DATA:operate} \"%{DATA:file}\" %{DATA:err_desc} %{UNIXPATH:decription} %{PROG:prog}\", host\: \"%{DATA:host}\", referrer: \"%{DATA:referrer}\""
            }
            remove_field => ["message"]
        }
    }
}
output {
    elasticsearch {
        hosts => ["http://elasticsearch:9200"]
        index => "%{[fields][log_index]}"
    }
}
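While tuning the grok patterns, it can help to temporarily print each parsed event alongside the Elasticsearch output; the stdout output plugin with the rubydebug codec is the standard way to do this (a debugging aid only, not part of the final config):

output {
    elasticsearch {
        hosts => ["http://elasticsearch:9200"]
        index => "%{[fields][log_index]}"
    }
    # print every parsed event to the container's stdout;
    # view it with: docker-compose logs -f logstash
    stdout { codec => rubydebug }
}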
The Logstash configuration consists of three parts: input, filter, and output; filter is where the submitted data is filtered and parsed.
The nginx log format must match the grok pattern in Logstash, otherwise the log lines cannot be split into fields.
Logstash grok patterns can be debugged online at: http://grokdebug.herokuapp.com/
In this test, to try collecting different logs on the same server and storing them in different Elasticsearch indices, I collect error.log in addition to access.log and create two corresponding indices in Elasticsearch to hold them. As the Logstash configuration shows, the index in output.elasticsearch is not a hard-coded index name; it is taken from log_index in the fields that Filebeat sends along.
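Once some traffic has flowed through, a quick way to confirm that both indices were created is Elasticsearch's _cat API. Port 9200 is not published to the host in the compose file above, so run the check from inside the elasticsearch container (this assumes curl is available in that image):

docker-compose exec elasticsearch curl -s 'http://localhost:9200/_cat/indices?v'

The listing should include one nginx-access and one nginx-error index.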
filebeat.yml
filebeat:
  prospectors:
    - input_type: log
      paths:                 # paths inside the container
        - /tmp/access.log
      tags: ["nginx-access"]
      fields:
        log_index: nginx-access
    - input_type: log
      paths:
        - /tmp/error.log
      tags: ["nginx-error"]
      fields:
        log_index: nginx-error
  # records how far each log file has been read, so a restarted
  # container resumes from the recorded position
  registry_file: /usr/share/filebeat/data/registry/registry
output:
  logstash:
    hosts: ["logstash:5044"]
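To verify the pipeline end to end, generate a few requests against the nginx container (published on host port 888 in the compose file) and watch Filebeat ship the resulting lines:

# produce some access.log entries
curl -s http://localhost:888/ > /dev/null

# follow the filebeat container's log to confirm events are being sent
docker-compose logs -f filebeat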
Analyzing nginx access logs with Kibana: https://www.centos.bz/2018/04/%E4%BD%BF%E7%94%A8kibana%E5%88%86%E6%9E%90nginx%E8%AE%BF%E9%97%AE%E6%97%A5%E5%BF%97/
jvm.options
## JVM configuration
################################################################
## IMPORTANT: JVM heap size
################################################################
##
## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
## for more information
##
################################################################
# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space
-Xms512m
-Xmx512m
################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################
## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
## optimizations
# pre-touch memory pages used by the JVM during initialization
-XX:+AlwaysPreTouch
## basic
# force the server VM (remove on 32-bit client JVMs)
-server
# explicitly set the stack size (reduce to 320k on 32-bit client JVMs)
-Xss1m
# set to headless, just in case
-Djava.awt.headless=true
# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8
# use our provided JNA always versus the system one
-Djna.nosys=true
# use old-style file permissions on JDK9
-Djdk.io.permissionsUseCanonicalPath=true
# flags to configure Netty
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
# log4j 2
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j.skipJansi=true
## heap dumps
# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError
# specify an alternative path for heap dumps
# ensure the directory exists and has sufficient space
#-XX:HeapDumpPath=${heap.dump.path}
## GC logging
#-XX:+PrintGCDetails
#-XX:+PrintGCTimeStamps
#-XX:+PrintGCDateStamps
#-XX:+PrintClassHistogram
#-XX:+PrintTenuringDistribution
#-XX:+PrintGCApplicationStoppedTime
# log GC status to a file with time stamps
# ensure the directory exists
#-Xloggc:${loggc}
# By default, the GC log file will not rotate.
# By uncommenting the lines below, the GC log file
# will be rotated every 128MB at most 32 times.
#-XX:+UseGCLogFileRotation
#-XX:NumberOfGCLogFiles=32
#-XX:GCLogFileSize=128M
# Elasticsearch 5.0.0 will throw an exception on unquoted field names in JSON.
# If documents were already indexed with unquoted fields in a previous version
# of Elasticsearch, some operations may throw errors.
#
# WARNING: This option will be removed in Elasticsearch 6.0.0 and is provided
# only for migration purposes.
#-Delasticsearch.json.allow_unquoted_field_names=true