· MySQL => Databases => Tables => Columns/Rows
· Elasticsearch => Indices => Types => Documents with Properties
Elasticsearch has to store the data somewhere. The data is stored in shards, which are either primary or replica shards.
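As a quick check on a running node, the _cat/shards API lists every shard and marks it as primary (p) or replica (r); this assumes the default host and port:
curl 'localhost:9200/_cat/shards?v'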
ELK Stack Installation:
ELK stack components being used are:
· filebeat 5.5.2
· logstash 5.5.2
· elasticsearch 5.5.2
· kibana 5.5.2
Filebeat
Filebeat needs to be installed on all the host machines from which you want to read logs. Select the appropriate product and version and download the RPM. In the download directory, execute the following on all the host machines:
sudo yum install filebeat
sudo chmod 755 filebeat
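For reference, a minimal sketch of fetching and installing the 5.5.2 RPM directly (the download URL follows Elastic's standard artifact layout and should be adjusted for your version and architecture):
curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-5.5.2-x86_64.rpm
sudo rpm -vi filebeat-5.5.2-x86_64.rpm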
Logstash
Needs to be installed on the host machine / edge node. Download the RPM and run
sudo yum install logstash
To test your installation:
cd /usr/share/logstash/
sudo /usr/share/logstash/bin/logstash -e 'input { stdin { } } output { stdout {} }'
# After starting Logstash, wait until you see "Pipeline main started" and then enter "hello world" at the command prompt
ElasticSearch
Needs to be installed on the machine that will host the Elasticsearch data on its filesystem. Download the RPM and run
sudo yum install elasticsearch
To test your installation:
curl -XGET 'localhost:9200/?pretty'
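If the node is up, the curl above returns a small JSON document; an abbreviated example of the kind of response to expect (node and cluster names below are placeholders):
{
  "name" : "node-1",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "5.5.2"
  },
  "tagline" : "You Know, for Search"
}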
Kibana
sudo yum install kibana
vi /etc/kibana/kibana.yml
Edit the file and enable server.port: and server.host:
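A minimal sketch of the two settings to enable in /etc/kibana/kibana.yml (5601 is Kibana's default port; binding to 0.0.0.0 is an assumption so the UI is reachable from other machines):
server.port: 5601
server.host: "0.0.0.0"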
sudo service kibana start
To test your installation
Use a browser to open http://[hostname]:5601
Configuration
Filebeat
Edit the Filebeat config file (/etc/filebeat/filebeat.yml) to add the log files to be scanned and shipped to Logstash.
###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.full.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

#=========================== Filebeat prospectors =============================

filebeat.prospectors:

# Each - is a prospector. Most options can be set at the prospector level, so
# you can use different prospectors for various configurations.
# Below are the prospector specific configurations.

- input_type: log
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    #- /home/sraja005/flume.log
    - /var/log/flume-ng/flume-ng-agent.log
  fields:
    log_type: flumeLog

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["tsb1.devlab.motive.com:5044"]
Logstash
Create a Logstash configuration file and place it in the folder mentioned below.
cd /etc/logstash/conf.d/
# Here is a sample conf file.
vi flumetest.conf
input {
  beats {
    port => "5044"
    codec => multiline {
      # Grok pattern names are valid! :)
      pattern => "^(%{MONTHDAY} %{MONTH} %{YEAR} %{TIME}|%{YEAR}-%{MONTHNUM})"
      negate => true
      what => "previous"
    }
  }
}
filter {
  if ([fields][log_type] == "flumeLog") {
    grok {
      match => { "message" => "%{MONTHDAY:logDate} %{MONTH:logMonth} %{YEAR:logYear} %{TIME:logTime} %{LOGLEVEL:logLevel} %{GREEDYDATA:message}" }
    }
  }
}
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
  }
}
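The pipeline configuration can be syntax-checked before starting the Logstash service; a sketch assuming the conf file created above and the --config.test_and_exit flag available in Logstash 5.x (the service command assumes a systemd host):
sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/flumetest.conf --config.test_and_exit
sudo systemctl start logstash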
Issues and Points:
1. The source location and index of a message can be viewed in the message dropdown in Kibana.
2. For a log whose lines start with content like
[12/Oct/2017 09:05:51 ] supervisor ERROR Exception in supervisor main loop
add a grok filter as below in the config file under /etc/logstash/conf.d, where \[ and \] escape the literal [ and ] characters:
if ([fields][log_type] == "hueLog") {
  grok {
    match => { "message" => "\[%{MONTHDAY:logDate}/%{MONTH:logMonth}/%{YEAR:logYear} %{TIME:logTime} \] %{LOGLEVEL:logLevel} %{GREEDYDATA:message}" }
  }
}
Also add | \[ to the multiline codec pattern:
pattern => "^(%{MONTHDAY} %{MONTH} %{YEAR} %{TIME}|%{YEAR}-%{MONTHNUM}|\[| )"
A filter for a log like
17/10/26 13:37:59 ERROR TaskSchedulerImpl: Lost an executor driver (already removed): Executor heartbeat timed out after 239118 ms
if ([fields][log_type] == "sparkLog") {
  grok {
    match => { "message" => "%{YEAR:logYear}/%{MONTHNUM:logMonth}/%{MONTHDAY:logDate} %{TIME:logTime} %{LOGLEVEL:logLevel} %{GREEDYDATA:message}" }
  }
}
3. By default, a new index is created for every day. To have the data go into a single index, add an index setting in the output of the Logstash config file:
elasticsearch {
  index => "blogs2"
  hosts => [ "localhost:9200" ]
}
To write the output to multiple indexes:
output {
  elasticsearch {
    index => "blogs2"
    hosts => [ "localhost:9200" ]
  }
  elasticsearch {
    index => "blogs"
    hosts => [ "localhost:9200" ]
  }
}
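If the intent is to route each log type to its own index rather than send every event to both, a conditional output can be used instead; a sketch reusing the [fields][log_type] values set in filebeat.yml (index names are only examples):
output {
  if [fields][log_type] == "flumeLog" {
    elasticsearch {
      index => "blogs2"
      hosts => [ "localhost:9200" ]
    }
  } else {
    elasticsearch {
      index => "blogs"
      hosts => [ "localhost:9200" ]
    }
  }
}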
4. To list the indexes:
curl 'localhost:9200/_cat/indices?v'
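The ?v flag adds a header row; the output looks roughly like the following (the values are placeholders):
health status index  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   blogs2 xxxxxxxxxxxxxxxxxxxxxx   5   1       1234            0      1.2mb          1.2mb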
5. To get info on a particular index, e.g. ‘blogs2’:
curl -XGET localhost:9200/blogs2
7. For combining the timestamp and date values of the log, follow https://stackoverflow.com/questions/40385107/logstash-grok-how-to-parse-timestamp-field-using-httpderror-date-pattern
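A sketch of the idea from that thread, using the fields captured by the hue-log grok above: join the captured pieces into one field with mutate, then let the date filter parse it into @timestamp (the dd/MMM/yyyy HH:mm:ss format matches 12/Oct/2017 09:05:51 and is an assumption for your logs):
mutate {
  add_field => { "logTimestamp" => "%{logDate}/%{logMonth}/%{logYear} %{logTime}" }
}
date {
  match => [ "logTimestamp", "dd/MMM/yyyy HH:mm:ss" ]
  target => "@timestamp"
}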
8. To see the health of the cluster:
curl 'localhost:9200/_cluster/health?pretty'
9. To create a new index, in Kibana -> Dev Tools execute the following command to create the blogs index:
PUT /blogs
{
  "settings" : {
    "number_of_shards" : 3,
    "number_of_replicas" : 1
  }
}
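The same request can also be issued without Kibana; a sketch using curl against the local node:
curl -XPUT 'localhost:9200/blogs?pretty' -H 'Content-Type: application/json' -d '
{
  "settings" : {
    "number_of_shards" : 3,
    "number_of_replicas" : 1
  }
}'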
10. To create index patterns, in Kibana -> Management -> Index Patterns -> Create Index Pattern, provide the index name or pattern as ‘blogs*’ -> Create.
Extract values from an existing field and create a new field in Logstash.
There are 2 approaches for this:
1. Copy the source field into a temp variable and split it:
if ([fields][log_type] == "yarnHive2kafkaLog") {
  grok {
    match => { "message" => "%{YEAR:logYear}-%{MONTHNUM:logMonth}-%{MONTHDAY:logDate} %{TIME:logTime} \!%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}\! %{GREEDYDATA:message}" }
  }
  mutate {
    copy => { "source" => "source_tmp" }
  }
  mutate {
    split => ["source_tmp", "/"]
    add_field => { "applicationID" => "%{source_tmp[4]}" }
  }
}
2. Apply a grok filter on the source field:
if ([fields][log_type] == "yarnHive2kafkaLog") {
  grok {
    match => { "message" => "%{YEAR:logYear}-%{MONTHNUM:logMonth}-%{MONTHDAY:logDate} %{TIME:logTime} \!%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}\! %{GREEDYDATA:message}" }
  }
  grok {
    match => { "source" => "/%{GREEDYDATA:primaryDir}/%{GREEDYDATA:subDir1}/%{GREEDYDATA:subDir2}/%{GREEDYDATA:subDir3}/%{GREEDYDATA:containerID}/%{GREEDYDATA:fileName}" }
  }
  mutate {
    add_field => { "applicationID" => "%{subDir3}" }
  }
}
For more detail, follow https://discuss.elastic.co/t/split-source-value-and-create-a-custom-field-with-splitted-one/110334/5
Spaces not consistent when applying grok
2017-11-15 09:21:06,578 ! ERROR ! [Driver] ! imps.CuratorFrameworkImpl ! Background
2017-11-20 03:35:17,730 ! WARN ! [Reporter] ! impl.AMRMClientImpl ! ApplicationMaster
In the above 2 logs the spacing around ERROR and WARN is not the same. To handle this, use %{SPACE}, which matches zero or more spaces:
grok {
  match => { "message" => "%{YEAR:logYear}-%{MONTHNUM:logMonth}-%{MONTHDAY:logDate} %{TIME:logTime} !%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}! %{GREEDYDATA:message}" }
}
Remove trailing white space in the Logstash filter
Approach 1: use something like NOTSPACE instead of GREEDYDATA.
For a log such as
[24/Oct/2017 15:04:53 ] cluster WARNING Picking RM HA: ha
the following filter leads to a trailing white space in the platformType field (captured as "cluster "):
\[%{MONTHDAY:logDate}/%{MONTH:logMonth}/%{YEAR:logYear} %{TIME:logTime} \]%{SPACE}%{GREEDYDATA:platformType}%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}%{GREEDYDATA:message}
Replacing GREEDYDATA with NOTSPACE resolves the issue:
\[%{MONTHDAY:logDate}/%{MONTH:logMonth}/%{YEAR:logYear} %{TIME:logTime} \]%{SPACE}%{NOTSPACE:platformType}%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}%{GREEDYDATA:message}
Approach 2: place a mutate strip after the grok to strip the whitespace:
mutate {
  strip => ["platformType"]
}