· MySQL => Databases => Tables => Columns/Rows
· Elasticsearch => Indices => Types => Documents with Properties
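As a quick illustration of the analogy, a row that would go into a MySQL table is stored in Elasticsearch as a JSON document under an index and type. A minimal sketch (the index name, type name and fields below are made up for illustration):
curl -XPUT 'localhost:9200/customers/customer/1?pretty' -H 'Content-Type: application/json' -d'
{
  "name" : "John",
  "age" : 30
}'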
Elasticsearch has to store the data somewhere; it stores it in shards, each of which is either a primary or a replica.
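To see how those shards are distributed for each index (p = primary, r = replica), the cat shards API can be used:
curl 'localhost:9200/_cat/shards?v'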
ELK Stack Installation:
The ELK stack components being used are:
· filebeat 5.5.2
· logstash 5.5.2
· elasticsearch 5.5.2
· kibana 5.5.2
Filebeat
Filebeat (Beats) needs to be installed on all the host machines from which you want to read your logs. Select the appropriate product and version and download the RPM. In the download directory, execute the following on all the host machines:
sudo yum install filebeat
sudo chmod 755 filebeat
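For reference, a typical download-and-install sequence for the 5.5.2 RPM looks like the sketch below; the URL and architecture are assumptions and may differ for your environment, and the same pattern applies to the logstash, elasticsearch and kibana RPMs:
curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-5.5.2-x86_64.rpm
sudo yum install filebeat-5.5.2-x86_64.rpm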
Logstash
Logstash needs to be installed on the host machine/edge node. Download the RPM and run:
sudo yum install logstash
To test your installation:
cd /usr/share/logstash/
sudo /usr/share/logstash/bin/logstash -e 'input { stdin { } } output { stdout {} }'
# After starting Logstash, wait until you see "Pipeline main started" and then enter hello world at the command prompt.
Elasticsearch
Elasticsearch needs to be installed on the machine that is going to host the Elasticsearch filesystem. Download the RPM and run:
sudo yum install elasticsearch
To test your installation:
curl -XGET 'localhost:9200/?pretty'
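If Elasticsearch is running, the call returns a JSON summary similar to the abridged sketch below; the node and cluster names will differ in your setup:
{
  "name" : "node-1",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "5.5.2"
  },
  "tagline" : "You Know, for Search"
}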
Kibana
sudo yum install kibana
vi /etc/kibana/kibana.yml
Edit the file and enable (uncomment) the server.port: and server.host: settings.
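A minimal sketch of the relevant kibana.yml settings; the 0.0.0.0 host value is only an example to listen on all interfaces, so adjust it for your environment:
server.port: 5601
server.host: "0.0.0.0"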
sudo service kibana start
To test your installation, use a browser to open http://[hostname]:5601
Configuration
Filebeat
Edit the filebeat config file (typically /etc/filebeat/filebeat.yml) to add the log files to be scanned and shipped to Logstash.
###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.full.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

#=========================== Filebeat prospectors =============================

filebeat.prospectors:

# Each - is a prospector. Most options can be set at the prospector level, so
# you can use different prospectors for various configurations.
# Below are the prospector specific configurations.

- input_type: log

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    #- /home/sraja005/flume.log
    - /var/log/flume-ng/flume-ng-agent.log
  fields:
    log_type: flumeLog

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["tsb1.devlab.motive.com:5044"]
Logstash
Create a Logstash configuration file and place it in the folder below:
cd /etc/logstash/conf.d/
# Here is a sample conf file.
vi flumetest.conf
input {
  beats {
    port => "5044"
    codec => multiline {
      # Grok pattern names are valid! :)
      pattern => "^(%{MONTHDAY} %{MONTH} %{YEAR} %{TIME}|%{YEAR}-%{MONTHNUM})"
      negate => true
      what => "previous"
    }
  }
}
filter {
  if ([fields][log_type] == "flumeLog") {
    grok {
      match => { "message" => "%{MONTHDAY:logDate} %{MONTH:logMonth} %{YEAR:logYear} %{TIME:logTime} %{LOGLEVEL:logLevel} %{GREEDYDATA:message}"}
    }
  }
}
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
  }
}
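Before starting the service, the pipeline config can be validated with Logstash's config test option and then Logstash started as a service; the paths below assume the default RPM layout:
sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/flumetest.conf --config.test_and_exit
sudo service logstash start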
Issues and Points:
1. The source location and index of the message can be viewed in the message dropdown.
2. For a log with starting content like
[12/Oct/2017 09:05:51 ] supervisor ERROR Exception in supervisor main loop
add a grok filter as below in the config file under /etc/logstash/conf.d, where \[ and \] represent the literal [ and ]:
if ([fields][log_type] == "hueLog") {
  grok {
    match => { "message" => "\[%{MONTHDAY:logDate}/%{MONTH:logMonth}/%{YEAR:logYear} %{TIME:logTime} \] %{LOGLEVEL:logLevel} %{GREEDYDATA:message}"}
  }
}
Also add | \[ in the multiline pattern:
pattern => "^(%{MONTHDAY} %{MONTH} %{YEAR} %{TIME}|%{YEAR}-%{MONTHNUM}|\[| )"
Filter for a log like
17/10/26 13:37:59 ERROR TaskSchedulerImpl: Lost an executor driver (already removed): Executor heartbeat timed out after 239118 ms
if ([fields][log_type] == "sparkLog") {
  grok {
    match => { "message" => "%{YEAR:logYear}/%{MONTHNUM:logMonth}/%{MONTHDAY:logDate} %{TIME:logTime} %{LOGLEVEL:logLevel} %{GREEDYDATA:message}"}
  }
}
3. By default, an index is created for every day. To have the data go into a single index, add an index setting in the output section of the Logstash config file:
elasticsearch {
  index => "blogs2"
  hosts => [ "localhost:9200" ]
}
To have output go to multiple indexes:
output {
  elasticsearch {
    index => "blogs2"
    hosts => [ "localhost:9200" ]
  }
  elasticsearch {
    index => "blogs"
    hosts => [ "localhost:9200" ]
  }
}
4. To list the indexes:
curl 'localhost:9200/_cat/indices?v'
5. To get info on a particular index, e.g. 'blogs2':
curl -XGET localhost:9200/blogs2
7. To combine the timestamp and date values of the log, follow https://stackoverflow.com/questions/40385107/logstash-grok-how-to-parse-timestamp-field-using-httpderror-date-pattern
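A hedged sketch of one way to do this inside the filter section with the logstash date filter, reusing the grok fields captured earlier; the combined field name logTimestamp is made up for illustration and the date format must match your log:
mutate {
  add_field => { "logTimestamp" => "%{logDate}/%{logMonth}/%{logYear} %{logTime}" }
}
date {
  match => [ "logTimestamp", "dd/MMM/yyyy HH:mm:ss" ]
  target => "@timestamp"
}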
8. To see the health of the cluster:
curl 'localhost:9200/_cluster/health?pretty'
9. To create a new index, in Kibana -> Dev Tools execute the command below to create the blogs index:
PUT /blogs
{
  "settings" : {
    "number_of_shards" : 3,
    "number_of_replicas" : 1
  }
}
10. Create index patterns: In Kibana -> Management -> Index Patterns -> Create Index Pattern, provide the index name or pattern as 'blogs*' -> Create.
Extract values from an existing field and create a new field in Logstash
There are 2 approaches for this:
1. Make a copy of the source field into a temp variable and split it:
if ([fields][log_type] == "yarnHive2kafkaLog") {
  grok {
    match => { "message" => "%{YEAR:logYear}-%{MONTHNUM:logMonth}-%{MONTHDAY:logDate} %{TIME:logTime} \!%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}\! %{GREEDYDATA:message}"}
  }
  mutate {
    copy => { "source" => "source_tmp" }
  }
  mutate {
    split => ["source_tmp", "/"]
    add_field => { "applicationID" => "%{source_tmp[4]}" }
  }
}
2. Apply a grok filter on the source field:
if ([fields][log_type] == "yarnHive2kafkaLog") {
  grok {
    match => { "message" => "%{YEAR:logYear}-%{MONTHNUM:logMonth}-%{MONTHDAY:logDate} %{TIME:logTime} \!%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}\! %{GREEDYDATA:message}"}
  }
  grok {
    match => { "source" => "/%{GREEDYDATA:primaryDir}/%{GREEDYDATA:subDir1}/%{GREEDYDATA:subDir2}/%{GREEDYDATA:subDir3}/%{GREEDYDATA:containerID}/%{GREEDYDATA:fileName}"}
  }
  mutate {
    add_field => { "applicationID" => "%{subDir3}" }
  }
}
Follow https://discuss.elastic.co/t/split-source-value-and-create-a-custom-field-with-splitted-one/110334/5
Spaces not handled properly while applying grok
2017-11-15 09:21:06,578 ! ERROR ! [Driver] ! imps.CuratorFrameworkImpl ! Background
2017-11-20 03:35:17,730 ! WARN ! [Reporter] ! impl.AMRMClientImpl ! ApplicationMaster
In the above 2 logs the spacing around ERROR and WARN is not the same. To handle this use %{SPACE}, which matches 0 or more spaces:
grok {
  match => { "message" => "%{YEAR:logYear}-%{MONTHNUM:logMonth}-%{MONTHDAY:logDate} %{TIME:logTime} !%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}! %{GREEDYDATA:message}"}
}
Remove trailing whitespace in the logstash filter
Approach 1: Use something like NOTSPACE instead of GREEDYDATA.
For the log
[24/Oct/2017 15:04:53 ] cluster WARNING Picking RM HA: ha
the filter
\[%{MONTHDAY:logDate}/%{MONTH:logMonth}/%{YEAR:logYear} %{TIME:logTime} \]%{SPACE}%{GREEDYDATA:platformType}%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}%{GREEDYDATA:message}
leads to a trailing white space for the cluster platformType. Replacing GREEDYDATA with NOTSPACE resolves this issue:
\[%{MONTHDAY:logDate}/%{MONTH:logMonth}/%{YEAR:logYear} %{TIME:logTime} \]%{SPACE}%{NOTSPACE:platformType}%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}%{GREEDYDATA:message}
Approach 2: Place the following after the grok filter to strip whitespace:
mutate {
  strip => ["platformType"]
}