Thursday, November 30, 2017

ELK

·         MySQL => Databases => Tables => Columns/Rows
·         Elasticsearch => Indices => Types => Documents with Properties

Elasticsearch has to store the data somewhere. The data is stored in shards, which are either primary or replica shards.
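
As a minimal sketch of both points (assuming Elasticsearch is reachable on localhost:9200 once installed as below; the index name "blogs", type "post", and the document fields are made-up examples):

# Indexing a document implicitly creates the "blogs" index, a "post" type, and document id 1
curl -XPUT 'localhost:9200/blogs/post/1?pretty' -H 'Content-Type: application/json' -d '
{
  "title": "hello elk"
}'

# List the shards backing each index; the prirep column shows p (primary) or r (replica)
curl 'localhost:9200/_cat/shards?v'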

ELK Stack Installation:
ELK stack components being used are:
·         filebeat version 5.5.2
·         logstash 5.5.2
·         elasticsearch 5.5.2
·         kibana 5.5.2
filebeat
Beats needs to be installed on all the host machines from which you want to read your logs.
To get a specific version of the ELK stack, browse to https://www.elastic.co/downloads/past-releases
Select the appropriate product and version and download the RPM. From the directory containing the RPM, execute sudo yum install filebeat on all the host machines.
sudo chmod 755 filebeat
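
For example, assuming the 5.5.2 RPM was taken from the past-releases page (verify the exact URL and filename against the downloads page):

wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-5.5.2-x86_64.rpm
sudo yum install filebeat-5.5.2-x86_64.rpm
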
Logstash
Needs to be installed on the host machine / edge node. Download the RPM and run
sudo yum install logstash
To test your installation,
cd /usr/share/logstash/
sudo /usr/share/logstash/bin/logstash -e 'input { stdin { } } output { stdout {} }'
# After starting Logstash, wait until you see "Pipeline main started" and then enter hello world at the command prompt

ElasticSearch
Needs to be installed on the machine that is going to host the Elasticsearch filesystem. Download the RPM and run
sudo yum install elasticsearch
To test your installation
curl -XGET 'localhost:9200/?pretty'

Kibana

sudo yum install kibana

vi /etc/kibana/kibana.yml
Edit and enable the server.port: and server.host: settings.
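
For example, the relevant lines might look like this (the bind address below is just an example; use your own host/IP):

server.port: 5601
server.host: "0.0.0.0"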

sudo service kibana start

To test your installation
Use a browser to open http://[hostname]:5601
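
Alternatively, from the shell (assuming the Kibana 5.x status API is enabled at its default path):

curl http://[hostname]:5601/api/status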

Configuration

filebeat
Edit the filebeat config file to add the log files that should be scanned and shipped to Logstash.

###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.full.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

#=========================== Filebeat prospectors =============================

filebeat.prospectors:

# Each - is a prospector. Most options can be set at the prospector level, so
# you can use different prospectors for various configurations.
# Below are the prospector specific configurations.

- input_type: log

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    #- /home/sraja005/flume.log
    - /var/log/flume-ng/flume-ng-agent.log
  fields:
     log_type: flumeLog

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["tsb1.devlab.motive.com:5044"]


Logstash
Create a logstash configuration file and place it in the folder mentioned below
cd /etc/logstash/conf.d/

#Here is a sample conf file.

vi flumetest.conf
input {
  beats {
    port => "5044"
    codec => multiline {
      # Grok pattern names are valid! :)
      pattern => "^(%{MONTHDAY} %{MONTH} %{YEAR} %{TIME}|%{YEAR}-%{MONTHNUM})"
      negate => true
      what => "previous"
    }
  }
}

filter {
  if ([fields][log_type] == "flumeLog") {
    grok {
      match => { "message" => "%{MONTHDAY:logDate} %{MONTH:logMonth} %{YEAR:logYear} %{TIME:logTime} %{LOGLEVEL:logLevel} %{GREEDYDATA:message}" }
    }
  }
}

output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
  }
}
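
The pipeline can be validated and the service started (paths follow the RPM install used above; -t is shorthand for --config.test_and_exit):

sudo /usr/share/logstash/bin/logstash --path.settings /etc/logstash -t -f /etc/logstash/conf.d/flumetest.conf
sudo service logstash start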


Issues and Points:

1.     The source location and index of the message can be viewed in the message dropdown.
2.     For a log line starting with the following content,

[12/Oct/2017 09:05:51 ] supervisor   ERROR    Exception in supervisor main loop

In the config file under /etc/logstash/conf.d, add the grok filter as below, where \[ and \] escape the literal [ and ] characters.

if ([fields][log_type] == "hueLog") {
  grok {
    match => { "message" => "\[%{MONTHDAY:logDate}/%{MONTH:logMonth}/%{YEAR:logYear} %{TIME:logTime} \] %{LOGLEVEL:logLevel} %{GREEDYDATA:message}" }
  }
}

Add | \[ to the multiline pattern,

pattern => "^(%{MONTHDAY} %{MONTH} %{YEAR} %{TIME}|%{YEAR}-%{MONTHNUM}|\[| )"

Filter for a log line such as: 17/10/26 13:37:59 ERROR TaskSchedulerImpl: Lost an executor driver (already removed): Executor heartbeat timed out after 239118 ms

if ([fields][log_type] == "sparkLog") {
  grok {
    match => { "message" => "%{YEAR:logYear}/%{MONTHNUM:logMonth}/%{MONTHDAY:logDate} %{TIME:logTime} %{LOGLEVEL:logLevel} %{GREEDYDATA:message}" }
  }
}

3.     By default, an index is created for every day. To write the data into a single index, add an index setting in the elasticsearch output of the logstash config file.
        elasticsearch {
          index => "blogs2"
          hosts => [ "localhost:9200" ]
        }

To have the output written to multiple indexes,
output {
  elasticsearch {
    index => "blogs2"
    hosts => [ "localhost:9200" ]
  }
  elasticsearch {
    index => "blogs"
    hosts => [ "localhost:9200" ]
  }
}
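
To verify that documents are actually arriving in an index, the count API can be used (index name from the example above):

curl 'localhost:9200/blogs2/_count?pretty'
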

4.     To list the indexes,
curl 'localhost:9200/_cat/indices?v'

5.     To get info on a particular index ‘blogs2’
curl -XGET localhost:9200/blogs2

6.     To test a grok filter against sample text, use https://grokdebug.herokuapp.com/

7.     To combine the date and time values of the log into a single timestamp field, follow https://stackoverflow.com/questions/40385107/logstash-grok-how-to-parse-timestamp-field-using-httpderror-date-pattern ; a sketch is shown below.
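
A minimal sketch (field names follow the hue grok example above; the format string assumes values like 24/Oct/2017 15:04:53): recombine the split fields with mutate and parse them with the date filter,

mutate {
  add_field => { "logTimestamp" => "%{logDate}/%{logMonth}/%{logYear} %{logTime}" }
}
date {
  match => [ "logTimestamp", "dd/MMM/yyyy HH:mm:ss" ]
  target => "@timestamp"
}
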
8.     To see the health of the cluster,
curl 'localhost:9200/_cluster/health?pretty'

9.     To create a new index, in Kibana -> Dev Tools execute the following command to create the blogs index,
PUT /blogs
{
   "settings" : {
      "number_of_shards" : 3,
      "number_of_replicas" : 1
   }
}
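
The same index can also be created from the shell with an equivalent curl request (assuming Elasticsearch on localhost:9200):

curl -XPUT 'localhost:9200/blogs?pretty' -H 'Content-Type: application/json' -d '
{
   "settings" : {
      "number_of_shards" : 3,
      "number_of_replicas" : 1
   }
}'
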
10.  Create Index Patterns,


In Kibana -> Management -> Index Patterns -> Create Index pattern and provide the index name or pattern as ‘blogs*’ -> Create.


Extract values from an existing field and create a new field in Logstash.

There are 2 approaches for this:

1. Copy the source field into a temp variable and split it,
if ([fields][log_type] == "yarnHive2kafkaLog") {
  grok {
    match => { "message" => "%{YEAR:logYear}-%{MONTHNUM:logMonth}-%{MONTHDAY:logDate} %{TIME:logTime} \!%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}\! %{GREEDYDATA:message}" }
  }
  mutate {
    copy => { "source" => "source_tmp" }
  }
  mutate {
    split => ["source_tmp", "/"]
    add_field => { "applicationID" => "%{source_tmp[4]}" }
  }
}
2. Apply a grok filter on the source field,
if ([fields][log_type] == "yarnHive2kafkaLog") {
  grok {
    match => { "message" => "%{YEAR:logYear}-%{MONTHNUM:logMonth}-%{MONTHDAY:logDate} %{TIME:logTime} \!%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}\! %{GREEDYDATA:message}" }
  }
  grok {
    match => { "source" => "/%{GREEDYDATA:primaryDir}/%{GREEDYDATA:subDir1}/%{GREEDYDATA:subDir2}/%{GREEDYDATA:subDir3}/%{GREEDYDATA:containerID}/%{GREEDYDATA:fileName}" }
  }
  mutate {
    add_field => { "applicationID" => "%{subDir3}" }
  }
}
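
For illustration, with a hypothetical source path (this layout is an assumption, not taken from the cluster above), both approaches pick out the same value:

# source       = "/data/yarn/logs/application_1511000000000_0001/container_e01_000001/flume.log"
# split on "/" = ["", "data", "yarn", "logs", "application_1511000000000_0001", "container_e01_000001", "flume.log"]
# Approach 1: element 4 of source_tmp  -> "application_1511000000000_0001"
# Approach 2: subDir3                  -> "application_1511000000000_0001"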


2017-11-15 09:21:06,578 ! ERROR ! [Driver] ! imps.CuratorFrameworkImpl ! Background 
2017-11-20 03:35:17,730 !  WARN ! [Reporter] ! impl.AMRMClientImpl ! ApplicationMaster 

In the above 2 log lines the whitespace before ERROR and WARN is not the same. To handle this, use %{SPACE}, which matches zero or more spaces,

grok {
  match => { "message" => "%{YEAR:logYear}-%{MONTHNUM:logMonth}-%{MONTHDAY:logDate} %{TIME:logTime} !%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}! %{GREEDYDATA:message}" }
}

1.     Remove trailing whitespace in a logstash filter


Approach 1: using something like NOTSPACE instead of GREEDYDATA.

For the log line,
[24/Oct/2017 15:04:53 ] cluster       WARNING Picking RM HA: ha

[%{MONTHDAY:logDate}/%{MONTH:logMonth}/%{YEAR:logYear} %{TIME:logTime} ]%{SPACE}%{GREEDYDATA:platformType} +\s %{SPACE}%{LOGLEVEL:logLevel}%{SPACE}%{GREEDYDATA:message}

The above filter leaves trailing whitespace in the platformType field (e.g. "cluster   " instead of "cluster").

\[%{MONTHDAY:logDate}/%{MONTH:logMonth}/%{YEAR:logYear} %{TIME:logTime} \]%{SPACE}%{NOTSPACE:platformType}%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}%{GREEDYDATA:message}

Replacing GREEDYDATA with NOTSPACE resolves this issue.

Approach 2:

Place the following after the grok block to strip the whitespace,

mutate {
  strip => ["platformType"]
}
