Thursday, November 30, 2017


·         MySQL => Databases => Tables => Columns/Rows
·         Elasticsearch => Indices => Types => Documents with Properties

Elasticsearch has to store the data somewhere. The data is stored in shards, each of which is either a primary or a replica.
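
To make the primary/replica relationship concrete, here is a minimal sketch that counts how many shard copies an index holds. The function name and the example numbers are assumptions chosen for illustration; they are not part of any Elasticsearch API.

```python
def total_shard_copies(primaries: int, replicas_per_primary: int) -> int:
    """Return the number of shard copies: every primary plus its replicas."""
    return primaries * (1 + replicas_per_primary)


# Hypothetical index with 3 primary shards and 1 replica of each primary.
print(total_shard_copies(3, 1))
```

Each replica is a full copy of its primary, so raising the replica count multiplies storage while leaving the number of primaries unchanged.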

ELK Stack Installation:
ELK stack components being used are:
·         filebeat version 5.5.2
·         logstash 5.5.2
·         elasticsearch 5.5.2
·         kibana 5.5.2
Beats needs to be installed on all the host machines from which you want to read your logs.
To get specific version of ELK browse to
Select the appropriate product and version and download the RPM. In the download directory, run sudo yum install filebeat on each host machine.
sudo chmod 755 filebeat
Logstash needs to be installed on the host machine / edge node. Download the RPM and run
sudo yum install logstash
To test your installation,
cd /usr/share/logstash/
sudo /usr/share/logstash/bin/logstash -e 'input { stdin { } } output { stdout {} }'
# After starting Logstash, wait until you see "Pipeline main started" and then enter hello world at the command prompt

Elasticsearch needs to be installed on the machine that will hold the Elasticsearch filesystem. Download the RPM and run
sudo yum install elasticsearch
To test your installation
curl -XGET 'localhost:9200/?pretty'


sudo yum install kibana

vi /etc/kibana/kibana.yml
 edit,enable server.port: and

sudo service kibana start

To test your installation
Use a browser to open http://[hostname]:5601


Edit filebeat config file to add the log files to be scanned and shipped to logstash.

###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.full.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
# You can find the full configuration reference here:

#=========================== Filebeat prospectors =============================

filebeat.prospectors:

# Each - is a prospector. Most options can be set at the prospector level, so
# you can use different prospectors for various configurations.
# Below are the prospector specific configurations.

- input_type: log

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    #- /home/sraja005/flume.log
    - /var/log/flume-ng/flume-ng-agent.log
  # Custom field that the Logstash filter checks as [fields][log_type].
  fields:
    log_type: flumeLog

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: [""]

Create a logstash configuration file and place it in the folder mentioned below
cd /etc/logstash/conf.d/

#Here is a sample conf file.

vi flumetest.conf
input {
  beats {
    port => "5044"
    codec => multiline {
      # Grok pattern names are valid! :)
      pattern => "^(%{MONTHDAY} %{MONTH} %{YEAR} %{TIME}|%{YEAR}-%{MONTHNUM})"
      # Lines that do NOT match the pattern are appended to the previous line.
      negate => true
      what => "previous"
    }
  }
}

filter {
        if ([fields][log_type] == "flumeLog") {
        grok {
                match => { "message" => "%{MONTHDAY:logDate} %{MONTH:logMonth} %{YEAR:logYear} %{TIME:logTime} %{LOGLEVEL:logLevel} %{GREEDYDATA:message}"}
        }
        }
}
output {
        elasticsearch {
        hosts => [ "localhost:9200" ]
        }
}
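
With negate => true and what => "previous", the multiline codec treats any line that does not match the pattern as a continuation of the previous event. The following is a rough Python sketch of that merging behaviour, not the codec itself. The regular expression is a simplified stand-in for the grok pattern, and the sample log lines are made up for illustration.

```python
import re

# Simplified stand-in (an assumption) for the grok pattern: a new event starts
# with "DD Mon YYYY HH:MM:SS" or with "YYYY-MM".
START = re.compile(r"^(\d{1,2} [A-Za-z]{3} \d{4} \d{2}:\d{2}:\d{2}|\d{4}-\d{2})")


def merge_multiline(lines):
    """Append every line that does not match START to the previous event."""
    events = []
    for line in lines:
        if START.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events


# Hypothetical log lines: one stack trace followed by a normal entry.
sample = [
    "12 Oct 2017 09:05:51,123 ERROR Something failed",
    "java.lang.RuntimeException: boom",
    "    at Example.main(Example.java:1)",
    "12 Oct 2017 09:05:52,001 INFO Recovered",
]
events = merge_multiline(sample)
for event in events:
    print(repr(event))
```

The stack trace lines end up inside the first event, which is why the pattern has to describe how a new log entry begins.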

Issues and Points:

1.     Source location and Index of the message can be viewed on Message dropdown.
2.     For log with starting content,

[12/Oct/2017 09:05:51 ] supervisor   ERROR    Exception in supervisor main loop

In the config file under /etc/logstash/conf.d, add a grok filter as shown here, where \[ and \] match the literal [ and ] characters.

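if ([fields][log_type] == "hueLog") {
        grok {
                # NOTSPACE:platformType consumes the word ("supervisor") that sits between the timestamp and the log level.
                match => { "message" => "\[%{MONTHDAY:logDate}/%{MONTH:logMonth}/%{YEAR:logYear} %{TIME:logTime} \]%{SPACE}%{NOTSPACE:platformType}%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}%{GREEDYDATA:message}"}
        }
}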

Add |\[ to the multiline pattern so that a line beginning with [ starts a new event,

pattern => "^(%{MONTHDAY} %{MONTH} %{YEAR} %{TIME}|%{YEAR}-%{MONTHNUM}|\[| )"

Filter for a log line such as 17/10/26 13:37:59 ERROR TaskSchedulerImpl: Lost an executor driver (already removed): Executor heartbeat timed out after 239118 ms

if ([fields][log_type] == "sparkLog") {
        grok {
                match => { "message" => "%{YEAR:logYear}/%{MONTHNUM:logMonth}/%{MONTHDAY:logDate} %{TIME:logTime} %{LOGLEVEL:logLevel} %{GREEDYDATA:message}"}
        }
}
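
To see which part of the Spark line each grok capture picks up, the same split can be approximated with a plain Python regular expression. This is only a sketch: the sub-patterns below are hand-written simplifications of the grok definitions (YEAR, MONTHNUM, MONTHDAY, TIME, LOGLEVEL), which are broader in reality.

```python
import re

# Hand-written approximation (an assumption) of the sparkLog grok pattern.
SPARK_LINE = re.compile(
    r"(?P<logYear>\d{2,4})/(?P<logMonth>\d{1,2})/(?P<logDate>\d{1,2}) "
    r"(?P<logTime>\d{2}:\d{2}:\d{2}) "
    r"(?P<logLevel>[A-Z]+) "
    r"(?P<message>.*)"
)

line = ("17/10/26 13:37:59 ERROR TaskSchedulerImpl: Lost an executor driver "
        "(already removed): Executor heartbeat timed out after 239118 ms")

match = SPARK_LINE.search(line)
fields = match.groupdict() if match else {}
for name, value in fields.items():
    print(name, "=>", value)
```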

3.     By default, a new index is created every day. To write all the data to a single index, set index in the output section of the Logstash config file.
        elasticsearch {
        index => "blogs2"
        hosts => [ "localhost:9200" ]
        }

To send the output to multiple indexes,
output {
        elasticsearch {
        index => "blogs2"
        hosts => [ "localhost:9200" ]
        }
        elasticsearch {
        index => "blogs"
        hosts => [ "localhost:9200" ]
        }
}

4.     To list the indexes,
curl 'localhost:9200/_cat/indices?v'

5.     To get info of a particular Index ‘blogs2’
curl -XGET localhost:9200/blogs2

6.     To test a grok filter against sample text,

7.     For combining timestamp and date value of the log follow
8.     To see the health of cluster
curl 'localhost:9200/_cluster/health?pretty'

9.     To create a new index, open Kibana -> Dev Tools and run the following command, which creates the blogs index,
PUT /blogs
{
   "settings" : {
      "number_of_shards" : 3,
      "number_of_replicas" : 1
   }
}
10.  Create Index Patterns,

In Kibana -> Management -> Index Patterns -> Create Index pattern and provide the index name or pattern as ‘blogs*’ -> Create.

Extract values from existing field and create new field in logstash.

2 Approaches for this:

1. Copy source into a temporary field and split it,
 if ([fields][log_type] == "yarnHive2kafkaLog") {
    grok {
            match => { "message" => "%{YEAR:logYear}-%{MONTHNUM:logMonth}-%{MONTHDAY:logDate} %{TIME:logTime} \!%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}\! %{GREEDYDATA:message}"}
    }
    mutate {
            copy => { "source" => "source_tmp" }
    }
    mutate {
            split => ["source_tmp", "/"]
            add_field => { "applicationID" => "%{source_tmp[4]}" }
    }
 }
2. Apply a second grok filter to source,
 if ([fields][log_type] == "yarnHive2kafkaLog") {
    grok {
            match => { "message" => "%{YEAR:logYear}-%{MONTHNUM:logMonth}-%{MONTHDAY:logDate} %{TIME:logTime} \!%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}\! %{GREEDYDATA:message}"}
    }
    grok {
            match => { "source" => "/%{GREEDYDATA:primaryDir}/%{GREEDYDATA:subDir1}/%{GREEDYDATA:subDir2}/%{GREEDYDATA:subDir3}/%{GREEDYDATA:containerID}/%{GREEDYDATA:fileName}"}
    }
    mutate {
           add_field => { "applicationID" => "%{subDir3}" }
    }
 }
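
Both approaches pick the same path segment out of source. The sketch below mirrors them in Python for a six-segment path. The path itself is hypothetical; the real layout depends on where YARN writes container logs on your cluster.

```python
# Hypothetical value of the "source" field (an assumption for illustration).
source = "/var/log/hadoop-yarn/application_0000000000000_0001/container_01/stderr"

# Approach 1: split on "/" and take index 4. Index 0 is the empty string
# in front of the leading slash.
parts = source.split("/")
application_id = parts[4]

# Approach 2: give each of the six segments the name used in the second grok.
names = ["primaryDir", "subDir1", "subDir2", "subDir3", "containerID", "fileName"]
segments = dict(zip(names, parts[1:]))

print(application_id)
print(segments["subDir3"])
```

Index 4 after the split and subDir3 in the grok refer to the same fourth directory, so either approach yields the same applicationID for paths of this shape.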

2017-11-15 09:21:06,578 ! ERROR ! [Driver] ! imps.CuratorFrameworkImpl ! Background 
2017-11-20 03:35:17,730 !  WARN ! [Reporter] ! impl.AMRMClientImpl ! ApplicationMaster 

In the two log lines above, the padding around ERROR and WARN differs. To handle this use %{SPACE}, which matches zero or more whitespace characters.

grok {
match => { "message" => "%{YEAR:logYear}-%{MONTHNUM:logMonth}-%{MONTHDAY:logDate} %{TIME:logTime} !%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}! %{GREEDYDATA:message}"}
}
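
The effect of %{SPACE} can be checked with an ordinary regular expression, where \s* plays the same role. This is a sketch under the assumption that the other sub-patterns are simplified versions of the grok definitions. The two log lines are the ones quoted above.

```python
import re

# \s* stands in for %{SPACE}; the date, time and level sub-patterns are
# simplified approximations of the grok definitions.
LINE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<logTime>\d{2}:\d{2}:\d{2},\d{3}) "
    r"!\s*(?P<logLevel>[A-Z]+)\s*! (?P<message>.*)"
)

logs = [
    "2017-11-15 09:21:06,578 ! ERROR ! [Driver] ! imps.CuratorFrameworkImpl ! Background",
    "2017-11-20 03:35:17,730 !  WARN ! [Reporter] ! impl.AMRMClientImpl ! ApplicationMaster",
]

# Both lines match even though the padding before the level differs.
levels = [LINE.match(log).group("logLevel") for log in logs]
print(levels)
```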

1.     Remove trailing white space in logstash filter

Approach 1: using something like NOTSPACE instead of GREEDYDATA.

For log,
[24/Oct/2017 15:04:53 ] cluster       WARNING Picking RM HA: ha

\[%{MONTHDAY:logDate}/%{MONTH:logMonth}/%{YEAR:logYear} %{TIME:logTime} \]%{SPACE}%{GREEDYDATA:platformType} +\s %{SPACE}%{LOGLEVEL:logLevel}%{SPACE}%{GREEDYDATA:message}

The above filter leaves trailing white space in platformType, which is captured as "cluster    ".

\[%{MONTHDAY:logDate}/%{MONTH:logMonth}/%{YEAR:logYear} %{TIME:logTime} \]%{SPACE}%{NOTSPACE:platformType}%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}%{GREEDYDATA:message}

Replacing GREEDYDATA with NOTSPACE resolves this issue.
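
The difference between the two captures can be reproduced with plain regular expressions, using .* in place of GREEDYDATA and \S+ in place of NOTSPACE. This is a sketch only: the timestamp part is left out and the log level alternatives are a shortened, assumed list.

```python
import re

line = "[24/Oct/2017 15:04:53 ] cluster       WARNING Picking RM HA: ha"

# (.*) stands in for GREEDYDATA, (\S+) for NOTSPACE, \s* for %{SPACE}.
greedy = re.compile(
    r"\]\s*(?P<platformType>.*) +\s \s*(?P<logLevel>WARNING|ERROR|INFO)"
)
notspace = re.compile(
    r"\]\s*(?P<platformType>\S+)\s*(?P<logLevel>WARNING|ERROR|INFO)"
)

greedy_value = greedy.search(line).group("platformType")
notspace_value = notspace.search(line).group("platformType")

# repr() makes any trailing spaces visible.
print(repr(greedy_value))
print(repr(notspace_value))
```

The greedy capture keeps as many characters as it can, including padding, while \S+ stops at the first whitespace character.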


Approach 2: place a mutate filter after grok to strip the white space,

              mutate {
                    strip => ["platformType"]
              }



