Thursday, November 30, 2017

ELK

·         MySQL => Databases => Tables => Columns/Rows
·         Elasticsearch => Indices => Types => Documents with Properties

Elasticsearch has to store the data somewhere. The data is stored in shards, which are either primary shards or replicas.
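A quick way to see how the shards are laid out is the _cat/shards API (a minimal check, assuming Elasticsearch is running locally on port 9200):

curl 'localhost:9200/_cat/shards?v'

The prirep column marks each shard as p (primary) or r (replica).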

ELK Stack Installation:
ELK stack components being used are:
·         filebeat version 5.5.2
·         logstash 5.5.2
·         elasticsearch 5.5.2
·         kibana 5.5.2
filebeat
Filebeat needs to be installed on all the host machines from which you want to read your logs.
To get a specific version of the ELK components, browse to https://www.elastic.co/downloads/past-releases
Select the appropriate product and version and download the RPM. In that directory, execute sudo yum install filebeat on all the host machines.
sudo chmod 755 filebeat
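After installing, filebeat can be started and checked with the service manager (assuming the init scripts installed by the RPM, as used for Kibana later in these notes):

sudo service filebeat start
sudo service filebeat status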
Logstash
Needs to be installed on the host machine/edge node. Download the RPM and run:
sudo yum install logstash
To test your installation,
cd /usr/share/logstash/
sudo /usr/share/logstash/bin/logstash -e 'input { stdin { } } output { stdout {} }'
# After starting Logstash, wait until you see "Pipeline main started" and then enter hello world at the command prompt

ElasticSearch
Needs to be installed on the machine that will hold the Elasticsearch data (the Elasticsearch filesystem). Download the RPM and run:
sudo yum install elasticsearch
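Start the Elasticsearch service before testing (assuming the init scripts installed by the RPM):

sudo service elasticsearch start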
To test your installation
curl -XGET 'localhost:9200/?pretty'

Kibana

sudo yum install kibana

vi /etc/kibana/kibana.yml
Edit and enable the server.port: and server.host: settings.
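For example, the relevant lines in kibana.yml might look like the following (illustrative values; server.host must be an address reachable from your browser):

server.port: 5601
server.host: "0.0.0.0"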

sudo service kibana start

To test your installation
Use a browser to open http://[hostname]:5601

Configuration

filebeat
Edit the filebeat config file (/etc/filebeat/filebeat.yml) to add the log files to be scanned and shipped to Logstash.

###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.full.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

#=========================== Filebeat prospectors =============================

filebeat.prospectors:

# Each - is a prospector. Most options can be set at the prospector level, so
# you can use different prospectors for various configurations.
# Below are the prospector specific configurations.

- input_type: log

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    #- /home/sraja005/flume.log
    - /var/log/flume-ng/flume-ng-agent.log
  fields:
     log_type: flumeLog

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["tsb1.devlab.motive.com:5044"]


Logstash
Create a logstash configuration file and place it in the folder mentioned below
cd /etc/logstash/conf.d/

#Here is a sample conf file.

vi flumetest.conf
input {
  beats {
    port => "5044"
    codec => multiline {
      # Grok pattern names are valid! :)
      pattern => "^(%{MONTHDAY} %{MONTH} %{YEAR} %{TIME}|%{YEAR}-%{MONTHNUM})"
      negate => true
      what => "previous"
    }
  }
}

filter {
  if ([fields][log_type] == "flumeLog") {
    grok {
      match => { "message" => "%{MONTHDAY:logDate} %{MONTH:logMonth} %{YEAR:logYear} %{TIME:logTime} %{LOGLEVEL:logLevel} %{GREEDYDATA:message}"}
    }
  }
}

output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
  }
}
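To check the syntax of the file and then start Logstash with it (the --config.test_and_exit flag only validates the configuration):

sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/flumetest.conf --config.test_and_exit
sudo service logstash start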


Issues and Points:

1.     The source location and index of a message can be viewed in the message dropdown in Kibana.
2.     For a log whose lines start like:

[12/Oct/2017 09:05:51 ] supervisor   ERROR    Exception in supervisor main loop

In the config file under /etc/logstash/conf.d, add a grok filter as below, where \[ and \] match the literal [ and ] characters.

if ([fields][log_type] == "hueLog") {
        grok {
                match => { "message" => "\[%{MONTHDAY:logDate}/%{MONTH:logMonth}/%{YEAR:logYear} %{TIME:logTime} \] %{LOGLEVEL:logLevel} %{GREEDYDATA:message}"}
              }

Also add |\[ to the multiline codec pattern so that lines beginning with [ start a new event:

pattern => "^(%{MONTHDAY} %{MONTH} %{YEAR} %{TIME}|%{YEAR}-%{MONTHNUM}|\[| )"

Filter for a log like:

17/10/26 13:37:59 ERROR TaskSchedulerImpl: Lost an executor driver (already removed): Executor heartbeat timed out after 239118 ms

if ([fields][log_type] == "sparkLog") {
        grok {
                match => { "message" => "%{YEAR:logYear}/%{MONTHNUM:logMonth}/%{MONTHDAY:logDate} %{TIME:logTime} %{LOGLEVEL:logLevel} %{GREEDYDATA:message}"}
              }
                }

3.     By default, Logstash creates a new index for every day. To send all the data to a single index, add an index setting to the elasticsearch output in the Logstash config file.

elasticsearch {
  index => "blogs2"
  hosts => [ "localhost:9200" ]
}

To write the output to multiple indexes:

output {
  elasticsearch {
    index => "blogs2"
    hosts => [ "localhost:9200" ]
  }
  elasticsearch {
    index => "blogs"
    hosts => [ "localhost:9200" ]
  }
}
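Note that the form above sends every event to both indexes. To route events to different indexes instead, the outputs can be wrapped in a conditional on the log_type field set by filebeat (a sketch):

output {
  if [fields][log_type] == "flumeLog" {
    elasticsearch {
      index => "blogs2"
      hosts => [ "localhost:9200" ]
    }
  } else {
    elasticsearch {
      index => "blogs"
      hosts => [ "localhost:9200" ]
    }
  }
}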

4.     To list the indexes,
curl 'localhost:9200/_cat/indices?v'

5.     To get info on a particular index 'blogs2':
curl -XGET localhost:9200/blogs2

6.     To test a grok filter against sample text, use https://grokdebug.herokuapp.com/

7.     To combine the date and time values of the log into a single timestamp field, follow https://stackoverflow.com/questions/40385107/logstash-grok-how-to-parse-timestamp-field-using-httpderror-date-pattern
8.     To see the health of the cluster:
curl 'localhost:9200/_cluster/health?pretty'

9.     To create a new index, in Kibana -> Dev Tools execute the following command to create the blogs index:
PUT /blogs
{
   "settings" : {
      "number_of_shards" : 3,
      "number_of_replicas" : 1
   }
}
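The same index can also be created from the shell with curl (the Content-Type header is optional in 5.x but harmless):

curl -XPUT 'localhost:9200/blogs?pretty' -H 'Content-Type: application/json' -d '{ "settings": { "number_of_shards": 3, "number_of_replicas": 1 } }'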
10.  Create index patterns:

In Kibana -> Management -> Index Patterns -> Create Index Pattern, provide the index name or pattern as 'blogs*' -> Create.


Extracting values from an existing field and creating a new field in Logstash.

There are 2 approaches for this:

1. Copy the source field into a temporary field and split it:

if ([fields][log_type] == "yarnHive2kafkaLog") {
  grok {
    match => { "message" => "%{YEAR:logYear}-%{MONTHNUM:logMonth}-%{MONTHDAY:logDate} %{TIME:logTime} \!%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}\! %{GREEDYDATA:message}"}
  }
  mutate {
    copy => { "source" => "source_tmp" }
  }
  mutate {
    split => ["source_tmp", "/"]
    add_field => { "applicationID" => "%{source_tmp[4]}" }
  }
}
2. Apply a grok filter to the source field:

if ([fields][log_type] == "yarnHive2kafkaLog") {
  grok {
    match => { "message" => "%{YEAR:logYear}-%{MONTHNUM:logMonth}-%{MONTHDAY:logDate} %{TIME:logTime} \!%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}\! %{GREEDYDATA:message}"}
  }
  grok {
    match => { "source" => "/%{GREEDYDATA:primaryDir}/%{GREEDYDATA:subDir1}/%{GREEDYDATA:subDir2}/%{GREEDYDATA:subDir3}/%{GREEDYDATA:containerID}/%{GREEDYDATA:fileName}"}
  }
  mutate {
    add_field => { "applicationID" => "%{subDir3}" }
  }
}


2017-11-15 09:21:06,578 ! ERROR ! [Driver] ! imps.CuratorFrameworkImpl ! Background 
2017-11-20 03:35:17,730 !  WARN ! [Reporter] ! impl.AMRMClientImpl ! ApplicationMaster 

In the above 2 log lines, the whitespace before ERROR and WARN is not the same. To handle this, use %{SPACE}, which matches zero or more whitespace characters.

grok {
  match => { "message" => "%{YEAR:logYear}-%{MONTHNUM:logMonth}-%{MONTHDAY:logDate} %{TIME:logTime} !%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}! %{GREEDYDATA:message}"}
}

1.     Remove trailing whitespace in a Logstash filter


Approach 1: use NOTSPACE instead of GREEDYDATA.

For a log like:
[24/Oct/2017 15:04:53 ] cluster       WARNING Picking RM HA: ha

[%{MONTHDAY:logDate}/%{MONTH:logMonth}/%{YEAR:logYear} %{TIME:logTime} ]%{SPACE}%{GREEDYDATA:platformType} +\s %{SPACE}%{LOGLEVEL:logLevel}%{SPACE}%{GREEDYDATA:message}

The above filter leaves trailing whitespace in the captured platformType field (e.g. "cluster" followed by spaces).

\[%{MONTHDAY:logDate}/%{MONTH:logMonth}/%{YEAR:logYear} %{TIME:logTime} \]%{SPACE}%{NOTSPACE:platformType}%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}%{GREEDYDATA:message}

Replacing GREEDYDATA with NOTSPACE resolves this issue.

Approach 2:

Place the following after the grok filter to strip whitespace:

mutate {
  strip => ["platformType"]
}

Monday, November 20, 2017

Kafka - Zookeeper configuration

https://www.youtube.com/watch?v=SxHsnNYxcww

Basically there are 2 types of zookeeper configurations:

1. Single Node - Only 1 Zookeeper server - Single point of failure
2. Zookeeper Ensemble - cluster of zookeeper nodes - more robust and no single point of failure.

In the Zookeeper ensemble case, even if one of the Zookeeper nodes goes down, the ensemble can still maintain the cluster state because of the remaining Zookeeper servers running on the other nodes.

Setting up a Zookeeper ensemble with a 3-broker Kafka setup.

Changes on the Zookeeper side:

On mach1, in /usr/lib/zookeeper/conf/zoo1.cfg:

server.1=mach1:2888:3888
server.2=mach2:2889:3889
server.3=mach3:2890:3890

The above 3 entries specify the cluster of Zookeeper servers that form the ensemble.

Start the Zookeeper server with this config file:
zookeeper-server-start.sh config/zoo1.cfg

Use the same numbering sequence in the myid file under each server's "dataDir" directory, as this is what identifies each member of the Zookeeper ensemble.

clientPort=2181. This is the port on which clients connect to this Zookeeper server.

The entries in the config file are the same on the other 2 machines apart from clientPort; they might have clientPort=2182 and clientPort=2183 respectively.
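Putting it together, each zoo cfg might look like the sketch below (dataDir and the timing values are illustrative); the myid file under dataDir holds that server's number and must match the server.N entry:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=mach1:2888:3888
server.2=mach2:2889:3889
server.3=mach3:2890:3890

echo "1" > /var/lib/zookeeper/myid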

Changes on the Kafka side:

On mach1, in /opt/kafka-2.11-0.10.1.1/config/server.properties, specify the client connection ports of all 3 Zookeeper servers in the ensemble:

zookeeper.connect=mach1IP:2181,mach2IP:2182,mach3IP:2183

Configure the server.properties file on all the Kafka brokers, and remember to use a different port and broker.id for each of them.
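For example, the relevant lines on the first broker might be (values illustrative):

broker.id=1
listeners=PLAINTEXT://mach1:9092
log.dirs=/var/lib/kafka-logs
zookeeper.connect=mach1IP:2181,mach2IP:2182,mach3IP:2183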

Start Kafka with this server.properties file. The Kafka cluster state is now watched by the 3-node Zookeeper ensemble.

Eg: bin/kafka-topics.sh --list --zookeeper xvzw160.xdev.motive.com:2181,xvzw161.xdev.motive.com:2181,xvzw162.xdev.motive.com:2181

/bin/kafka-console-producer --broker-list kafka02.example.com:9092,kafka03.example.com:9092 --topic t
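A console consumer can be used to verify the messages end to end (hostnames follow the example above; in 0.10.x the new consumer takes --bootstrap-server):

bin/kafka-console-consumer.sh --bootstrap-server kafka02.example.com:9092 --topic t --from-beginning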

Note: In the case of a single-node Zookeeper, zookeeper.connect will have only 1 Zookeeper server entry.