Installation & Configuration of NiFi on an AWS EMR Cluster
Author  : Devender Chauhan
Date    : 14/06/2018
Version : 1
This documentation describes the process required to install and configure the NiFi application and helps the user set up the environment correctly.
Apache NiFi is an easy to use, powerful, and reliable system to
process and distribute data.
Apache NiFi is based on technology previously called “Niagara
Files” that was in development and used at scale within the NSA for the last 8
years and was made available to the Apache Software Foundation through the NSA
Technology Transfer Program. Some of the use cases include, but are not limited
to:
· Big Data Ingest – Offers a simple, reliable and secure way to collect data streams.
· IoAT Optimization – Allows organizations to overcome real-world constraints such as limited or expensive bandwidth while ensuring data quality and reliability.
· Compliance – Enables organizations to understand everything that happens to data in motion from its creation to its final resting place, which is particularly important for regulated industries that must retain and report on chain of custody.
· Digital Security – Helps organizations collect large volumes of data from many sources and prioritize which data is brought back for analysis first, a critical capability given the time sensitivity of identifying security breaches.
=> Create an EMR cluster with 1 master node and 2 data nodes.
=> Here is the current configuration of our EMR cluster :
1 MasterNode : IP : 172.30.0.240, Hostname : ip-172-30-0-240.ec2.internal
2 DataNodes  : IP : 172.30.0.67,  Hostname : ip-172-30-0-67.ec2.internal
               IP : 172.30.0.10,  Hostname : ip-172-30-0-10.ec2.internal
· Operating system :
  ◦ CentOS/RHEL 6/7 or Amazon Linux
· Java : 1.7 or higher version.
Run the below command on all nodes to install Java (by default it is already installed on EMR machines).
Command :
#yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel
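Confirm the installed Java version on each node (a quick check) :
#java -version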
Here we are installing NIFI on 3 machines, namely :
172.30.0.240 ----- ip-172-30-0-240.ec2.internal ----- Master Node
172.30.0.67 ------ ip-172-30-0-67.ec2.internal ----- Slave Node1
172.30.0.10 ------ ip-172-30-0-10.ec2.internal ----- Slave Node2
· Passwordless SSH on all nodes of the NIFI cluster using the root user.
Generate the public and private key using the below command on the master node of the NIFI cluster.
#ssh-keygen -t dsa (Press Enter)
[root@ip-172-30-0-240 .ssh]# ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/root/.ssh/id_dsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
SHA256:mqpkzsLOUxJomwoqcGOzLAMwW9ccYUB69mScgaeIkdk
root@ip-172-30-0-240.ec2.internal
The key's randomart image is:
+---[DSA 1024]----+
| + .oo+. |
|+ E..ooo |
|.o..o==. |
|=oooo+o |
|o++. . S |
|=+=. o |
|B+=+ o |
|OBo . |
|o*=.. |
+----[SHA256]-----+
Now the private and public keys are generated for the root user in the /root/.ssh directory.
Upload the public key of 172.30.0.240 into authorized_keys of 172.30.0.240 (on the same node) using the below command.
#cat /root/.ssh/id_dsa.pub >> /root/.ssh/authorized_keys
Use SSH from node 172.30.0.240 and upload the newly generated public key (id_dsa.pub) to the slave nodes (172.30.0.67 & 172.30.0.10) under root's .ssh directory as a file named authorized_keys.
#cat .ssh/id_dsa.pub | ssh 172.30.0.67 'cat >> .ssh/authorized_keys'
#cat .ssh/id_dsa.pub | ssh 172.30.0.10 'cat >> .ssh/authorized_keys'
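Verify passwordless SSH from the master node (a quick check; each command should print the remote hostname without asking for a password) :
#ssh ip-172-30-0-67.ec2.internal hostname
#ssh ip-172-30-0-10.ec2.internal hostname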
· Disable Firewall :
Do the below change on all nodes :
#service iptables stop
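Optionally keep the firewall disabled across reboots (a sketch, assuming CentOS/RHEL 6 or Amazon Linux where chkconfig is available) :
#chkconfig iptables off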
· Set Hostname :
#hostname ip-172-30-0-240.ec2.internal (Set on MasterNode)
#vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=ip-172-30-0-240.ec2.internal
NOZEROCONF=yes
#hostname ip-172-30-0-67.ec2.internal (Set on Slavenode1)
#vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=ip-172-30-0-67.ec2.internal
NOZEROCONF=yes
#hostname ip-172-30-0-10.ec2.internal (Set on Slavenode2)
#vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=ip-172-30-0-10.ec2.internal
NOZEROCONF=yes
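Verify the fully qualified hostname on each node (a quick check) :
#hostname -f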
· Set Swappiness :
Change the default swappiness on all nodes :
#sysctl vm.swappiness=10
#echo "vm.swappiness = 10" >> /etc/sysctl.conf
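Confirm the new value (a quick check) :
#cat /proc/sys/vm/swappiness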
· Set /etc/hosts file :
[root@ip-172-30-0-240 .ssh]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost6 localhost6.localdomain6
172.30.0.240 ip-172-30-0-240.ec2.internal ip-172-30-0-240
172.30.0.67 ip-172-30-0-67.ec2.internal ip-172-30-0-67
172.30.0.10 ip-172-30-0-10.ec2.internal ip-172-30-0-10
Copy the updated file to the slave nodes :
=> scp /etc/hosts ip-172-30-0-67.ec2.internal:/etc/
=> scp /etc/hosts ip-172-30-0-10.ec2.internal:/etc/
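Verify name resolution from any node (a quick check) :
#getent hosts ip-172-30-0-67.ec2.internal
#getent hosts ip-172-30-0-10.ec2.internal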
On all nodes :
To download the tar file we need to install the wget command if it is not available on the system.
Command :
#yum install -y wget
#cd /opt
#wget http://public-repo-1.hortonworks.com/HDF/2.0.1.0/HDF-2.0.1.0-12.tar.gz
It will download the HDF-2.0.1.0-12.tar.gz file; now we need to extract it.
#tar -xzf HDF-2.0.1.0-12.tar.gz
#ls
HDF-2.0.1.0
#cd /opt/HDF-2.0.1.0/nifi/
We need to do the below configuration for the NIFI cluster in the nifi.properties file on the MasterNode (ip-172-30-0-240.ec2.internal).
#vim conf/nifi.properties
(Do the below changes in the nifi.properties file)
The below changes need to be done in the # web properties # section :
# web properties #
nifi.web.http.host=ip-172-30-0-240.ec2.internal
nifi.web.http.port=7070
The below changes need to be done in the # Cluster node properties # section :
# Cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=ip-172-30-0-240.ec2.internal
nifi.cluster.node.protocol.port=12000
Also add the below two properties in the same section :
nifi.cluster.node.unicast.manager.address=ip-172-30-0-240.ec2.internal
nifi.cluster.node.unicast.manager.protocol.port=12001
Add the # Cluster Manager properties # section after this :
# Cluster Manager properties #
nifi.cluster.is.manager=true
nifi.cluster.manager.address=ip-172-30-0-240.ec2.internal
nifi.cluster.manager.protocol.port=12001
nifi.cluster.manager.protocol.threads=10
nifi.cluster.manager.node.event.history.size=25
nifi.cluster.manager.node.api.connection.timeout=5 sec
nifi.cluster.manager.node.api.read.timeout=5 sec
nifi.cluster.manager.node.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=
The below change needs to be done in the # zookeeper properties # section :
# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=ip-172-30-0-240.ec2.internal:2181,ip-172-30-0-67.ec2.internal:2181,ip-172-30-0-10.ec2.internal:2181
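The edited values can be double-checked before moving on (a quick check, run from /opt/HDF-2.0.1.0/nifi/) :
#grep -E '^nifi\.(web\.http|cluster|zookeeper)' conf/nifi.properties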
We need to do the below configuration for the NIFI cluster in the state-management.xml file.
#vim conf/state-management.xml
(Add the below cluster-provider configuration in the state-management.xml file)
<cluster-provider>
    <id>zk-provider</id>
    <class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
    <property name="Connect String">ip-172-30-0-240.ec2.internal:2181,ip-172-30-0-67.ec2.internal:2181,ip-172-30-0-10.ec2.internal:2181</property>
    <property name="Root Node">/nifi</property>
    <property name="Session Timeout">10 seconds</property>
    <property name="Access Control">Open</property>
</cluster-provider>
4. Configuration of NIFI on All Slavenodes (ip-172-30-0-67.ec2.internal & ip-172-30-0-10.ec2.internal)
Run the below command from the MasterNode to Slavenode1 (ip-172-30-0-67.ec2.internal).
Copy the NIFI directory from ip-172-30-0-240.ec2.internal to ip-172-30-0-67.ec2.internal :
#scp -r /opt/HDF-2.0.1.0/nifi/* ip-172-30-0-67.ec2.internal:/opt/HDF-2.0.1.0/nifi/
Now login on Slavenode1 using the root user and do the following NIFI configuration.
We need to do the below configuration for the NIFI cluster in the nifi.properties file.
#cd /opt/HDF-2.0.1.0/nifi/
#vim conf/nifi.properties
(Do the below changes in the nifi.properties file)
The below changes need to be done in the # web properties # section :
# web properties #
nifi.web.http.host=ip-172-30-0-67.ec2.internal
nifi.web.http.port=7070
The below changes need to be done in the # Cluster node properties # section :
# Cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=ip-172-30-0-67.ec2.internal
nifi.cluster.node.protocol.port=12000
nifi.cluster.node.unicast.manager.address=ip-172-30-0-67.ec2.internal
nifi.cluster.node.unicast.manager.protocol.port=12001
Delete or comment out the # Cluster Manager properties # section on the slave node :
# Cluster Manager properties #
#nifi.cluster.is.manager=true
#nifi.cluster.manager.address=ip-172-30-0-67.ec2.internal
#nifi.cluster.manager.protocol.port=12001
#nifi.cluster.manager.protocol.threads=10
#nifi.cluster.manager.node.event.history.size=25
#nifi.cluster.manager.node.api.connection.timeout=5 sec
#nifi.cluster.manager.node.api.read.timeout=5 sec
#nifi.cluster.manager.node.firewall.file=
#nifi.cluster.flow.election.max.wait.time=5 mins
#nifi.cluster.flow.election.max.candidates=
The below change needs to be done in the # zookeeper properties # section :
# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=ip-172-30-0-240.ec2.internal:2181,ip-172-30-0-67.ec2.internal:2181,ip-172-30-0-10.ec2.internal:2181
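As an alternative to editing by hand, the node-specific values above can be substituted with sed after the copy (a minimal sketch, assuming the file was copied unchanged from the MasterNode; run on Slavenode1, and repeat with ip-172-30-0-10.ec2.internal on Slavenode2) :
#cd /opt/HDF-2.0.1.0/nifi/
#sed -i 's/^nifi.web.http.host=.*/nifi.web.http.host=ip-172-30-0-67.ec2.internal/' conf/nifi.properties
#sed -i 's/^nifi.cluster.node.address=.*/nifi.cluster.node.address=ip-172-30-0-67.ec2.internal/' conf/nifi.properties
#sed -i 's/^nifi.cluster.node.unicast.manager.address=.*/nifi.cluster.node.unicast.manager.address=ip-172-30-0-67.ec2.internal/' conf/nifi.properties
#sed -i -e 's/^nifi.cluster.is.manager=/#&/' -e 's/^nifi.cluster.manager\./#&/' -e 's/^nifi.cluster.flow.election\./#&/' conf/nifi.properties
The zookeeper connect string stays exactly as copied from the MasterNode.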
===========
Run the below command from the MasterNode to Slavenode2 (ip-172-30-0-10.ec2.internal).
Copy the NIFI directory from ip-172-30-0-240.ec2.internal to ip-172-30-0-10.ec2.internal :
#scp -r /opt/HDF-2.0.1.0/nifi/* ip-172-30-0-10.ec2.internal:/opt/HDF-2.0.1.0/nifi/
Now login on Slavenode2 using the root user and do the following NIFI configuration.
We need to do the below configuration for the NIFI cluster in the nifi.properties file.
#cd /opt/HDF-2.0.1.0/nifi/
#vim conf/nifi.properties
(Do the below changes in the nifi.properties file)
The below changes need to be done in the # web properties # section :
# web properties #
nifi.web.http.host=ip-172-30-0-10.ec2.internal
nifi.web.http.port=7070
The below changes need to be done in the # Cluster node properties # section :
# Cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=ip-172-30-0-10.ec2.internal
nifi.cluster.node.protocol.port=12000
nifi.cluster.node.unicast.manager.address=ip-172-30-0-10.ec2.internal
nifi.cluster.node.unicast.manager.protocol.port=12001
Delete or comment out the # Cluster Manager properties # section on the slave node :
# Cluster Manager properties #
#nifi.cluster.is.manager=true
#nifi.cluster.manager.address=ip-172-30-0-10.ec2.internal
#nifi.cluster.manager.protocol.port=12001
#nifi.cluster.manager.protocol.threads=10
#nifi.cluster.manager.node.event.history.size=25
#nifi.cluster.manager.node.api.connection.timeout=5 sec
#nifi.cluster.manager.node.api.read.timeout=5 sec
#nifi.cluster.manager.node.firewall.file=
#nifi.cluster.flow.election.max.wait.time=5 mins
#nifi.cluster.flow.election.max.candidates=
The below change needs to be done in the # zookeeper properties # section :
# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=ip-172-30-0-240.ec2.internal:2181,ip-172-30-0-67.ec2.internal:2181,ip-172-30-0-10.ec2.internal:2181
5. Start & Stop NIFI Services on all Nodes
Run the below command on all nodes to start the NIFI service :
#/opt/HDF-2.0.1.0/nifi/bin/nifi.sh start
Run the below command to check the status of the NIFI service :
#/opt/HDF-2.0.1.0/nifi/bin/nifi.sh status
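Run the below command to stop the NIFI service when required (standard nifi.sh usage) :
#/opt/HDF-2.0.1.0/nifi/bin/nifi.sh stop
To watch a node start up and join the cluster, the application log can be tailed (a quick check, assuming the default log location under the install directory) :
#tail -f /opt/HDF-2.0.1.0/nifi/logs/nifi-app.log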
6. Login to NIFI Application
Open a web browser :
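The UI is served on the HTTP host and port configured above; for this cluster that would be the below address (assuming the default /nifi UI path; the hostname resolves only inside the VPC, so access from outside requires the node's public address and an EMR security group rule opening port 7070) :
http://ip-172-30-0-240.ec2.internal:7070/nifi
A quick reachability check from any cluster node :
#curl -I http://ip-172-30-0-240.ec2.internal:7070/nifi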