Hadoop and Spark by Leela Prasad: Linux Commands

Nano Text Editor

To create a new file
nano 1.txt

Enter the desired text and press Ctrl+X. Press Y to save the changes, edit the file name if needed, and press Enter.

vi editor

To create a new file use,
touch <FILENAME>

vi test

Press ESC, then i to enter insert mode, and enter or modify the text. To save and quit, press ESC and type :wq!

To find files in Linux

locate "*.png"

sudo locate "*.xml" > 1.txt    # write the results to a file

To delete a directory and the files inside it (the same command also removes a single file, as in this example)

sudo rm -rf cdh5-repository_1.0_all.deb

to remove a file

rm <FILENAME>

To rename a Directory

mv /home/user/oldname /home/user/newname

To get host name

hostname -f

Command to check memory usage on linux

free -m

To give full permissions (read, write and execute for everyone) to a directory

chmod 777 Leela/

Merge 2 files on the local filesystem
cat quicktechie.txt hadoopexam.txt > MergedEmployee.txt

Add a newline character at the end of a file

echo "" >> file.txt
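
cat joins files byte for byte, so if the first file does not end with a newline, its last line is glued to the first line of the second file. A minimal sketch of the two commands above working together, using made-up file names (a.txt, b.txt) in a scratch directory:

```shell
# Hypothetical file names; work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'id,name\n1,alice' > a.txt   # deliberately no trailing newline
printf '2,bob\n' > b.txt

# Without this step, cat would produce the line "1,alice2,bob".
echo "" >> a.txt

cat a.txt b.txt > merged.txt
merged_lines=$(wc -l < merged.txt)
echo "merged.txt has $merged_lines lines"
```

If the file already ends with a newline, the echo adds an empty line, so it is worth checking the last line first.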

To Download a file

wget -P /home/hadoop_1/Leela/Nifi http://www-eu.apache.org/dist/nifi/1.3.0/nifi-toolkit-1.3.0-bin.tar.gz

To Extract the downloaded file

tar -xzvf nifi-toolkit-1.3.0-bin.tar.gz
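
To extract into a specific directory rather than the current one, tar accepts -C. The sketch below builds a small archive first so it can run anywhere; the names toolkit, src and dest are placeholders and have nothing to do with the NiFi download.

```shell
# Build a tiny archive in a scratch directory (all names are placeholders).
workdir=$(mktemp -d)
mkdir -p "$workdir/src/toolkit" "$workdir/dest"
echo "hello" > "$workdir/src/toolkit/README"
tar -czf "$workdir/toolkit.tar.gz" -C "$workdir/src" toolkit

# List the contents without extracting.
tar -tzf "$workdir/toolkit.tar.gz"

# Extract into dest/ instead of the current directory.
tar -xzf "$workdir/toolkit.tar.gz" -C "$workdir/dest"
extracted=$(cat "$workdir/dest/toolkit/README")
echo "extracted file contains: $extracted"
```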

To know the memory usage in the machine

free -m

To Free up cached memory

sudo sysctl -w vm.drop_caches=3

File permissions in Linux

There are 3 sets of users: Owner, Group and Others. Each permission has a numeric value, and the values are added up to give one digit per set of users.

read - 4

write - 2

execute - 1

ls -ltr will display the access rights for the 3 sets of users.

eg: -rw-rw-r-- 1 cloudera cloudera 20136 Sep 10 23:41 derby.log

rw to Owner, rw to Group and r to others

sudo chmod 754 <filename>

The above command will grant rwx (4+2+1 = 7) to Owner, r-x (4+1 = 5) to Group and r-- (4) to Others.

sudo chmod -R 754 Leela_files

-R applies the permissions recursively to the directory and to all the files and subdirectories inside it.
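
The arithmetic behind the mode can be checked quickly. This is only a sketch on a scratch file (demo.txt is a made-up name): it builds the digits from the read, write and execute values and reads the result back with ls -l.

```shell
# Each digit is the sum of read (4), write (2) and execute (1).
owner=$((4 + 2 + 1))   # rwx
group=$((4 + 1))       # r-x
others=4               # r--
mode="${owner}${group}${others}"

# Apply the mode to a scratch file and read it back.
workdir=$(mktemp -d)
touch "$workdir/demo.txt"
chmod "$mode" "$workdir/demo.txt"
perms=$(ls -l "$workdir/demo.txt" | cut -c1-10)
echo "$mode -> $perms"   # prints: 754 -> -rwxr-xr--
```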

A few Linux commands:

:set nu    (inside vi/vim, shows line numbers)

yum erase logstash

yum clean all

VIM find

Type / followed by the word to search for and press Enter. Press n to jump to the next match.

ps -ef | grep processname

ps - list processes
-e - show all processes, not just those belonging to the user
-f - show processes in full format (more detailed than default)
command 1 | command 2 - pass output of command 1 as input to command 2
grep - find lines containing a pattern
processname - the pattern for grep to search for in the output of ps -ef
So altogether:

ps -ef | grep processname

eg: ps -ef | grep logstash
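
One catch with this pipeline is that the grep process itself usually shows up in the results, because its own command line contains the pattern. A common workaround is to put the first letter of the pattern in brackets. The sketch below assumes no other sleep process with the same duration is running on the machine; the sleep is just a stand-in for a real service.

```shell
# Start a throwaway background process to search for.
secs=300
sleep "$secs" &
target_pid=$!

# A plain pattern would also match the grep process itself.
# Wrapping the first letter in brackets stops that: the regex still
# matches the real command line, but the text of the grep command
# line (which contains the brackets) does not match the regex.
found_pid=$(ps -ef | grep "[s]leep $secs" | awk '{print $2}')
echo "started PID $target_pid, ps/grep found PID $found_pid"

# Clean up the throwaway process.
kill "$target_pid"
wait "$target_pid" 2>/dev/null || true
```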

initctl start logstash

tail -50f logstash-plain.log

service elasticsearch start

yum install kibana-5.5.2-x86_64.rpm

To find specific text in files in a directory (like Find in Files in Notepad++)

Searches for 'text-to-find-here' in every file on the system.

find / -type f | xargs grep 'text-to-find-here'

Eg: searches for 'sqltext' (ignoring case) in all the .hql files under the /opt directory.

find /opt/ -name "*.hql" | xargs grep -i sqltext

Searches for sqltext in all *.hql files directly under /opt/hal/installers/hal-batch-jobs/scripts

grep sqltext /opt/hal/installers/hal-batch-jobs/scripts/*.hql
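
The find and xargs pipeline above splits file names on whitespace, so a path with a space in it gets broken into pieces. A hedged variant using -print0 and -0 avoids that. The directory layout and file names below are invented for the demonstration.

```shell
# Scratch tree with a space in one directory name (all names invented).
workdir=$(mktemp -d)
mkdir -p "$workdir/jobs/daily load"
printf 'SELECT sqltext FROM audit;\n' > "$workdir/jobs/daily load/report.hql"
printf 'SELECT 1;\n' > "$workdir/jobs/other.hql"
printf 'sqltext in a non-hql file\n' > "$workdir/jobs/notes.txt"

# -print0 and -0 pass file names separated by NUL bytes, so spaces are safe.
# grep -l prints only the names of the files that match; -i ignores case.
matches=$(find "$workdir" -type f -name '*.hql' -print0 | xargs -0 grep -il 'sqltext')
echo "$matches"
```

Only report.hql is listed: other.hql has no match and notes.txt is excluded by the -name filter.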

To find a specific file under a primary directory

The command below searches recursively for a file named 'stderr' under every directory whose path starts with /data, such as /data01, /data02, /data03 etc.

find /data* -name "stderr"

Search the command history

history | grep yarn

Get the status of the services

service --status-all

Set Proxy in the Linux machine.

The proxy server name needs to be set in the below 4 files. This sets the proxy server, and the RPMs are then downloaded via this proxy. This is useful when the cluster has no direct internet connection and packages for installation are allowed only via a proxy server.

/etc/yum.conf

/etc/environment

/etc/profile.d

/etc/proxy.sh
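
As a rough illustration of what goes into these files, a profile script might export the proxy variables as below. The host proxy.example.com and port 3128 are placeholders, not values from any real setup, and the script location shown in the comment is an assumption.

```shell
# Sketch of a profile script, e.g. /etc/profile.d/proxy.sh (location assumed).
# proxy.example.com:3128 is a placeholder; substitute your own proxy server.
PROXY_URL="http://proxy.example.com:3128"

export http_proxy="$PROXY_URL"
export https_proxy="$PROXY_URL"
# Hosts that should bypass the proxy.
export no_proxy="localhost,127.0.0.1"

# The matching line in /etc/yum.conf would be:
#   proxy=http://proxy.example.com:3128
echo "http_proxy is set to $http_proxy"
```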

Get the list of files and their details in the directory

ls -ltr

Get the list of Running processes

ps -A
Search for a specific name, e.g. tomcat
ps -A | grep tomcat

Kill a running Process

Get the PID of the process to be killed using the above command. If the process ID is 3184, then

kill -9 3184

OR
kill -SIGKILL 3184
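
SIGKILL gives the process no chance to clean up, so it is often tried only as a last resort. A minimal sketch of the gentler sequence, using a background sleep as a stand-in for the real process and an arbitrary 3 second grace period:

```shell
# Stand-in for a real long-running process.
sleep 300 &
pid=$!

# Ask politely first: SIGTERM lets the process clean up.
kill -TERM "$pid"

# Wait up to 3 seconds for it to exit.
for attempt in 1 2 3; do
  kill -0 "$pid" 2>/dev/null || break   # kill -0 only checks that the PID exists
  sleep 1
done

# Fall back to SIGKILL only if the PID is still there.
if kill -0 "$pid" 2>/dev/null; then
  kill -KILL "$pid"
fi

# Collect the exit status; 128 + signal number means "ended by that signal".
exit_status=0
wait "$pid" 2>/dev/null || exit_status=$?
echo "process $pid ended with status $exit_status"
```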

Kill with Process name.

pkill mysql

Diff between 2 files in Linux

diff <file1> <file2>
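
diff also reports the result through its exit status (0 when the files are identical, 1 when they differ), which is handy in scripts. A small sketch with two made-up files:

```shell
# Two scratch files that differ on the second line (names are made up).
workdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$workdir/old.txt"
printf 'alpha\ngamma\n' > "$workdir/new.txt"

# Print the differences and capture the exit status.
diff_status=0
diff "$workdir/old.txt" "$workdir/new.txt" || diff_status=$?

if [ "$diff_status" -eq 1 ]; then
  echo "files differ"
fi
```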

Get CPU Information

lscpu

RAM Info

cat /proc/meminfo

Hard Disk Info

df -h

SCP command to copy a Folder.

The below command will copy the directory /home/umar/kafkaconsumer/ and all its contents from the remote machine with host name "xvzw160" to the local directory "/home/gorrepat".
sudo scp -r gorrepat@xvzw160:/home/umar/kafkaconsumer/ /home/gorrepat

For a single file, -r is not required.

cp command for copying folder,

cp -r kafkaconsumer/ kafkaconsumer_bk

To list the processes run by a particular user 'hadoop'
ps -u hadoop

To list every process whose entry mentions 'hadoop' (in the user name or in the command line)
ps -ef | grep hadoop

The top command allows users to monitor processes and system resource usage on Linux
top

list all processes and their status and resource usage
ps aux

Filter the ls -lrt output for entries with hive in the name

ls -lrt | grep hive

To get List of running Processes:
ps -aef

To Kill a process
kill -9 31767

To get the PIDs of running processes matching a name:
ps -aef|grep rdbms_extract|awk -F" " '{print $2}'

To save the PIDs of running processes matching a name (here the processes matching spark-shell are filtered)
ps -aef|grep spark-shell|awk -F" " '{print $2}' > myProcess.pid

To kill all the running processes matching a name
ps -aef| grep spark-shell|awk -F" " '{print $2}'|xargs kill -9
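
The grep in this pipeline can also pick up unrelated processes (or the grep itself). When the processes are started from your own script, one alternative is to record each PID at launch and kill from that file later. The sketch below uses two background sleeps as stand-ins and reuses the myProcess.pid name from above; everything else is illustrative.

```shell
# Record the PID of each process at the moment it is started.
workdir=$(mktemp -d)
pid_file="$workdir/myProcess.pid"

sleep 300 &
pid_one=$!
sleep 300 &
pid_two=$!
printf '%s\n' "$pid_one" "$pid_two" > "$pid_file"

# Kill exactly the recorded PIDs (word splitting of the file content is intended).
kill $(cat "$pid_file")
wait "$pid_one" "$pid_two" 2>/dev/null || true

# Verify that none of the recorded processes is still alive.
still_running=0
for pid in $(cat "$pid_file"); do
  if kill -0 "$pid" 2>/dev/null; then
    still_running=$((still_running + 1))
  fi
done
echo "processes still running: $still_running"
```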

Some shell script commands:

Get the directory path of the shell script from which it is triggered
FILE_PATH="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
echo "FILE_PATH path is $FILE_PATH"
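
The point of this idiom is that the result is the location of the script file, not the directory the caller happens to be in. A quick way to see that, assuming bash is installed; the script name where.sh and the scratch directory are invented for the demonstration:

```shell
# Write a small script into a scratch directory (path is illustrative).
workdir=$(mktemp -d)
mkdir -p "$workdir/bin"
cat > "$workdir/bin/where.sh" <<'EOF'
#!/usr/bin/env bash
FILE_PATH="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
echo "$FILE_PATH"
EOF

# Call it from a different working directory.
cd /
reported=$(bash "$workdir/bin/where.sh")
echo "called from $(pwd), script lives in $reported"
```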

Run the query in an HQL file in beeline from a shell script
beeline --silent=true --showHeader=false --outputformat=csv2 -u "${beelineConnection}" -f "${HiveQueryHQLFile}"

Pass arguments to the HQL file along with the above command
beeline --silent=true --showHeader=false --outputformat=csv2 -u "${beelineConnection}" -f "${HiveQueryHQLFile}" --hivevar hivedbname="rawDB" --hivevar hivestagedb="processedDB"

Inside the HQL file ${hivedbname} and ${hivestagedb} can be used and these will have the values passed.
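
For illustration, an HQL file that uses these two variables might look like the one written below. The table names events and events_clean and the file name copy_rows.hql are hypothetical. The quoted heredoc marker is there only so that the shell leaves ${...} untouched for Hive to substitute.

```shell
# Write a sample HQL file into a scratch directory (names are hypothetical).
workdir=$(mktemp -d)
hql_file="$workdir/copy_rows.hql"

# Quoting 'EOF' stops the shell from expanding ${...} inside the heredoc,
# so the placeholders reach Hive unchanged.
cat > "$hql_file" <<'EOF'
INSERT INTO TABLE ${hivestagedb}.events_clean
SELECT * FROM ${hivedbname}.events;
EOF

# Show what beeline would receive before --hivevar substitution.
cat "$hql_file"
```

With the --hivevar values from the command above, Hive would read from rawDB.events and write to processedDB.events_clean.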

Thursday, July 6, 2017